Scaling and Maintaining Ajax
In This Chapter
- 6.1 General Practices
- 6.2 A Multitude of Simple Interfaces
- 6.3 Dense, Rich Interfaces
While performance optimization should wait until after the development of primary functionality, scalability and maintainability need to happen starting with the design of the application. The implemented architecture has a direct impact on scalability and needs to have enough consideration driving it to keep the application solid under any circumstance.
At the same time that the application developers create a scalable architecture, they can also use the same techniques for maintainability. The development team can separate each aspect of the code into logical, easy-to-load objects and libraries that the application then can load or pre-load as necessary. This isolation encourages abstraction between each object of the application, making it easier to track down bugs and to add functionality later in development.
6.1 General Practices
While an application’s architecture can dictate much of its scalability, some general coding practices can help keep smaller pieces of the application from growing sluggish under more demanding circumstances. If developers do not make an effort at the coding level to make the application scalable, unscalable functionality will mar the architectural scalability of the application. The users care only about the overall experience of the application, not at which point it fails.
Though many factors can affect an application’s scalability, over-usage of the processor and memory plagues web applications in particular. PHP has a memory_limit setting in php.ini, which generally defaults to 8MB. This may not seem like much, but if a single hit can use up to that 8MB, then a constant stream of multiple hits each second will keep the server’s memory pinned near its limit. Once performance starts dropping under that load, requests pile up and the application runs itself into the ground.
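As a quick point of reference during development, a script can report both the configured ceiling and its own footprint at run time. The following is only a minimal sketch using PHP’s built-in ini_get(), memory_get_usage(), and memory_get_peak_usage() functions:

// Report the configured per-process ceiling and this script's footprint
echo 'memory_limit setting: ' . ini_get('memory_limit') . "\n";
echo 'current usage: ' . memory_get_usage(true) . " bytes\n";
echo 'peak usage: ' . memory_get_peak_usage(true) . " bytes\n";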
6.1.1 Processor Usage
As the profiling output in Chapter 5, “Performance Optimization,” showed, particularly with the Xdebug examples, the amount of time spent in a function does not necessarily correlate with the amount of memory used in that function. Several other factors can cause slow-downs in a function, including disk access, database lag, and other external references. Sometimes, however, the function uses just too many processor cycles at once.
When this processor drain occurs in the JavaScript of the application, it can seize up the browser because most browsers run JavaScript in a single thread. For this reason, using DOM methods to retrieve a reference to a single node and then drilling down the DOM tree from there scales much better than custom methods to find elements by attributes such as a certain class or nodeValue.
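As a rough sketch of the difference (the "menu" element ID and the findByClass() helper here are purely hypothetical), drilling down from one known node keeps the search small, while an attribute scan has to visit every element in the document:

// Drilling down from a single known node stays cheap
var menu = document.getElementById("menu");
var items = menu.getElementsByTagName("li");

// A custom scan by class has to walk every element in the document
// (ignoring elements with multiple classes for brevity)
function findByClass(className) {
    var all = document.getElementsByTagName("*");
    var matches = [];
    for (var i = 0; i < all.length; i++) {
        if (all.item(i).className === className) {
            matches.push(all.item(i));
        }
    }
    return matches;
}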
As an example, an application could have a table with twenty columns and one thousand rows, with each table cell containing a number. Because this display gives the users quite a lot of information in a generic presentation, the application may offer a way of highlighting the cells containing values above a given threshold. In this example, the functions will have access to this minimum, held in a variable named threshold. This cell highlighting can come about in several ways.
The first of these methods, shown below, gets a NodeSet of td elements and then iterates through the entire list at once. For each cell, the function gets the text node value and compares it to the threshold. If the value exceeds the threshold, the cell gets a one-pixel border to highlight it:
function bruteForce() {
    var table = document.getElementById("data");
    var tds = table.getElementsByTagName("td");
    for (var i = 0; i < tds.length; i++) {
        var td = tds.item(i);
        var data = td.firstChild.nodeValue;
        if (parseInt(data) > threshold) {
            td.style.border = "solid 1px #fff";
        }
    }
}
While this function does work (running through 20,000 td elements and applying highlighting where required in just over a second), the browser stops responding entirely for the duration of the function. During that second, the processor usage of Firefox jumps to approximately 74 percent.
To prevent the browser from locking up, the script can simulate threading by splitting the work up into sections and iterating through each section after a minimal timeout. This method takes almost ten times the length of time that the bruteForce() function took to complete, but this next function runs in parallel to any actions the user may want to take while applying the highlighting:
function fakeThread() {
    var table = document.getElementById("data");
    var tds = table.getElementsByTagName("td");
    var i = 0;
    var section = 200;
    var doSection = function() {
        var last = i + section;
        for (; i < last && i < tds.length; i++) {
            var td = tds.item(i);
            var data = td.firstChild.nodeValue;
            if (parseInt(data) > threshold) {
                td.style.border = "solid 1px #fff";
            }
        }
        if (i < tds.length) {
            setTimeout(doSection, 10);
        }
    };
    doSection();
}
The fastest method comes in revisiting the functionality required, namely that the user can enable highlighting of td elements when the value contained exceeds a threshold. If the server flags the td elements with a class when the value exceeds this threshold, it can cache these results, and the script then has to apply a style rule only for the given class. The example below assumes that the function needs to create a new style element and write the rule into that, though it could simply edit an existing rule if the stylesheet had one in place:
function useClass() {
    var head = document.getElementsByTagName("head")[0];
    var style = head.appendChild(
        document.createElement("style")
    );
    style.type = "text/css";
    style.appendChild(
        document.createTextNode(
            ".high { border: solid 1px #fff; }"
        )
    );
}
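The corresponding server-side flagging stays simple. As a sketch only, assuming the cell values arrive in a two-dimensional array named $rows and the same minimum lives in $threshold, the template could emit the class while writing the table:

<?php foreach ($rows as $row) { ?>
    <tr>
    <?php foreach ($row as $value) { ?>
        <td<?php if ($value > $threshold) { echo ' class="high"'; } ?>>
            <?php echo (int)$value; ?></td>
    <?php } ?>
    </tr>
<?php } ?>

Because this markup can come out of the same cached rendering as the rest of the table, the client-side work shrinks to the single style rule shown above.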
By rethinking functionality that takes large amounts of processor cycles to work, developers can enable the application to handle data and interfaces of enormous size without impacting performance.
6.1.2 Memory Usage
As with processor usage, memory usage climbs rapidly in problem areas, but developers can take certain measures to keep it in check. Some types of functions, especially those that load an entire data set into a returned value, will max out memory usage unless developers put thought and planning behind their use.
For instance, many PHP database extensions offer methods of retrieving entire record sets into an array or even just a column of data into an array. These methods, though useful and easy to use, can drive up memory usage to the breaking point when not used carefully. The following code fetches a list of user IDs and names into an array using the PDO extension:
// First, run the query and get the list
$query = 'SELECT `id`, `name` FROM `users` ORDER BY `name`';
$stmt = $database->prepare($query);
$stmt->execute();
$users = $stmt->fetchAll(PDO::FETCH_ASSOC);

<!-- Later in the application, output the list -->
<ol>
<?php foreach ($users as $user) { ?>
    <li><a href="?id=<?php echo (int)$user['id']; ?>">
        <?php echo Utilities::escapeXMLEntities($user['name']); ?>
    </a></li>
<?php } ?>
</ol>
This example works perfectly well for a few dozen users, or even a hundred. However, once the list of users grows to hundreds, thousands, and especially millions, the $users = $stmt->fetchAll(PDO::FETCH_ASSOC); line will trigger an out-of-memory error, and the page will fail to render at all. To get around this issue without putting the database query and method calls directly into the template, the code can instead use a simple layer of abstraction and an implementation of PHP’s standard Iterator interface:
class PDOIterator implements Iterator
{
    /**
     * The PDO connection object
     */
    protected $database;
    protected $statement;
    /**
     * The query to run on the first iteration
     */
    protected $query;
    /**
     * Optional parameters to use for prepared statements
     */
    protected $parameters;
    /**
     * The current record in the results
     */
    protected $current;
    /**
     * The row number of the current record
     */
    protected $key;
    /**
     * A Boolean as to whether the object has more results
     */
    protected $valid;
    /**
     * Forward-only cursor assumed and enforced
     */
    public function rewind()
    {
        return false;
    }
    public function current()
    {
        if ($this->key === -1) {
            if (!$this->runQuery()) {
                $this->valid = false;
                return false;
            } else {
                $this->next();
            }
        }
        return $this->current;
    }
    public function key()
    {
        return $this->key;
    }
    public function next()
    {
        $this->current = $this->statement->fetch(PDO::FETCH_ASSOC);
        if ($this->current) {
            $this->key++;
            if (!$this->valid) {
                $this->valid = true;
            }
            return true;
        } else {
            $this->statement = null;
            $this->valid = false;
            return false;
        }
    }
    protected function runQuery()
    {
        $this->statement = $this->database->prepare($this->query);
        return $this->statement->execute($this->parameters);
    }
    public function valid()
    {
        return $this->valid;
    }
    public function setParameters($params)
    {
        $this->parameters = $params;
    }
    public function __construct($database, $query)
    {
        $this->database = $database;
        $this->query = $query;
        $this->parameters = null;
        $this->current = null;
        $this->key = -1;
        $this->valid = true;
    }
}
This class may seem like a large amount of work when compared to the previous example, but it doesn’t replace that example just yet. The PDOIterator class merely gives the application the ability to replace the earlier example easily and cleanly, by using it as shown in this next example:
// First, run the query and get the list
$query = 'SELECT `id`, `name` FROM `users` ORDER BY `name`';
$users = new PDOIterator($database, $query);

<!-- Later in the application, output the list -->
<ol>
<?php foreach ($users as $user) { ?>
    <li><a href="?id=<?php echo (int)$user['id']; ?>">
        <?php echo Utilities::escapeXMLEntities($user['name']); ?>
    </a></li>
<?php } ?>
</ol>
Because the PDOIterator class implements Iterator, the usage in the template does not change at all from the array of results originally assigned to the $users variable. In this example, though, $users contains a reference to the PDOIterator instance, and the query does not actually run until the first iteration, keeping the database connection clean and using very little memory. Once the code starts iterating through the results, it immediately renders that entry in the markup, keeping none of the results in memory afterward.
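Because the class also accepts parameters for prepared statements via setParameters(), the same pattern extends to filtered queries at no extra memory cost. The sketch below assumes a hypothetical `active` column on the `users` table:

// Fetch only active users; the result set still streams one row at a time
$query = 'SELECT `id`, `name` FROM `users` WHERE `active` = ? ORDER BY `name`';
$users = new PDOIterator($database, $query);
$users->setParameters(array(1));
// The template's foreach loop over $users needs no changes at all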
Any function that pulls a full list, a file’s contents, or any other resource of unknown size and then returns it should fall under heavy scrutiny. In some cases, these convenience functions do make sense. For instance, if a configuration file will never have more than five or ten lines in it, using file_get_contents makes the task of pulling in the contents of the file much simpler. However, even if the application currently has only a dozen user preferences, it cannot assume that the list will always stay small enough to retrieve in full.
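For resources that can grow without bound, reading incrementally keeps memory flat regardless of size. The following sketch (the file path and the processLine() function are hypothetical) reads a large file line by line instead of loading it whole:

// Rather than $lines = file('logs/access.log'), which loads every line at once,
// read and handle one line at a time so memory usage stays constant
$handle = fopen('logs/access.log', 'r');
if ($handle) {
    while (($line = fgets($handle)) !== false) {
        processLine($line);
    }
    fclose($handle);
}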