DOM or SAX?
Now that you've seen (and hopefully understood) the two most common approaches to parsing XML with PHP, you're probably wondering: Which one do I use?
It's a good question, and one that doesn't have a one-size-fits-all answer. Both DOM and SAX approaches have advantages and disadvantages, and your choice of technique must depend on the type of data being parsed, the requirements of your application, and the constraints under which you are operating.
The SAX approach is linear: It processes XML structures as it finds them, generating events and leaving the event handlers to decide what to do with each structure. The advantage of this approach is that it is resource-friendly; because SAX does not build a tree representation of the document in memory, it can parse XML data in chunks, processing large amounts of data with very little impact on memory. This also translates into better performance; if your document structure is simple, and the event handlers don't have anything too complicated to do, SAX-based applications will generally offer a speed advantage over DOM-based ones.
The downside, though, is an inability to move around the document in a non- linear manner. SAX does not maintain any internal record of the relationships between the different nodes of an XML document (as the DOM does), making it difficult to create customized node collections or to traverse the document in a non-sequential manner. The only way around this with SAX is to create your own custom object model, and map the document elements into your own custom structuresa process that adds to complexity and can possibly degrade performance.
Where SAX flounders, though, the DOM shines. The DOM creates a tree representation of the document in memory, making it possible to easily travel from one node to another, or even access the same node repeatedly (again, not something you can do easily in SAX). This tree representation is based on a standard, easy-to-understand model, making it easier to write code to interact with it.
This flexibility does, however, come with an important caveat. Because the DOM builds a tree in memory, DOM processing cannot begin until the document has been fully parsed (SAX, on the other hand, can begin parsing a document even if it's not all available immediately). This reduces a developer's ability to "manage" the parsing process by feeding data to the parser in chunks, and also implies greater memory consumption and consequent performance degradation.
Consequently, the choice of technique depends significantly on the type of constraints the application will be performing under, and the type of processing it will be expected to carry out. For systems with limited memory, SAX is a far more efficient approach. On the other hand, complex data-processing requirements can benefit from the standard object model and API of the DOM.