The Simple API for XML, or SAX, was developed by the members of the XML-DEV mailing list. Rather than treating an XML document as a tree-like structure, SAX treats it as a series of events such as startDocument or endElement. To accomplish this, a SAX application consists of a parser that sends these events to "handlers," methods or functions designated to handle them.
In PHP, the handlers can be plain functions or methods of a class. In this example, we'll use methods, handling the entire process from within a single class.
Creating a SAX application involves processing events as they arrive, keeping in mind that the handler class knows only about the current event; if you need information about previous events, you need to save it yourself. For example, consider this order file:
    <?xml version="1.0"?>
    <order orderid="THX1138" customerNumber="3263827">
      <lineitem itemid="C33">
        <item>3/4" Hex Bolt</item>
        <quantity>36</quantity>
        <unitprice currency="dollars">.35</unitprice>
      </lineitem>
      <lineitem itemid="M48">
        <item>Condenser</item>
        <quantity>1</quantity>
        <unitprice currency="dollars">2200</unitprice>
      </lineitem>
      <delivery>Overnight</delivery>
    </order>
We can create a SAX application that lists the order information, including the extended total for each item and the grand total for the order. We'd start by creating the main SAX application:
    <?php
    class OrderProcessor {

        function __construct() {
        }

        function ProcessOrder($url) {
            $parser = xml_parser_create();

            // Read the file a line at a time (at most 4K per read)
            // and hand each piece to the parser
            $fp = fopen($url, "r");
            while (!feof($fp)) {
                $line = fgets($fp, 4096);
                xml_parse($parser, $line);
            }
            fclose($fp);

            xml_parser_free($parser);
        }
    }

    // Plain assignment; the old "=& new" form no longer parses in current PHP
    $order = new OrderProcessor();
    $success = $order->ProcessOrder("order.xml");
    ?>
We start by creating the new object and running the ProcessOrder function. The function creates the parser and feeds the file to it one line at a time, reading at most 4K per call. At this point, however, the script doesn't actually do anything. In order to get it to act on the file, we need to assign handlers:
    <?php
    class OrderProcessor {

        function __construct() {
        }

        function ProcessOrder($url) {
            $parser = xml_parser_create();

            // The handlers are methods of this object, not global functions
            xml_set_object($parser, $this);
            xml_set_element_handler($parser, "_startElement", "_endElement");
            xml_set_character_data_handler($parser, "_characters");

            $fp = fopen($url, "r");
            while (!feof($fp)) {
                $line = fgets($fp, 4096);
                xml_parse($parser, $line);
            }
            fclose($fp);

            xml_parser_free($parser);
        }

        function _startElement($parser, $name, $attrs) {
        }

        function _endElement($parser, $name) {
        }

        function _characters($parser, $data) {
        }
    }

    $order = new OrderProcessor();
    $success = $order->ProcessOrder("order.xml");
    ?>
The xml_set_element_handler() function sets both the "start element" and "end element" handlers, and the xml_set_character_data_handler() function takes care of, well, setting the character data handler.
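The feeding loop shown earlier ignores the return value of xml_parse() and never tells the parser which piece is the last one. The following is a minimal sketch of a more defensive loop, assuming PHP 7 or later; parseInChunks() is a hypothetical helper name, and closures stand in for the handler methods so that the example is self-contained.

```php
<?php
// Sketch: feed a document to the parser in fixed-size chunks and
// report parse errors. parseInChunks() is a hypothetical helper name.
function parseInChunks(string $xml, int $chunkSize = 4096): array
{
    $counts = ['start' => 0, 'end' => 0];

    $parser = xml_parser_create();
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, false);

    // Handlers can be any PHP callable; these two just count elements.
    xml_set_element_handler(
        $parser,
        function ($p, $name, $attrs) use (&$counts) { $counts['start']++; },
        function ($p, $name) use (&$counts) { $counts['end']++; }
    );

    $chunks = str_split($xml, $chunkSize);
    $last = count($chunks) - 1;
    foreach ($chunks as $i => $chunk) {
        // The third argument tells the parser whether this is the final chunk.
        if (!xml_parse($parser, $chunk, $i === $last)) {
            throw new RuntimeException(sprintf(
                'XML error on line %d: %s',
                xml_get_current_line_number($parser),
                xml_error_string(xml_get_error_code($parser))
            ));
        }
    }
    // (xml_parser_free($parser) is only needed on PHP 7 and earlier.)
    return $counts;
}
```

Calling parseInChunks() on a well-formed document returns the number of start and end tags seen; a malformed document raises an exception that names the line where parsing stopped.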
Note that in PHP there is no way to set handlers for the "start document" and "end document" events, so we'll run them manually within the application:
    <?php
    class OrderProcessor {

        var $totalPrice;

        function __construct() {
        }

        function ProcessOrder($url) {
            $parser = xml_parser_create();

            // Pass $this directly; call-time "&$this" is an error in current PHP
            xml_set_object($parser, $this);
            xml_set_element_handler($parser, "_startElement", "_endElement");
            xml_set_character_data_handler($parser, "_characters");

            $this->_startDocument($parser);

            $fp = fopen($url, "r");
            while (!feof($fp)) {
                $line = fgets($fp, 4096);
                xml_parse($parser, $line);
            }
            fclose($fp);

            $this->_endDocument($parser);

            xml_parser_free($parser);
        }

        function _startDocument($parser) {
            $this->totalPrice = 0;
        }

        function _endDocument($parser) {
            echo("<br />Order total: ".$this->totalPrice."<br />");
        }

        function _startElement($parser, $name, $attrs) {
        }

        function _endElement($parser, $name) {
        }

        function _characters($parser, $data) {
        }
    }

    $order = new OrderProcessor();
    $success = $order->ProcessOrder("order.xml");
    ?>
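In this case, we're going to gather information about the order, so we'll start by initializing the totalPrice variable before we start parsing, and displaying its value when parsing is finished. The xml_set_object() call tells the parser to look for the named handlers as methods of this object rather than as global functions.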
Most events fire multiple times. For example, the first events in the sample document are:
    startDocument
    startElement (order)
    characters (white space)
    startElement (lineitem)
    characters (white space)
    startElement (item)
    characters (3/4" Hex Bolt)
    endElement (item)
    characters (white space)
    startElement (quantity)
    characters (36)
    endElement (quantity)
    ...
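A sequence like the one above can be reproduced with a small tracing script. This is only a sketch, assuming PHP 7 or later: traceEvents() is a hypothetical helper, closures are used as handlers, and the startDocument and endDocument entries are added by hand because PHP has no such events.

```php
<?php
// Sketch: record every SAX event in the order it fires.
// traceEvents() is a hypothetical helper name.
function traceEvents(string $xml): array
{
    $events = ['startDocument']; // PHP has no such event, so log it by hand

    $parser = xml_parser_create();
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, false);
    xml_set_element_handler(
        $parser,
        function ($p, $name, $attrs) use (&$events) {
            $events[] = "startElement ($name)";
        },
        function ($p, $name) use (&$events) {
            $events[] = "endElement ($name)";
        }
    );
    xml_set_character_data_handler(
        $parser,
        function ($p, $data) use (&$events) {
            // Label white-space-only chunks instead of printing them
            $label = trim($data) === '' ? 'white space' : $data;
            $events[] = "characters ($label)";
        }
    );

    xml_parse($parser, $xml, true);
    $events[] = 'endDocument';
    return $events;
}

$xml = "<order>\n  <item>Condenser</item>\n</order>";
echo implode("\n", traceEvents($xml)), "\n";
```

The element events always appear in document order. How many characters events appear between them depends on how the parser chooses to split the text.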
Now, it's important to understand that each of these events is completely independent of the others. When the characters event fires for the text 3/4" Hex Bolt (more on the _characters function in a moment), the handler has no way of knowing that the text is part of the item element. If this information is important (as it is here), we need to keep track of it ourselves.
For our purposes, that means that when we close an element we're tracking, such as item or quantity, we need to store the text that's been flowing through the _characters function, like so:
    <?php
    class OrderProcessor {

        var $totalPrice = 0;
        var $itemid = "";
        var $itemname = "";
        var $quantity = 0;
        var $unitprice = 0;
        var $extendedPrice = 0;
        var $currentElement = "";
        var $thisText = "";

        function __construct() {
        }

        function ProcessOrder($url) {
            $parser = xml_parser_create();

            // Keep element names as written instead of folding them to upper case
            xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, false);
            xml_set_object($parser, $this);
            xml_set_element_handler($parser, "_startElement", "_endElement");
            xml_set_character_data_handler($parser, "_characters");

            $this->_startDocument($parser);

            $fp = fopen($url, "r");
            while (!feof($fp)) {
                $line = fgets($fp, 4096);
                xml_parse($parser, $line);
            }
            fclose($fp);

            $this->_endDocument($parser);

            xml_parser_free($parser);
        }

        function _startDocument($parser) {
            $this->totalPrice = 0;
        }

        function _endDocument($parser) {
            echo("<br />Order total: ".$this->totalPrice."<br />");
        }

        function _startElement($parser, $name, $attrs) {
            if ($name == "order") {
                $orderid = $attrs["orderid"];
                $customerid = $attrs["customerNumber"];
                echo("Order ".$orderid." for customer ".$customerid.":<br /><br />");
            } else if ($name == "lineitem") {
                $this->itemid = $attrs["itemid"];
            }
            $this->currentElement = $name;
            // Discard any white space collected since the previous tag
            $this->thisText = "";
        }

        function _endElement($parser, $name) {
            if (strlen($this->thisText) > 0) {
                if ($name == "item") {
                    $this->itemname = $this->thisText;
                } else if ($name == "quantity") {
                    $this->quantity = $this->thisText;
                } else if ($name == "unitprice") {
                    $this->unitprice = $this->thisText;
                }
                $this->thisText = "";
            }
            if ($name == "lineitem") {
                $this->extendedPrice = $this->quantity * $this->unitprice;
                echo(" Item: ".$this->itemname." (".$this->itemid.") ".$this->quantity.
                     " @ ".$this->unitprice." = ".$this->extendedPrice."<br />");
                $this->totalPrice = $this->totalPrice + $this->extendedPrice;

                // Reset the per-item values for the next lineitem
                $this->itemname = "";
                $this->quantity = 0;
                $this->unitprice = 0;
            }
        }

        function _characters($parser, $data) {
            // Text may arrive in several pieces, so append rather than assign
            $this->thisText = $this->thisText . $data;
        }
    }

    $order = new OrderProcessor();
    $success = $order->ProcessOrder("order.xml");
    ?>
Let's start with _startElement. If it's the order element or the lineitem element we've run across, we're pulling the appropriate information from the attributes present, which are fed to the function as an array. In any case, we're storing the name of the element.
In most cases, the next event that will fire is the characters event as the content of the element is processed. One thing that's a little strange about SAX is that you never really know just how text will be processed. You might get it all in one big chunk, or you might get it in a series of smaller pieces. Because of this little idiosyncrasy, we need to append the data from each call to the thisText variable. When we get to the end of the element, the _endElement function executes, and we can check (and clear) the contents of the variable.
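The effect of chunked text is easy to provoke by splitting the input in the middle of an element's content. The following is a sketch only, assuming PHP 7 or later; collectText() is a hypothetical helper rather than part of the OrderProcessor class, and it clears its buffer on every start tag as a simplification.

```php
<?php
// Sketch: character data may reach the handler in several pieces, so
// append each piece to a buffer and read the buffer when the element ends.
// collectText() is a hypothetical helper name.
function collectText(array $chunks, string $element): array
{
    $buffer = '';
    $calls = 0;
    $texts = [];

    $parser = xml_parser_create();
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, false);
    xml_set_element_handler(
        $parser,
        function ($p, $name, $attrs) use (&$buffer) {
            $buffer = ''; // start each element with an empty buffer
        },
        function ($p, $name) use (&$buffer, &$texts, $element) {
            if ($name === $element) {
                $texts[] = $buffer; // the complete text is known only now
            }
            $buffer = '';
        }
    );
    xml_set_character_data_handler(
        $parser,
        function ($p, $data) use (&$buffer, &$calls) {
            $buffer .= $data;
            $calls++;
        }
    );

    $last = count($chunks) - 1;
    foreach ($chunks as $i => $chunk) {
        xml_parse($parser, $chunk, $i === $last);
    }
    return ['texts' => $texts, 'calls' => $calls];
}

// The input is split in the middle of the item text.
$result = collectText(['<item>3/4" Hex', ' Bolt</item>'], 'item');
echo $result['texts'][0], "\n";
```

However many times the character handler is called, the buffer holds the full text 3/4" Hex Bolt by the time the end tag arrives. The number of calls is reported in $result['calls'] and should not be relied on.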
Note that our method of saving the "current" element only works because we're only looking for the text children of simple elements. If we needed to track multiple levels of elements, we'd have to find another way of storing the information (or use another way of parsing the document, such as DOM). In this case, though, it's sufficient, so as each element closes, we check to see what it was and perform the appropriate actions. If it was an item, quantity, or unitprice element, we simply store the appropriate values. If, on the other hand, it's the end of a lineitem element, we perform the appropriate calculations, display the information for that item, and reinitialize the variables.
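If you did need to track multiple levels of elements, one common approach is to keep a stack of the currently open element names. What follows is a rough sketch of that idea, assuming PHP 7 or later and not taken from the example above; textByPath() is a hypothetical helper, it drops white-space-only chunks as a simplification, and text from repeated paths is simply concatenated.

```php
<?php
// Sketch: track the full path of open elements with a stack.
// textByPath() is a hypothetical helper name.
function textByPath(string $xml): array
{
    $stack = [];
    $text = [];

    $parser = xml_parser_create();
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, false);
    xml_set_element_handler(
        $parser,
        function ($p, $name, $attrs) use (&$stack) {
            $stack[] = $name;      // entering an element: push its name
        },
        function ($p, $name) use (&$stack) {
            array_pop($stack);     // leaving an element: pop it again
        }
    );
    xml_set_character_data_handler(
        $parser,
        function ($p, $data) use (&$stack, &$text) {
            if (trim($data) === '') {
                return;            // simplification: skip white-space-only chunks
            }
            $path = implode('/', $stack);
            $text[$path] = ($text[$path] ?? '') . $data;
        }
    );

    xml_parse($parser, $xml, true);
    return $text;
}

$xml = '<order><lineitem><item>Bolt</item></lineitem>'
     . '<delivery>Overnight</delivery></order>';
print_r(textByPath($xml));
```

Each piece of text is filed under a path such as order/lineitem/item, so a handler can tell an item inside a lineitem from an element with the same name elsewhere in the document.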
Calling up the OrderProcessor page against the sample order file displays a result of
    Order THX1138 for customer 3263827:

    Item: 3/4" Hex Bolt (C33) 36 @ .35 = 12.6
    Item: Condenser (M48) 1 @ 2200 = 2200

    Order total: 2212.6
SAX is, in many cases, faster and more efficient than DOM, because it only deals with the information that's relevant at that particular moment rather than keeping the entire tree in memory at once. It may take a little getting used to, but you'll find that it can be an extremely versatile item in your toolbox.