Rather than treating an XML document as a tree-like structure,
the Simple API for XML, or SAX, treats it as a series of events, such as
startDocument or endElement (handled by the start_document and end_element
methods in Perl).
Creating a SAX application involves processing events as they arrive,
keeping in mind that at any given moment, the application knows only
about the current event; if you need information about previous events,
you need to save it yourself. For example, consider the order.xml
file:
<?xml version="1.0"?>
<order orderid="THX1138" customerNumber="3263827">
  <lineitem itemid="C33">
    <item>3/4" Hex Bolt</item>
    <quantity>36</quantity>
    <unitprice currency="dollars">.35</unitprice>
  </lineitem>
  <lineitem itemid="M48">
    <item>Condenser</item>
    <quantity>1</quantity>
    <unitprice currency="dollars">2200</unitprice>
  </lineitem>
  <delivery>Overnight</delivery>
</order>
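To see what "knowing only the current event" means in practice, here is a minimal sketch that doesn't use XML::SAX at all. The @events list is a hand-written, hypothetical stand-in for what a parser might report for a trimmed-down order.xml, and each event hash carries only a LocalName key. Because no single event says where an element sits in the document, the loop keeps its own stack of open elements to reconstruct each element's path.

```perl
use strict;
use warnings;

# Hypothetical event list standing in for a real SAX parser:
# each entry is [event_name, data_hash], in document order.
my @events = (
    [ start_document => {} ],
    [ start_element  => { LocalName => 'order' } ],
    [ start_element  => { LocalName => 'lineitem' } ],
    [ start_element  => { LocalName => 'item' } ],
    [ end_element    => { LocalName => 'item' } ],
    [ end_element    => { LocalName => 'lineitem' } ],
    [ start_element  => { LocalName => 'delivery' } ],
    [ end_element    => { LocalName => 'delivery' } ],
    [ end_element    => { LocalName => 'order' } ],
    [ end_document   => {} ],
);

# The only memory we have between events is what we store ourselves.
my @open_elements;    # stack of currently open element names
my @paths;            # full path of each element, recorded at its start

for my $event (@events) {
    my ($name, $data) = @$event;
    if ($name eq 'start_element') {
        push @open_elements, $data->{LocalName};
        push @paths, join('/', @open_elements);
    }
    elsif ($name eq 'end_element') {
        pop @open_elements;
    }
}

print "$_\n" for @paths;
```

Dropping the stack would leave the loop unable to tell an item inside a lineitem from one anywhere else, which is exactly the kind of information SAX leaves to the application.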
To create a Perl application that displays the content of the file, we first need to create a parser and use it to parse the file:
package MyContentHandler;
use XML::SAX;
use base qw(XML::SAX::Base);

XML::SAX->add_parser(q(XML::SAX::PurePerl));
my $factory = XML::SAX::ParserFactory->new();
my $parser = $factory->parser( Handler => MyContentHandler->new() );
eval { $parser->parse_uri('order.xml'); };
print "Error parsing file: $@" if $@;
Starting at the top, we declare this code to be part of a package,
which lets us treat it as a class. Next, we load the XML::SAX module and,
through use base, make our package a subclass of XML::SAX::Base, which
supplies a default implementation of every SAX event method. (You can
install both from CPAN.) Next, we register the XML::SAX::PurePerl parser,
which ships with the XML::SAX distribution, so that the factory can find it.
(Depending on your installation, you may be able to skip this step.)
Next, we create the ParserFactory and then the parser itself. Notice
that when we create the parser, we also pass it a Handler: an instance of
MyContentHandler, whose methods receive the actual events as the parser
works through the file.
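Finally, we parse the file inside an eval block, so that if the parser dies on an error (a file that isn't well-formed, for example), the message in $@ is printed instead of ending the script.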
Now, if you run this script, you'll notice that nothing happens. Or
at least, nothing seems to happen. The various methods, such as
start_element, are executed, but the implementations inherited from
XML::SAX::Base don't produce any output. To change that, we can override
those methods here in the script, like so:
package MyContentHandler;
use base qw(XML::SAX::Base);

sub start_document {
    print "Document START\n";
}

sub end_document {
    print "Document END\n";
}

sub start_element {
    my $self = shift;    # the handler object
    my $el = shift;      # hash describing the element
    print "Element START:\n";
    print "Namespace= $el->{NamespaceURI}\n";
    print "LocalName= $el->{LocalName}\n";
    # Sort the attribute keys so the output order is predictable
    foreach my $ak (sort keys %{ $el->{Attributes} }) {
        my $at = $el->{Attributes}->{$ak};
        print qq(Attribute $at->{Name} = "$at->{Value}"\n);
    }
    print "***************\n";
}

sub end_element {
    my $self = shift;
    my $el = shift;
    print "Element END:\n";
    print "Namespace= $el->{NamespaceURI}\n";
    print "LocalName= $el->{LocalName}\n";
    print "***************\n";
}

use XML::SAX;
XML::SAX->add_parser(q(XML::SAX::PurePerl));
my $factory = XML::SAX::ParserFactory->new();
my $parser = $factory->parser( Handler => MyContentHandler->new() );
eval { $parser->parse_uri('order.xml'); };
print "Error parsing file: $@" if $@;
The first two subroutines are easy. When the parser starts parsing the
document, the start_document event fires, which causes the parser to call
the start_document() method. When it finishes the file, it calls the
end_document() method.
Elements are a little more complicated. The start_element() and
end_element() methods execute when you'd expect them to (that is, with their
corresponding events), but they also receive data describing the element.
In each case, @_ holds the arguments passed to the method, so the first
shift retrieves a reference to the handler object itself ($self), and the
second retrieves the element.
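In the case of the start_element() method, the element is
represented by a hash that includes its namespace URI, its local
name, and other information, including its attributes. The attributes
are themselves a hash, keyed by each attribute's namespace URI and local
name in the form {NamespaceURI}LocalName (so orderid, which is in no
namespace, appears as {}orderid). Each entry is another hash, whose Name
and Value keys hold the attribute's name and value.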
If we run the script, we get the following output:
Document START
Element START:
Namespace=
LocalName= order
Attribute customerNumber = "3263827"
Attribute orderid = "THX1138"
***************
Element START:
Namespace=
LocalName= lineitem
Attribute itemid = "C33"
***************
Element START:
Namespace=
LocalName= item
***************
Element END:
Namespace=
LocalName= item
***************
Element START:
Namespace=
LocalName= quantity
***************
Element END:
Namespace=
LocalName= quantity
***************
Element START:
Namespace=
LocalName= unitprice
Attribute currency = "dollars"
***************
Element END:
Namespace=
LocalName= unitprice
***************
Element END:
Namespace=
LocalName= lineitem
***************
Element START:
Namespace=
LocalName= lineitem
Attribute itemid = "M48"
***************
Element START:
Namespace=
LocalName= item
***************
Element END:
Namespace=
LocalName= item
***************
Element START:
Namespace=
LocalName= quantity
***************
Element END:
Namespace=
LocalName= quantity
***************
Element START:
Namespace=
LocalName= unitprice
Attribute currency = "dollars"
***************
Element END:
Namespace=
LocalName= unitprice
***************
Element END:
Namespace=
LocalName= lineitem
***************
Element START:
Namespace=
LocalName= delivery
***************
Element END:
Namespace=
LocalName= delivery
***************
Element END:
Namespace=
LocalName= order
***************
Document END
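Because attribute keys combine the namespace URI and the local name, looking an attribute up by its plain name means building that key yourself. The following sketch hand-builds an element hash modelled on the Perl SAX 2 structure for <lineitem itemid="M48"> (assuming, as in order.xml, that nothing is in a namespace) and uses a hypothetical attr_value() helper to fetch a value. No parser is involved.

```perl
use strict;
use warnings;

# Hand-built element hash imitating what start_element receives for
# <lineitem itemid="M48">; nothing here is in a namespace.
my $el = {
    Name         => 'lineitem',
    LocalName    => 'lineitem',
    Prefix       => '',
    NamespaceURI => '',
    Attributes   => {
        '{}itemid' => {
            Name         => 'itemid',
            LocalName    => 'itemid',
            Prefix       => '',
            NamespaceURI => '',
            Value        => 'M48',
        },
    },
};

# Hypothetical helper: build the "{NamespaceURI}LocalName" key and
# return the attribute's value, or undef if the attribute is absent.
sub attr_value {
    my ($element, $local_name, $ns_uri) = @_;
    $ns_uri = '' unless defined $ns_uri;
    my $attr = $element->{Attributes}{ '{' . $ns_uri . '}' . $local_name };
    return $attr ? $attr->{Value} : undef;
}

print attr_value($el, 'itemid'), "\n";    # M48
```

The same key format is what lets two attributes with the same local name but different namespaces coexist in one element.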
So far we've looked at the elements but not at their content. An element's
text content is delivered through characters events, but there's no guarantee
that you'll get all of it in one event. For example, the item name
"Condenser" could be supplied in a single event, or it could come as a series
of events, each passing one or more characters to the characters() method.
To solve that problem, we can save the current text as part of the
object, and then display and clear it as necessary:
package MyContentHandler;
use base qw(XML::SAX::Base);

sub start_document {
    my $self = shift;
    $self->{text} = '';    # buffer for character data
    print "Document START\n";
}

sub end_document {
    print "Document END\n";
}

sub start_element {
    my $self = shift;
    my $el = shift;
    $self->display_text();    # flush any text that preceded this tag
    print "Element START:\n";
    print "Namespace= $el->{NamespaceURI}\n";
    print "LocalName= $el->{LocalName}\n";
    foreach my $ak (sort keys %{ $el->{Attributes} }) {
        my $at = $el->{Attributes}->{$ak};
        print qq(Attribute $at->{Name} = "$at->{Value}"\n);
    }
    print "***************\n";
}

sub end_element {
    my $self = shift;
    my $el = shift;
    $self->display_text();    # flush the element's text content
    print "Element END:\n";
    print "Namespace= $el->{NamespaceURI}\n";
    print "LocalName= $el->{LocalName}\n";
    print "***************\n";
}

sub display_text {
    my $self = shift;
    if ( defined( $self->{text} ) && $self->{text} ne "" ) {
        print " text: [$self->{text}]\n";
        $self->{text} = '';
    }
}

sub characters {
    my $self = shift;
    my $text = shift;
    # Append rather than assign: one text node may arrive in several calls
    $self->{text} .= $text->{Data};
}

use XML::SAX;
XML::SAX->add_parser(q(XML::SAX::PurePerl));
my $factory = XML::SAX::ParserFactory->new();
my $parser = $factory->parser( Handler => MyContentHandler->new() );
eval { $parser->parse_uri('order.xml'); };
print "Error parsing file: $@" if $@;
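The key point in characters() is that it appends to the buffer rather than replacing it. This stand-alone sketch isolates that behavior. The BufferingHandler class and its take_text() method are hypothetical names, and instead of a real parser, the script calls characters() by hand with "Condenser" split into three chunks, as a parser is permitted to do.

```perl
use strict;
use warnings;

package BufferingHandler;    # hypothetical handler class

sub new {
    my $class = shift;
    return bless { text => '' }, $class;
}

# Each call may carry only part of a text node, so append, don't replace.
sub characters {
    my ($self, $chars) = @_;
    $self->{text} .= $chars->{Data};
}

# Return the buffered text and reset the buffer for the next element.
sub take_text {
    my $self = shift;
    my $text = $self->{text};
    $self->{text} = '';
    return $text;
}

package main;

my $handler = BufferingHandler->new();

# Simulate a parser delivering "Condenser" in three chunks.
$handler->characters({ Data => $_ }) for ('Con', 'den', 'ser');
my $item = $handler->take_text();
print "[$item]\n";    # [Condenser]
```

Had characters() used = instead of .=, the buffer would end up holding only "ser".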
The issue of text illustrates the point I made earlier about having to save information explicitly if it needs to be available across events, but text isn't the only case where you need to do so. For example, consider this version of the script, which lists the order information, including the extended price for each line item and the grand total for the order:
package MyContentHandler;
use base qw(XML::SAX::Base);

# Order-level state shared by the handler methods
our ($total_price, $item_id, $item_name, $quantity, $unit_price);

sub start_document {
    my $self = shift;
    $self->{text} = '';
    $total_price = 0;
    ($item_id, $item_name, $quantity, $unit_price) = ('', '', '', '');
}

sub end_document {
    print "Order total: $total_price\n";
}

sub start_element {
    my $self = shift;
    my $el = shift;
    my $local_name = $el->{LocalName};
    if ($local_name eq "order") {
        my $order_id = $el->{Attributes}->{"{}orderid"}->{Value};
        my $customer_id = $el->{Attributes}->{"{}customerNumber"}->{Value};
        print "Order $order_id for customer $customer_id\n";
    }
    elsif ($local_name eq "lineitem") {
        $item_id = $el->{Attributes}->{"{}itemid"}->{Value};
    }
    $self->clear_text();    # discard whitespace before this tag
}

sub end_element {
    my $self = shift;
    my $el = shift;
    my $text = $self->get_text();
    my $local_name = $el->{LocalName};
    if ($text ne '') {
        if    ($local_name eq "item")      { $item_name  = $text; }
        elsif ($local_name eq "quantity")  { $quantity   = $text; }
        elsif ($local_name eq "unitprice") { $unit_price = $text; }
    }
    if ($local_name eq "lineitem") {
        my $extended_price = $quantity * $unit_price;
        print "Item: $item_name ($item_id) $quantity @ $unit_price = $extended_price\n";
        $total_price = $total_price + $extended_price;
        # Reset the per-item values before the next lineitem
        $item_name = "";
        $quantity = "";
        $unit_price = "";
    }
}

sub get_text {
    my $self = shift;
    my $text = '';
    if ( defined( $self->{text} ) ) {
        $text = $self->{text};
        $self->{text} = '';
    }
    return $text;
}

sub clear_text {
    my $self = shift;
    $self->{text} = '';
}

sub characters {
    my $self = shift;
    my $text = shift;
    $self->{text} .= $text->{Data};
}

package main;
use XML::SAX;
XML::SAX->add_parser(q(XML::SAX::PurePerl));
my $factory = XML::SAX::ParserFactory->new();
my $parser = $factory->parser( Handler => MyContentHandler->new() );
eval { $parser->parse_uri('order.xml'); };
print "Error parsing file: $@" if $@;
In essence, this script does the same job as the text-buffering version, in that it parses the XML file and saves data so that it can act on it later. In this case, however, the script also performs some (albeit rudimentary) analysis on the streamed data. Running it produces the following result:
Order THX1138 for customer 3263827
Item: 3/4" Hex Bolt (C33) 36 @ .35 = 12.6
Item: Condenser (M48) 1 @ 2200 = 2200
Order total: 2212.6
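The script above keeps its running values in package variables. If you'd rather keep all state inside the handler object, a sketch along these lines should work. The OrderTotals class name is hypothetical, and the events are fed by hand rather than by XML::SAX, using hashes that mimic only the keys this code reads.

```perl
use strict;
use warnings;

package OrderTotals;    # hypothetical handler class

sub new {
    my $class = shift;
    # All per-parse state lives in the object, not in package variables.
    return bless { text => '', line => {}, total => 0 }, $class;
}

sub start_element {
    my ($self, $el) = @_;
    $self->{text} = '';    # drop whitespace that preceded this tag
    if ($el->{LocalName} eq 'lineitem') {
        $self->{line} = { id => $el->{Attributes}{'{}itemid'}{Value} };
    }
}

sub characters {
    my ($self, $chars) = @_;
    $self->{text} .= $chars->{Data};
}

sub end_element {
    my ($self, $el) = @_;
    my $name = $el->{LocalName};
    if ($name eq 'item' || $name eq 'quantity' || $name eq 'unitprice') {
        $self->{line}{$name} = $self->{text};
    }
    elsif ($name eq 'lineitem') {
        my $line = $self->{line};
        my $extended = $line->{quantity} * $line->{unitprice};
        print "Item: $line->{item} ($line->{id}) $line->{quantity} @ $line->{unitprice} = $extended\n";
        $self->{total} += $extended;
    }
    $self->{text} = '';
}

package main;

# Feed the handler hand-made events for the two line items in order.xml.
my $handler = OrderTotals->new();
for my $row (['C33', '3/4" Hex Bolt', '36', '.35'], ['M48', 'Condenser', '1', '2200']) {
    my ($id, @values) = @$row;
    $handler->start_element({ LocalName => 'lineitem',
        Attributes => { '{}itemid' => { Name => 'itemid', Value => $id } } });
    for my $tag (qw(item quantity unitprice)) {
        $handler->start_element({ LocalName => $tag, Attributes => {} });
        $handler->characters({ Data => shift @values });
        $handler->end_element({ LocalName => $tag });
    }
    $handler->end_element({ LocalName => 'lineitem' });
}
print "Order total: $handler->{total}\n";
```

Since the running total lives in $self rather than in package variables, each new handler starts from zero, and nothing needs to be reset from outside the class.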