Perl was originally designed as a language for sorting through text, so it's not surprising that it is a good fit for XML. In fact, there are several ways to handle XML in Perl. In this section we're going to look at manipulating DOM objects using the XML::DOM module, available on CPAN.
As of Level 2, the Document Object Model defines the ways and means of manipulating a Document object and the objects it contains, but it doesn't define any way to load or save (or persist, serialize, or whatever other verb you'd like to use) a Document. That was left to implementations until DOM Level 3 Load and Save.
In this section, we'll concentrate on getting a feel for how these manipulations work by loading a simple
document, making some changes to it, and viewing the results.
Consider, for example, the following sample file, candy.xml:
```xml
<?xml version="1.0"?>
<candy>
   <product>Mints</product>
   <product>Chocolate</product>
   <product>Circus Peanuts</product>
</candy>
```
Using the XML::DOM module's built-in parser, we can create a new Document object that represents the data in that file:
```perl
use XML::DOM;

my $parser = XML::DOM::Parser->new;
my $doc = $parser->parsefile("candy.xml");
```
Remember, this capability is not part of the DOM Recommendation, and is implementation-specific. A different XML-related module might do this differently.
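Because XML::DOM::Parser is built on XML::Parser, it can also parse XML held in a string via parse(), which is handy for quick experiments. The following is a minimal sketch, assuming XML::DOM is installed from CPAN; it inlines the candy data rather than reading candy.xml. The final dispose() call breaks the circular references that XML::DOM trees contain, as the module's documentation recommends.

```perl
use strict;
use warnings;
use XML::DOM;

# The same candy data, held in a string instead of a file.
my $xml = <<'XML';
<?xml version="1.0"?>
<candy>
   <product>Mints</product>
   <product>Chocolate</product>
   <product>Circus Peanuts</product>
</candy>
XML

my $parser = XML::DOM::Parser->new;

# parse() takes the XML text itself; parsefile() takes a file name.
my $doc = $parser->parse($xml);

my $root_name = $doc->getDocumentElement->getNodeName;
print "Parsed a document whose root is <$root_name>\n";

# XML::DOM trees contain circular references, so free them explicitly.
$doc->dispose;
```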
Manipulating the Document object, on the other hand, is part of the Recommendation:
```perl
use strict;
use warnings;
use XML::DOM;

my $parser = XML::DOM::Parser->new;
my $doc = $parser->parsefile("candy.xml");

# The root (document) element, <candy>
my $root = $doc->getDocumentElement();
print "\nThe root element is ", $root->getNodeName(), ".\n";

# In list context, getChildNodes() returns every child node, text included
my @children = $root->getChildNodes();
print "There are ", scalar(@children), " child elements.\n";
print "They are: \n";

foreach my $child (@children) {
    if ($child->getNodeType() == TEXT_NODE) {
        # Whitespace between the <product> elements
        print "Text: ", $child->getData();
    } elsif ($child->getNodeType() == ELEMENT_NODE) {
        # An element's text lives in a child Text node
        print $child->getNodeName(), " = ",
              $child->getFirstChild()->getData(), "\n";
    }
}
```
To start with, we get a reference to the root element of the document, also called the document element. That element is a Node object, but because it's also an Element it has a name (its tag name), which we can retrieve using the getNodeName() method.
Next, we get all of the child nodes of the root element using the getChildNodes() method, which (in list context) returns a list that includes both the product elements and the whitespace text nodes between them. That's why the script counts 7 children rather than 3. We can iterate through that list just as we would through any other Perl array.
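To see the difference between child nodes and child elements more directly, here is a small sketch. It assumes XML::DOM from CPAN and uses a shortened, inline version of the candy data. It also shows that getChildNodes() returns an XML::DOM::NodeList object when called in scalar context, and it filters the list down to element nodes with grep.

```perl
use strict;
use warnings;
use XML::DOM;

my $doc = XML::DOM::Parser->new->parse(
    '<candy> <product>Mints</product> <product>Chocolate</product> </candy>');
my $root = $doc->getDocumentElement;

# In list context getChildNodes() returns the child nodes themselves...
my @all = $root->getChildNodes;

# ...in scalar context it returns an XML::DOM::NodeList object instead.
my $list  = $root->getChildNodes;
my $count = $list->getLength;

# Keep only the element children, skipping the whitespace text nodes.
my @elements = grep { $_->getNodeType == ELEMENT_NODE } @all;

printf "%d child nodes, %d of them elements\n", $count, scalar @elements;
$doc->dispose;
```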
Each node has a type, which you can test against constants such as TEXT_NODE and ELEMENT_NODE to determine what it is. That information matters because different node types support different methods. For example, an Element node has a name, so you can use getNodeName(), but it has no character data, so you can't use getData(). The situation is reversed for a Text node: it has data, and its name is simply the fixed string "#text".
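The sketch below makes those type checks explicit, dispatching on getNodeType() and recording what each kind of node offers. The one-product document is an inline stand-in for candy.xml, and XML::DOM is assumed to be installed.

```perl
use strict;
use warnings;
use XML::DOM;

my $doc  = XML::DOM::Parser->new->parse('<candy> <product>Mints</product></candy>');
my $root = $doc->getDocumentElement;

my @report;
foreach my $child ($root->getChildNodes) {
    my $type = $child->getNodeType;
    if ($type == ELEMENT_NODE) {
        # Elements have a meaningful name (the tag name) but no character data.
        push @report, "element: " . $child->getNodeName;
    }
    elsif ($type == TEXT_NODE) {
        # Text nodes carry data; their name is always the fixed string "#text".
        push @report, "text (" . $child->getNodeName . "): '" . $child->getData . "'";
    }
}
print "$_\n" for @report;
$doc->dispose;
```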
Note that the text "content" of an element is actually a Text node that is a child of that element (here, its first child), as you can see when we retrieve the text contained within each product element using first getFirstChild() and then getData().
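Relying on getFirstChild() works here because each product contains a single text node. When an element mixes text and child elements, its text is spread across several nodes. The text_content() helper below is hypothetical (it is not part of XML::DOM). It's a minimal sketch that walks the children recursively and concatenates their character data, much as the DOM Level 3 textContent property does.

```perl
use strict;
use warnings;
use XML::DOM;

# Hypothetical helper: concatenate the data of every Text (and CDATA)
# descendant in document order.
sub text_content {
    my ($node) = @_;
    my $text = '';
    foreach my $child ($node->getChildNodes) {
        my $type = $child->getNodeType;
        if ($type == TEXT_NODE || $type == CDATA_SECTION_NODE) {
            $text .= $child->getData;
        }
        elsif ($type == ELEMENT_NODE) {
            $text .= text_content($child);    # recurse into child elements
        }
    }
    return $text;
}

my $doc = XML::DOM::Parser->new->parse(
    '<product>Circus <em>Peanuts</em></product>');
my $product = $doc->getDocumentElement;

my $first = $product->getFirstChild->getData;   # only the leading text node
my $all   = text_content($product);             # text from every descendant
print "first child: '$first'\nall text: '$all'\n";
$doc->dispose;
```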
The results are as follows:
```
The root element is candy.
There are 7 child elements.
They are:
Text:
   product = Mints
Text:
   product = Chocolate
Text:
   product = Circus Peanuts
Text:
```
DOM also defines the ways in which you can add content to a document:
```perl
...
my @products = $root->getElementsByTagName("product");
my $productNum = 1;
foreach my $productElement (@products) {
    # Create an empty productNumber attribute, attach it, then give it a value
    $productElement->setAttributeNode($doc->createAttribute("productNumber"));
    $productElement->setAttribute("productNumber", $productNum);

    # Replace the product's text with an upper-case version
    my $productName = $productElement->getFirstChild()->getData();
    $productElement->getFirstChild()->setNodeValue(uc($productName));

    # Add an <updated> child holding the current time (seconds since the epoch)
    my $updateElement = $doc->createElement("updated");
    my $updateText = $doc->createTextNode(time());
    $updateElement->appendChild($updateText);
    $productElement->appendChild($updateElement);
    $productNum = $productNum + 1;
}
```
Here we select all of the product elements with getElementsByTagName(), which returns a list in list context (in scalar context, XML::DOM returns a NodeList object instead). For each one, we create and populate an attribute, and then get and set the value of its text child.
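There are two routes to an attribute. The loop above creates an empty Attr node with createAttribute(), attaches it with setAttributeNode(), and then sets its value. setAttribute() on its own will also create the attribute if it doesn't exist yet. The sketch below tries both routes on an inline one-product document (XML::DOM assumed installed); the origin attribute is a hypothetical name used only for illustration.

```perl
use strict;
use warnings;
use XML::DOM;

my $doc = XML::DOM::Parser->new->parse('<candy><product>Mints</product></candy>');
my ($product) = $doc->getElementsByTagName('product');

# Route 1: create an Attr node explicitly, give it a value, attach it.
my $attr = $doc->createAttribute('productNumber');
$attr->setValue('1');
$product->setAttributeNode($attr);

# Route 2: setAttribute() creates the (hypothetical) origin attribute itself.
$product->setAttribute('origin', 'shop');

# Update the text child in place, as the loop above does.
my $text = $product->getFirstChild;
$text->setNodeValue(uc $text->getData);

my $number = $product->getAttribute('productNumber');
my $origin = $product->getAttribute('origin');
my $name   = $product->getFirstChild->getData;
print "$name: productNumber=$number origin=$origin\n";
$doc->dispose;
```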
Creating a new node is a job for the Document object, which provides factory methods such as createElement() and createTextNode(). Once you create and populate these nodes, you can append them to a particular element with appendChild().
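Here is that create-then-append sequence on its own. This sketch uses a fixed timestamp instead of time() so the result is repeatable. It also uses XML::DOM's toString() method, an implementation-specific extra rather than part of the DOM Level 2 Recommendation, to show the resulting subtree.

```perl
use strict;
use warnings;
use XML::DOM;

my $doc = XML::DOM::Parser->new->parse('<candy><product>Mints</product></candy>');
my ($product) = $doc->getElementsByTagName('product');

# The Document is the factory for new nodes...
my $updated = $doc->createElement('updated');
my $stamp   = $doc->createTextNode('1077652330');   # fixed value for a repeatable result

# ...which belong to no part of the tree until you attach them.
$updated->appendChild($stamp);
$product->appendChild($updated);

my $xml = $product->toString;
print "$xml\n";
$doc->dispose;
```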
To view the results of all these machinations, we can write a traverse() routine that recursively walks through each of the elements, printing their attributes and children:
```perl
use XML::DOM;

# Recursively print a node and everything below it as XML
sub traverse {
    my ($node) = @_;
    if ($node->getNodeType() == ELEMENT_NODE) {
        print "<", $node->getNodeName();
        # Print each attribute as name="value"
        my $atts = $node->getAttributes();
        for my $i (0 .. $atts->getLength() - 1) {
            my $att = $atts->item($i);
            print " ", $att->getNodeName(), "=\"", $att->getNodeValue(), "\"";
        }
        print ">";
        foreach my $child ($node->getChildNodes()) {
            traverse($child);
        }
        print "</", $node->getNodeName(), ">";
    } elsif ($node->getNodeType() == TEXT_NODE) {
        print $node->getData();
    }
}

my $parser = XML::DOM::Parser->new;
my $doc = $parser->parsefile("candy.xml");
...
    $updateElement->appendChild($updateText);
    $productElement->appendChild($updateElement);
    $productNum = $productNum + 1;
}
traverse($root);
```
Running this routine shows you the new structure:
```xml
<candy>
   <product productNumber="1">MINTS<updated>1077652330</updated></product>
   <product productNumber="2">CHOCOLATE<updated>1077652330</updated></product>
   <product productNumber="3">CIRCUS PEANUTS<updated>1077652330</updated></product>
</candy>
```
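Writing traverse() by hand is a good way to see how the tree fits together, but XML::DOM can also serialize for you. toString() returns a node's subtree as XML, and printToFile() on a document writes it to a named file. Unlike the simple traverse() above, toString() escapes characters such as & when writing text. As with parsing, this is specific to XML::DOM rather than part of the DOM Level 2 Recommendation. A minimal sketch, using inline data shaped like the output above:

```perl
use strict;
use warnings;
use XML::DOM;

# "M&amp;Ms" in the source becomes the plain text "M&Ms" once parsed.
my $doc = XML::DOM::Parser->new->parse(
    '<candy><product productNumber="1">M&amp;Ms</product></candy>');

# toString() serializes the subtree and escapes the ampersand again.
my $xml = $doc->getDocumentElement->toString;
print "$xml\n";

$doc->dispose;
```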