XML Reference Guide

Mar 14, 2003

Perl was originally designed as a language for sorting through text, so it's not surprising that it is a good fit for XML. In fact, there are multiple ways to handle XML using Perl, so in this section we're going to look at manipulating DOM "objects" using the Perl XML::DOM module, available on CPAN.

As of Level 2, the Document Object Model defines the ways and means of manipulating a Document object and the objects contained by it, but doesn't define any way to load or save (or persist, serialize, or whatever other verb you'd like to use) a Document. In this section, we'll concentrate on getting a feel for how these manipulations work by loading a simple document, making some changes to it, and viewing the results.

Consider, for example, the following sample file, candy.xml:

<?xml version="1.0"?>
<candy>
  <product>Mints</product>
  <product>Chocolate</product>
  <product>Circus Peanuts</product>
</candy>

The XML::DOM module comes with its own parser, which we can use to create a new Document object that represents the data in that file:

use XML::DOM;

my $parser = new XML::DOM::Parser;
my $doc = $parser->parsefile ("candy.xml");

Remember, this capability is not part of the DOM Recommendation, and is implementation-specific. A different XML-related module might do this differently.
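Because DOM Level 2 leaves loading and saving to the implementation, the sketch below shows XML::DOM's own calls for both, assuming the module is installed from CPAN. It parses the candy data from an in-memory string with parse() instead of from a file with parsefile(), writes it back out with toString(), and finishes with dispose(), which the XML::DOM documentation recommends for releasing a document's circular references. The variable names are illustrative.

```perl
use strict;
use warnings;
use XML::DOM;

# The same data as candy.xml, held in a string instead of a file.
my $xml = <<'END_XML';
<?xml version="1.0"?>
<candy>
  <product>Mints</product>
  <product>Chocolate</product>
  <product>Circus Peanuts</product>
</candy>
END_XML

my $parser = XML::DOM::Parser->new;

# parse() takes a string; parsefile() takes a file name.
my $doc = $parser->parse($xml);

# Serializing is implementation-specific too: toString() returns the markup.
my $serialized = $doc->toString;
print $serialized;

my $root_name = $doc->getDocumentElement->getNodeName;

# XML::DOM nodes refer to each other in cycles; dispose() breaks them.
$doc->dispose;
```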

Manipulating the Document object, on the other hand, is part of the recommendation:

use XML::DOM;

my $parser = new XML::DOM::Parser;
my $doc = $parser->parsefile ("candy.xml");

my $root = $doc->getDocumentElement();
print "\nThe root element is ", $root->getNodeName(), ".\n";

my @children = $root->getChildNodes();
print "There are ", scalar(@children), " child elements.\n";

print "They are: \n";

foreach my $child (@children) {
    if ($child->getNodeType == TEXT_NODE) {
        # Whitespace between the elements, including the line break
        print "Text: ", $child->getData();
    } elsif ($child->getNodeType == ELEMENT_NODE) {
        # The element's text is held by its first child, a Text node
        print $child->getNodeName(), " = ", $child->getFirstChild()->getData(), "\n";
    }
}

To start with, we get a reference to the root element of the document, also called the document element. Like everything in the tree, that element is a Node object; because it is also an Element, its node name is its tag name, which we can retrieve using the getNodeName() method.

Next, we get all of the child nodes of the root element using the getChildNodes() method, which returns an array that includes both the product elements and the whitespace-only text nodes between them; that is why the count is 7 rather than 3. We can iterate through that array just as we would through any other array. Each node has a type, which you can test against constants such as TEXT_NODE and ELEMENT_NODE to determine what it is. That information matters because not every method applies to every node. For example, an Element node has a meaningful name, which getNodeName() returns, but no character data, so getData() doesn't apply to it. The situation is reversed for a Text node: it carries data that getData() returns, while its getNodeName() simply returns "#text".
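To make the node-type test concrete, the following sketch (assuming XML::DOM is installed, and using a hypothetical helper named element_children) keeps only the children whose type is ELEMENT_NODE. Run against the candy data, it should report seven child nodes, three of which are product elements.

```perl
use strict;
use warnings;
use XML::DOM;   # exports ELEMENT_NODE, TEXT_NODE and the other node-type constants

my $xml = <<'END_XML';
<candy>
  <product>Mints</product>
  <product>Chocolate</product>
  <product>Circus Peanuts</product>
</candy>
END_XML

my $doc  = XML::DOM::Parser->new->parse($xml);
my $root = $doc->getDocumentElement;

# Hypothetical helper: keep only the children whose type is ELEMENT_NODE,
# skipping the whitespace-only text nodes between them.
sub element_children {
    my ($node) = @_;
    return grep { $_->getNodeType == ELEMENT_NODE } $node->getChildNodes;
}

my @all      = $root->getChildNodes;
my @elements = element_children($root);

printf "%d child nodes, %d of them elements\n", scalar(@all), scalar(@elements);
print $_->getNodeName, "\n" for @elements;
```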

Note that the text "content" of an element is not stored on the element itself but in a Text node that is its child (here, its first and only child), as you can see when we retrieve the text contained within each product element using first getFirstChild() and then getData().
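Relying on getFirstChild() works here because each product contains exactly one Text node, but it breaks for an empty element (getFirstChild() returns undef) or for mixed content where the text surrounds a child element. A hedged alternative, sketched below with a hypothetical text_content() helper and assuming XML::DOM, recursively concatenates the data of every descendant Text node. The sample markup is invented for illustration.

```perl
use strict;
use warnings;
use XML::DOM;

# Invented sample: one simple product, one empty, one with mixed content.
my $doc = XML::DOM::Parser->new->parse(
    '<candy><product>Mints</product><product/>'
  . '<product>Circus <b>Peanuts</b></product></candy>'
);

# Hypothetical helper: concatenate the data of every descendant Text node.
sub text_content {
    my ($node) = @_;
    return $node->getData if $node->getNodeType == TEXT_NODE;
    return join '', map { text_content($_) } $node->getChildNodes;
}

my @names = map { text_content($_) }
            $doc->getDocumentElement->getElementsByTagName('product');
print join('|', @names), "\n";
```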

The results are as follows:

The root element is candy.
There are 7 child elements.
They are:
Text:
  product = Mints
Text:
  product = Chocolate
Text:
  product = Circus Peanuts
Text:

DOM also defines the ways in which you can add content to a document:

...

my @products = $root->getElementsByTagName("product");
$productNum = 1;
foreach my $product (@products) {
      my $productElement = $product;

      # createAttribute() makes an empty productNumber attribute; setAttribute() fills it in
      $productElement->setAttributeNode($doc->createAttribute("productNumber"));
      $productElement->setAttribute("productNumber", $productNum);

      $productName = $productElement->getFirstChild()->getData();
      $productElement->getFirstChild()->setNodeValue(uc($productName));

      $updateElement = $doc->createElement("updated");
      $rightNow = time();
      $updateText = $doc->createTextNode($rightNow);

      $updateElement->appendChild($updateText);
      $productElement->appendChild($updateElement);

      $productNum = $productNum + 1;
}

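Here we select all of the product elements with getElementsByTagName(), which in list context returns them as an ordinary Perl array rather than a NodeList object. For each one, we create and populate a productNumber attribute, read the value of its text child and set it to the uppercase version, and then append a new updated element holding the current time as returned by time().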
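The attribute step can be shorter than it looks: in DOM, setAttribute() creates the attribute when it doesn't already exist, so the createAttribute()/setAttributeNode() pair in the loop is optional. This minimal sketch, which assumes XML::DOM and uses a one-product document for brevity, sets the attribute directly and then uppercases the element's text child the same way the loop does.

```perl
use strict;
use warnings;
use XML::DOM;

my $doc = XML::DOM::Parser->new->parse('<candy><product>Mints</product></candy>');
my ($product) = $doc->getDocumentElement->getElementsByTagName('product');

# setAttribute() adds the attribute if it is missing, or updates it otherwise.
$product->setAttribute('productNumber', 1);

# The element's text lives in its first child, a Text node.
my $text = $product->getFirstChild;
$text->setNodeValue(uc($text->getData));

print $product->toString, "\n";
```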

Creating a new node is a job for the Document object, which provides methods such as createElement() and createTextNode(). Once you create and populate these nodes, you can attach them to a particular element with appendChild().
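Putting those factory methods together, the sketch below (again assuming XML::DOM; the product name Gumdrops is only a sample value) creates a new product element with its text child and appends it to the root element.

```perl
use strict;
use warnings;
use XML::DOM;

my $doc  = XML::DOM::Parser->new->parse('<candy><product>Mints</product></candy>');
my $root = $doc->getDocumentElement;

# The Document acts as the factory for new nodes...
my $new_product = $doc->createElement('product');
$new_product->appendChild($doc->createTextNode('Gumdrops'));   # sample value

# ...and appendChild() attaches the finished subtree to an existing element.
$root->appendChild($new_product);

my @products = $root->getElementsByTagName('product');
print scalar(@products), " products; last is ",
      $products[-1]->getFirstChild->getData, "\n";
```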

To view the results of all these machinations, we can write a recursive traverse() routine that prints each element along with its attributes and children:

use XML::DOM;

sub traverse {
  my($node)= @_;
  if ($node->getNodeType == ELEMENT_NODE) {
    print "<", $node->getNodeName;
    my $thisItem = 0;
    my $atts = $node->getAttributes();
    while ($atts->item($thisItem)){
       print " ", $atts->item($thisItem)->getNodeName();
       print "=\"", $atts->item($thisItem)->getNodeValue(), "\"";
       $thisItem++;
    }
    print ">";
    foreach my $child ($node->getChildNodes()) {
      traverse($child);
    }
    print "</", $node->getNodeName, ">";
  } elsif ($node->getNodeType() == TEXT_NODE) {
    print $node->getData;
  }
}

my $parser = new XML::DOM::Parser;
my $doc = $parser->parsefile ("candy.xml");

...

      $updateElement->appendChild($updateText);
      $productElement->appendChild($updateElement);

      $productNum = $productNum + 1;
}

traverse($root);

Running this routine shows you the new structure:

<candy>
  <product productNumber="1">MINTS<updated>1077652330</updated></product>
  <product productNumber="2">CHOCOLATE<updated>1077652330</updated></product>
  <product productNumber="3">CIRCUS PEANUTS<updated>1077652330</updated></product>
</candy>
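One caveat about traverse(): it prints text and attribute values exactly as stored, and the parser has already turned entities such as &amp; back into plain characters, so a product named M&Ms would come out as malformed markup. The following is a hypothetical serialize() variant, assuming XML::DOM, that builds a string instead of printing and escapes the XML special characters along the way.

```perl
use strict;
use warnings;
use XML::DOM;

# Hypothetical helper: escape the characters that are special in markup.
sub xml_escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;    # first, so the entities added below stay intact
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    return $s;
}

# Hypothetical variant of traverse(): returns the markup as a string and
# escapes attribute values and text instead of printing them raw.
sub serialize {
    my ($node) = @_;
    if ($node->getNodeType == ELEMENT_NODE) {
        my $out  = '<' . $node->getNodeName;
        my $atts = $node->getAttributes;
        for my $i (0 .. $atts->getLength() - 1) {
            my $att = $atts->item($i);
            $out .= ' ' . $att->getNodeName . '="'
                  . xml_escape($att->getNodeValue) . '"';
        }
        $out .= '>';
        $out .= serialize($_) for $node->getChildNodes;
        return $out . '</' . $node->getNodeName . '>';
    }
    return xml_escape($node->getData) if $node->getNodeType == TEXT_NODE;
    return '';    # other node types (comments, processing instructions) are skipped
}

my $doc = XML::DOM::Parser->new->parse(
    '<candy><product maker="A &amp; B">M&amp;Ms</product></candy>'
);
my $markup = serialize($doc->getDocumentElement);
print $markup, "\n";
```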
