20.4 Using the DOM API in .NET
The .NET DOM implementation (System.Xml.XmlDocument) supports all of DOM Level 1 and all of DOM Level 2 Core, but with a few minor naming changes. DOM loading is built on top of XmlReader, and DOM serialization is built on XmlWriter.
As mentioned earlier, central to the DOM API is the DOM tree data structure. The tree consists of nodes, of which elements are only one kind. Everything in a typical XML document (elements, processing instructions, attributes, text, comments, and so on) is represented as a node in the DOM tree. Because the DOM tree is built in memory for the entire document, you are free to navigate anywhere in the document (unlike the XmlReader class, which has a forward-only cursor).
The core class that forms the root of this tree is the XmlDocument class. You create an XmlDocument object by loading any well-formed XML data stream. Because the DOM is a tree, recursion is the natural way to traverse it. Listing 20.8 shows how to load an XmlDocument from the people.xml file and traverse the resulting tree recursively.
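The free navigation described above can be sketched with a small example. This is a minimal illustration, not part of the chapter's listings: the inline sample XML and the class name DomNavigationSketch are assumptions made for the example, and people.xml is only assumed to have a similar shape. It uses the standard NextSibling and ParentNode properties of XmlNode to move forward and backward from a node.

```csharp
using System;
using System.Xml;

public static class DomNavigationSketch
{
    // Hypothetical inline sample; assumed to resemble the chapter's people.xml.
    public const string SampleXml =
        "<People><Person id='1'><Name><FirstName>Joe</FirstName>" +
        "<LastName>Suits</LastName></Name></Person></People>";

    // Starts at the FirstName element, then moves sideways and back up the tree.
    public static string Describe(string xml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);

        XmlNode firstName = doc.GetElementsByTagName("FirstName")[0];
        XmlNode lastName = firstName.NextSibling;            // sideways: LastName
        XmlNode person = firstName.ParentNode.ParentNode;    // upward: Name, then Person

        return firstName.InnerText + " " + lastName.InnerText + " (" + person.Name + ")";
    }

    public static void Main()
    {
        Console.WriteLine(Describe(SampleXml)); // prints "Joe Suits (Person)"
    }
}
```

A forward-only reader could not step back from FirstName to its Person ancestor without re-reading the input; the in-memory tree makes that a pair of property accesses.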
Listing 20.8 Traversing the DOM Tree (C#)
using System;
using System.Xml;

public class Test {
    public static void Main(string[] args) {
        XmlDocument doc = new XmlDocument();
        doc.Load("c:\\people.xml");
        PrintNodeDetail(doc.GetElementsByTagName("People")[0]);
    }

    private static void PrintNodeDetail(XmlNode node) {
        // Print the node type, plus the node value (text nodes)
        // or the node name (all other nodes)
        if (node.NodeType == XmlNodeType.Text) {
            Console.WriteLine("Type= ["+node.NodeType+"] Value="+node.Value);
        } else {
            Console.WriteLine("Type= ["+node.NodeType+"] Name="+node.Name);
        }
        // Print attributes of the node
        if (node.Attributes != null) {
            XmlAttributeCollection attrs = node.Attributes;
            foreach (XmlAttribute attr in attrs) {
                Console.WriteLine("\tAttribute Name ="+attr.Name+
                    " Attribute Value ="+attr.Value);
            }
        }
        // Recurse into each direct child of the node
        XmlNodeList children = node.ChildNodes;
        foreach (XmlNode child in children) {
            PrintNodeDetail(child);
        }
    }
}
Here is a partial output of Listing 20.8:
Type= [Element] Name=People
Type= [Element] Name=Person
    Attribute Name =id Attribute Value =1
    Attribute Name =ssn Attribute Value =555121212
Type= [Element] Name=Name
Type= [Element] Name=FirstName
Type= [Text] Value=Joe
Type= [Element] Name=LastName
Type= [Text] Value=Suits
Type= [Element] Name=Address
Type= [Element] Name=Street
Type= [Text] Value=1800 Success Way
Type= [Element] Name=City
Type= [Text] Value=Redmond
Type= [Element] Name=State
Type= [Text] Value=WA
Type= [Element] Name=ZipCode
Type= [Text] Value=98052
Type= [Element] Name=Job
Type= [Element] Name=Title
Type= [Text] Value=CEO
Type= [Element] Name=Description
Type= [Text] Value=Wears the nice suit
The XmlDocument is first instantiated, and then the path of a file containing well-formed XML is passed to its Load method. The document reads the XML from the file and automatically builds the DOM tree:
XmlDocument doc = new XmlDocument();
doc.Load("c:\\people.xml");
Next, we get a handle to the first People element, which in people.xml is the topmost element of the document tree:
doc.GetElementsByTagName("People")[0]
If you have dabbled with DHTML or JavaScript you will recognize the method name GetElementsByTagName.
After the first node is obtained, we use the PrintNodeDetail method to recursively traverse through all the children of that node. PrintNodeDetail is generic enough to print details of any node type. Remember that the DOM tree consists of nodes of different types (elements, attributes, processing instructions, comments, text nodes, and so on). We first print generic information about the node (name, type); if it's a text node, we print the value of the node:
if (node.NodeType == XmlNodeType.Text) {
    Console.WriteLine("Type= ["+node.NodeType+"] Value="+node.Value);
} else {
    Console.WriteLine("Type= ["+node.NodeType+"] Name="+node.Name);
}
Next, we print any attributes associated with the node:
if (node.Attributes != null) {
XmlAttributeCollection attrs = node.Attributes;
foreach (XmlAttribute attr in attrs) {
Console.WriteLine("\tAttribute Name ="+attr.Name+
" Attribute Value ="+attr.Value);
}
}
The last step gets all the children of the current node and calls PrintNodeDetail on each of the children.
Note that the ChildNodes property returns only the direct children of the node. To reach all descendants of a node, you must use recursive code:
XmlNodeList children = node.ChildNodes;
foreach (XmlNode child in children) {
    PrintNodeDetail(child);
}
If you have used the JDOM library (http://www.jdom.org) for parsing DOM trees in Java, the .NET DOM API should be fairly easy to follow.
Because of its in-memory storage of the entire XML document, the DOM API provides a convenient way to query for nodes matching certain criteria. Listing 20.9 shows ways to query for specific nodes.
Listing 20.9 Querying for Specific Nodes (C#)
using System;
using System.Xml;

public class Test {
    public static void Main(string[] args) {
        XmlDocument doc = new XmlDocument();
        doc.Load("c:\\people.xml");

        // Get all job titles in people.xml
        XmlNodeList list = doc.GetElementsByTagName("Title");
        foreach (XmlNode node in list) {
            Console.WriteLine(node.FirstChild.Value);
        }

        // Get the last name of the first person
        XmlNode nn = doc.GetElementsByTagName("Person")[0];
        Traverse(nn);
    }

    // Visit every descendant and print the text of each LastName element
    private static void Traverse(XmlNode node) {
        foreach (XmlNode child in node.ChildNodes) {
            if ("LastName".Equals(child.Name) &&
                child.NodeType == XmlNodeType.Element) {
                Console.WriteLine(child.FirstChild.Value);
            }
            Traverse(child);
        }
    }
}
The output of Listing 20.9 is as follows:
CEO
Attorney
Pro Surfer
Web Site Developer
Suits
The XmlDocument object provides two lookup methods: GetElementsByTagName and GetElementById. These methods are a convenient way to zoom into a specific location in the XML file. For example, to get a list of all job titles in the XML document, you search for all elements having the tag name "Title":
XmlNodeList list = doc.GetElementsByTagName("Title");
foreach (XmlNode node in list) {
    Console.WriteLine(node.FirstChild.Value);
}
Although the DOM tree provides for bidirectional navigation of the XML document, the search for specific details is still sequential, as indicated by the Traverse(..) method of Listing 20.9. In the Traverse method we must recursively iterate through each child of the first Person node and check whether the node is a LastName node. The DOM API is not the most efficient way for advanced data mining of XML documents. You should use XPath for that purpose.
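As a rough illustration of the XPath alternative mentioned above, the sketch below replaces the hand-written Traverse loop with a single XPath query issued directly against the DOM through XmlNode.SelectSingleNode. The inline sample XML, the class name DomXPathSketch, and the exact path expression are assumptions made for this example; they presume a document shaped like people.xml.

```csharp
using System;
using System.Xml;

public static class DomXPathSketch
{
    // Hypothetical inline sample; assumed to resemble the chapter's people.xml.
    public const string SampleXml =
        "<People>" +
        "<Person id='1'><Name><FirstName>Joe</FirstName><LastName>Suits</LastName></Name></Person>" +
        "<Person id='2'><Name><FirstName>Linda</FirstName><LastName>Sue</LastName></Name></Person>" +
        "</People>";

    // Returns the last name of the first Person, or null if none is found.
    public static string FirstLastName(string xml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);

        // One XPath expression stands in for the recursive Traverse method.
        XmlNode lastName = doc.SelectSingleNode("/People/Person[1]/Name/LastName");
        return lastName == null ? null : lastName.InnerText;
    }

    public static void Main()
    {
        Console.WriteLine(FirstLastName(SampleXml)); // prints "Suits"
    }
}
```

The query still runs over the in-memory tree, so the whole document must be loaded first; what changes is that the search logic moves out of application code and into the expression.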
The DOM API can also be used to write XML documents. Listing 20.10 shows the ToXml() method of the Person object written using the DOM API.
Listing 20.10 Writing XML Using DOM (C#)
using System;
using System.Text;
using System.IO;
using System.Xml;
using System.Collections;

public class Person {
    // Field values keyed by the XML attribute or element name
    Hashtable attributes;

    public Person() {
        attributes = new Hashtable();
        attributes.Add("id", null);
        attributes.Add("ssn", null);
        attributes.Add("FirstName", null);
        attributes.Add("LastName", null);
        attributes.Add("City", null);
        attributes.Add("State", null);
        attributes.Add("Street", null);
        attributes.Add("ZipCode", null);
        attributes.Add("Title", null);
        attributes.Add("Description", null);
    }

    public object GetID() { return attributes["id"]; }
    public object GetFirstName() { return attributes["FirstName"]; }
    public object GetLastName() { return attributes["LastName"]; }
    public object GetSSN() { return attributes["ssn"]; }
    public object GetCity() { return attributes["City"]; }
    public object GetState() { return attributes["State"]; }
    public object GetStreet() { return attributes["Street"]; }
    public object GetZip() { return attributes["ZipCode"]; }
    public object GetTitle() { return attributes["Title"]; }
    public object GetDescription() { return attributes["Description"]; }

    public void SetID(object o) { attributes["id"] = o; }
    public void SetFirstName(object o) { attributes["FirstName"] = o; }
    public void SetLastName(object o) { attributes["LastName"] = o; }
    public void SetSSN(object o) { attributes["ssn"] = o; }
    public void SetCity(object o) { attributes["City"] = o; }
    public void SetState(object o) { attributes["State"] = o; }
    public void SetStreet(object o) { attributes["Street"] = o; }
    public void SetZip(object o) { attributes["ZipCode"] = o; }
    public void SetTitle(object o) { attributes["Title"] = o; }
    public void SetDescription(object o) { attributes["Description"] = o; }

    public void ToXml() {
        XmlTextWriter tw = new XmlTextWriter(Console.Out);
        XmlNode childNode = null;
        tw.Formatting = Formatting.Indented;

        XmlDocument doc = new XmlDocument();
        XmlNode node = doc.CreateElement("Person");
        AppendAttribute(doc, node, "id");
        AppendAttribute(doc, node, "ssn");
        // Attach the Person element to the document as its root
        doc.AppendChild(node);

        childNode = AppendElement(doc, node, "Name", true);
        AppendElement(doc, childNode, "LastName", false);
        AppendElement(doc, childNode, "FirstName", false);

        childNode = AppendElement(doc, node, "Address", true);
        AppendElement(doc, childNode, "Street", false);
        AppendElement(doc, childNode, "City", false);
        AppendElement(doc, childNode, "State", false);
        AppendElement(doc, childNode, "ZipCode", false);

        childNode = AppendElement(doc, node, "Job", true);
        AppendElement(doc, childNode, "Title", false);
        AppendElement(doc, childNode, "Description", false);

        doc.WriteContentTo(tw);
    }

    // Creates an element under parent; non-container elements also
    // receive a text node holding the stored value for that name
    private XmlNode AppendElement(XmlDocument doc, XmlNode parent,
                                  string name, bool containerElement) {
        XmlNode child = doc.CreateNode(XmlNodeType.Element, name, "");
        if (!containerElement) {
            child.AppendChild(doc.CreateTextNode((string)attributes[name]));
        }
        parent.AppendChild(child);
        return child;
    }

    // Creates an attribute and adds it to the parent's attribute collection
    private void AppendAttribute(XmlDocument doc, XmlNode parent, string name) {
        XmlAttribute child = doc.CreateAttribute(name);
        child.Value = (string)attributes[name];
        parent.Attributes.Append(child);
    }

    public static void Main(string[] args) {
        Person p = new Person();
        p.SetFirstName("Jack");
        p.SetID("jnicholson");
        p.SetSSN("123456789");
        p.SetLastName("Nicholson");
        p.SetStreet("101 Acting Blvd");
        p.SetCity("Beverly Hills");
        p.SetState("CA");
        p.SetZip("90210");
        p.SetTitle("Actor");
        p.SetDescription("Acted as Colonel Nathan Jessop in A Few Good Men");
        p.ToXml();
    }
}
The output of Listing 20.10 is as follows:
<Person id="jnicholson" ssn="123456789">
  <Name>
    <LastName>Nicholson</LastName>
    <FirstName>Jack</FirstName>
  </Name>
  <Address>
    <Street>101 Acting Blvd</Street>
    <City>Beverly Hills</City>
    <State>CA</State>
    <ZipCode>90210</ZipCode>
  </Address>
  <Job>
    <Title>Actor</Title>
    <Description>Acted as Colonel Nathan Jessop in A Few Good Men</Description>
  </Job>
</Person>
The ToXml() method uses the AppendAttribute and AppendElement methods to abstract out the creation of elements and attributes. We start by creating the XmlTextWriter and signaling it to pretty-print the XML to Console.Out:
XmlTextWriter tw = new XmlTextWriter(Console.Out);
XmlNode childNode = null;
tw.Formatting = Formatting.Indented;
Next, we create the XmlDocument object to which all the elements will be appended. We create a Person element from the document object. Creating a node does not attach it to the tree; the listing makes it the root element afterward by calling doc.AppendChild(node).
Next, we use the AppendAttribute call to set the attributes of the Person node:
XmlNode node = doc.CreateElement("Person");
AppendAttribute(doc, node, "id");
The AppendAttribute method creates an attribute, sets its value, and then appends the attribute node to the collection of attributes of the parent node.
Note that in the .NET API, an attribute is a type of node. Like any other node, it must be created from the document and then attached explicitly, in this case to the parent's Attributes collection rather than to its list of child nodes:
XmlAttribute child = doc.CreateAttribute(name);
child.Value = (string)attributes[name];
parent.Attributes.Append(child);
Next, we use repeated calls to the AppendElement method to append the element nodes to the document. The method creates an element node and, if it is not a container node, gives it a text child holding the corresponding value from the object. (A container node is an element whose children are other elements rather than text.) The method returns the element it created, so the caller can pass a returned container node as the parent in later calls to AppendElement:
XmlNode child = doc.CreateNode(XmlNodeType.Element, name, "");
if (!containerElement) {
    child.AppendChild(doc.CreateTextNode((string)attributes[name]));
}
parent.AppendChild(child);
return child;
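The create-then-append pattern can also be written more compactly. The sketch below is an alternative to the listing's helpers, not the chapter's own approach: it relies on the standard XmlElement.SetAttribute method and the InnerText property, and reads the result back through OuterXml instead of an XmlTextWriter. The class name DomWriteSketch and the two-field document are assumptions chosen to keep the example short.

```csharp
using System;
using System.Xml;

public static class DomWriteSketch
{
    // Builds <Person id="..."><LastName>...</LastName></Person> and returns its markup.
    public static string BuildPerson(string id, string lastName)
    {
        XmlDocument doc = new XmlDocument();

        XmlElement person = doc.CreateElement("Person");
        person.SetAttribute("id", id);          // creates and attaches the attribute

        XmlElement name = doc.CreateElement("LastName");
        name.InnerText = lastName;              // creates the text child
        person.AppendChild(name);

        // Nothing is part of the document until it is appended.
        doc.AppendChild(person);
        return doc.OuterXml;
    }

    public static void Main()
    {
        // prints <Person id="jnicholson"><LastName>Nicholson</LastName></Person>
        Console.WriteLine(BuildPerson("jnicholson", "Nicholson"));
    }
}
```

OuterXml produces unindented markup; writing through an XmlTextWriter with Formatting.Indented, as Listing 20.10 does, remains the way to get pretty-printed output.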
DOM and the .NET version of SAX provide a way to traverse the document by going through every tag. To search XML documents, there is a better mechanism that allows you to construct complex query strings and zoom into the desired parts of an XML document. The XPath query language (discussed next) is a popular standard for querying XML documents, and the .NET Framework has built-in support for XPath query processing.
20.4.1 XPath
Searching XML documents for meaningful data can become challenging without a query language that lets you state tersely what you are searching for. As shown in the previous examples, although the DOM API provides methods for iterating over nodes, it is not powerful enough to do advanced queries. It also requires loading of the entire document.
The XPath specification provides a standard syntax for constructing search expressions that query and extract data from XML documents. The role of XPath expressions in XML is similar to that of SQL for an RDBMS data store. A discussion of the syntax of the XPath language is beyond the scope of this chapter.
The .NET Framework bundles all XPath-related classes in the System.Xml.XPath namespace. The core class of this namespace is XPathNavigator, which has methods for running XPath expressions, extracting data, and navigating through the data. In the next few examples we use .NET classes to run useful queries on the people.xml file.
Listing 20.11 is the first XPath example. Here, we create and run a simple XPath expression and extract data from it.
Listing 20.11 Using XPath (C#)
using System;
using System.Xml;
using System.Xml.XPath;

public class Test {
    public static void Main(string[] args) {
        XPathDocument doc = new XPathDocument("c:\\people.xml");
        XPathNavigator navig = doc.CreateNavigator();

        // Get all FirstNames
        RunQuery(navig, "//FirstName/text()");

        // Get the Job title of the person with SSN = 666131313
        RunQuery(navig, "//Person[@ssn='666131313']/Job/Title");

        // Get the cities corresponding to the addresses with state WA.
        RunQuery(navig, "//*[text()='WA']/parent::Address/City");

        // Evaluate the sum of all ids.
        EvaluateQuery(navig, "sum(//@id)");

        // Get the cities corresponding to the addresses with state CA.
        // Use a compiled expression
        XPathExpression cachedExpr =
            navig.Compile("//*[text()='CA']/parent::Address/City");
        RunQuery(navig, cachedExpr);
    }

    private static void RunQuery(XPathNavigator navig, string expression) {
        XPathNodeIterator itr = navig.Select(expression);
        while (itr.MoveNext()) {
            XPathNavigator currNode = itr.Current;
            Console.WriteLine(currNode.Value);
        }
    }

    private static void RunQuery(XPathNavigator navig,
                                 XPathExpression expression) {
        XPathNodeIterator itr = navig.Select(expression);
        while (itr.MoveNext()) {
            XPathNavigator currNode = itr.Current;
            Console.WriteLine(currNode.Value);
        }
    }

    private static void EvaluateQuery(XPathNavigator navig,
                                      string expression) {
        Console.WriteLine(navig.Evaluate(expression));
    }
}
The output of Listing 20.11 is as follows:
Joe
Linda
Jeremy
Joan
Attorney
Redmond
Redmond
10
Paso Robles
As mentioned earlier, the XPathNavigator class is the workhorse of the XPath classes in .NET. To get a handle to the navigator, you instantiate an XPathDocument object and then create the navigator object:
XPathDocument doc = new XPathDocument("c:\\people.xml");
XPathNavigator navig = doc.CreateNavigator();
The navigator object can be used to iterate through the NodeSet that is returned by the XPath expressions.
As you can see, you can get very specific with your search criteria using the XPath expression syntax. Queries either can return a node set that can be iterated over or can return an object value evaluated by the XPath language. To improve performance, you can also precompile the XPath expression and cache it for future evaluation:
XPathExpression cachedExpr = navig.Compile("//*[text()='CA']/parent::Address/City");
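The two kinds of results described above, a node set and a single evaluated value, can be seen side by side in a small sketch. The inline sample XML, the class name XPathResultsSketch, and the simplified document shape are assumptions made for this example; the calls themselves (Select, Evaluate, Compile) are the same XPathNavigator methods used in Listing 20.11.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.XPath;

public static class XPathResultsSketch
{
    // Hypothetical inline sample, simpler than the chapter's people.xml.
    public const string SampleXml =
        "<People><Person id='1'><City>Redmond</City></Person>" +
        "<Person id='2'><City>Paso Robles</City></Person></People>";

    private static XPathNavigator Navigator(string xml)
    {
        return new XPathDocument(new StringReader(xml)).CreateNavigator();
    }

    // Evaluate returns an object; a numeric XPath result arrives as a boxed double.
    public static double SumOfIds(string xml)
    {
        return (double)Navigator(xml).Evaluate("sum(//@id)");
    }

    // Select returns an iterator over the matching nodes.
    public static string Cities(string xml)
    {
        XPathNavigator nav = Navigator(xml);
        XPathExpression expr = nav.Compile("//Person/City"); // compiled once, reusable
        List<string> cities = new List<string>();
        XPathNodeIterator it = nav.Select(expr);
        while (it.MoveNext())
        {
            cities.Add(it.Current.Value);
        }
        return string.Join("|", cities.ToArray());
    }

    public static void Main()
    {
        Console.WriteLine(SumOfIds(SampleXml)); // prints 3
        Console.WriteLine(Cities(SampleXml));   // prints Redmond|Paso Robles
    }
}
```

Compiling pays off mainly when the same expression is run many times; for a one-off query, passing the string directly to Select is simpler.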
Listing 20.12 shows how to iterate through all the Person nodes to see whether any of the nodes match a criterion.
Listing 20.12 Matching Nodes in XPath (C#)
using System;
using System.Xml;
using System.Xml.XPath;

public class Test {
    public static void Main(string[] args) {
        XPathDocument doc = new XPathDocument("c:\\people.xml");
        XPathNavigator nav = doc.CreateNavigator();

        // Iterate over every Person element in the document
        XPathNodeIterator ni = nav.SelectDescendants("Person", "", false);
        XPathExpression expr = nav.Compile("Person[@id='2']");
        while (ni.MoveNext()) {
            XPathNavigator nav2 = ni.Current.Clone();
            // Print the first child's value for the matching Person only
            if (nav2.Matches(expr)) {
                nav2.MoveToFirstChild();
                Console.WriteLine(nav2.Value);
            }
        }
    }
}
Here is the output of Listing 20.12:
LindaSue
It is sometimes necessary to transform a document from one form to another. For example, in a workflow system made up of different components that communicate with each other through XML, it is necessary to transform the output XML of one component into the input XML of another component. The XSLT language is a popular tool for transforming XML documents into other formats, not necessarily limited to XML. The next section looks at XSLT support in .NET.
20.4.2 XSLT
The .NET XML classes also provide support for XSLT transformation. The XslTransform class manages XSLT transformations in the .NET Framework. XslTransform lives in the System.Xml.Xsl namespace, and it uses XPathNavigator during the transformation process.
As with all XSLT processors, XslTransform accepts as input an XML document, an XSLT document, and some optional parameters. It can produce any type of text-based output (XML, HTML, and so on).
To perform a transformation using XslTransform, you first create an XslTransform object and then call Load to load it with the desired XSLT document. Then you create an XPathDocument object (or another source that can supply an XPathNavigator) and initialize it with the source XML document that you want to transform. Finally, you call Transform to begin the process.
Listings 20.13 and 20.14 show a sample XSLT transformation. The sample XSL file is stored in people.xsl and is shown in Listing 20.13.
Listing 20.13 The People.xsl File
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="html"/>
  <xsl:template match="/">
    <html>
      <xsl:apply-templates/>
    </html>
  </xsl:template>
  <xsl:template match="People">
    <table>
      <xsl:apply-templates select="Person" />
    </table>
  </xsl:template>
  <xsl:template match="Person">
    <tr>
      <xsl:apply-templates select="Name" />
    </tr>
  </xsl:template>
  <xsl:template match="Name">
    <td>
      <xsl:value-of select="LastName" />,<xsl:value-of select="FirstName" />
    </td>
  </xsl:template>
</xsl:stylesheet>
Listing 20.14 takes the people.xml file as input and runs it through the people.xsl file, transforming it to HTML output.
Listing 20.14 Using XSLT Transformation (C#)
using System;
using System.Xml;
using System.Xml.XPath;
using System.Xml.Xsl;

public class Test {
    public static void Main(string[] args) {
        // Create a new XslTransform object.
        XslTransform trans = new XslTransform();
        // Load the stylesheet from Listing 20.13.
        trans.Load("c:\\people.xsl");
        // Load the source document and send the result to the console.
        XPathDocument doc = new XPathDocument("c:\\people.xml");
        XmlWriter tw = new XmlTextWriter(Console.Out);
        trans.Transform(doc, null, tw);
    }
}
The output of Listing 20.14 is as follows:
<html>
  <table>
    <tr>
      <td>Suits,Joe</td>
    </tr>
    <tr>
      <td>Sue,Linda</td>
    </tr>
    <tr>
      <td>Boards,Jeremy</td>
    </tr>
    <tr>
      <td>Page,Joan</td>
    </tr>
  </table>
</html>
XSLT transformation of XML documents is a powerful mechanism and is often used to separate content from the view. By merely changing the XSL file, you can render a given XML document in several different formats.
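To illustrate rendering the same data in a different format, the sketch below swaps in a stylesheet that emits plain text instead of HTML. Several things here are assumptions rather than the chapter's code: the stylesheet and sample XML are inline strings invented for the example, the class name XsltTextSketch is hypothetical, and the sketch uses XslCompiledTransform, the class that later versions of the .NET Framework provide in place of XslTransform, because XslTransform is marked obsolete there.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.XPath;
using System.Xml.Xsl;

public static class XsltTextSketch
{
    // Hypothetical inline sample; assumed to resemble the chapter's people.xml.
    public const string SampleXml =
        "<People>" +
        "<Person><Name><FirstName>Joe</FirstName><LastName>Suits</LastName></Name></Person>" +
        "<Person><Name><FirstName>Linda</FirstName><LastName>Sue</LastName></Name></Person>" +
        "</People>";

    // Hypothetical stylesheet: same data as Listing 20.13, but text output.
    public const string TextStylesheet =
        "<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>" +
        "<xsl:output method='text'/>" +
        "<xsl:template match='/'>" +
        "<xsl:for-each select='People/Person'>" +
        "<xsl:value-of select='Name/LastName'/>,<xsl:value-of select='Name/FirstName'/>" +
        "<xsl:text>; </xsl:text>" +
        "</xsl:for-each>" +
        "</xsl:template>" +
        "</xsl:stylesheet>";

    public static string Transform(string xml, string stylesheet)
    {
        XslCompiledTransform xslt = new XslCompiledTransform();
        xslt.Load(XmlReader.Create(new StringReader(stylesheet)));

        XPathDocument doc = new XPathDocument(new StringReader(xml));
        StringWriter output = new StringWriter();
        xslt.Transform(doc, null, output);   // null: no stylesheet parameters
        return output.ToString();
    }

    public static void Main()
    {
        // Prints the names as plain text rather than as an HTML table.
        Console.WriteLine(Transform(SampleXml, TextStylesheet));
    }
}
```

Only the stylesheet string differs from an HTML-producing run; the loading and transformation code is unchanged, which is the separation of content from view that the section describes.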