The XmlDocument Class
The XmlDocument class provides support for the Document Object Model (DOM) levels 1 and 2, as defined by the W3C. This class represents the entire XML document as an in-memory node tree, and it permits both navigation and editing of the document. The DOM implemented by the XmlDocument class is essentially identical to the DOM implemented by the MSXML Parser, which was covered in detail in Chapter 10. The properties and methods are the same, so rather than duplicate that information here, I suggest that you turn to that chapter.
When do you use the XmlDocument class in preference to the XmlTextReader class? The criteria are similar to those for deciding between using the MSXML DOM and the Simple API for XML.
Use XmlTextReader when
Memory and processing resources are a consideration, particularly for large documents.
You are looking for specific pieces of information in the document. For example, in a library catalog, use XmlTextReader when you need to locate all works by a specific author.
You do not need to modify the document structure.
You want to only partially parse the document before handing it off to another application.
Use XmlDocument when
You need random access to all of the document's contents.
You need to modify the document structure.
You need complex XPath filtering.
You need to perform XSLT transformations.
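The trade-off in the two lists above can be sketched in a few lines. This is only an illustration under stated assumptions: the inline XML string, the class name ReaderVersusDom, and the two helper methods are hypothetical and modeled loosely on the contacts data used in this chapter. The reader version scans forward and stops at the first match; the DOM version loads the whole tree so that it can change the structure.

```csharp
using System;
using System.IO;
using System.Xml;

static class ReaderVersusDom
{
    // Hypothetical sample data, modeled on the contacts file in this chapter.
    const string Xml =
        "<contacts>" +
        "<person category=\"personal\"><name>John Adams</name></person>" +
        "<person category=\"family\"><name>Jack Sprat</name></person>" +
        "</contacts>";

    // Forward-only scan: returns the first <name> in the given category,
    // or null if no <person> has that category.
    public static string FindNameWithReader(string category)
    {
        using (XmlTextReader reader = new XmlTextReader(new StringReader(Xml)))
        {
            bool inCategory = false;
            while (reader.Read())
            {
                if (reader.NodeType != XmlNodeType.Element) continue;
                if (reader.Name == "person")
                    inCategory = reader.GetAttribute("category") == category;
                else if (reader.Name == "name" && inCategory)
                    return reader.ReadElementString();
            }
        }
        return null;
    }

    // In-memory tree: adds a <phone> child to the last <person> element
    // and returns that element's markup.
    public static string AddPhoneWithDom()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(Xml);
        XmlElement last = (XmlElement)doc.DocumentElement.LastChild;
        XmlElement phone = doc.CreateElement("phone");
        phone.InnerText = "000-111-2222";
        last.AppendChild(phone);
        return last.OuterXml;
    }

    static void Main()
    {
        Console.WriteLine(FindNameWithReader("family"));
        Console.WriteLine(AddPhoneWithDom());
    }
}
```

The reader never holds more than the current node in memory, which is why it suits large documents and simple lookups. The DOM version pays for the full tree but can edit it in place.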
There are various ways to use the XmlDocument class. You can use it alone, applying the class methods and properties to "walk the tree" and make changes. You can also use the XmlDocument class in conjunction with the XPathNavigator class, which provides more sophisticated navigational and editing capabilities as well as XPath support. The following sections look at both. The first section presents a C# demonstration of using the XmlDocument class to modify the contents of an XML file. The second section explains how to use the XPathNavigator class.
Using the XmlDocument Class to Modify an XML Document
The first demonstration of the XmlDocument class shows how to use it to modify the contents of an XML document. In this case the task is to add a new <person> element to the XML file shown in Listing 13.4 and save the modified file under the name OutputFile.xml. The program, shown in Listing 13.5, is a C# console application, and the code is fully commented so you can figure out how it works.
Listing 13.4 InputFile.xml Is the File to Be Modified
<?xml version="1.0"?>
<contacts>
  <person category="personal">
    <name>John Adams</name>
    <phone>555-555-1212</phone>
    <email>john.adams@nowhere.net</email>
  </person>
  <person category="business">
    <name>Mandy Pearson</name>
    <phone>555-444-3232</phone>
    <email>mandyp@overthere.org</email>
  </person>
  <person category="family">
    <name>Jack Sprat</name>
    <phone>000-111-2222</phone>
    <email>jack001@earth.net</email>
  </person>
</contacts>
Listing 13.5 C# Program to Modify the Contents of InputFile.xml
using System;
using System.IO;
using System.Xml;

class Class1
{
    private const string m_InFileName = "InputFile.xml";
    private const string m_OutFileName = "OutputFile.xml";

    static void Main()
    {
        bool ok = true;
        XmlDocument xmlDoc = new XmlDocument();
        try
        {
            //Load the input file.
            xmlDoc.Load( m_InFileName );
            //Create a new "person" element.
            XmlElement elPerson = xmlDoc.CreateElement( "person" );
            //Add the "category" attribute.
            elPerson.SetAttribute( "category", "family" );
            //Create "name," "phone," and "email" elements.
            //The second argument of the two-argument CreateElement()
            //overload is a namespace URI, not the element's content,
            //so the text is assigned through InnerText instead.
            XmlElement elName = xmlDoc.CreateElement( "name" );
            elName.InnerText = "Ann Winslow";
            XmlElement elPhone = xmlDoc.CreateElement( "phone" );
            elPhone.InnerText = "000-000-0000";
            XmlElement elEmail = xmlDoc.CreateElement( "email" );
            elEmail.InnerText = "anne123@there.net";
            //Add them as children of the "person" element.
            elPerson.AppendChild( elName );
            elPerson.AppendChild( elPhone );
            elPerson.AppendChild( elEmail );
            //Get a reference to the document's root element.
            XmlElement elRoot = xmlDoc.DocumentElement;
            //Add the "person" element as a child of the root.
            elRoot.AppendChild( elPerson );
            //Save the document.
            xmlDoc.Save( m_OutFileName );
        }
        catch ( Exception e )
        {
            ok = false;
            Console.WriteLine( "Exception: " + e.Message );
        }
        finally
        {
            if (ok)
                Console.WriteLine( "Element added successfully." );
            else
                Console.WriteLine( "An error occurred." );
        }
    }
}
Using XPathNavigator with XmlDocument
The XPathNavigator class is designed specifically to facilitate navigating through XML that is contained in an XmlDocument object. It provides a cursor model, meaning that the navigator almost always has a position within the document's node tree. Many of the actions you can take with the navigator are performed relative to the current position, such as "move to the next node." When an action is performed successfully, the cursor is left pointing at the location where the action occurred. When an action fails, the cursor remains at its original position. You can always use the MoveToRoot() method to move the cursor to the document's root node.
Much of the power of the XPathNavigator class comes from its support for XPath expressions. You can select all of the nodes that match an XPath expression, and then conveniently work with them. However, many of the uses of this class do not in fact involve XPath expressions and hence its name is a bit misleading. These are the steps required to work with the XPathNavigator class if you are not going to use XPath expressions:
Create an instance of the XmlDocument class.
Load the XML document into the XmlDocument object.
Call the XmlDocument object's CreateNavigator() method to create an instance of the XPathNavigator class and return a reference to it.
Use the XPathNavigator object's properties and methods to move around the document and access its content.
The following code fragment shows how the preceding steps would be done in C#:
// Requires the System.Xml and System.Xml.XPath namespaces.
XmlDocument xmlDoc = new XmlDocument();
xmlDoc.Load( "original.xml" );
XPathNavigator nav = xmlDoc.CreateNavigator();
// Work with navigator here.
If you want to use XPath expressions, you call the XPathNavigator object's Select() method, which returns a type XPathNodeIterator that contains the nodes matching the XPath expression. This is explained later in this chapter.
When the XPathNavigator is first created, it is by default positioned on the document's root node. Even so, many programmers call the MoveToRoot() method to ensure that they know where they are starting. Then a call to MoveToFirstChild() moves to the first child of the root node. This child is not necessarily the root element, because comments and processing instructions can precede it. (The <?xml version="1.0"?> declaration itself is not visible to the navigator as a node, as noted in Table 13.8.) At this point, a typical approach is to call MoveToNext() repeatedly until you reach the document's root element (the <contacts> element in Listing 13.4). Then you can use the various methods to move around the document as needed. You'll see this in the first demonstration program later in this chapter.
Note that there is some potential confusion regarding the use of the term "root" because the root node as seen by XPathNavigator is not the same as the document's root element. The root node encompasses the entire XML document, and the root element is a child node of this root node.
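The distinction between the root node and the root element is easy to check in code. The following is a minimal sketch, assuming a small inline document (the XML string and the RootDemo class are hypothetical) in which a comment precedes the root element, so the first child of the root node is not the root element.

```csharp
using System;
using System.Xml;
using System.Xml.XPath;

static class RootDemo
{
    // Hypothetical document: a comment sits between the declaration
    // and the <contacts> root element.
    const string Xml =
        "<?xml version=\"1.0\"?><!-- sample --><contacts><person/></contacts>";

    // Reports the node type at the root node, then the type and name
    // of the first element found among the root node's children.
    public static string Describe()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(Xml);
        XPathNavigator nav = doc.CreateNavigator();

        nav.MoveToRoot();
        XPathNodeType rootType = nav.NodeType;

        // Step through the root node's children until an element is found.
        bool found = nav.MoveToFirstChild();
        while (found && nav.NodeType != XPathNodeType.Element)
            found = nav.MoveToNext();

        return rootType + " -> " + nav.NodeType + " " + nav.Name;
    }

    static void Main()
    {
        Console.WriteLine(Describe());
    }
}
```

Testing NodeType rather than a specific element name keeps the loop independent of the document's vocabulary. Checking the return value of MoveToNext() also guarantees that the loop ends if the document has no element at all.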
The XPathNavigator class has a large number of properties and methods, and many of them are infrequently needed. Rather than presenting all of them here, I have limited coverage to those properties and methods that you need most often. (You can refer to the .NET online documentation for information on the others.) Table 13.7 lists these properties and methods of the XPathNavigator class, and Table 13.8 lists the node types that the NodeType property can return. Following the tables, I present two sample programs that use the XPathNavigator class.
Table 13.7 Commonly Used Properties and Methods of the XPathNavigator Class
| Property/Method | Description |
| --- | --- |
| GetAttribute(name, ns) | Returns the value of the attribute with the specified local name and namespace URI. Returns an empty string if a matching attribute is not found. |
| HasAttributes | Returns True if the current node has attributes. Returns False if the current node has no attributes or is not an element node. |
| HasChildren | Returns True if the current node has child nodes. |
| IsEmptyElement | Returns True if the current node is an empty element (such as <element/>). |
| LocalName | Gets the name of the current node without its namespace prefix. |
| Matches(XPathExpr) | Returns True if the current node matches the specified XPath expression. The argument can be a string or a type XPathExpression. |
| MoveToFirst() | Moves to the first sibling of the current node. Returns True if there is a first sibling node or False if not or if the current node is an attribute node. |
| MoveToAttribute(name, ns) | Moves to the attribute with the matching local name and namespace URI. Returns True if a matching attribute is found or False if not. |
| MoveToFirstChild() | Moves to the first child of the current node. Returns True on success or False if there is no child node. |
| MoveToID(id) | Moves to the node that has a type ID attribute with the specified value. Returns True on success or False if there is no matching node. |
| MoveToNext() | Moves to the next sibling of the current node. Returns True on success or False if there are no more siblings or if the current node is an attribute node. |
| MoveToNextAttribute() | Moves to the next attribute node. Returns True on success or False if there are no more attribute nodes or if the current node is not an attribute node. |
| MoveToParent() | Moves to the current node's parent. Returns True on success or False if the current node has no parent (is the root node). |
| MoveToPrevious() | Moves to the previous sibling node. Returns True on success or False if there is no previous sibling or if the current node is an attribute node. |
| MoveToRoot() | Moves to the root node. This method is always successful and has no return value. |
| Name | Returns the name of the current node with namespace prefix (if any). |
| NodeType | Returns an XPathNodeType value identifying the type of the current node. See Table 13.8 for possible values. |
| Select(match) | Selects a node set that matches the specified XPath expression and returns a type XPathNodeIterator. The argument can be a string or a type XPathExpression. |
| Value | Returns the text value of the current node; for example, the value of an attribute node or the text in an element node. |
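As a brief illustration of the attribute-related members, the sketch below reads an attribute in two ways: directly with GetAttribute(), and by moving the cursor onto the attribute node with MoveToAttribute() and then back with MoveToParent(). The inline XML, the AttributeDemo class, and the "nickname" attribute are hypothetical and exist only for this example.

```csharp
using System;
using System.Xml;
using System.Xml.XPath;

static class AttributeDemo
{
    // Hypothetical one-person document without an XML declaration.
    const string Xml =
        "<contacts><person category=\"family\">" +
        "<name>Jack Sprat</name></person></contacts>";

    public static string Inspect()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(Xml);
        XPathNavigator nav = doc.CreateNavigator();
        nav.MoveToFirstChild();   // root node -> <contacts>
        nav.MoveToFirstChild();   // <contacts> -> <person>

        // Read the attribute without moving the cursor.
        string viaGet = nav.GetAttribute("category", "");
        // A missing attribute yields an empty string.
        string missing = nav.GetAttribute("nickname", "");

        // Move onto the attribute node itself, then return to the element.
        string viaMove = "";
        if (nav.MoveToAttribute("category", ""))
        {
            viaMove = nav.Name + "=" + nav.Value;
            nav.MoveToParent();
        }

        return viaGet + "|" + missing.Length + "|" + viaMove + "|" + nav.LocalName;
    }

    static void Main()
    {
        Console.WriteLine(Inspect());
    }
}
```

GetAttribute() is usually the simpler choice when only the value is needed. Moving onto the attribute node is useful when you also want its Name or NodeType, or want to continue with MoveToNextAttribute().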
Table 13.8 Members of the XPathNodeType Enumeration Returned by the XPathNavigator Class's NodeType Property
| Constant | Description |
| --- | --- |
| All | All node types |
| Attribute | Attribute node |
| Comment | Comment node |
| Element | Element node |
| Namespace | A namespace node (for example, xmlns="xxx") |
| ProcessingInstruction | A processing instruction (not including the XML declaration) |
| Root | Root node |
| SignificantWhitespace | A node that contains white space and has xml:space set to "preserve" |
| Text | A text node (the text content of an element or attribute) |
| Whitespace | A node that contains only white space characters |
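To see these node types in practice, the following sketch walks an entire tree recursively and records the NodeType (and Name, when there is one) of every node it visits. It is an illustration only; the inline XML and the TreeWalker class are hypothetical. Attribute nodes do not appear in the output because MoveToFirstChild() and MoveToNext() do not visit them.

```csharp
using System;
using System.Text;
using System.Xml;
using System.Xml.XPath;

static class TreeWalker
{
    // Hypothetical document containing a comment, elements, and text.
    const string Xml =
        "<contacts><!-- note --><person category=\"family\">" +
        "<name>Jack Sprat</name></person></contacts>";

    // Returns one line per node, indented two spaces per tree level.
    public static string Outline()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(Xml);
        StringBuilder sb = new StringBuilder();
        Walk(doc.CreateNavigator(), 0, sb);
        return sb.ToString();
    }

    static void Walk(XPathNavigator nav, int depth, StringBuilder sb)
    {
        sb.Append(new string(' ', depth * 2)).Append(nav.NodeType);
        if (nav.Name.Length > 0)
            sb.Append(' ').Append(nav.Name);
        sb.Append('\n');

        // Visit each child in turn, then restore the cursor to this node.
        if (nav.MoveToFirstChild())
        {
            do
            {
                Walk(nav, depth + 1, sb);
            } while (nav.MoveToNext());
            nav.MoveToParent();
        }
    }

    static void Main()
    {
        Console.Write(Outline());
    }
}
```

Because a single navigator is shared across the recursion, each call must leave the cursor where it found it. That is the purpose of the MoveToParent() call after the child loop.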
Demonstrating XPathNavigator
This first demonstration shows how to use the XPathNavigator class to "walk" the tree of an XML document. The demonstration makes use of the XML data file presented later in the book in Listing 18.5. This file contains a database of books and is structured as shown in this fragment:
<books>
  <book category="reference">
    <title>The Cambridge Biographical Encyclopedia</title>
    <author>David Crystal</author>
    <publisher>Cambridge University Press</publisher>
  </book>
  ...
</books>
The objective of the demonstration is to let the user select a category of books, and then display a list of all matching books. It is created as a Web application. The user selects the category on an HTML page, as shown in Listing 13.6. This page presents a list of categories from which the user selects. The request is sent to the ASP.NET application in Listing 13.7. The code in this page uses an XPathNavigator object to move through the XML file. Specifically, the code locates each <book> element and checks its "category" attribute. If the value of this attribute matches the category requested by the user, the program walks through the <book> element's children (the title, author, and publisher elements), extracts their data, and outputs it in the form of an HTML table. The results of a search are shown in Figure 13.3.
Figure 13.3 The results of a book query displayed by Listing 13.7
Listing 13.6 The HTML Page That Lets the User Select a Book Category
<html>
<head>
<title>Book search</title>
</head>
<body>
<h2>Find books by category.</h2>
<hr/>
<form method="GET" action="list1307.aspx">
<p>Select your category, then press Submit.</p>
<p>Category:
<select name="category" size="1">
  <option value="biography">Biography</option>
  <option value="fiction">Fiction</option>
  <option value="reference">Reference</option>
</select>
</p>
<p><input type="submit" value="Submit"/></p>
<hr/>
</form>
</body>
</html>
Listing 13.7 ASP.NET Script That Uses the XPathNavigator Class to Access XML Data
<%@ Import Namespace="System.Xml" %>
<%@ Import Namespace="System.Xml.XPath" %>
<script language="C#" runat="Server">
void Page_Load(object sender, EventArgs e)
{
    try
    {
        XmlDocument xmlDoc = new XmlDocument();
        xmlDoc.Load(Server.MapPath("list1805.xml"));
        XPathNavigator nav = xmlDoc.CreateNavigator();
        //Get the query string submitted by the client.
        NameValueCollection coll = Request.QueryString;
        string category = coll.Get( "category" );
        //Move to the document's root node.
        nav.MoveToRoot();
        //Move to the first child, then step through the siblings
        //until the <books> root element is reached. Checking the
        //return values prevents an endless loop if it is missing.
        bool found = nav.MoveToFirstChild();
        while (found && nav.LocalName != "books")
            found = nav.MoveToNext();
        //If found, we are now positioned at the root element.
        //Move to the first child (the first <book> element).
        bool more = found && nav.MoveToFirstChild();
        //Start writing the HTML to the output. Values that come from
        //the request or the data file are HTML-encoded first.
        Response.Write( "<html><body>");
        Response.Write("<h2>Books in the '" + Server.HtmlEncode(category)
            + "' category:</h2><hr/>");
        //Write out the table headings.
        Response.Write( "<table cellpadding='4'>" );
        Response.Write( "<thead><tr>" );
        Response.Write( "<th>Title</th><th>Author</th>" );
        Response.Write( "<th>Publisher</th></tr></thead>" );
        Response.Write( "<tbody>" );
        while (more)
        {
            //Is this book in the selected category?
            if (nav.GetAttribute( "category", "") == category)
            {
                //Move to the first child (<title>) and write its data.
                nav.MoveToFirstChild();
                Response.Write( "<tr><td>" + Server.HtmlEncode(nav.Value) + "</td>");
                //Move to next (<author>).
                nav.MoveToNext();
                Response.Write( "<td>" + Server.HtmlEncode(nav.Value) + "</td>");
                //Move to next (<publisher>).
                nav.MoveToNext();
                Response.Write( "<td>" + Server.HtmlEncode(nav.Value) + "</td></tr>");
                //Move back to the parent <book> node.
                nav.MoveToParent();
            }
            //Move to the next <book> node, if any.
            more = nav.MoveToNext();
        }
        //Finish the table.
        Response.Write( "</tbody></table><hr/></body></html>");
    }
    catch(Exception ex)
    {
        Response.Write(ex.ToString());
    }
}
</script>
Using the Select() Method and the XPathNodeIterator Class
The XPathNavigator class has the Select() method, which permits you to select a node set that matches an XPath expression. The method returns an object of type XPathNodeIterator that contains the matching nodes. If there are no matching nodes, the XPathNodeIterator object's Count property will be 0; otherwise, this property returns the number of nodes. For example, this code assumes that the variable selectExpr contains the XPath expression that you want to use:
XmlDocument xmlDoc = new XmlDocument();
xmlDoc.Load( "InputFile.xml" );
XPathNavigator nav = xmlDoc.CreateNavigator();
XPathNodeIterator xpi = nav.Select( selectExpr );
if ( xpi.Count != 0 )
{
    // At least one matching node was found.
}
else
{
    // No matching nodes were found.
}
When to Use XPathNodeIterator
There's not much that really requires the use of the XPathNodeIterator class, but it does make certain tasks more efficient. You can always locate the node(s) that you want by using the XPathNavigator class's methods to move around the document tree and examine nodes as you go. However, the ability to quickly select a subset of nodes based on an XPath expression can make this sort of brute force technique unnecessary.
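For instance, the category search performed by walking the tree in Listing 13.7 can be expressed as a single XPath expression with a predicate. The sketch below is one possible way to do this and is not taken from the listings. The inline XML, the second book title, the CategoryQuery class, and the TitlesIn() method are hypothetical, and the code assumes that the category value contains no single quote, because it is pasted directly into the expression.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;
using System.Xml.XPath;

static class CategoryQuery
{
    // Hypothetical data in the same shape as the books file fragment.
    const string Xml =
        "<books>" +
        "<book category=\"reference\">" +
        "<title>The Cambridge Biographical Encyclopedia</title></book>" +
        "<book category=\"fiction\"><title>Hypothetical Novel</title></book>" +
        "</books>";

    // Returns the titles of all books in the given category.
    // Assumption: category contains no single quote character.
    public static List<string> TitlesIn(string category)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(Xml);
        XPathNavigator nav = doc.CreateNavigator();

        // The predicate [@category='...'] does the filtering that the
        // tree-walking version performs with GetAttribute().
        XPathNodeIterator it =
            nav.Select("/books/book[@category='" + category + "']/title");

        List<string> titles = new List<string>();
        while (it.MoveNext())
            titles.Add(it.Current.Value);
        return titles;
    }

    static void Main()
    {
        foreach (string title in TitlesIn("reference"))
            Console.WriteLine(title);
    }
}
```

With the filtering moved into the expression, the loop body only has to read values. It no longer needs to track where the cursor is in the tree.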
Table 13.9 describes members of the XPathNodeIterator class. You will note that the Current property returns a reference to an XPathNavigator object that is positioned on the current node. However, you cannot use this XPathNavigator object to move away from the current node (unless you first clone it); you can use it only to get information about the current node.
Table 13.9 Members of the XPathNodeIterator Class
| Member | Description |
| --- | --- |
| Count | Returns the index of the last selected node or 0 if there are no nodes. |
| Current | Returns a type XPathNavigator positioned on the current node. |
| CurrentPosition | The 1-based index of the current node. |
| MoveNext() | Moves to the next selected node. Returns True on success or False if there are no more selected nodes. |
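The restriction on the Current property is easier to see in code. In this sketch the iterator selects each <name> node, and a clone of Current is moved to the neighboring <phone> element, so the iterator's own navigator is never disturbed. The inline XML, the CloneDemo class, and the NamesWithPhones() method are hypothetical names used only for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;
using System.Xml.XPath;

static class CloneDemo
{
    // Hypothetical two-person extract of the contacts data.
    const string Xml =
        "<contacts>" +
        "<person><name>John Adams</name><phone>555-555-1212</phone></person>" +
        "<person><name>Jack Sprat</name><phone>000-111-2222</phone></person>" +
        "</contacts>";

    public static List<string> NamesWithPhones()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(Xml);
        XPathNodeIterator it =
            doc.CreateNavigator().Select("/contacts/person/name");

        List<string> result = new List<string>();
        while (it.MoveNext())
        {
            // Work on a copy so that it.Current stays on the <name> node.
            XPathNavigator copy = it.Current.Clone();
            string name = copy.Value;

            // Move the copy to the next sibling, expected to be <phone>.
            if (copy.MoveToNext() && copy.LocalName == "phone")
                result.Add(it.CurrentPosition + "/" + it.Count + " "
                    + name + ": " + copy.Value);
        }
        return result;
    }

    static void Main()
    {
        foreach (string line in NamesWithPhones())
            Console.WriteLine(line);
    }
}
```

Each line of the result combines CurrentPosition and Count from the iterator with values read through the cloned navigator.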
To demonstrate using the Select() method and the XPathNodeIterator class, I turn again to the XML file InputFile.xml from Listing 13.4. The goal of this application is to list the names of all the people in the XML database. In other words, the application needs to go through the XML file, select all <name> nodes, and display their values. This could be done using the "brute force" method of going through all the nodes in the document, but the code is a lot simpler if you use the Select() method. This is a console application that opens the file and displays the names on the screen. Listing 13.8 presents the source code.
Listing 13.8 Using XPathNavigator and XPathNodeIterator to Access XML Data
using System;
using System.IO;
using System.Xml;
using System.Xml.XPath;

namespace XPathNavDemo
{
    class SearchXML
    {
        static void Main(string[] args)
        {
            SearchXML ex = new SearchXML();
        }

        public SearchXML()
        {
            try
            {
                XmlDocument xmlDoc = new XmlDocument();
                xmlDoc.Load( "InputFile.xml" );
                XPathNavigator nav = xmlDoc.CreateNavigator();
                // Select all the <name> nodes.
                string select = "descendant::person/name";
                XPathNodeIterator xpi = nav.Select(select);
                if ( xpi.Count != 0 )
                {
                    // At least one <name> node was found.
                    // Move through them and display the values.
                    Console.WriteLine("The following people are in this file:");
                    while (xpi.MoveNext())
                        Console.WriteLine(xpi.Current.Value);
                }
                else
                    Console.WriteLine("No <name> elements found.");
            }
            catch ( System.Exception ex )
            {
                Console.WriteLine("Exception: " + ex.Message );
            }
            finally
            {
                Console.ReadLine();
            }
        }
    }
}