XML in .NET: The DOM Interface
- The Common Node
- Navigating the DOM
- Key Extended Nodes
- Building the DOM
- Conclusion
The .NET Framework brings about a new paradigm of rapid application development and cross-platform network integration. XML is a key enabling technology in this endeavor, and the framework uses it throughout: configuration management, object serialization, remoting, Web services, database access, and file storage. The framework provides several APIs for working with XML data: an in-memory, DOM-compliant interface; a streaming interface; and the XML functionality built into the DataSet. This article concentrates on the Document Object Model (DOM) API for XML processing, which most programmers already know from XML parsers such as MSXML. Later articles will discuss the streaming and DataSet XML functionality provided by the framework.
The DOM models an XML document as a tree of nodes, each representing one information item of the document (an element, an attribute, a run of text, and so on). A parser reads the serialized representation of the XML and creates an in-memory graph of nodes, as illustrated in Figure 1.
Figure 1 XML document parsed into DOM nodes.
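The parsing step in Figure 1 can be sketched in a few lines of C#. This is a minimal illustration rather than a definitive implementation: the class name DomTreeDemo, the helper methods Dump and Parse, and the sample order document are all hypothetical names chosen for this example. Only XmlDocument, LoadXml, ChildNodes, NodeType, and Name come from System.Xml.

```csharp
using System;
using System.Text;
using System.Xml;

public static class DomTreeDemo
{
    // Recursively writes one line per node, indented by its depth in the tree.
    public static void Dump(XmlNode node, int depth, StringBuilder output)
    {
        output.Append(' ', depth * 2)
              .Append(node.NodeType).Append(": ").Append(node.Name).Append('\n');
        foreach (XmlNode child in node.ChildNodes)
        {
            Dump(child, depth + 1, output);
        }
    }

    // Parses the serialized XML into a DOM tree and returns a text outline of it.
    public static string Parse(string xml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);                 // the parser builds the in-memory node graph
        StringBuilder output = new StringBuilder();
        Dump(doc, 0, output);             // the document node is the root of the tree
        return output.ToString();
    }

    public static void Main()
    {
        // Hypothetical sample document, used only for illustration.
        Console.Write(Parse("<order id=\"42\"><item>Widget</item></order>"));
    }
}
```

Note that the id attribute does not appear in the outline: attributes are reached through the Attributes collection of an element and are not part of ChildNodes.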
The Common Node
Each node in the document exposes a uniform interface for the functionality that all nodes share. The XmlNode class takes on this responsibility in the .NET Framework by providing value, naming, navigation, and lifetime management of nodes in a document. The following list enumerates the base set of XmlNode properties for name and value acquisition.
Name
Value
NodeType
LocalName
Prefix
NamespaceURI
InnerXml
OuterXml
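As a rough illustration of the naming and value properties in the list above, the following sketch prints them for an element, an attribute, and a text node. The class NodeNamingDemo, the method Describe, the inv prefix, and the urn:example:inventory namespace are assumptions made up for this example; the properties themselves are the XmlNode members named in the list.

```csharp
using System;
using System.Xml;

public static class NodeNamingDemo
{
    // Formats the name-related properties and the value of any node.
    public static string Describe(XmlNode node)
    {
        return string.Format(
            "Name={0}; LocalName={1}; Prefix={2}; NamespaceURI={3}; NodeType={4}; Value={5}",
            node.Name, node.LocalName, node.Prefix, node.NamespaceURI,
            node.NodeType, node.Value ?? "(null)");
    }

    public static void Main()
    {
        XmlDocument doc = new XmlDocument();
        // Hypothetical sample document with one namespaced element.
        doc.LoadXml("<inv:item xmlns:inv=\"urn:example:inventory\" sku=\"A1\">Widget</inv:item>");

        XmlElement item = doc.DocumentElement;
        Console.WriteLine(Describe(item));                    // the element itself
        Console.WriteLine(Describe(item.Attributes["sku"]));  // its sku attribute
        Console.WriteLine(Describe(item.FirstChild));         // its text content
    }
}
```

For the element, Name is the qualified name (inv:item) while LocalName drops the prefix (item), and Value is null because an element's text lives in its child text node.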
The OuterXml and InnerXml properties are Microsoft extensions to the DOM that expose XML content in serialized (string) form. OuterXml returns the markup of a node together with all of its child nodes and is read-only. InnerXml returns the markup of the child nodes only; it can also be set, in which case the assigned string is parsed and replaces the node's existing children. These properties enable you to display node content easily or to build complex document content quickly without using the node-construction methods discussed later in the article.
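The difference between the two properties can be seen in a short sketch. The class InnerOuterXmlDemo, the method Run, and the order document are hypothetical names used only for illustration.

```csharp
using System;
using System.Xml;

public static class InnerOuterXmlDemo
{
    // Returns InnerXml before the change, then InnerXml and OuterXml after it.
    public static string[] Run()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml("<order><item>Widget</item></order>");
        XmlElement order = doc.DocumentElement;

        string before = order.InnerXml;   // markup of the children only

        // Assigning InnerXml parses the string and replaces the existing children.
        order.InnerXml = "<item>Gadget</item><item>Gizmo</item>";

        // OuterXml includes the order element itself as well as its children.
        return new string[] { before, order.InnerXml, order.OuterXml };
    }

    public static void Main()
    {
        foreach (string markup in Run())
        {
            Console.WriteLine(markup);
        }
    }
}
```

Setting InnerXml is convenient for small fragments, but the string must be well-formed XML; malformed input raises an XmlException when the assignment is made.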
The NodeType property identifies the kind of node you are holding, which tells you the specific .NET class to which an XmlNode reference can safely be cast. Table 1 shows, for each node type, what the Name and Value properties return and which class represents the node.
Table 1 XmlNode Node Types
Node | Name Property | Value Property | Specific .NET Class
Attribute | Attribute name | Text value of attribute | XmlAttribute
CDATA | #cdata-section | Text of CDATA section | XmlCDataSection
Comment | #comment | Text of comment | XmlComment
Document | #document | null | XmlDocument
DocumentFragment | #document-fragment | null | XmlDocumentFragment
DocumentType | Document type name | null | XmlDocumentType
Element | Tag name | null | XmlElement
Entity | Entity name | null | XmlEntity
EntityReference | Name of entity referenced | null | XmlEntityReference
Notation | Notation name | null | XmlNotation
ProcessingInstruction | Name of PI | Entire content, excluding the target | XmlProcessingInstruction
Text | #text | Content of text node | XmlText
Whitespace | #whitespace | Whitespace text | XmlWhitespace
SignificantWhitespace | #significant-whitespace | Whitespace text | XmlSignificantWhitespace
XmlDeclaration | #xml-declaration | Content between the ? chars in the declaration | XmlDeclaration
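One way to use the NodeType property together with the classes in Table 1 is a switch that casts each node to its specific class. The sketch below is an assumption about how such code might look, not a prescribed pattern; NodeTypeDemo, Summarize, and the sample document are hypothetical.

```csharp
using System;
using System.Xml;

public static class NodeTypeDemo
{
    // Uses NodeType to decide which specific class the node can be cast to.
    public static string Summarize(XmlNode node)
    {
        switch (node.NodeType)
        {
            case XmlNodeType.Element:
                XmlElement element = (XmlElement)node;
                return "element <" + element.Name + "> with "
                    + element.Attributes.Count + " attribute(s)";
            case XmlNodeType.Comment:
                return "comment: " + ((XmlComment)node).Value;
            case XmlNodeType.ProcessingInstruction:
                XmlProcessingInstruction pi = (XmlProcessingInstruction)node;
                return "PI target=" + pi.Target + " data=" + pi.Data;
            case XmlNodeType.Text:
                return "text: " + node.Value;
            default:
                // Other node types fall back to the common XmlNode members.
                return node.NodeType + ": " + node.Name;
        }
    }

    public static void Main()
    {
        XmlDocument doc = new XmlDocument();
        // Hypothetical sample document with a comment, a PI, and one element.
        doc.LoadXml("<!--note--><?render mode=\"fast\"?><root a=\"1\">hi</root>");

        foreach (XmlNode node in doc.ChildNodes)
        {
            Console.WriteLine(Summarize(node));
        }
        Console.WriteLine(Summarize(doc.DocumentElement.FirstChild));
    }
}
```

Switching on NodeType avoids a chain of type tests, and the default branch shows that the common XmlNode members remain available when no cast is needed.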
The XmlDeclaration, Whitespace, and SignificantWhitespace node types are Microsoft extensions to the DOM node types; the two whitespace types provide whitespace handling for applications that need it. The XmlDocument class generates whitespace nodes only when its PreserveWhitespace property is set to true before Load executes. The XmlWhitespace node represents whitespace that appears between markup where the content model allows only elements. The XmlSignificantWhitespace node shows up in documents whose schema allows mixed content, where whitespace is interspersed between markup tags.
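The effect of PreserveWhitespace can be checked with a small sketch that loads the same indented document twice and counts the children of the root element. The names WhitespaceDemo, CountChildren, and Sample are hypothetical, and the sample document has no DTD or schema.

```csharp
using System;
using System.Xml;

public static class WhitespaceDemo
{
    // Hypothetical sample: two item elements, indented with newlines and spaces.
    public const string Sample =
        "<order>\n  <item>Widget</item>\n  <item>Gadget</item>\n</order>";

    // Loads the XML with the given whitespace setting and counts the
    // child nodes of the document element.
    public static int CountChildren(string xml, bool preserveWhitespace)
    {
        XmlDocument doc = new XmlDocument();
        doc.PreserveWhitespace = preserveWhitespace;  // must be set before loading
        doc.LoadXml(xml);
        return doc.DocumentElement.ChildNodes.Count;
    }

    public static void Main()
    {
        Console.WriteLine(CountChildren(Sample, false));  // only the two item elements
        Console.WriteLine(CountChildren(Sample, true));   // items plus whitespace nodes
    }
}
```

With the property left at its default of false, the indentation between the item elements is discarded during loading; with it set to true, each run of indentation becomes its own node in the tree, so code that indexes ChildNodes by position has to account for the extra nodes.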