Parsing the XML File
The XML output of the sample application isn't much use if you can't read it later and extract the information it contains. Some developers try to parse the entire XML file as a single unit, but it's actually easier to perform the process in two steps. First, open the file and obtain one node. Second, process just that node without regard to the remaining nodes. Listing 3 shows the first part of the process: obtaining a single node.
Listing 3 Overview of XML File Parsing
System::Void btnParse_Click(System::Object * sender, System::EventArgs * e)
{
   XmlTextReader* DataRead;   // Loads the data for reading.
   StringBuilder* Output;     // Parsed output information.

   // Open the file to read the header.
   DataRead = new XmlTextReader(txtFilename->Text);

   // Initialize the output.
   Output = new StringBuilder();

   // Keep reading header nodes until finished.
   while (DataRead->Read())
   {
      // Get the information from each node.
      Output->Append(ProcessEntry(DataRead));
      Output->Append("\r\n");
   }

   // Display the result.
   MessageBox::Show(Output->ToString(),
                    "XML Parsing Results",
                    MessageBoxButtons::OK,
                    MessageBoxIcon::Information);

   // Close the document.
   DataRead->Close();
}
The process begins when the code opens the file by creating a new XmlTextReader. A newly opened XmlTextReader is positioned before the first node in the file, not on it, so you must move to the first node before doing anything else, for example by calling the Read() method as shown. If you try to process data without first calling Read() or another method that positions the reader on a node, the application fails with a null reference error.
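The positioning rule is easy to get wrong, so here is a minimal sketch of it. It's written in standard C++ rather than Managed C++ so that it can run on its own, and the NodeCursor class is hypothetical: it only imitates the behavior described above and is not the XmlTextReader API. The cursor starts before the first node, Read() advances it, and asking for the current node before any successful Read() throws a std::logic_error, which stands in for the failure the text describes.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a forward-only reader (NOT the .NET XmlTextReader).
// The cursor starts *before* the first node, as a freshly opened reader does.
class NodeCursor {
public:
    explicit NodeCursor(std::vector<std::string> nodes)
        : nodes_(std::move(nodes)) {}

    // Advances to the next node; returns false once no nodes remain.
    bool Read() {
        if (next_ >= nodes_.size()) {
            positioned_ = false;
            return false;
        }
        current_ = next_++;
        positioned_ = true;
        return true;
    }

    // Returns the current node; throws if Read() hasn't positioned the cursor.
    const std::string& Current() const {
        if (!positioned_) {
            throw std::logic_error("no current node: call Read() first");
        }
        return nodes_[current_];
    }

private:
    std::vector<std::string> nodes_;
    std::size_t next_ = 0;     // index the next Read() will move to
    std::size_t current_ = 0;  // index the last successful Read() moved to
    bool positioned_ = false;  // true only after a successful Read()
};
```

Constructing a NodeCursor and calling Current() immediately throws; calling Read() first returns the first node.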
The example processes one node at a time by calling the ProcessEntry() method that appears in Listing 4 and appends the resulting data to the StringBuilder object, Output. The Read() method returns true each time it successfully moves to the next node and false once no nodes remain. At that point, the processing loop ends and the example calls the Close() method to close the file. Make sure that you perform this essential step.
The ProcessEntry() method performs the second part of the process: obtaining data from a single node. As shown in Listing 4, this method retrieves common information you'll need to process the data.
Listing 4 XML File Parsing Details
StringBuilder* ProcessEntry(XmlTextReader* Reader)
{
   StringBuilder* DataOut;   // Contains the output.

   // Initialize the output.
   DataOut = new StringBuilder();

   // Determine the node type.
   switch (Reader->NodeType)
   {
   case XmlNodeType::Attribute:
      DataOut->Append("Attribute");
      break;

   // ... Bunches of Other Types ...

   case XmlNodeType::XmlDeclaration:
      DataOut->Append("XML Declaration");
      break;
   default:
      DataOut->Append("Type Unknown");
      break;
   }

   // Add the element name.
   DataOut->Append("\t");
   DataOut->Append(Reader->Name);

   // Add the element value.
   if (Reader->HasValue)
   {
      DataOut->Append("\t");
      DataOut->Append(Reader->Value);
   }

   // Add the attributes.
   if (Reader->HasAttributes)
      for (Int32 Counter = 0; Counter < Reader->AttributeCount; Counter++)
      {
         Reader->MoveToAttribute(Counter);
         DataOut->Append("\r\n\t");
         DataOut->Append(Counter);
         DataOut->Append("\t");
         DataOut->Append(Reader->Name);
         DataOut->Append("\t");
         DataOut->Append(Reader->Value);
      }

   // Return the results.
   return DataOut;
}
The ProcessEntry() method begins by creating the DataOut StringBuilder object to hold its output, and then determines the node type using the NodeType property. The example simply outputs a string that identifies the node type, but a production application often uses the node type to choose a processing action or to decide whether the node needs processing at all.
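To illustrate using the node type to decide whether a node needs processing at all, the following standard C++ sketch defines a hypothetical NodeType enumeration. Its member names echo a few members of .NET's XmlNodeType enumeration, but the enum itself is an assumption, as is the policy of skipping comments and whitespace.

```cpp
#include <string>

// Hypothetical subset of node kinds; the names echo .NET's XmlNodeType members.
enum class NodeType { Element, EndElement, Text, Comment, Whitespace, XmlDeclaration, Attribute };

// Descriptive label for diagnostic output, like the switch in Listing 4.
std::string Describe(NodeType type) {
    switch (type) {
        case NodeType::Element:        return "Element";
        case NodeType::EndElement:     return "End Element";
        case NodeType::Text:           return "Text";
        case NodeType::Comment:        return "Comment";
        case NodeType::Whitespace:     return "Whitespace";
        case NodeType::XmlDeclaration: return "XML Declaration";
        case NodeType::Attribute:      return "Attribute";
    }
    return "Type Unknown";  // reached only for out-of-range values
}

// Production-style policy: decide whether a node needs processing at all.
bool NeedsProcessing(NodeType type) {
    switch (type) {
        case NodeType::Comment:
        case NodeType::Whitespace:
            return false;  // treated as carrying no application data in this sketch
        default:
            return true;
    }
}
```

A reading loop would call NeedsProcessing() first and skip straight to the next Read() when it returns false.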
Every node exposes a Name property, so the next step is to add the Name property value to DataOut. For elements and attributes, the name is a value that you assign, such as DataString1 in Figure 1. For some other node types, the .NET Framework supplies a fixed name; the XML declaration, for example, is named xml. Many node types, such as comments and text, have an empty name, so the output identifies them only through the node type label.
Many nodes also have values (text, comments, and attributes do, for example, but elements don't), so you need to verify that the current node has a value by checking the HasValue property. If the example detects a value for the current node, it adds the Value property to DataOut.
Some nodes also have attributes. Before you try to read them, use the HasAttributes property to check whether the current node has any. The AttributeCount property tells how many attributes the node has (it can have more than one). The attributes themselves are indexed from zero, so valid indexes run from 0 to AttributeCount - 1. The example uses the MoveToAttribute() method to position Reader on each attribute in turn. The code can then use the Name and Value properties as normal.
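To tie the name, value, and attribute checks together, here is a hypothetical standard C++ analog of ProcessEntry(). The Node struct is an assumption made for illustration: an empty std::optional plays the role of HasValue returning false, and the size of the attribute vector plays the role of AttributeCount. The output layout follows Listing 4.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical snapshot of one node; not a .NET type.
struct Node {
    std::string typeLabel;             // e.g. "Element"
    std::string name;                  // may be empty for some node kinds
    std::optional<std::string> value;  // empty optional acts like HasValue == false
    std::vector<std::pair<std::string, std::string>> attributes;  // name/value pairs
};

// Mirrors the layout Listing 4 builds: type, name, optional value, then one
// line per attribute with its zero-based index.
std::string FormatEntry(const Node& node) {
    std::string out = node.typeLabel + "\t" + node.name;
    if (node.value) {  // check before reading, like HasValue
        out += "\t" + *node.value;
    }
    // The count is a plain count; the indexes run from 0 to count - 1.
    const std::size_t count = node.attributes.size();
    for (std::size_t i = 0; i < count; ++i) {
        const auto& [attrName, attrValue] = node.attributes[i];
        out += "\r\n\t" + std::to_string(i) + "\t" + attrName + "\t" + attrValue;
    }
    return out;
}
```

Structured bindings and std::optional require C++17.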
You don't always have to perform detailed attribute processing. If all you need is an attribute's value, you can call the Reader->GetAttribute() method, which returns just the value of the attribute at a given index (or with a given name) and can save you a few processing steps, not to mention time. However, you'll generally need both the name and the value of each attribute, so the processing technique in Listing 4 is more common than the alternatives. Figure 2 shows typical output for this program.
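The value-only shortcut can be sketched in standard C++ as a pair of lookups over name/value pairs. The GetAttribute overloads and the Attributes alias below are hypothetical helpers, not the .NET method; they return an empty std::optional when the index is out of range or no attribute has the requested name.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

using Attributes = std::vector<std::pair<std::string, std::string>>;

// Hypothetical helper: value of the attribute at a zero-based index.
std::optional<std::string> GetAttribute(const Attributes& attrs, std::size_t index) {
    if (index >= attrs.size()) return std::nullopt;
    return attrs[index].second;
}

// Hypothetical helper: value of the first attribute with the given name.
std::optional<std::string> GetAttribute(const Attributes& attrs, const std::string& name) {
    for (const auto& [attrName, attrValue] : attrs) {
        if (attrName == name) return attrValue;
    }
    return std::nullopt;
}
```

Returning std::optional lets the caller tell a missing attribute apart from one whose value is an empty string.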
Figure 2 Parsing an XML file means retrieving the individual values.
Figure 2 brings out a few things you might not have considered. For example, the pieces of information in the XML declaration, such as the version, appear as attributes to the .NET Framework. As an interesting test, the example code also includes a copy of the manifest file in Listing 1. You can enter the path and name of this file in the Filename field, and the application parses it just as easily as the file it generated. In fact, the application works with any well-formed XML file.
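To make concrete why the declaration's settings look like attributes, here is a small standard C++ sketch that splits a declaration into name/value pairs. ParseDeclaration() is a hypothetical helper, not part of any library, and it assumes a simple, well-formed declaration: it accepts single- or double-quoted values and stops at the first malformed pair rather than reporting an error.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not a library API): extracts the pseudo-attributes
// from an XML declaration such as <?xml version="1.0" encoding="utf-8"?>.
std::vector<std::pair<std::string, std::string>>
ParseDeclaration(const std::string& decl) {
    auto isSpace = [](char c) { return std::isspace(static_cast<unsigned char>(c)) != 0; };
    std::vector<std::pair<std::string, std::string>> result;
    const std::string open = "<?xml";
    // The declaration must start with "<?xml" followed by whitespace.
    if (decl.compare(0, open.size(), open) != 0 || decl.size() <= open.size() ||
        !isSpace(decl[open.size()])) {
        return result;
    }
    const std::size_t end = decl.rfind("?>");
    if (end == std::string::npos || end < open.size()) return result;

    std::size_t pos = open.size();
    while (pos < end) {
        while (pos < end && isSpace(decl[pos])) ++pos;   // skip separators
        if (pos >= end) break;
        const std::size_t eq = decl.find('=', pos);      // name ends at '='
        if (eq == std::string::npos || eq >= end) break;
        std::string name = decl.substr(pos, eq - pos);
        while (!name.empty() && isSpace(name.back())) name.pop_back();  // allow "name = value"
        std::size_t q = eq + 1;
        while (q < end && isSpace(decl[q])) ++q;
        if (q >= end || (decl[q] != '"' && decl[q] != '\'')) break;
        const char quote = decl[q];
        const std::size_t close = decl.find(quote, q + 1);  // matching quote
        if (close == std::string::npos || close >= end) break;
        result.emplace_back(name, decl.substr(q + 1, close - q - 1));
        pos = close + 1;
    }
    return result;
}
```

A real XML parser also validates which names may appear and in what order; this sketch only shows the name/value structure.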