Using XML
In This Chapter
- Advantages of XML
- XML Document Structure and Syntax
- Accessing XML Data
- Creating XSD Schemas
- Class Reference
Here's a problem you've probably faced before. A customer or colleague comes to you asking for help working with an application that was written five years ago. Nobody who originally worked on the application still works for the company; the original developer died in a bizarre gardening accident some years back. The customer wants you to write a Web-based reporting system to handle the data emitted by this dinosaur application.
You now have the unenviable task of figuring out how this thing works, parsing the data it emits, and arranging that data in some recognizable format: a report.
Let's assume that the developer of the original application attempted to make it easy on you by expressing the data in some standardized format. A common format is one in which elements within rows of data are separated from each other by a designated character, like a comma or a tab. This is known as a delimited format. The following listing demonstrates a comma-delimited document:
Jones,Machine Gun,401.32,New York
Janson,Hand Grenade,79.95,Tuscaloosa
Newton,Artillery Cannon,72.43,Paducah
But there are a few problems with the delimited format. First of all, what happens if the data itself contains a comma or a tab? In this case, you're forced to use a more complicated delimiter, typically a comma with data enclosed in quotation marks. The fact that different documents can use different delimiters is a problem in itself, though. There's no such thing as a single universal parse algorithm for delimited documents.
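To see the delimiter problem concretely, here is a minimal sketch. It is written in Python purely for brevity (it is not one of the .NET languages this chapter targets), and the record with a comma inside the item description is made up for illustration.

```python
import csv
import io

# Hypothetical record: the item description itself contains a comma,
# so the producer wrapped that field in quotation marks.
line = 'Janson,"Hand Grenade, Fragmentation",79.95,Tuscaloosa'

# Naive approach: split on every comma. The quoted field is cut in two.
naive_fields = line.split(",")

# Quote-aware approach: Python's standard csv module honors the quotes.
quoted_fields = next(csv.reader(io.StringIO(line)))

print(len(naive_fields), naive_fields)
print(len(quoted_fields), quoted_fields)
```

The naive split yields five fields instead of four. The quote-aware reader gets it right, but only because it happens to assume the same quoting convention the producer used.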
To make it even more difficult, different operating systems have different ideas about what constitutes the end of a line. Some systems (like Windows) terminate a line with a carriage return and a line feed (ASCII 13 and 10, respectively), while others (such as Unix) just use a line feed.
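The line-ending mismatch can be sketched the same way. This is an illustrative Python snippet, not part of the original application; the two strings simply hold the same records as a Windows program and a Unix program might emit them.

```python
# The same two records with Windows (CR+LF) and Unix (LF) line endings.
windows_text = (
    "Jones,Machine Gun,401.32,New York\r\n"
    "Janson,Hand Grenade,79.95,Tuscaloosa\r\n"
)
unix_text = (
    "Jones,Machine Gun,401.32,New York\n"
    "Janson,Hand Grenade,79.95,Tuscaloosa\n"
)

# Splitting on "\n" alone leaves a stray carriage return (ASCII 13)
# at the end of every Windows row, plus an empty trailing entry.
naive_rows = windows_text.split("\n")

# str.splitlines() recognizes both conventions and drops the terminators.
windows_rows = windows_text.splitlines()
unix_rows = unix_text.splitlines()

print(repr(naive_rows[0]))
print(windows_rows == unix_rows)
```

A parser that only splits on the line feed would carry the carriage return into the last field of each row, so the city would read as "New York" plus an invisible extra character.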
Another problem: What is this data? Some of it, like the customer's name and the item, is obvious. But what does the number "401.32" represent? Ideally we want a document that is self-describing: one that tells us at a glance what all the data represents (or at least gives us a hint).
A third big problem with delimited documents: How can you represent related data? For example, it might be nice to be able to see all the information about customers and orders in the same document. You can do this with a delimited document, but it can be awkward. And if you've written a parser that expects four fields and you suddenly bring in six more related fields between the customer name and the product name, you've broken your parser.
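Here is a rough sketch of how a positional parser breaks when fields are added. The function `parse_order`, the key names, and the two extra fields in the extended record are all hypothetical, and Python is used only to keep the example short.

```python
def parse_order(line):
    """Hypothetical parser that assumes exactly four fields in a fixed order."""
    name, item, number, city = line.split(",")
    return {"name": name, "item": item, "number": float(number), "city": city}


original = "Jones,Machine Gun,401.32,New York"
# Made-up extended record: two customer fields inserted after the name.
extended = "Jones,555-0100,jones@example.com,Machine Gun,401.32,New York"

order = parse_order(original)

try:
    parse_order(extended)
    broke = False
except ValueError:
    # Six values cannot be unpacked into four names.
    broke = True

print(order)
print("parser broke on extended record:", broke)
```

Every consumer of the file has to be updated in step with the producer, which is exactly the kind of coupling a self-describing format avoids.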
Internet technology mavens realized that this scenario is frighteningly common in the world of software development, particularly in Internet development. XML was designed to replace delimited data (as well as other data formats) with something standard, easy to use and understand, and powerful.
Advantages of XML
In a networked application, interoperability between various operating systems is crucial; the transfer of data from point A to point B in a standard, understandable way is what it's all about. For tasks that involve parsing data, then, using XML means spending less time worrying about the details of the parser itself and more time working on the application.
Here are some specific advantages of XML over other data formats:
- Documents are easily readable and self-describing: Like HTML, an XML document contains tags that indicate what each type of data is. With good document design, it should be reasonably simple for a person to look at an XML document and say, "this contains customers, orders and prices."
- XML is interoperable: There's nothing about XML that ties it to any particular operating system or underlying technology. You don't have to ask anyone's permission or pay anyone money to use XML. If the computer you're working on has a text editor, you can use it to create an XML document. Several types of XML parsers exist for virtually every operating system in use today (even really weird ones).
- XML documents can be hierarchical: It's easy to add related data to a node in an XML document without making the document unwieldy.
- You don't have to write the parser: There are several types of object-based parser components available for XML. XML parsers work the same way on virtually every platform. The .NET platform contains support for the Internet-standard XML Document Object Model (DOM), but Microsoft has also thrown in a few XML parsing widgets that are easier to use and perform better than the XML DOM; we'll cover these later in this chapter.
- Changes to your document won't break the parser: Assuming that the XML you write is syntactically correct, you can add elements to your data structures without breaking backward compatibility with earlier versions of your application.
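The points in this list can be sketched with an off-the-shelf parser. The example below uses Python's standard `xml.etree.ElementTree` module rather than the .NET classes covered later in this chapter. The element names (`orders`, `order`, `customer`, `item`, `price`, `city`, `phone`, `shipping`) and the reading of 401.32 as a price are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML rendering of one order from the delimited listing.
v1 = """<orders>
  <order>
    <customer>Jones</customer>
    <item>Machine Gun</item>
    <price>401.32</price>
    <city>New York</city>
  </order>
</orders>"""

# A later version of the document adds a new element and nested related data.
v2 = """<orders>
  <order>
    <customer>Jones</customer>
    <phone>555-0100</phone>
    <item>Machine Gun</item>
    <price>401.32</price>
    <city>New York</city>
    <shipping>
      <carrier>Ground</carrier>
    </shipping>
  </order>
</orders>"""


def read_orders(xml_text):
    """Look fields up by element name, so extra elements are simply ignored."""
    root = ET.fromstring(xml_text)
    return [
        {
            "customer": order.findtext("customer"),
            "item": order.findtext("item"),
            "price": float(order.findtext("price")),
        }
        for order in root.findall("order")
    ]


print(read_orders(v1))
print(read_orders(v1) == read_orders(v2))
```

Because the reader asks for elements by name rather than by position, the same code handles both versions of the document without modification.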
Is XML a panacea for every problem faced by software developers? XML won't wash your car or take out the garbage for you, but for many tasks that involve data, it's a good choice.
At the same time, Visual Studio.NET hides much of the implementation detail from you. Relational data in the form of XML is abstracted in the form of a DataSet object. An XML schema (a document that defines data types and relationships in XML) can be created visually, without writing code. In fact, VS.NET can generate XML schemas for you automatically by inspecting an existing database structure.
So why learn XML? In the .NET framework, XML is very important. It serves as the foundation for many of the .NET technologies. Database access is XML-based in ADO.NET. Remote interoperability, known as Web Services or SOAP, is also XML-based. It is true that many of the implementation details of XML are hidden inside objects or inside the Visual Studio.NET development environment. But for tasks like debugging, interoperability with other platforms, performance analysis and your own peace of mind, it still makes sense for a .NET developer to have a handle on what XML is, how it works and how it is implemented in the .NET framework.