How Do Applications Use XML?
XML files by themselves are just like Word documents by themselves: without an application in which to open and manipulate the contents of the file, there is not much point.
Using the data in an XML document requires applications that are capable of handling XML, such as browsers for viewing and displaying XML, and data-processing applications that can read XML files as well. The software component responsible for reading the XML document and building a representation of the document that can be accessed by other parts of an application is called an XML parser.
For example, consider a very simple XML document:
<?xml version="1.0" ?>
<contact>
  <name>Jane Doe</name>
  <address>123 Fake Street</address>
  <phone>312-555-1212</phone>
</contact>
One of the benefits of XML is that the documents are, for the most part, human readable. When we read the preceding example, we know that the name element represents a person's name, the address element represents an address, and so on. However, to a piece of software, these elements have no semantic meaning; software cannot reason or surmise the content of elements from the element name, barring some significant advances in artificial intelligence.
So, to actually use the information in a software application, you need to create code that reads each character until it encounters a less-than symbol, <. That signals that a tag is about to start. Then, each character that follows can be read as the name of the element, until a greater-than symbol > is encountered. After a > is hit, that signifies the end of the start tag to the program.
From there, each character that follows is part of the element's content, that is, until the program hits a less-than symbol followed by a slash, </, which signifies the end tag. The whole process also needs to repeat all the way through the file, "parsing" the file into tag pairs and their content, and keeping track of the relationships of the tag pairs. Oh yeah, it also has to deal with entities, attributes, comments, and so on. Now, you need to do this type of parsing each time you are reading in an XML file. This is the job of the XML parser.
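The scanning loop just described can be sketched in Python. This is a deliberately naive illustration, not how a production parser is built: the function name scan_simple_elements and the SAMPLE constant are made up for this example, and the code only handles elements that contain plain text, ignoring attributes, entities, comments, and CDATA.

```python
SAMPLE = """<?xml version="1.0" ?>
<contact>
  <name>Jane Doe</name>
  <address>123 Fake Street</address>
  <phone>312-555-1212</phone>
</contact>"""


def scan_simple_elements(text):
    """Return (element name, content) pairs for text-only elements.

    Hypothetical helper: it skips everything a real XML parser must
    handle (attributes, entities, comments, CDATA, error reporting).
    """
    pairs = []
    i = 0
    while i < len(text):
        if text[i] != "<":
            i += 1
            continue
        # A less-than symbol signals that a tag is starting;
        # everything up to the next greater-than symbol is the tag.
        end_of_tag = text.find(">", i)
        if end_of_tag == -1:
            break
        name = text[i + 1:end_of_tag]
        # Skip the XML declaration, end tags, and comments/doctype.
        if name.startswith(("?", "/", "!")):
            i = end_of_tag + 1
            continue
        # Content runs until the next "</" sequence.
        close = text.find("</", end_of_tag + 1)
        if close == -1:
            break
        content = text[end_of_tag + 1:close]
        if "<" not in content:
            # Text-only element: record it and resume at its end tag.
            pairs.append((name, content))
            i = close
        else:
            # The element contains child elements: step inside it.
            i = end_of_tag + 1
    return pairs


for name, content in scan_simple_elements(SAMPLE):
    print(name, "=", content)
```

Even this small sketch needs special cases for the declaration and for elements that contain other elements, which hints at how much work a complete parser takes on.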
An XML parser is an XML processor that reads in the file and does the job we've just described, and more. XML parsers also help find errors in the XML file and help build data structures for storing the XML information in your applications.
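As a rough illustration of a parser building a data structure for an application, the following Python sketch hands the contact document to the standard library's xml.etree.ElementTree module and copies the result into a dictionary. The names CONTACT_XML and load_contact are assumptions made for this example only.

```python
import xml.etree.ElementTree as ET

CONTACT_XML = """<?xml version="1.0" ?>
<contact>
  <name>Jane Doe</name>
  <address>123 Fake Street</address>
  <phone>312-555-1212</phone>
</contact>"""


def load_contact(xml_text):
    """Parse the document and return its child elements as a dict.

    Hypothetical helper: assumes a flat document whose root element
    contains only text-bearing children, like the contact example.
    """
    root = ET.fromstring(xml_text)  # the parser builds an element tree
    return {child.tag: child.text for child in root}


contact = load_contact(CONTACT_XML)
print(contact["name"])
print(contact["phone"])
```

The application code never looks at angle brackets; it works only with the tree the parser produced.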
Now, even if you are not a programmer, knowing that your XML documents are really written for the parser will help you understand why some of the well-formedness constraints exist. Also, you may be developing custom applications for XML where you will need to make decisions about what parsers to use.
Non-Validating Parsers
The first category of parsers is known as non-validating parsers. These are parsers that deal only with well-formed XML files, and they don't do anything with Document Type Definitions or XML Schemas. They do, however, work to enforce well-formedness, and help you add the ability to process XML files to your applications. The advantage of using non-validating parsers in your applications is that you gain XML compatibility, but you don't take on the overhead of validation. This tends to lead to lightweight parsers that process XML files very quickly, but at the expense of validation.
Two examples of non-validating parsers:
XP and expat: From James Clark. XP is an XML parser written in Java, whereas expat is a non-validating parser written in C. See http://www.jclark.com/xml/expat.html.
Non-validating parsers will report errors in your files as well; however, keep in mind that even non-validating parsers are not required by the XML 1.0 Recommendation to read XML documents that are not well-formed. Also, if you need your data to be validated against a DTD or XML Schema, you need to turn to a validating parser.
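To see a non-validating parser enforce well-formedness, here is a minimal sketch using Python's xml.parsers.expat module, which is a binding to the expat parser mentioned above. The function name well_formedness_error and the two sample strings are assumptions made for this illustration.

```python
import xml.parsers.expat

GOOD = "<contact><name>Jane Doe</name></contact>"
BAD = "<contact><name>Jane Doe</contact>"  # <name> is never closed


def well_formedness_error(xml_text):
    """Return None if expat accepts the text, else an error message.

    Hypothetical helper: it only checks well-formedness and never
    consults a DTD or Schema.
    """
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(xml_text, True)  # True marks the final chunk of input
    except xml.parsers.expat.ExpatError as err:
        reason = xml.parsers.expat.ErrorString(err.code)
        return f"line {err.lineno}, column {err.offset}: {reason}"
    return None


print(well_formedness_error(GOOD))
print(well_formedness_error(BAD))
```

The parser stops at the first well-formedness error instead of trying to guess what the author meant.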
Validating Parsers
A validating parser does all the things that a non-validating parser does. A validating parser reads in XML files, and checks them for errors of well-formedness. A validating parser can also help you build appropriate data structures. But a validating parser goes a step beyond.
In addition to reading the XML file and parsing it, a validating parser also reads in the DTD or Schema that is associated with your XML file. This allows the parser to check and make sure that the XML file conforms to the rules established in the DTD/Schema. Many parsers will also report errors in the DTD or Schema itself, which can be an incredible benefit, especially if you aren't feeling all that comfortable with DTD/Schema authoring.
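The difference between well-formed and valid can be sketched in Python. Note that this is not a validating parser: Python's standard-library XML modules do not validate, so the sketch below swaps in a hand-written check that stands in for a single DTD rule. The rule itself, the EXPECTED_CHILDREN constant, and the validation_errors function are all assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical content model, equivalent in spirit to the DTD declaration
#   <!ELEMENT contact (name, address, phone)>
EXPECTED_CHILDREN = ["name", "address", "phone"]

VALID = (
    "<contact><name>Jane Doe</name>"
    "<address>123 Fake Street</address>"
    "<phone>312-555-1212</phone></contact>"
)
# Well-formed, but the phone element is missing.
INVALID = (
    "<contact><name>Jane Doe</name>"
    "<address>123 Fake Street</address></contact>"
)


def validation_errors(xml_text):
    """Return a list of rule violations (empty if the document conforms).

    Stand-in for real validation: a validating parser would read these
    rules from the DTD or Schema instead of from Python code.
    """
    root = ET.fromstring(xml_text)  # raises ParseError if not well-formed
    errors = []
    if root.tag != "contact":
        errors.append(f"root element is <{root.tag}>, expected <contact>")
    actual = [child.tag for child in root]
    if actual != EXPECTED_CHILDREN:
        errors.append(
            f"children of <{root.tag}> are {actual}, "
            f"expected {EXPECTED_CHILDREN}"
        )
    return errors


print(validation_errors(VALID))
print(validation_errors(INVALID))
```

Both sample documents would pass a non-validating parser; only the check against the rules distinguishes them.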
Some examples of validating XML Parsers include
MSXML: From Microsoft, this is a C++ validating parser. See http://www.microsoft.com/xml.
Xerces: From the Apache XML Project, this is a Java-based, open-source validating XML parser. See http://xml.apache.org.
In addition to these parsers, there are also parsers from Sun, Oracle, and a number of other vendors. Each one is designed with certain performance characteristics, and it pays to shop around when looking for a parser. Keep in mind also that at the time of this writing, XML Schemas are a relatively new technology, and therefore they are not supported by a large number of parsers. This will undoubtedly change as the technology matures.
For many authors, this is all you will ever need to know about XML parsers and their use. For developers who need to learn more, specifics of parser implementations are covered in Chapter 16, "Working with XML and Java," and in Chapter 17, "Working with XML and .NET."