You Can Parse for Validity
You have already used a parser in the exercises you have completed to this point. Microsoft Internet Explorer includes the MSXML parser in version 5.0 and later, so each time you opened a document in the browser you were using MSXML to check it for well-formedness. Until now, that is all you have used the parser for. Parsers can also validate your XML against a schema written in DTD, XDR, XSD, or another schema language, provided the parser is programmed with the rules of that language. Figure 3.2 depicts this process. In the figure, yes marks documents that are considered valid, whereas no marks those that are not.
Figure 3.2 An XML document with an associated schema is passed to a validating XML parser.
TIP
A valid XML document is also well-formed; however, there is no guarantee that a well-formed XML instance is also valid.
A validating parser first checks that an instance conforms to the basic XML syntax rules, that is, that it is well-formed. It then confirms that every content constraint declared in the schema you associated with the document (or schemata, because you might associate more than one with your document) is satisfied, which establishes that the document is valid. In other words, validating parsers confirm both syntax and data structure.
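The two stages can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not how MSXML works: the RULES table is a hypothetical stand-in for a schema (it is not DTD, XDR, or XSD syntax), and the note, to, from, and message element names are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical content rules: each element name maps to the child
# element names it must contain, in order. This is a toy stand-in
# for a schema, not DTD, XDR, or XSD syntax.
RULES = {
    "note": ["to", "from", "message"],
    "to": [],
    "from": [],
    "message": [],
}


def check(xml_text):
    """Return 'not well-formed', 'well-formed but invalid', or 'valid'."""
    # Stage 1: syntax. ElementTree raises ParseError on malformed XML.
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return "not well-formed"

    # Stage 2: structure. Compare every element's children to RULES.
    for element in root.iter():
        expected = RULES.get(element.tag)
        actual = [child.tag for child in element]
        if expected is None or actual != expected:
            return "well-formed but invalid"
    return "valid"
```

A document that fails stage 1 never reaches stage 2, which mirrors the point of the TIP: every valid document is well-formed, but a well-formed document such as a note with no from element can still be invalid.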
A parser is a software application that reads text a single character at a time, unless a programmer or schema author instructs it to skip over particular sequences of data. XML gives you a means to use these parser programs to interpret the intent of the markup applied to the text.
NOTE
The W3C refers to a parser as an XML processor in the official specification for XML 1.0: "A software module called an XML processor is used to read XML documents and provide access to their content and structure." (Please see the XML Technical Recommendation at http://www.w3.org/TR/1998/REC-xml-19980210 for further details.)
After the parser has interpreted the XML document and checked it for well-formedness and validity, it exposes the data to other applications as a document tree structure for further processing. You have already seen IE5 do this: it takes the tree structure returned by the MSXML parser and displays a structured document in the browser window, showing that everything is as it should be. Recall that IE uses an XSLT process to assign a default stylesheet to the XML document at the time that you view it in the browser window. That is why you see features such as the colored markup display in IE. Alternatively, if the parser throws an error, IE catches it and displays the error message along with a snippet of code from the point in your document at which the error occurred, without listing or transforming the entire XML instance document.
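To make the two outcomes concrete, here is a rough Python sketch using the standard library's ElementTree parser rather than MSXML. The function names outline and parse_or_report are hypothetical, chosen only for this illustration. On success the parser hands back a tree that the application can walk; on failure it raises an error that carries the position of the problem.

```python
import xml.etree.ElementTree as ET


def outline(element, depth=0):
    """Return the element tree as a list of indented tag names."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


def parse_or_report(xml_text):
    """Return an outline of the tree, or a one-line error with its position."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        # ParseError.position is a (line, column) tuple.
        line, column = err.position
        return ["error at line %d, column %d" % (line, column)]
    return outline(root)
```

Calling parse_or_report on a well-formed document returns one indented line per element, which is roughly the structure a browser would then style for display. Calling it on a document with a mismatched tag returns only the error line, much as IE shows the error instead of the formatted document.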
This prevailing concept of documents built as tree structures is one of the reasons that you learned to represent a document in this fashion on Day 2 (recall Figure 2.6). You will revisit the concept of a document tree many more times throughout the course of the next few weeks. On Day 12 you will learn to use the Document Object Model (DOM) API to manipulate the individual nodes of your XML trees.