The Basics of Validation
There is another concept in XML that is just as important as well-formedness, if not more so: validation. The idea behind validation is to define the structure of a document and the rules for how its content is to be organized. Then, by checking the document against that set of rules, a parser can either declare the document valid or generate an error indicating where the document is incorrectly formatted or structured.
The document that establishes the set of rules is called a schema, with a lowercase s. The terminology can be confusing, because a schema in the generic sense is just a set of rules that defines a structure. In XML, however, two types of schemas are commonly used to support validation: Document Type Definitions (DTDs) and XML Schemas.
Within both DTDs and XML Schemas, you can establish rules for what elements and attributes may be used in your XML documents. A DTD can also declare other resources, such as the entities to be used in your documents.
After the schema (in the form of a DTD or XML Schema) is written, the schema is then linked to your XML document. In the case of a DTD, this is accomplished with a DOCTYPE declaration in the document. When the document is read by a parser that supports validation, or a validating parser, the document is checked against the rules contained in the DTD. If the document fails to comply with the rules, then an error is generated. If the document complies with the rules in the DTD, then it is valid. A similar mechanism is employed to link an XML Schema with a document, and the result of validation is the same: A document that meets the validity constraints is valid. The specifics of DTDs are discussed at length in Chapter 4 and the specifics of XML Schemas are discussed in Chapter 5.
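The checking step described above can be sketched in a few lines of Python. This is only an illustration of the idea, not a real validator: Python's standard library parser checks well-formedness but does not validate, so the RULES table and the check function below are hypothetical stand-ins for the far richer constraints a DTD or XML Schema can express.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule set: for each element name, the child elements it may contain.
# A real DTD or XML Schema can express much more (order, counts, attributes).
RULES = {
    "document": {"title", "para"},
    "title": set(),
    "para": set(),
}

def check(xml_text):
    """Return a list of rule violations; an empty list means the toy rules are met."""
    root = ET.fromstring(xml_text)  # raises ParseError if the text is not well-formed
    errors = []
    for parent in root.iter():
        if parent.tag not in RULES:
            errors.append(f"undeclared element <{parent.tag}>")
            continue
        for child in parent:
            if child.tag not in RULES[parent.tag]:
                errors.append(f"<{child.tag}> not allowed inside <{parent.tag}>")
    return errors

good = "<document><title>Report</title><para>Text</para></document>"
bad = "<document><title>Report</title><footer>Oops</footer></document>"

print(check(good))  # []
print(check(bad))   # lists the problems caused by <footer>
```

Both sample documents are well-formed; only the second one breaks the rules, which is exactly the distinction between well-formedness and validity.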
Validating an XML document provides many benefits. Validation can provide a mechanism for enforcing data integrity. It can be a method for expediting searching or indexing. It can also help manage large documents or collaborative documents that might be broken into chunks for editing purposes.
All of these benefits, and many more, make validation one of the more powerful tools of XML.
Document Type Definitions: A Glimpse
One common mechanism for validating XML documents is the Document Type Definition. An XML document is valid if it has an associated document type declaration and if the document complies with the constraints expressed in it.
The document type declaration is the statement in your XML file that points to the location of the DTD. For example:
<?xml version="1.0" ?>
<!DOCTYPE document SYSTEM "example.dtd">
<document></document>
Here we have a document whose root element is named document, linked by the document type declaration to the example.dtd file. To be valid, the document must comply with all the rules established in that DTD.
The document type declaration can also contain the rules itself, in what is called the internal subset, rather than pointing to an external DTD:
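To see what the document type declaration carries, here is a minimal sketch using Python's xml.dom.minidom module. Keep in mind that this parser is non-validating: it records the declaration but never reads example.dtd, so the sketch shows only the link between document and DTD, not validation itself. The variable names are ours.

```python
from xml.dom import minidom

source = (
    '<?xml version="1.0" ?>'
    '<!DOCTYPE document SYSTEM "example.dtd">'
    '<document></document>'
)

doc = minidom.parseString(source)

doctype_name = doc.doctype.name            # name given in the DOCTYPE
dtd_location = doc.doctype.systemId        # system identifier of the DTD
root_name = doc.documentElement.tagName    # actual root element of the document

print(doctype_name, dtd_location, root_name)  # document example.dtd document
```

The name in the DOCTYPE and the name of the root element agree here, which is one of the conditions a validating parser would check.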
<?xml version="1.0" ?>
<!DOCTYPE document [
  <!ENTITY legal "This document is confidential.">
  <!-- More rules would be included here -->
]>
<document>
  &legal;
</document>
In this case, you would write the same rules as you would have in the external DTD. This can be very useful for keeping your files together with their rules, or for including a few simple entity declarations. Both approaches, declaring the rules internally in the DOCTYPE and pointing to an external DTD, have their advantages. We will discuss those issues more in Chapter 4.
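An entity declared inside the DOCTYPE is resolved by the parser as the document is read. The short sketch below, which assumes Python's standard xml.etree.ElementTree module, parses a document like the one above and shows that the application receives the replacement text rather than the &legal; reference.

```python
import xml.etree.ElementTree as ET

source = """<?xml version="1.0" ?>
<!DOCTYPE document [
  <!ENTITY legal "This document is confidential.">
]>
<document>&legal;</document>"""

root = ET.fromstring(source)

# The parser has already replaced &legal; with the declared text.
print(root.text)  # This document is confidential.
```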
XML Schemas: A Glimpse
Document Type Definitions are actually a holdover from SGML, and as such, DTDs have drawn many critics where XML is concerned. The first problem with DTDs is that they use their own special syntax, which is not very intuitive to many authors. The second problem is that DTDs are not themselves written as XML documents, so they cannot be processed with ordinary XML tools. Finally, DTDs do not provide a mechanism for defining complex datatypes, which limits some of the potential of XML.
In response to the limitations of DTDs, the W3C has developed a schema mechanism that is specific to XML: XML Schemas. XML Schemas use a (somewhat) more intuitive syntax, and are actually XML documents themselves. This makes it easier for XML developers to integrate XML Schema support into applications.
Additionally, XML Schemas provide a means for defining datatypes for elements and attributes. Datatypes allow you to restrict the content of elements and attributes to specific types of data, such as an integer, a date, or a string. This is a very powerful new aspect of schemas that was not possible with DTDs. We will discuss XML Schemas and datatypes at length in Chapter 5.
XML Schemas are usually stored in external files, which are linked to XML documents through a couple of special attributes:
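To give a feel for what a datatype restriction means in practice, here is a rough sketch in Python. The CHECKS table and the conforms function are hypothetical, and Python's int and date.fromisoformat only approximate the rules a schema processor applies to integers and dates, so treat this as an analogy rather than an implementation of XML Schema datatypes.

```python
from datetime import date

# Hypothetical stand-ins for schema datatypes. Each entry is a function that
# raises ValueError when the text does not fit the type.
CHECKS = {
    "integer": int,
    "date": date.fromisoformat,
    "string": str,
}

def conforms(type_name, text):
    """Report whether the text of an element or attribute fits the named type."""
    try:
        CHECKS[type_name](text)
        return True
    except ValueError:
        return False

print(conforms("integer", "42"))       # True
print(conforms("integer", "forty"))    # False
print(conforms("date", "2001-10-26"))  # True
print(conforms("date", "2001-13-01"))  # False, there is no month 13
```

A DTD could say only that these values are character data; a datatype lets the schema reject the second and fourth values outright.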
<?xml version="1.0" ?>
<document xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:noNamespaceSchemaLocation="example.xsd">
</document>
The first attribute, xmlns:xsi, declares the namespace that the xsi prefix stands for. The second attribute, xsi:noNamespaceSchemaLocation, belongs to that namespace and is the one that actually points to the location of the schema.
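The relationship between the two attributes becomes clearer when you look at the document the way a namespace-aware parser does. In this sketch, which assumes Python's xml.etree.ElementTree module, the xsi prefix has disappeared and the attribute is identified by the namespace name itself. The XSI and schema_location names are ours.

```python
import xml.etree.ElementTree as ET

# The namespace name that the xsi prefix is bound to in the document.
XSI = "http://www.w3.org/2001/XMLSchema-instance"

source = (
    '<?xml version="1.0" ?>'
    '<document xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
    'xsi:noNamespaceSchemaLocation="example.xsd"></document>'
)

root = ET.fromstring(source)

# ElementTree writes a namespaced name as {namespace}localname.
schema_location = root.get(f"{{{XSI}}}noNamespaceSchemaLocation")
print(schema_location)  # example.xsd
```

Note that this only reads the hint. The standard library parser does not fetch example.xsd or validate against it; a schema-aware processor is needed for that.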
Because XML Schemas are themselves XML, a schema can also be embedded directly in an XML document by making use of namespaces (which are discussed in detail in Chapter 6). Support for such inline schemas varies among processors, so check your tools before relying on this form:
<?xml version="1.0" ?>
<document xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xs:element name="title" type="xs:string"/>
    <!-- More schema rules defined here -->
  </xs:schema>
  <title>My Document</title>
</document>
We will discuss the mechanisms for writing and including XML Schemas in your XML documents in greater detail in Chapter 5. For now, keep XML Schemas in mind for validation: they are easier to author and offer more power and flexibility than DTDs. However, because DTDs are essentially a part of XML 1.0, whereas Schemas are a separate Recommendation, there may be more application support for DTDs until Schema usage becomes more widespread.
Another point to keep in mind: Validation is optional. Well-formed XML can be used in many applications without any problems whatsoever. Validation can nonetheless be a valuable tool, particularly if you are using XML as a data format. By using a DTD or Schema for validation, you can enforce your markup language's rules on others who author XML instance documents.
Validation can also be used to make sure that users do not corrupt the data being stored in your XML files. This is perhaps the most important reason for validation: It enables you to enforce some degree of data integrity.