- Dissecting an XML Document Type Definition
- Using Document Type Definition Notation and Syntax
- Understanding Literals
- Declaring a NOTATION
- Creating ATTLIST Declarations
- Using Special XML Datatype Constructions
- Understanding the Difference Between Well-Formed and Valid XML
- Learning How to Use External DTDs and DTD Fragments
- Altering an XML DTD
- Getting Down to Cases
Understanding the Difference Between Well-Formed and Valid XML
The difference between well-formed and valid XML is simple: valid XML is well-formed, has a DTD associated with it, and has been verified against every rule in that DTD. Merely well-formed XML, on the other hand, is not necessarily valid, although it may be.
Well-formed XML follows these rules:
Tags must nest properly. Every beginning and ending tag pair must fully contain any tag pair that begins inside it. The same holds across entity boundaries: no start-tag, end-tag, empty-element tag, element, comment, processing instruction, character reference, or entity reference can begin in one entity and end in another.
In the internal DTD subset, parameter entity references can occur only at the top level, where markup declarations can occur, and not within markup declarations. This restriction does not apply to parameter entity references in the external DTD subset.
The name in an element's end-tag must match the element type in the start-tag.
No attribute name may appear more than once in the same start-tag or empty-element tag.
Attribute values cannot contain direct or indirect entity references to external entities.
The replacement text of any entity referred to directly or indirectly in an attribute value (other than &lt;) must not contain a <.
Characters referred to using character references must be legal characters.
The declaration of a parameter entity must precede any reference to it.
The declaration of a general entity must precede any reference to it that appears in a default value in an attribute-list declaration.
Parameter-entity references may appear only in the DTD, and they are subject to additional restrictions in the internal DTD subset.
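Several of the rules above can be observed directly by feeding small documents to a non-validating parser. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree module; the helper name well_formedness_error and the sample documents are hypothetical, chosen only for illustration. The exact wording of each error message depends on the underlying parser, so none is shown here.

```python
import xml.etree.ElementTree as ET


def well_formedness_error(xml_text):
    """Return None if xml_text is well-formed, else the parser's error message.

    Hypothetical helper: it checks well-formedness only, not validity.
    """
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return str(exc)
    return None


# Illustrative samples (assumptions, not taken from any real document type).
SAMPLES = {
    "properly nested": "<a><b>text</b></a>",
    "overlapping tags": "<a><b>text</a></b>",
    "duplicate attribute": '<a x="1" x="2"/>',
    "raw < in attribute value": '<a x="1 < 2"/>',
    "escaped < in attribute value": '<a x="1 &lt; 2"/>',
    "illegal character reference": "<a>&#0;</a>",
}

for label, doc in SAMPLES.items():
    error = well_formedness_error(doc)
    status = "well-formed" if error is None else f"rejected ({error})"
    print(f"{label}: {status}")
```

Only the properly nested sample and the one that escapes < as &lt; should be accepted; each of the others breaks one of the rules listed above.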
The only practical way to tell whether you have a valid XML document is to use an automated tool to read in the document itself, including its DTD, and let the tool parse it. Hand-checking can help but is notoriously error-prone, even when you're careful. It typically takes two, three, or even more attempts to get a new DTD and document type to load for the first time. Sometimes it takes many more attempts, so be patient and persevere because you will succeed.
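To illustrate why a tool is needed, here is a deliberately partial sketch of one validity check, assuming Python's standard xml.parsers.expat module. It is not a validator: it only reports elements that are used in a document but never declared in its internal DTD subset, and it ignores content models, attributes, and external DTDs. The function name undeclared_elements and the note document type are hypothetical. Real validation should be left to a validating parser.

```python
import xml.parsers.expat


def undeclared_elements(xml_text):
    """Return sorted names of elements used but not declared in the internal DTD subset.

    Hypothetical, partial check: content models and attributes are not verified.
    """
    declared = set()
    used = set()
    parser = xml.parsers.expat.ParserCreate()
    # Called once for each <!ELEMENT ...> declaration the parser reads.
    parser.ElementDeclHandler = lambda name, model: declared.add(name)
    # Called once for each start-tag or empty-element tag in the document.
    parser.StartElementHandler = lambda name, attrs: used.add(name)
    parser.Parse(xml_text, True)  # raises ExpatError if not well-formed
    return sorted(used - declared)


# Well-formed, but not valid: <ps> is never declared (illustrative document).
DOC = """<!DOCTYPE note [
  <!ELEMENT note (to, body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<note><to>Ann</to><body>Hi</body><ps>Bye</ps></note>"""

print(undeclared_elements(DOC))
```

The sample document parses without error, which shows that it is well-formed, yet the check still flags the ps element. A full validator would also reject it because the content of note does not match the declared model (to, body).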