- XML Elements
- Generic Identifiers
- Some Rules for Naming Elements
- Storing the Data in XML
- Parsed Character Data
- Bypassing Parsing with CDATA
- Attributes
- When to Use Attributes
- Classifying Attributes: Attribute Types
- Attribute Rules
- Well-Formedness Rules
- Creating a Well-Formed XML Document
- The Basics of Validation
- How Do Applications Use XML?
- An Overview of XML Tools
- Roadmap
- Additional Resources
Well-Formedness Rules
XML documents must be well formed in order to be considered XML. The XML 1.0 Recommendation spells out some conditions that must be met for a document to be considered well formed. These conditions are called well-formedness constraints, and if the document fails to meet these constraints, it is not an XML document.
This can cause some confusion when working with XML: a document may seem to be perfect XML, yet it will not load into your XML application. The reason is that the XML 1.0 Recommendation treats any violation of a well-formedness constraint as a fatal error, and once a parser detects a fatal error it must not continue normal processing. This is quite contrary to the behavior of most Web browsers, which are very forgiving of errors in HTML.
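To see this strictness in practice, here is a minimal sketch that uses Python's standard-library xml.etree.ElementTree parser as a stand-in for "your XML application" (an assumption; any conforming XML parser behaves similarly). The helper name try_parse and the sample contact document are hypothetical. Instead of guessing at the intended structure, the parser refuses to build a tree and reports where it stopped.

```python
import xml.etree.ElementTree as ET

def try_parse(text):
    """Hypothetical helper: return (root, None) on success, (None, error) on failure."""
    try:
        return ET.fromstring(text), None
    except ET.ParseError as err:
        # err.position holds the (line, column) where the parser gave up.
        return None, err

# Looks almost right, but <name> is closed by </Name> (a case mismatch).
almost_xml = """<contact>
  <name>Jane Doe</Name>
</contact>"""

root, err = try_parse(almost_xml)
if err is not None:
    line, column = err.position
    print(f"Not well formed (line {line}, column {column}): {err}")
```

A browser given similar HTML would render something anyway; the XML parser returns no tree at all.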
It would be impractical to enumerate each of the well-formedness constraints in the XML 1.0 Recommendation without delving into minutiae that are not really germane to creating XML documents. For example, a document that uses an illegal element name, such as <411>, is not well formed. However, we have already discussed this rule in the context of naming elements, and rehashing every such detail here would be tedious.
The important aspects of well-formedness boil down to a few rules. Follow them consistently and you will significantly lower your chances of creating a malformed XML document:
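- All element and attribute names must follow the XML naming conventions outlined previously (for example, a name cannot start with a digit).
- Elements must be properly nested: an element that starts inside another element must also end inside it.
- Every start tag must have a matching end tag, unless the element is written as an empty-element tag such as <br/>.
- Start and end tags must match exactly, including case, because XML names are case-sensitive: <Item> cannot be closed by </item>.
- A well-formed document must have one, and only one, root element that contains all the other elements in the document.
- All entity references must be written correctly (&entityname;) and must refer to entities that are either predefined or declared.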
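The rules above can be checked mechanically by handing each document to a parser. The following is a minimal sketch, assuming Python's xml.etree.ElementTree as the parser; the function is_well_formed and the sample strings are hypothetical, with two documents that follow the rules and one deliberate violation per rule. Passing this check says nothing about validity: ElementTree does not validate against a DTD or schema.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the standard-library parser accepts text as XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Hypothetical samples: two that follow the rules, then one violation per rule.
samples = {
    "follows the rules": "<phone><number>555-0100</number></phone>",
    "empty-element tag": "<doc><br/></doc>",
    "name starts with digit": "<411>555-0100</411>",
    "improper nesting": "<b><i>text</b></i>",
    "missing end tag": "<list><item>one</list>",
    "case mismatch": "<Item>one</item>",
    "two root elements": "<a/><b/>",
    "undeclared entity": "<p>&nbsp;</p>",
}

for label, text in samples.items():
    print(f"{label:24} {is_well_formed(text)}")
```

Only the first two samples are accepted; every other sample breaks exactly one of the rules listed above.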
If you follow these rules, chances are your XML documents will be well formed.
Well-Formedness and Entities
An entity is just a way of using shorthand in XML. Entities can also be found in HTML. For example:
&copy;
is an entity that represents the copyright © symbol.
The syntax for most entities is
&entityname;
Entities can be used to replace long strings, or to represent symbols that you cannot include legally in an XML document. For example, let's say that you wanted to include a less-than symbol:
<equation>2 is less-than 7</equation>
You could not legally say
<equation>2 < 7</equation>
This violates the well-formedness constraints because it includes the < symbol, which signifies the beginning of a tag. Fortunately, an entity lets you represent the symbol without actually including it: &lt;. The equation can therefore be written legally as <equation>2 &lt; 7</equation>. An entity exists for the greater-than symbol as well: &gt;.
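A short sketch of the less-than case, again assuming Python's xml.etree.ElementTree; the helper name parses is hypothetical. It shows that the raw < is rejected, that the parser decodes &lt; back to < in the element's text, and that the serializer escapes it again on output.

```python
import xml.etree.ElementTree as ET

def parses(text):
    """Hypothetical helper: True if text is accepted as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# A bare "<" in content is read as the start of a tag, so parsing fails.
print(parses("<equation>2 < 7</equation>"))       # False

# The &lt; entity is decoded back to "<" in the parsed text...
equation = ET.fromstring("<equation>2 &lt; 7</equation>")
print(equation.text)                               # 2 < 7

# ...and the serializer re-escapes it when writing XML back out.
print(ET.tostring(equation, encoding="unicode"))   # <equation>2 &lt; 7</equation>
```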
Five entities are predefined in XML, so you can use them in any document without violating the rules of well-formedness:
&amp;
This entity represents the ampersand symbol &.
&lt;
The less-than entity represents the less-than sign <, which also marks the beginning of every tag. Because a literal < always signals the start of markup, you must use the &lt; entity whenever you want to show a tag as text or use the less-than symbol in content.
&gt;
The greater-than entity works like the less-than entity: use it to represent the greater-than symbol > in the content of an element. Strictly speaking, a literal > is legal in most content; it must be escaped only where it would otherwise form the sequence ]]>.
&apos;
The apostrophe entity represents an apostrophe ' (a single quotation mark).
&quot;
This entity represents a double quotation mark ". These last two are mainly needed inside attribute values that are delimited by the same kind of quotation mark.
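A minimal sketch showing all five predefined entities in both directions, assuming Python's standard library: xml.etree.ElementTree decodes them when parsing, and xml.sax.saxutils.escape produces them when you build XML text by hand. The sample strings are hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# All five predefined entities are usable without any DTD.
chars = ET.fromstring("<chars>&amp; &lt; &gt; &apos; &quot;</chars>")
print(chars.text)  # & < > ' "

# The reverse direction: escape() always replaces &, < and >;
# the optional mapping adds the two quotation-mark characters.
raw = 'Tom & Jerry say "1 < 2"'
print(escape(raw, {'"': "&quot;", "'": "&apos;"}))
# Tom &amp; Jerry say &quot;1 &lt; 2&quot;
```

escape() replaces the ampersand first, so the &quot; entities added by the mapping are not escaped a second time.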
Note that although most of these entities also exist in HTML, many HTML entities, such as &copy;, are not predefined in XML. Any other entity that your document uses must be declared in a DTD; in a document without a DTD, a reference to an undeclared entity makes the document not well formed. XML Schema provides no mechanism for declaring entities.
There are actually two ways to declare entities. You can place an entity declaration in an external DTD, or you can declare entities in the internal DTD subset, which is contained within the document itself. We will discuss Document Type Definitions, both internal and external, in more detail in Chapter 4.
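As a preview of the internal DTD subset, here is a minimal sketch assuming Python's xml.etree.ElementTree (whose underlying Expat parser expands entities declared in the internal subset). The notice element and its text are hypothetical. The same &copy; reference fails when undeclared and works once the entity is declared as the character reference &#169;, the copyright sign.

```python
import xml.etree.ElementTree as ET

# Without a declaration, &copy; is an undefined entity in XML.
undeclared = "<notice>&copy; Example</notice>"

# The internal DTD subset (between [ and ]) declares copy as &#169;.
declared = """<!DOCTYPE notice [
  <!ENTITY copy "&#169;">
]>
<notice>&copy; Example</notice>"""

try:
    ET.fromstring(undeclared)
except ET.ParseError as err:
    print("undeclared:", err)

notice = ET.fromstring(declared)
print(notice.text)  # © Example
```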
A document does not need an associated DTD to be well formed; as long as it is structured correctly, it is well formed. Many documents need neither a DTD nor a schema. Because XML requires well-formedness but not validation, you can create flexible documents that serve your needs without adding the complexity of a Document Type Definition (DTD) or XML Schema.