XML Building Blocks: Elements and Attributes
- XML Elements
- Generic Identifiers
- Some Rules for Naming Elements
- Storing the Data in XML
- Parsed Character Data
- Bypassing Parsing with CDATA
- Attributes
- When to Use Attributes
- Classifying Attributes: Attribute Types
- Attribute Rules
- Well-Formedness Rules
- Creating a Well-Formed XML Document
- The Basics of Validation
- How Do Applications Use XML?
- An Overview of XML Tools
- Roadmap
- Additional Resources
XML Elements
Much like HTML, XML is a markup language: its documents are composed of tags that "mark up" the data they contain. A typical XML document contains a large number of start and end tags, with the data enclosed between them. For example, a name might be represented in XML as follows:
<name>John Doe</name>
Here we have two tags: the start tag <name> and the end tag </name>. Tags are a very important part of XML. They are what you use to mark the beginning and ending of elements in your XML documents. The two tags, taken together along with the content between them, constitute an XML element.
We would actually refer to the element by the element type, which is synonymous with the name used in the start/end tag pair. In the previous example, we have a name element, the content of which happens to be the name John Doe.
Although an element is referred to by its name, or element type, an actual element instance consists of both tags together with the content nested between them. Elements can have text content, which is called Parsed Character Data (PCDATA), or they can have other elements as their content. For example, we might alter the name element to contain more information:
<name>
  <first>John</first>
  <last>Doe</last>
</name>
Now we have three elements: a name element, which has as its content the first element and the last element. The first and last elements contain PCDATA, which represents the actual name of the person being stored in the name element.
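To see how a processor views this nesting, the following is a minimal sketch that assumes Python and its standard xml.etree.ElementTree module (neither is part of this chapter; any XML parser would serve). It reads the name element and lists the element type and text content of each child.

```python
import xml.etree.ElementTree as ET

# The nested name element from the example above.
doc = "<name> <first>John</first> <last>Doe</last> </name>"

root = ET.fromstring(doc)

# The element type is the name used in the start/end tag pair.
print(root.tag)

# Iterating over an element yields its child elements in document order.
# Each child here holds only character data, exposed as .text.
children = [(child.tag, child.text) for child in root]
print(children)
```

The parser reports one name element whose content is two child elements, first and last, each carrying its own character data.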
Elements in XML must be composed of both start and end tags (with one exception for empty elements, which we will discuss later). This is one way in which XML differs significantly from HTML. For example, in HTML, there are a number of tags that can be used without end tags, such as <P>, <HR>, or <BR>.
With XML, each start tag contains the name of the element type, and each end tag contains the name of the element type as well, preceded by a / to denote that it is an end tag. The start and end tags must match exactly. For example, the following tags do not match:
<name>John Doe</NAME>
Unlike HTML, XML is case sensitive, so start and end tags must match in case as well. You might be surprised at how strict XML seems as compared to HTML; however, this does help keep your documents consistent and readable.
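The strictness described here is easy to observe with a parser. The sketch below assumes Python's standard xml.etree.ElementTree module; the helper name is_well_formed is hypothetical and introduced only for illustration. It reports whether the parser accepts a fragment or rejects it with a parse error.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Hypothetical helper: True if the parser accepts the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        # Raised for mismatched tags, missing end tags, and similar errors.
        return False

# Start and end tags match exactly, including case.
matching = is_well_formed("<name>John Doe</name>")

# The end tag differs in case, so the tags do not match.
mismatched = is_well_formed("<name>John Doe</NAME>")

# The end tag is missing entirely, which HTML tolerates for some tags.
unclosed = is_well_formed("<name>John Doe")

print(matching, mismatched, unclosed)
```

Only the first fragment is accepted; the other two are rejected before any application code sees the data.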
NOTE
Current versions of HTML do allow some tags without end tags, and HTML is also not case sensitive. However, in an effort to promote compatibility and extensibility, the W3C is in the process of rewriting the HTML Recommendation using XML, and the result is XHTML. XHTML requires all tags to be properly closed, and introduces case sensitivity to HTML. There are some other differences as well; the specifics of XHTML are covered later in Chapter 21, "The Future of the Web: XHTML."
Shorthand for Empty Elements
There are times when you might have an element in your document that does not contain any data. For example, in a contact document, you might have an element for cellular phone numbers:
<cellular>312-555-1212</cellular>
This is fine, assuming that your contact has a cellular phone. However, if they do not, then you might have a document with some empty <cellular> elements:
<cellular></cellular>
These empty elements can be written as shown, with a start and end tag; however, there is also a shorthand which can be used for empty elements:
<cellular/>
The / character at the end of the tag tells an XML processor that the element is empty. Using the empty-element form can reduce clutter in your documents and also save time when authoring.
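As a quick check that the two forms are interchangeable, here is a small sketch, again assuming Python's standard xml.etree.ElementTree module rather than any tool discussed in this chapter. It parses both forms of the empty cellular element and compares what the parser reports.

```python
import xml.etree.ElementTree as ET

# The same empty element written both ways.
long_form = ET.fromstring("<cellular></cellular>")
short_form = ET.fromstring("<cellular/>")

# Both forms produce an element of the same type...
same_type = long_form.tag == short_form.tag

# ...with no character data (reported as None) and no child elements.
both_empty = (
    long_form.text is None
    and short_form.text is None
    and len(long_form) == 0
    and len(short_form) == 0
)

print(same_type, both_empty)
```

To the parser the two spellings are the same element, so the choice between them is a matter of readability and authoring convenience.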