- XML Elements
- Generic Identifiers
- Some Rules for Naming Elements
- Storing the Data in XML
- Parsed Character Data
- Bypassing Parsing with CDATA
- Attributes
- When to Use Attributes
- Classifying Attributes: Attribute Types
- Attribute Rules
- Well-Formedness Rules
- Creating a Well-Formed XML Document
- The Basics of Validation
- How Do Applications Use XML?
- An Overview of XML Tools
- Roadmap
- Additional Resources
Some Rules for Naming Elements
XML is designed to allow you to be descriptive with element names, in keeping with the idea that XML should be readable. Allowing you to use descriptive names also contributes to the structure of the XML document, enabling you to use element names that are descriptive of their content and relationships to other elements. This is why we use names such as phone and cellular for elements as opposed to drexel. There is nothing in the XML Recommendation to stop us from using <drexel> as an element for cellular phone numbers. But it wouldn't be the most descriptive name we could use, and wouldn't help anyone derive meaning from our document.
At the same time, you don't want to get carried away and make element names longer than they need to be. Calling the phone element mobile-phone-number-for-employee would be a bit verbose. Imagine trying to read a document full of elements like this:
<mobile-phone-number-for-employee>328-233-1231</mobile-phone-number-for-employee>
That would be quite a chore. Although brevity is not an essential aspect of XML, common sense does dictate that element names should be as concise as possible without detracting from descriptiveness.
Element Naming Conventions
The XML Recommendation defines some rules that must be followed to produce a valid XML name. These rules apply not only to element names but to any XML component that requires a valid XML name, such as attribute names.
The XML Recommendation says that a name must begin with a letter, an underscore, or a colon, followed by any number of letters, digits, periods, hyphens, underscores, or colons.
This means that you are limited in which characters you can use in an element name. Other than letters and digits, you can include only the following characters in a name:
.
The period may be used in an element name; for example, <first.name> is a valid element type.
-
A hyphen may also be used in element names. You can use it to separate words, as in <first-name>.
_
Underscores are commonly used in variable names in many programming languages, and they can be useful in XML as well. Because you can't have spaces in element names, the underscore is commonly used in place of a space; for example, <first_name>.
:
The colon is a valid character to be used in XML names; however, it is also a special character that is reserved for use with XML Namespaces. Therefore, you should refrain from using the colon in your names (unless you are using a Namespace). Namespaces are discussed in greater detail in Chapter 6, "Avoiding XML Confusion with XML Namespaces."
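Whether a colon is just another name character or a namespace prefix separator depends on how the document is parsed. The following is a minimal sketch, assuming Python's standard library: expat with namespace processing turned off reports the name c:phone unchanged, while ElementTree, which processes namespaces, rejects the same document because the prefix c is never declared. The namespace URI http://example.com/contacts and the helper names are hypothetical, chosen only for illustration.

```python
import xml.etree.ElementTree as ET
import xml.parsers.expat

# Hypothetical documents: the same element, without and with a namespace declaration.
UNDECLARED = "<c:phone>555-0100</c:phone>"
DECLARED = '<c:phone xmlns:c="http://example.com/contacts">555-0100</c:phone>'


def element_names(doc: str) -> list:
    """Collect element names with namespace processing turned off."""
    names = []
    parser = xml.parsers.expat.ParserCreate()  # no namespace separator given
    parser.StartElementHandler = lambda name, attrs: names.append(name)
    parser.Parse(doc, True)
    return names


def namespace_aware_parse_ok(doc: str) -> bool:
    """ElementTree resolves prefixes, so an undeclared prefix is an error."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


print(element_names(UNDECLARED))             # the colon is simply part of the name
print(namespace_aware_parse_ok(UNDECLARED))  # rejected: prefix "c" is unbound
print(ET.fromstring(DECLARED).tag)           # prefix replaced by its namespace URI
```

This is why a colon in a name is safe only when you actually intend to use namespaces: any namespace-aware tool will try to resolve the part before the colon as a prefix.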
There are a number of other characters that you might be tempted to use in names, such as the % sign or &; however, these characters cannot be used in XML names. Also, names in XML may not contain spaces.
NOTE
Keep in mind that XML is designed to be an internationally compatible technology. So, just because a character isn't a letter in the English alphabet doesn't mean that you can't use it. Letters with diacritical marks, such as á, é, and ö, are valid in XML names, as are letters from other scripts. However, symbols such as the dollar sign or pound sign cannot be used in names.
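The naming rules above can be captured in a single regular expression. This is a minimal Python sketch, assuming the character ranges of the Name production in XML 1.0 (Fifth Edition); is_xml_name is a hypothetical helper name. Earlier editions of the Recommendation defined letters with narrower character tables, so an older parser may disagree with this check on some non-ASCII characters.

```python
import re

# Character ranges from the XML 1.0 (Fifth Edition) NameStartChar production.
NAME_START_CHARS = (
    ":A-Z_a-z"
    "\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF"
    "\u0370-\u037D\u037F-\u1FFF\u200C-\u200D"
    "\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF"
    "\uF900-\uFDCF\uFDF0-\uFFFD\U00010000-\U000EFFFF"
)

# NameChar adds the hyphen, period, digits, and a few combining characters.
NAME_CHARS = NAME_START_CHARS + "\\-.0-9\u00B7\u0300-\u036F\u203F-\u2040"

# One start character followed by any number of name characters.
NAME_RE = re.compile(f"[{NAME_START_CHARS}][{NAME_CHARS}]*")


def is_xml_name(name: str) -> bool:
    """Return True if name matches the XML Name production (hypothetical helper)."""
    return NAME_RE.fullmatch(name) is not None


for candidate in ["first.name", "first-name", "first_name", "señor", "2do", "first name", "50%"]:
    print(f"{candidate!r}: {is_xml_name(candidate)}")
```

Note that the check says nothing about the colon's namespace role or the reserved xml prefix; those are separate conventions layered on top of the basic name syntax.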
Additionally, XML names must begin with a letter (or an underscore or colon). XML names may contain digits, but they may not begin with one. So, any of the following would be valid XML names:
<contact4sales>
Digits are acceptable within an element name.
<home-phone>
Hyphens are fine after letters as well.
<señor>
Accents and diacritics are acceptable.
The following are not valid XML names:
<2do>
Names must not begin with a digit.
<-name>
Names cannot start with special characters such as a hyphen or period.
<x+2=4>
Names cannot contain special characters other than the period, hyphen, underscore, and colon.
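You can confirm these examples against a real parser. The following is a minimal sketch, assuming Python's standard-library expat binding: the hypothetical helper parses_as_element wraps each candidate name in an empty-element tag and reports whether the parser accepts the result as a document. Namespace processing is left off so that only the basic naming rules apply.

```python
import xml.parsers.expat


def parses_as_element(name: str) -> bool:
    """Try to parse <name/> with expat, without namespace processing.

    Hypothetical helper for experimenting with names; it does not escape
    its input, so pass it candidate names only, not arbitrary markup.
    """
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(f"<{name}/>", True)
        return True
    except xml.parsers.expat.ExpatError:
        return False


valid = ["contact4sales", "home-phone", "señor"]
invalid = ["2do", "-name", "x+2=4"]

for name in valid + invalid:
    verdict = "well-formed" if parses_as_element(name) else "rejected"
    print(f"<{name}/>: {verdict}")
```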
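In addition to these naming rules, names that begin with the letters X, M, and L in any combination of uppercase and lowercase (XML, xml, XmL, xMl, and so on) are reserved. The xml designation is set aside for features of XML defined in the current or future versions of the Recommendation, such as the xml:lang attribute. This is a reservation rather than a well-formedness rule, so a parser may not reject such a name, but only the W3C may define names that begin with xml, and you should not use them in your own documents.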
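Because a parser may not enforce this reservation, a lint-style check is one way to catch such names before they spread through a vocabulary. This is a minimal sketch; uses_reserved_prefix is a hypothetical helper, and the pattern simply spells out "X or x, then M or m, then L or l" at the start of the name.

```python
import re

# Matches X/x, then M/m, then L/l at the start of a name.
RESERVED_PREFIX = re.compile(r"[Xx][Mm][Ll]")


def uses_reserved_prefix(name: str) -> bool:
    """Return True if name begins with any case variant of "xml" (hypothetical helper)."""
    return RESERVED_PREFIX.match(name) is not None


for name in ["xmlData", "XML-record", "XmLfoo", "html", "myxml"]:
    print(name, uses_reserved_prefix(name))
```

Only the beginning of the name matters: re.match anchors at the start, so a name such as myxml, which merely contains the letters, is not flagged.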
CAUTION
XML is case sensitive: <Phone> and <phone> are two different elements, and an end tag must match its start tag exactly. However, there is no established convention for using uppercase, lowercase, or mixed case. You may encounter documents that contain names that are all uppercase, which increases readability by clearly delineating between markup and content. Other documents may use mixed case to correspond to existing data, such as programming language conventions. You should always make sure to check the XML vocabulary guidelines for the application you are using. For example, XHTML requires all element and attribute names to be lowercase.
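Case sensitivity is easy to observe with any conforming parser. This is a minimal sketch, assuming Python's xml.etree.ElementTree; the contact document, the phone numbers, and the helper is_well_formed are hypothetical. It shows that <Phone> and <phone> are kept as separate element types, that lookups match case exactly, and that a start tag and end tag differing only in case are rejected.

```python
import xml.etree.ElementTree as ET


def is_well_formed(doc: str) -> bool:
    """Return True if ElementTree can parse doc (hypothetical helper)."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


root = ET.fromstring(
    "<contact><Phone>555-0100</Phone><phone>555-0199</phone></contact>"
)
print([child.tag for child in root])  # both elements kept, with their original case
print(root.find("phone").text)        # finds only the lowercase <phone>
print(root.find("PHONE"))             # no element has exactly this name

# Start and end tags that differ only in case do not match.
print(is_well_formed("<Phone>555-0100</Phone>"))
print(is_well_formed("<Phone>555-0100</phone>"))
```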