Creating a Well-Formed XML Document
Let's take a look at the creation of a well-formed XML document. In this example, we will create a simple document for keeping track of appointments.
First, we could create this document using only elements. Here are the elements we're going to use in the document:
appointments: This element will serve as the root element of the document, containing all the other elements in the document.
event: The event element will contain the information about each individual appointment.
date: A child of the event element, this is the date of the event.
start-time: A child of the event element, this is the starting time for the appointment.
end-time: A child of the event element, this is the ending time for the appointment.
type: Also a child of event, the type of appointment, such as a meeting, doctor's appointment, and so on.
title: A child of event, a title for the appointment.
description: A child of event, the description of the appointment.
location: A child of event, the location of the appointment.
reminder: A child of event, this element is used to define a reminder (instant message or e-mail) for the event.
status: A child of reminder, the status of whether or not a reminder should be sent.
interval: A child of reminder, the interval of time before the event when a reminder should be sent.
method: A child of reminder, the method by which the reminder should be sent.
We can create the XML document using only these elements. All XML documents should begin with the XML declaration. The XML declaration takes the following form:
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
The declaration has three attributes:
version: The version attribute specifies the version of XML that the document conforms to. The version attribute is required whenever the declaration is present.
encoding: The encoding attribute specifies the character encoding of the document. The attribute is not required; if it is omitted and nothing else indicates the encoding (such as a byte order mark signaling UTF-16), the parser will assume "UTF-8," a Unicode encoding built from 8-bit units.
standalone: The standalone attribute is optional, and is used to indicate whether the document is self-contained (standalone="yes") or whether it depends on markup declarations stored outside the document, such as an external DTD (standalone="no"). In our example, we won't be using a DTD or XML Schema; our document will be self-contained, so the attribute value will be "yes."
The XML declaration is not required in order for the XML document to be considered well formed; however, there are very few instances when you should not use the XML declaration. Unless you have a specific reason not to (such as working with a document fragment), you should always use it.
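As a quick illustration of that point, the following minimal sketch parses the same one-element document with and without the declaration. It assumes Python's standard xml.etree.ElementTree module as the parser; the names WITH_DECL, WITHOUT_DECL, and root_tag are made up for this example.

```python
import xml.etree.ElementTree as ET

# Two versions of the same tiny document (hypothetical sample data).
WITH_DECL = b'<?xml version="1.0" encoding="UTF-8" standalone="yes" ?><appointments/>'
WITHOUT_DECL = b'<appointments/>'


def root_tag(document: bytes) -> str:
    """Parse the document and return the name of its root element.

    ET.fromstring raises ET.ParseError if the input is not well formed,
    so getting a return value at all means the parse succeeded.
    """
    return ET.fromstring(document).tag


# Both calls succeed: the declaration is optional for well-formedness.
print(root_tag(WITH_DECL))
print(root_tag(WITHOUT_DECL))
```

Both documents parse, which is why the declaration is a strong recommendation rather than a well-formedness rule.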
The appointments element is the root element of the document, and will contain the remaining elements. When we populate the document with the elements with the proper relationships, here is the result:
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<appointments>
  <event>
    <date>03-05-02</date>
    <start-time>09:00</start-time>
    <end-time>10:00</end-time>
    <type>Meeting</type>
    <title>Staff Meeting</title>
    <description>Weekly staff meeting.</description>
    <location>Conference Room</location>
    <reminder>
      <status>yes</status>
      <interval>1-day</interval>
      <method>e-mail</method>
    </reminder>
  </event>
</appointments>
This is a well-formed XML document, which does describe appointments adequately. However, you will notice that the document does not make any use of attributes, and there are a couple of places where it might be easier to use attributes to describe our data.
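One way to confirm that the element-only document is well formed is to hand it to a parser. The sketch below is only an illustration, again assuming Python's xml.etree.ElementTree; is_well_formed, ELEMENT_ONLY, child_names, and reminder_names are hypothetical names introduced here.

```python
import xml.etree.ElementTree as ET

# The element-only appointments document from the text.
ELEMENT_ONLY = b"""<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<appointments>
  <event>
    <date>03-05-02</date>
    <start-time>09:00</start-time>
    <end-time>10:00</end-time>
    <type>Meeting</type>
    <title>Staff Meeting</title>
    <description>Weekly staff meeting.</description>
    <location>Conference Room</location>
    <reminder>
      <status>yes</status>
      <interval>1-day</interval>
      <method>e-mail</method>
    </reminder>
  </event>
</appointments>"""


def is_well_formed(document: bytes) -> bool:
    """Return True if the parser accepts the document, False otherwise."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True


root = ET.fromstring(ELEMENT_ONLY)
event = root.find("event")

# Element names in document order: children of event, then of reminder.
child_names = [child.tag for child in event]
reminder_names = [child.tag for child in event.find("reminder")]

print(is_well_formed(ELEMENT_ONLY))
print(child_names)
print(reminder_names)

# Improperly nested tags are rejected by the parser.
print(is_well_formed(b"<event><date>03-05-02</event></date>"))
```

The last call shows the other side of the rule: swapping two end tags breaks the nesting, and the parser refuses the document.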
For example, we could easily describe some of the factual information regarding the event, such as the date and times, as attributes. If we start with the event element
<event>
  <date></date>
  <start-time></start-time>
  <end-time></end-time>
</event>
we can easily rework this structure to make use of attributes. The result is an event element with three attributes:
<event date="" start-time="" end-time="">
We can do something similar with the reminder element. We start with the element-only structure:
<reminder>
  <status></status>
  <interval></interval>
  <method></method>
</reminder>
And change the child elements into attributes as well:
<reminder status="" interval="" method=""/>
Although both structures work, the use of the attributes helps streamline the data and results in a cleaner-looking document:
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<appointments>
  <event date="" start-time="" end-time="">
    <type></type>
    <title></title>
    <description></description>
    <location></location>
    <reminder status="" interval="" method=""/>
  </event>
</appointments>
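The change from child elements to attributes can also be carried out mechanically. The following is a rough sketch, assuming Python's xml.etree.ElementTree; the helper children_to_attributes is a hypothetical name, and it only makes sense when each child occurs at most once and holds plain text.

```python
import xml.etree.ElementTree as ET


def children_to_attributes(element, names):
    """Move the text of the named child elements into attributes.

    Assumes each named child appears at most once and contains only text,
    because an attribute cannot repeat or hold nested elements.
    """
    for name in names:
        child = element.find(name)
        if child is not None:
            element.set(name, (child.text or "").strip())
            element.remove(child)
    return element


# Element-only reminder, as in the first version of the document.
reminder = ET.fromstring(
    "<reminder><status>yes</status><interval>1-day</interval>"
    "<method>e-mail</method></reminder>"
)

children_to_attributes(reminder, ["status", "interval", "method"])

# The reminder now carries its data as attributes and has no children.
print(ET.tostring(reminder, encoding="unicode"))
```

The same helper could be applied to event with the names date, start-time, and end-time, leaving type, title, description, location, and reminder as child elements.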
The content of these attributes is drawn from a limited set of data: for the date, we will keep a standard "MM-DD-YY" format, and the start and end times will be in 24-hour format. Later, when we discuss XML Schemas and datatypes, we could easily make these attributes date/time datatypes, which would mean that if the content of an attribute weren't properly formatted, the document would be invalid. Because here we are concerned only with well-formedness, that capability isn't as crucial.
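Until a schema enforces those formats, an application would have to check them itself. Here is a minimal sketch of such a check, assuming Python's datetime.strptime as the format test; check_event_formats and the two sample elements are hypothetical. Note that strptime is somewhat lenient (it accepts a single-digit month, for example), so this is not a substitute for real datatype validation.

```python
from datetime import datetime
import xml.etree.ElementTree as ET

# Conventions from the text: MM-DD-YY dates and 24-hour times.
FORMATS = {"date": "%m-%d-%y", "start-time": "%H:%M", "end-time": "%H:%M"}


def check_event_formats(event):
    """Return the names of attributes that do not match the conventions.

    A missing attribute is treated as an empty string and reported too.
    """
    problems = []
    for name, fmt in FORMATS.items():
        try:
            datetime.strptime(event.get(name, ""), fmt)
        except ValueError:
            problems.append(name)
    return problems


good = ET.fromstring('<event date="03-05-02" start-time="09:00" end-time="10:00"/>')
bad = ET.fromstring('<event date="2002-03-05" start-time="9am" end-time="10:00"/>')

print(check_event_formats(good))
print(check_event_formats(bad))
```

Both sample elements are well formed; only the application-level check tells them apart, which is the gap that validation is meant to close.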
However, data that is always of a limited type is often appropriate for attributes. For example, the "status" of the reminder is a boolean, either a yes or a no. Booleans are a perfect application for attributes because they explicitly modify the state of an element.
Again, there are no clear-cut rules for when to use attributes and when to use elements. You have to use a combination of consensus among the other developers you are working with and your own personal preferences. But as you can see from this example, there are often as many ways to structure an XML document as there are to write a sentence. Let your applications dictate how you structure your documents.
Now, if we take our document and populate it with some data, we get the results shown in Listing 3.1.
Listing 3.1 A Complete, Well-Formed XML Document with Data
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<appointments>
  <event date="03-05-02" start-time="09:00" end-time="10:00">
    <type>Meeting</type>
    <title>Staff Meeting</title>
    <description>Weekly staff meeting</description>
    <location>Conference Room</location>
    <reminder status="no"/>
  </event>
  <event date="03-06-02" start-time="14:00" end-time="15:00">
    <type>Interview</type>
    <title>Developer Interview</title>
    <description>Interview new developer candidate.</description>
    <location>Office</location>
    <reminder status="yes" interval="15-min" method="ICQ"/>
  </event>
  <event date="03-15-02" start-time="13:45" end-time="15:00">
    <type>Dentist</type>
    <title>Root Canal</title>
    <description>Root canal on lower left molar.</description>
    <location>Dr. Scrivello's Office</location>
    <reminder status="yes" interval="1-day" method="e-mail"/>
  </event>
</appointments>
This is a simple document, but it utilizes all the concepts outlined here. The element and attribute names meet the naming requirements, and the elements are properly nested, with matching start and end tags. The document contains a single root element, appointments, and there are no errors in the content that would cause parsing problems, such as an unescaped < symbol. Simple as it is, this document could be used by a calendar application to store information about appointments. That is the power of XML: Documents do not have to be overly complicated in order to be useful.
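To make the calendar scenario concrete, here is a rough sketch of how an application might read the appointments document. It assumes Python's xml.etree.ElementTree; load_events, is_well_formed, and the dictionary keys are hypothetical choices, not part of any particular calendar program.

```python
import xml.etree.ElementTree as ET

# The appointments document from Listing 3.1.
LISTING = b"""<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<appointments>
  <event date="03-05-02" start-time="09:00" end-time="10:00">
    <type>Meeting</type>
    <title>Staff Meeting</title>
    <description>Weekly staff meeting</description>
    <location>Conference Room</location>
    <reminder status="no"/>
  </event>
  <event date="03-06-02" start-time="14:00" end-time="15:00">
    <type>Interview</type>
    <title>Developer Interview</title>
    <description>Interview new developer candidate.</description>
    <location>Office</location>
    <reminder status="yes" interval="15-min" method="ICQ"/>
  </event>
  <event date="03-15-02" start-time="13:45" end-time="15:00">
    <type>Dentist</type>
    <title>Root Canal</title>
    <description>Root canal on lower left molar.</description>
    <location>Dr. Scrivello's Office</location>
    <reminder status="yes" interval="1-day" method="e-mail"/>
  </event>
</appointments>"""


def load_events(document):
    """Turn each event element into a plain dictionary."""
    events = []
    for event in ET.fromstring(document).iter("event"):
        reminder = event.find("reminder")
        events.append({
            "date": event.get("date"),
            "start": event.get("start-time"),
            "end": event.get("end-time"),
            "title": event.findtext("title"),
            "location": event.findtext("location"),
            # The yes/no status attribute maps naturally onto a boolean.
            "remind": reminder is not None and reminder.get("status") == "yes",
        })
    return events


def is_well_formed(document):
    """Return True if the parser accepts the document."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True


events = load_events(LISTING)
for item in events:
    print(item["date"], item["start"], item["title"], item["remind"])

# A raw < in character data breaks parsing; the escaped form is accepted.
print(is_well_formed(b"<description>Budget < forecast</description>"))
print(is_well_formed(b"<description>Budget &lt; forecast</description>"))
```

The last two lines illustrate the content rule mentioned above: the raw < makes the fragment unparseable, while the &lt; entity reference keeps it well formed.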