Converting the DTD to a Schema
With the DTD finished, we could associate it with the XML document and start writing valid XML. What we really want, though, is an XML Schema. So let's walk through converting the DTD into an XML Schema step by step.
Converting the DTD Step by Step
Because we are now building an XML Schema, the first order of business is to start with the XML declaration and the root <schema> element, which declares the XML Schema namespace as the default namespace:
<?xml version="1.0" encoding="UTF-8" ?>
<schema xmlns="http://www.w3.org/2001/XMLSchema">
  <!-- Schema content goes here -->
</schema>
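Because this Schema declares no target namespace, an instance document can point to it with the xsi:noNamespaceSchemaLocation attribute. The sketch below assumes the Schema is saved as itinerary.xsd, which is a hypothetical file name. Processors treat the location only as a hint, so how the Schema is actually found depends on the validator you use.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- xsi:noNamespaceSchemaLocation is used because the Schema has no targetNamespace -->
<itinerary xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:noNamespaceSchemaLocation="itinerary.xsd">
  <!-- ticket elements go here -->
</itinerary>
```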
NOTE
In the following section, we show the element and attribute declarations slightly out of context, without nesting each one inside the declaration of its parent element. In the final Schema, these declarations will be nested within one another, which would make the individual examples harder to read. We will bring them all together at the end of the section. In the meantime, keep in mind that they will later be nested.
Now that we have our schema element set up, we have to declare the root element for our XML document, <itinerary>. This is accomplished using the <element> tag and the name attribute:
<element name="itinerary"></element>
Declared this way, with no type, the element may contain any content at all. Because the <itinerary> element is going to contain other elements, we want to spell out which elements it may contain, so we add tags that define its content model. The result looks like this:
<element name="itinerary">
  <complexType>
    <sequence>
      <!-- the ticket element declaration will nest here -->
    </sequence>
  </complexType>
</element>
We've added two tags, <complexType> and <sequence>. The <complexType> tag establishes that <itinerary> has a complex type, meaning it can contain child elements or attributes. The <sequence> tag specifies that those child elements must appear in the order in which they are declared. We specify the children by nesting their declarations between the opening and closing <sequence> tags.
NOTE
This is the Russian Doll method first introduced in Chapter 2. Chapter 17, "XML Schema Best Practices," discusses Russian Doll in greater detail.
This is often the simplest way to author an XML Schema, because each element's content model is described in the same place as the element declaration. In a DTD, by contrast, a content model declaration is separate from the declarations of the elements it contains. Each DTD content model is exactly one level deep, the opposite of a Russian Doll, which describes every level of a content model in one place. There are other ways to structure a Schema, and we will discuss them in greater detail in Chapter 17.
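For comparison, here is a sketch of a flatter style that is closer to the DTD's one-level-deep content models. Each element is declared once at the top level of the Schema, and content models refer to those declarations with the ref attribute. Only <airline> is shown as a child of <ticket>; the others would follow the same pattern. Note that this sketch switches to an xs: prefix for the XML Schema namespace. With no targetNamespace, top-level declarations belong to no namespace, and because ref takes a qualified name, an unprefixed ref="airline" would be resolved against the default namespace, which in our Schema is the XML Schema namespace itself.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Top-level (global) declaration, written once -->
  <xs:element name="airline" type="xs:string"/>

  <xs:element name="ticket">
    <xs:complexType>
      <xs:sequence>
        <!-- Refers to the global declaration instead of nesting a new one -->
        <xs:element ref="airline"/>
        <!-- references to the other child elements would follow here -->
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <xs:element name="itinerary">
    <xs:complexType>
      <xs:sequence>
        <!-- Occurrence limits go on the reference, not on the global declaration -->
        <xs:element ref="ticket" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

One trade-off of this style is that every top-level element can serve as a document root, so a document consisting only of an <airline> element would also validate against this sketch.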
With <itinerary> declared, we can move on to the <ticket> element. Because it too contains other elements, its declaration follows the same structure, right down to the <complexType> and <sequence> tags:
<element name="ticket" maxOccurs="unbounded">
  <complexType>
    <sequence>
      <!-- the other element declarations will nest here -->
    </sequence>
  </complexType>
</element>
The only difference, aside from the name, is that the <ticket> declaration also uses the maxOccurs="unbounded" attribute, which places no upper limit on the number of <ticket> elements inside an <itinerary>. Because minOccurs defaults to 1, this declaration still requires at least one ticket, which corresponds to the + symbol in a DTD. To mirror the * symbol (zero or more), we would also add minOccurs="0". Now that the <ticket> element is defined, we can move on to the next element in our DTD.
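The correspondence between DTD occurrence indicators and the minOccurs and maxOccurs attributes can be summarized in a short fragment. The element names a through d are placeholders for illustration, and the fragment is meant to sit inside a <complexType> like the ones above.

```xml
<sequence>
  <!-- DTD a  : exactly one (minOccurs and maxOccurs both default to 1) -->
  <element name="a" type="string"/>
  <!-- DTD b? : zero or one -->
  <element name="b" type="string" minOccurs="0"/>
  <!-- DTD c+ : one or more -->
  <element name="c" type="string" maxOccurs="unbounded"/>
  <!-- DTD d* : zero or more -->
  <element name="d" type="string" minOccurs="0" maxOccurs="unbounded"/>
</sequence>
```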
The next element is <airline>, which contains only text and has no attributes, so it can be declared with a simple type:
<element name="airline" type="string"/>
Here, besides the name, we need only the type attribute, which specifies that the element's content is a string. We can declare the <flight>, <miles>, and <duration> elements the same way, since they have neither child elements nor attributes:
<element name="flight" type="string"/>
<element name="miles" type="string"/>
<element name="duration" type="string"/>
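An instance fragment that these four declarations would accept might look like the following. The values are invented for illustration, and the elements are shown outside their parent <ticket>, not necessarily in the order the final sequence will require. Because each element has the simple type string, it may hold any text, but it may not contain child elements or carry attributes.

```xml
<!-- Any text is acceptable content for a string-typed element -->
<airline>Example Air</airline>
<flight>1234</flight>
<miles>850</miles>
<duration>2:15</duration>

<!-- Invalid: an element with a simple type cannot carry attributes -->
<miles unit="mi">850</miles>
```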
With those elements out of the way, only three elements remain to be defined: <departing>, <arriving>, and <seating>. As you can imagine, the declarations for <departing> and <arriving> are nearly identical, except for the value of the name attribute. They both look like this:
<element name="departing">
  <complexType>
  </complexType>
</element>
This declaration is not yet complete, because these elements have attributes. Each attribute is declared with an <attribute> element nested inside the <complexType>. The <attribute> element's own name, type, and use attributes specify the attribute's name, its datatype, and whether it is required. For the <departing> and <arriving> elements, the attribute declarations look like this:
<attribute name="date" type="string" use="required"/>
<attribute name="time" type="string" use="required"/>
<attribute name="airport" type="string" use="required"/>
<attribute name="gate" type="string" use="optional"/>
Each attribute has a unique name, and all of them share the type string, since their values are text. Much like the #REQUIRED and #IMPLIED keywords in a DTD, the use attribute takes the value "required" or "optional" to specify whether the attribute must appear. If use is omitted, it defaults to "optional".
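To see the effect of use, consider these hypothetical <departing> elements checked against the declarations above. The attribute values are made up; because every attribute is typed as string, any text is accepted as a value.

```xml
<!-- Valid: all required attributes present, optional gate included -->
<departing date="June 15" time="1:30 PM" airport="JFK" gate="B22"/>

<!-- Valid: gate is optional and may be left out -->
<departing date="June 15" time="1:30 PM" airport="JFK"/>

<!-- Invalid: the required airport attribute is missing -->
<departing date="June 15" time="1:30 PM"/>
```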
Bringing the attribute and element declarations together gives us the following declaration for <arriving>. The declaration for <departing> is identical except for its name:
<element name="arriving">
  <complexType>
    <attribute name="date" type="string" use="required"/>
    <attribute name="time" type="string" use="required"/>
    <attribute name="airport" type="string" use="required"/>
    <attribute name="gate" type="string" use="optional"/>
  </complexType>
</element>
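Since <departing> and <arriving> share the same attributes, another option is to define those attributes once in a named complex type and give both elements that type. This is a sketch of that alternative, not the approach this section follows, and the type name flightPointType is hypothetical. The sketch uses an xs: prefix for the XML Schema namespace because, in a Schema without a targetNamespace, an unprefixed type="flightPointType" would be resolved against the default namespace, which in our Schema is the XML Schema namespace.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Named type holding the attributes shared by departing and arriving -->
  <xs:complexType name="flightPointType">
    <xs:attribute name="date" type="xs:string" use="required"/>
    <xs:attribute name="time" type="xs:string" use="required"/>
    <xs:attribute name="airport" type="xs:string" use="required"/>
    <xs:attribute name="gate" type="xs:string" use="optional"/>
  </xs:complexType>

  <!-- Both elements reuse the named type; it is unprefixed because it is in no namespace -->
  <xs:element name="departing" type="flightPointType"/>
  <xs:element name="arriving" type="flightPointType"/>
</xs:schema>
```

With this arrangement, a change to the shared attributes needs to be made in only one place.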
That leaves one element to declare, <seating>, which also has attributes. Following the examples above, we declare the element with the <element> tag and nest the <attribute> tags inside its <complexType>, which gives us:
<element name="seating">
  <complexType>
    <attribute name="class" type="string" use="required"/>
    <attribute name="seat" type="string" use="optional"/>
  </complexType>
</element>
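For example, the first two hypothetical <seating> elements below satisfy this declaration, since only class is required. Because the <complexType> declares attributes but no content model, the element must also be empty. The values are invented for illustration.

```xml
<!-- Valid: class plus the optional seat attribute -->
<seating class="coach" seat="14C"/>

<!-- Valid: class alone; seat may be omitted -->
<seating class="first"/>

<!-- Invalid: the declaration allows no text content -->
<seating class="coach">window</seating>
```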
That's it. If we wanted to, we could pull this all together now and we would have a finished XML Schema. But why don't we pause for a moment and see if there is anything we can do to make use of the expanded capabilities of XML Schema?
Extending the DTD
There is one simple change we can make that improves on our straight conversion of the DTD.
Remember the <arriving> and <departing> elements? Both have date and time attributes that store information about the flight. The XML Schema Recommendation defines both a date and a time datatype, which we can use in our Schema to constrain the values of those attributes.
The date and time datatypes use formats based on the ISO 8601 standard, so if we use them, our dates will need to be in the form
YYYY-MM-DD
and our times will be in the form
hh:mm:ss-hh:mm
where the second set of hours and minutes is the offset from Coordinated Universal Time (UTC). The offset may also be positive (+hh:mm), written as Z for UTC itself, or left off entirely. So, for example, to express 1:30 p.m. EST, we would use:
13:30:00-05:00
Although this restricts the format of our data, it also helps ensure that our data will work well with other applications, and it lets a validating parser reject dates and times that are not in the expected format. To update our Schema, we change the attribute declarations for date and time as follows:
<attribute name="date" type="date" use="required"/>
<attribute name="time" type="time" use="required"/>
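With these declarations in place, a validating parser checks the lexical form of the date and time values. The elements below are hypothetical, and their values are invented for illustration.

```xml
<!-- Valid: ISO 8601 date, and a time with a UTC offset (1:30 p.m. EST) -->
<departing date="2001-06-15" time="13:30:00-05:00" airport="JFK"/>

<!-- Valid: the time zone offset is optional in the time datatype -->
<arriving date="2001-06-15" time="16:45:00" airport="LAX"/>

<!-- Invalid: neither value matches the date or time format -->
<departing date="June 15, 2001" time="1:30 PM" airport="JFK"/>
```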
Now we're ready to bring it all together.