A Sample Conversion
Let's take a look at converting a DTD into an XML Schema. We'll start with an XML instance document that describes a basic airline ticket itinerary:
<?xml version="1.0" encoding="UTF-8" ?>
<itinerary>
  <ticket>
    <airline>American</airline>
    <flight>4090</flight>
    <departing date="10-Apr-02" time="6:45AM" airport="IND" gate="A20"/>
    <arriving date="10-Apr-02" time="7:52AM" airport="ORD" gate="B64"/>
    <miles>168</miles>
    <seating class="Coach" seat="D3"/>
    <duration>1hr 7mn</duration>
  </ticket>
</itinerary>
Here's the DTD that describes this document:
<!-- A Simple Airline Ticket Itinerary DTD -->
<!ELEMENT itinerary (ticket*)>
<!ELEMENT ticket (airline, flight, departing, arriving, miles, seating, duration)>
<!ELEMENT airline (#PCDATA)>
<!ELEMENT flight (#PCDATA)>
<!ELEMENT departing (#PCDATA)>
<!ATTLIST departing
  date    CDATA #REQUIRED
  time    CDATA #REQUIRED
  airport CDATA #REQUIRED
  gate    CDATA #IMPLIED>
<!ELEMENT arriving (#PCDATA)>
<!ATTLIST arriving
  date    CDATA #REQUIRED
  time    CDATA #REQUIRED
  airport CDATA #REQUIRED
  gate    CDATA #IMPLIED>
<!ELEMENT miles (#PCDATA)>
<!ELEMENT seating (#PCDATA)>
<!ATTLIST seating
  class CDATA #REQUIRED
  seat  CDATA #IMPLIED>
<!ELEMENT duration (#PCDATA)>
First, because XML Schemas are XML documents, we need to start off with the XML declaration, and then the schema element with the XML Schema namespace:
<?xml version="1.0" encoding="UTF-8" ?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Schema content -->
</xs:schema>
Because the <itinerary> element is going to contain other elements, we need to declare it with a complex type whose content model can hold those child elements:
<xs:element name="itinerary">
  <xs:complexType>
    <xs:sequence>
      <!-- Ticket element information -->
    </xs:sequence>
  </xs:complexType>
</xs:element>
<complexType> and <sequence> establish that the itinerary element is a complex type, and that it will contain a sequence of other elements, which we will then specify by nesting their declarations in the <sequence> tags. This nested method of creating element declarations is called the Russian Doll method, as mentioned in an earlier article in this series. This method is often the simplest way of converting a DTD, because it describes the content model simultaneously with the element declarations, much like a DTD.
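For contrast, the same itinerary-contains-ticket relationship can be written without nesting, by declaring each element at the top level and pointing to it with ref. The following is only a sketch of that alternative and is not used in the rest of this conversion. The xs:string type given to ticket is a placeholder assumption that keeps the fragment short.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Global declaration: ticket is declared once at the top level.
       Placeholder type (assumption); the real ticket has complex content. -->
  <xs:element name="ticket" type="xs:string"/>

  <!-- itinerary refers to the global ticket declaration instead of nesting it -->
  <xs:element name="itinerary">
    <xs:complexType>
      <xs:sequence>
        <!-- minOccurs="0" with maxOccurs="unbounded" mirrors ticket* in the DTD -->
        <xs:element ref="ticket" minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

</xs:schema>
```

The flat form resembles a DTD's one-declaration-per-element listing, but every globally declared element can also serve as a document root. The nested form keeps only itinerary global.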
With the itinerary element declared, now we can move on to declaring our ticket element, which is going to follow the same structure:
<xs:element name="ticket" minOccurs="0" maxOccurs="unbounded">
  <xs:complexType>
    <xs:sequence>
      <!-- Ticket content -->
    </xs:sequence>
  </xs:complexType>
</xs:element>
The only difference, aside from the name, is that the ticket declaration also uses the minOccurs and maxOccurs attributes. maxOccurs="unbounded" allows any number of ticket elements inside an itinerary, and minOccurs="0" allows none at all. Together they match the * symbol in the DTD. Because minOccurs defaults to 1, maxOccurs="unbounded" on its own would instead correspond to the + symbol.
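As a quick reference, the DTD occurrence indicators map onto minOccurs and maxOccurs roughly as sketched below. The booking, agent, note, and passenger names are hypothetical and exist only to show the four cases side by side. The equivalent DTD content model would be (agent, note?, ticket*, passenger+).

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical element names, used only to show the four cases -->
  <xs:element name="booking">
    <xs:complexType>
      <xs:sequence>
        <!-- DTD: agent (exactly one; minOccurs and maxOccurs both default to 1) -->
        <xs:element name="agent" type="xs:string"/>
        <!-- DTD: note? (zero or one) -->
        <xs:element name="note" type="xs:string" minOccurs="0"/>
        <!-- DTD: ticket* (zero or more) -->
        <xs:element name="ticket" type="xs:string"
                    minOccurs="0" maxOccurs="unbounded"/>
        <!-- DTD: passenger+ (one or more) -->
        <xs:element name="passenger" type="xs:string" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Unlike the DTD symbols, the two attributes also accept specific numbers, so a rule such as "between one and four tickets" can be stated directly with minOccurs="1" and maxOccurs="4".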
The next element is the <airline> element, which only has text as the content, and is therefore a simple type:
<xs:element name="airline" type="xs:string"/>
Here, we only make use of the type attribute to specify that the element is a string. In fact, we can define our flight, miles, and duration elements in the same way, since they also don't have any complex content or any attributes:
<xs:element name="flight" type="xs:string"/>
<xs:element name="miles" type="xs:string"/>
<xs:element name="duration" type="xs:string"/>
With those elements out of the way, only three remain to define: departing, arriving, and seating. The declarations for departing and arriving are nearly identical, differing only in the value of name. Here is arriving; departing follows the same pattern:
<xs:element name="arriving">
  <xs:complexType>
    <xs:attribute name="date" type="xs:string" use="required"/>
    <xs:attribute name="time" type="xs:string" use="required"/>
    <xs:attribute name="airport" type="xs:string" use="required"/>
    <xs:attribute name="gate" type="xs:string" use="optional"/>
  </xs:complexType>
</xs:element>
Each attribute is given a unique name, and all of them share the same type, xs:string, since they all contain text as their values. The use attribute plays the role of the #REQUIRED and #IMPLIED keywords in the DTD: use="required" makes an attribute mandatory, and use="optional" (the default) lets it be omitted.
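Since departing and arriving repeat the same four attributes, one possible variation is to define the attributes once in a named complex type and reference it from both elements. This is a sketch of that option rather than the approach taken in this article. The type name flightEndpointType is hypothetical, and ticket is trimmed to just the two elements to keep the fragment short.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Hypothetical named type holding the attributes both elements share -->
  <xs:complexType name="flightEndpointType">
    <xs:attribute name="date" type="xs:string" use="required"/>
    <xs:attribute name="time" type="xs:string" use="required"/>
    <xs:attribute name="airport" type="xs:string" use="required"/>
    <xs:attribute name="gate" type="xs:string" use="optional"/>
  </xs:complexType>

  <!-- Trimmed ticket: only the two elements that reuse the type are shown -->
  <xs:element name="ticket">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="departing" type="flightEndpointType"/>
        <xs:element name="arriving" type="flightEndpointType"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

</xs:schema>
```

The trade-off is that the schema is no longer purely nested, but a later change to the attribute list only has to be made in one place.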
Now we only have one remaining element to declare, seating, which also has attributes:
<xs:element name="seating">
  <xs:complexType>
    <xs:attribute name="class" type="xs:string" use="required"/>
    <xs:attribute name="seat" type="xs:string" use="optional"/>
  </xs:complexType>
</xs:element>
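DTDs can also restrict an attribute to a fixed list of values, as in class (Coach|Business|First). The sketch below shows how such a list could be expressed in a schema with an anonymous simple type. The DTD in this article uses CDATA for class, so the list of values here is an assumption made purely for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="seating">
    <xs:complexType>
      <!-- class limited to an enumerated list.
           The three values are assumptions, not taken from the DTD. -->
      <xs:attribute name="class" use="required">
        <xs:simpleType>
          <xs:restriction base="xs:string">
            <xs:enumeration value="Coach"/>
            <xs:enumeration value="Business"/>
            <xs:enumeration value="First"/>
          </xs:restriction>
        </xs:simpleType>
      </xs:attribute>
      <xs:attribute name="seat" type="xs:string" use="optional"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

A DTD default value would map to the default attribute on xs:attribute. A default cannot be combined with use="required", so an attribute that has one must stay optional.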
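If we were to pull together these elements now, we would have an XML Schema that closely reproduces the DTD. It is slightly stricter in one respect: departing, arriving, and seating are declared with attributes only, so they must be empty, whereas the DTD's (#PCDATA) content model would also allow text inside them. Because we're working with an XML Schema, we can also extend the base schema to add some new functionality.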
Extending the Schema
One simple improvement to the converted schema is available right away. Both the arriving and departing elements carry a date and a time attribute that store information about the flight. Since the XML Schema Recommendation defines both a date and a time datatype, we can use them in our schema to reflect the content of those attributes. These datatypes expect ISO 8601 style values such as 2002-04-10 and 06:45:00, so the instance document has to be updated to match:
<xs:attribute name="date" type="xs:date" use="required"/>
<xs:attribute name="time" type="xs:time" use="required"/>
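The same idea can be pushed a little further. As a sketch only, miles could be declared as a non-negative integer, and airport could be limited to three uppercase letters with a pattern facet. Both choices are assumptions about the data that go beyond what the DTD states, and they are not carried into the finished schema.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- miles as a whole number, zero or greater (assumption about the data) -->
  <xs:element name="miles" type="xs:nonNegativeInteger"/>

  <xs:element name="departing">
    <xs:complexType>
      <xs:attribute name="date" type="xs:date" use="required"/>
      <xs:attribute name="time" type="xs:time" use="required"/>
      <!-- airport limited to three uppercase letters (assumed code format) -->
      <xs:attribute name="airport" use="required">
        <xs:simpleType>
          <xs:restriction base="xs:string">
            <xs:pattern value="[A-Z]{3}"/>
          </xs:restriction>
        </xs:simpleType>
      </xs:attribute>
      <xs:attribute name="gate" type="xs:string" use="optional"/>
    </xs:complexType>
  </xs:element>

</xs:schema>
```

The duration element is a different case: the built-in xs:duration type expects values such as PT1H7M, so a value like 1hr 7mn would have to be rewritten before that type could be used.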
Now we're ready to bring it all together.
The Finished Schema
Bringing together all of the components yields the following XML Schema. The child elements of ticket appear in the same order as in the DTD, and ticket itself carries minOccurs="0" so that, like ticket* in the DTD, an itinerary with no tickets is still valid:
<?xml version="1.0" encoding="UTF-8" ?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:annotation>
    <xs:documentation>
      A Simple Airline Ticket Itinerary XML Schema
    </xs:documentation>
  </xs:annotation>
  <xs:element name="itinerary">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="ticket" minOccurs="0" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="airline" type="xs:string"/>
              <xs:element name="flight" type="xs:string"/>
              <xs:element name="departing">
                <xs:complexType>
                  <xs:attribute name="date" type="xs:date" use="required"/>
                  <xs:attribute name="time" type="xs:time" use="required"/>
                  <xs:attribute name="airport" type="xs:string" use="required"/>
                  <xs:attribute name="gate" type="xs:string" use="optional"/>
                </xs:complexType>
              </xs:element>
              <xs:element name="arriving">
                <xs:complexType>
                  <xs:attribute name="date" type="xs:date" use="required"/>
                  <xs:attribute name="time" type="xs:time" use="required"/>
                  <xs:attribute name="airport" type="xs:string" use="required"/>
                  <xs:attribute name="gate" type="xs:string" use="optional"/>
                </xs:complexType>
              </xs:element>
              <xs:element name="miles" type="xs:string"/>
              <xs:element name="seating">
                <xs:complexType>
                  <xs:attribute name="class" type="xs:string" use="required"/>
                  <xs:attribute name="seat" type="xs:string" use="optional"/>
                </xs:complexType>
              </xs:element>
              <xs:element name="duration" type="xs:string"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
We can now associate the XML Schema with the instance document using the XML Schema instance namespace and the noNamespaceSchemaLocation attribute. We use noNamespaceSchemaLocation because the schema does not declare a target namespace. The date and time attributes must also be rewritten in the formats that xs:date and xs:time require (YYYY-MM-DD and hh:mm:ss, with an optional time zone offset):
<?xml version="1.0" encoding="UTF-8" ?>
<itinerary xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:noNamespaceSchemaLocation="itinerary.xsd">
  <ticket>
    <airline>American</airline>
    <flight>4090</flight>
    <departing date="2002-04-10" time="06:45:00-05:00" airport="IND" gate="A20"/>
    <arriving date="2002-04-10" time="07:52:00-05:00" airport="ORD" gate="B64"/>
    <miles>168</miles>
    <seating class="Coach" seat="D3"/>
    <duration>1hr 7mn</duration>
  </ticket>
</itinerary>
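For comparison, here is a sketch of how the association would change if the schema did declare a target namespace. The namespace URI http://example.com/itinerary is a placeholder and does not come from this article. The schema would gain two attributes on its root element:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- http://example.com/itinerary is a placeholder namespace (assumption) -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://example.com/itinerary"
           elementFormDefault="qualified">
  <!-- Element declarations as in the finished schema -->
</xs:schema>
```

The instance document would then declare that namespace and use xsi:schemaLocation, which takes a namespace URI followed by the location of the schema:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<itinerary xmlns="http://example.com/itinerary"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:schemaLocation="http://example.com/itinerary itinerary.xsd">
  <!-- ticket elements as before -->
</itinerary>
```

Setting elementFormDefault="qualified" places the nested elements, such as ticket, in the target namespace as well, so the single default namespace declaration on itinerary covers the whole document.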
As the complexity of the DTD increases, obviously so does the time required to convert the DTD to an XML Schema. However, because XML Schemas have increased functionality built in, the resulting XML Schema can sometimes be simpler.