Creating a Document Type Definition (DTD)
Most programmers would rather be poked in the eye than read about how to create a grammar. The subject brings back memories of diagramming sentences and memorizing rules about present participles, gerunds, and tenses. This being said, the grammar validation facility defined in XML is your friend, not a return to seventh-grade torture.
If you think of a parser as a type of compiler, you will see its value. The Java compiler finds problems in your code that are normally simple to fix. This keeps you from having to find these problems yourself whenever an error in processing occurs, which is a big time-saver.
When another program sends you a file containing a ticket request, how do you know that it is a valid XML document? The parsers can tell whether it is well formed, but that doesn't guarantee very much. It would be nice if you could specify which elements are required, which are optional, which ones can appear more than once, and so on. Then, whenever you receive an XML document, the parser can "compile" it for you and tell you whether it passes. Better still, the sender can run it through the same "compiler" (the XML parser using the same DTD) before sending it to you, thereby guaranteeing that it passes this inspection. This still doesn't guarantee that everything in the file is okay, but it does catch another whole class of potential problems.
The process of creating a DTD is fairly simple, if a bit exacting. You create a grammar and store it in a file. When you receive an XML document, you parse it with a validating parser. The parser reports each validity problem it encounters as a SAXParseException, which your error handler can record or rethrow. Your code can then decide to reject the document in order to keep garbage data out of your system.
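The parse-and-reject process just described can be sketched in a few lines of JAXP. This is only an illustration under some assumptions: the class name ValidatingParseSketch and the validate method are hypothetical, the two-element customer grammar is a cut-down example, and the grammar is embedded in the document as an internal DTD subset (rather than stored in a separate file) so that the sketch runs on its own.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical class; the grammar below is a cut-down example for illustration.
class ValidatingParseSketch {

    // Assumption: the DTD is embedded as an internal subset so no files are needed.
    static final String DOCTYPE =
          "<!DOCTYPE customer [\n"
        + "  <!ELEMENT customer (lastName, firstName)>\n"
        + "  <!ELEMENT lastName (#PCDATA)>\n"
        + "  <!ELEMENT firstName (#PCDATA)>\n"
        + "]>\n";

    /** Returns the problems the validating parser reported; an empty list means valid. */
    static List<String> validate(String xml) {
        final List<String> problems = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // ask for DTD validation, not just well-formedness
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { /* ignored in this sketch */ }
                @Override public void error(SAXParseException e) { problems.add(e.getMessage()); }
                @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xml)));
        } catch (SAXException e) {
            problems.add(e.getMessage()); // the document was not even well formed
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
        return problems;
    }

    public static void main(String[] args) {
        String good = DOCTYPE
            + "<customer><lastName>Smith</lastName><firstName>Ann</firstName></customer>";
        String bad = DOCTYPE + "<customer><firstName>Ann</firstName></customer>";
        System.out.println("good: " + validate(good));
        System.out.println("bad:  " + validate(bad));
    }
}
```

The second document is well formed but omits lastName, so only a validating parser objects to it. Collecting the messages in a list, instead of throwing on the first one, lets the caller decide whether to reject the document.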
That being said, let's create a DTD for the ticket request. This example is simple and very short. Listing 3.2 shows the DTD for the ticketRequest element, which serves as a grammar that allows the parser to validate XML documents.
Listing 3.2 The ticketRequest DTD File
<!ELEMENT ticketRequest (customer, cruise)>
<!ELEMENT customer (lastName, firstName)>
<!ATTLIST customer custID NMTOKEN #REQUIRED>
<!ELEMENT lastName (#PCDATA)>
<!ELEMENT firstName (#PCDATA)>
<!ELEMENT cruise (destination, port, sailing, numberOfTickets, isCommissionable?)>
<!ATTLIST cruise cruiseID NMTOKEN #REQUIRED>
<!ELEMENT destination (#PCDATA)>
<!ELEMENT port (#PCDATA)>
<!ELEMENT sailing (#PCDATA)>
<!ELEMENT numberOfTickets (#PCDATA)>
<!ELEMENT isCommissionable EMPTY>
Let's look at this file line by line to get a better understanding of how to create DTDs.
The ticket request must contain exactly one customer followed by exactly one cruise. I could have placed a qualifier after either name if other options were allowed: customer+ means one or more, customer* means zero or more, and customer? means zero or one.
<!ELEMENT ticketRequest (customer, cruise)>
Likewise, a customer must have a lastName and a firstName:
<!ELEMENT customer ( lastName, firstName)>
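The customer element carries one attribute, and each lastName and firstName contains only PCDATA (parsed character data), also known as text:
<!ATTLIST customer custID NMTOKEN #REQUIRED>
<!ELEMENT lastName (#PCDATA)>
<!ELEMENT firstName (#PCDATA)>
The only attribute that the customer needs is a custID. Because it is declared #REQUIRED, a customer element without it is invalid. Its attribute type is NMTOKEN (name token), which is like CDATA but allows only letters, digits, periods, hyphens, underscores, and colons, with no embedded whitespace.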
The cruise element must contain exactly one destination, port, sailing, and numberOfTickets. It might contain a single isCommissionable, and it might not. It is valid with no tag for isCommissionable, but it will be invalid if there are two of them.
<!ELEMENT cruise (destination, port, sailing, numberOfTickets, isCommissionable?)>
The cruiseID is a required attribute of the cruise tag:
<!ATTLIST cruise cruiseID NMTOKEN #REQUIRED>
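The rules for the cruise element can be tried out with a small validating SAX parse. Treat the following as a sketch rather than the chapter's own code: the class CruiseRulesSketch, its helper methods, and the sample values (Hawaii, Honolulu, and so on) are made up for illustration, and the cruise declarations are embedded as an internal DTD subset so the example needs no external file.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class; only the DTD declarations come from the listing.
class CruiseRulesSketch {

    static final String DOCTYPE =
          "<!DOCTYPE cruise [\n"
        + "  <!ELEMENT cruise (destination, port, sailing, numberOfTickets, isCommissionable?)>\n"
        + "  <!ATTLIST cruise cruiseID NMTOKEN #REQUIRED>\n"
        + "  <!ELEMENT destination (#PCDATA)>\n"
        + "  <!ELEMENT port (#PCDATA)>\n"
        + "  <!ELEMENT sailing (#PCDATA)>\n"
        + "  <!ELEMENT numberOfTickets (#PCDATA)>\n"
        + "  <!ELEMENT isCommissionable EMPTY>\n"
        + "]>\n";

    /** Builds a cruise document with the given attribute text and flag count. */
    static String cruise(String attributes, int commissionableTags) {
        StringBuilder xml = new StringBuilder(DOCTYPE)
            .append("<cruise").append(attributes).append(">")
            .append("<destination>Hawaii</destination><port>Honolulu</port>")
            .append("<sailing>1/1/00</sailing><numberOfTickets>4</numberOfTickets>");
        for (int i = 0; i < commissionableTags; i++) {
            xml.append("<isCommissionable/>");
        }
        return xml.append("</cruise>").toString();
    }

    /** True when the document is well formed and valid against its DTD. */
    static boolean isValid(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true);
            SAXParser parser = factory.newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
                // Validity errors are not thrown by default, so rethrow them here.
                @Override public void error(SAXParseException e) throws SAXException { throw e; }
            });
            return true;
        } catch (SAXException e) {
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String id = " cruiseID=\"1001\"";
        System.out.println("no flag:          " + isValid(cruise(id, 0)));
        System.out.println("one flag:         " + isValid(cruise(id, 1)));
        System.out.println("two flags:        " + isValid(cruise(id, 2)));
        System.out.println("missing cruiseID: " + isValid(cruise("", 1)));
        System.out.println("space in ID:      " + isValid(cruise(" cruiseID=\"10 01\"", 0)));
    }
}
```

Under the DTD's rules, the first two documents are valid and the last three are not: two isCommissionable tags violate the ? qualifier, a missing cruiseID violates #REQUIRED, and a value with an embedded space is not a name token.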
The destination, port, sailing, and numberOfTickets will contain only text. Notice that this is true even for numbers.
<!ELEMENT destination (#PCDATA)>
<!ELEMENT port (#PCDATA)>
<!ELEMENT sailing (#PCDATA)>
<!ELEMENT numberOfTickets (#PCDATA)>
Finally, the isCommissionable tag is designated as EMPTY. This means that it can have no text or other elements in it. This makes sense for flag-type elements, whose presence alone carries the meaning.
<!ELEMENT isCommissionable EMPTY>
This is all that we need to specify our document in a DTD. You might be wondering how to specify that a value must be all letters and no digits, or all digits. Unfortunately, the DTD standard doesn't allow that level of specification: to a DTD, the content of an element such as numberOfTickets is simply text. A W3C working group called the XML Schema Working Group is working on expanding the DTD to the level of a schema, but that work has not been incorporated into JAXP as of this writing. Microsoft has also proposed an XML Schema.
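Until a schema facility is available, a check such as "all digits" has to live in your own code. The following is one minimal way to do it, assuming the document has already passed DTD validation. The class TicketCountCheckSketch and its method are hypothetical names, and the sample documents in main are made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper: an application-level check that a DTD cannot express.
class TicketCountCheckSketch {

    /** True when the first numberOfTickets element holds nothing but digits. */
    static boolean hasNumericTicketCount(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            Node node = doc.getElementsByTagName("numberOfTickets").item(0);
            // The DTD only promises text here, so test the characters ourselves.
            return node != null && node.getTextContent().trim().matches("[0-9]+");
        } catch (SAXException e) {
            throw new IllegalArgumentException("document is not well formed", e);
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String numeric = "<cruise><numberOfTickets>4</numberOfTickets></cruise>";
        String textual = "<cruise><numberOfTickets>four</numberOfTickets></cruise>";
        System.out.println(hasNumericTicketCount(numeric));
        System.out.println(hasNumericTicketCount(textual));
    }
}
```

Both sample documents would satisfy a (#PCDATA) declaration, which is exactly why the extra check is needed.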