External DTDs
Before the DTD gets any bigger, let's go ahead and move it out to its own document, as we did with the style sheets. First, we'll create the DTD file itself and call it products.dtd. We'll save it in the same directory as products.xml. As shown in Listing 3.9, we're keeping the content the same, but this file doesn't include the <!DOCTYPE> declaration that was in the products.xml file.
Listing 3.9 Creating an External DTD
<!ELEMENT products (vendor)+>

<!ELEMENT vendor (vendor_name, advertisement?, product*)>
<!ATTLIST vendor webvendor CDATA #REQUIRED>

<!ELEMENT vendor_name (#PCDATA)>

<!ELEMENT advertisement (ad_sentence)+>
<!ELEMENT ad_sentence (#PCDATA | b | i | p )*>
<!ELEMENT b (#PCDATA)>
<!ELEMENT i (#PCDATA)>
<!ELEMENT p (#PCDATA)>

<!ELEMENT product (product_id, short_desc, product_desc?, price+, inventory+, giveaway?)>

<!ELEMENT product_id (#PCDATA)>
<!ELEMENT short_desc (#PCDATA)>
<!ELEMENT product_desc (#PCDATA)>

<!ELEMENT price (#PCDATA)>
<!ATTLIST price pricetype (cost | sale | retail) 'retail'>

<!ELEMENT inventory (#PCDATA)>
<!ATTLIST inventory color CDATA #IMPLIED
                    location (showroom | warehouse) 'warehouse'>

<!ELEMENT giveaway (giveaway_item, giveaway_desc)>
<!ELEMENT giveaway_item (#PCDATA)>
<!ELEMENT giveaway_desc (#PCDATA)>
Now we need to link this DTD to the products.xml file. We will do this using the <!DOCTYPE> declaration, which remains in that file, as shown in Listing 3.10.
Listing 3.10 Linking to an External DTD
<?xml version="1.0"?>
<!DOCTYPE products SYSTEM "products.dtd">

<products>
<vendor webvendor="full">
    <vendor_name>Conners Chair Company</vendor_name>
...
The only change is that the <!DOCTYPE> declaration now points to the external file instead of containing the declarations itself. Because "products.dtd" is given as a relative location, the file must be in the same directory as products.xml.
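The location after SYSTEM is resolved relative to the XML document, so other layouts are possible. The alternatives below are only a sketch: the dtds/ directory and the www.example.com URL are hypothetical, and a document may contain only one <!DOCTYPE> declaration, so we would pick one of them.

```xml
<!-- Same directory as products.xml (what we use in this chapter). -->
<!DOCTYPE products SYSTEM "products.dtd">

<!-- A subdirectory next to products.xml (hypothetical path). -->
<!DOCTYPE products SYSTEM "dtds/products.dtd">

<!-- An absolute URL (hypothetical host). -->
<!DOCTYPE products SYSTEM "http://www.example.com/dtds/products.dtd">
```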
Just to be sure that we don't have any glitches or typos, go ahead and validate the document once more.
Let's take a look at this for a moment. In our case, we want to point to a specific file on our machine, so we use the keyword SYSTEM before we give it a location. There is, however, another use for the DOCTYPE declaration.
As mentioned earlier, one reason to learn the DTD syntax is that existing XML vocabularies have been written in it. For example, if the furniture industry decided on a particular DTD that would be used to describe its products, we might use a declaration like this:
<!DOCTYPE products PUBLIC "-//Furniture, Inc.//Furniture Catalog//EN" "http://www.nicholaschase.com/dtds/furniture.dtd">
In this case, the keyword PUBLIC alerts the processor that it should check the public identifier, "-//Furniture, Inc.//Furniture Catalog//EN", against its list of local DTD copies. If it's not found, it should go to http://www.nicholaschase.com/dtds/furniture.dtd and retrieve the DTD information.
Let's take a moment to examine the public identifier:
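How a processor keeps its list of local DTD copies depends on the tool. One common mechanism is an OASIS XML Catalog file. The sketch below assumes a catalog-aware parser, and the local path dtds/furniture.dtd is hypothetical:

```xml
<?xml version="1.0"?>
<!-- Hypothetical catalog file; the uri value is an assumed local path. -->
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <!-- Map the public identifier to a local copy of the DTD. -->
  <public publicId="-//Furniture, Inc.//Furniture Catalog//EN"
          uri="dtds/furniture.dtd"/>
</catalog>
```

With an entry like this, a parser that consults the catalog can use the local copy and need not fetch the DTD over the network.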
-//Furniture, Inc.//Furniture Catalog//EN
The first item, -, indicates that this DTD is not registered with the ISO. If it were, this item would be a +. The second item, Furniture, Inc., is the owner of the DTD. The third, Furniture Catalog, is a human-readable description of the DTD. Finally, the last item, EN, indicates the language of the DTD.
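The same four-part pattern appears in widely used vocabularies. For comparison only (it is not part of our furniture example), here is the XHTML 1.0 Strict declaration, read in the terms used above:

```xml
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<!-- Reading the public identifier left to right:
     "-"                    the owner is not registered
     "W3C"                  the owner of the DTD
     "DTD XHTML 1.0 Strict" the description of the DTD
     "EN"                   the language of the DTD -->
```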
Adding the Rest of Our Vendors
Now that we think we've got the DTD pretty well finalized, we can put the rest of the data back into products.xml.
Now we should be able to parse the document without a problem. When we do parse it, however, we see quite a few problems, which are shown in Listing 3.11.
Listing 3.11 Parsing Errors for products.xml
0: [Error] products.xml:81:13: Element type "suite" must be declared.
1: [Error] products.xml:84:21: Element type "long_desc" must be declared.
2: [Error] products.xml:125:16: The content of element type "product" must match "(product_id,short_desc,product_desc?,price+,inventory+,giveaway?)".
3: [Error] products.xml:129:56: Attribute "color" must be declared for element type "price".
4: [Error] products.xml:131:56: Attribute "color" must be declared for element type "price".
5: [Error] products.xml:143:15: The content of element type "product" must match "(product_id,short_desc,product_desc?,price+,inventory+,giveaway?)".
6: [Error] products.xml:145:11: The content of element type "vendor" must match "(vendor_name,advertisement?,product*)".
7: [Error] products.xml:178:16: The content of element type "product" must match "(product_id,short_desc,product_desc?,price+,inventory+,giveaway?)".
8: [Error] products.xml:183:37: Attribute "pricetype" with value "starting" must have a value from the list "(cost|sale|retail)".
9: [Error] products.xml:186:15: Element type "item" must be declared.
10:[Error] products.xml:192:15: Element type "item" must be declared.
11:[Error] products.xml:206:16: The content of element type "product" must match "(product_id,short_desc,product_desc?,price+,inventory+,giveaway?)".
12:[Error] products.xml:220:16: The content of element type "product" must match "(product_id,short_desc,product_desc?,price+,inventory+,giveaway?)".
13:[Error] products.xml:222:35: Element type "special" must be declared.
14:[Error] products.xml:222:35: Attribute "specialtype" must be declared for element type "special".
15:[Error] products.xml:227:11: The content of element type "vendor" must match "(vendor_name,advertisement?,product*)".
16:data/products.xml: 3740 ms (127 elems, 75 attrs, 1125 spaces, 2534 chars)
17:data/products.xml: 2920 ms (97 elems, 73 attrs, 353 spaces, 1641 chars)
This really isn't as bad as it looks. Although there are technically 16 errors, many of them are duplicates, in that the same problem causes several errors.
But wait a minute, if we took all this time to get the DTD right, why are we getting so many errors?
Because it's the nature of the business. This is the reason that a DTD must be tested against as much of the intended data as possible. Sometimes we'll change the data to match the DTD; sometimes we'll change the DTD to match the data. We need to do this now, during the design phase, so we don't find ourselves in a position later where we can't record the data we want because the structure won't allow it!
Let's take these errors one at a time. Lines 0 and 1 are easy; we never defined suite, or its long_desc child, in the DTD. We'll allow suite inside vendor and add the two missing declarations:
<!ELEMENT vendor (vendor_name, advertisement?, suite*, product*)>
<!ELEMENT suite (product_id, short_desc, long_desc, price+, product*)>
<!ELEMENT long_desc (#PCDATA)>
Because the content model for vendor now allows suite, this also fixes line 6.
The error on line 2 tells us that we have a product that's not conforming to its content model, or definition. Fortunately, the error message tells us approximately where the problem is. In this case, it's somewhere around line 125, column 16. This works out to be the following product:
<product>
    <short_desc>Hall Bench</short_desc>
    <price pricetype="cost">$75</price>
    <price pricetype="sale">$62</price>
    <price pricetype="retail">$120</price>
    <inventory location="warehouse">143</inventory>
    <inventory location="showroom">5</inventory>
</product>
Looking at the content model for product,
<!ELEMENT product (product_id, short_desc, product_desc?, price+, inventory+, giveaway?)>
we see that this product is missing the mandatory product_id. In this case, the DTD did exactly what it was intended to: it enforced consistency in the data. We could alter the DTD to make the product_id optional, but that's probably not a good idea, so we'll go ahead and add a product_id for this product and any others that are missing one.
Lines 3 and 4 are both referring to the same problem. On lines 129 and 131 of products.xml, we've introduced color to the price element. Here we have a few choices:
- We can change the data so that each color item is its own product and then record the color in its own element. This won't take too much trouble, although we'll have to make sure to maintain both products.
- Because the sale and retail prices are the same for both colors, we can change the DTD to add a new element, cost, and change the pricing structure. This seems a bit of overkill, however, even in a small file like this.
- We can change the DTD to allow for a color attribute. This seems like the best solution because it also allows for different color items to be priced differently, if necessary. This is what we'll do:
<!ATTLIST price pricetype (cost | sale | retail) 'retail' color CDATA #IMPLIED>
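Because color is declared #IMPLIED, it can be given on some price elements and omitted on others. The fragment below is taken from the Sofa and Love Seat product in the final products.xml; the comments are ours:

```xml
<!-- Cost differs by color, so each cost price names its color. -->
<price color="magnolia print" pricetype="cost">$125</price>
<price color="nautical print" pricetype="cost">$145</price>
<!-- Sale and retail prices are the same for both colors, so color is omitted. -->
<price pricetype="sale">$175</price>
<price pricetype="retail">$250</price>
```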
At first, line 7 doesn't seem to make any sense. The product in question is
<product>
    <product_id>3253435</product_id>
    <short_desc>Sleepeazy Mattresses</short_desc>
    <price pricetype="cost">$162</price>
    <price pricetype="retail">$300</price>
    <product_desc>per set, any size</product_desc>
    <giveaway>
        <giveaway_item>
        Free pillows
        </giveaway_item>
        <giveaway_desc>
        with every set
        </giveaway_desc>
    </giveaway>
    <inventory location="showroom">23</inventory>
    <inventory location="warehouse">15</inventory>
</product>
which seems to match the content model just fine. Or does it? Remember, order matters: here product_desc follows the price elements, and giveaway comes before inventory. To fix this error, we'll reorder the data.
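Reordered to follow the content model, the product looks like this. It matches the entry in the final products.xml in Listing 3.13; the comments are added here for illustration:

```xml
<product>
    <product_id>3253435</product_id>
    <short_desc>Sleepeazy Mattresses</short_desc>
    <!-- product_desc now comes before the price elements -->
    <product_desc>per set, any size</product_desc>
    <price pricetype="cost">$162</price>
    <price pricetype="retail">$300</price>
    <!-- inventory now comes before giveaway -->
    <inventory location="showroom">23</inventory>
    <inventory location="warehouse">15</inventory>
    <giveaway>
        <giveaway_item>
        Free pillows
        </giveaway_item>
        <giveaway_desc>
        with every set
        </giveaway_desc>
    </giveaway>
</product>
```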
Lines 8 through 11 are a little tougher and will probably involve changing the DTD. The product is
<product>
    <product_id>5622345</product_id>
    <short_desc>CozyComfort Mattresses</short_desc>
    <price pricetype="starting">
    starting at only $99.99
    </price>
    <item>
        <product_desc>Queen</product_desc>
        <price pricetype="cost">$59.00</price>
        <price pricetype="sale">$69.00</price>
        <price pricetype="retail">$99.00</price>
    </item>
    <item>
        <product_desc>King</product_desc>
        <price pricetype="cost">$159.00</price>
        <price pricetype="sale">$209.00</price>
        <price pricetype="retail">$359.00</price>
    </item>
    <giveaway>
        <giveaway_item>
        Free sheets
        </giveaway_item>
        <giveaway_desc>
        with every set
        </giveaway_desc>
    </giveaway>
</product>
For the starting price, we need to make a decision: Do we add this as a possible attribute value, or change the structure to allow for an infinite number of different pricing options?
If it looked like we'd have to deal with more of these, changing the structure would be the way to go. This is probably the last one, though, so let's just add it to the definition of price.
<!ELEMENT price (#PCDATA)>
<!ATTLIST price pricetype (cost | sale | retail | starting) 'retail'
                color CDATA #IMPLIED>
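For comparison, the road not taken might look something like the sketch below. It is only one assumption about what a more open pricing structure could mean, and it is not used in products.dtd: declaring pricetype as CDATA lets any label through, at the cost of the validator no longer catching a mistyped value.

```xml
<!-- Alternative sketch (not used in products.dtd): any pricetype string is accepted. -->
<!ELEMENT price (#PCDATA)>
<!ATTLIST price pricetype CDATA 'retail'
                color     CDATA #IMPLIED>
```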
This brings us to item. Unless we want to make significant changes to the data file (which we don't), we're going to need to change the DTD to accommodate it.
<!ELEMENT product (product_id, short_desc, product_desc?, price+, item*, inventory+, giveaway?)>
<!ELEMENT item (product_desc, price+)>
Finally, to conform to the content model for product, we'll have to add an inventory element. In doing this, we don't have any information, so the natural choice would be to leave the element completely blank, as in
<inventory />
This, however, makes no sense, and doesn't give us much to go on from a programming standpoint. Instead, we can make the inventory element optional, and use its absence as an indication that there's something wrong.
<!ELEMENT product (product_id, short_desc, product_desc?, price+, item*, inventory*, giveaway?)>
This is another advantage of checking against as much data as possible. We're likely to find problems that aren't directly causing us errors yet.
We have a similar issue on line 12, where a product doesn't have a price. In this case, however, we don't want to make the price optional, so we need to add one. Because we don't have any information, we'll use the data itself to indicate that there's something wrong:
<product>
    <product_id>39981234</product_id>
    <short_desc>Floataway Waterbeds</short_desc>
    <price pricetype="cost">TBD</price>
    <giveaway>
        <giveaway_item>
        15 different styles to choose from
        </giveaway_item>
        <giveaway_desc>
        with free delivery -- we'll take your
        old mattress as a trade in!
        </giveaway_desc>
    </giveaway>
</product>
Finally, lines 13 through 15 refer to the fact that we never defined special. We can take care of that easily:
<!ELEMENT vendor (vendor_name, advertisement?, suite*, product*, special?)>
<!ELEMENT special (#PCDATA)>
<!ATTLIST special specialtype CDATA #FIXED 'weekly'>
Right now we can accommodate only weekly specials, so we'll require that all specials be weekly.
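A #FIXED attribute behaves like a default that can't be overridden. Under the declaration above, a validating parser should treat the three separate fragments below as described in the comments; the element text is made up for illustration:

```xml
<!-- Valid: the value matches the fixed value. -->
<special specialtype="weekly">Hypothetical special text</special>

<!-- Valid: the attribute is omitted, so the parser supplies specialtype="weekly". -->
<special>Hypothetical special text</special>

<!-- Invalid: the value differs from the fixed value. -->
<special specialtype="monthly">Hypothetical special text</special>
```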
We run the validation one more time and find that everything comes out clean. The final DTD and XML files are shown in Listings 3.12 and 3.13, respectively.
Listing 3.12 products.dtd: The Complete DTD
<!ELEMENT products (vendor)+>

<!ELEMENT vendor (vendor_name, advertisement?, suite*, product*, special?)>
<!ATTLIST vendor webvendor CDATA #REQUIRED>

<!ELEMENT special (#PCDATA)>
<!ATTLIST special specialtype CDATA #FIXED 'weekly'>

<!ELEMENT vendor_name (#PCDATA)>
<!ELEMENT advertisement (ad_sentence)+>
<!ELEMENT ad_sentence (#PCDATA | b | i | p )*>
<!ELEMENT b (#PCDATA)>
<!ELEMENT i (#PCDATA)>
<!ELEMENT p (#PCDATA)>

<!ELEMENT product (product_id, short_desc, product_desc?, price+, item*, inventory*, giveaway?)>

<!ELEMENT product_id (#PCDATA)>
<!ELEMENT short_desc (#PCDATA)>
<!ELEMENT product_desc (#PCDATA)>

<!ELEMENT price (#PCDATA)>
<!ATTLIST price pricetype (cost | sale | retail | starting) 'retail'
                color CDATA #IMPLIED>

<!ELEMENT item (product_desc, price+)>

<!ELEMENT inventory (#PCDATA)>
<!ATTLIST inventory color CDATA #IMPLIED
                    location (showroom | warehouse) 'warehouse'>

<!ELEMENT giveaway (giveaway_item, giveaway_desc)>
<!ELEMENT giveaway_item (#PCDATA)>
<!ELEMENT giveaway_desc (#PCDATA)>


<!ELEMENT suite (product_id, short_desc, long_desc, price+, product*)>
<!ELEMENT long_desc (#PCDATA)>
Listing 3.13 products.xml: The Complete, Updated XML File
<?xml version="1.0"?>
<!DOCTYPE products SYSTEM "products.dtd">

<products>

<vendor webvendor="full">
    <vendor_name>Conners Chair Company</vendor_name>
    <advertisement>
        <ad_sentence>
        Conners Chair Company presents their annual big three
        day only chair sale. We're making way for our new
        stock! <b>All current inventory must go!</b> Regular prices
        slashed by up to 60%!
        </ad_sentence>
    </advertisement>

    <product>
        <product_id>QA3452</product_id>
        <short_desc>Queen Anne Chair</short_desc>
        <price pricetype="cost">$85</price>
        <price pricetype="sale">$125</price>
        <price pricetype="retail">$195</price>
        <inventory color="royal blue" location="warehouse">
            12</inventory>
        <inventory color="royal blue" location="showroom">
            5</inventory>
        <inventory color="flower print" location="warehouse">
            16</inventory>
        <inventory color="flower print" location="showroom">
            3</inventory>
        <inventory color="seafoam green" location="warehouse">
            20</inventory>
        <inventory color="teal" location="warehouse">
            14</inventory>
        <inventory color="burgundy" location="warehouse">
            34</inventory>
        <giveaway>
            <giveaway_item>
            Matching Ottoman included
            </giveaway_item>
            <giveaway_desc>
            while supplies last
            </giveaway_desc>
        </giveaway>
    </product>

    <product>
        <product_id>RC2342</product_id>
        <short_desc>Early American Rocking Chair</short_desc>
        <product_desc>
            with brown and tan plaid upholstery
        </product_desc>
        <price pricetype="cost">$75</price>
        <price pricetype="sale">$62</price>
        <price pricetype="retail">$120</price>
        <inventory location="warehouse">40</inventory>
        <inventory location="showroom">2</inventory>
    </product>

    <product>
        <product_id>BR3452</product_id>
        <short_desc>Bentwood Rocker</short_desc>
        <price pricetype="cost">$125</price>
        <price pricetype="sale">$160</price>
        <price pricetype="retail">$210</price>
        <inventory location="showroom">3</inventory>
    </product>

</vendor>

<vendor webvendor="partial">
    <vendor_name>
        Wally's Wonderful World of Furniture
    </vendor_name>
    <advertisement>
        <ad_sentence>
        Wally's Wonderful World of Furniture is closing its
        doors forever. Last chance to get great bargains.
        Make us an offer. We can't refuse!
        </ad_sentence>
    </advertisement>

    <suite>
        <product_id>CDRS</product_id>
        <short_desc>Complete Dining Room Set</short_desc>
        <long_desc>
            This five piece dining site set features swivel
            chairs with cushions in five exciting colors.
        </long_desc>
        <price pricetype="cost">$435</price>
        <price pricetype="sale">$699</price>
        <price pricetype="retail">$999</price>

        <product>
            <product_id>WWWdrt</product_id>
            <short_desc>Dining Room Table</short_desc>
            <price pricetype="cost">$105</price>
            <price pricetype="sale">$145</price>
            <price pricetype="retail">$195</price>
            <inventory location="warehouse">132</inventory>
        </product>
        <product>
            <product_id>WWWsc</product_id>
            <short_desc>Swivel Chair</short_desc>
            <price pricetype="cost">$50</price>
            <price pricetype="sale">$45</price>
            <price pricetype="retail">$99</price>
            <inventory location="warehouse">300</inventory>
        </product>
        <product>
            <product_id>WWWhch</product_id>
            <short_desc>Hutch</short_desc>
            <price pricetype="cost">$346</price>
            <price pricetype="sale">$425</price>
            <price pricetype="retail">$600</price>
            <inventory location="warehouse">232</inventory>
        </product>
    </suite>

    <product>
        <product_id>HallBench</product_id>
        <short_desc>Hall Bench</short_desc>
        <price pricetype="cost">$75</price>
        <price pricetype="sale">$62</price>
        <price pricetype="retail">$120</price>
        <inventory location="warehouse">143</inventory>
        <inventory location="showroom">5</inventory>
    </product>

    <product>
        <product_id>SofaLoveSeat</product_id>
        <short_desc>Sofa and Love Seat</short_desc>
        <price color="magnolia print" pricetype="cost">
            $125</price>
        <price color="nautical print" pricetype="cost">
            $145</price>
        <price pricetype="sale">$175</price>
        <price pricetype="retail">$250</price>
        <inventory color="magnolia print" location="showroom">
            3</inventory>
        <inventory color="magnolia print" location="warehouse">
            36</inventory>
        <inventory color="nautical print" location="warehouse">
            1</inventory>
        <inventory color="nautical print" location="showroom">
            432</inventory>
    </product>

</vendor>

<vendor webvendor="no">
    <vendor_name>Crazy Marge's Bed Emporium</vendor_name>
    <advertisement>
        <ad_sentence>
        We never have a sale because we've got the lowest
        prices in town! Come in today and shop around. If
        you can find lower prices anywhere Crazy Marge will
        shave her husband's head!!!
        </ad_sentence>
        <ad_sentence>
        We have all kinds, all sizes. Don't see what you
        want? Don't worry. We customize orders!
        </ad_sentence>
    </advertisement>

    <product>
        <product_id>3253435</product_id>
        <short_desc>Sleepeazy Mattresses</short_desc>
        <product_desc>per set, any size</product_desc>
        <price pricetype="cost">$162</price>
        <price pricetype="retail">$300</price>
        <inventory location="showroom">23</inventory>
        <inventory location="warehouse">15</inventory>
        <giveaway>
            <giveaway_item>
            Free pillows
            </giveaway_item>
            <giveaway_desc>
            with every set
            </giveaway_desc>
        </giveaway>
    </product>

    <product>
        <product_id>5622345</product_id>
        <short_desc>CozyComfort Mattresses</short_desc>
        <price pricetype="starting">
            starting at only $99.99
        </price>
        <item>
            <product_desc>Queen</product_desc>
            <price pricetype="cost">$59.00</price>
            <price pricetype="sale">$69.00</price>
            <price pricetype="retail">$99.00</price>
        </item>
        <item>
            <product_desc>King</product_desc>
            <price pricetype="cost">$159.00</price>
            <price pricetype="sale">$209.00</price>
            <price pricetype="retail">$359.00</price>
        </item>
        <giveaway>
            <giveaway_item>
            Free sheets
            </giveaway_item>
            <giveaway_desc>
            with every set
            </giveaway_desc>
        </giveaway>
    </product>

    <product>
        <product_id>39981234</product_id>
        <short_desc>Floataway Waterbeds</short_desc>
        <price pricetype="cost">TBD</price>
        <giveaway>
            <giveaway_item>
            15 different styles to choose from
            </giveaway_item>
            <giveaway_desc>
            with free delivery -- we'll take your
            old mattress as a trade in!
            </giveaway_desc>
        </giveaway>
    </product>

    <special specialtype="weekly">
        This week only: Round beds with rotating motors
        starting at a price that will make your head spin.
        Just talk to Crazy Marge, she'll tell you all about it!
    </special>
</vendor>

</products>