- Dissecting an XML Document Type Definition
- Using Document Type Definition Notation and Syntax
- Understanding Literals
- Declaring a NOTATION
- Creating ATTLIST Declarations
- Using Special XML Datatype Constructions
- Understanding the Difference Between Well-Formed and Valid XML
- Learning How to Use External DTDs and DTD Fragments
- Altering an XML DTD
- Getting Down to Cases
Using Special XML Datatype Constructions
Although the constructions in the following subsections appear in the document rather than in the DTD, the functions they serve are closely related to concepts already discussed in this chapter.
Processing instructions allow the document author to include text in the document that shouldn't be parsed or displayed by the XML parser or user agent. Instead, the text is passed to a helper application identified in the processing instruction itself. Processing instructions cover what you might use a <script> element for in HTML but are much more general. Note especially that they don't rely on SGML comment tags to hide text. Hiding text in comments is not advisable in XML because XML parsers are not required to pass the text of comments on to the user agent.
CDATA sections are a shorthand way of escaping an entire block of text, so any included markup characters are treated as text instead of markup. Like a CDATA content description in the DTD, they tell the parser that it shouldn't attempt to parse any of the text contained within the section.
Processing Instructions
A processing instruction is passed on directly to the application with no attempt to evaluate it. It begins with <? followed by a target name and ends with ?>. The target name cannot be the string xml in any case combination, because that name is reserved.
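As a rough illustration of this pass-through behavior, the following Python sketch uses the standard library's xml.dom.minidom parser. The document string, the javascript target, and the script text are all made up for the example. The parser reports the target and the data as plain strings, leaving the < and & characters in the data alone, and it rejects a target spelled xml in mixed case.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Hypothetical document: the target name and the script text are illustrative only.
SOURCE = "<doc><?javascript if (a < b && c) run();?></doc>"

pi = minidom.parseString(SOURCE).documentElement.firstChild

is_pi = pi.nodeType == pi.PROCESSING_INSTRUCTION_NODE
target = pi.target  # the name that identifies the helper application
data = pi.data      # everything after the target, handed over unparsed

print(is_pi, target, data)


def parses(source: str) -> bool:
    """Return True if the parser accepts the document, False if it rejects it."""
    try:
        minidom.parseString(source)
        return True
    except ExpatError:
        return False


# A target spelled "xml" in any mix of cases is reserved and is rejected.
reserved_ok = parses("<doc><?XmL data?></doc>")
print(reserved_ok)
```

What happens to the target and data after this point is up to the application; the parser itself does nothing more than hand them over.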
TIP
Many people have longed for a mechanism to include binary data directly in their XML file. No more waiting around for external references to load, no more wondering if the data is still where you left it last. The whole file is one big glob of data. In theory, this is a nice idea, but guaranteeing that the closing strings don't appear in binary data is a difficult problem, and is probably too much to ask an XML processor to handle.
Using a processing instruction, you can pass data directly to a particular interpreter:
<?javascript {javascript stuff} ?>
<?Tcl {Tcl stuff} ?>
...
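The advantage of this is that you can freely pass things along to an interpreter or program runtime without going through the process of adding a notation to your DTD. Of course, your processor data mustn't contain the string ?>, because the parser treats the first occurrence as the end of the processing instruction. XML itself offers no way to escape the string inside a processing instruction, so any workaround has to be a convention that the receiving application understands.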
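To see why a stray ?> matters, here is a small Python sketch using xml.dom.minidom. The script text is invented for the example and happens to contain ?> as part of a comparison. The parser ends the processing instruction at the first ?> it meets, so the helper application would receive only the first fragment, and the rest spills into the document as ordinary character data.

```python
from xml.dom import minidom

# Hypothetical script data that contains the terminator "?>" by accident.
SOURCE = "<doc><?javascript if (a ?> b) run();?></doc>"

root = minidom.parseString(SOURCE).documentElement
pi, leftover = root.childNodes  # one processing instruction, then one text node

pi_data = pi.data        # the truncated data the helper application would see
spilled = leftover.data  # the remainder, now plain character data in <doc>

print(repr(pi_data))
print(repr(spilled))
```

Note that the document is still well-formed, so the parser raises no error; the damage is silent.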
Escaping a Text Block in a CDATA Section
A CDATA section is a convenience that allows you to put a bunch of character data into your document without having to tediously escape every single markup character (or what might look at first glance to be a markup character) contained in it.
In use it looks like this:
<![CDATA[Any old text at all including < and & signs or even examples of XML tags <!ENTITY list ... >]]>
You could have achieved the same result by escaping all the special characters, like this:
Any old text at all including &lt; and &amp; signs or even examples of XML tags &lt;!ENTITY list ... &gt;
but you would possibly have spent a lot longer at it, with more chance of getting it wrong.
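The equivalence of the two forms can be checked with a short Python sketch using the standard library's xml.etree.ElementTree. The <p> wrapper element is an assumption added only so that each fragment parses as a complete document; the text inside is the example from above.

```python
import xml.etree.ElementTree as ET

# The <p> wrapper is hypothetical; it just gives each fragment a root element.
WITH_CDATA = (
    "<p><![CDATA[Any old text at all including < and & signs "
    "or even examples of XML tags <!ENTITY list ... >]]></p>"
)
ESCAPED = (
    "<p>Any old text at all including &lt; and &amp; signs "
    "or even examples of XML tags &lt;!ENTITY list ... &gt;</p>"
)

cdata_text = ET.fromstring(WITH_CDATA).text
escaped_text = ET.fromstring(ESCAPED).text

# Both spellings deliver the same character data to the application.
same = cdata_text == escaped_text
print(same)
```

Once parsed, the application cannot tell which of the two spellings the author used.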
The only constraint is that your data can't contain the sequence ]]>, because that sequence ends the section. Figure 3.4 shows the CDATA section in use.
Figure 3.4 An XML page displayed in the MultiDoc Pro SGML/XML browser, using a CDATA section to avoid having to escape a section of markup.
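If the data might contain ]]>, one common workaround (not described in this chapter, so treat it as an assumption) is to end the section in the middle of the offending sequence and immediately open a new one. The Python sketch below shows the idea; to_cdata is a hypothetical helper name, and the <p> wrapper is added only so the result can be parsed.

```python
import xml.etree.ElementTree as ET


def to_cdata(text: str) -> str:
    """Wrap text in CDATA, splitting any "]]>" across two adjacent sections."""
    # "]]>" becomes "]]" + end of section + start of new section + ">".
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


# Hypothetical data in which "]]>" shows up as part of an array comparison.
tricky = "if (a[b[0]]> 1) { done(); }"

wrapped = to_cdata(tricky)
round_trip = ET.fromstring("<p>" + wrapped + "</p>").text

print(wrapped)
print(round_trip == tricky)
```

The parser joins the adjacent sections back together, so the application sees the original text unchanged.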