Bypassing Parsing with CDATA
There are times when you may want to include text in your document that contains markup characters, but which you do not want the parser to treat as markup. For example, if you were authoring a tutorial on HTML and storing it in an XML file, you might have the following:
<instruction> Titles can be <I>italicized</I> using the <I> tag. </instruction>
Used as is in an XML document, this instruction element would cause an error: the parser would treat each <I> as the start of a new element, and the second <I> is never closed. To denote that the content should not be parsed as markup, you can use a CDATA Section.
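The failure can be reproduced with any conforming parser. The following is a minimal sketch using Python's standard xml.etree.ElementTree module, chosen here only for illustration; the exact wording of the error message depends on the parser.

```python
import xml.etree.ElementTree as ET

# The instruction element exactly as written, with no escaping.
raw = "<instruction> Titles can be <I>italicized</I> using the <I> tag. </instruction>"

try:
    ET.fromstring(raw)
    outcome = "parsed"
except ET.ParseError as err:
    # The second <I> is never closed, so </instruction> arrives while
    # the parser is still inside an <I> element.
    outcome = f"rejected: {err}"

print(outcome)
```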
CDATA Sections can occur anywhere character data can occur. They are used to escape blocks of text containing characters that would otherwise be recognized as markup. CDATA Sections begin with the string <![CDATA[ and end with the string ]]>.
What that means is that you can enclose information inside these CDATA markers and the parser will pass that text through as literal character data rather than interpreting it as markup. So, let's take another look at our example:
<instruction> <![CDATA[Titles can be <I>italicized</I> using the <I> tag. ]]> </instruction>
Now the XML parser will not look for markup in any text that follows the <![CDATA[ delimiter until it encounters the ]]> delimiter. This allows you to include almost any text you would like in that section; the one exception is the string ]]> itself, which always ends the section.
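As a quick check, the sketch below parses the instruction wrapped in a CDATA Section, using the <![CDATA[ and ]]> delimiters defined above. It again assumes Python's xml.etree.ElementTree; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

doc = (
    "<instruction><![CDATA[Titles can be <I>italicized</I> "
    "using the <I> tag.]]></instruction>"
)
element = ET.fromstring(doc)

# The <I> tags were not treated as markup, so there are no child elements.
children = list(element)

# The CDATA content is delivered as ordinary text, angle brackets included.
text = element.text

print(children)
print(text)
```

Note that the application receives plain text: most parsers, including this one, do not report whether the characters came from a CDATA Section or from ordinary escaped content.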
Keep in mind, though, that nothing inside a CDATA Section is parsed. Therefore, if you were to include entity references, they would not be expanded. So, &lt;I&gt; would remain &lt;I&gt; if it were contained inside a CDATA Section, rather than becoming <I>.
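The difference in entity handling is easy to observe. This sketch, which assumes Python's xml.etree.ElementTree and a hypothetical <p> element, parses the same entity references inside and outside a CDATA Section.

```python
import xml.etree.ElementTree as ET

# Inside a CDATA Section the entity references are left untouched.
inside = ET.fromstring("<p><![CDATA[&lt;I&gt;]]></p>").text

# In ordinary element content (PCDATA) they are expanded by the parser.
outside = ET.fromstring("<p>&lt;I&gt;</p>").text

print(inside)   # &lt;I&gt;
print(outside)  # <I>
```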
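A CDATA Section can be used anywhere PCDATA occurs: as element content, and so on. Attribute values are a different matter. They are always parsed, and a literal < is never allowed inside one, so you cannot include a CDATA Section in an attribute value; escape markup characters with entities such as &lt; instead. Note that the CDATA attribute type in a DTD only declares the value to be a character string; it does not turn off parsing.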
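The attribute restriction can be checked the same way. In this sketch (Python's xml.etree.ElementTree again; the hint attribute is a hypothetical name, not part of the example above), a CDATA Section inside an attribute value is rejected, while the entity-escaped form parses normally.

```python
import xml.etree.ElementTree as ET

# Attempting to put a CDATA Section inside an attribute value.
bad = '<instruction hint="<![CDATA[<I>]]>"/>'

# The same information with the markup characters escaped as entities.
good = '<instruction hint="use the &lt;I&gt; tag"/>'

try:
    ET.fromstring(bad)
    bad_outcome = "parsed"
except ET.ParseError:
    # A literal '<' is not allowed in an attribute value.
    bad_outcome = "rejected"

# Entities in attribute values are expanded by the parser.
hint = ET.fromstring(good).get("hint")

print(bad_outcome)
print(hint)
```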
NOTE
Some users of XML have raised the idea of placing text-encoded binary data in CDATA Sections. Because the text in a CDATA Section isn't parsed, this seems like a reasonable idea. However, to do so, you would need to ensure that the encoded data never contains the string ]]>, which would end the section early. XML Schemas provide binary datatypes, such as base64Binary and hexBinary, that are a much better mechanism for including binary data as element content.
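For illustration, here is one common workaround for text that contains ]]>: end the section after the two brackets and start a new one before the >. The wrap_cdata helper below is a hypothetical name, not a standard library function, and the sketch assumes Python's base64 and xml.etree.ElementTree modules. It also shows that Base64 output is safe as is, because the Base64 alphabet contains neither ] nor >.

```python
import base64
import xml.etree.ElementTree as ET


def wrap_cdata(text: str) -> str:
    """Wrap text in CDATA, splitting any ']]>' across two sections.

    Hypothetical helper written for this example only.
    """
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


# Text that happens to contain the forbidden sequence.
payload = "x = a[b[0]]> 1"
parsed = ET.fromstring("<data>" + wrap_cdata(payload) + "</data>")
round_trip = parsed.text  # adjacent sections are joined back together

# Base64-encoded bytes can never contain ']]>'.
blob = b"\x00\xff]]>"
encoded = base64.b64encode(blob).decode("ascii")
decoded = base64.b64decode(
    ET.fromstring("<data><![CDATA[" + encoded + "]]></data>").text
)

print(round_trip == payload)
print(decoded == blob)
```

Splitting works because the parser simply concatenates the character data from consecutive CDATA Sections, so the application never sees the seam.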
By default, the text content of XML documents is PCDATA, and you will not encounter the PCDATA keyword until we discuss valid XML with DTDs and Schemas. CDATA Sections, by contrast, can be used in any well-formed XML document to escape large sections of text, and CDATA also comes up again in DTDs and Schemas. We will discuss DTDs and valid XML in Chapter 4, "Structuring XML Documents with DTDs," and XML Schemas in Chapter 5, "Defining XML Document Structures with XML Schemas."