- XML Elements
- Generic Identifiers
- Some Rules for Naming Elements
- Storing the Data in XML
- Parsed Character Data
- Bypassing Parsing with CDATA
- Attributes
- When to Use Attributes
- Classifying Attributes: Attribute Types
- Attribute Rules
- Well-Formedness Rules
- Creating a Well-Formed XML Document
- The Basics of Validation
- How Do Applications Use XML?
- An Overview of XML Tools
- Roadmap
- Additional Resources
Parsed Character Data
XML documents are read and processed by a specific piece of software called an XML parser. When a document is processed by the XML parser, each character in the document is read, or parsed, in order to create a representation of the data.
Any text that the parser reads and interprets is Parsed Character Data, or PCDATA. The term matters because it appears throughout XML documentation and in DTD declarations. Element content consists of other elements, PCDATA, or a mix of both. Attribute values are parsed as well, although a DTD declares them with the type CDATA rather than PCDATA.
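To make the idea concrete, here is a minimal sketch of what a parser hands back after reading a document. It assumes Python's standard xml.etree.ElementTree module, which this chapter does not prescribe; the sample document and the variable names are hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# A hypothetical document: <book> contains other elements,
# while <title> and <price> contain character data.
doc = '<book><title>XML Basics</title><price currency="USD">29.99</price></book>'

# The parser reads every character and builds a tree of elements.
root = ET.fromstring(doc)

# Element content that is made of other elements.
child_tags = [child.tag for child in root]

# Element content that is parsed character data.
title_text = root.find("title").text

# An attribute value, also read by the parser.
currency = root.find("price").get("currency")
```

After parsing, the tags and angle brackets are gone: the application sees only element names, text, and attribute values.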
By definition, PCDATA is parsed, which means that the parser examines each character and tries to determine its meaning. For example, when the parser encounters a <, it treats the characters that follow as markup, typically the start tag of an element. When it encounters </, it knows that it has reached an end tag.
Because PCDATA is parsed, it cannot contain a literal < or & character, as these characters have special meaning in markup. The > and / characters are legal in text, although > is commonly escaped as well. For example:
<math> If you want to denote one number is smaller than another, you can use the < less-than sign </math>
This element will cause an error, because the parser will interpret the < as the start of a new tag. If you want to include these characters, you need to use the equivalent entity reference; for example, &lt; represents a less-than sign.
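The failure and the fix can be checked with any XML parser. The sketch below again assumes Python's standard xml.etree.ElementTree module, plus the escape function from xml.sax.saxutils; the helper name try_parse and the sample strings are hypothetical, shortened versions of the example above.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def try_parse(xml_text):
    """Return the root element's text, or None if the document is not well formed."""
    try:
        return ET.fromstring(xml_text).text
    except ET.ParseError:
        return None


# A raw < inside element content: the parser expects a tag to follow and fails.
raw = "<math>use the < less-than sign</math>"

# The same content with the < written as the entity reference &lt;.
escaped = "<math>use the &lt; less-than sign</math>"

raw_result = try_parse(raw)          # parsing fails, so this is None
escaped_result = try_parse(escaped)  # the parser turns &lt; back into <

# escape() replaces &, < and > with entity references
# before text is placed into a document.
safe_text = escape("1 < 2")
```

The application that receives the parsed text sees an ordinary < character; the entity reference exists only in the serialized document.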