Summary
This chapter covered XML syntax rules and basic parsing concepts.
It introduced fundamental XML terminology: element, attribute, tag, and content.
We discussed XML document structure, including the XML prolog, which consists of the XML declaration and the document type declaration; both are optional but desirable.
Names of elements, attributes, and many other XML identifiers are required to conform to the definition of an XML Name.
An XML Name consists of a leading letter, underscore, or colon, followed by name characters (letters, digits, hyphens, underscores, colons, or periods).
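As a rough illustration of the Name rule just stated, the following Python sketch checks a candidate against a simplified, ASCII-only pattern. The names XML_NAME_ASCII and is_ascii_xml_name are made up for this example, and the pattern is an assumption that covers only ASCII letters and digits; the actual XML 1.0 definition also admits many non-ASCII characters.

```python
import re

# Simplified, ASCII-only approximation of the XML Name rule summarized above:
# a leading letter, underscore, or colon, followed by any number of letters,
# digits, hyphens, underscores, colons, or periods.
# Assumption: non-ASCII name characters allowed by XML 1.0 are ignored here.
XML_NAME_ASCII = re.compile(r"[A-Za-z_:][A-Za-z0-9_:.\-]*")


def is_ascii_xml_name(candidate: str) -> bool:
    """Return True if candidate matches the simplified ASCII Name pattern."""
    return XML_NAME_ASCII.fullmatch(candidate) is not None


if __name__ == "__main__":
    for name in ["Envelope", "_id", "xsl:template", "first.name",
                 "2ndItem", "-dash", "has space"]:
        print(f"{name!r}: {is_ascii_xml_name(name)}")
```

The last three candidates are rejected: two begin with a character that may appear only after the first position, and one contains a space, which is not a name character at all.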
XML is case-sensitive. Although there is no universal convention concerning use of uppercase or lowercase when developing your own language, one recommendation is to use UpperCamelCase for elements and lowerCamelCase for attributes, a convention used in SOAP.
We learned the difference between markup and character data; all text that isn't markup is character data.
We covered most of the types of markup, including start and end tags, empty element tags, entity references, character references, comments, CDATA sections, document type declarations, processing instructions, and XML declarations.
The minimal requirement for an XML document is that it be well-formed, meaning that it adheres to a number of XML syntax rules.
Well-formedness is a prerequisite for validity: a document is valid only if it is well-formed and also conforms to the constraints imposed by a DTD or XML Schema.
Most modern parsers can be toggled between two states: validating and nonvalidating. Validation mode is crucial during development. In a production environment, however, it may be desirable (under certain circumstances) to disable validation for efficiency.
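To make the well-formedness requirement concrete, here is a minimal Python sketch that relies on the standard library's xml.etree.ElementTree parser. That parser is nonvalidating, so the sketch tests well-formedness only and says nothing about validity against a DTD or XML Schema. The helper name is_well_formed and the sample documents are invented for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if text parses as well-formed XML.

    ElementTree does not validate, so a True result does not imply
    that the document is valid against any DTD or XML Schema.
    """
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# Hypothetical sample documents, one well-formed and three not.
SAMPLES = {
    "matched tags": "<Note><To>Ann</To></Note>",
    "mismatched case": "<Note><To>Ann</to></Note>",  # XML is case-sensitive
    "unclosed element": "<Note><To>Ann</Note>",
    "unquoted attribute": "<Note priority=high/>",
}

if __name__ == "__main__":
    for label, doc in SAMPLES.items():
        print(f"{label}: {is_well_formed(doc)}")
```

Only the first sample passes. The second fails because To and to are different names, which is case sensitivity showing up as a well-formedness error.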
Event-based (e.g., SAX) and tree-based (e.g., DOM) parsing were briefly contrasted.
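The contrast between the two parsing styles can be sketched with Python's standard xml.sax and xml.dom.minidom modules. Both functions below extract the same attribute values from a small document. The SAX version reacts to events as the parser encounters each start tag, while the DOM version builds the whole tree in memory and then queries it. The document, the sku attribute, and all function and class names are assumptions made up for this example.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical sample document used by both approaches.
DOC = "<Order><Item sku='A1'>Pen</Item><Item sku='B2'>Ink</Item></Order>"


class SkuCollector(xml.sax.ContentHandler):
    """Event-based: record the sku attribute as each Item start tag arrives."""

    def __init__(self):
        super().__init__()
        self.skus = []

    def startElement(self, name, attrs):
        if name == "Item":
            self.skus.append(attrs.getValue("sku"))


def skus_via_sax(text: str) -> list:
    handler = SkuCollector()
    # The parser streams through the input and calls the handler per event.
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.skus


def skus_via_dom(text: str) -> list:
    # Tree-based: the entire document is parsed into a tree before querying.
    tree = minidom.parseString(text)
    return [el.getAttribute("sku") for el in tree.getElementsByTagName("Item")]


if __name__ == "__main__":
    print(skus_via_sax(DOC))
    print(skus_via_dom(DOC))
```

The event-based handler never holds more than the current event, which suits large inputs; the tree-based version uses more memory but allows the document to be revisited and navigated in any order.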