Summary
In this chapter, we learned about Namespaces in XML, the XML Information Set, and Canonical XML, three specifications that were, at least at one time, the work of the W3C XML Core Working Group. All three followed the XML 1.0 Recommendation, each attempting to address a particular fundamental area. The main points covered are:
XML namespaces are necessary to resolve element (or attribute) name conflicts when a document refers to multiple vocabularies (DTDs or XML Schemas).
Namespaces are useful to understand because many specifications, such as XSLT, XML Schema, and XLink, depend on them for one reason or another.
A qualified name consists of a prefix and a local name, separated by a colon. The corresponding universal name pairs the local name with the namespace URI that the prefix stands for.
A prefix is just shorthand for the actual namespace name, which is represented by a URI.
A namespace declaration appears in an XML document, rather than the DTD. This declaration associates the prefix with the namespace name (URI).
Prefixes are added to element or attribute names in the XML document, except when a default namespace is used. A default namespace applies to unprefixed elements only, not to unprefixed attributes.
Namespace scope applies to the element that declares the namespace and all its descendants.
Although namespaces cannot be declared in DTDs, a DTD used for validation must still declare the xmlns attribute (or xmlns:prefix attribute), typically with the namespace URI as its fixed value.
Validation and XML namespaces are two different concepts. Creating valid documents that correctly use XML namespaces can be tricky.
A namespace need not point to any physical resource at all, although many do. What the namespace URI maps to is completely undefined. RDDL is a possible way to provide a flexible solution to this ill-defined endpoint.
The special attributes xmlns, xml:space, xml:lang, and xml:base were discussed.
A table of common namespace names, their conventional prefixes, and the specifications with which they are associated was presented.
The XML Information Set (Infoset) is an attempt to define a set of terms that other W3C specifications can use to refer to information in a well-formed (but not necessarily valid) XML document that conforms to Namespaces in XML.
The Infoset defines eleven kinds of information items, such as the element information item and the attribute information item, and a different set of properties for each kind of information item.
It is helpful to understand what is not in the Infoset too, including many details from the DTD.
Due to XML syntax details, it is quite easy to create documents that are physically different and yet logically equivalent.
The purpose of Canonical XML is to define an algorithm by which a particular physical representation of an XML document can be reliably and repeatedly reduced to its canonical (simplest) form.
Perhaps the most important application of Canonical XML is in digital signatures, where it helps ensure that the information content has not changed since the document was digitally signed.
When a document is converted to its canonical form, single-quoted attribute values become double-quoted, the abbreviated form of an empty element is expanded to a start-tag and end-tag pair, attribute default values from the DTD are substituted in the instance if no value is provided, extraneous white space inside tags is normalized (white space in character content is preserved), and so on.
A handy free tool for use with Canonical XML is xmlcanon, the Canonical XML Processor, by ElCel Technology.
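To make a few of these points concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree module (Python 3.8 or later). It is only an illustration, not the tool or the exact algorithm described in this chapter: the canonicalize() function implements the later Canonical XML 2.0 rather than the original Canonical XML Recommendation, and the sketch does not touch DTD attribute defaults. The namespace URI http://example.com/inventory, the inv prefix, and the item element are hypothetical names chosen for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI used only for this illustration.
NS = "http://example.com/inventory"

# Two physically different but logically equivalent documents:
# single vs. double quotes, extra white space inside the tag,
# and abbreviated vs. paired empty-element syntax.
doc_a = "<inv:item xmlns:inv='http://example.com/inventory' id='42'/>"
doc_b = '<inv:item   xmlns:inv="http://example.com/inventory"   id="42"></inv:item>'

# A third variant that relies on a default namespace instead of a prefix.
doc_c = '<item xmlns="http://example.com/inventory" id="42"/>'

# The parser resolves each prefix to its namespace name, so the element
# is identified by namespace URI plus local name, written {URI}local.
root_a = ET.fromstring(doc_a)
root_c = ET.fromstring(doc_c)
print(root_a.tag)
print(root_c.tag)

# The default namespace does not apply to the unprefixed attribute.
print(root_c.attrib)

# Canonicalization reduces both physical forms to one serialization.
canon_a = ET.canonicalize(xml_data=doc_a)
canon_b = ET.canonicalize(xml_data=doc_b)
print(canon_a)
print(canon_a == canon_b)
```

Because the two canonical strings are identical, a digest or signature computed over the canonical form would not be affected by the quoting, spacing, or empty-element differences between doc_a and doc_b.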