3.3 Opposites are attracted
To XML, that is!
How is it that XML can be optimal for two such apparently extreme opposites as MOM and POP? The answer is that the two are not really different where it counts.
In both cases, we start with abstract information. For POP, it comes from a human author's head. For MOM, it comes from a database. But either way, the abstract data is marked up with tags and becomes a document.
Here is a terminally cute mnemonic for this very important relationship:
Data + Markup = DocuMent
Aren't you sorry you read it? Now you'll never forget it.
But XML "DocuMents" are special. An application can process one using three different techniques:
Parse it, in order to extract the original data. This can be done without information loss because XML represents both metadata and data, and it lets you keep the abstractions distinct from rendition information. Once extracted, the data can be manipulated as needed by the application.
Render it, so it can be presented in a physical medium that a human can perceive. It can be rendered in many different ways, for delivery in multiple media such as screen displays, print, Braille, spoken word, and so on.
Hack it, meaning "process it as plain text without parsing". Hacking might involve cutting and pasting into other XML documents, or scanning the text to get some information from it without doing a real parse.
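The three techniques can be sketched side by side in Python, using only the standard library. This is a minimal illustration under stated assumptions: the `order` document and its element and attribute names are hypothetical, and the two renderers stand in for the many media a real rendition might target.

```python
import html
import re
import xml.etree.ElementTree as ET

# Hypothetical document; element and attribute names are invented for this sketch.
DOC = """<order id="1042">
  <customer>Ada Lovelace</customer>
  <item sku="X-7" qty="2">Difference engine gear</item>
</order>"""

def parse(doc: str) -> dict:
    """Parse: let a real XML parser separate the data from the markup."""
    root = ET.fromstring(doc)
    item = root.find("item")
    return {
        "id": root.get("id"),
        "customer": root.findtext("customer"),
        "sku": item.get("sku"),
        "qty": int(item.get("qty")),
        "description": item.text,
    }

def render_text(doc: str) -> str:
    """Render the extracted data for a plain-text display."""
    d = parse(doc)
    return f"Order {d['id']} for {d['customer']}: {d['qty']} x {d['description']}"

def render_html(doc: str) -> str:
    """Render the same data differently: an HTML fragment for a browser."""
    d = parse(doc)
    return (f"<p>Order <b>{html.escape(d['id'])}</b> for "
            f"{html.escape(d['customer'])}</p>")

def hack(doc: str) -> list:
    """Hack: scan the raw characters with a regex; no parser is involved.
    Fragile by design, e.g. it misses sku='...' written with single quotes."""
    return re.findall(r'sku="([^"]*)"', doc)

if __name__ == "__main__":
    print(parse(DOC))
    print(render_text(DOC))
    print(render_html(DOC))
    print(hack(DOC))
```

The hack works only because the document is plain text and the author happened to use double quotes; the parse works for any well-formed variant of the same document.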
The important revelation here is that data and documents aren't opposites. Far from it: they are actually two states of the same information.
The real difference between the two is this:
When data is in a database, the metadata about its structure and meaning (the schema) is stored according to the proprietary architecture of the database.
When data is in a document, the metadata is stored as markup.
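To make the contrast concrete, here is a small sketch that uses SQLite (via Python's `sqlite3` module) as a stand-in for "a database". The `employee` table and its columns are invented for illustration. The schema lives in SQLite's own catalog, `sqlite_master`; the hypothetical helper `row_to_document` copies the same metadata into the document as element names.

```python
import sqlite3
import xml.etree.ElementTree as ET

def row_to_document(conn: sqlite3.Connection, table: str) -> str:
    """Hypothetical helper: turn the first row of `table` into an XML document
    whose markup (element names) carries the column metadata."""
    cur = conn.execute(f"SELECT * FROM {table}")  # table name is trusted in this sketch
    columns = [desc[0] for desc in cur.description]
    row = cur.fetchone()
    root = ET.Element(table)
    for name, value in zip(columns, row):
        ET.SubElement(root, name).text = str(value)
    return ET.tostring(root, encoding="unicode")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, title TEXT, extension INTEGER)")
conn.execute("INSERT INTO employee VALUES (?, ?, ?)",
             ("Grace Hopper", "Rear Admiral", 4242))

# Database view: the schema is kept in the engine's own catalog, apart from the data.
schema = conn.execute(
    "SELECT sql FROM sqlite_master WHERE name = 'employee'").fetchone()[0]

# Document view: the same metadata travels with the data, as markup.
doc = row_to_document(conn, "employee")
print(schema)
print(doc)
conn.close()
```

The row values are identical in both states; only where the metadata lives has changed.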
A mixture of markup and data must be governed by the rules of some notation. XML and SGML are notations, as are RTF and the Word file format. The rules of the notation determine how a parser will interpret the document text to separate the data from the markup.
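A short illustration of XML's notation rules at work, using Python's `xml.etree.ElementTree` parser: the same visible characters `<b>y</b>` are markup in one document and data in the other two, because XML treats a raw `<` as the start of a tag while an entity reference (`&lt;`) or a CDATA section marks it as character data. The three sample strings and the helper name are made up for this sketch.

```python
import xml.etree.ElementTree as ET

AS_MARKUP = "<p>x <b>y</b></p>"                         # <b> is a real element
AS_ENTITY_REFS = "<p>x &lt;b&gt;y&lt;/b&gt;</p>"        # escaped: just characters
AS_CDATA = "<p>x <![CDATA[<b>y</b>]]></p>"              # CDATA: just characters

def text_and_children(doc: str):
    """Return the data (all text) and the tags of the child elements (markup)."""
    root = ET.fromstring(doc)
    return "".join(root.itertext()), [child.tag for child in root]

for doc in (AS_MARKUP, AS_ENTITY_REFS, AS_CDATA):
    print(text_and_children(doc))
```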
Notations are not just for complete documents. There are also data object notations, which are used to represent such things as graphics (e.g., GIF, TIFF, and EPS), video (e.g., MPEG), and audio (e.g., MP3). Document notations usually allow their documents to contain data objects, such as pictures, that are in the objects' own data object notations.
Data object notations are usually (though not always) binary; that is, they are built up from low-level ones and zeros rather than from readable characters. Document notations, however, are frequently character-based. XML is character-based, which is why it can be hacked.4
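The difference can be seen in a few bytes. In this sketch, a hand-made GIF header (the 6-byte signature followed by the logical screen width and height as 16-bit little-endian integers, per the GIF89a layout) is compared with a hypothetical XML element carrying the same two numbers. The binary values can only be recovered by code that knows the GIF layout; the XML is ordinary text that a simple string search can hack.

```python
import struct

# First 10 bytes of a GIF: signature, then width and height as little-endian uint16.
gif_header = b"GIF89a" + struct.pack("<HH", 640, 480)

# Hypothetical XML carrying the same information as characters.
xml_bytes = b'<image width="640" height="480"/>'

def gif_size(data: bytes) -> tuple:
    """Decode width and height from a GIF header; needs knowledge of the binary layout."""
    if data[:6] not in (b"GIF87a", b"GIF89a"):
        raise ValueError("not a GIF")
    return struct.unpack("<HH", data[6:10])

print(gif_size(gif_header))            # requires a format-aware decoder
print(b"640" in gif_header)            # the digits never appear in the binary
print('width="640"' in xml_bytes.decode("utf-8"))  # plain text search works on XML
```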
Since databases and documents are really the same, and MOM and POP applications both use XML documents, there are lots of opportunities for synergy.