Format Indicator
Several applications, communicating via Messages (66), follow an agreed-upon data format, perhaps an enterprise-wide Canonical Data Model (355). However, that format may need to change over time.
How can a message’s data format be designed to allow for possible future changes?
Even when you design a data format that works for all participating applications, future requirements may change. New applications may be added that have new format requirements, new data may need to be added to the messages, or developers may find better ways to structure the same data. Whatever the case, designing a single enterprise data model is difficult enough; designing one that will never need to change in the future is darn near impossible.
When an enterprise’s data format changes, there would be no problem if all of the applications changed with it. If every application stopped using the old format and started using the new format, and all did so at exactly the same time, then conversion would be simple. The problem is that some applications will be converted before others, while some less-used applications may never be converted at all. Even if all applications could be converted at the same time, all messages would have to be consumed so that all channels are empty before the conversion could occur.
Realistically, applications will have to be able to support the old format and the new format simultaneously. To do this, applications must be able to tell which messages follow the old format and which follow the new.
One solution might be to use a separate set of channels for the messages with the new format. That, however, would lead to a huge number of channels, duplication of design, and configuration complexity as each application has to be configured for an ever-expanding assortment of channels.
A better solution is for the messages with the new format to use the same channels that the old format messages are using. This means that receivers need a way to distinguish messages of different formats that are using the same channel. Each message must therefore specify which format it is using, in a way that is simple for receivers to check.
Design a data format that includes a Format Indicator so that the message specifies what format it is using.
The Format Indicator enables the sender to tell the receiver the format of the message. This way, a receiver expecting several possible formats knows which one a message is using and therefore how to interpret the message’s contents.
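As a minimal sketch of how a receiver might use the indicator, the following Python fragment picks a parser based on a header value. The header name `format_version`, the two body layouts, and the dictionary-based message shape are all assumptions made for illustration; they are not part of any particular messaging API.

```python
# Hypothetical header name; sender and receiver would have to agree on it.
FORMAT_HEADER = "format_version"


def parse_v1(body: str) -> dict:
    # Assumed old format: "customer,amount"
    customer, amount = body.split(",")
    return {"customer": customer, "amount": float(amount)}


def parse_v2(body: str) -> dict:
    # Assumed new format: "customer;amount;currency"
    customer, amount, currency = body.split(";")
    return {"customer": customer, "amount": float(amount), "currency": currency}


# One parser per format the receiver understands.
PARSERS = {"1": parse_v1, "2": parse_v2}


def receive(message: dict) -> dict:
    """Interpret the body according to the message's Format Indicator."""
    version = message["headers"].get(FORMAT_HEADER)
    parser = PARSERS.get(version)
    if parser is None:
        raise ValueError(f"unsupported format indicator: {version!r}")
    return parser(message["body"])
```

Both formats travel on the same channel here; the receiver needs only the header value to decide how to read the body, and it rejects any indicator it does not recognize rather than guessing.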
There are three main alternatives for implementing a Format Indicator:
- Version Number—A number or string that uniquely identifies the format. Both the sender and receiver must agree on which format is designated by a particular indicator. The advantage of this approach is that the sender and receiver do not have to agree on a shared repository for format descriptors, but the drawback is that each must know what descriptor is indicated and where to access it.
- Foreign Key—A unique ID—such as a filename, a database row key, a home primary key, or an Internet URL—that specifies a format document. The sender and receiver must agree on the mapping of keys to documents and the format of the schema document. The advantage of this approach is that the foreign key is very compact and can point to a detailed data format description in a shared repository. The main drawback lies in the fact that each messaging participant has to retrieve the format document from a potentially remote resource.
- Format Document—A schema that describes the data format. The schema document does not have to be retrieved via a foreign key or inferred from a version number; it is embedded in the message. The sender and the receiver must agree on the format of the schema. The advantage of this alternative is that messages are self-contained. However, message traffic increases because each message carries format information that rarely changes.
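The three alternatives differ mainly in where the receiver finds the schema. The sketch below illustrates that difference under simplifying assumptions: the `kind`/`value` indicator layout, the in-memory `KNOWN_VERSIONS` table, and the `SCHEMA_REPOSITORY` dictionary (standing in for a shared, possibly remote repository) are all hypothetical.

```python
import json

# Version number: each participant keeps its own copy of the schema and
# agrees in advance on what each version string means.
KNOWN_VERSIONS = {"1.0": {"fields": ["customer", "amount"]}}

# Foreign key: the key points into a shared repository. A plain dict stands
# in for what might be a file system, database, or URL in practice.
SCHEMA_REPOSITORY = {
    "schemas/order-v2.json": '{"fields": ["customer", "amount", "currency"]}',
}


def resolve_schema(indicator: dict) -> dict:
    """Return the schema that a Format Indicator designates."""
    kind, value = indicator["kind"], indicator["value"]
    if kind == "version":
        return KNOWN_VERSIONS[value]  # local lookup, nothing to fetch
    if kind == "foreign_key":
        return json.loads(SCHEMA_REPOSITORY[value])  # retrieval step
    if kind == "document":
        return value  # schema travels inside the message itself
    raise ValueError(f"unknown indicator kind: {kind!r}")
```

The version number and foreign key indicators stay small but rely on knowledge or resources outside the message, whereas the format document branch needs nothing beyond the message at the cost of carrying the schema every time.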
A version number or foreign key can be stored in a header field that the senders and receivers agree upon. Receivers that are not interested in the format version can ignore the field. A format document may be too long or complex to store in a header field, in which case the message body must have a format that contains two parts: the schema and the data.
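The two placements can be sketched as follows. This is only an illustration: the JSON encoding, the `format_version` header name, and the `schema`/`data` keys are assumed conventions that sender and receiver would have to agree on.

```python
import json


def build_header_message(version: str, data: dict) -> dict:
    # Compact indicator: the version lives in a header, the body is data only.
    return {"headers": {"format_version": version}, "body": json.dumps(data)}


def build_self_describing_message(schema: dict, data: dict) -> dict:
    # Embedded format document: the body carries two parts, schema and data.
    body = json.dumps({"schema": schema, "data": data})
    return {"headers": {}, "body": body}


def split_body(message: dict) -> tuple:
    """Separate a two-part body into its schema and its data."""
    parts = json.loads(message["body"])
    return parts["schema"], parts["data"]
```

A receiver of the first kind of message that does not care about the version can read the body directly and never look at the header. A receiver of the second kind must always split the body before it can reach the data.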
Example: XML
XML provides examples of all three approaches. One example is an XML declaration, like this:
<?xml version="1.0"?>
Here, 1.0 is a version number that indicates the document’s conformance to that version of the XML specification. Another example is the document type declaration, which can take two forms. It can be an external ID containing a system identifier, like this:
<!DOCTYPE greeting SYSTEM "hello.dtd">
The system identifier, hello.dtd, is a foreign key that indicates the file containing the DTD document that describes this XML document’s format. The declaration can also be included locally, like this:
<!DOCTYPE greeting [ <!ELEMENT greeting (#PCDATA)> ]>
The markup declaration, [ <!ELEMENT greeting (#PCDATA)> ], is a format document, an embedded schema document that describes the XML’s format [XML 1.0].
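A receiver can discover which of these indicators an XML document carries without interpreting the rest of its content. The following sketch uses the expat bindings in Python's standard library; the function name `find_format_indicators` and the shape of the returned dictionary are assumptions made for this example. Expat is a non-validating parser, so it reports the system identifier without retrieving hello.dtd.

```python
import xml.parsers.expat


def find_format_indicators(document: str) -> dict:
    """Report the version, system identifier, and presence of an internal DTD."""
    found = {"version": None, "system_id": None, "has_internal_subset": False}

    def on_xml_decl(version, encoding, standalone):
        # Version number from the XML declaration, for example "1.0".
        found["version"] = version

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # system_id is the foreign key; an internal subset is an embedded
        # format document.
        found["system_id"] = system_id
        found["has_internal_subset"] = bool(has_internal_subset)

    parser = xml.parsers.expat.ParserCreate()
    parser.XmlDeclHandler = on_xml_decl
    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(document, True)
    return found
```

For the external form shown above, the result would contain the system identifier hello.dtd and no internal subset; for the local form, it would contain no system identifier and report that an internal subset is present.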