Building XML Messages from Processes to Data
This section looks at the process for building business messages with XML. As we discussed earlier in this chapter, XML lets trading partners define their own elements and tags, taking advantage of XML's extensible nature (the X in XML). But XML messages also represent the structure of those elements, following their prescribed relationships in the hierarchy. The message schema DTD captures the names of the elements as represented by the tags and their hierarchical structure. Messages exchanged among trading partners therefore must represent the rules and practices of a business or industry, as captured in the schema DTD.
For example, in Chapter 3, "ebXML at Work," the Marathoner running store case study points out how retailers and manufacturers can exchange product identifiers and precise inventory levels, so that manufacturers can compare inventory levels to predefined reorder points and decide whether they need to ship more product. Before any of these exchanges can happen, however, the retailer and the manufacturers (or, better yet, the entire industry) need to agree on common terminology and structure of the messages. With this common set of rules, shoe manufacturers and retailers can use the same basic set of messages, which promotes the use of packaged software and makes it possible for the parties to develop their systems faster and for less money.
We call this common set of rules a data model because, like a schematic drawing, it offers a skeleton view of the messages, specifies the order of the elements in a message, and shows how the various elements relate to one another in a hierarchy. The term comes from the database world, where database design needs to meet the users' business requirements as efficiently as possible, yet still allow for future growth. The logical model defines the information fields and their relationships in a database (much like a schema DTD in an XML message), while the physical model details field sizes and datatypes, such as alphanumeric or date formats.30 In fact, defining an XML schema of information is analogous to creating traditional row-and-column layouts for a database design system.
The XML syntax is not just about interpreting the content. The business process is a vital component of the content and is helped along by XML.
Determine Processes
As shown in the case studies in Chapter 3, the parties identify business processes or actions taken by the companies to achieve their business goals. For example, the travel agency case proposes a process to decide on a tour package. This process has contingencies built in for continued bids and best-and-final offers if the customers don't want to accept one of the first offers. By working out these larger processes, the trading partners can agree on the overall conduct of the business, before trying to determine the individual messages.
A tool called use cases can help identify these processes. Use cases describe scenarios in which users interact with each other and the systems under development. Each scenario describes the accomplishment of a specific task or achievement of a goal. They also identify the players, steps in the process, and the messages or even the data exchanged. By describing these situations in a storytelling mode, use cases often uncover the processes underlying business practices.31
One of the ebXML development activities involves identifying similarities in business processes across industries. While each industry has its own language and culture, using these common processes helps speed the work and improves the chances for interoperability among industries.32
Determine Message Flows
Each process contains a set of individual messages exchanged among the trading partners. In Chapter 3, the running store case listed a series of messages in the process of reporting inventory levels and replenishing the stock:
- Periodic inventory report sent from the store to the manufacturer
- Ship notice sent from the manufacturer to the store with the shipment details
- Receiving report sent from the store to the manufacturer once inventory is accepted
Industries defining their processes can identify the individual messages contained in those processes, as well as how and when the companies send and receive the messages. These messages may resemble EDI transaction sets (see Chapter 5, "The Road Toward ebXML," for a discussion of EDI), as in the running store case, or look nothing like EDI transactions, as in the travel agency case.
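The first message in this flow, the periodic inventory report, can be sketched in code. The Python fragment below is only an illustration: the tag names (InventoryReport, Item, ProductID, OnHand), the product identifiers, and the reorder points are all hypothetical, since the running store case does not prescribe a message layout. It parses a report and lists the products whose stock has fallen to or below the reorder point, which is the comparison the manufacturer would make before sending a ship notice.

```python
import xml.etree.ElementTree as ET

# Hypothetical inventory report sent from the store to the manufacturer.
REPORT = """
<InventoryReport store="Marathoner">
  <Item><ProductID>SHOE-101</ProductID><OnHand>4</OnHand></Item>
  <Item><ProductID>SHOE-202</ProductID><OnHand>25</OnHand></Item>
  <Item><ProductID>SHOE-303</ProductID><OnHand>10</OnHand></Item>
</InventoryReport>
"""

# Hypothetical reorder points agreed on by the trading partners.
REORDER_POINTS = {"SHOE-101": 6, "SHOE-202": 12, "SHOE-303": 10}


def items_to_replenish(report_xml, reorder_points):
    """Return product IDs whose on-hand level is at or below the reorder point."""
    root = ET.fromstring(report_xml)
    low = []
    for item in root.findall("Item"):
        product = item.findtext("ProductID")
        on_hand = int(item.findtext("OnHand"))
        if product in reorder_points and on_hand <= reorder_points[product]:
            low.append(product)
    return low


print(items_to_replenish(REPORT, REORDER_POINTS))  # ['SHOE-101', 'SHOE-303']
```

Because both parties read the same tags in the same structure, the manufacturer's software can run this comparison without any store-specific translation step.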
Identify Data in the Messages
Once industries identify the messages, they next need to identify the sets of business data that go into those messages. Industry organizations that have previously developed EDI transactions can use this work as the basis for identifying data for XML messages. Newer business processes must rely on information analysis between companies to determine the content required, often replacing older, paper-based documents. But the objective is to improve the way companies do business, not necessarily to follow the current EDI transactions or old paper-process documents. Industries sometimes use this exercise to test traditional assumptions and practices, which can cut out captured or exchanged data that's no longer needed. On the other hand, this process can generate more pieces of data needed by trading partners to meet their business requirements.
When applying this process to XML, industry groups develop XML vocabularies that put these groups of data into definable messages, also identifying the structure of the data in the messages. To aid understanding and reuse, the XML structure should link related and most-used pieces of information together as logical blocks. The messages thus embody the rules and practices of doing business in a particular industry, defined in terms of XML. In this way, industries can design common groups of data with common structures as industry-wide rules for processing XML messages.
XML vocabularies can represent more than vertical industries. Vocabularies can also define business functions found in multiple industries, or entire frameworks that provide interoperability across industries and functions. One of these frameworks is ebXML itself, which provides the underpinning for global business, not just an industry sector.33
Business Schema DTDs
As discussed earlier, DTDs, as specified for XML, contain the rules for both constructing and structurally validating XML messages. We'll now describe schemas in more detail to give you an understanding of how this key piece of the XML technology is used to enable consistent electronic business.
DTDs assemble information into elements with connected attributes. Elements are the basic building blocks of XML messages, and therefore the basic components of DTDs. Elements can contain other elements expressed in a hierarchy (compound elements), or they can stand alone as simple containers for character data. Compound elements for parent/child blocks can be referenced together. When the modeling process identifies the data in proposed XML messages, most of these data items will become elements, identified as such in DTDs. In XML messages themselves, elements are marked up as tags within the now familiar angle brackets (<>). Element definitions can indicate the frequency with which the elements occur (once or more than once) and whether they're required or optional.34
Then attributes provide additional description or qualification for elements. Using the language metaphor often applied to XML, one can think of elements as nouns and attributes as adjectives. The XML document example presented earlier and the following DTD fragment identify the PostalCode as an element, with the codetype and its use as an attribute of that element:
<PostalCode codetype='ZIP'>96045</PostalCode>

<!-- DTD definition for element and attribute -->
<!ELEMENT PostalCode (#PCDATA)>
<!ATTLIST PostalCode codetype CDATA #IMPLIED>
With the schema DTD syntax, attribute declarations also provide a limited form of data typing, which means that they describe the kind of data allowed in the attribute's value. Attributes can contain strings (character data), enumerated lists, or tokens, which include identifiers and references to other components in the document.
Enumerated lists restrict the attribute to only permitted character strings. For example, an attribute to identify smoking preferences for hotel reservations would have the following as its enumerated listing: SMOKING or NONSMOKING. Attributes can likewise indicate a default response, used routinely unless the customer requests otherwise. Returning to the hotel example, the NONSMOKING response could serve as the default, unless the customer specifically requests SMOKING.35 Schema DTD datatyping is deliberately simple and therefore easily understood, whereas the new W3C XML Schema datatyping is extensive and sophisticated.36
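The hotel example can be tried out with a short Python sketch. The element names (Reservation, Room) and the attribute name (smoking) are hypothetical; only the SMOKING and NONSMOKING values come from the text. The DTD is placed in the document's internal subset so that the example is self-contained. Python's bundled expat parser is non-validating: it fills in the declared default, but it would not reject a value outside the enumerated list, which is a job for a validating parser.

```python
import xml.parsers.expat

# Hypothetical reservation message with its DTD in the internal subset.
# The ATTLIST declares an enumerated list and a default value.
DOC = """<?xml version="1.0"?>
<!DOCTYPE Reservation [
  <!ELEMENT Reservation (Room+)>
  <!ELEMENT Room EMPTY>
  <!ATTLIST Room smoking (SMOKING | NONSMOKING) "NONSMOKING">
]>
<Reservation>
  <Room/>
  <Room smoking="SMOKING"/>
</Reservation>"""


def room_preferences(document):
    """Collect the smoking attribute of every Room, including defaulted ones."""
    found = []

    def on_start(name, attrs):
        if name == "Room":
            found.append(attrs["smoking"])

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = on_start
    parser.Parse(document, True)
    return found


print(room_preferences(DOC))  # ['NONSMOKING', 'SMOKING']
```

The first Room carries no attribute in the message itself, yet the application still sees NONSMOKING, because the parser supplies the default from the declaration.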
The Entity Referencing System
Entities are rather misnamed. They're really aliases or substitution strings, intended to identify the reusable objects in a schema DTD, providing handy shortcuts and helping to ensure consistency in the rules expressed by the DTD. These reusable objects can consist of text strings, such as legal boilerplate, or more complex data element and attribute combinations, defined in advance and recalled when needed. Entities can be internal to the DTD or stored as fragments externally.37
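As a rough illustration of an internal entity used for boilerplate text, the following Python sketch declares one substitution string and recalls it twice. The entity name (disclaimer), the element names, and the wording of the boilerplate are all hypothetical.

```python
import xml.etree.ElementTree as ET

# The entity "disclaimer" is declared once in the internal DTD subset
# and referenced wherever the boilerplate is needed.
DOC = """<?xml version="1.0"?>
<!DOCTYPE Offer [
  <!ENTITY disclaimer "Fares are subject to change without notice.">
]>
<Offer>
  <Fare>Buffalo to Boston</Fare>
  <Terms>&disclaimer;</Terms>
  <Footer>&disclaimer;</Footer>
</Offer>"""

root = ET.fromstring(DOC)
terms = root.findtext("Terms")
footer = root.findtext("Footer")

print(terms)            # Fares are subject to change without notice.
print(terms == footer)  # True
```

Changing the text in the single ENTITY declaration changes it everywhere it is referenced, which is how entities help keep the rules consistent.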
Entities also help when placing a character inside a character data or CDATA section of an XML document that would cause confusion with the processing of the XML, such as &, <, >, and ".
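A brief Python sketch shows these predefined entities at work. The agency name and the AgencyName wrapper used here are made up for illustration; the escape function comes from Python's standard library.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Text containing characters that would otherwise confuse an XML parser.
raw = "Peabody & Beebe <Travel>"

# escape() replaces &, < and > with the predefined entity references.
escaped = escape(raw)
print(escaped)  # Peabody &amp; Beebe &lt;Travel&gt;

# Quotation marks are only replaced when asked for explicitly.
quoted = escape('the "best" fare', {'"': "&quot;"})
print(quoted)  # the &quot;best&quot; fare

# A parser turns the entity references back into the original characters.
parsed = ET.fromstring(f"<AgencyName>{escaped}</AgencyName>").text
print(parsed == raw)  # True
```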
Consider the telephone number in the following example. The boldfaced element <Telephone> is a substitution string declared as an entity in the schema DTD telephone-usa.xml, and then included as needed in XML documents based on that DTD. The OpenTravel Alliance uses this technique in its customer profile, which specifies several telephone numbers (customer, emergency contact, travel agency, and so on). The use of this technique simplifies the schema DTD and guarantees that all telephone numbers in the valid messages are defined consistently.38
<?xml version="1.0"?>
<!DOCTYPE Cust.Telephone SYSTEM 'http://xml.org/telephone-usa.xml' []>
<Cust.Telephone PhoneTech="Voice" PhoneUse="Home">
  <Telephone CountryAccessCode="1">
    <Phone.AreaCityCode>703</Phone.AreaCityCode>
    <Phone.Number>555-9999</Phone.Number>
  </Telephone>
</Cust.Telephone>
Example of Building a Data Model and XML Equivalent
Using a traveler's customer profile, we can show an example of a DTD and how it helps build and validate an XML message.
Table 4.1 shows the pieces of information in a scaled-down traveler profile database. It lists three levels in the data hierarchy, the content of each level (element, text, or attribute), single or multiple occurrences, a requirement indicator, and allowable options.
The control information identifies the creator of the profile (a travel agency, for the purpose of this exercise), whether it's a new record or an update, whether the customer has given permission to share the data in the profile, and a date/time stamp that most systems can generate routinely.
Table 4.1: Traveler Profile Database Structure
| Data level 1 | Data level 2 | Data level 3 | Content | Occurs | Required? | Options |
| --- | --- | --- | --- | --- | --- | --- |
| Control info | | | Element | Single | Yes | |
| | Share permission? | | Attribute | | | Yes, No |
| | Agency | | Element | Single | Yes | |
| | | Agency name | Text | Single | Yes | |
| | | Agency ID | Text | Single | | |
| | New/Update | | Text | Single | Yes | New, Update |
| | Date-time | | Text | Single | Yes | |
| Traveler ID | | | Element | Multiple | Yes | |
| | Traveler name | | Element | Single | Yes | |
| | | Title | Text | Multiple | | |
| | | Family name | Text | Single | Yes | |
| | | Given names | Text | Multiple | | |
| | Address | | Element | Multiple | Yes | |
| | | Address type | Attribute | | | Mailing, Delivery |
| | | Number/street | Text | Single | Yes | |
| | | Room/floor | Text | Multiple | | |
| | | City name | Text | Single | Yes | |
| | | Postal code | Text | Single | Yes | |
| | | State/Province | Text | Multiple | | |
| | | Country | Text | Single | | |
| | Telephone | | Element | Multiple | Yes | |
| | | Telephone use | Attribute | | | Work, Home |
| | | Country access | Text | Single | | |
| | | Area/city code | Text | Single | Yes | |
| | | Tel. number | Text | Single | Yes | |
| | Email | | Element | Multiple | | |
| | | Email type | Attribute | | | Work, Personal |
| | | Email address | Text | Single | | |
| Form of payment | | | Element | Multiple | Yes | |
| | Payment type | | Attribute | | | Credit card, Debit card |
| | Payment detail | | Element | Multiple | Yes | |
| | | Card number | Text | Single | Yes | |
| | | Exp. date | Text | Single | Yes | |
| | | Name on card | Text | Single | Yes | |
| Travel preferences | | | Element | Multiple | | |
| | General | | Element | Multiple | | |
| | | Smoking section | Text | Single | | Smoking, Non-smoking |
| | | Meal preferences | Text | Multiple | | |
| | | Special needs | | Multiple | | |
| | Loyalty programs | | Element | Multiple | | |
| | | Program type | Attribute | | | General, Airline, Hotel, Rental car |
| | | Program name | Text | Single | | |
| | | Program ID | Text | Single | | |
| | Airline | | Element | Multiple | | |
| | | Departure airport | Text | | | |
| | | Seat selection | Text | | | Aisle, Center, Window |
| | Hotel | | Element | Multiple | | |
| | | City section | Text | | | Downtown, Suburbs, Airport |
| | | Room type | Text | | | Single, Double |
| | Car rental | | Element | Multiple | | |
| | | Car type | Text | | | Compact, Midsize, Full |
| | | Child seat | Text | Single | | Yes |
The DTD for this database structure (Traveler.dtd) is found on this book's web site (http://www.ebxmlbooks.com). Please note that this DTD example is meant only to illustrate how a DTD works, and should not be used for normal business messages.
From this database structure, a travel agency wants to create a traveler profile record for a traveler, with the following specific data and preferences:
Administrative control data
- Agency name: GoGo Travel
- Agency ID code: ZZY98234
- Purpose of record: new
- Date/time: 21 June 2001, 3:55 pm
- Permission to share data in profile? No
Traveler identification
- Traveler's name: Ms. Phoebe P. Peabody-Beebe
- Address (delivery): 312 Sycamore St., Buffalo, NY 14204
- Telephone (work): 716-555-9999
- Email: Phoebe@PeabodyBeebe.com
Payment data
- Type of payment: Credit card
- Card number: 0000111122223333
- Expiration date: 12/2002
- Name on card: Phoebe P Peabody-Beebe
Preferences
- Nonsmoking
- Meal type: Vegetarian
- Loyalty program (airlines): US Airways, no. 24680
- Loyalty program (car rental): National Car Rental, no. 54321
- Loyalty program (general): AmEx Membership Miles, no. 09876
- Departure airport (IATA code): BUF
- Airline seat preference: Aisle
- Hotel, city section preference: downtown
- Hotel room preference: single
- Car type preference: Compact
Listing 4.4 gives a validated XML document for these entries based on the rules presented in Traveler.dtd.
Listing 4.4 Sample XML Document Based on Traveler.dtd
<Traveler>
  <Control>
    <Agency>
      <AgencyName>Go-Go Travel</AgencyName>
      <AgencyID>ZZY98234</AgencyID>
    </Agency>
    <Purpose>New</Purpose>
    <DateTime>20010621t15:55:00</DateTime>
  </Control>
  <TravelerID Share="No">
    <TravelerName>
      <Title>Ms</Title>
      <Family>Peabody-Beebe</Family>
      <Given>Phoebe</Given>
      <Given>P.</Given>
    </TravelerName>
    <Address AddressType="Deliver">
      <NumberStreet>312 Sycamore St</NumberStreet>
      <City>Buffalo</City>
      <PostalCode>14204</PostalCode>
      <StateProv>NY</StateProv>
    </Address>
    <Telephone PhoneUse="Work">
      <AreaCity>716</AreaCity>
      <PhoneNumber>555-9999</PhoneNumber>
    </Telephone>
    <Email>
      <EmailAddress>Phoebe@PeabodyBeebe.Com</EmailAddress>
    </Email>
  </TravelerID>
  <Payment>
    <PayDetail>
      <CardNumber>0000111122223333</CardNumber>
      <ExpDate>12/2002</ExpDate>
      <NameOnCard>Phoebe P Peabody Beebe</NameOnCard>
    </PayDetail>
  </Payment>
  <Preferences>
    <General>
      <Smoking>Non-smoking</Smoking>
      <MealPref>Vegetarian</MealPref>
    </General>
    <Loyalty LoyalType="Airline">
      <LoyalName>US Airways</LoyalName>
      <LoyalID>24680</LoyalID>
    </Loyalty>
    <Loyalty LoyalType="Car Rental">
      <LoyalName>National Car Rental</LoyalName>
      <LoyalID>54321</LoyalID>
    </Loyalty>
    <Loyalty LoyalType="General">
      <LoyalName>Amex Member Miles</LoyalName>
      <LoyalID>09876</LoyalID>
    </Loyalty>
    <Airline>
      <DepartAirport>BUF</DepartAirport>
      <SeatSelect>Aisle</SeatSelect>
    </Airline>
    <Hotel>
      <CitySection>Downtown</CitySection>
      <RoomType>Single</RoomType>
    </Hotel>
    <CarRent>
      <CarType>Compact</CarType>
    </CarRent>
  </Preferences>
</Traveler>
This message referencing the Traveler.dtd contains all of the required data, uses tags that match the element names in the DTD, presents the elements and tags in the order prescribed by the DTD, and therefore conforms as a valid structure to that DTD. Notice that the example doesn't have any data for child seat preferences listed under the XML car rentals section, but does have three different loyalty programs listed. The rules expressed in the DTD allow for such variations. However, if a message left out the traveler's name, a validating parser would return an error message accordingly.
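The kind of check a validating parser performs on the traveler's name can be approximated by hand. The Python sketch below is not DTD validation and does not use Traveler.dtd; the rules table is an assumption drawn from the traveler name rows of Table 4.1, and the DTD declaration shown in the comment is only a guess at an equivalent content model. The sketch counts child elements and reports any that are missing, repeated too often, or undeclared. It does not check element order, which a real DTD would also enforce.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical rules for <TravelerName>, written with DTD-style occurrence
# indicators: "" = exactly once, "?" = optional, "*" = zero or more,
# "+" = one or more. A comparable DTD declaration might read:
#   <!ELEMENT TravelerName (Title*, Family, Given*)>
NAME_RULES = {"Title": "*", "Family": "", "Given": "*"}

# Minimum and maximum occurrences for each indicator (None = unbounded).
LIMITS = {"": (1, 1), "?": (0, 1), "*": (0, None), "+": (1, None)}


def occurrence_errors(element, rules):
    """Return a list of occurrence problems found among element's children."""
    counts = Counter(child.tag for child in element)
    errors = []
    for tag, indicator in rules.items():
        low, high = LIMITS[indicator]
        n = counts.get(tag, 0)
        if n < low:
            errors.append(f"{tag}: required but missing")
        elif high is not None and n > high:
            errors.append(f"{tag}: occurs {n} times, at most {high} allowed")
    for tag in counts:
        if tag not in rules:
            errors.append(f"{tag}: not declared")
    return errors


good = ET.fromstring(
    "<TravelerName><Title>Ms</Title><Family>Peabody-Beebe</Family>"
    "<Given>Phoebe</Given><Given>P.</Given></TravelerName>"
)
bad = ET.fromstring("<TravelerName><Given>Phoebe</Given></TravelerName>")

print(occurrence_errors(good, NAME_RULES))  # []
print(occurrence_errors(bad, NAME_RULES))   # ['Family: required but missing']
```

The two given names in the first message pass because the rule allows repetition, while the second message fails because the required family name is absent.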
XML Schema
The generic name for DTDs is schemas, a term borrowed from the database world. DTDs represent data only in a hierarchy, which works fine for documentation; remember that the W3C borrowed DTDs from SGML, designed for electronic documentation and the predecessor to XML.
However, many business databases use other kinds of structures (such as relational databases or object-oriented classes and properties), some of which don't always lend themselves to a hierarchical model. In some cases, particularly when working with a simple data structure, data architects have been able to adapt object-oriented structures or relational data models to the kind of hierarchies represented in DTDs. But business doesn't always deal a simple hand, and technologists need more robust and flexible tools than the DTD to be prepared for these more complex conditions.
The W3C has developed XML Schema, a major enhancement to XML that offers extended tools for representing information structures and objects, as well as providing extended datatypes beyond those in DTDs. In May 2001, XML Schema reached full recommendation status.39
XML Schema provides more power for defining the structure, content, and semantics of XML documents. The W3C specifications document has three parts:
- Methods for describing the structure of data
- Definition of datatypes
- A primer, explaining its features40
The first part of the specification deals with structures, documenting the meaning, use, and relationships of the components of an XML document, such as elements, attributes, and entities. It provides the rules for validating XML documents, based on the rules described in the schemas. It also allows for referencing partial or multiple schemas, thus providing a great deal more flexibility and power than DTDs.41
The second part of XML Schema covers datatypes and addresses the need for defining more kinds of data in the rules used to validate XML documents. This part of the specification identifies a group of basic (or primitive) datatypes such as strings, integers, dates, and sequences. The specification describes features of a datatype system, including acceptable ranges of values and valid representations of the data (such as whole numbers or scientific notation).
The specification identifies datatypes derived from those built into the basic XML recommendations, such as character data (CDATA), tokens, and entities. And it defines various components of datatypes to allow for the development of unanticipated datatypes.42
This greater flexibility comes with a price, however. While it's tempting to use many of these new features, many business applications require just a few of them at any time. For example, being able to validate dates and times will be a significant addition to XML's ability to support business. Few businesses, however, will need the ability to create entirely new datatypes. Software and systems supporting XML Schema will need to resist the temptation to cover all of the bells and whistles, since they build in more complexity and cost than is needed.43
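To give a feel for what date and time validation adds, here is a minimal Python sketch. It is not an XML Schema processor: it only approximates the basic dateTime pattern (CCYY-MM-DDThh:mm:ss) with the standard library, ignores fractional seconds and time zones, and is slightly more lenient than the specification about leading zeros. The function name is hypothetical.

```python
from datetime import datetime

# Simplified stand-in for the XML Schema dateTime lexical form
# (CCYY-MM-DDThh:mm:ss); fractional seconds and time zones are ignored.
DATETIME_FORMAT = "%Y-%m-%dT%H:%M:%S"


def is_valid_datetime(text):
    """Return True if text matches the simplified pattern and names a real date."""
    try:
        datetime.strptime(text, DATETIME_FORMAT)
    except ValueError:
        return False
    return True


print(is_valid_datetime("2001-06-21T15:55:00"))  # True
print(is_valid_datetime("2001-06-31T15:55:00"))  # False: June has 30 days
print(is_valid_datetime("20010621t15:55:00"))    # False: separators missing
```

A DTD can only say that a date/time element holds character data, so the last two values would pass DTD validation unchanged; a datatype-aware schema is what allows them to be rejected.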
As an alternative, an OASIS Technical Committee is developing RELAX NG, with eventual submission to ISO planned. RELAX NG is designed as a simpler and more accessible approach to providing schema functionality for XML documents.44
Other Details
XML Schema incorporates one of the first enhancements to the XML specification, called XML Namespaces. With XML Namespaces, schemas can address multiple XML vocabularies in a single document. Namespaces provide for uniqueness in element names by combining the namespace prefix (mapped to a uniform resource identifier, like a web address) with the local part, that is, the element or attribute name.45
Put simply, XML Namespaces allow different companies or industries to avoid name clashes where they both use the same word with different meanings or contexts, but with the same tag name. An example is the word stock, which has at least six possible meanings. A typical remedy is to use prefixed names such as billing:address and supplier:address to make clear that address is being used in two different contexts.
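The billing:address and supplier:address case can be sketched in a few lines of Python. The namespace URIs, the order wrapper element, and the supplier's address are hypothetical; any unique URI would serve. The sketch shows that a namespace-aware parser treats the two address elements as different names, because each is qualified by the URI its prefix maps to.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs and sample addresses.
DOC = """
<order xmlns:billing="http://example.com/ns/billing"
       xmlns:supplier="http://example.com/ns/supplier">
  <billing:address>312 Sycamore St., Buffalo, NY 14204</billing:address>
  <supplier:address>1 Factory Road, Rochester, NY 14604</supplier:address>
</order>
"""

# Prefix-to-URI map used when searching; the prefixes here are local to
# this code and need not match those used in the document.
NS = {
    "billing": "http://example.com/ns/billing",
    "supplier": "http://example.com/ns/supplier",
}

root = ET.fromstring(DOC)
billing = root.find("billing:address", NS)
supplier = root.find("supplier:address", NS)

print(billing.tag)   # {http://example.com/ns/billing}address
print(supplier.tag)  # {http://example.com/ns/supplier}address
print(billing.tag == supplier.tag)  # False
```

The parser reports each tag as the URI plus the local name, so the prefix itself is only shorthand; what makes the names unique is the URI behind it.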