Why XML?

W3C Goals for XML

The W3C's original goals for XML as well as the current practices of creating and moving XML reflect a new politics of information. In at least seven of these goals, people issues are dominant. Since we are focused primarily on this softer side of XML, we offer a brief statement of benefit at the end of each discussion of goals.

  1. XML shall be straightforwardly usable over the Internet.

This may be the first time you have ever seen the word "straightforwardly." The term is used frequently by technologists who would say "easy" to one another but not to a supervisor or customer. "Straightforwardly" means roughly, "We now have all the pieces on the bench, and we know precisely in our heads how it should all work." For XML, straightforward usability over the Internet does not yet mean "directly to a Web browser." A Web content provider cannot publish just any XML content without other supporting technologies, as we shall see.

The original XML working group had agendas in mind that may have changed somewhat, and the history of XML-to-the-browser has turned out to be somewhat bumpy. Nevertheless, XML is indeed made to order for the Internet. First, XML markup makes content recognizable by Internet-connected partners, regardless of computer and software platforms. Much of this book focuses on XML as the ultimate platform neutralizer for content.

XML also participates in supporting the Internet itself. Within the more local layers of Internet activity, XML serves as plumbing. It is common engineering practice for separate software modules to use XML for communicating with one another. One such open standards initiative is XML-RPC. When one software module requires the resources of another module, it has long been an engineering design practice for the consumer module to issue a remote procedure call (RPC). XML makes that strategy more open. With XML-RPC the messaging between those modules consists of tiny XML documents. This makes it easier for a software integrator to combine software components from different vendors.
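To make the "tiny XML documents" of XML-RPC concrete, here is a minimal sketch using the xmlrpc.client module from Python's standard library. The method name inventory.getPrice and its arguments are invented for illustration; no real service is contacted, and the sketch only encodes and decodes the message.

```python
import xmlrpc.client

# Hypothetical remote procedure and arguments, made up for this example.
request_xml = xmlrpc.client.dumps(
    ("widget-42", 3), methodname="inventory.getPrice"
)

# The request is nothing more than a small, plain-text XML document.
print(request_xml)

# The receiving module decodes the same document back into native values.
params, method = xmlrpc.client.loads(request_xml)
print(method, params)
```

Because the message is ordinary XML text, the module on the other end could be written in any language that has an XML-RPC library.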

Electronic business Extensible Markup Language (ebXML) is an international initiative that is well on schedule toward enhancing international trade. One important component of ebXML is a system-level messaging specification for transfer, routing, and packaging. These are functions that originally were viewed as part of Internet protocol.

With applications like these, XML enables separate software components and widely separated and heterogeneous computers to talk to one another across large global networks. It is only a slight exaggeration therefore to say that the Internet is XML. This goal has been a self-fulfilling prophecy.

Benefit: As XML, my content is already that much closer to being Web- and Internet-ready. Consequently my development costs are reduced. Additionally, because XML content is open, my investment in Internet-based XML is preserved.

  2. XML shall support a wide variety of applications.

SGML is the basis of XML. We discuss this at greater length in Chapter 2. The "G" of SGML is a deliberate and comprehensive design principle that guarantees that SGML can work for all structured information. (Much of this book, in fact, was written using an SGML-enabled word processor.)

The actual track record of SGML demonstrated that it fared well in certain sectors of the electronic information world. One early adopter was the U.S. Internal Revenue Service (IRS). Another notable player is the U.S. Department of Defense (DoD), which ordered the use of SGML to facilitate electronic commerce with the hope of streamlining its massive acquisitions procedures. The name of that initiative was Computer-aided Acquisition and Logistics Support (CALS), later changed to Continuous Acquisition and Lifecycle Support. As CALS has broadened to the commercial sector, the title has changed to Commerce At Light Speed. CALS, as an SGML application, is still primarily a technology for electronic commerce. Another notable application of SGML is the very large specification developed and enforced by the Air Transport Association (ATA). The ATA standard supports the manufacturing, operation, and maintenance of commercial aircraft around the world.

The planners of XML wanted to make certain that XML continued the tradition of supporting information of every variety. But they needed to see that variety expand dramatically. To make sure this was accomplished they sought every means of keeping the language as simple as possible. The strategy with XML is not to change your business practices to fit an information standard. Rather it is to provide a building-block language that allows you to extend it in ways that fit your existing business practices. This has always been the philosophy of SGML, but it was spelled out more forcefully with XML. The goal of "wide variety" expresses the hope that information designers from any industry can easily form a consortium, using (extending) XML to invent a language that fits.

In order for an industry or interest group to establish the sort of information exchange that XML can support, there is typically a substantial run-up effort of meetings, standards drafts, protocol definitions, and finally, the full definition of a markup language. In spite of very high front-loaded costs, there are dozens of major XML initiatives underway. Table 1.1 is only a tiny sample of XML-based initiatives. This sample offers a hint of the variety of applications possible.

Table 1.1 A sample of XML-based initiatives

| Language | Full Name | Application/Arena |
| --- | --- | --- |
| XBRL | eXtensible Business Reporting Language | Financial reporting, including EDGAR filings to the SEC |
| WML | Wireless Markup Language | Wireless Application Protocol (WAP) Forum |
| NewsML | News Markup Language | Creation, transfer, and delivery of news |
| NLMCommon DTD | National Library of Medicine Common document type definition | Support of upgraded services of MEDLINE at National Library of Medicine |
| FpML | Financial Products Markup Language | E-commerce activities in the field of financial derivatives |
| CIDX | Chemical Industry Data Exchange | Buying, selling, and delivery of chemicals |


Benefit: The broad variety of maturing, industry-specific applications assures me of a shorter learning curve and decreased risk of failure of my XML-based initiative.

  3. XML shall be compatible with SGML.

In most of the 85+ books on XML, SGML typically makes a cameo appearance as something of a historical relic. SGML is historically significant for XML, but there is more to SGML's role than that. Had it not been for over a decade of serious SGML activity, XML probably never would have happened. On its own XML could never have attained the fast-track maturity that it now enjoys. (The first draft of XML was released at the November 1996 SGML meeting in Boston.) So while SGML is related historically to XML, it is more than a parent. In the mathematical sense, XML is a subset of SGML. We will take up the meaning of that statement in Chapter 2.

To view the entire XML movement effectively, we must not conclude that first we had SGML and later XML came along. These are not separate initiatives, as we have said. To view them as separate movements can impair our perception of XML's own maturity and robustness.

There exists a significant investment in SGML legacy content within many organizations. After all, the promise of SGML was that its markup would persist for millennia and still be readable after every current vendor had long since left the marketplace. XML provides for these organizations an orderly migration path to the future. But a potential adopter of XML, even one with a large SGML store, needs to be clear about the relationship.

The precise relationship between XML, SGML, and the many hundreds of MLs now in existence is baffling to the newcomer. It makes due diligence more difficult, and journalists' constant misuse of terms hinders fact-gathering even more. Assuming that one of the reasons you are reading this book is to clarify these differences, we feel that it is important to get these distinctions right. We feel that so strongly that we dedicate an entire chapter (Chapter 2) precisely to this topic.

In articulating the purpose of XML, the XML Working Group did not discard any of the beneficial features of SGML. In fact, most of the new appeal of XML comes from extensions of features that were part of SGML all along.

XML is not some sort of hostile intellectual takeover of the markup world from SGML. Quite the contrary, XML is a clear affirmation that markup technologies are here to stay.

Benefit: XML, as a follow-on to SGML, is in reality a 15-year-old, proven cluster of technologies. This maturity of markup technologies mitigates our risk of failure and false starts.

  4. It shall be easy to write programs which process XML documents.

Since XML is plain text, an XML processor should be (at least partly) only a text processor. It should not require you to reinvent the likes of Microsoft Word in order to interpret and process an XML document. On the other hand, the choice of easy was somewhat misguided. (They used easy for data creation, which is probably acceptable.) The concept of easy does not combine well with programming. It is indeed easy to concoct half-page examples. We include many in this book. It is quite another matter to develop large XML-based systems to accomplish genuinely useful work.

Even so, five years of history have shown the requirement to be largely fulfilled. This is because XML is by definition structured. (We dwell on this in Chapters 3-6.) That automatically makes it a good fit for object-oriented (OO) programming languages and for the many scripting languages in use for rapid systems deployment. And since XML is plain text throughout, that greatly eases the learning curve for developers of XML processing software. This requirement statement is a backhanded plea for the programming apparatus supporting XML also to be kept simple, within the skill set of average IT organizations.
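As a small illustration of how naturally structured XML maps onto objects in a scripting language, consider the following sketch. It uses only Python's standard xml.etree.ElementTree module. The order vocabulary (order, item, sku, qty) and the Item class are invented for this example and do not come from any particular standard.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# A made-up order document; the element and attribute names are hypothetical.
ORDER_XML = """<order id="A-100">
  <item sku="X1" qty="2">Bracket</item>
  <item sku="X2" qty="5">Washer</item>
</order>"""


@dataclass
class Item:
    sku: str
    qty: int
    name: str


def parse_order(xml_text):
    """Turn the XML text into an order id and a list of Item objects."""
    root = ET.fromstring(xml_text)
    items = [
        Item(sku=e.get("sku"), qty=int(e.get("qty")), name=e.text)
        for e in root.findall("item")
    ]
    return root.get("id"), items


order_id, items = parse_order(ORDER_XML)
total_qty = sum(item.qty for item in items)
print(order_id, total_qty)
```

A half-page example like this one is, of course, exactly the easy case. The point is only that the tree structure of the markup and the object structure of the program line up without special tooling.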

An aside regarding documents: The XML working group may have used the term documents, but XML was destined from the start to go far beyond traditional print products. Documents in the XML specifications (and in this book) refer to electronic content of every type and medium.

Benefit: Because XML content, no matter how daunting it may appear to a newcomer, is and always will be plain text, there is no possibility of a single vendor's thwarting our ability to author and manage XML. This open systems approach guarantees our freedom always to select among vendors of our choice.

  5. The number of optional features in XML is to be kept to the absolute minimum, ideally zero.

The phenomenon of feature creep is well understood and properly feared by every product manager. Every product that is in wide use must be tightly controlled, offering only those features that will enhance its survival. The same is even more true for an open standard. But XML is not an International Organization for Standardization (ISO) standard, so it does not enjoy the protection and control of a standard. XML therefore could have been threatened by many assaults to accommodate this or that technological opinion or vertical industry. Instead of features, the W3C has fostered the growth and maturity of XML by engendering new and separate working groups' initiatives.

The core XML standard is mature and intact. As later chapters will demonstrate, when an industry or group wishes to add features to a particular XML markup language, that only requires extensions. Designers using ebXML, for example, may elect to add significant facilities for security. XML is designed precisely to accomplish that: to add features to the particular language (here ebXML) without requiring any change whatsoever to XML itself. Pain-free extensibility is what XML is all about.
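The following sketch suggests what that kind of extension can look like in practice. The message vocabulary and the urn:example:security namespace are invented for illustration and are not taken from ebXML. An older program that knows only the original elements keeps working when a newer document carries an extra, namespaced element, and nothing about XML itself has changed.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary: the original language has only <to> and <body>.
BASIC = "<message><to>acme</to><body>Ship 40 units</body></message>"

# An extended document adds a security element in its own (invented) namespace.
EXTENDED = """<message xmlns:sec="urn:example:security">
  <to>acme</to>
  <body>Ship 40 units</body>
  <sec:signature>abc123</sec:signature>
</message>"""


def read_message(xml_text):
    """An 'old' consumer: it looks only for the elements it knows."""
    root = ET.fromstring(xml_text)
    return root.findtext("to"), root.findtext("body")


def read_signature(xml_text):
    """A 'new' consumer that also understands the extension."""
    root = ET.fromstring(xml_text)
    return root.findtext("{urn:example:security}signature")


print(read_message(BASIC))
print(read_message(EXTENDED))
print(read_signature(EXTENDED))
```

The old consumer returns the same result for both documents, while the new consumer finds the signature only where it is present.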

Benefit: The investment in an XML application is safe, because there is virtually no risk of an XML application's becoming obsolete.

  6. XML documents should be human-legible and reasonably clear.

At first glance this seems to be a curious requirement, somewhat out of place in the highly structured world of XML. It is quite at home in the world of software engineering, where human programmers constantly access code. And it is, of course, a nice thing for all documents to be clear and legible to humans. But why is this nice-to-have a need-to-have in XML? And what are the actual determiners of XML content's appearance?

First, the human-readable requirement has been part of open systems markup all along. So this is not something that just occurred to the XML designers. XML is for the consumption of computers, processors that must render it, transform it, segment it or do whatever is appropriate. But XML content must be visible and readable by humans at every stage of the information food chain: editors, designers, programmers, authors, and at times even end users. This shortens the learning curve and makes troubleshooting easier (imagine if you needed to have a tool just to read the characters in an XML file).

Second, how clear and by whose standards? There are three main contributors to XML:

  1. Information architect, the designer of the content's structure

  2. Database specialist, the manager of the content's storage and retrieval

  3. Stylist, the packager who typically determines the content's appearance

In a perfect world, each of these functions would be kept strictly separate from the others, and the marked-up content would reflect that orderly separation. In actual practice, all three activities converge, often with unpleasant results. XML files for real business frequently are a nearly unreadable mixture of pointy-bracket markup, database language directives, and styling detail. Each specialist's (or group of specialists') native expression has become part of the markup. The result is that XML in actual practice may be nearly illegible and far from clear. Nothing that accomplishes useful work in the real workplace is going to be easily readable, clear, and legible. Consequently, much of current markup practice generates documents that are anything but human-legible and reasonably clear.

Third, it is now common for massive amounts of XML content to be created not by humans but by e-commerce servers and other computers within some automated workflow. As we will see in Chapter 4, the computer views its payload in a manner that is blind to human legibility. And since machines, not humans, are generating an increasingly large proportion of XML content, it may be unreasonable to expect legibility and clarity. There is little hope that content created in that way can ever meet the standard of easy human legibility.

On the other hand, XML content, because it is structured, is predictable. That in itself is a strong contributor to legibility. You will be viewing much XML source directly throughout the book and will very soon be able to get a strategic grasp of any XML content.

Benefit: The skill set required for reading XML content is minimal and therefore less costly. This open accessibility to human-legible content means that professionals from all disciplines can feel at home with the same XML content. That eliminates much of an on-going need for expensive specialists and technology transfer.

  7. The XML design should be prepared quickly.

The design of XML did in fact proceed quickly, resulting in a full recommendation, XML 1.0, in February 1998. The W3C is a workplace consortium that issues recommendations, not a representational body that votes on standards. The process for XML (the core specification) was the same as for all of the W3C's 45 specifications. The milestones (Recommendation Track) in the formal process of moving from draft through to final released recommendation are as follows:

Working draft. Chartered work item of a working group, representing work in progress and a commitment by W3C to pursue work in a particular area.

Last call working draft. Special instance of a working draft, considered by the working group to fulfill the relevant requirements of its charter and any accompanying requirements documents...a public technical report for which the working group seeks technical review from other W3C groups, W3C members, and the public.

Candidate recommendation. Believed to meet the relevant requirements of the working group's charter and any accompanying requirements documents...an explicit call for implementation experience to those outside of the related working groups or the W3C itself.

Proposed recommendation. Believed to meet the relevant requirements..., to represent sufficient implementation experience, and to adequately address dependencies from the W3C technical community and comments from previous reviewers.

W3C recommendation. End result of extensive consensus-building inside and outside of W3C about a particular technology or policy...appropriate for widespread deployment and promote W3C's mission.

XML in fact became a recommendation in a very short time. But not all of the related family members of XML were that straightforward. The thorniest area of XML has been in styling (how XML content should finally look or sound). This is predictable, because it impacts browsers, programming practices, and established methods for expressing style.

Benefit: The speed of XML's reaching W3C recommendation status meant that the workplace could begin conducting real business in only one year beyond the first draft.

  8. The design of XML shall be formal and concise.

You are not likely to read many of the formal specifications that comprise XML recommendations. But you can always be sure that the specification exists, when clarification or appeal is necessary. Since every portion of XML is formally defined, a software developer can rely on the specification as a design document.

The design, as expressed in the specification, has its own formal language. That language is the same for all of the specifications for the XML family.

The first edition of version 1.0 (February 1998) was only 29 pages. The second edition (October 2000) had grown to 54 pages, still a modest size for the formal definition of a language.

Benefit: Thanks to the conciseness of XML, it is economically feasible even for small organizations and trading partners to develop XML-based systems.

  9. XML documents shall be easy to create.

This objective is a worthy legacy of the SGML standard. Creating XML content is indeed easy, because we are dealing with plain text, as always. So in theory, creation is possible with even the lowest-grade text editing tool, running on any computer. But the demands of structuring complex content make a structure-aware editor almost mandatory, as it is for HTML. As we have observed for the first objective, easy should probably read straightforward.
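A short sketch shows both halves of that observation. The memo documents below are invented examples typed as plain strings, and the check relies on Python's standard xml.etree.ElementTree parser, which tests only whether a document is well-formed (not whether it conforms to any document type).

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Typed by hand in a plain text editor: perfectly good XML.
typed_by_hand = "<memo><to>Staff</to><subject>Picnic</subject></memo>"

# One small slip: <subject> is closed with the wrong tag.
with_a_slip = "<memo><to>Staff</to><subject>Picnic</memo>"

print(is_well_formed(typed_by_hand))
print(is_well_formed(with_a_slip))
```

A plain editor will happily save the second document. A structure-aware editor would catch the slip as it is typed, which is why such a tool becomes almost mandatory once the content grows complex.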

Benefit: Thanks in part to the flat (format-free) nature of XML content (documents), the skill set requirements for data entry, editing, and management are minimal. The arrival of reasonably priced and powerful XML editors has driven the cost of entry even lower for the potential XML player.

  10. Terseness is of minimal importance.

This caveat originates from software engineering. There it is sometimes considered a virtue to write efficient computer code (i.e., as few lines as possible) to accomplish something. The downside of terseness in computer code is generally that it makes the code unreadable by all but the writer and very few others. With the likelihood of commingled XML markup, HTML, JavaScript, and other scripting within a single file, XML likewise can become terse. For serious, reusable XML content, terseness is not merely of minimal importance; it is unacceptable, because it defeats the objective of human-legible and reasonably clear documents.

Actual XML content, at least as Web page source, is anything but terse. Its verbosity is evident in XML Schemas (the means of representing XML document types using XML itself) and in expressions written for Extensible Stylesheet Language Transformations (XSLT). Fortunately, there is no penalty for this verbosity, because of increased processor speed and decreased memory costs, both for disk and RAM.

Benefit: The verbosity of XML is its own best guarantee that XML content will readily expose itself for maintenance and management.
