Influences on the Design of XQuery
Introduction
The emergence of the World Wide Web in the 1990s was a seminal event in human culture. Suddenly, as if overnight, a significant fraction of the world's computers were connected, not only by a physical network but also by a common protocol for exchanging information. The Web offered an unprecedented opportunity to make information truly ubiquitous. It seemed to promise that people would no longer need to move physically to places and times where information was available, since all information would be everywhere, all the time.
Realizing this promise required some organizing principle for the exchange of information. This principle had to be independent of any particular language or application and easily extensible to new and unanticipated kinds of information. At present, the leading candidate for this organizing principle is the Extensible Markup Language, XML [XML]. XML provides a neutral notation for labeling the parts of a body of information and representing the relationships among these parts. Since XML does not attach any semantic meaning to its labels, applications are free to interpret them as they see fit. Applications that agree on a common vocabulary can use XML for data interchange. Since XML does not mandate any particular storage technique, it can be used as a common interchange format among systems that store data in file systems, relational databases, object repositories, and many other storage formats.
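The point that XML attaches no meaning to its labels can be illustrated with a small sketch. The example below is written in Python using the standard xml.etree.ElementTree module; the document, its element names (order, item, qty, price), and the two functions are hypothetical and invented purely for illustration. It shows two applications that agree on a vocabulary but interpret the same labels for different purposes.

```python
import xml.etree.ElementTree as ET

# Hypothetical interchange document; the vocabulary is an assumption,
# not something defined by XML itself.
DOC = """
<order id="A17">
  <item sku="X1"><qty>2</qty><price>9.50</price></item>
  <item sku="Y2"><qty>1</qty><price>20.00</price></item>
</order>
"""

def billing_total(xml_text):
    """A billing application reads <qty> and <price> as an amount owed."""
    root = ET.fromstring(xml_text)
    return sum(int(item.findtext("qty")) * float(item.findtext("price"))
               for item in root.iter("item"))

def picking_list(xml_text):
    """A warehouse application reads the same labels as units to pick."""
    root = ET.fromstring(xml_text)
    return {item.get("sku"): int(item.findtext("qty"))
            for item in root.iter("item")}
```

Neither function depends on how the order is stored at its source; each sees only the XML representation and applies its own interpretation to the shared labels.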
Since XML is emerging as a universal format for data interchange among disparate applications, it is natural for queries that cross application boundaries to be framed in terms of the XML representation of data. In other words, if an application is viewed as a source of information in XML format, it is logical to pose queries against that XML format. This is the basic reason why a query language for XML data is extremely important in a connected world.
Recognizing the importance of an XML query language, the World Wide Web Consortium (W3C) [W3C] organized a query language workshop called QL'98 [QL98], which was held in Boston in December 1998. The workshop attracted nearly a hundred participants and fostered sixty-six papers investigating various aspects of querying XML data. One of the long-term outcomes of the workshop was the creation of a W3C working group for XML Query [XQ-WG]. This working group, chaired by Paul Cotton, met for the first time in September 1999. Its initial charter called for the specification of a formal data model and query language for XML, to be coordinated with existing W3C standards such as XML Schema [SCHEMA], XML Namespaces [NAMESP], XML Information Set [INFOSET], and XSLT [XSLT]. The purpose of the new query language was to provide a flexible facility to extract information from real and virtual XML documents. Approximately forty participants became members of the working group, representing about twenty-five different companies, along with a W3C staff member to provide logistical support.
One of the earliest activities of the XML Query Working Group was to draw up a formal statement of requirements for an XML query language [XQ-REQ]. This document was quickly followed by a set of use cases [XQ-USE] that described diverse usage scenarios for the new language, including specific queries and expected results. The working group undertook to define a language with two alternative syntaxes: a keyword-based syntax called XQuery [XQ-LANG], optimized for human reading and writing, and an XML-based syntax called XQueryX [XQ-X], optimized for machine generation. This chapter describes only the keyword-based XQuery syntax, which has been the major focus of the working group.
Creating a new query language is a serious business. Many person-years have been spent in defining XQuery, and many more will be spent on its implementation. If the language is successful, developers of web-based applications will use it for many years to come. A successful query language can enhance productivity and serve as a unifying influence in the growth of an industry. On the other hand, a poorly designed language can inhibit the acceptance of an otherwise promising technology. The designers of XQuery took their responsibilities very seriously, not only in the interest of their individual companies but also in order to make a contribution to the industry as a whole.
The purpose of this chapter is to discuss the major influences on the design of the XQuery language. A tutorial introduction to XQuery appears in Chapter 1. Some of the influences on XQuery were principles of computer language design. Others were related languages, interfaces, and standards. Still others were "watershed issues" that were debated by the working group and resolved in ways that guided the evolution of the language. We discuss several of these watershed issues in detail, including the alternatives that were considered and the reasons for the final resolution.
This chapter is based on the most recent XQuery specification at the time of publication. At this time, the broad outline of the language can be considered reasonably stable. However, readers should be cautioned that XQuery is still a work in progress, and the design choices discussed here are subject to change until the language has been approved and published as a W3C Recommendation.