XML: The Three Revolutions
At the beginning of this chapter we outlined several areas in which XML's impact has been felt. To understand the changes that are occurring in today's software world, it's helpful to look at XML in the context of three revolutions in which XML is playing a major role.
As Figure 1.7 illustrates, the three areas of impact are data, which XML frees from the confines of fixed, program-dependent formats; architecture, with a change in emphasis from tightly coupled distributed systems to a more loosely coupled confederation based on the Web; and software, with the realization that software evolution is a better path to managing complexity than building monolithic applications. In the following sections we'll explore each in more detail.
Figure 1.7 The three XML revolutions: data, architecture, and software.
The Data Revolution
Prior to XML, data was very much proprietary, closely associated with applications that understood how data was formatted and how to process it. Now, XML-based industry-specific data vocabularies provide alternatives to specialized Electronic Data Interchange (EDI) solutions by facilitating B2B data exchange and playing a key role as a messaging infrastructure for distributed computing.
XML's strength is its data independence. XML is pure data description, not tied to any programming language, operating system, or transport protocol. In the grand scheme of distributed computing this is a radical idea. The implication is that we don't require lock-in to programmatic infrastructures to make data available to Web-connected platforms. In effect, data is free to move about globally without the constraints imposed by tightly coupled transport-dependent architectures. XML's sole focus on data means that a variety of transport technologies may be used to move XML across the Web. As a result, protocols such as HTTP have had a tremendous impact on XML's viability and have opened the door to alternatives to CORBA, RMI, and DCOM, which depend on their own specialized protocols and infrastructures rather than on the open protocols of the Web. XML does this by focusing on data and leaving other issues to supporting technologies.
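A small example makes the point concrete. The fragment below sketches a hypothetical B2B purchase-order vocabulary of the kind an industry group might define; the element names are invented for illustration. Any XML parser on any platform can read it, and it can travel over HTTP or any other transport unchanged.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical purchase-order vocabulary; element names are illustrative only. -->
<purchaseOrder orderDate="2001-06-15">
  <buyer>Example Manufacturing, Inc.</buyer>
  <item partNumber="872-AA">
    <description>Lawnmower blade</description>
    <quantity>12</quantity>
    <unitPrice currency="USD">14.95</unitPrice>
  </item>
</purchaseOrder>
```

Nothing in the document dictates how it is stored, transmitted, or processed; those decisions are left to the applications and transports at either end.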
XML: Origin and Cultures
Although XML is a relatively new technology, its lineage extends back over several decades. Approved by the W3C in 1998, XML is an effort to simplify the Standard Generalized Markup Language (SGML), which, until XML, was the ISO standard for defining data vocabularies. Technically, XML is a subset of SGML designed to facilitate the exchange of structured documents over the Internet. Although SGML, which became an ISO standard in 1986, has been widely used by organizations seeking to structure their documents and documentation (for example, the General Motors parts catalog), its pre-Web complexity has been the main stumbling block to its widespread use and acceptance by the Web community. Figure 1.8 illustrates the relationship between SGML and XML and shows some of the languages derived from each.
Figure 1.8 XML is the successor to SGML. Both are metalanguages that are used to define new data-oriented vocabularies.
The designers of XML took the best parts of SGML and, based on their experience, produced a technology comparable to SGML but much simpler to use. In fact, simplicity and ease of programming were requirements imposed by the W3C on the Working Group responsible for the final XML specification.
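To see what defining a vocabulary means in practice, consider the minimal sketch below. The vocabulary and its element names are invented for illustration: a small internal DTD declares the structure, and the document that follows conforms to it. SGML can express the same declarations, but with considerably more machinery; XML keeps just enough of SGML to make this kind of definition simple.

```xml
<?xml version="1.0"?>
<!-- A tiny, hypothetical vocabulary defined with an internal DTD. -->
<!DOCTYPE partsCatalog [
  <!ELEMENT partsCatalog (part+)>
  <!ELEMENT part (name, price)>
  <!ATTLIST part id CDATA #REQUIRED>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT price (#PCDATA)>
]>
<partsCatalog>
  <part id="GM-1001">
    <name>Alternator</name>
    <price>129.00</price>
  </part>
</partsCatalog>
```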
The Code, Data, and Document Cultures
To understand XML's impact on the computing world, it's useful to place XML in perspective. As Figure 1.9 shows, XML comes out of a document culture that is distinct from the code and data cultures that are the hallmarks of the mainstream computer industry. The code culture is characterized by a focus on programming languages, beginning with FORTRAN and evolving through Algol to C, C++, and Java. The data culture is characterized by COBOL, data processing, and databases. Both the data and code cultures carry with them a built-in propensity to view the world through either a code or a data lens. From a code perspective, data is something to be transported by procedure calls. From a data perspective, data is something to be stored in databases and manipulated.
Figure 1.9 Evolution: from programming languages to objects to components.
The late 1980s and early 1990s saw code and data combine in the form of object-oriented languages such as C++, Smalltalk, Java, and Object COBOL. And yet, object technology was only a partial answer. As practitioners in the data world had long realized, transactions (the ability to update multiple databases in an all-or-none manner) are essential to serious industrial-strength enterprise applications. Because component frameworks provide transactions as a service to applications regardless of language origins, the playing field quickly shifted from objects to components. Thus infrastructures such as CORBA, DCOM, and Enterprise JavaBeans (EJB) provide interconnection, security, and transaction-based services for extending the enterprise. In the mid-1990s, components were the only way to extend legacy systems. However, XML changed the rules of the game.
XML's emergence from the data-oriented document culture has forced a rethinking about application development, particularly for those accustomed to thinking of building applications from a code-based perspective. What XML brings to the computing world is a technology that allows data to be freed from the constraints created by code-centric infrastructures. Instead of requiring data to be subordinated to parameters in a procedure call, XML now permits data to stand on its own. More radically, it allows code to be treated as data, which has been the driving force behind using XML for remote procedure calls. As Figure 1.10 illustrates, XML offers an alternative to both EDI and technologies such as CORBA, RMI, and DCOM that lock data transfer into underlying networks and object infrastructures. It is this change in perspective that is driving the widespread use of XML across the entire computing industry and opening up new patterns of interaction, including Web services.
Figure 1.10 XML in combination with Web protocols allows data to be independent of network, programming language, or platform.
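Treating a call as data is easiest to see in an XML-RPC- or SOAP-style message. The envelope below is a simplified sketch rather than a literal excerpt of either specification; the method name and parameter are carried as ordinary markup, so any transport that can move text can move the call.

```xml
<?xml version="1.0"?>
<!-- Simplified sketch of a remote call expressed as data; names are illustrative. -->
<envelope>
  <body>
    <getQuote>
      <tickerSymbol>XYZ</tickerSymbol>
    </getQuote>
  </body>
</envelope>
```

The receiving system decides how, and in what language, to carry out the request; the call itself is just a document.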
The Architectural Revolution
Together these XML-based technology initiatives open up new possibilities for distributed computing that leverage the existing infrastructure of the Web and create a transition from object-based distributed systems to architectures based on Web services that can be discovered, accessed, and assembled using open Web technologies. The focal point of this change in architectural thinking has been a move from tightly coupled systems based on established infrastructures such as CORBA, RMI, and DCOM, each with its own transport protocol, to loosely coupled systems riding atop standard Web protocols such as HTTP. Although the transport protocols underlying CORBA, RMI, and DCOM provide for efficient communication between nodes, their drawback is their inability to communicate with other tightly coupled systems or directly with the Web.
Loosely coupled Web-based systems, on the other hand, provide what has long been considered the Holy Grail of computing: universal connectivity. Using TCP/IP as the transport, systems can establish connections with each other using common open-Web protocols. Although it is possible to build software bridges linking tightly coupled systems with each other and the Web, such efforts are not trivial and add another layer of complexity on top of an already complex infrastructure. As Figure 1.11 shows, the loose coupling of the Web makes possible new system architectures built around message-based middleware or less structured peer-to-peer interaction.
Figure 1.11 XML in combination with Web protocols has opened up new possibilities for distributed computing based on message passing as well as peer-to-peer interaction.
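In practice, riding atop standard Web protocols often means nothing more exotic than an HTTP POST. The Java sketch below (the endpoint URL and payload are placeholders) sends an XML message to a service using only the standard java.net classes; neither side needs an object broker or any shared infrastructure beyond HTTP itself.

```java
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class XmlOverHttp {
    public static void main(String[] args) throws Exception {
        // Placeholder XML payload and endpoint; substitute a real service.
        String payload = "<?xml version=\"1.0\"?><order><item>872-AA</item></order>";
        URL url = new URL("http://example.com/orders");

        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "text/xml");

        // Write the XML document as the request body.
        try (OutputStream out = conn.getOutputStream()) {
            out.write(payload.getBytes("UTF-8"));
        }

        // The HTTP status code is the only coupling between the two systems.
        System.out.println("Response code: " + conn.getResponseCode());
    }
}
```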
The Software Revolution
XML is also part of a revolution in how we build software. During the 1970s and 1980s, software was constructed as monolithic applications built to solve specific problems. The problem with such large software projects is that, in trying to tackle multiple problems at once, they produce software that is ill-suited to adding new functionality and adapting to technological change. In the 1990s a different model for software emerged, based on the concept of simplicity. As Figure 1.12 illustrates, instead of trying to define all requirements up front, this new philosophy was built around the idea of creating building blocks capable of combining with other building blocks that either already existed or were yet to be created.
Figure 1.12 The software revolution: simplicity and collaboration.
A case in point is the Web. After decades of attempts to build complex infrastructures for exchanging information across distributed networks, the Web emerged from an assemblage of foundational technologies such as HTTP, HTML, browsers, and a longstanding networking technology known as TCP/IP that had been put in place in the 1970s.
Figure 1.13 illustrates how the Web as we know it was not something thought out in strict detail. Each of the contributing technologies focused on doing one thing well without inhibiting interconnection with other technologies. The essential idea was to maximize the possibility of interaction and watch systems grow. The result is the Web, a product of the confluence of forces that include the Internet, HTML, and HTTP. Let's now look at how these same forces of combination and collaboration are driving the revolution in software.
Figure 1.13 The Web itself is an example of combinatoric simplicity in action. HTTP, a simple protocol, combines with browser technology to give us the Web as we know it today.
Software and Surprise
One byproduct of this new way of thinking about software combination is the element of surprise. Conventional software built around an ongoing series of requirements poses few surprises (except if it comes in under budget and on time). The Web, however, was different. It took just about everyone by surprise. Like a chemical reaction, the elements reacted in combination, giving rise to totally new structures.
Design Principles
While Tim Berners-Lee gets the credit for assembling the pieces that ultimately composed the Web, a look at what Berners-Lee has to say in a 1998 article entitled "Principles of Design"1 sheds some light on the thinking behind the software revolution. In this article, Berners-Lee focuses on several fundamental principles that are driving a new way of creating software.
Simplicity: Often confused with ease of understanding, simplicity refers to the ease with which a concept or construct achieves its goal.
Modular design: When you want to change a system (and change is inevitable), modular design lets you introduce change with minimal impact on the workings of other system components.
Decentralization: Systems should be constructed so that no one element serves as a single point of failure for the entire system.
Test of independent invention: This involves a thought test. If someone else had invented your system, would theirs work with yours? This property means that in design, you should try to do one thing well and avoid having to be the center of the universe.
Principle of least power: Computer science from the 1960s to the 1980s put great effort into constructing systems as powerful as possible, systems that tried to do it all. The principle of least power asserts that less powerful solutions ultimately are better suited for analysis and manipulation by applications yet to be invented. One hundred years in the future, software will probably have an easier time figuring out the content of an XML document than of a C++ program.
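A small illustration of the principle of least power: the record below states its facts declaratively (element names invented for the example), so a generic tool such as an XML parser, a query engine, or a search indexer can pull out the author or the date without knowing anything about the application that produced it. The same facts embedded in a C++ or Java program would have to be recovered by reading, or running, the code.

```xml
<!-- Declarative data: any XML-aware tool can read it, now or decades from now. -->
<memo>
  <author>J. Smith</author>
  <date>2001-03-12</date>
  <subject>Quarterly review</subject>
</memo>
```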
Another example of the power of combination and surprise is Napster, a radical way of distributing music over the Internet that relied on peer-to-peer connectivity rather than centralized distribution. Napster wasn't the result of a team of dedicated software professionals; it was created by a twenty-something upstart drawing on the power of assembly. The music industry will never be the same.
Combination and Collaboration
The power of combination is finding its way not only into software construction but also up the development chain to software specification and design. Rather than simply hoping to meet the needs of users, design is now more collaborative, bringing in stakeholders early to ensure maximum feedback and the benefits of collaborative thinking. Figure 1.14 illustrates how this collaborative model is used by the W3C, the Internet Engineering Task Force, and Sun in its Java Community Process.
Figure 1.14 Part of the software revolution includes collaboration on specification and design. Examples include the Internet Engineering Task Force, the W3C, and Sun's Java Community Process.
Collaboration in Software Specification and Design
The W3C
Regarding standards from the W3C, it's important to realize that the word "Recommendation," in W3C parlance, means final specification or standard. Understanding the W3C's process in moving from idea to Recommendation is important in tracking where the Web is going. Having the status of an approved Recommendation means that software vendors and developers can be confident that the technology described in the Recommendation has general industry-wide consensus.
There are several steps along the W3C path from submitting a proposal to ultimate approval as an official Recommendation, as Figure 1.15 illustrates.
Figure 1.15 The W3C approval process from Submission to Recommendation.
Submission: Any W3C member may submit a document to the W3C for possible review. A Submission indicates only that the document has been placed in a W3C in-box. It says nothing about what the W3C thinks about it. The next step is for the W3C to determine whether it warrants further consideration as an Acknowledged Submission or should be dropped. This decision is based on whether the Submission is within the scope of the W3C charter.
Note: A Note is a W3C document that has followed a formal submission process and gets an official date stamp. It carries no commitment on the part of the W3C to pursue the work any further.
Acknowledged Submission: A Submission or Note that has been reviewed by the W3C becomes an Acknowledged Submission, which results in the formation of a Working Group, typically composed of the member group that authored the original Submission plus any other interested parties. The Working Group is tasked with producing Working Drafts that go up for public review.
Working Draft: Working Groups produce Working Drafts. A Working Draft is a document in progress. When consensus is reached within the Working Group, a Proposed Recommendation is released. Often a Working Draft will be implemented by vendors who provide feedback to the Working Group about the viability of the proposed idea.
Proposed Recommendation: The Working Group's consensus is formulated in a Proposed Recommendation that is sent to the W3C Advisory Committee for review.
Candidate Recommendation: For complex proposals, the W3C Advisory Committee may release the document as a Candidate Recommendation, which indicates that there is consensus within the Working Group but that the W3C would like additional public review and feedback, particularly from implementers of a specification. These developers also get a head start in bringing the technology to market before it acquires Recommendation status.
Recommendation: A Recommendation represents consensus within the W3C that the idea is ready for prime time. Developers can be confident that a Recommendation will remain stable and that software can be built around it.