Applied SOAP: XML and Web Service Implementation
Introduction
XML is critical to Web Services. It's a fundamental technology; without it, you wouldn't have Web Services, at least not as they're implemented today.
This is true for two main reasons. First, XML is loosely coupled. That is, XML enables you to loosely bind the client and server, making it easier to change one or both, as well as add versioned information after the initial release. Second, XML is easily interpreted, making it highly interoperable. XML is simply text, so if you can interpret text on your computer system, you can interpret XML. You'll find XML in use in nearly every contemporary computer system available today, and perhaps even in some venerable systems that you used yesteryear. There is a great deal more to it than that, but, in a nutshell, XML is just text in a file or network packet.
NOTE
My observation that XML is merely text is inarguably true. However, XML is becoming increasingly Unicode-aware. Most of the XML documents that you'll see in use today are encoded using either UTF-8 (a variable-length encoding that uses one byte for ASCII characters and up to four bytes for other characters) or UTF-16 (which uses two bytes for most characters and four bytes for the rest). Therefore, your system needs Unicode capability to truly handle these contemporary XML documents. .NET provides you with this capability, even if you're running on a version of Windows with limited native Unicode support, such as Windows 98.
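To make the encoding point concrete, here is a minimal sketch in Python (chosen only for brevity; it is not the .NET code this chapter works with). The sample document and variable names are made up for illustration. It encodes the same XML text both ways and counts the bytes that would travel over the wire.

```python
# A tiny, hypothetical XML document with one non-ASCII character (e-acute).
doc = "<name>Caf\u00e9</name>"

# UTF-8: one byte per ASCII character, two bytes for the e-acute.
utf8_bytes = doc.encode("utf-8")

# UTF-16 (little endian, no byte order mark): two bytes per character here.
utf16_bytes = doc.encode("utf-16-le")

char_count = len(doc)
utf8_size = len(utf8_bytes)
utf16_size = len(utf16_bytes)

print(char_count, utf8_size, utf16_size)
```

The text is the same in both cases; only the byte-level form differs, which is why the receiving system must know (or detect) the encoding before it can read the document.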
The goal of this chapter is to introduce you to the fundamentals of XML and to dig deeper into XML to show how it plays a role in Web Service implementation. If you're new to XML, Appendix D, ".NET Web Service Resources," points to alternative sources of information that you can examine on your own. If you do know something about XML, you might want to scan this chapter for information that's new to you and skip the parts that you know well. With luck, you'll find something interesting and new, and you'll expand your understanding of XML, with both Web Services and .NET's handling of XML in general.
More specifically, the goal is to make sure that you understand how XML fits into Web Service processing and to show you how to work with XML using the .NET class hierarchy. You might find that you'll need to modify the XML that your Web Service uses for some reason. If so, you'll use what you've learned from this chapter. We'll start with a discussion of XML as a wire representation, which should explain why XML is so important to Web Services.
XML as a Wire Representation
Whenever you transmit information over a network, that information is ultimately transformed from a binary representation in your computer's memory into another representation that was designed for network use. Perhaps the network representation is a highly efficient binary one, or perhaps its purpose is more general. In any case, you'll probably find that the information's binary format in memory is very different from its network form. The form that the data takes for network transmission, no matter what the network medium is, is called its wire representation.
Wire representations, and protocols in general, are often designed to meet specific design criteria, a few of which are listed here:
- Compactness
- Protocol efficiency
- Coupling
- Scalability
- Interoperability
Compactness refers to how terse the network packet becomes while still conveying the same information. Small is usually best. Protocol efficiency is related to compactness: you rate efficiency by examining the overhead required to send the payload. The more overhead you require, the less efficient the protocol is.
A protocol's coupling, loose or tight, tells you how flexible consumers of the protocol will be if you change things. Loosely coupled protocols are quite flexible and easily adapt to change, while tightly coupled protocols will most likely require significant modifications to both the server and existing clients. Tightly coupled protocols, for example, are those that require (or force) such things as the same in-memory representation or the same processor type to avoid endian issues (byte ordering in multibyte values). Loosely coupled protocols avoid this altogether by abstracting the information to a degree that makes the byte order irrelevant. An integer looks the same when represented as a string, whether it is stored in memory in big-endian or little-endian form. The byte order conversion is made by the software handling the protocol, not by the protocol itself.
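The endian issue is easy to see in a few lines. The following is a small Python sketch (the value and variable names are arbitrary, chosen for illustration). It packs one integer in both byte orders, shows what happens when a receiver guesses the wrong order, and compares that with the text form.

```python
import struct

value = 1025  # 0x00000401 as a 32-bit unsigned integer

# Binary wire forms: the same number, two different byte sequences.
big = struct.pack(">I", value)     # big endian: 00 00 04 01
little = struct.pack("<I", value)  # little endian: 01 04 00 00

# A receiver that assumes the wrong byte order silently gets a different number.
misread = struct.unpack("<I", big)[0]

# Text wire form: the digits are the same no matter how either machine
# stores integers internally.
text = str(value).encode("ascii")
recovered = int(text.decode("ascii"))

print(big.hex(), little.hex(), misread, text, recovered)
```

With the binary form, sender and receiver must agree on byte order in advance. With the text form, each side converts to and from its own in-memory layout, and the protocol never has to mention byte order.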
Scalability addresses the protocol's capability to work with a large number of potential recipients. Some protocols are limited to a handful of consumers, while others handle millions of users easily.
Finally, interoperability speaks to the protocol's acceptance on a variety of computing platforms. Will you have to issue network packets to one specific operating system or platform, or is the protocol a bit more general-purpose, enabling you to send information to a wider variety of systems?
Protocols, including XML, generally lie within a continuum of these characteristics. Highly efficient protocols tend to not scale well. Interoperable protocols tend to scale well but are often not as efficient as proprietary protocols. No single protocol does it all, and network engineers often make design decisions based upon these and many other criteria. Which protocol you choose depends upon where the protocol falls in the continuum.
XML is both loosely coupled and highly interoperable. In fact, XML is so interoperable that it is nearly ubiquitous. You can send XML to anyone on the planet. Not only will that person receive the information, but that person will also be able to interpret and make use of it in nearly every case.
With XML, you pay for loose coupling and interoperability with protocol efficiency and compactness. XML is actually rather inefficient, although you can tailor this by judiciously choosing your element names (fewer characters in element names often yields more efficient XML, even though terseness of the XML was not a design goal of the inventors). Nor is XML particularly compact. XML is text, and, as with any text document, you might be either conservative with your expressiveness or creative yet a bit more verbose. In either case, you are still left with a loosely coupled and interoperable mechanism for encoding information to be transmitted over the wire.
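As a rough illustration of the trade-off, the sketch below measures how much of an XML message is markup rather than payload. It is written in Python for brevity; the `overhead_ratio` helper and both element names are hypothetical, invented for this example.

```python
def overhead_ratio(payload: str, message: str) -> float:
    """Return the fraction of the encoded message that is not payload."""
    total = len(message.encode("utf-8"))
    useful = len(payload.encode("utf-8"))
    return (total - useful) / total


payload = "42"

# The same two-character payload wrapped in a verbose and a terse element name.
verbose = "<quantityOrdered>42</quantityOrdered>"
terse = "<qty>42</qty>"

verbose_ratio = overhead_ratio(payload, verbose)
terse_ratio = overhead_ratio(payload, terse)

print(len(verbose), round(verbose_ratio, 3))
print(len(terse), round(terse_ratio, 3))
```

The terse name cuts the message to roughly a third of its size, yet most of the bytes are still markup. Shorter names help, but they don't turn XML into a compact protocol, and they cost you some readability.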
XML and Loose Coupling
If XML isn't terribly efficient, why use it? Because XML is simply text formatted in a specific way. You could make the same argument about HTTP, SMTP, POP3, NNTP, and a host of other such Internet protocols, but they've also proven to be successful. XML works, and it works well. When you transmit information using XML, you're sending text rather than a proprietary protocol that has arbitrary design limitations. DCOM, for example, uses a fairly efficient wire representation but requires the object's server to keep track of the client. To do this, the DCOM client issues "I'm still here" messages to the server every two minutes. After three periods without one of these ping messages, the server chops off the object's head and reclaims the resources (memory and such). But imagine making changes to the DCOM wire representation and then trying to field that worldwide.
Because XML is nothing more than text, with no claims made to object status or association, changes to your XML are rather easily accepted. You still send and receive text; it's just that the element names might have changed or the document layout might have been slightly altered.
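Here is one way to picture that tolerance for change. This is a Python sketch using the standard library's ElementTree parser; the order documents, the `giftWrap` element, and the `read_order` function are hypothetical. A client written against the first version of a document keeps working when the server later adds an attribute and a new element.

```python
import xml.etree.ElementTree as ET
from typing import Tuple

# Version 1 of a hypothetical order document.
ORDER_V1 = "<order><item>widget</item><qty>3</qty></order>"

# Version 2 adds a version attribute and a new element.
ORDER_V2 = (
    "<order version='2'>"
    "<item>widget</item><qty>3</qty><giftWrap>true</giftWrap>"
    "</order>"
)


def read_order(xml_text: str) -> Tuple[str, int]:
    """Client written for version 1: it looks up only the elements it knows."""
    root = ET.fromstring(xml_text)
    return root.findtext("item"), int(root.findtext("qty"))


print(read_order(ORDER_V1))
print(read_order(ORDER_V2))
```

The older client simply ignores what it doesn't recognize. This works only for additive changes, and only because the client looks elements up by name instead of relying on their position. Renaming or removing `item` or `qty` would still break it.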
XML and Interoperability
The fact that XML is so loosely coupled has undoubtedly been a contributing factor to its wide acceptance throughout the Internet. Nearly every major computing platform available today has some capability to accept and interpret XML. XML is text, and if your computer can handle text, your computer can handle XML. Palm PCs, cellular telephones, desktop and laptop PCs, and the largest of mainframes all have XML processing capability.
If this is the case, you should expect to be able to send anyone an XML document and know that the recipient can make use of the information. Because this is so, SOAP architects chose XML as the encoding mechanism for the SOAP protocol. XML is a rich and expressive technology that readily lends itself to method parameter serialization. Best of all, everyone understands what has been serialized, regardless of computing platform, and can make use of it.
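To give a feel for method parameter serialization, the following Python sketch turns a method name and its arguments into XML and back again. It is a simplified, SOAP-like illustration only: it omits the SOAP envelope, namespaces, and type information, and the `GetPrice` method, its parameters, and both helper functions are made up for this example.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Tuple


def serialize_call(method: str, params: Dict[str, object]) -> str:
    """Encode a method call as one element with a child element per parameter."""
    call = ET.Element(method)
    for name, value in params.items():
        child = ET.SubElement(call, name)
        child.text = str(value)
    return ET.tostring(call, encoding="unicode")


def deserialize_call(xml_text: str) -> Tuple[str, Dict[str, str]]:
    """Recover the method name and parameters; values come back as text."""
    root = ET.fromstring(xml_text)
    return root.tag, {child.tag: child.text for child in root}


wire = serialize_call("GetPrice", {"sku": "A-100", "quantity": 3})
method, params = deserialize_call(wire)

print(wire)
print(method, params)
```

Notice that the quantity comes back as the string "3" rather than the integer 3. Plain XML carries no type information, so the receiver needs a schema or an agreed convention to restore the original types.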
As you probably already know, it's one thing to create the XML document. It's quite another to interpret the contents using an automated system. You'll require some mechanism to reach into the XML to extract the portions of interest. That mechanism is called XPath. Here we'll move away from "Why is XML good?" toward "How do I use XML?"
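As a first taste of reaching into a document, here is a short Python sketch. It uses the limited XPath subset supported by the standard library's ElementTree module, not the .NET classes this chapter is concerned with, and the catalog document is invented for illustration.

```python
import xml.etree.ElementTree as ET

# A small, hypothetical catalog document.
CATALOG = (
    "<catalog>"
    "<book id='b1'><title>XML Basics</title><price>29.95</price></book>"
    "<book id='b2'><title>SOAP in Practice</title><price>39.95</price></book>"
    "</catalog>"
)

root = ET.fromstring(CATALOG)

# Location path: every title that is a child of a book under the root.
titles = [title.text for title in root.findall("./book/title")]

# Predicate: the price of the one book whose id attribute equals 'b2'.
price_b2 = root.findtext("./book[@id='b2']/price")

print(titles)
print(price_b2)
```

A path expression names the portion of the document you want, and the XML library does the walking for you. A full XPath implementation offers many more axes and functions than this subset does.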