The Road to Web Services
Most things in software engineering happen for a very good reason—or, at least, we hope so. This is how concepts such as abstraction and encapsulation have become mainstream. As with any technology, we learn from past mistakes and capitalize upon our successes.
Web Services are no different. They have materialized from a variety of Web technologies that have been proven to work in the widely distributed environment of the Internet. The Internet itself has evolved over the years, spawning many new ideas and concepts that have contributed to the Web Service approach.
Waves of the Internet
Since the beginning of the Internet, many changes have come about in networking technology, security, system scalability, and many other areas of distributed computing. Overall, we believe that the Internet has gone through four major waves of development.
The first wave of the Internet started around the 1970s with some very important government research, specifically at the Defense Advanced Research Projects Agency (DARPA). This is where Transmission Control Protocol/Internet Protocol (TCP/IP) was born. Its goal was to interconnect computer systems through a complex architecture of networks and subnetworks. Over the years, a variety of physical networks (such as Ethernet) and routing technologies evolved to the point that, in 1990, more than 200,000 computers were interconnected on the Internet.
Although Internet connectivity was one of the most significant achievements in computing, it meant very little without applications to drive it. This is where the second wave of the Internet began (roughly in the 1980s). Tools such as FTP and Telnet gained in popularity by allowing system users to remotely access other computers. Although the tools were crude by today's standards, the underlying protocols that they used were quite elegant.
In the early 1990s, the Internet began to seep into more sophisticated applications and led to the dawn of the Web, which marks the third wave. Browsers—and eventually Java applets—allowed the general consumer to experience interconnected communities of users. Of course, where there are consumers, there are vendors. This spawned more electronic business opportunities, as evidenced by the plethora of electronic storefronts and shopping carts.
All this, of course, has brought about the fourth wave of the Internet, which is the focus of this book—Web Services. Here, the goal is for multiple diverse applications to communicate so that they can execute some task. Not only does this improve the user's experience, but it also offers the ability for you to integrate functionality at a much lower cost than developing it all yourself. From the user's standpoint, all of this is orchestrated from a single application. But behind the scenes, one or more additional applications will likely participate. The key is that the applications work while remaining oblivious to vendor-specific technologies being used by the participating services.
Looking back, each wave introduced new Internet standards that facilitated the next wave of development.
Internet Standards
In April of 1969, the first Request for Comments (RFC) was published at UCLA (RFC1), and thus began the process of sharing ideas in computing for a much greater cause. Then, in 1986, the Internet Engineering Task Force (IETF) was officially created. Its charter was (and still is) to evolve the architecture of the Internet using open contributions from the research and development community.
After inventing the Web, Tim Berners-Lee decided to create the World Wide Web Consortium (W3C) in October of 1994. The W3C's purpose is to promote interoperability and open forum discussions about the Web and its protocols.
These organizations have led the way to standardization, a process that has resulted in a strong foundation for the Web Service infrastructure.
Several standards are prevalent in current Internet development. Some have existed for years; others are relatively new and are not necessarily formal standards. This section summarizes these technologies and shows how they apply to Web Services.
HTTP and SMTP
As stated before, TCP/IP is the foundation of Internet communication protocols. However, TCP/IP without an application is a little like a car without a driver. How an application uses TCP/IP also determines the semantics of that application's protocol.
Application protocols such as HTTP and Simple Mail Transfer Protocol (SMTP) already have predefined semantics and behavior that determine how they should be used. For HTTP, the semantic implies a request/response model designed to serve Web resources such as HTML or JPG files. SMTP, on the other hand, implies a one-way request/acknowledge semantic designed to transmit text-based email messages in a fire-and-forget manner.
NOTE
Fire-and-forget is a military warfare term that has been overloaded for networking purposes. We use it to describe the process of sending a message, where the sender does not require any acknowledgement that the recipient actually received the message.
In the context of Web Services, these application protocols are used to carry additional semantics, such as those specified by SOAP. SOAP, in turn, provides a way for you to define your own application semantics, which are also carried over these application protocols. This layering of semantics re-emphasizes the flexibility of Web Service protocols.
Because HTTP is the dominant protocol being used for Web Services, let's take a quick look at the two most common aspects of HTTP being exploited, the GET and POST verbs.
The following is a sample HTTP GET request:
GET /default.htm HTTP/1.1
Accept: text/*
Host: www.mcp.com
{CR}{LF}
In this case, the client is requesting that the http://www.mcp.com server return the default.htm resource. The client would like to use version 1.1 of the HTTP protocol in this transaction and is willing to accept the resource as some form of text. Notice that the message is terminated by the carriage return/line feed pair following the message.
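To make the wire format concrete, the following minimal Python sketch assembles the same GET request as raw bytes. The helper name build_get_request is our own invention for illustration, and the Host header here carries only the host name (www.mcp.com), which is the form HTTP/1.1 expects.

```python
def build_get_request(host: str, path: str, accept: str = "text/*") -> bytes:
    """Assemble a minimal HTTP/1.1 GET request (hypothetical helper)."""
    lines = [
        f"GET {path} HTTP/1.1",  # request line: verb, resource, protocol version
        f"Accept: {accept}",     # media types the client is willing to receive
        f"Host: {host}",         # host name only, without the http:// scheme
    ]
    # Each line ends with CR LF; an empty line (a bare CR LF) ends the headers.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


request = build_get_request("www.mcp.com", "/default.htm")
print(request.decode("ascii"))
```

Writing these bytes to a TCP connection on port 80 would be enough for a server to answer, which shows how little HTTP adds on top of TCP/IP.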
Many times you need to provide application-specific information to the server that is not represented in the semantics of the HTTP protocol. For instance, you can pass parameters on the URL, as shown:
GET /GetStockQuote.asp?symbol=MSFT HTTP/1.1
Accept: text/*
Host: www.mcp.com
{CR}{LF}
Given this sample request for obtaining stock quote information, the symbol parameter and its value are passed on the query string, and you can expect the server to respond with some form of HTML, such as the following:
HTTP/1.1 200 OK
Content-type: text/html

<html><head><title>Microsoft Stock Price</title></head>
<body>
<b>Microsoft: 80.75</b>
</body>
</html>
The server's response contains the HTTP version, a status code and message, and the content type that is associated with the related payload following the carriage return/line feed pair.
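A client has to take such a response apart again. The sketch below shows one way this might be done in Python; parse_response is a hypothetical helper, the sample bytes are a shortened version of the response above, and a real client would normally rely on an HTTP library instead.

```python
RAW_RESPONSE = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-type: text/html\r\n"
    b"\r\n"
    b"<html><body><b>Microsoft: 80.75</b></body></html>"
)


def parse_response(raw: bytes):
    """Split a raw HTTP response into its parts (hypothetical helper)."""
    # The first empty line (CR LF CR LF) separates the headers from the payload.
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("ascii").split("\r\n")
    version, code, message = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so normalize them to lowercase.
        headers[name.strip().lower()] = value.strip()
    return version, int(code), message, headers, body


version, code, message, headers, body = parse_response(RAW_RESPONSE)
print(version, code, message, headers["content-type"])
```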
However, using the GET request is less than optimal when dealing with large and complex parameters. Certain HTTP implementations and older firewalls have been known to truncate URLs based on poorly chosen size limits. URLs also undergo encoding for a large number of characters, which complicates processing procedures.
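The encoding overhead is easy to see with Python's standard urllib.parse module. In this small sketch, the parameter value "AT&T Inc" is a made-up example chosen only because it contains characters that cannot appear literally in a query string.

```python
from urllib.parse import parse_qs, urlencode

# Hypothetical parameter value containing characters that are unsafe in a URL.
params = {"symbol": "AT&T Inc"}

query = urlencode(params)  # percent-encodes '&' and turns the space into '+'
url = "/GetStockQuote.asp?" + query
print(url)

# The server has to reverse the encoding before it can use the value.
decoded = parse_qs(query)
print(decoded["symbol"][0])
```

Every structured or non-ASCII value sent on the URL has to make this round trip on both ends of the connection.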
Instead, we can use the HTTP POST verb, which places information within the Body of the request message:
POST /GetStockQuote.asp HTTP/1.1
Accept: text/*
Host: www.mcp.com
Content-type: text/xml
Content-length: nnnn

<Symbol>MSFT</Symbol>
Similar to the GET verb, the POST verb also identifies a resource, the version of HTTP, and the content that it expects to receive. However, two additional HTTP fields are provided—Content-type and Content-length. The two new fields refer to the remainder of the message, which, in this case, happens to carry an XML message.
Here you can see that we are no longer bound by the limitations of the URL, there is less character encoding taking place, and we are able to transmit more complicated payloads. This is by far the most helpful aspect of HTTP when used in the Web Service model.
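As a rough illustration, the POST request can be assembled in Python as follows. The helper name build_post_request is hypothetical; the point of the sketch is that Content-length is computed from the encoded payload rather than written by hand.

```python
def build_post_request(host: str, path: str, xml_body: str) -> bytes:
    """Assemble a minimal HTTP/1.1 POST carrying an XML payload (hypothetical helper)."""
    payload = xml_body.encode("utf-8")
    headers = [
        f"POST {path} HTTP/1.1",
        "Accept: text/*",
        f"Host: {host}",
        "Content-type: text/xml",
        f"Content-length: {len(payload)}",  # length in bytes, not characters
    ]
    # An empty line separates the headers from the message Body.
    head = ("\r\n".join(headers) + "\r\n\r\n").encode("ascii")
    return head + payload


request = build_post_request("www.mcp.com", "/GetStockQuote.asp", "<Symbol>MSFT</Symbol>")
print(request.decode("utf-8"))
```

Because the payload travels in the Body, it needs no URL encoding and can grow to whatever size the XML message requires.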
NOTE
Although we generally recommend that you use the POST method, in some situations using GET might make sense. GET is best used for simple semantics that require only minimal message structure and in situations when client applications don't have control over the POST payload.
An example of this is with the Visual Studio.NET test pages that are generated with your Web Services. Because the browser's POST feature does not allow you to place customized information in the message Body, the GET verb comes in very handy.
Also note that your test pages won't support complicated parameter types (with the WSDL for either HTTP GET or HTTP POST), so in this case you'll need to manually build a client that can exercise your service.
eXtensible Markup Language (XML)
XML has a wide variety of uses, and it makes a great foundation for Web Services for several reasons:
In a world where global business arrangements are becoming the norm, XML natively supports different character sets through Unicode (UTF-8, UTF-16, and so on).
XML promotes interoperability in a platform-agnostic way by promoting the convergence of information to a common, vendor-neutral state.
Probably most importantly, XML is simple.
As we briefly mentioned before, XML is really just syntax for application semantics that you must define. To define semantics and the associated XML syntax for your semantics, you need to establish the appropriate structure and restrictions of your markup—you do this through a schema.
XML Schemas
When you want to describe (and possibly validate) an XML document, you might use a Document Type Definition (DTD), such as this one:
<!-- House DTD -->
<!ELEMENT house (address)>
<!ATTLIST house
    bedrooms CDATA #REQUIRED
    bathrooms CDATA #REQUIRED>
<!ELEMENT address (street, city, state, zip)>
<!ELEMENT street (#PCDATA)>
<!ELEMENT city (#PCDATA)>
<!ELEMENT state (#PCDATA)>
<!ELEMENT zip (#PCDATA)>
As you can see, though, DTDs are limited by their somewhat cryptic syntax and lack of type checking.
In May 2001, the XML Schema specification was promoted to Recommendation status by the W3C. XML schemas not only provide the same functionality as DTDs, but they also offer complex type checking, all wrapped up in an XML language. This makes XML schemas the preferred method of describing all forms of XML documents, including XML messages. The same example is shown in XML Schema format, as follows:
<!-- House XML Schema -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:complexType name="AddressType">
    <xsd:sequence>
      <xsd:element name="street" type="xsd:string" />
      <xsd:element name="city" type="xsd:string" />
      <xsd:element name="state" type="xsd:string" />
      <xsd:element name="zip" type="xsd:decimal" />
    </xsd:sequence>
  </xsd:complexType>
  <!-- As in the DTD, a house contains one address and carries two required attributes -->
  <xsd:complexType name="HouseType">
    <xsd:sequence>
      <xsd:element name="address" type="AddressType" />
    </xsd:sequence>
    <xsd:attribute name="bedrooms" type="xsd:positiveInteger" use="required" />
    <xsd:attribute name="bathrooms" type="xsd:positiveInteger" use="required" />
  </xsd:complexType>
  <xsd:element name="house" type="HouseType" />
</xsd:schema>
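To see what type checking buys you, consider a hand-rolled check in Python. The DTD declares bedrooms and bathrooms as CDATA, so a value such as "two" passes DTD validation, whereas the xsd:positiveInteger type rejects it. The sketch below is not a schema validator (the Python standard library does not include one); it only imitates that single check, and the sample document and helper name are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance document following the house structure of the DTD.
HOUSE_XML = """
<house bedrooms="3" bathrooms="two">
  <address>
    <street>123 Main St</street>
    <city>Indianapolis</city>
    <state>IN</state>
    <zip>46290</zip>
  </address>
</house>
"""


def is_positive_integer(text: str) -> bool:
    """Rough stand-in for the xsd:positiveInteger check (ASCII digits, value > 0)."""
    text = text.strip()
    return text.isascii() and text.isdigit() and int(text) > 0


house = ET.fromstring(HOUSE_XML)
problems = [
    f"{name}={house.get(name)!r} is not a positive integer"
    for name in ("bedrooms", "bathrooms")
    if not is_positive_integer(house.get(name, ""))
]
print(problems)
```

With a schema-aware parser, this kind of check comes from the schema itself instead of from application code.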
SOAP
Although SOAP isn't officially an Internet standard, it has been widely adopted by the Internet community, including the Electronic Business XML (ebXML) organization, for its transport and routing layer.
To summarize SOAP in a single word, packaging is the most appropriate description. Many developers have created ad-hoc approaches for sending XML messages between their applications. The creators of XML-RPC took the concept to the next level, by exposing a publicly available specification for XML messaging. Taking XML-RPC a step further, SOAP basically defines a standard yet extensible way to wrap information in XML so that both ends of the connection (and potentially everything in between) can understand how to open this package. Let's take a quick look at a SOAP request message:
<SOAP-ENV:Envelope
    xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
    SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
  <SOAP-ENV:Header>
    <t:transId xmlns:t="http://www.mcp.com/trans">
      87654
    </t:transId>
  </SOAP-ENV:Header>
  <SOAP-ENV:Body>
    <m:GetStockQuote xmlns:m="http://www.mcp.com/stock">
      <Symbol>MSFT</Symbol>
    </m:GetStockQuote>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>
This simple message shows the general packaging of a SOAP message. The Envelope contains an optional Header and a mandatory Body. The Header is used for out-of-band information that doesn't necessarily apply to the semantics of the message Body. The Body is used to carry the application-specific message content. This is definitely not all that SOAP represents, but it captures the spirit of what SOAP set out to accomplish. We'll leave it to Chapter 4, ".NET Web Services and SOAP," to provide the gory details about the remainder of the SOAP protocol.
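To illustrate how a receiver might open this package, the following Python sketch pulls the transaction ID out of the Header and the stock symbol out of the Body using only the standard library. It is a minimal sketch, not how a SOAP toolkit works internally; the prefixes in the NS dictionary are our own choice, and only the namespace URIs have to match the message.

```python
import xml.etree.ElementTree as ET

SOAP_REQUEST = """\
<SOAP-ENV:Envelope
    xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
    SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
  <SOAP-ENV:Header>
    <t:transId xmlns:t="http://www.mcp.com/trans">87654</t:transId>
  </SOAP-ENV:Header>
  <SOAP-ENV:Body>
    <m:GetStockQuote xmlns:m="http://www.mcp.com/stock">
      <Symbol>MSFT</Symbol>
    </m:GetStockQuote>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>
"""

# Prefixes are local to this script; only the namespace URIs must match.
NS = {
    "env": "http://schemas.xmlsoap.org/soap/envelope/",
    "t": "http://www.mcp.com/trans",
    "m": "http://www.mcp.com/stock",
}

envelope = ET.fromstring(SOAP_REQUEST)

# Out-of-band information lives in the Header.
trans_id = envelope.findtext("env:Header/t:transId", namespaces=NS).strip()

# The application-specific request lives in the Body.
operation = envelope.find("env:Body/m:GetStockQuote", NS)
symbol = operation.findtext("Symbol").strip()  # Symbol is unqualified in the message

print(trans_id, symbol)
```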
WSDL and UDDI
Web Services Description Language (WSDL) is the language used to describe how software must interact with a particular Web Service. Clients use WSDL documents to understand the logical structure and the syntax of a Web Service. WSDL also provides message exchange patterns, service bindings, and references to the location of a service.
Universal Description, Discovery, and Integration (UDDI), which is growing in popularity, is one way to accomplish the publish, find, and bind process. UDDI servers allow WSDL to be published and propagated across the Internet so that clients can ultimately consume a given service.
Although neither WSDL nor UDDI has been standardized, the industry is giving both more attention than any similar mechanisms. A great deal more about WSDL and UDDI will be explained in Chapter 5, "Web Services and Description and Discovery."