XML and Web Services: Understanding SOAP
- Understanding SOAP
- SOAP Basics
- Messaging Framework
- The SOAP Encoding
- Transport Options
- Summary
Some technologies, such as MP3, serve a very specific and well-defined purpose: MP3 is a file format for audio information and nothing else. XML, on the other hand, is a versatile technology that is used in a variety of solutions, including audio, voice, and data.
One of those solutions is the specific file format for application integration that is associated with Web services. As you will see, there have been several proposals to use XML in the field of Web services, but one of the most promising standards is SOAP, the Simple Object Access Protocol. This article introduces you to SOAP.
History of SOAP
SOAP connects two fields that were previously largely unrelated: application middleware and Web publishing.
Consequently, depending on whether your background is in middleware or Web publishing, you might understand SOAP slightly differently. Yet it's important to realize that SOAP is neither pure middleware nor pure Web publishing; it really is the convergence of the two.
The best approach to understanding the dual nature of SOAP is a historical one. If you review the concepts and trends that led to the development of SOAP, you will be better prepared to study it.
RPCs and Middleware
One of the goals of SOAP is to use XML to enable remote procedure calls (RPCs) over HTTP. A widely used RPC standard was developed by the Open Group (http://www.opengroup.org) as part of its Distributed Computing Environment (DCE).
When writing distributed applications, programmers spend a disproportionate amount of time implementing network protocols: opening and closing sockets, listening on ports, formatting requests, decoding responses, and more. RPC offers an easier alternative. Programmers simply write regular procedure calls and a pre-compiler generates all the protocol-level code to call those procedures over a network.
Even if you have never used RPC, you might be familiar with its modern descendants: CORBA (Common Object Request Broker Architecture), DCOM (Distributed Component Object Model), and RMI (Remote Method Invocation). Although the implementations differ (and they are mostly incompatible), CORBA, DCOM, and RMI offer what is best described as an enhanced, object-oriented mechanism of implementing RPC functionality.
Listing 1 is the interface to a remote server object that uses RMI. As you can see, it's not very different from a regular interface. The only remarkable aspect is that it extends the java.rmi.Remote interface and every method can throw java.rmi.RemoteException exceptions.
Listing 1: RemoteBooking.java
package com.psol.resourceful;

import java.util.Date;
import java.rmi.Remote;
import java.rmi.RemoteException;

public interface RemoteBooking extends Remote {
    public Resource[] getAllResources()
        throws RemoteException;

    public Resource[] getFreeResourcesOn(Date start, Date end)
        throws RemoteException;

    public void bookResource(int resource, Date start, Date end, String email)
        throws RemoteException;
}
Where's the network code? There is none beyond what's necessary to extend the Remote interface. That's precisely the beauty of middleware: All you have to do is designate certain objects as remote and the middleware takes care of all the networking and protocol aspects for you. The way you designate remote objects varies depending on the actual technology (CORBA, RMI, or DCOM) you're using.
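To make the idea of generated protocol code more concrete, here is a minimal sketch of what a client-side stub does. It is not how RMI, CORBA, or DCOM are implemented: the BookingCalls interface, the StubSketch class, and the text format of the request are all invented for this illustration, and the "network" is an ordinary function running in the same program. The point is only that the caller writes a regular method call while something else turns it into a message.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.Arrays;
import java.util.function.Function;

// Hypothetical, simplified interface; not the article's RemoteBooking.
interface BookingCalls {
    String bookResource(int resource, String email);
}

final class StubSketch {
    // Creates a client-side stub. Each call on the stub is flattened into a
    // text request and handed to "transport", which stands in for the network.
    // Limitation of this sketch: it only suits methods that return String.
    static <T> T stubFor(Class<T> api, Function<String, String> transport) {
        InvocationHandler handler = (proxy, method, args) -> {
            Object[] params = (args == null) ? new Object[0] : args;
            String request = method.getName() + Arrays.toString(params);
            return transport.apply(request);
        };
        Object stub = Proxy.newProxyInstance(
                api.getClassLoader(), new Class<?>[] {api}, handler);
        return api.cast(stub);
    }

    public static void main(String[] argv) {
        // A fake server that simply echoes the request it received.
        BookingCalls booking =
                stubFor(BookingCalls.class, request -> "server got " + request);
        // Prints: server got bookResource[3, jack@example.com]
        System.out.println(booking.bookResource(3, "jack@example.com"));
    }
}
```

Real middleware replaces the echo function with sockets, a binary wire format, and a matching skeleton on the server, but the calling code looks just as ordinary.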
The Downside of Middleware
It's not all rosy with middleware, though. It has been successfully implemented on private networks (LANs, intranets, and the like) but has not been so successful on the Internet at large.
One of the issues is that middleware uses its own protocols and most firewalls are configured to block non-HTTP traffic. You have to reconfigure your firewall to authorize those communications. Oftentimes those changes prove incompatible with the corporate security policy.
Another issue is that middleware successfully addresses only one half of the equation: programming. It's not as good with the other half: deployment. Middleware significantly reduces the burden on the programmer writing distributed applications, but it does little to ease the deployment. In practice, it is significantly easier to deploy a Web site than to deploy a middleware-based application.
Most organizations have invested in Web site deployment. They have hired and trained system administrators who deal with the numerous availability and security issues. They are therefore reluctant to invest again in deploying another set of servers.
As you will see in a moment, SOAP directly addresses both issues. It borrows many concepts from middleware and enables RPC, but it does so with a regular Web server, which lessens the burden on system administrators.
RSS, RDF, and Web Sites
In parallel, the World Wide Web has evolved from a simple mechanism to share files over the Internet into a sophisticated infrastructure. The Web is universally available, and it is well understood and deployed in almost every company, small and large. The Web's success traces back to the ease with which you can join. You don't have to be a genius to build a Web site, and Web hosts offer a simple solution to deployment.
Obviously, the Web addresses a different audience than middleware, because it is primarily a publishing solution that targets human readers. RPCs, by contrast, are designed for software consumption.
Gradually the Web has evolved from a pure human publishing solution into a mixed mode where some Web pages are geared toward software consumption. Most of those pages are built with XML.
RSS Documents
RSS is a good example of using XML to build Web sites for software rather than for humans. RSS, which stands for RDF Site Summary format, was originally developed by Netscape for its portal Web site. An RSS document uses an XML vocabulary to highlight the main URLs of a Web site. Listing 2 is a sample RSS document.
Listing 2: index.rss
<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/">
  <channel rdf:about="http://www.marchal.com/index.rss">
    <title>marchal.com</title>
    <link>http://www.marchal.com</link>
    <description>
      Your source for XML, Java and e-commerce.
    </description>
    <image rdf:resource="http://www.marchal.com/images/buttons/marchal.jpg"/>
    <items>
      <rdf:Seq>
        <rdf:li resource="http://www.marchal.com/go/xbe"/>
        <rdf:li resource="http://www.pineapplesoft.com/newsletter"/>
      </rdf:Seq>
    </items>
  </channel>
  <image rdf:about="http://www.marchal.com/images/buttons/marchal.jpg">
    <title>marchal.com</title>
    <link>http://www.marchal.com</link>
    <url>http://www.marchal.com/images/buttons/marchal.jpg</url>
  </image>
  <item rdf:about="http://www.marchal.com/go/xbe">
    <title>XML by Example</title>
    <link>http://www.marchal.com/go/xbe</link>
    <description>
      Introduction to XML. Discover the practical applications of XML,
      and see examples that include e-Commerce and SOAP.
    </description>
  </item>
  <item rdf:about="http://www.pineapplesoft.com/newsletter">
    <title>Pineapplesoft Link</title>
    <link>http://www.pineapplesoft.com/newsletter</link>
    <description>
      A free email magazine. Each month it discusses technologies,
      trends, and facts of interest to web developers.
    </description>
  </item>
</rdf:RDF>
As you can see, Listing 2 defines a channel with two items and one image. The two items are further defined with a link and a short description. The portal picks this document up and integrates it into its content.
Other applications of RSS include distributing newsfeeds. The items summarize the news and link to articles that have more details. See http://www.moreover.com for an example.
Although they are hosted on Web sites, RSS documents differ from plain Web pages. RSS goes beyond downloading information for browser rendering. A server downloads the RSS file and most likely integrates it in a database.
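As an illustration of that kind of software consumption, the following sketch reads an RSS 1.0 document with the standard Java DOM parser and extracts the title and link of each item. The RssItems class and its sample string (a shortened version of Listing 2) are made up for this example, and a real portal would of course download the document and store the result rather than print it.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class RssItems {
    static final String RSS_NS = "http://purl.org/rss/1.0/";

    // Shortened version of Listing 2: two items, no channel or image.
    static final String SAMPLE =
          "<?xml version=\"1.0\"?>"
        + "<rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\""
        + " xmlns=\"http://purl.org/rss/1.0/\">"
        + "<item rdf:about=\"http://www.marchal.com/go/xbe\">"
        + "<title>XML by Example</title>"
        + "<link>http://www.marchal.com/go/xbe</link></item>"
        + "<item rdf:about=\"http://www.pineapplesoft.com/newsletter\">"
        + "<title>Pineapplesoft Link</title>"
        + "<link>http://www.pineapplesoft.com/newsletter</link></item>"
        + "</rdf:RDF>";

    // Returns "title -> link" for every item element in the document.
    static List<String> titlesAndLinks(String rssXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // RSS 1.0 elements live in a namespace
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(rssXml)));
            NodeList items = doc.getElementsByTagNameNS(RSS_NS, "item");
            List<String> result = new ArrayList<>();
            for (int i = 0; i < items.getLength(); i++) {
                Element item = (Element) items.item(i);
                result.add(text(item, "title") + " -> " + text(item, "link"));
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a readable RSS document", e);
        }
    }

    // Assumes the child element exists; a robust reader would check for null.
    private static String text(Element parent, String name) {
        return parent.getElementsByTagNameNS(RSS_NS, name)
                .item(0).getTextContent().trim();
    }

    public static void main(String[] argv) {
        titlesAndLinks(SAMPLE).forEach(System.out::println);
    }
}
```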
Making Requests: XML-RPC
The next logical step is to merge middleware with XML and the Web. How to best characterize the result depends on your point of view. To the Web programmer, adding XML to Web sites is like enhancing Web publishing with a query/response mechanism. But to the middleware programmer, it appears as if middleware had been enhanced to be more compatible with the Web and XML.
This is another illustration of XML being used to connect two fields (Web publishing and middleware) that were previously unrelated.
One of the earliest such implementations is probably XML-RPC. From a bird's-eye view, XML-RPC is similar to regular RPC, but the binary protocol used to carry the request on the network has been replaced with XML and HTTP.
Listing 3 illustrates an XML-RPC request. The client is remotely calling the getFreeResourcesOn() method. The equivalent call in Java would have been written as:
BookingService.getFreeResourcesOn(startDate,endDate);
As you can see in Listing 3, XML-RPC packages the call in an XML document that is sent to the server through an HTTP POST request.
Listing 3: An XML-RPC Request
POST /xmlrpc HTTP/1.0
User-Agent: Handson (Win98)
Host: joker.psol.com
Content-Type: text/xml
Content-length: 468

<?xml version="1.0"?>
<methodCall>
  <methodName>com.psol.resourceful.BookingService.getFreeResourcesOn</methodName>
  <params>
    <param>
      <value>
        <dateTime.iso8601>2001-01-15T00:00:00</dateTime.iso8601>
      </value>
    </param>
    <param>
      <value>
        <dateTime.iso8601>2001-01-17T00:00:00</dateTime.iso8601>
      </value>
    </param>
  </params>
</methodCall>
Without going into all the details, the elements in Listing 3 are:
- methodCall, which is the root of the RPC call;
- methodName, which states which method is to be called remotely;
- params, which contains one param element for every parameter in the procedure call;
- param, to encode the parameters;
- value, an element that appears within param and holds its value;
- dateTime.iso8601, which specifies the type of the parameter value.
XML-RPC defines a handful of other types, including:
- i4 or int for a four-byte signed integer;
- boolean, with the value of 0 (false) or 1 (true);
- string, a string;
- double, for double-precision signed floating point numbers;
- base64, for binary streams (encoded in base64).
XML-RPC also supports arrays and structures (also known as records) through the array and struct elements.
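The element structure described above is simple enough to assemble by hand. The following sketch builds the body of a request like the one in Listing 3; the XmlRpcRequest class and its dateCall() method are hypothetical helpers written for this article, not part of any XML-RPC library. It assumes the method name and dates contain no markup characters and leaves out the HTTP POST itself.

```java
final class XmlRpcRequest {
    // Builds a methodCall document in which every parameter is a
    // dateTime.iso8601 value. No escaping is done: the arguments are
    // assumed to be plain names and timestamps.
    static String dateCall(String methodName, String... isoDates) {
        StringBuilder xml = new StringBuilder("<?xml version=\"1.0\"?>\n");
        xml.append("<methodCall>\n");
        xml.append("  <methodName>").append(methodName).append("</methodName>\n");
        xml.append("  <params>\n");
        for (String date : isoDates) {
            // One param element per argument, each wrapping a typed value.
            xml.append("    <param><value><dateTime.iso8601>")
               .append(date)
               .append("</dateTime.iso8601></value></param>\n");
        }
        xml.append("  </params>\n");
        xml.append("</methodCall>\n");
        return xml.toString();
    }

    public static void main(String[] argv) {
        System.out.print(dateCall(
                "com.psol.resourceful.BookingService.getFreeResourcesOn",
                "2001-01-15T00:00:00", "2001-01-17T00:00:00"));
    }
}
```

In practice you would rely on an XML-RPC library to do this, along with type mapping and escaping, but the sketch shows how little there is between a procedure call and its XML form.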
Note one major difference between Listing 3 and Listing 2: the former is a request made to a server. XML-RPC goes beyond downloading files; it provides a mechanism for the client to send an XML request to the server.
Obviously, the server reply is also encoded in XML. It might look like Listing 4.
Listing 4: An XML-RPC Encoded Response
HTTP/1.0 200 OK
Content-Length: 485
Content-Type: text/xml
Server: Jetty/3.1.4 (Windows 98 4.10 x86)

<?xml version="1.0"?>
<methodResponse>
  <params>
    <param>
      <value>
        <array>
          <data>
            <value><string>Meeting room 1</string></value>
            <value><string>Meeting room 2</string></value>
            <value><string>Board room</string></value>
          </data>
        </array>
      </value>
    </param>
  </params>
</methodResponse>
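On the client side, decoding such a reply is a matter of parsing the XML body. The sketch below pulls the string values out of a response shaped like Listing 4. The XmlRpcResponse class is a made-up helper; it assumes the HTTP headers have already been stripped and that the reply is an array of strings, so it is far from a general XML-RPC decoder.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class XmlRpcResponse {
    // Body of Listing 4, without the HTTP headers.
    static final String SAMPLE =
          "<?xml version=\"1.0\"?>"
        + "<methodResponse><params><param><value><array><data>"
        + "<value><string>Meeting room 1</string></value>"
        + "<value><string>Meeting room 2</string></value>"
        + "<value><string>Board room</string></value>"
        + "</data></array></value></param></params></methodResponse>";

    // Collects the content of every string element, in document order.
    static List<String> stringValues(String responseXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(responseXml)));
            NodeList strings = doc.getElementsByTagName("string");
            List<String> values = new ArrayList<>();
            for (int i = 0; i < strings.getLength(); i++) {
                values.add(strings.item(i).getTextContent());
            }
            return values;
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a readable XML-RPC response", e);
        }
    }

    public static void main(String[] argv) {
        // Prints: [Meeting room 1, Meeting room 2, Board room]
        System.out.println(stringValues(SAMPLE));
    }
}
```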
From XML-RPC to SOAP
XML-RPC is simple and effective, but early on its developers (Microsoft, UserLand, and DevelopMentor) realized that they could do better.
Indeed, XML-RPC suffers from four serious flaws:
- There's no clean mechanism to pass XML documents themselves in an XML-RPC request or response. Of course the request (or response) is an XML document, but what if you issue a call to, say, a formatter? How do you pass the XML document to the formatter? As you have seen, "XML document" is not a type for XML-RPC. In fact, to send XML documents, you would have to use strings or base64 parameters, which requires special encoding and is therefore suboptimal.
- There's no solution that enables programmers to extend the request or response format. For example, if you want to pass security credentials with the XML-RPC call, the only solution is to modify your procedure and add one parameter.
- XML-RPC is not fully aligned with the latest XML standardization. For example, it does not use XML namespaces, which goes against all the recent XML developments. It also defines its own data types, which is redundant with Part 2 of the XML Schema recommendation.
- XML-RPC is bound to HTTP. For some applications, another protocol, such as Simple Mail Transfer Protocol (SMTP, the email protocol), is more sensible.
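To see why the first flaw is a nuisance, consider what it takes to send an XML document as a string parameter: every markup character must be escaped so the receiving parser does not mistake the payload for part of the request. The sketch below shows one way to do it; XmlAsString and its methods are invented for this illustration.

```java
final class XmlAsString {
    // Escapes the characters that would otherwise be read as markup.
    // The ampersand must be replaced first, or the entities produced by
    // the later replacements would be escaped a second time.
    static String escape(String xml) {
        return xml.replace("&", "&amp;")
                  .replace("<", "&lt;")
                  .replace(">", "&gt;");
    }

    // Wraps an XML document as an XML-RPC string parameter.
    static String asStringParam(String document) {
        return "<param><value><string>" + escape(document)
                + "</string></value></param>";
    }

    public static void main(String[] argv) {
        // Prints the param element with the note document escaped inside it:
        // <param><value><string>&lt;note&gt;Tom &amp; Jerry&lt;/note&gt;</string></value></param>
        System.out.println(asStringParam("<note>Tom & Jerry</note>"));
    }
}
```

The receiver has to undo the escaping and parse the string a second time to get the document back, which is the extra work the first flaw refers to.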
With the help of IBM, the XML-RPC designers upgraded their protocol. The resulting protocol, SOAP, is not as simple as XML-RPC, but it is dramatically more powerful. SOAP also broadens the field to cover applications that are not adequately described as remote procedure calls.
NOTE
Does SOAP make XML-RPC irrelevant? Yes and no. Most recent developments take advantage of SOAP's increased flexibility and power, but some developers still prefer the simpler XML-RPC protocol.
Listing 5 is the SOAP equivalent to Listing 3. Decoding the SOAP request is more involved than decoding an XML-RPC request, so don't worry if you can't read this document just yet. You will learn how to construct SOAP requests in the next section.
Listing 5: A SOAP Request
POST /soap/servlet/rpcrouter HTTP/1.0
Host: joker.psol.com
Content-Type: text/xml; charset=utf-8
Content-Length: 569
SOAPAction: "http://www.psol.com/2001/soapaction"

<?xml version='1.0' encoding='UTF-8'?>
<SOAP-ENV:Envelope
    xmlns:xsd="http://www.w3.org/1999/XMLSchema"
    xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
    xmlns:xsi="http://www.w3.org/1999/XMLSchema-instance">
  <SOAP-ENV:Body>
    <ns1:getFreeResourcesOn
        xmlns:ns1="http://www.psol.com/2001/resourceful"
        SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
      <start xsi:type="xsd:timeInstant">2001-01-15T00:00:00Z</start>
      <end xsi:type="xsd:timeInstant">2001-01-17T00:00:00Z</end>
    </ns1:getFreeResourcesOn>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>
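If it helps to see the request as a template rather than a finished document, the following sketch assembles the same envelope from two arguments. The SoapRequest class is a hypothetical helper written for this article; a real client would normally let a SOAP toolkit generate the envelope. The sketch assumes the arguments are plain timestamps with no markup characters and does not send anything over HTTP.

```java
final class SoapRequest {
    // Builds the envelope for a getFreeResourcesOn call. The namespace
    // URIs and element names are copied from the request shown above.
    static String getFreeResourcesOn(String start, String end) {
        return "<?xml version='1.0' encoding='UTF-8'?>\n"
            + "<SOAP-ENV:Envelope\n"
            + "    xmlns:xsd=\"http://www.w3.org/1999/XMLSchema\"\n"
            + "    xmlns:SOAP-ENV=\"http://schemas.xmlsoap.org/soap/envelope/\"\n"
            + "    xmlns:xsi=\"http://www.w3.org/1999/XMLSchema-instance\">\n"
            + "  <SOAP-ENV:Body>\n"
            + "    <ns1:getFreeResourcesOn\n"
            + "        xmlns:ns1=\"http://www.psol.com/2001/resourceful\"\n"
            + "        SOAP-ENV:encodingStyle="
            + "\"http://schemas.xmlsoap.org/soap/encoding/\">\n"
            + "      <start xsi:type=\"xsd:timeInstant\">" + start + "</start>\n"
            + "      <end xsi:type=\"xsd:timeInstant\">" + end + "</end>\n"
            + "    </ns1:getFreeResourcesOn>\n"
            + "  </SOAP-ENV:Body>\n"
            + "</SOAP-ENV:Envelope>\n";
    }

    public static void main(String[] argv) {
        System.out.print(
                getFreeResourcesOn("2001-01-15T00:00:00Z", "2001-01-17T00:00:00Z"));
    }
}
```

Compared with the XML-RPC request, the method name has become a namespace-qualified element and the parameters are named elements typed with xsi:type.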
Listing 6 is the reply, so it's the SOAP equivalent to Listing 4. Again, don't worry if you don't understand this listing; you will learn how to decode SOAP requests and responses in a moment.
Listing 6: A SOAP Response
HTTP/1.0 200 OK
Server: Jetty/3.1.4 (Windows 98 4.10 x86)
Servlet-Engine: Jetty/3.1 (JSP 1.1; Servlet 2.2; java 1.3.0)
Content-Type: text/xml; charset=utf-8
Content-Length: 704

<?xml version='1.0' encoding='UTF-8'?>
<env:Envelope
    xmlns:xsd="http://www.w3.org/1999/XMLSchema"
    xmlns:env="http://schemas.xmlsoap.org/soap/envelope/"
    xmlns:xsi="http://www.w3.org/1999/XMLSchema-instance">
  <env:Body>
    <ns1:getFreeResourcesOnResponse
        xmlns:ns1="http://www.psol.com/2001/resourceful"
        env:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
      <return xmlns:ns2="http://schemas.xmlsoap.org/soap/encoding/"
              xsi:type="ns2:Array" ns2:arrayType="ns1:String[3]">
        <item xsi:type="xsd:string">Meeting room 1</item>
        <item xsi:type="xsd:string">Meeting room 2</item>
        <item xsi:type="xsd:string">Board room</item>
      </return>
    </ns1:getFreeResourcesOnResponse>
  </env:Body>
</env:Envelope>
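Because SOAP relies on namespaces, a client should locate elements by namespace URI rather than by prefix: the request used the SOAP-ENV prefix and the reply uses env, yet both are bound to the same envelope namespace. The sketch below reads a reply shaped like Listing 6 with a namespace-aware DOM parser. The SoapResponse class and its shortened sample are made up for this example, and the HTTP headers are assumed to have been removed already.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class SoapResponse {
    static final String ENVELOPE_NS = "http://schemas.xmlsoap.org/soap/envelope/";

    // Shortened body of Listing 6 (type attributes omitted).
    static final String SAMPLE =
          "<env:Envelope xmlns:env=\"http://schemas.xmlsoap.org/soap/envelope/\">"
        + "<env:Body>"
        + "<ns1:getFreeResourcesOnResponse"
        + " xmlns:ns1=\"http://www.psol.com/2001/resourceful\">"
        + "<return>"
        + "<item>Meeting room 1</item>"
        + "<item>Meeting room 2</item>"
        + "<item>Board room</item>"
        + "</return>"
        + "</ns1:getFreeResourcesOnResponse>"
        + "</env:Body></env:Envelope>";

    // Returns the text of every item element found inside the SOAP Body.
    static List<String> returnedItems(String soapXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(soapXml)));
            // Look the Body up by namespace URI, whatever prefix was used.
            Element body = (Element)
                    doc.getElementsByTagNameNS(ENVELOPE_NS, "Body").item(0);
            // "*" matches item elements in any namespace, or in none.
            NodeList items = body.getElementsByTagNameNS("*", "item");
            List<String> values = new ArrayList<>();
            for (int i = 0; i < items.getLength(); i++) {
                values.add(items.item(i).getTextContent());
            }
            return values;
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a readable SOAP response", e);
        }
    }

    public static void main(String[] argv) {
        // Prints: [Meeting room 1, Meeting room 2, Board room]
        System.out.println(returnedItems(SAMPLE));
    }
}
```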