The vast majority of all Internet traffic today is data transferred using HTTP (HyperText Transfer Protocol), mostly by people browsing the World Wide Web. HTTP is ubiquitous because it is supported by an extensive, long-established infrastructure of servers and browsers. The inventors of SOAP took note of this infrastructure and shrewdly designed SOAP so that every message can be carried as the payload of an HTTP message. This “tunneling” has been fundamental to SOAP's rapid adoption and unprecedented success.
It's possible to deliver SOAP messages using other protocols, such as SMTP and FTP, but the details of these non-HTTP bindings are not specified by SOAP and are not supported by the BP, so this book discusses SOAP over HTTP only.
SOAP messages sent over HTTP are placed in the payload of an HTTP request or response, an area that is normally occupied by form data and HTML. HTTP is a Request/Response protocol, which means that the sender expects a response (either an error code or data) from the receiver. HTTP requests are typified by the messages that your browser sends to a Web server to request a Web page or submit a form. A request for a Web page is usually made in an HTTP GET message, while submission of a form is done with an HTTP POST message.
There is nothing intrinsic to HTTP that limits it to requesting Web pages, but that's been its primary occupation for the past decade. Most HTTP traffic is composed of HTTP GET requests and HTTP replies. The HTTP GET request identifies the Web page requested and may include some parameters. An HTTP reply message returns the Web page to the requester as its payload.
While the HTTP GET request is perfectly suited for requesting Web pages, it doesn't have a payload area and therefore cannot be used to carry SOAP messages. The HTTP POST request, on the other hand, does have a payload area and is perfectly suited to carrying a SOAP message. HTTP reply messages, whether they are replies to GET or POST messages, follow the same format and carry a payload. Web services that use SOAP 1.1 with HTTP always use HTTP POST and not HTTP GET messages.
4.7.1 Transmitting SOAP with HTTP POST Messages
Sending a SOAP message as the payload of an HTTP POST message is very simple. Listing 4-26 shows the BookQuote SOAP message embedded in an HTTP POST message.
Listing 4-26 A SOAP Request over HTTP
POST /jwsbook/BookQuote HTTP/1.1
Host: www.Monson-Haefel.com
Content-Type: text/xml; charset="utf-8"
Content-Length: 295
SOAPAction: ""

<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope
 xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
 xmlns:mh="http://www.Monson-Haefel.com/jwsbook/BookQuote">
   <soap:Body>
      <mh:getBookPrice>
          <isbn>0321146182</isbn>
      </mh:getBookPrice>
   </soap:Body>
</soap:Envelope>
The HTTP POST message must contain a SOAPAction header field, but the value of this header field is not specified. The SOAPAction header field can improve throughput by providing routing information outside the SOAP payload. A node can then do some of the routing work using the SOAPAction, rather than having to parse the SOAP XML payload.
While the SOAPAction header field can improve efficiency, it's also the source of a lot of debate in the Web services industry. SOAP purists don't like the use of the SOAPAction HTTP header field because it expands the SOAP processing model to include the carrier protocol (in this case HTTP). They believe that all of the routing and payload should be contained in the SOAP document, so that SOAP messages are not dependent on the protocol over which they are delivered. This is a credible argument, so the SOAPAction header field may contain an empty string, as indicated by an empty pair of double quotes. The decision to use a value for the SOAPAction header field is up to the person who develops the Web service. SOAP 1.2 will replace the SOAPAction header with the protocol-independent action parameter of the "application/soap+xml" media type, so dependency on this feature may result in forward-compatibility problems. The BP requires that the SOAPAction header field be present and that its value be a quoted string that matches the value of the soapAction attribute declared by the corresponding WSDL document. If that document declares no soapAction attribute, the SOAPAction header field can be an empty string. Details are provided in Chapter 5: WSDL.
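The quoting rule for the SOAPAction value can be sketched in a few lines of Python. This is only an illustration: the function name and the example action URI are hypothetical, and the soapAction value would normally be read from the WSDL document rather than passed in by hand.

```python
from typing import Optional


def soap_action_header_value(wsdl_soap_action: Optional[str]) -> str:
    """Return the value to place in the SOAPAction HTTP header.

    wsdl_soap_action stands for the soapAction attribute declared in the
    WSDL document, or None when no such attribute is declared
    (a hypothetical input used only for this sketch).
    """
    if wsdl_soap_action is None:
        # No soapAction declared: send an empty quoted string.
        return '""'
    # Otherwise the header value is the declared value wrapped in quotes.
    return '"' + wsdl_soap_action + '"'


# The URI below is made up for illustration.
print(soap_action_header_value(None))
print(soap_action_header_value("http://www.Monson-Haefel.com/jwsbook/GetBookPrice"))
```

Note that the header is always sent; only its quoted content varies.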
You may have noticed that the Content-Type is text/xml, which indicates that the payload is an XML document. The WS-I Basic Profile 1.0 prefers that the text/xml Content-Type be used with SOAP over HTTP. It's possible to use others (for example, SOAP with Attachments would specify multipart/related) but it's not recommended.
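To make the header fields concrete, the following sketch assembles a request like the one in Listing 4-26 using Python's standard urllib.request module. The function name is hypothetical, the endpoint URL is simply the host and path from the listing, and the request is built but never sent, so nothing here assumes that the service is actually reachable.

```python
import urllib.request

# The BookQuote request message from Listing 4-26.
ENVELOPE = """<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope
 xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
 xmlns:mh="http://www.Monson-Haefel.com/jwsbook/BookQuote">
   <soap:Body>
      <mh:getBookPrice>
          <isbn>0321146182</isbn>
      </mh:getBookPrice>
   </soap:Body>
</soap:Envelope>"""


def build_soap_request(url: str, envelope: str,
                       soap_action: str = '""') -> urllib.request.Request:
    """Build (but do not send) an HTTP POST request carrying a SOAP message."""
    payload = envelope.encode("utf-8")
    headers = {
        "Content-Type": 'text/xml; charset="utf-8"',
        # Content-Length counts bytes of the encoded payload, not characters.
        "Content-Length": str(len(payload)),
        # Always present; an empty quoted string when no action is used.
        "SOAPAction": soap_action,
    }
    return urllib.request.Request(url, data=payload, headers=headers,
                                  method="POST")


request = build_soap_request(
    "http://www.Monson-Haefel.com/jwsbook/BookQuote", ENVELOPE)
print(request.get_method())
# urllib stores header names in capitalized form ("Soapaction");
# HTTP header names are case-insensitive, so this is harmless.
print(request.header_items())
```

Computing Content-Length from the encoded bytes avoids a mismatch when the message contains non-ASCII characters.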
The reply to the SOAP message is placed in an HTTP reply message that is similar in structure to the request message, but contains no SOAPAction header. Listing 4-27 illustrates.
Listing 4-27 A SOAP Reply over HTTP
HTTP/1.1 200 OK
Content-Type: text/xml; charset="utf-8"
Content-Length: 311

<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope
 xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
 xmlns:mh="http://www.Monson-Haefel.com/jwsbook/BookQuote">
   <soap:Body>
      <mh:getBookPriceResponse>
          <result>24.99</result>
      </mh:getBookPriceResponse>
   </soap:Body>
</soap:Envelope>
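Once the HTTP headers have been stripped away, the payload of the reply is an ordinary XML document. As a rough sketch, assuming the reply has exactly the shape shown in Listing 4-27, the price could be extracted with Python's standard ElementTree parser. The function name is hypothetical, and a real client would normally rely on a SOAP toolkit rather than hand-written parsing.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# The payload of the HTTP reply in Listing 4-27.
REPLY = b"""<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope
 xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
 xmlns:mh="http://www.Monson-Haefel.com/jwsbook/BookQuote">
   <soap:Body>
      <mh:getBookPriceResponse>
          <result>24.99</result>
      </mh:getBookPriceResponse>
   </soap:Body>
</soap:Envelope>"""

# Prefix-to-namespace map used only for the lookup below.
NAMESPACES = {
    "soap": "http://schemas.xmlsoap.org/soap/envelope/",
    "mh": "http://www.Monson-Haefel.com/jwsbook/BookQuote",
}


def extract_book_price(reply_xml: bytes) -> Decimal:
    """Return the <result> value from a getBookPriceResponse message."""
    envelope = ET.fromstring(reply_xml)
    # <result> is unqualified in the listing, so it takes no prefix here.
    result = envelope.find("soap:Body/mh:getBookPriceResponse/result",
                           NAMESPACES)
    if result is None or result.text is None:
        raise ValueError("SOAP reply contains no <result> element")
    return Decimal(result.text.strip())


print(extract_book_price(REPLY))
```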
4.7.2 HTTP Response Codes
Although SOAP faults provide an error-handling system in the SOAP context, you must also understand HTTP response codes, which indicate the success or failure of an HTTP request. In Listing 4-27 you'll notice that the first line of text is HTTP/1.1 200 OK. The HTTP/1.1 portion indicates the version of HTTP used. Although HTTP 1.1 is the preferred protocol, the BP also permits HTTP 1.0. The rest of the line, 200 OK, is the HTTP response code.
HTTP defines a number of success and failure codes that can be included in an HTTP reply message, but the BP takes special care to specify exactly which codes can be used by conformant SOAP applications. The types of response codes used depend on the success or failure of the SOAP request and the type of messaging exchange pattern used, Request/Response or One-Way.
4.7.2.1 Success Codes
The 200-level HTTP success codes are used to indicate that a SOAP request was received or successfully processed. The 200 OK and 202 Accepted HTTP success codes are used in Web services.
200 OK When a SOAP operation generates a response SOAP message, the HTTP response code for successful processing is 200 OK. This response code indicates that the reply message is not a fault but a normal SOAP response message.
202 Accepted This response code means that the request was received and accepted but that there is no SOAP response data. This type of SOAP operation is similar to a Java method that has a return type of void.
Although a One-Way SOAP message is conceptually unidirectional, when it's sent over HTTP some type of HTTP reply will be transmitted back to the sender. Under the BP, One-Way SOAP messages do not return SOAP faults or results of any kind, so the HTTP 202 Accepted response code indicates only that the message made it to the receiver; it doesn't indicate whether the message was successfully processed.
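From the sender's point of view, the two success codes call for different handling. The sketch below is one way a client might express that distinction; the function name is hypothetical, and the treatment of an empty 200 OK reply as an error is an assumption made for this example.

```python
from typing import Optional


def soap_body_for_success(status_code: int, body: bytes) -> Optional[bytes]:
    """Return the SOAP payload to process for a success code, if any."""
    if status_code == 200:
        # 200 OK: the payload should be a normal SOAP response message.
        if not body:
            raise ValueError("200 OK reply carried no SOAP message")
        return body
    if status_code == 202:
        # 202 Accepted: the message arrived; there is no SOAP response
        # data, so any payload is ignored.
        return None
    raise ValueError(f"{status_code} is not a SOAP success code")


print(soap_body_for_success(202, b""))
```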
4.7.2.2 Error Codes
In general, HTTP uses the 400-level response codes to indicate that the client made some kind of error when transmitting the message. For example, you have undoubtedly encountered the infamous 404 Not Found error when using a Web browser. The 404 error code signifies that the client attempted to access a Web page or some other resource that doesn't exist. Web services use a specific set of 400-level codes when the error is related to the contents of the SOAP message itself, rather than the HTTP request. HTTP also uses the 500-level response codes to indicate that the server suffered some type of failure that is not the client's fault.
400 Bad Request This error code is used to indicate that either the HTTP request or the XML in the SOAP message was not well formed.
405 Method Not Allowed If a Web service receives a SOAP message via any HTTP method other than HTTP POST, the service should return a 405 Method Not Allowed error to the sender.
415 Unsupported Media Type HTTP POST messages must include a Content-Type header with a value of text/xml. If it's any other value, the server must return a 415 Unsupported Media Type error.
500 Internal Server Error This code must be used when the response message in a Request/Response MEP is a SOAP fault.
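The rules for success and error codes can be pulled together in a short server-side sketch. It is illustrative only: the function name and its parameters are hypothetical, the one_way and is_fault flags stand in for decisions a real SOAP engine would make while processing the message, and the order of the checks is simply a reasonable choice rather than something the BP prescribes.

```python
import xml.etree.ElementTree as ET


def choose_status(method: str, content_type: str, body: bytes,
                  one_way: bool = False, is_fault: bool = False) -> int:
    """Pick the HTTP response code for an incoming SOAP request."""
    if method.upper() != "POST":
        return 405  # Method Not Allowed: SOAP must arrive via HTTP POST
    # Ignore parameters such as charset when comparing the media type.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type != "text/xml":
        return 415  # Unsupported Media Type
    try:
        ET.fromstring(body)
    except ET.ParseError:
        return 400  # Bad Request: the XML is not well formed
    if one_way:
        return 202  # Accepted: no SOAP response message is returned
    # In a Request/Response exchange a SOAP fault is sent with 500,
    # a normal SOAP response with 200.
    return 500 if is_fault else 200


print(choose_status("GET", "text/xml", b"<a/>"))
print(choose_status("POST", 'text/xml; charset="utf-8"', b"<a/>"))
```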
4.7.3 Final Words about HTTP
HTTP provides a solid bedrock on which to base SOAP messaging. HTTP is ubiquitous, well understood, and widely supported. That said, HTTP has its detractors. For example, Don Box has characterized HTTP as the “cockroach of the Internet,” to convey his view that it's an undesirable protocol that can't easily be done away with. The fact that modern firewalls do not restrict HTTP traffic on port 80 makes HTTP convenient for accessing servers and clients behind firewalls—which are a major impediment to distributed computing. Of course this introduces security issues because we are effectively circumventing the firewalls that help keep organizations safe from malicious hackers. It seems likely that firewall vendors will not permit “tunneling” to go on forever. Eventually they will feel compelled to enhance firewall products so that they will filter for, and block, HTTP communications that carry SOAP messages.
Blocking SOAP messages at the firewall is not necessary, however. Because SOAP is a transparent protocol (it's simple text rather than opaque data), a firewall can easily inspect the contents and route the message to a SOAP-specific security processor.
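Because the payload is plain XML, recognizing a SOAP message takes very little work. The following sketch shows the kind of check an inspecting intermediary might perform on an HTTP payload; the function name is hypothetical, it recognizes only the SOAP 1.1 envelope namespace, and a production filter would presumably use a parser hardened against malicious XML rather than the standard library parser used here.

```python
import xml.etree.ElementTree as ET

# ElementTree writes qualified names as "{namespace}localname".
SOAP_11_ENVELOPE = "{http://schemas.xmlsoap.org/soap/envelope/}Envelope"


def is_soap_envelope(payload: bytes) -> bool:
    """Return True if the payload's root element is a SOAP 1.1 Envelope."""
    try:
        root = ET.fromstring(payload)
    except ET.ParseError:
        # Not well-formed XML (form data, binary content, many HTML pages).
        return False
    return root.tag == SOAP_11_ENVELOPE


print(is_soap_envelope(b"<html><body>Hello</body></html>"))
```

A payload that passes this check could then be handed to a SOAP-specific security processor, while other HTTP traffic continues on its usual path.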
HTTP is not the only protocol over which you can send SOAP messages. You can also use SMTP (e-mail) and raw TCP/IP. The WS-I may one day extend the BP to include these other protocols—but for now HTTP is the only protocol endorsed by the WS-I.