The SOAP Body
If you're using a network protocol at all, you're probably using it to send data from one network address to another. Some overhead is associated with any network protocol, and SOAP is no exception. In general, though, the protocol should efficiently convey the data from one network location to the other. This is the primary responsibility of the SOAP Body object, the workhorse of the SOAP protocol.
If you're using SOAP in the general case, such as for messaging purposes, the SOAP specification loosely defines the arrangement of items in the Body. On the other hand, if you are using SOAP for RPC purposes, the SOAP specification cranks down the serialized XML and describes it in very precise terms.
When the SOAP specification was introduced, tightly constrained RPC serialization made perfect sense. If you and someone else wanted to share remote methods, you had to agree in no uncertain terms on how the method's parameters would be arranged within the XML document. Otherwise, how would you know what you were getting? (Remember, the intention was to automate the process.)
Today, the hefty SOAP RPC serialization rules have less impact because the WSDL explicitly tells the recipient how the XML is formatted. As a result, you can make simplifications to the parameter serialization, at least if you're dealing with a WSDL-enabled site. If you're not, you (and .NET) will need to revert to the SOAP RPC encoding rules. This section is important to you if you're at all interested in interoperability, at least in the near term (until WSDL 1.1 is adopted by the other sites that you intend to deal with). So let's dive in. We'll start with some terms that the SOAP specification provides to describe certain XML constructs and parameter use models.
SOAP Body Serialization Terminology
When you serialize method parameter information that is stored in the local computer's memory into another format for transmission, you often have to adopt a new vocabulary to describe the serialization process or the results. SOAP is no different. The SOAP specification describes several terms that are often used to describe serialized SOAP packets.
Many terms overlap. For example, a simple value will generally be embedded and, therefore, will be single reference and locally scoped. In plain English, that means that the simple value, which would be something like an integer, will be recorded deep within the XML element hierarchy and will only ever be accessed by processes working with its parent or siblings. We're really referring to where in the XML document you'll find the integer value and who can access it. Embedded values are typically sent to the remote system by value, whereas independent elements are often how data is sent by reference, such as data accessed in memory by a pointer or output parameters.
Table 4.1 addresses most of the important SOAP terms. Probably the most important distinction to understand is the one between simple and compound values. A simple value is a method parameter value passed by value to the remote method; it is usually an integer, a floating-point number, or a string. Compound values, on the other hand, refer to how structures and arrays are serialized in the SOAP XML stream. SOAP differentiates between the structure and the array by identifying how you access the elements within the compound value. Structure members are accessed by name, while array elements are accessed by position. When you see how SOAP encodes method parameter data, you'll understand why these concepts were described in this manner.
Table 4.1 SOAP Terminology
Term | Definition
Value | A string, the name of a measurement (number, date, enumeration, and so on), or a composite of several simple types. All values are of specific datatypes.
Simple value | A value without named parts (specific strings, enumerations, and so on).
Compound value | An aggregation of relations to other values (struct or array).
Accessor | A particular related value contained within a compound value (distinguished by either name [struct] or ordinal value [array]).
Array | A compound value in which ordinal position distinguishes member values.
Struct | A compound value in which accessor name distinguishes member values.
Simple type | A class of simple values, such as strings, integers, enumerations, and so on.
Compound type | A class of compound values, such as a struct definition that could be filled with different accessor values.
Locally scoped | An accessor that has a distinct name within that (compound) type but that is not distinct with respect to other types (such as struct members).
Universally scoped | An accessor whose name is based upon a URI, in whole or in part, directly or indirectly, so that the name alone is sufficient to identify the accessor, regardless of the type (that is, qualified by a namespace URI, such as Header entries).
Single reference | An accessor that can be accessed only directly (that is, a value passed by value). It is typically embedded.
Multireference | An accessor that can be indirectly accessed, whether accessed in this manner actually or potentially (that is, a value passed by reference). It is typically independent.
Embedded | An element that appears within an independent element (such as XML elements that are grandchildren or that are further descended from the SOAP Header or Body).
Independent | An element that appears at the top level of a serialization (such as immediate children of the SOAP Header or Body).
The SOAP specification itself is relatively easy to read and understand, as long as you know what the verbiage is referring to. Before applying these terms to the serialization of some remote methods, let's look at a couple of Body element attributes that you'll find useful.
SOAP Body Attributes
The SOAP Body attributes, like those of the SOAP Header, are usually applied to subordinate XML elements rather than the Body element itself. Some of the attributes are designed specifically for certain serialization conditions, such as for encoding arrays. But two attributes in particular are used quite often when serializing strings or general compound values. These attributes are actually related: they're used to link XML elements that aren't necessarily hierarchically related.
The SOAP id and href Attributes
The id and href attributes really come from XLink (http://www.w3.org/TR/xlink/, Section 5.4), which is actually used to link external elements to a given XML document. In SOAP's case, their use is narrowed to link XML elements. An example will best illustrate this.
Let's return to the linked list. Imagine that you have a linked list in your computer's memory that you want to ship to some remote computer for processing. For brevity, assume that it has only two nodes and takes this form (shown in C#):
public class Node
{
    public int iData;
    public Node pNext;
}
Using this class, suppose that you created one node and assigned its iData member to the value 27. Then you created a second node and set its iData member to 54. To link the nodes, you assigned the first node's pNext member so that it referenced the second node, and the second node's pNext pointer was set to null:
Node pHead = new Node();
pHead.iData = 27;
Node pTail = new Node();
pTail.iData = 54;
pHead.pNext = pTail;
pTail.pNext = null;
When this data structure is serialized in a SOAP packet, the SOAP Body would take this form (if serialized according to Section 5 of the SOAP specification, assuming that the remote method is called ConsumeList(Node pNode)):
<soap:Body>
  <m:ConsumeList xmlns:m="http://tempuri.org">
    <pNode href="#node1"/>
  </m:ConsumeList>
  <m:Node xmlns:m="http://tempuri.org" soapenc:root="true" id="node1">
    <iData>27</iData>
    <pNext href="#node2"/>
  </m:Node>
  <m:Node xmlns:m="http://tempuri.org" id="node2">
    <iData>54</iData>
    <pNext xsi:null="true"/>
  </m:Node>
</soap:Body>
Pay particular attention to the id and href attributes. Although this is an example of a compound value serialization, something we've not covered yet, it does show how values that are not hierarchically related are associated with each other (few would claim that a linked list was an example of a hierarchical data structure!). The input method parameter pNode "points" to the value for pHead, and pHead's pNext value "points" to pTail, as you'd expect. The # used within the href attribute's value indicates that the linkage is internal to the current document rather than being linked to some external source.
The serialization of this linked list example could actually take many different forms, depending upon the WSDL associated with the particular Web Service. For example, this particular linked list could have been serialized as an array of integers, if you knew this beforehand and could therefore later extract the list. If you're serializing according to SOAP Section 5 rules (used for RPC), then you'll see the SOAP packet serialized very much as you've seen here. Armed with what you've seen so far, let's see how SOAP actually serializes your Web Service data.
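To make the id/href pairing concrete, here is a minimal C# sketch that walks the two-node list and emits one independent element per node. It is only an illustration, not how .NET's own serializer works: it uses the System.Xml.Linq API, the ListSerializer class and ToIndependentElements method are hypothetical names, and the xsi namespace URI is an assumption (the 1999 schema-instance namespace, which defined the null attribute). The sketch also assumes that the list is not circular.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Linq;

public class Node
{
    public int iData;
    public Node pNext;
}

public static class ListSerializer
{
    // Hypothetical helper: emits one independent <m:Node> element per list
    // node and links consecutive nodes with id/href pairs.
    public static List<XElement> ToIndependentElements(Node head)
    {
        XNamespace m = "http://tempuri.org";
        // Assumption: the 1999 schema-instance namespace, which used "null".
        XNamespace xsi = "http://www.w3.org/1999/XMLSchema-instance";
        var elements = new List<XElement>();
        int index = 1;

        for (Node current = head; current != null; current = current.pNext, index++)
        {
            // The last node carries a null marker instead of an href.
            XElement next = current.pNext != null
                ? new XElement("pNext", new XAttribute("href", "#node" + (index + 1)))
                : new XElement("pNext",
                      new XAttribute(XNamespace.Xmlns + "xsi", xsi.NamespaceName),
                      new XAttribute(xsi + "null", "true"));

            elements.Add(new XElement(m + "Node",
                new XAttribute(XNamespace.Xmlns + "m", m.NamespaceName),
                new XAttribute("id", "node" + index),
                new XElement("iData", current.iData),
                next));
        }
        return elements;
    }

    public static void Main()
    {
        var pTail = new Node { iData = 54, pNext = null };
        var pHead = new Node { iData = 27, pNext = pTail };
        foreach (XElement element in ToIndependentElements(pHead))
        {
            Console.WriteLine(element);
        }
    }
}
```

A receiver would reverse the process by looking up the element whose id matches the text after the # in each href.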
SOAP Remote Method Serialization
Although you could arrange things within the SOAP Body nearly any way you want, the SOAP specification gives a precise format for serializing RPC information destined for the remote system. The basic SOAP Request packet format has the method information serialized as the first child of the SOAP Body element, like so:
<soap:Body>
  <m:ReverseString xmlns:m="http://tempuri.org">
    (Embedded single reference parameters here)
  </m:ReverseString>
  (Independent multi-reference accessors here)
</soap:Body>
The method element is an independent element because it is an immediate child of the SOAP Body element. In fact, for RPC purposes, the method element should also be the first child element to simplify deserialization. Other independent elements may follow (as you've seen with the linked list example previously).
Note that the method element has a namespace. What the method namespace URI text consists of is often immaterial to SOAP. SOAP may care only that it is unique throughout the SOAP packet. (You might care, however, for interoperability purposes, or the URI may be used to arbitrate overloaded methods, in which case SOAP does care.) Because a namespace is applied to this element, it is universally scoped (another one of those SOAP terms).
When the remote server receives this SOAP packet, it extracts the method name, checks to see if it can handle that method, and examines the SOAP XML to see if the required parameters have been provided. Assuming that things are correct, the server then executes the method (reversing a string, in this case) and returns the results of the method. The SOAP Response packet looks a lot like the Request packet:
<soap:Body>
  <m:ReverseStringResponse xmlns:m="http://tempuri.org">
    <return>(Method return value here)</return>
    (Embedded single reference parameters here)
  </m:ReverseStringResponse>
  (Independent multi-reference accessors here)
</soap:Body>
As you can see, you can still have returned values (embedded or independent multireference), but the method's return value is encoded as the first (embedded) child of the method response element. The XML tag name, in this case, doesn't matter; only the position within the XML document matters. This element is traditionally named return or response, but that is by convention rather than by specification.
Directed Data Flow
You've probably guessed that data has an associated directional component. It can go from your local computer to the remote computer only; if this is the case, the data is referred to as [in] data. You could be expecting data from the server, which is annotated by the [out] moniker. Data also can go both ways, which is [in, out] data. The notation "in," "out," and "in, out" comes from the Interface Definition Language (IDL) attributes that tell the COM serialization code more precisely how to encode the data for transmission. (Of course, in .NET, you don't have COM, but the concepts and markings remain.)
For example, if you're expecting [out] data, the serialization code doesn't need to put anything into the Request packet. You're not sending data to the server; you're expecting the server to return information to you. Thus, the [out] data would be serialized within the Response packet alone.
The C# language has these same concepts, by the way. In C#, you occasionally might employ the ref and out method parameter modifiers to optimize the way the Common Language Runtime (CLR) deals with memory and data. C#'s out modifier equates to [out] data, while the ref modifier indicates [in, out] data. If you omit both modifiers from the method signature, the data is passed by value to the method as [in] data.
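The three directions can be seen side by side in a small, purely local C# sketch. Nothing here touches SOAP; the DirectionDemo class and its method names are hypothetical and exist only to show which way the data flows in each case.

```csharp
using System;

public static class DirectionDemo
{
    // [in]: the callee works on a copy; the caller's variable is untouched.
    public static int Doubled(int value)
    {
        value *= 2;
        return value;
    }

    // [in, out]: ref lets the callee read and replace the caller's value.
    public static void DoubleInPlace(ref int value)
    {
        value *= 2;
    }

    // [out]: the callee must assign the parameter; nothing flows in through it.
    public static bool TryHalve(int value, out int half)
    {
        half = value / 2;
        return value % 2 == 0;
    }

    public static void Main()
    {
        int a = 21;
        int copyResult = Doubled(a);     // a is still 21 afterward
        DoubleInPlace(ref a);            // a is now 42
        int h;
        bool even = TryHalve(a, out h);  // h is 21 and even is true
        Console.WriteLine(copyResult + " " + a + " " + h + " " + even);
    }
}
```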
This directed nature has a direct bearing on the serialization of the method parameters. [in] data is by value and, therefore, encoded as embedded elements of the Request method element. [in, out] data is, by nature, multireference and, therefore, is encoded as independent elements within the SOAP Body. [out] data is also independent and multireference, although this reasoning might not at first be apparent. The [out] data is independent and multireference because when you call the remote method, you provide the .NET serialization code with a reference to the variable that it will populate with the [out] data. As soon as you see the word reference, you should think "independent multireference SOAP serialization."
NOTE
This is one place where .NET diverges from the SOAP specification a bit. .NET serializes referenced information as embedded by default. .NET uses SOAP more as a messaging protocol than as a true SOAP RPC protocol, and the fact that the referenced parameters are embedded is noted in the WSDL generated for your Web Service. In other words, what you're seeing in this chapter is how the SOAP specification tells you to serialize RPC information, but this isn't the only way you can do it. It's important to know this because sometimes you might need to interoperate with third-party Web Services that don't emit WSDL 1.1 or that otherwise expect you to provide SOAP RPC-encoded parameter information. At least you'll know what the XML might have to look like to be fully interoperable.
NOTE
.NET serializes information according to Sections 5 and 7 of the SOAP specification, if you need it to do so (again, for interoperability purposes). The .NET SoapRpcMethodAttribute class enables you to force SOAP Specification Sections 5 and 7 encoding rules for specific Web methods. It actually forces the WSDL encoding style to RPC (versus Literal), which triggers the alternative .NET serialization code.
If you want the entire Web Service class to follow Section 5 and 7 encoding rules, you should apply the SoapRpcService attribute to the class as a whole.
You now know how SOAP would have you serialize the method itself. The method element encapsulates the parameter information that you're sending to the Web Service or receiving back as output data. We'll turn now to how SOAP serializes simple parameter values.
SOAP Serialization of Simple Datatypes
Because SOAP is based on XML, SOAP inherently "understands" the simple datatypes defined by the W3C XML Schema specification, Part 2 (http://www.w3.org/TR/xmlschema-2/). If you're writing SOAP serialization software, you'll be very interested in these datatypes and their XML representations. In this case, though, .NET handles the details for you, so the entire suite of datatypes isn't shown here. They include all the datatypes that you would expect, such as strings, various kinds of integers and floating-point values, some XML-specific information (such as the QName, or namespace-qualified name), and dates and times.
SOAP rolls the W3 datatypes into a collection that it refers to as the simple datatypes. The other members of this collection include enumerations and byte arrays.
Serializing W3 Datatypes
As a rule, single-reference W3 datatypes are serialized as plain XML elements. The tag names match the parameter name given by the method signature. For example, consider this method:
int Add(int A, int B);
If the values for A and B were 17 and 12, respectively, the corresponding SOAP packet would look very much like this:
<soap:Body>
  <m:Add xmlns:m="http://tempuri.org">
    <A>17</A>
    <B>12</B>
  </m:Add>
</soap:Body>
A and B are embedded elements (children of the method element) serialized in the order specified in the method signature, from left to right (A before B, in the XML document order).
In this example, the parameters are integer values, but they could just as easily have been time durations, dates, times, URIs, floats, or something else. These all generally serialize in this way, at least if they are [in] parameters, as shown in this example.
The one exception is the string, which can be serialized as either an embedded or an independent element. For example, consider this method:
string ReverseString(string X);
Blindly encoding this, as indicated by the SOAP specification, would yield something very much like this Body serialization:
<soap:Body>
  <m:ReverseString xmlns:m="http://tempuri.org">
    <X href="#str0"/>
  </m:ReverseString>
  <m:string xmlns:m="http://tempuri.org" id="str0">Hello, World!</m:string>
</soap:Body>
In many cases, though, the string is used as an [in] parameter, so you have the option of optimizing the serialization a bit to embed the string value:
<soap:Body>
  <m:ReverseString xmlns:m="http://tempuri.org">
    <X>Hello, World!</X>
  </m:ReverseString>
</soap:Body>
The optimization here is that you don't have to hunt for the str0 identifier to discover what text the string contained. This saves you an additional XPath query. .NET takes advantage of this optimization, for example.
If the simple value is [in, out], then it should be serialized as an independent, multireference accessor. The serialization format is as you see it in the first string example. To provide a second example, imagine that you have this method:
bool CheckPressure(ref int iPressure);
...
int iPres = 330;
bool bOver = CheckPressure(ref iPres);
This method triggers some (imaginary) code to check pressure in some vessel. If the pressure exceeds the pressure provided in the iPressure parameter, the method returns a true value. In either case, imagine that the method returns the actual pressure within the vessel in the iPressure parameter. The addition of the ref attribute tells you that the method can consume the incoming pressure value and modify the contents of the parameter variable upon return. Data goes in, and potentially different data comes back out.
The resulting SOAP packet for this method would look something like this:
<soap:Body>
  <m:CheckPressure xmlns:m="http://tempuri.org">
    <iPressure href="#p0"/>
  </m:CheckPressure>
  <m:iPres xmlns:m="http://tempuri.org" id="p0">330</m:iPres>
</soap:Body>
A possible Response packet would then appear like this:
<soap:Body>
  <m:CheckPressureResponse xmlns:m="http://tempuri.org">
    <return>false</return>
    <iPressure href="#p0"/>
  </m:CheckPressureResponse>
  <m:iPres xmlns:m="http://tempuri.org" id="p0">297</m:iPres>
</soap:Body>
For this example, then, iPres would contain the value 297 (even though it started with the value 330), and the bOver variable would contain the value false.
All the W3 datatypes are serialized in this manner, but you're still left with enumerations and byte arrays. Let's turn first to enumeration serialization.
Serializing Enumerations
SOAP's answer to enumeration serialization is actually relatively intuitive. As you might know, enumerations are represented in memory at runtime as integer values. Only in your source code do you actually use the textual representation. For example, consider this enumeration:
enum Colors { Red = 0, Green, Blue };
In memory, anywhere you used the Red enumeration value, the computer would use the value 0. Green is 1, and Blue is 2.
SOAP simply reverts back to using the textual value. Consider a method signature like this:
void FavoriteColor( Colors eColor );
The resulting SOAP Request packet would look a lot like this:
<soap:Body>
  <m:FavoriteColor xmlns:m="http://tempuri.org">
    <eColor>Blue</eColor>
  </m:FavoriteColor>
</soap:Body>
Of course, this packet shows you an embedded, single-reference value. The method parameter could also have been tagged as a ref parameter, in which case the independent, multireference serialization rules apply.
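In C#, the textual form is easy to obtain and to reverse. The following sketch shows one way the round trip could be done with the standard Enum.Parse and ToString methods; the EnumWire class and its ToWire and FromWire methods are hypothetical names used only for illustration.

```csharp
using System;

public enum Colors { Red = 0, Green, Blue }

public static class EnumWire
{
    // Sender side: the text that would be written into the parameter element.
    public static string ToWire(Colors color)
    {
        return color.ToString();
    }

    // Receiver side: map the text back to the enumeration value.
    // Enum.Parse throws an ArgumentException for names it does not recognize.
    public static Colors FromWire(string text)
    {
        return (Colors)Enum.Parse(typeof(Colors), text);
    }

    public static void Main()
    {
        string wire = ToWire(Colors.Blue);
        Colors back = FromWire(wire);
        Console.WriteLine("<eColor>" + wire + "</eColor> maps to " + (int)back);
    }
}
```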
Serializing Byte Arrays
So what do we mean by a byte array? Well, suppose that you had a 1K text buffer into which a user typed some information. You could serialize that as a string, but you could also serialize it as a buffer. If you chose to maintain its buffer nature, you would serialize the contents of the buffer as an array of bytes. Another example would be an image or perhaps a public key used for encryption and decryption purposes. These would also be serialized as an array of bytes.
Because we're talking about a binary large object (BLOB), the SOAP rules for array serialization can be set aside in favor of a simpler mechanism. In this case, you simply convert the bytes to Base64 and then record the converted data within the parameter's XML element.
This brings up another issue, however. Some characters are "special," according to XML. They include <, >, ', ", and &. If you blindly shove bytes that happen to represent these characters into your XML document, the XML parser will croak and return a parse error. It would be a shame to sustain a round trip to the remote server only to find that you shipped it malformed XML!
You have a couple of options. For one, you can check each byte in the array for special character status; if a given character is special, you can substitute its entity reference. Table 4.2 shows you the special characters and their respective entity reference values.
Table 4.2 XML Special Characters and Their Entity Reference Values
Special Character | Entity Reference
< | &lt;
> | &gt;
' | &apos;
" | &quot;
& | &amp;
This works, but you'll do a lot of looping through your buffers to search for the special characters. Moreover, when you find one, you will have to insert 4 to 6 bytes for each byte of special character. This means that you'll need to split your buffer and play memory games. This can be done, but it's expensive and time-consuming.
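For text data, the substitution itself might be sketched in C# as follows. This is only an illustration of the technique: the XmlEscaper class is a hypothetical name, and a StringBuilder is used so that the growing output is handled for you rather than by splitting buffers manually. The loop over every character is still required, which is the cost described above.

```csharp
using System;
using System.Text;

public static class XmlEscaper
{
    // Replaces the five XML special characters with their entity references.
    public static string Escape(string text)
    {
        var result = new StringBuilder(text.Length);
        foreach (char c in text)
        {
            switch (c)
            {
                case '<':  result.Append("&lt;");   break;
                case '>':  result.Append("&gt;");   break;
                case '\'': result.Append("&apos;"); break;
                case '"':  result.Append("&quot;"); break;
                case '&':  result.Append("&amp;");  break;
                default:   result.Append(c);        break;
            }
        }
        return result.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(Escape("if (a < b && b > c) say \"it's ok\""));
    }
}
```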
An alternative is to use the XML CDATA section:
<![CDATA[(byte array here)]]>
The CDATA section is used in XML to protect blocks of free-form text that should not be parsed. So, you can insert special characters here to your heart's content. The only overhead that you'll have is to insert the byte array text into the CDATA section and strip it out again. You also can't nest CDATA sections, because the first occurrence of the characters "]]>" terminates the entire section. This also means that text placed within a CDATA section can't contain this character sequence. At least the sequence isn't very common, so in most cases you will have no problem.
However, the SOAP specification recommends using Base64 encoding. The advantage to this is the resulting textual stream is suitable for transmission over the Internet and is guaranteed to pass through firewalls and servers unmolested. Some firewall and server software packages modify incoming text, for various reasons. Modifications to your byte array could be catastrophic, however, so you will likely see SOAP packets with large amounts of buffer data using this encoding scheme.
Base64 is a Multipurpose Internet Mail Extensions (MIME) encoding. It's typically used to encode email attachments because SMTP servers are particularly touchy about certain textual values and characters. Base64 encoding replaces each group of 3 bytes with 4 "safe" text characters. Although the specific algorithm isn't terribly important here, the fact that you have this option is important. You'll essentially run your buffer through a Base64 encoder, only to receive back more text. But the text that you get back won't trigger substitutional or executional code in firewalls and SMTP servers (SMTP is administered through a command line, so it understands commands and carriage returns, and executes the commands on demand). Your buffer will grow by approximately 33%, but it is safe from firewall and server molestation. If you are interested in the Base64 algorithm, you'll find it listed in RFC 2045, "Multipurpose Internet Mail Extensions (MIME), Part One: Format of Internet Message Bodies," or in our book Understanding SOAP (Sams Publishing, ISBN 0672319225).
If you applied the Base64 algorithm to the text "Hello World", you would get this value:
SGVsbG8gV29ybGQ=
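You can reproduce this value yourself with the Base64 methods on the .NET Convert class. The sketch below is a minimal illustration; the Base64Demo class and its Encode and Decode wrappers are hypothetical names, and ASCII is assumed as the text encoding for the sample string.

```csharp
using System;
using System.Text;

public static class Base64Demo
{
    // Sender side: turn raw bytes into text that is safe to place in XML.
    public static string Encode(byte[] data)
    {
        return Convert.ToBase64String(data);
    }

    // Receiver side: recover the original bytes from the element text.
    public static byte[] Decode(string text)
    {
        return Convert.FromBase64String(text);
    }

    public static void Main()
    {
        byte[] bytes = Encoding.ASCII.GetBytes("Hello World");
        string encoded = Encode(bytes);
        Console.WriteLine(encoded); // prints SGVsbG8gV29ybGQ=
        Console.WriteLine(Encoding.ASCII.GetString(Decode(encoded)));
    }
}
```

The 11 input bytes become 16 output characters here, because the final group of 2 bytes is padded out to a full 4-character unit with the = sign.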
To take the example further, assume that you have this C# code:
void SendBytes(byte[] bytes);
...
byte[] bytes = {0x48, 0x65, 0x6c, 0x6c, 0x6f, 0x20,
                0x57, 0x6f, 0x72, 0x6c, 0x64}; // "Hello World"
SendBytes(bytes);
The resulting SOAP packet would look like this:
<soap:Body>
  <m:SendBytes xmlns:m="http://tempuri.org">
    <bytes href="#p0"/>
  </m:SendBytes>
  <m:bytes xmlns:m="http://tempuri.org" xsi:type="xsd:base64Binary" id="p0">SGVsbG8gV29ybGQ=</m:bytes>
</soap:Body>
Of course, as with strings, you have the option of optimizing this to embed the byte array:
<soap:Body>
  <m:SendBytes xmlns:m="http://tempuri.org">
    <bytes xsi:type="xsd:base64Binary">SGVsbG8gV29ybGQ=</bytes>
  </m:SendBytes>
</soap:Body>
And with the serialization of the byte array, you've seen how SOAP encodes simple datatypes! It really has been pretty easy to see how SOAP handles these datatypes. But not everything is so simple when it comes to compound datatypes, as you'll see in the next section.
SOAP Serialization of Compound Data Types
When you consider how difficult it must be to write a system that can serialize any array or structure you can invent, you can see why a discussion of the compound datatype encodings can get so detailed. After all, there is an infinite set of arrays and structures out there, yet .NET must accept anything that you code for SOAP serialization.
So how do the .NET system programmers do it? How do you write code that accepts whatever array or structure you pass in?
Well, it helps if you're clever. But you also can rely upon the SOAP standard. If you take a close look at how arrays and structures are created in memory, you'll find patterns that you can exploit and shortcuts that you can take based on the way .NET (or any system) stores data in memory. So although it's not a simple task, it's also not an overly complex one when you see how things work.
To review a moment, remember that, to SOAP, structs are compound types whose constituent elements are accessed by name. Arrays, on the other hand, are accessed by position (array element ordinal). The result of this is that when you serialize arrays, you use generic XML tag names. With structs, though, you have to carry over the names provided by the structure's designer. The honest truth is that structures are easier to serialize, so let's start there.
SOAP struct Serialization
Before we get to the details, let's take a look at an example. Imagine that you have this structure:
public struct PartInfo
{
    public string strPartID;
    public int iQtyOnHand;
    public string strPartLocation;
}
Here, you have some sort of part that you happen to be tracking. Notice that you are providing structure data by value, but you have the option of multireferenced strings, depending on how you want those serialized (optimized or not). The integer value will always be serialized as an embedded parameter because it is passed within the structure itself by value.
Next let's put the struct to work by implementing this code:
// Function prototype...
bool RecordPartInfo(PartInfo pi);
...
PartInfo piMyPart = new PartInfo();
piMyPart.strPartID = "002-AS-3220-1";
piMyPart.iQtyOnHand = 17;
piMyPart.strPartLocation = "Bin 3741";
RecordPartInfo(piMyPart);
Assuming that the RecordPartInfo() method resides on another computer as a Web Service (which is actually accessed via a local proxy), the resulting SOAP Request Body might appear very much like this:
<soap:Body>
  <m:RecordPartInfo xmlns:m="http://tempuri.org">
    <pi>
      <strPartID>002-AS-3220-1</strPartID>
      <iQtyOnHand>17</iQtyOnHand>
      <strPartLocation>Bin 3741</strPartLocation>
    </pi>
  </m:RecordPartInfo>
</soap:Body>
The structure data is embedded within the method element because it was passed into the method by value. We'll change the method signature a bit to allow for passing by reference:
// Function prototype...
bool RecordPartInfo(ref PartInfo pi);
...
RecordPartInfo(ref piMyPart);
As a result, we should obtain a slightly different SOAP Request Body:
<soap:Body>
  <m:RecordPartInfo xmlns:m="http://tempuri.org">
    <pi href="#s0"/>
  </m:RecordPartInfo>
  <m:PartInfo xmlns:m="http://tempuri.org" id="s0">
    <strPartID>002-AS-3220-1</strPartID>
    <iQtyOnHand>17</iQtyOnHand>
    <strPartLocation>Bin 3741</strPartLocation>
  </m:PartInfo>
</soap:Body>
Note that the string serialization is optimized. The packet could have appeared like this:
<soap:Body>
  <m:RecordPartInfo xmlns:m="http://tempuri.org">
    <pi href="#s0"/>
  </m:RecordPartInfo>
  <m:PartInfo xmlns:m="http://tempuri.org" id="s0">
    <strPartID href="#s1"/>
    <iQtyOnHand>17</iQtyOnHand>
    <strPartLocation href="#s2"/>
  </m:PartInfo>
  <string id="s1">002-AS-3220-1</string>
  <string id="s2">Bin 3741</string>
</soap:Body>
In this case, we've simply made the strings independent elements instead of embedded elements. Moving the string data from embedded elements to independent elements signifies that the string values are more global in scope. You might do this, say, when reusing the same string value in a couple of places in the method signature (that is, the strings are multireference). It's important to realize that there are many ways to serialize method signature information, and you might or might not have to tweak the XML coming from (or into) .NET to allow for interoperability. You'll see how to accomplish this in Chapter 6, ".NET Web Services and ASP.NET."
With this example under your belt, let's take a slightly closer look at how SOAP tells you to serialize structure information.
Embedded Versus Independent
The decision to serialize the struct information as embedded within the method element or as independent data depends entirely upon how the structure is passed into the method. Information passed by value is generally serialized as embedded data, while information passed by reference is serialized as independent elements. The format of the resulting SOAP packet is different, and sometimes you will need to tweak your packet layout to satisfy a third-party Web Service that assumes that the structure information is serialized in a certain form.
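The two layouts can be compared in a short C# sketch that builds each Body by hand. This is an illustration under stated assumptions rather than the code .NET actually runs: it uses the System.Xml.Linq API, the PartInfoSerializer class and its methods are hypothetical names, the id value s0 is arbitrary, and the soap prefix is assumed to be bound to the SOAP 1.1 envelope namespace.

```csharp
using System;
using System.Xml.Linq;

public struct PartInfo
{
    public string strPartID;
    public int iQtyOnHand;
    public string strPartLocation;
}

public static class PartInfoSerializer
{
    static readonly XNamespace Soap = "http://schemas.xmlsoap.org/soap/envelope/";
    static readonly XNamespace M = "http://tempuri.org";

    // Struct members become child elements named after the fields,
    // because struct accessors are distinguished by name.
    static object[] Members(PartInfo pi)
    {
        return new object[]
        {
            new XElement("strPartID", pi.strPartID),
            new XElement("iQtyOnHand", pi.iQtyOnHand),
            new XElement("strPartLocation", pi.strPartLocation)
        };
    }

    static XElement NewBody()
    {
        return new XElement(Soap + "Body",
            new XAttribute(XNamespace.Xmlns + "soap", Soap.NamespaceName),
            new XAttribute(XNamespace.Xmlns + "m", M.NamespaceName));
    }

    // Passed by value: the struct is embedded within the method element.
    public static XElement BodyByValue(PartInfo pi)
    {
        XElement body = NewBody();
        body.Add(new XElement(M + "RecordPartInfo",
            new XElement("pi", Members(pi))));
        return body;
    }

    // Passed by reference: the method element holds only an href, and the
    // struct follows as an independent element carrying the matching id.
    public static XElement BodyByReference(PartInfo pi)
    {
        XElement body = NewBody();
        body.Add(
            new XElement(M + "RecordPartInfo",
                new XElement("pi", new XAttribute("href", "#s0"))),
            new XElement(M + "PartInfo",
                new XAttribute("id", "s0"), Members(pi)));
        return body;
    }

    public static void Main()
    {
        var part = new PartInfo
        {
            strPartID = "002-AS-3220-1",
            iQtyOnHand = 17,
            strPartLocation = "Bin 3741"
        };
        Console.WriteLine(BodyByValue(part));
        Console.WriteLine(BodyByReference(part));
    }
}
```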
Named Parts
Note also that the XML tag names have meaning when serializing structure data. That's because you can "reach into" the structure and pull out a specific piece of information:
int x = piMyPart.iQtyOnHand;
The "quantity on hand" value is accessed by its name, iQtyOnHand. This is a different access model than the access model for arrays. With arrays, the data is extracted by ordinal position (the element's offset into the array). You might believe that this makes array encoding easy, but array encoding is actually the most complex part of the SOAP specification.
SOAP Array Serialization
When serializing arrays, it's important to remember that arrays come in different shapes and sizes. For example, there is the single-dimension array, the multiple-dimension array, and the partial, jagged, and sparse array. Other forms of arrays exist, but these are the types that the SOAP specification is concerned with. Let's start with some general concepts.
Array Serialization General Concepts
With arrays, you'll need to know whether there is an associated variable name (provided through the schema) because the encoding differs if you have the name. For example, consider this array:
int[] i = {0, 2, 4};
If you have the variable name i, the array would be encoded as such:
<m:i soapenc:arrayType="xsd:int[3]" xmlns:m="http://tempuri.org">
  <int>0</int>
  <int>2</int>
  <int>4</int>
</m:i>
Without the variable name (and schema), however, you'll need to use the SOAP Array element:
<soapenc:Array soapenc:arrayType="xsd:int[3]">
  <int>0</int>
  <int>2</int>
  <int>4</int>
</soapenc:Array>
In either case, it's important to know that all arrays in SOAP are derived from soapenc:Array and that, if you have an associated schema with your method, the arrays that you specify there must be derived from the SOAP Array element:
<element name="i" type="soapenc:Array"/>
You probably also noticed that SOAP conveys the size and dimension of the array through its use of the soapenc:arrayType attribute:
<soapenc:Array soapenc:arrayType="xsd:int[3]">
You'll see this concept repeated in the next few sections.
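To make the arrayType convention concrete, here is a minimal sketch, written in Python for brevity rather than with any .NET API, that splits an arrayType value into its element type and its bounds. The function name parse_array_type is hypothetical, and the sketch assumes the bounds are always present (it does not handle an empty size such as "xsd:int[]").

```python
import re

# Element type, optional rank markers such as "[]" for arrays of arrays,
# then the bounds of the array being described, for example "[3]" or "[2,3]".
_ARRAY_TYPE = re.compile(r"([^\[\]]+)((?:\[,*\])*)\[([\d,]+)\]")


def parse_array_type(array_type):
    """Split an arrayType value like 'xsd:int[3]' into (element type, bounds)."""
    match = _ARRAY_TYPE.fullmatch(array_type)
    if match is None:
        raise ValueError(f"not a sized arrayType value: {array_type!r}")
    base, rank, size = match.groups()
    bounds = [int(n) for n in size.split(",")]
    return base + rank, bounds


print(parse_array_type("xsd:int[3]"))
print(parse_array_type("xsd:int[][3]"))
```

The second call shows why the last bracket pair is treated specially: in "xsd:int[][3]" the element type is itself an array ("xsd:int[]"), and only the final "[3]" gives the size of the outer array.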
Single-Dimension Array
The previous section, "Array Serialization General Concepts," actually provided you with your first SOAP array serialization example:
int[] i = {0, 2, 4};
This serialized to the following:
<m:i soapenc:arrayType="xsd:int[3]" xmlns:m="http://tempuri.org">
  <int>0</int>
  <int>2</int>
  <int>4</int>
</m:i>
The SOAP arrayType attribute tells you there are three elements to this array, and all three have been serialized (0, 2, and 4). Notice that the element names are simply the datatype associated with the values; this is in keeping with the notion that array values are accessed by position rather than by name. You couldn't access them by name in this case even if you wanted to because they're (clearly) all named the same.
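As a rough illustration of the sending side, the following sketch builds the same kind of element with Python's standard xml.etree.ElementTree module. It is not how .NET produces the packet; the helper name serialize_int_array is hypothetical, and the sketch assumes the xsd prefix is declared elsewhere (typically on the envelope), because it appears only inside the attribute value.

```python
import xml.etree.ElementTree as ET

SOAPENC = "http://schemas.xmlsoap.org/soap/encoding/"  # SOAP 1.1 encoding namespace
METHOD_NS = "http://tempuri.org"

# Ask ElementTree to use the same prefixes as the chapter's examples.
ET.register_namespace("soapenc", SOAPENC)
ET.register_namespace("m", METHOD_NS)


def serialize_int_array(name, values):
    """Encode a one-dimensional int array as a named SOAP array element."""
    root = ET.Element(f"{{{METHOD_NS}}}{name}")
    # The bound in arrayType is the declared size of the array.
    root.set(f"{{{SOAPENC}}}arrayType", f"xsd:int[{len(values)}]")
    for value in values:
        # Array members are identified by position, so each child
        # simply carries the datatype name.
        ET.SubElement(root, "int").text = str(value)
    return ET.tostring(root, encoding="unicode")


print(serialize_int_array("i", [0, 2, 4]))
```

Every child element is named int, which mirrors the point made above: the receiver can only recover the values by their order.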
A good question to ask at this point would be something like, "What if I don't have all three values assigned when I serialize the array?" This leads to the concept of a partial array, which is covered next.
Partially Transmitted Array
Partially transmitted arrays are simply arrays that do not have all their element values assigned, or at least are not entirely transmitted within the same SOAP packet. To conserve resources, SOAP provides you with a mechanism to indicate this partially completed nature.
For example, take a look at this code:
int[] j = new int[3];
j[1] = 27;
j[2] = 54;
In this case, j[0] has not been assigned a value, so it keeps whatever default value the language assigns. In C#, an unassigned int array element holds 0.
The SOAP serialization for this array would be something like this:
<m:j soapenc:arrayType="xsd:int[3]" soapenc:offset="[1]" xmlns:m="http://tempuri.org">
  <int>27</int>
  <int>54</int>
</m:j>
As you can see, we have added the soapenc:offset attribute to identify the first array element that was transmitted. The values that follow fill consecutive positions from there: we simply serialized the next value after the first, and this would continue for the remaining array elements. But what if there is a break in the action?
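On the receiving side, the offset rule can be sketched as follows. This is a minimal Python illustration, not the .NET deserializer: decode_partial_array is a hypothetical name, the namespace declarations are added so the fragment parses on its own, and filling the untransmitted slots with 0 is an assumption about what the receiver wants.

```python
import xml.etree.ElementTree as ET

SOAPENC = "http://schemas.xmlsoap.org/soap/encoding/"  # SOAP 1.1 encoding namespace

PARTIAL = """
<m:j xmlns:m="http://tempuri.org"
     xmlns:soapenc="http://schemas.xmlsoap.org/soap/encoding/"
     soapenc:arrayType="xsd:int[3]" soapenc:offset="[1]">
  <int>27</int>
  <int>54</int>
</m:j>
"""


def decode_partial_array(xml_text, default=0):
    """Rebuild a one-dimensional int array from a partially transmitted one."""
    root = ET.fromstring(xml_text)
    array_type = root.get(f"{{{SOAPENC}}}arrayType")
    # The declared size sits in the last bracket pair: "xsd:int[3]" gives 3.
    size = int(array_type[array_type.rindex("[") + 1:-1])
    # A missing offset attribute means transmission starts at element 0.
    offset = int(root.get(f"{{{SOAPENC}}}offset", "[0]").strip("[]"))
    values = [default] * size
    # Transmitted members fill consecutive slots starting at the offset.
    for index, item in enumerate(root, start=offset):
        values[index] = int(item.text)
    return values


print(decode_partial_array(PARTIAL))  # [0, 27, 54]
```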
Sparse Arrays
This "break in the action" is often the case with a type of array known as the sparse array. Any array could have uninitialized elements, but this is especially common with arrays with large upper bounds, or where a given number dominates the matrix (such as with the diagonal in image transformation matrices). For example, consider this code:
int[] k = new int[1000];
k[301] = 43;
k[572] = 76;
k[893] = 109;
In this case, the array k is declared to be 1000 elements large, but we have used only 3 of those elements. It would be very wasteful to send all 1000 elements to a remote server with values for only 3 elements.
SOAP handles this by using the soapenc:position attribute:
<m:k soapenc:arrayType="xsd:int[1000]" xmlns:m="http://tempuri.org">
  <int soapenc:position="[301]">43</int>
  <int soapenc:position="[572]">76</int>
  <int soapenc:position="[893]">109</int>
</m:k>
You know that it is a sparse array because the arrayType attribute tells you that there are 1000 elements but only 3 have been serialized (and you have their values and element position information as well).
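A receiver can treat a sparse array as a declared size plus a small map from position to value. The sketch below shows that idea in Python. It assumes the position attribute belongs to the SOAP 1.1 encoding namespace (bound here to the soapenc prefix); the name decode_sparse_array is hypothetical, and the namespace declarations are added so the fragment parses on its own.

```python
import xml.etree.ElementTree as ET

SOAPENC = "http://schemas.xmlsoap.org/soap/encoding/"  # SOAP 1.1 encoding namespace

SPARSE = """
<m:k xmlns:m="http://tempuri.org"
     xmlns:soapenc="http://schemas.xmlsoap.org/soap/encoding/"
     soapenc:arrayType="xsd:int[1000]">
  <int soapenc:position="[301]">43</int>
  <int soapenc:position="[572]">76</int>
  <int soapenc:position="[893]">109</int>
</m:k>
"""


def decode_sparse_array(xml_text):
    """Return (declared size, {position: value}) for a sparse int array."""
    root = ET.fromstring(xml_text)
    array_type = root.get(f"{{{SOAPENC}}}arrayType")
    size = int(array_type[array_type.rindex("[") + 1:-1])
    entries = {}
    for item in root:
        position = int(item.get(f"{{{SOAPENC}}}position").strip("[]"))
        if not 0 <= position < size:
            raise ValueError(f"position {position} is outside [0, {size})")
        entries[position] = int(item.text)
    return size, entries


size, entries = decode_sparse_array(SPARSE)
print(size, entries)  # 1000 {301: 43, 572: 76, 893: 109}
```

Keeping the values in a dictionary avoids allocating all 1000 slots when only 3 are in use; a caller that needs a dense list can still build one from the declared size.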
The serialization of both partially transmitted arrays and sparse arrays also holds true for multidimensional arrays. That is, you can easily have multidimensional arrays that are partially transmitted, sparse, or both. Now let's see how multidimensional arrays are encoded in SOAP.
Multidimensional Arrays
Only a couple of differences exist between single-dimension and multiple-dimension arrays when it comes to SOAP serialization. First, the array bounds change. Second, you serialize information in row-major format. Let's look at this example:
int[,] z = {{5,67},{7,21},{92,4}};
The z array is a three-by-two array (three rows, two columns), and its SOAP serialization would look something like this:
<m:z soapenc:arrayType="xsd:int[3,2]" xmlns:m="http://tempuri.org">
  <int>5</int>
  <int>67</int>
  <int>7</int>
  <int>21</int>
  <int>92</int>
  <int>4</int>
</m:z>
Essentially, you store all the columns from the first row before moving to the second row, and so on. You know how large the array is based on the bounds provided with the arrayType attribute. If you have no values for certain elements, you'll need to be informed of this by the use of either the offset or the position attributes, or both. Otherwise, you have to assume that all the data is present. If you later find out that there isn't enough information, you can return an error or you can assume default values (SOAP encodes default values very efficiently: it omits them entirely!).
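Row-major ordering is easy to check with a few lines of code. The sketch below, in Python, rebuilds rows from the flat list of serialized values and computes where a given element lands in that list. The function names unflatten and flat_index are hypothetical, and the bounds are passed with the row count first (three rows of two columns for the z array).

```python
from math import prod


def flat_index(indices, dims):
    """Position of one element in the row-major stream (last index varies fastest)."""
    index = 0
    for i, d in zip(indices, dims):
        if not 0 <= i < d:
            raise IndexError(f"index {i} is outside [0, {d})")
        index = index * d + i
    return index


def unflatten(values, dims):
    """Rebuild nested lists from row-major values, given the array bounds."""
    if prod(dims) != len(values):
        raise ValueError("number of values does not match the declared bounds")
    if len(dims) == 1:
        return list(values)
    stride = prod(dims[1:])  # how many values one "row" of this dimension holds
    return [
        unflatten(values[r * stride:(r + 1) * stride], dims[1:])
        for r in range(dims[0])
    ]


serialized = [5, 67, 7, 21, 92, 4]
print(unflatten(serialized, [3, 2]))   # [[5, 67], [7, 21], [92, 4]]
print(flat_index((2, 0), (3, 2)))      # 4, the slot holding 92
```

The length check is where a receiver would notice that values are missing and decide whether to return an error or fall back to defaults.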
You should be aware of one other case of array serialization: the jagged array.
Jagged Arrays
A jagged array is an array of arrays, and it gets the name "jagged" from the fact that each constituent array is probably not the same size as the others. This is unlike a multidimensional array, which has constant bounds.
Let's again turn to an example:
int[][] q = {new int[2], new int[4], new int[3]};
q[0][0] = 4;
q[0][1] = 7;
q[1][0] = 15;
q[1][1] = 72;
q[1][2] = 6;
q[1][3] = 167;
q[2][0] = 1;
q[2][1] = 90;
q[2][2] = 659;
It's nasty-looking, that's true, but take a look at how SOAP might serialize the q array:
<m:q soapenc:arrayType="xsd:int[][3]" xmlns:m="http://tempuri.org">
  <int href="#a0"/>
  <int href="#a1"/>
  <int href="#a2"/>
</m:q>
<soapenc:Array soapenc:arrayType="xsd:int[2]" id="a0">
  <int>4</int>
  <int>7</int>
</soapenc:Array>
<soapenc:Array soapenc:arrayType="xsd:int[4]" id="a1">
  <int>15</int>
  <int>72</int>
  <int>6</int>
  <int>167</int>
</soapenc:Array>
<soapenc:Array soapenc:arrayType="xsd:int[3]" id="a2">
  <int>1</int>
  <int>90</int>
  <int>659</int>
</soapenc:Array>
Actually, you might also see the q array serialized using embedded elements, depending on what system performed the serialization. In that case, another valid SOAP serialization would be as follows:
<m:q soapenc:arrayType="xsd:int[][3]" xmlns:m="http://tempuri.org">
  <soapenc:Array soapenc:arrayType="xsd:int[2]">
    <int>4</int>
    <int>7</int>
  </soapenc:Array>
  <soapenc:Array soapenc:arrayType="xsd:int[4]">
    <int>15</int>
    <int>72</int>
    <int>6</int>
    <int>167</int>
  </soapenc:Array>
  <soapenc:Array soapenc:arrayType="xsd:int[3]">
    <int>1</int>
    <int>90</int>
    <int>659</int>
  </soapenc:Array>
</m:q>
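Because both layouts are valid, a receiver has to cope with either one. The following Python sketch handles the two forms with one function: a member that carries an href is looked up by id among the independent elements, and a member without one is read in place. The name decode_jagged is hypothetical, and the soap:Body wrapper and namespace declarations are added so the fragment parses on its own.

```python
import xml.etree.ElementTree as ET

BODY = """
<soap:Body xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
           xmlns:soapenc="http://schemas.xmlsoap.org/soap/encoding/"
           xmlns:m="http://tempuri.org">
  <m:q soapenc:arrayType="xsd:int[][3]">
    <int href="#a0"/>
    <int href="#a1"/>
    <int href="#a2"/>
  </m:q>
  <soapenc:Array soapenc:arrayType="xsd:int[2]" id="a0">
    <int>4</int><int>7</int>
  </soapenc:Array>
  <soapenc:Array soapenc:arrayType="xsd:int[4]" id="a1">
    <int>15</int><int>72</int><int>6</int><int>167</int>
  </soapenc:Array>
  <soapenc:Array soapenc:arrayType="xsd:int[3]" id="a2">
    <int>1</int><int>90</int><int>659</int>
  </soapenc:Array>
</soap:Body>
"""


def decode_jagged(body_text, array_tag):
    """Decode an array of int arrays, whether its members are independent
    (referenced through href) or embedded in the outer element."""
    body = ET.fromstring(body_text)
    # Index every element that carries an id so href="#..." can be resolved.
    by_id = {el.get("id"): el for el in body.iter() if el.get("id") is not None}
    outer = body.find(array_tag)
    rows = []
    for member in outer:
        href = member.get("href")
        # "#a0" refers to the element whose id is "a0".
        target = by_id[href[1:]] if href else member
        rows.append([int(value.text) for value in target])
    return rows


print(decode_jagged(BODY, "{http://tempuri.org}q"))
# [[4, 7], [15, 72, 6, 167], [1, 90, 659]]
```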
Of course, it gets a lot more complicated than this, but you're generally dealing with variations on a theme. For example, you could have a jagged array of structs, or you could have a partially transmitted multidimensional array of strings. It just depends. SOAP is flexible enough to serialize these things, but aren't you glad that .NET does it for you?
You've seen just enough SOAP to understand what will be sent to the Web Service and returned to you for local processing. .NET handles the details for you, at least in most cases.
NOTE
Most of the time you can (and should) let .NET handle the SOAP serialization for you. .NET works well. But in some cases you might be dealing with Web Services that run on platforms other than .NET (Apache, WebSphere, and so on). You might find small interoperability issues with using these or other Web Services. Microsoft is working hard to eliminate inconsistencies with other major vendors, but it's a large Internet out there. The possibility still exists that your SOAP serialization code (within .NET) and the remote end's won't sync up. If this is the case, you'll need to tweak SOAP yourself using .NET's SoapExtension or work through one of the many SOAP-related .NET attribute classes, which you'll see in Chapter 6.
At this point, you've seen how SOAP would format binary data for transmission. If things work as they should, you'll see XML with data inside going back and forth, client to server. But in the real world, things break. How do you tell the client that you had an error? This is the task of the SOAP Fault.
SOAP Faults
SOAP returns an error to the client in a very specific manner: it specifies a fault packet. The SOAP Fault is really an element of the SOAP Body, and it takes the following form:
<soap:Envelope>
  <soap:Body>
    <soap:Fault>
      (Fault information)
    </soap:Fault>
  </soap:Body>
</soap:Envelope>
The SOAP Fault information consists of four elements, two of which are optional:
<faultcode/>
<faultstring/>
<faultactor/> (optional)
<detail/> (optional)
The faultcode element contains an enumerated value that indicates the type of fault: MustUnderstand, VersionMismatch, Client, or Server. MustUnderstand must be returned if the client issues a SOAP Header with the mustUnderstand attribute set to 1 and the server truly doesn't understand what to do with the header. VersionMismatch must be returned if the SOAP namespace URI is not understood by the SOAP processor, such as when a SOAP 1.1 processor receives a SOAP 1.2 packet. Client and Server are used to return errors related to client formatting or parameter issues (Client), or server-side failures (Server). SOAP's intention for this element is to allow the client or server to use this information in an automated manner to deal with the fault.
SOAP fault codes are extensible. MustUnderstand, VersionMismatch, Client, and Server are all generic errors that are commonly tailored with more specific information. The value placed in the faultcode element puts the most generic code on the left, with each more specific code following it, separated by a dot:
Server.DivideByZero
In contrast to the faultcode, the faultstring element provides a meaningful, human-readable error message. The string itself can contain nearly anything, although XML special characters must be escaped.
The faultcode and faultstring elements are required to exist within the SOAP Fault packet. The faultactor, however, is optional because not all SOAP packets use actors. As you might recall, a SOAP actor is a recipient of a SOAP message. The actors open the message and examine the header(s). If the message has reached the final actor (the destination), the recipient acts upon the message itself. If there is an error along the way, the actor that exposed the error places its URI within the faultactor element.
The final SOAP Fault element, detail, is actually a free-form element into which anything can be placed to further identify the error, as long as it's well-formed XML. If the Web Service had a problem processing the SOAP Body, it is required to use this element. Otherwise, this element is optional. The contents are identified by namespace so that you'll know what to expect:
<soap:Body>
  <soap:Fault>
    <faultcode>Server.DivideByZero</faultcode>
    <faultstring>Divide by Zero Error</faultstring>
    <detail>
      <m:Error xmlns:m="http://tempuri.org">
        <message>Divide by Zero Error, assembly myassembly.dll, method TryMe()</message>
        <errorcode>RPC_S_FP_DIV_ZERO (1769)</errorcode>
      </m:Error>
    </detail>
  </soap:Fault>
</soap:Body>
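To show how a client might use these elements in an automated way, here is a minimal Python sketch that pulls the fault apart. It is an illustration only, not the .NET fault-handling API: parse_fault is a hypothetical name, the Envelope wrapper and its namespace declaration are added so the packet parses on its own, and the sketch assumes the fault subelements are unqualified, as in the example above.

```python
import xml.etree.ElementTree as ET

NS = {"soap": "http://schemas.xmlsoap.org/soap/envelope/"}  # SOAP 1.1 envelope

FAULT = """
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <soap:Fault>
      <faultcode>Server.DivideByZero</faultcode>
      <faultstring>Divide by Zero Error</faultstring>
      <detail>
        <m:Error xmlns:m="http://tempuri.org">
          <errorcode>RPC_S_FP_DIV_ZERO (1769)</errorcode>
        </m:Error>
      </detail>
    </soap:Fault>
  </soap:Body>
</soap:Envelope>
"""


def parse_fault(envelope_text):
    """Return the fault fields as a dict, or None if the Body holds no Fault."""
    envelope = ET.fromstring(envelope_text)
    fault = envelope.find("soap:Body/soap:Fault", NS)
    if fault is None:
        return None
    code = fault.findtext("faultcode", default="")
    # Drop a namespace prefix if one is present, then split generic from specific.
    code_path = code.split(":", 1)[-1].split(".")
    return {
        "code_path": code_path,                     # most generic code first
        "string": fault.findtext("faultstring"),    # human-readable message
        "actor": fault.findtext("faultactor"),      # None when the element is absent
        "has_detail": fault.find("detail") is not None,
    }


info = parse_fault(FAULT)
print(info["code_path"])  # ['Server', 'DivideByZero']
print(info["string"])     # Divide by Zero Error
```

A caller can branch on the first entry of code_path (Client versus Server, for example) and keep the remaining entries and the detail content for logging.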
With the SOAP Fault, you've completed your whirlwind tour of the SOAP protocol. Before jumping into .NET Web Service code from a high-level perspective, which you'll do in Chapter 6, let's look at some low-level .NET SOAP serialization classes and see what services they provide.