6.8 Sample RPC Message Definition
Local procedure calls have relatively simple dynamics: on a given CPU core, procedure A calls procedure B, and that CPU core then executes instructions in B; no further instructions in A execute until B returns. Local procedure calls may be nested, with A calling B, which in turn calls C. But these all execute sequentially on a single CPU core. We can observe a complete local call tree by capturing the entry and exit times of each procedure. Call nesting is implied by nested entry/exit times. In a multicore, multithreaded environment, multiple local calls to B can occur simultaneously, but they come from different callers executing on different software threads, possibly on different CPU cores.
Remote procedure calls are more complicated. Unlike a subroutine call, the transmission of the request message from client machine A to server machine B is not instantaneous, nor is the transmission of the response message back from B to A. Since these messages use shared network resources, other network traffic may delay them, so we at least want to capture send/receive times for each message.
As shown in Figure 6.8, an RPC may be non-blocking: the caller A may proceed with additional execution in parallel with B and issue additional parallel RPCs to C, D, E, etc. Eventual response messages from B, C, D, E, ... arrive at A asynchronously and not necessarily in order. To match up multiple request and response pairs in the RPC library code, each outstanding RPC is given a unique ID, which is included in its request and response messages. To match up one RPC with any sub-RPCs that it issues, each sub-RPC also includes the RPC ID of its parent; this allows us to reconstruct entire call trees.
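As a rough illustration of how parent IDs allow call trees to be rebuilt, the following Python sketch groups a handful of made-up log records by parent. The record layout, the method names, and the convention that a parent ID of 0 marks a top-level RPC are all assumptions for this example, not part of the sample RPC design.

```python
from collections import defaultdict

# Hypothetical log records: (rpcid, parent_rpcid, method).
# Assumption: a parent of 0 marks a top-level RPC with no parent.
records = [
    (101, 0, "search"),
    (102, 101, "lookup"),
    (103, 101, "rank"),
    (104, 102, "read"),
    (201, 0, "search"),
    (202, 201, "lookup"),
]

def build_call_trees(records):
    """Return (roots, children), where children maps a parent ID to its sub-RPC IDs."""
    children = defaultdict(list)
    roots = []
    for rpcid, parent, _method in records:
        if parent == 0:
            roots.append(rpcid)
        else:
            children[parent].append(rpcid)
    return roots, children

def format_tree(rpcid, children, methods, depth=0):
    """Return indented lines for one RPC followed by all of its nested sub-RPCs."""
    lines = ["  " * depth + f"{rpcid} {methods[rpcid]}"]
    for child in children.get(rpcid, []):
        lines.extend(format_tree(child, children, methods, depth + 1))
    return lines

methods = {rpcid: method for rpcid, _parent, method in records}
roots, children = build_call_trees(records)
tree_lines = [line for root in roots
              for line in format_tree(root, children, methods)]
print("\n".join(tree_lines))
```

Even though the two "search" trees could be interleaved arbitrarily in a real log, the parent IDs alone are enough to separate them.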
A caller may wait for all its RPC responses before finishing, or it may finish early as in Figure 6.8 (pyge65 at the very top right returns before the calls to pyej23 and pyhr19 at the very bottom right). If a network link goes down or a server crashes, some responses may never arrive; the caller needs to detect and deal with this rather than waiting forever.
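One simple way a caller might detect missing responses is to keep a table of outstanding RPCs and periodically check how long each has been waiting. The sketch below assumes a dictionary keyed by RPC ID and a fixed timeout; the function name, the table layout, and the timeout value are hypothetical, and a real RPC library would likely use something more elaborate.

```python
def find_timed_out(outstanding, now_usec, timeout_usec):
    """Return the sorted IDs of RPCs that have waited longer than timeout_usec.

    outstanding maps rpcid -> request send time in microseconds (assumed layout).
    """
    return sorted(rpcid for rpcid, sent_usec in outstanding.items()
                  if now_usec - sent_usec > timeout_usec)

# Three requests are sent at different times.
outstanding = {7: 1_000_000, 8: 1_200_000, 9: 1_900_000}

# A response for RPC 8 arrives, so it is no longer outstanding.
del outstanding[8]

# At time 2,000,000 usec with a 0.5-second timeout, only RPC 7 is overdue.
late = find_timed_out(outstanding, now_usec=2_000_000, timeout_usec=500_000)
print(late)
```

What the caller does with an overdue RPC (retry, report an error status, or give up) is a separate policy decision.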
During the time that A is waiting for a response from B, other clients on the same or different machines may also be sending RPCs to B, and B may be working on those and not on A’s. If that happens, a sub-RPC from B to some other server Z may be part of the work for A or for any other client of B. The parent ID shows the proper association.
All these complications happen fairly often in a large datacenter environment.
We can observe the dynamics of a complete remote call tree only by capturing the caller/callee pairs and send/receive times for each request and response message and explicitly recording the parent caller for all nested RPCs. Most of this information must be transmitted between machines in each of the request and response messages.
For our sample RPC system, each request or response message starts with an RPC marker followed by an RPC header followed optionally by a byte string that contains the argument values for a request or the result values for a response, as shown in Figure 6.9. Each complete message is broken up into the payload data carried in one or more TCP/IP packets. We will focus in the rest of this chapter on complete messages instead of individual packets.
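Because a message may arrive in several pieces, a receiver has to keep reading until it has each complete part. The following Python sketch shows one way to do that, assuming the marker carries four little-endian 32-bit fields (signature, header length, data length, checksum); the byte order, the placeholder signature value, and the `ChunkyStream` test helper are assumptions. Validation of the marker is deliberately omitted here.

```python
import io
import struct

MARKER_FORMAT = "<IIII"  # signature, headerlen, datalen, checksum (byte order assumed)
MARKER_SIZE = struct.calcsize(MARKER_FORMAT)  # 16 bytes

def read_exact(stream, n):
    """Read exactly n bytes, looping because a stream may return short reads."""
    buf = bytearray()
    while len(buf) < n:
        chunk = stream.read(n - len(buf))
        if not chunk:
            raise EOFError("stream ended in mid-message")
        buf.extend(chunk)
    return bytes(buf)

def read_message(stream):
    """Return (header_bytes, data_bytes) for the next complete message."""
    marker = read_exact(stream, MARKER_SIZE)
    _sig, headerlen, datalen, _checksum = struct.unpack(MARKER_FORMAT, marker)
    header = read_exact(stream, headerlen)
    data = read_exact(stream, datalen)
    return header, data

class ChunkyStream:
    """Test helper that returns at most `limit` bytes per read, like small packets."""
    def __init__(self, data, limit):
        self._inner = io.BytesIO(data)
        self._limit = limit

    def read(self, n):
        return self._inner.read(min(n, self._limit))

# Two back-to-back messages: one with a 5-byte data string, one with none.
# The signature and checksum values here are placeholders.
msg1 = struct.pack(MARKER_FORMAT, 0x12345678, 72, 5, 0) + bytes(72) + b"hello"
msg2 = struct.pack(MARKER_FORMAT, 0x12345678, 72, 0, 0) + bytes(72)
stream = ChunkyStream(msg1 + msg2, limit=5)

first = read_message(stream)
second = read_message(stream)
print(len(first[0]), first[1], len(second[0]), second[1])
```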
Figure 6.9 Overall structure of a request or response message in our sample RPC design
The 16-byte RPC marker, as shown in Figure 6.10, serves several purposes: delimiting messages, defining variable lengths, and providing a sanity check.
Figure 6.10 RPC marker of 16 bytes
The signature is a fixed 32-bit value. This allows a quick check that the subsequent bytes could begin an RPC message and are not something else. If somehow a TCP connection gets out of sync, it also allows scanning forward until a signature is found as a way to resynchronize. (This is not necessarily a good idea; it may be better to drop the connection and force a clean restart.) In Chapter 15, we use the signature field to filter packets that appear to be the beginning of an RPC message, recording KUtrace entries for each.
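The scan-forward idea can be sketched in a few lines of Python. The signature value below is made up for illustration (the text does not give the real one), and little-endian byte order is assumed.

```python
import struct

SIGNATURE = 0x52504331                      # hypothetical 32-bit signature value
SIG_BYTES = struct.pack("<I", SIGNATURE)    # assumed little-endian on the wire

def find_next_signature(buf, start=0):
    """Return the offset of the next signature at or after start, or -1 if none."""
    return buf.find(SIG_BYTES, start)

# Six bytes of stray data followed by the start of a plausible marker.
buf = b"\x00\x01junk" + SIG_BYTES + bytes(12)
offset = find_next_signature(buf)
print(offset)
```

A match only says that the four signature bytes occur at that offset; the same bytes can appear by chance inside message data, which is one reason resynchronizing this way is risky and the remaining marker fields should still be checked.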
The 32-bit headerlen field gives the byte length of the following RPC header, whose size will likely vary over several months or years in a real datacenter as the RPC library is updated and expanded. To improve validity checking, headerlen values are required to be less than 2**12. For our sample RPC design, headerlen is always 72.
The 32-bit datalen field gives the byte length of the optional argument or result byte string that follows the RPC header. A length of 0 indicates no string. To improve validity checking and to make huge messages invalid, datalen values are required to be less than 2**24. The two length fields allow an RPC library to break a message into its variable-length pieces.
Finally, the 32-bit checksum field is a simple arithmetic function of the previous three fields, allowing a robust sanity check that the marker and subsequent bytes are highly likely to be the start of a valid RPC message.
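Putting the four marker fields together, a minimal Python sketch of building and validating a marker might look like this. The signature value, the little-endian byte order, and in particular the checksum formula (a 32-bit sum of the first three fields) are assumptions; the text says only that the checksum is a simple arithmetic function of those fields.

```python
import struct

SIGNATURE = 0x52504331      # hypothetical signature value
MARKER_FORMAT = "<IIII"     # signature, headerlen, datalen, checksum (byte order assumed)
MAX_HEADERLEN = 2**12       # headerlen must be less than this
MAX_DATALEN = 2**24         # datalen must be less than this

def marker_checksum(signature, headerlen, datalen):
    """Assumed checksum: the 32-bit sum of the three preceding fields."""
    return (signature + headerlen + datalen) & 0xFFFFFFFF

def pack_marker(headerlen, datalen):
    """Build the 16-byte marker for a message with the given lengths."""
    checksum = marker_checksum(SIGNATURE, headerlen, datalen)
    return struct.pack(MARKER_FORMAT, SIGNATURE, headerlen, datalen, checksum)

def parse_marker(raw):
    """Return (headerlen, datalen) if raw is a valid marker, else None."""
    if len(raw) != struct.calcsize(MARKER_FORMAT):
        return None
    signature, headerlen, datalen, checksum = struct.unpack(MARKER_FORMAT, raw)
    if signature != SIGNATURE:
        return None
    if headerlen >= MAX_HEADERLEN or datalen >= MAX_DATALEN:
        return None
    if checksum != marker_checksum(signature, headerlen, datalen):
        return None
    return headerlen, datalen

good = pack_marker(72, 100)
corrupted = good[:8] + b"\xff" + good[9:]   # damage one byte of datalen
print(parse_marker(good), parse_marker(corrupted))
```

Each check rejects a different kind of garbage: a wrong signature, implausible lengths, or fields that are individually plausible but inconsistent with each other.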
The RPC marker is designed to be part of a complete network message but is not visible to the callers of the RPC library software. Those callers deal only with the RPC header and data string.
The RPC header, shown in Figure 6.11, has all the information to describe a single RPC request or response message. Fields are initialized to zero and are filled in incrementally by the RPC library as an RPC is processed. For example, T1 and the first L are filled in by the RPC library when an RPC request message is about to be sent by a client program. The second L is filled in when an RPC response message is about to be sent by a server program, and T4 is not filled in until the RPC response message is received by the client program.
Figure 6.11 RPC header of 72 bytes
Briefly, the naturally aligned fields are:
RPCID: 32 bits, containing a unique ID number for each outstanding request.
Parent ID: 32 bits, containing the RPCID of the request that spawned the current request.
T1..T4: 64-bit wall-clock timestamps with microsecond resolution, giving respectively the request send time, request receive time, response send time, and response receive time; T1 and T4 are based on the client machine’s time-of-day clock; T2 and T3 are based on the server machine’s time-of-day clock.
IP and port: 32 bits and 16 bits respectively, one pair each for the client and the server, giving the two machines’ TCP/IP addresses.
L, L: 8 bits each, giving the logarithms of the byte lengths of the request and response messages, respectively.
Message type: 16 bits, to indicate request or response or other types of message.
Method: 64 bits (8 bytes), ASCII name of the routine being called, zero padded.
Status: 32 bits, return-value status indicating success, failure, or specific error number.
Padding: 32 bits, to make the header a multiple of eight bytes in total length.
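The field sizes listed above add up to 72 bytes, which can be checked with a short Python sketch. The field order, the little-endian byte order, the Python field names, and the message type codes are assumptions, since the actual layout is given only in Figure 6.11.

```python
import ipaddress
import struct
from collections import namedtuple

# Assumed order: rpcid, parent, T1..T4, client/server IP, client/server port,
# two log-lengths, message type, method, status, padding.
HEADER_FORMAT = "<II4qIIHHBBH8sII"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)

RpcHeader = namedtuple(
    "RpcHeader",
    "rpcid parent t1 t2 t3 t4 client_ip server_ip client_port server_port "
    "lglen1 lglen2 msg_type method status pad")

REQUEST, RESPONSE = 1, 2    # hypothetical message type codes

# All fields start at zero and are filled in as the RPC progresses.
BLANK = RpcHeader(*([0] * 13), b"", 0, 0)

def pack_header(header):
    """Serialize a header; struct zero-pads the 8-byte method name."""
    return struct.pack(HEADER_FORMAT, *header)

def unpack_header(raw):
    """Parse 72 raw bytes back into an RpcHeader."""
    return RpcHeader._make(struct.unpack(HEADER_FORMAT, raw))

# The client fills in what it knows just before sending the request.
request = BLANK._replace(
    rpcid=1234,
    t1=1_700_000_000_000_000,   # microseconds, client clock
    client_ip=int(ipaddress.IPv4Address("192.168.1.2")),
    server_ip=int(ipaddress.IPv4Address("192.168.1.3")),
    client_port=40000,
    server_port=12345,
    lglen1=100,
    msg_type=REQUEST,
    method=b"write")

raw = pack_header(request)
decoded = unpack_header(raw)
print(HEADER_SIZE, decoded.rpcid, decoded.method.rstrip(b"\0"), decoded.t4)
```

With this ordering every field falls on a multiple of its own size, so no hidden alignment padding is needed beyond the explicit 32-bit pad at the end.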
The sizes of the RPC header fields are somewhat arbitrary; different sizes would work equally well. Reducing the byte lengths to logarithms is just an example of trading resolution (e.g., within 10%) for space.
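To make the resolution-for-space trade concrete, here is one possible 8-bit encoding, sketched in Python. It assumes the stored value is ten times the base-2 logarithm of the length, rounded to an integer; that scale factor is an assumption chosen so that lengths below 2**24 fit in one byte, and the text does not specify the actual encoding.

```python
import math

def encode_lglen(nbytes):
    """Assumed encoding: round(10 * log2(nbytes)), clamped to 8 bits.

    Lengths of 0 and 1 both encode as 0 in this sketch.
    """
    if nbytes <= 1:
        return 0
    return min(255, round(10 * math.log2(nbytes)))

def decode_lglen(code):
    """Recover an approximate byte length from the 8-bit code."""
    return round(2 ** (code / 10))

for nbytes in (1000, 1_000_000):
    code = encode_lglen(nbytes)
    approx = decode_lglen(code)
    error = abs(approx - nbytes) / nbytes
    print(nbytes, code, approx, f"{error:.1%}")
```

Each step of the code corresponds to a factor of 2**0.1, or about 7%, so a decoded length is normally within a few percent of the original while occupying one byte instead of three or four.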
This header format is somewhat less flexible than those used in real datacenters, but is sufficient for our sample RPC work.
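As a small worked example of what the four timestamps make possible, the Python sketch below splits one RPC's elapsed time into server time and everything else. The function name and the sample values are hypothetical. Note that it only ever subtracts timestamps taken from the same clock, so the result does not depend on how far apart the client and server clocks are.

```python
def rpc_latency_breakdown(t1, t2, t3, t4):
    """Return (total, server, other) in microseconds for one RPC.

    t1 and t4 come from the client clock; t2 and t3 from the server clock.
    """
    total = t4 - t1           # request sent to response received, client clock
    server = t3 - t2          # request received to response sent, server clock
    other = total - server    # both network transits plus any queueing delays
    return total, server, other

# Sample values: the server clock is about 5 seconds ahead of the client clock.
t1, t4 = 1_000, 2_100             # client clock, microseconds
t2, t3 = 5_001_200, 5_001_900     # server clock, microseconds

total, server, other = rpc_latency_breakdown(t1, t2, t3, t4)
print(total, server, other)
```

By contrast, a one-way time such as T2 minus T1 mixes the two clocks, so it is meaningful only after the clocks have been aligned in some way.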