- 6.1 About Ethernet
- 6.2 About Hubs, Switches, and Routers
- 6.3 About TCP/IP
- 6.4 About Packets
- 6.5 About Remote Procedure Calls (RPCs)
- 6.6 Slop
- 6.7 Observing Network Traffic
- 6.8 Sample RPC Message Definition
- 6.9 Sample Logging Design
- 6.10 Sample Client-Server System Using RPCs
- 6.11 Sample Server Program
- 6.12 Spinlocks
- 6.13 Sample Client Program
- 6.14 Measuring One Sample Client-Server RPC
- 6.15 Postprocessing RPC Logs
- 6.16 Observations
- 6.17 Summary
- Exercises
6.5 About Remote Procedure Calls (RPCs)
Our lab experiments will use a form of remote procedure call. For a local procedure call, routine A calls some Method with arguments and gets back a return value, with all the code running on a single machine:
routine A { ... foo = Method(arguments); ... }
For a remote procedure call, the idea is the same, but the Method (e.g., a C function) runs on a remote computer.
The Method name and arguments are passed to the remote server in a request message, and the return value is eventually passed back in a response message, as shown in Figure 6.5. The client and server programs are constructed with calls to an RPC library. Building, sending, and parsing the request and response messages is done by the library routines, implementing a particular RPC design. Non-blocking RPCs allow multiple RPC requests to be outstanding at once and allow responses to return out of order.
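As a concrete sketch of these ideas, the fragment below shows the shape of a request and a response message plus a stand-in server routine. The struct fields and the LocalServer name are invented for illustration and are not the RPC library we build later in this chapter; a real library would serialize the request, send the bytes over the network, and use the rpcid to match a returned response to one of possibly several outstanding requests.

// Sketch of the request/response idea behind an RPC library.
// All names are made up for illustration; a real transport would send
// the bytes over a network instead of calling LocalServer directly.
#include <cstdint>
#include <iostream>
#include <string>

struct RpcRequest {
  uint32_t rpcid;        // lets out-of-order responses be matched to requests
  std::string method;    // name of the remote Method to run
  std::string arguments; // serialized argument bytes
};

struct RpcResponse {
  uint32_t rpcid;        // copied from the request
  int32_t status;        // overall status: 0 = success, nonzero = error code
  std::string result;    // possibly empty byte string of additional results
};

// Stand-in for the server side: parse the request, run the named method,
// build the response. In a real system this runs on the remote computer B.
RpcResponse LocalServer(const RpcRequest& req) {
  RpcResponse resp{req.rpcid, 0, ""};
  if (req.method == "Echo") {
    resp.result = req.arguments;
  } else {
    resp.status = 1;  // unknown method
  }
  return resp;
}

int main() {
  RpcRequest req{1, "Echo", "hello"};
  RpcResponse resp = LocalServer(req);  // a real library sends/receives messages here
  std::cout << "status=" << resp.status << " result=" << resp.result << "\n";
  return 0;
}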
Figure 6.5 A single RPC sending a request message and eventually receiving a response message; “krnl” is kernel code; T1-T4 are timestamps in user-mode code to send/receive RPC request and response messages.
Each message is a network transmission. The request message goes from a user-mode client program on computer A at time T1 to kernel-mode code on A, over the network, to kernel-mode code on computer B, and then to a user-mode server program on B at time T2. The response message travels in the opposite direction, at times T3 and T4. RPC latency is measured from the time T1 that the user-mode client program on A sends the request to the time T4 that the user-mode client program on A receives the response. When a response is delayed, the delay can be anywhere in the picture: request or response, user code or kernel code, machine A or machine B, sending or receiving network hardware. The four times T1..T4 help pinpoint where the overall time went.
To examine the performance effects of network RPCs, we will use timelines with events T1, T2, T3, and T4 indicating the RPC timing. We will draw individual RPCs as timelines with notches showing the times T1..T4, as shown in Figure 6.6. The notches do not take up much diagram space, but the human eye is quite good at picking them out, even when there are hundreds of RPC lines close together. The total RPC latency, as observed by the client user-mode program, is T4-T1. The total server time for the RPC is T3-T2.
Figure 6.6 Diagram of one RPC, showing the four times. T1 to T2 is the time from client user-mode code sending an RPC request to server user-mode code receiving that request. T2 to T3 is the server time spent performing the request. T3 to T4 is the time from server user-mode code sending the RPC response to client user-mode code receiving that response. Times T1 and T4 are taken from the client CPU’s time-of-day clock, while T2 and T3 are from the server CPU’s time-of-day clock. The two clocks may be offset from each other by microseconds to milliseconds. We will deal with clock alignment in the next chapter. w1 is the time the client kernel-mode code sends the request to the network hardware (“w” for “wire”), and w3 is the time the server kernel-mode code sends the response to the network hardware.
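As a small sketch of the timing arithmetic, the fragment below computes the client-observed latency T4-T1, the server time T3-T2, and everything else (client and server kernel code plus network transmission); the timestamp values and variable names are made up for illustration. Note that each difference uses timestamps from a single machine’s clock, so the offset between the client and server clocks does not affect these numbers.

// Sketch of the timing arithmetic for one RPC, assuming T1..T4 have
// already been captured as microsecond counts. Values are made up.
#include <cstdint>
#include <iostream>

int main() {
  int64_t t1 = 1000;  // client user code sends request (client clock)
  int64_t t2 = 1210;  // server user code receives request (server clock)
  int64_t t3 = 1460;  // server user code sends response (server clock)
  int64_t t4 = 1700;  // client user code receives response (client clock)

  int64_t total_latency = t4 - t1;  // what the client observes
  int64_t server_time = t3 - t2;    // time spent performing the request
  int64_t elsewhere = total_latency - server_time;  // kernels plus network

  std::cout << "total=" << total_latency << "us server=" << server_time
            << "us elsewhere=" << elsewhere << "us\n";
  return 0;
}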
The return value from an RPC may be a single status number or may be thousands of bytes of data. It is convenient to always return both an overall status for the call (success, failure, specific error codes) and a possibly empty byte string of additional results.
Most datacenter software uses RPCs to send work between servers. For example, passing a paragraph of text to Google Translate via its web-page interface may send that paragraph to a load-balancing server, which forwards it to a least-busy translation server. That translation server may break the paragraph into sentences and send the individual sentences in parallel to a few dozen sentence servers, each of which performs sequences of multi-word phrase lookups in the source language and maps them to the best-scoring sequence among many possible phrases in the target language. The translation server then gathers these results back together into a single translated paragraph.
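A minimal sketch of that scatter-gather pattern is shown below, using std::async as a stand-in for issuing the per-sentence RPCs in parallel; TranslateSentence and the paragraph splitting are invented for illustration, not taken from any real translation service.

// Sketch of a scatter-gather fan-out: split the work, issue the pieces in
// parallel, gather the results. std::async stands in for parallel RPCs to
// sentence servers; all names here are invented for illustration.
#include <future>
#include <iostream>
#include <string>
#include <vector>

// Stand-in for one sentence-server RPC.
std::string TranslateSentence(const std::string& sentence) {
  return "[" + sentence + "]";  // pretend translation
}

int main() {
  std::vector<std::string> sentences = {"One sentence.", "Another sentence."};

  // Scatter: launch one "RPC" per sentence in parallel.
  std::vector<std::future<std::string>> pending;
  for (const std::string& s : sentences) {
    pending.push_back(std::async(std::launch::async, TranslateSentence, s));
  }

  // Gather: collect the results back into a single paragraph.
  std::string paragraph;
  for (auto& f : pending) {
    paragraph += f.get() + " ";
  }
  std::cout << paragraph << "\n";
  return 0;
}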