- 6.1 About Ethernet
- 6.2 About Hubs, Switches, and Routers
- 6.3 About TCP/IP
- 6.4 About Packets
- 6.5 About Remote Procedure Calls (RPCs)
- 6.6 Slop
- 6.7 Observing Network Traffic
- 6.8 Sample RPC Message Definition
- 6.9 Sample Logging Design
- 6.10 Sample Client-Server System Using RPCs
- 6.11 Sample Server Program
- 6.12 Spinlocks
- 6.13 Sample Client Program
- 6.14 Measuring One Sample Client-Server RPC
- 6.15 Postprocessing RPC Logs
- 6.16 Observations
- 6.17 Summary
- Exercises
6.9 Sample Logging Design
The client-server programs described in the coming text each write a log file of all the RPCs they process. This logging is our designed-in observability for the dynamics of the RPC system. As such, it is important that the logging is not so slow or bulky that it consumes significant resources or distorts the performance of the underlying service.
Our sample design target is to be able to process and therefore log up to 10,000 RPCs per second with little overhead. This is the right order of magnitude for a real datacenter service.
Back-of-the-envelope: If each log entry is 1,000 bytes, logging 10,000 RPC/sec would write 10MB/sec to a log file, or 864GB/day per service, with multiple services running on each server. This quickly gets bulky and can also consume significant bandwidth to a disk holding multiple log files. Each service would nearly fill up a 1TB disk every day, likely requiring multiple disk drives just for logging.
Instead, if each log entry is about 100 bytes, one service would log about 1MB/sec, and total about 86 GB/day. This is still a bit bulky, but several services writing 1MB/sec to a logging disk is an easily sustained rate, and those services would need just a single 1TB disk to hold a day’s worth of log entries—a manageable amount.
For slower-rate services that handle only about 1,000 RPCs per second, we could roughly afford 1,000 bytes per log entry, although most logging does not need so much data per RPC. Do the back-of-the-envelope arithmetic at design time to document affordable limits on the logging overhead.
For our sample RPC design, the binary log format is just a copy of the current RPC header, with the full data length from the RPC mark moved to just in front of the data, and the data itself truncated or zero-extended to 24 bytes, as shown in Figure 6.12. Each log entry is thus exactly 96 bytes.
Figure 6.12 Sample log entry format, 96 bytes total
Log-system performance is a key consideration. Truncating the data keeps each log entry size bounded, and it also provides a bit of privacy for possible user-supplied data such as an email message. Including even a little of the data, though, helps identify what is going on when there is unusual latency. Recording log entries as binary fields instead of printed ASCII values saves file space and also saves CPU time for formatting all the numbers. This approach substantially reduces the logging overhead. Conversion to readable ASCII can be done by postprocessing binary log files as needed later.
Each server program will write a local binary file of these log entries as it runs, and each client program will write its own local binary file of log entries. For an overall service running on 2,000 machines, the server programs would write 2,000 local log files on those machines. Scattered clients would write local log files on their own machines.
In a large datacenter, all programs would periodically close their log files and open new ones. A background service would gather closed log files into a single place or into a distributed file system so that they could be postprocessed efficiently. To conserve space, most logs would be thrown away after a few days. There is little point keeping data if no one will ever read it again. For our sample environment, none of that log management is done: we just deal with multiple local log files written locally on individual servers.