Categorizing TCP Traffic
The maximum frame size of Ethernet is 1,518 bytes. Since Sun uses the version 2 Ethernet frame format, the Ethernet header is always 14 bytes. Excluding the last four bytes of cyclic redundancy check (CRC) code, the maximum payload for each Ethernet frame, or packet, is 1,500 bytes. This is called the maximum transmission unit (MTU) of the Ethernet interface. Since the IP and TCP headers require 20 bytes each, the actual payload perceived by the user application is 1,460 bytes for each packet. This 1,460-byte payload is called the TCP maximum segment size (MSS) when Ethernet is the underlying carrier; the sketch after the following list works through this arithmetic. Based on the MSS, TCP network traffic can be categorized into two types:
Bulk transfer traffic - The payload size of most segments from the sender to the receiver is 1,460 bytes.
Small packet traffic - The payload size of most segments from the sender to the receiver is below 1,460 bytes.
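As noted above, the following sketch simply reproduces the frame-size arithmetic, deriving the 1,500-byte MTU and the 1,460-byte MSS from the Ethernet version 2 frame limits. It is illustrative only; all constants come from the text, and nothing in it is Solaris-specific.

    /* Illustrative arithmetic only: deriving the Ethernet MTU and TCP MSS
     * quoted in the text (Ethernet version 2 framing, IPv4 and TCP headers
     * without options). */
    #include <stdio.h>

    int main(void)
    {
        int max_frame  = 1518;  /* maximum Ethernet frame size            */
        int eth_header = 14;    /* Ethernet version 2 header              */
        int crc        = 4;     /* trailing cyclic redundancy check       */
        int ip_header  = 20;    /* IP header without options              */
        int tcp_header = 20;    /* TCP header without options             */

        int mtu = max_frame - eth_header - crc;      /* 1,500 bytes */
        int mss = mtu - ip_header - tcp_header;      /* 1,460 bytes */

        printf("MTU = %d bytes, MSS = %d bytes\n", mtu, mss);
        return 0;
    }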
In real life, network traffic is a mixture of both; however, this article presents them separately because they are treated differently in the Solaris OE according to the TCP specifications [5].
Typically, bulk transfer traffic is seen when the amount of data to move from one computer to another is far larger than 1,460 bytes. Traffic generated by FTP transfers, network-based backups, and downloads of web pages with a large amount of graphics falls into this category. Small packet traffic, in contrast, is usually generated by client/server applications. These applications typically see short requests from the client (for example, a database query) and short replies from the server (for example, a selected row from the database). Although bulk transfer traffic is the bigger consumer of network bandwidth of the two, both types of traffic require a large amount of system resources to process packets. A system must perform well for both types of traffic to satisfy the demands of the end users, but these two types of traffic exert different kinds of pressure on the system and hence behave very differently. The following sections describe this behavior.
Bulk Transfer Traffic Performance Issues
The performance of bulk transfer traffic is commonly measured by throughput because the goal is to move as much data as possible in as little time as possible. The overall performance, however, depends on many factors:
Size of the TCP window
Overhead to move data from the sender to the receiver
Quality of the link
TCP uses the sliding-window protocol [4] to control the flow of data and to assist reliable delivery. There are two types of windows: the send window and the receive window. The send window, together with the sender's congestion window [4], helps the sender avoid congestion. The receive window prevents the sender from overwhelming the receiver with packets by enforcing a limit on how much data can be sent without waiting for acknowledgment. In a switch-based LAN environment, congestion is usually not an issue; hence, the window typically refers to the receive window. The size of the receive window can be thought of as the size of a pipe between two water-delivering sites. Once a window is negotiated, data from the sender can start to fill the pipe as it flows from the sender to the receiver. When the pipe is full, the sender stops sending for a while until the receiver drains the pipe to make room. Ideally, in the steady state, data flows at a constant rate from the sender to the receiver, a rate determined by the slower of the two parties. Suppose the rate is X bps and the latency from the sender to the receiver is T seconds (including the time the sender takes to send a packet, the time the packet travels in the pipe, and the time the receiver takes to process the packet); the pipe must then hold at least X * T / 8 bytes at any given time to ensure that the size of the pipe is not a performance bottleneck.

To get one gigabit per second (Gbps) on Ethernet (including the headers), the system must deliver 1,000,000,000 / 8 / 1,518 = 82,345 packets per second. This is equivalent to delivering one full-sized packet every 12 microseconds. If the latency is 100 microseconds, the size of the window needs to be at least 1,000,000,000 * 0.0001 / 8 = 12,500 bytes.
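The figures above can be reproduced with a short calculation. The sketch below restates them in C; the 100-microsecond latency is the illustrative value used in the text, not a measured result.

    /* Back-of-the-envelope sketch of the numbers in the text: the packet
     * rate needed to fill a 1 Gbps Ethernet link with full-sized frames,
     * and the minimum receive window for a given sender-to-receiver
     * latency.  The latency value is illustrative only. */
    #include <stdio.h>

    int main(void)
    {
        double link_bps    = 1e9;      /* 1 Gbps Ethernet                    */
        double frame_bytes = 1518.0;   /* full-sized frame, headers included */
        double latency_s   = 100e-6;   /* assumed one-way latency: 100 us    */

        double packets_per_sec = link_bps / 8.0 / frame_bytes;   /* ~82,345  */
        double us_per_packet   = 1e6 / packets_per_sec;          /* ~12 us   */
        double min_window      = link_bps * latency_s / 8.0;     /* 12,500 B */

        printf("packet rate     : %.0f packets/s\n", packets_per_sec);
        printf("packet interval : %.1f microseconds\n", us_per_packet);
        printf("minimum window  : %.0f bytes\n", min_window);
        return 0;
    }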
The significance of this issue is twofold. First, if the sender and receiver are incapable of handling 82,345 packets per second, reducing overhead helps to improve throughput and packet rate. Second, if the sender and receiver are capable of handling 82,345 packets per second, reducing overhead makes the CPUs more available to run other tasks instead of handling the network traffic. The TCP parameters that affect the overhead include, but are not limited to, tcp_maxpsz_multiplier, tcp_deferred_acks_max, and tcp_deferred_ack_interval.
The quality of the link determines the frequency of dropped or erroneous packets, which leads to retransmitted TCP segments and duplicated acknowledgments (DUP-ACKs). Retransmissions and DUP-ACKs waste effective bandwidth. This article, however, does not evaluate the impact of link quality.
Hence, the general approach to achieving the best possible performance for TCP bulk transfer traffic is to set up a sufficiently large TCP receive window and to make the data movement as efficient as possible. Example mechanisms include selecting the appropriate size for socket buffers, minimizing the number of system calls needed to send the same amount of data, moving as much data as possible in the kernel with each action, and minimizing the number of acknowledgment (ACK) packets from the receiver. "Bulk Transfer Traffic Performance" discusses the details of these mechanisms.
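As a rough illustration of the first two mechanisms, the sketch below uses the standard socket API to enlarge the per-connection socket buffers (which bound the offered receive window) and to hand the kernel two application buffers in a single writev() call instead of one write() per buffer. The 256 KB buffer size is an arbitrary example, not a recommended Solaris setting.

    /* A minimal sketch of two of the mechanisms listed above, using the
     * standard socket API: enlarging the per-connection socket buffers
     * and passing several application buffers to the kernel in one
     * writev() call rather than one write() per buffer. */
    #include <stddef.h>
    #include <sys/socket.h>
    #include <sys/uio.h>

    int bulk_send(int sock, char *hdr, size_t hdr_len,
                  char *body, size_t body_len)
    {
        int bufsize = 256 * 1024;   /* example value, not a recommendation */
        struct iovec iov[2];

        /* Request larger send and receive buffers for this connection. */
        setsockopt(sock, SOL_SOCKET, SO_SNDBUF, &bufsize, sizeof(bufsize));
        setsockopt(sock, SOL_SOCKET, SO_RCVBUF, &bufsize, sizeof(bufsize));

        /* One system call moves both buffers into the kernel. */
        iov[0].iov_base = hdr;   iov[0].iov_len = hdr_len;
        iov[1].iov_base = body;  iov[1].iov_len = body_len;

        return (writev(sock, iov, 2) < 0) ? -1 : 0;
    }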
Small Packet Traffic Issues
Request-reply types of applications typically generate small packet traffic. Thus, the latency resulting from packet processing is more important than the throughput delivered. This network latency is usually calculated as the measurement time divided by the number of packets transferred. Hence, if the measurement time is one second, latency is the reciprocal of the packet rate per second. Since packet rate reflects a server's capability to process packets in addition to latency, this article uses packet rate as the metric for the small packet traffic studies.
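The relationship between packet rate and latency is simple arithmetic; the sketch below illustrates it with a made-up packet count, not a measured Solaris result.

    /* Illustrative only: converting a measured packet rate into the
     * per-packet latency described above.  The packet count is hypothetical. */
    #include <stdio.h>

    int main(void)
    {
        double measurement_time_s = 1.0;      /* one-second measurement window */
        double packets_handled    = 50000.0;  /* hypothetical packet count     */

        /* Latency = measurement time / packets transferred; with a one-second
         * window this is the reciprocal of the packet rate. */
        double latency_us = measurement_time_s / packets_handled * 1e6;

        printf("average latency: %.1f microseconds per packet\n", latency_us);
        return 0;
    }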
Similar issues such as TCP window size and transfer overhead, discussed previously, also affect the performance of small packet traffic. However, small packet traffic faces other challenges too:
Nagle's algorithm
Distinguishing the end of a transmission from the traffic generated by small packets
In RFC 896 [5], Nagle's algorithm is proposed to control congestion and reduce the number of small packets in Internet traffic. Small packets are not the most efficient way to transfer data, and, hence, the bandwidth-hungry Internet backbone should avoid them as much as possible. Nagle's algorithm says: "The sender should not transmit the next small packet if it already has one unacknowledged small packet outstanding. Instead, the sender should accumulate small packets until the amount of data to be transferred exceeds the MTU or until the receiver sends the acknowledgment for the outstanding packet." Hence, applications that send small packets continuously from systems that adopt Nagle's algorithm may observe unwanted delay if the receiving systems enable deferred acknowledgment. However, subsequent sections of this article show that applications can disable Nagle's algorithm on a per-connection basis.
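A minimal sketch of the per-connection switch mentioned above, assuming the standard TCP_NODELAY socket option, the usual mechanism for disabling Nagle's algorithm on a single connection:

    /* Disable Nagle's algorithm for one connection with the standard
     * TCP_NODELAY socket option.  Error handling is reduced to the
     * setsockopt() return code. */
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <netinet/tcp.h>

    int disable_nagle(int sock)
    {
        int on = 1;

        /* Ask TCP to send small segments immediately instead of coalescing
         * them while an earlier small segment is still unacknowledged. */
        return setsockopt(sock, IPPROTO_TCP, TCP_NODELAY, &on, sizeof(on));
    }

Because the option applies only to the socket it is set on, bulk transfer connections on the same system are unaffected.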
For example, a transfer of 1,470 bytes results in one full-sized (MTU) packet followed by a packet with a 10-byte payload, which is a small packet. Hence, some algorithms that target quick resolution of end-of-transfer issues may work against the performance of small packet traffic. This issue is not well understood.
Later sections of this article discuss the performance issues of bulk transfer and small packet traffic separately. But before the packet rate and throughput numbers are discussed, how long does it take a packet to travel from the sender's application to the receiver's application?