STEP 3. Monitoring Networks
After checking memory and disks for bottlenecks, look next at any networks connected to the server. Although network bottlenecks are not likely to directly affect the performance of the database server, they can have a big impact on application response times.
When database applications are running in client/server mode, a slow network between the client and the server impacts interactions between the database and the applications. When the slow network sits between the applications and the user interface, the user perception could well be that the database server is slow.
What to Look For
A simple way of determining the impact of network latency on response times is to log and plot ping round-trip times from the client to the server. The following command pings a host called adelaide every 5 seconds and reports the round-trip time in milliseconds.
alpaca% ping -s -I 5 adelaide
PING adelaide: 56 data bytes
64 bytes from adelaide (129.158.93.100): icmp_seq=0. time=147. ms
64 bytes from adelaide (129.158.93.100): icmp_seq=1. time=150. ms
64 bytes from adelaide (129.158.93.100): icmp_seq=2. time=150. ms
64 bytes from adelaide (129.158.93.100): icmp_seq=3. time=150. ms
^C
----adelaide PING Statistics----
4 packets transmitted, 4 packets received, 0% packet loss
round-trip (ms)  min/avg/max = 147/149/150
Some application transactions involve multiple trips to the database server, each of which incurs the round-trip penalty. I have seen network latencies account for a significant portion of application response time in wide area networks.
One effective way of quantifying application response times on a wide area network is to enter a dummy transaction with a remote terminal or browser emulator and measure the response time. Dummy transactions can be entered from each remote location at regular intervals. The transaction response times in conjunction with round-trip times captured with ping can help determine whether the server or the network has the major impact on performance.
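If you want a record to plot, a Bourne shell loop along the following lines captures one timestamped round-trip sample per minute. This is a minimal sketch: the log file location is hypothetical, and it assumes the Solaris ping syntax (data size and packet count as trailing arguments) used throughout this section.

while true
do
        # Send one 56-byte probe and keep just the "time=" portion of the reply
        rtt=`ping -s adelaide 56 1 | grep 'time=' | sed 's/.*time=//'`
        echo "`date '+%Y-%m-%d %H:%M:%S'` $rtt" >> /var/tmp/rtt.log
        sleep 60
done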
The netstat utility shows packet activity on a network; see Figure 5 for an example.
Figure 5 Network traffic on hme0
Watch for collisions (the colls column) exceeding 10% of output packets. The widespread use of switches makes collisions less of an issue than in the past, when many devices shared the same subnet.
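To turn that check into a number, a one-liner like the following reports cumulative collisions as a percentage of output packets. It is a sketch that assumes the hme0 interface and the standard Solaris netstat -i column order, with output packets in column 7 and collisions in column 9.

netstat -i -I hme0 | awk 'NR == 2 { printf("collisions: %.1f%% of output packets\n", 100 * $9 / $7) }'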
Unfortunately, this netstat report shows only the number of packets sent and received, not their size. Without packet sizes, it is difficult to assess the effective throughput of the network.
A number of tools report the number of bytes as well as the number of packets transmitted on a network. The tcp_mon script, which is part of the SE toolkit (available on the book website), reports network traffic in both packets and bytes. As a rule of thumb, divide the theoretical bandwidth in Mbits by 10 to get the effective throughput in Mbytes per second; the factor of 10 allows for the 8 bits per byte plus protocol overhead. So a 10-Mbit Ethernet subnet will not be able to exceed approximately 1 Mbyte per second, and a 100-Mbit Ethernet subnet will not be able to exceed approximately 10 Mbytes per second.
The undocumented netstat option, -k, reports the number of packets received and sent by each network interface (ipackets and opackets), and as of Solaris 2.6, netstat -k also reports the number of bytes received and sent (rbytes and obytes). The kstat utility, introduced in Solaris 8, allows network statistics to be selectively extracted. The following example displays the number of packets and bytes sent and received by all network interfaces on a host called apollo.
apollo% kstat -p -s "*packets"
hme:0:hme0:ipackets     362997
hme:0:hme0:opackets     480774
hme:1:hme1:ipackets     0
hme:1:hme1:opackets     0
ipdptp:0:ipdptp0:ipackets       124649
ipdptp:0:ipdptp0:opackets       180857
lo:0:lo0:ipackets       45548
lo:0:lo0:opackets       45548

apollo% kstat -p -s "*bytes"
hme:0:hme0:obytes       394165502
hme:0:hme0:rbytes       47823373
hme:1:hme1:obytes       0
hme:1:hme1:rbytes       0
The interfaces in this example are two 100-Mbit Ethernet interfaces (hme0 and hme1), a dial-up PPP connection (ipdptp0), and the loopback interface (lo0). The numbers are cumulative; that is, they represent totals since the last reboot. Calculating the average packet sizes (rbytes/ipackets and obytes/opackets) shows that the average packet received on the hme0 interface was 131 bytes and the average packet sent was 819 bytes.
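Because the counters are cumulative, throughput over an interval must be derived from two samples. The following sketch (assuming Solaris 8 kstat and the hme0 interface) reports the average output throughput over a 10-second window:

# Sample the cumulative obytes counter twice and difference the values
b1=`kstat -p hme:0:hme0:obytes | awk '{ print $2 }'`
sleep 10
b2=`kstat -p hme:0:hme0:obytes | awk '{ print $2 }'`
echo "output throughput: `expr \( $b2 - $b1 \) / 10` bytes/sec"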
Although we are focusing on database servers and not NFS file servers, for completeness it is worth mentioning that nfsstat monitors NFS traffic. From a client, use nfsstat -c. Watch for timeouts exceeding 5% of calls, or for "server not responding" messages while the server is up and running: either indicates network problems or an overloaded NFS server.
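A convenient way to apply the 5% rule is to zero the counters, let a representative workload run, and then compare timeouts against calls. This is a minimal sketch; note that nfsstat -z requires root privileges.

nfsstat -z        # reinitialize the NFS counters (root only)
sleep 300         # let a representative workload run
nfsstat -c        # compare timeout against calls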
As of Solaris 2.6, iostat also shows NFS mounts, so all disk statistics available under iostat are also available for NFS mounts.
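For example, with the -xn options, extended statistics are reported with descriptive names, so NFS mounts appear as entries of the form server:/path alongside the local disks:

iostat -xn 5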
For a comprehensive treatment of NFS monitoring, refer to Chapter 9 of Sun Performance and Tuning, Second Edition, by Adrian Cockcroft and Richard Pettit (Sun Press, 1998).
What You Can Do to Minimize Network Bottlenecks
To overcome a network bottleneck, try one of the following:
Install multiple network adapters and split the traffic across multiple subnets if network traffic becomes an issue. Expanding the network in this way is usually easier in the case of a local area network (LAN) than a wide area network (WAN). Current LAN technology is relatively inexpensive and performs acceptably in most environments. WAN technology is available to satisfy even heavy throughput requirements, although it is still relatively expensive.
Use Solaris Bandwidth Manager to manage network traffic on servers running mixed workloads.