Appropriate Analysis: Some Analyzer Scenarios
Let's take a look at a couple of scenarios in which analyzer use helped me out. In all cases, I was confronted by problems where we arrived at a theory, which was then proven through the use of the packet analyzer. This is how all of your analysis sessions should go.
I Can't Print, Again
Let's travel back to a problem we've discussed previously in Hour 19, "Internet and Intranet Troubleshooting: TCP/IP at Work." Remember how I found a UNIX host that would not spool more than one print job to a given network print server at one time, even if that print server had multiple printers attached to it? In other words, the host assumed that each print server only had one printer, a seriously wrong assumption! In this scenario, even though I had proved to myself that the host was at fault by using black box troubleshooting, I wanted evidence to submit to the vendor to prove that its stuff worked differently (wrongly) from other vendors' implementations of UNIX printing in order to try to force the vendor to fix it.
It was fairly easy to take a trace of this by specifying a capture filter of the print server's MAC address or TCP/IP address. Why not the UNIX host? Because the UNIX host had hundreds and hundreds of users, all accessing it via TCP/IP. Had I specified the UNIX host, I would have had a little more data than I could handle.
I set up two test queues (queue1 and queue2) on the suspect host, one for each printer on the print server. As a "control experiment," I set up the same two queues on another host. I started the analyzer capture, went back to my desk, and quickly printed two jobs to the two test queues. I went back to the analyzer, stopped the capture, and saved it to disk, giving it the filename problem.
Then I did the exact same procedure, but used another host to print to the queues. I called this trace file good because this capture illustrated what happens with a UNIX host that's not brain dead. (Although the vendor didn't immediately act, our salesperson saw that we acted on this objective data and bought something else, which had good long-term effects on our leverage with this vendor, so it was worth doing. In fact, when we started having more problems with the machine and implementation of UNIX, we were given a new machine in reparation.)
Here are the important points to remember when submitting analyzer traces to a vendor:
- Traces should be small. Filter as much as you can. If you have extraneous "stuff" during the initial capture, do a post-capture filter to remove everything but pertinent data.
- Traces should be discrete. In particular, it is very useful to submit traces showing a "good" event versus a "bad" event.
- Traces should be backed up with an objective and succinct description of the problem, describing what troubleshooting measures were taken.
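To make the "filter as much as you can" advice concrete, here is a minimal sketch of a post-capture filter that keeps only the frames to or from one address, such as the print server. The `Frame` record, the IP addresses, and the summaries are all made up for illustration; a real analyzer has its own filter syntax and export format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    """Hypothetical, simplified view of one captured frame."""
    number: int
    src: str
    dst: str
    summary: str


def involving(frames, address):
    """Keep only frames sent to or from the given address."""
    wanted = address.lower()
    return [f for f in frames if wanted in (f.src.lower(), f.dst.lower())]


# Invented addresses: .10 is the busy UNIX host, .50 is the print server.
capture = [
    Frame(1, "192.168.1.10", "192.168.1.50", "host to print server, queue1"),
    Frame(2, "192.168.1.77", "192.168.1.10", "some user's Telnet traffic"),
    Frame(3, "192.168.1.50", "192.168.1.10", "print server reply"),
]

trimmed = involving(capture, "192.168.1.50")
print([f.number for f in trimmed])
```

Filtering on the print server rather than the host drops frame 2, which is exactly the kind of unrelated user traffic that would otherwise swamp the trace.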
Slow Databases 'R Us
Remember when we discussed file sharing databases in Hour 20, "In-depth Application Troubleshooting"? Unfortunately, these are still widely in use, and running into problems with these is pretty common. Let's take a look at the packet trace that I did in one case to prove that sequential record searching (rather than indexed) was in use on one database file. Again, this wasn't a problem in the vendor's 100-record database, but it was a huge problem once they sold it to my customer, who had tens of thousands of records.
The symptom showed up when pressing Next Record on the work order system. Remember that databases are usually composed of more than one table, and more than one database file. To go to the "next record" in this work order database, another database was first consulted (to gather customer information to show on the screen). Because a sequential search, rather than a seeking (index-based) search, was in use, finding the customer data could be quite a lengthy process!
How'd I prove this? Simply by capturing the network traffic generated at the workstation when the Next Record button was pressed. Then, I scrolled through the capture, and indeed, I found that sequential record reads were being used.
Here's how. Check out Figure 21.9. It shows the information that you'll need to start doing any type of file-and-print analysis. (This happens to be of an NCP session, but the basics are the same whether you're doing SMB or NFS.) First, you want to establish "where does the file get opened?" So, in packet #108, you see the OPEN (the details in the decode are what differentiate an OPEN from a CREATE) requested by the workstation. In packet #109, you see an important piece of data returned by the server: the file handle. A file handle is simply a number that refers back to the file. In subsequent file operations, you will not see the filename, just the handle.
Figure 21.9 When dealing with file-oriented captures, discovering the file handle that the server allocates is a good first step.
Now look at Figure 21.10. Packet #110 is the initial read request. Notice two things about the decode. The first is that this packet uses the same file handle as in Figure 21.9, that is, file handle 000092930200. (One analysis trick, whether you're dealing with SYN numbers, file handles, or any other long number: just remember the last four digits when you're bouncing between packets. If there's a question, sure, compare the whole thing, but for quick reads, using the last four digits usually works out fine.) If a different file handle is displayed, you know you're dealing with the wrong file.
Figure 21.10 The initial read request for this problem starts at file offset 0, the beginning of the file.
Second, note that you're starting at file offset 0. This could be an index file, being checked from the beginning, but it's not. (The decode of Figure 21.9 shows that the file name is not an index.) What if it were an index? In that case, you would expect the file offsets to start jumping around, because indexes are sorted and are typically searched with a binary search, a concept that we discussed in Hour 5, "The Napoleon Method: Divide and Conquer."
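To see why an indexed lookup "jumps around" while a sequential one does not, here is a small sketch comparing which record numbers each approach touches. The record count, the target record, and the 128-byte record size are assumptions chosen purely for illustration; they do not come from the trace.

```python
def sequential_probes(target):
    """Record numbers touched by a front-to-back scan that stops at target."""
    return list(range(target + 1))


def binary_probes(n_records, target):
    """Record numbers touched by a binary search over a sorted index."""
    lo, hi = 0, n_records - 1
    probes = []
    while lo <= hi:
        mid = (lo + hi) // 2
        probes.append(mid)
        if mid == target:
            break
        if mid < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return probes


RECORD_SIZE = 128  # hypothetical fixed record length in bytes

seq = sequential_probes(70)
jumpy = binary_probes(100, 70)

print(len(seq), "reads, first offsets:", [r * RECORD_SIZE for r in seq[:4]])
print(len(jumpy), "reads, offsets:", [r * RECORD_SIZE for r in jumpy])
```

The sequential scan produces a long run of steadily rising offsets, while the binary search needs only a handful of reads whose offsets move both forward and backward. That difference in shape is what you are looking for in the capture.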
Scanning through the next couple of transactions tells the tale. Each one of the subsequent transactions looks similar to the one shown in Figure 21.11: It increments the offset by a small amount, which is basically the definition of a sequential read sequence. Again, this is a totally crazy and irresponsible thing to do if you are designing a fast database: Any field that gets searched needs some sort of index, which, when used, replaces the sequential reads with a binary search.
Figure 21.11 Future offsets in this problem show that sequential reads are occurring because file offsets keep incrementing in a predictable, sequential way.
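If you have exported the read requests from a capture, the same check can be sketched in a few lines. This assumes you already have a list of (file handle, offset) pairs in capture order; the `max_step` threshold, the second file handle, and all of the offsets below are invented for illustration. Only the handle 000092930200 comes from the figures.

```python
def looks_sequential(reads, handle, max_step=4096):
    """True if every read on `handle` moves forward by a small step.

    `reads` is a list of (file_handle, offset) pairs in capture order.
    `max_step` is an arbitrary cutoff for what counts as a small step.
    """
    offsets = [offset for h, offset in reads if h == handle]
    if len(offsets) < 2:
        return False  # not enough reads to call it a pattern
    steps = [later - earlier for earlier, later in zip(offsets, offsets[1:])]
    return all(0 < step <= max_step for step in steps)


HANDLE = "000092930200"

# Offsets creep upward; a read on some other (made-up) handle is ignored.
scan = [(HANDLE, 0), (HANDLE, 512), ("0000AAAA0100", 90000),
        (HANDLE, 1024), (HANDLE, 1536)]

# Offsets jump forward and backward, as an index lookup would.
indexed = [(HANDLE, 0), (HANDLE, 51200), (HANDLE, 25600), (HANDLE, 38400)]

print(looks_sequential(scan, HANDLE))
print(looks_sequential(indexed, HANDLE))
```

The first call reports a sequential pattern and the second does not. Filtering on the handle first matters, because reads on other open files are interleaved in a real trace.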
At this point, you'd filter this trace to start at packet 108, and progress through perhaps 10 to 20 sequential reads. That should be enough for your vendor to get the idea. And, in our case, it was. We received fixed code within 30 days of reporting this. That's not bad.
Identifying a Station
In addition to analyzing and reporting bad network events, another use of an analyzer can be to identify workstations by MAC address using application data.
We've all been at sites where the MAC addresses weren't terribly well documented, so any MAC-related error was difficult to run down. For example, suppose Windows exclaims that there's a duplicate TCP/IP address on MAC address 00:00:C9:05:89:62. If you don't have switches and can't use the port-to-MAC table as we discussed in Hour 19, you might think that you're totally stuck.
Undocumented MAC addresses can be a nightmare. If your analyzer doesn't automatically identify network names for you, you might think that you're out of luck. The same goes for when your expert analyzer tells you that 00:08:02:55:29:2A is probably a bad network card and is causing many network errors.
But, no problem: you've got a wiretap, right? You can listen to all the traffic generated by this workstation, and it's likely that you'll get something that will identify the user. By taking a look at the data in the hexadecimal or character-oriented decode window, you can see various data that might lead you to identify the workstation's user (or department).
This is something that takes a little practice, but use your head and you'll get good at it in no time. For example, filtering on Telnet sessions will give you the entirety of a user's Telnet session. Find the Password: prompt sent by the server, as in Figure 21.12, and the login name will be in the packets just before it. The only problem is that you will likely have to assemble the username manually: As you go backward, the username data will present itself. To add to the fun, it's likely that the username will arrive character by character, rather than the whole thing in one data packet. So, if the username was "joe," you would see an "e" in packet #40's Telnet data section, and then an "o" in #37, with "j" coming right before that. The astute reader will point out that there are packets sent by the workstation (192.168.1.202) in between #45 and #40. What's up with that? Do the experiment: You'll see that they don't contain character data.
Figure 21.12 Telnet session usernames can be found in the packets immediately preceding the Password prompt.
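The backward walk from the Password: prompt can be sketched as follows. This is a simplified illustration, not a Telnet decoder: it assumes you have the Telnet data of each packet as bytes, that the server's login prompt contains the text `login:`, and that anything non-printable (such as the Enter key) can be thrown away. The server address and every packet number other than #37, #40, and #45 are invented.

```python
def username_before_password(packets, client):
    """Rebuild what the client typed just before the server's Password: prompt.

    `packets` is a list of (number, source_ip, telnet_data) in capture order.
    """
    prompt_at = None
    for i, (_, src, data) in enumerate(packets):
        if src != client and b"Password:" in data:
            prompt_at = i
            break
    if prompt_at is None:
        return None

    chunks = []
    for _, src, data in reversed(packets[:prompt_at]):
        if src != client:
            if b"login:" in data:  # assumed login prompt text: stop here
                break
            continue  # skip the server's echoes of typed characters
        # Keep printable ASCII only; empty packets contribute nothing.
        chunks.append(bytes(b for b in data if 32 <= b < 127).decode("ascii"))
    return "".join(reversed(chunks))


CLIENT = "192.168.1.202"
SERVER = "192.168.1.1"  # hypothetical server address

session = [
    (33, SERVER, b"login: "),
    (34, CLIENT, b"j"), (35, SERVER, b"j"), (36, CLIENT, b""),
    (37, CLIENT, b"o"), (38, SERVER, b"o"), (39, CLIENT, b""),
    (40, CLIENT, b"e"), (41, SERVER, b"e"), (42, CLIENT, b""),
    (43, CLIENT, b"\r\n"), (44, CLIENT, b""),
    (45, SERVER, b"\r\nPassword: "),
]

print(username_before_password(session, CLIENT))
```

The empty client packets in the sample stand in for the workstation packets that carry no character data. They simply add nothing to the reassembled name.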
Too much of a pain? Check out some of the Telnet data itself. You might see a report or a menu screen that only a particular user or department uses. This sort of use of an analyzer is a good opportunity to get good at reading your protocol decodes. But clearly, documenting your MAC addresses is the real solution.
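Scanning the character-oriented decode for recognizable text can also be approximated in code. The sketch below pulls runs of printable ASCII out of the payloads sent by one MAC address, much as the `strings` utility does for files. The payload bytes and the second MAC address are made up; only 00:00:C9:05:89:62 comes from the example above.

```python
import re


def printable_runs(payload, min_len=4):
    """Return runs of at least `min_len` printable ASCII characters."""
    pattern = re.compile(rb"[\x20-\x7e]{" + str(min_len).encode() + rb",}")
    return [run.decode("ascii") for run in pattern.findall(payload)]


def clues_from(frames, mac):
    """Collect readable text from every frame sent by the given MAC address.

    `frames` is a list of (source_mac, payload_bytes) pairs.
    """
    found = []
    for src, payload in frames:
        if src.lower() == mac.lower():
            found.extend(printable_runs(payload))
    return found


SUSPECT = "00:00:C9:05:89:62"

frames = [
    (SUSPECT, b"\x00\x17\xffWORK ORDER MENU - SHIPPING DEPT\x00\x01"),
    ("00:00:C9:11:22:33", b"\x00\x00PAYROLL REPORT\x00"),  # someone else
    (SUSPECT, b"\x01\x02ab\x03"),  # too short to be a useful clue
]

print(clues_from(frames, SUSPECT))
```

A menu title or report name like the one in the sample is often enough to narrow the station down to a department, even when no username appears in the clear.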
TIP
When looking for the start of any TCP-based session, go down in the TCP decode and filter on synchronize=1. This will get you to the beginning of the session. It's the equivalent of saying "hello?" when you first pick up the telephone; the next steps, like "may I speak to Ms. So-and-so?" are likely in a nearby packet. Of course, encrypted authentications are going to make this a lot harder.
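Under the hood, the synchronize filter just tests one bit in the TCP header. As a rough sketch, the SYN flag is bit 0x02 of the flags byte, which sits at offset 13 of the TCP header. The port numbers, sequence numbers, and window size below are invented, and `tcp_header` is only a helper that builds sample 20-byte headers for the demonstration.

```python
import struct

SYN = 0x02
PSH = 0x08
ACK = 0x10


def tcp_header(src_port, dst_port, seq, ack, flags):
    """Build a bare 20-byte TCP header (no options) for demonstration."""
    data_offset = 5 << 4  # header length of 5 32-bit words, upper nibble
    return struct.pack("!HHIIBBHHH", src_port, dst_port, seq, ack,
                       data_offset, flags, 8192, 0, 0)


def has_syn(header):
    """True if the SYN bit is set in the flags byte at offset 13."""
    return bool(header[13] & SYN)


# A made-up Telnet session: the handshake, then the first data segment.
segments = [
    tcp_header(1045, 23, 1000, 0, SYN),           # client says hello
    tcp_header(23, 1045, 5000, 1001, SYN | ACK),  # server answers
    tcp_header(1045, 23, 1001, 5001, ACK),        # handshake complete
    tcp_header(1045, 23, 1001, 5001, ACK | PSH),  # first data
]

session_start = [i for i, seg in enumerate(segments) if has_syn(seg)]
print(session_start)
```

Only the first two segments of a connection carry the SYN bit, so a filter like this lands you right at the start of the session, with the interesting login traffic a few packets later.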