Patching the Holes
The three limitations of UDP that cause the most pain are unreliability, lack of integrity, and duplication. You can fix each defect without giving up what makes unordered datagrams useful, as long as the fix does not force the messages back into order as they arrive. You also need to look at the data from both P2P and multicast perspectives. Later in this article, I describe a tool for multicasting very large files, so the solutions presented here focus on algorithms for multicasting.
Building in reliability simply requires making sure that the message eventually arrives. You can do this in two ways: the receiver can request any missing messages, or the sender can repeat the message periodically. The first solution requires a unique identifier on each message so that the receiver can check off its receipt. This form doesn't lend itself well to multicast senders, because every receiver may be missing a different set of messages. A repeating sender serves better, as long as the information doesn't change regularly.
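A minimal sketch of the repeating-sender idea, with the network left out so that it runs on its own. The names `split_chunks`, `carousel`, and `missing_ids` are hypothetical helpers invented for this illustration, and the "loss" of datagram 1 is simulated; a real sender would hand each pair to a UDP socket instead.

```python
from itertools import cycle, islice

def split_chunks(data, size):
    """Cut the data into fixed-size pieces; the last one may be shorter."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def carousel(chunks):
    """Yield (sequence number, chunk) pairs forever, wrapping after the last chunk."""
    return cycle(enumerate(chunks))

def missing_ids(total, received):
    """Return the sequence numbers the receiver has not checked off yet."""
    return sorted(set(range(total)) - set(received))

chunks = split_chunks(b"abcdefghij", 4)   # three chunks: abcd, efgh, ij
stream = carousel(chunks)
received = {}

# First pass of the carousel: pretend datagram 1 was lost in transit.
for seq, chunk in islice(stream, len(chunks)):
    if seq != 1:
        received[seq] = chunk
after_first_pass = missing_ids(len(chunks), received)

# Second pass: the sender repeats everything, so the gap fills in by itself.
for seq, chunk in islice(stream, len(chunks)):
    received.setdefault(seq, chunk)
after_second_pass = missing_ids(len(chunks), received)

rebuilt = b"".join(received[i] for i in range(len(chunks)))
```

The receiver never talks back to the sender. It only waits until its list of missing sequence numbers is empty, which is why this approach scales to any number of multicast listeners.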
The solution for duplication is similar to the one for reliability. The receiver needs to account for the messages it has received and skip over any repeats. Again, this requires that both the sender and the receiver track each message with an identification code. If the receiver gets a message twice (the ID code is already accounted for), it just ignores the second copy.
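The bookkeeping described above can be sketched in a few lines. `DuplicateFilter` is a hypothetical class written for this illustration, and the list of arrivals stands in for datagrams read from a socket. It assumes the ID codes are never reused; a long-running receiver would also need some way to forget old IDs.

```python
class DuplicateFilter:
    """Remember which message IDs have been seen and reject repeats."""

    def __init__(self):
        self._seen = set()

    def accept(self, msg_id):
        """Return True the first time an ID arrives, False on every repeat."""
        if msg_id in self._seen:
            return False
        self._seen.add(msg_id)
        return True

# Simulated arrivals as (ID code, payload); IDs 0 and 1 each show up twice.
arrivals = [(0, b"a"), (1, b"b"), (0, b"a"), (2, b"c"), (1, b"b")]

dup_filter = DuplicateFilter()
delivered = [payload for msg_id, payload in arrivals if dup_filter.accept(msg_id)]
```

Because the filter only asks "have I seen this ID?", it works no matter what order the datagrams arrive in.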
Fixing the integrity limitation presents no real challenge to a programmer. The lower layers already do much of the work: Ethernet and most other link layers attach a CRC to every frame, and UDP carries its own 16-bit checksum (optional over IPv4). This means that while it's possible to get scrambled data, it's not very likely. Before the network interface card (NIC) hands off a frame, it checks the CRC; if the check fails, the NIC discards the frame, and to your program the datagram is simply lost. Still, you may want to add your own checksum and/or message digest to each message. A message digest is simply a large number that "summarizes" the entire message using a hashing function. The function goes through the message byte by byte and calculates this hash number. The sender places the digest in the message. When the receiver gets the message, it extracts the digest, runs the same function on the data, and compares the results. If they match, the message is okay.
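As an illustration of the digest round trip, the sketch below uses SHA-256 from Python's standard `hashlib` module. The choice of hash, the layout (digest first, then data), and the helper names `attach_digest` and `verify_and_strip` are assumptions made for this example, not a prescribed format.

```python
import hashlib

DIGEST_LEN = hashlib.sha256().digest_size   # 32 bytes for SHA-256

def attach_digest(data):
    """Sender side: place the digest of the data in front of the data."""
    return hashlib.sha256(data).digest() + data

def verify_and_strip(message):
    """Receiver side: return the data if the digest matches, otherwise None."""
    if len(message) < DIGEST_LEN:
        return None                          # too short to hold a digest
    digest, data = message[:DIGEST_LEN], message[DIGEST_LEN:]
    if hashlib.sha256(data).digest() != digest:
        return None                          # scrambled somewhere along the way
    return data

message = attach_digest(b"multicast block 17")
intact = verify_and_strip(message)

# Flip one bit in the last byte to imitate corruption in transit.
scrambled = message[:-1] + bytes([message[-1] ^ 0x01])
rejected = verify_and_strip(scrambled)
```

A bare digest like this one catches accidental damage only. Anyone who alters the data on purpose can recompute the digest as well.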
Message digests are becoming more popular and more complicated. They can now carry information about the sender, the name of the hash function, the services available for interfacing with the hosting peers, and so on. (For the real power of these summaries, see my later article on secure sockets.)
In short, the two new pieces of data that you can add in your own header to a datagram message are sequence numbers (which can serve as the unique identifier) and message digests. This new header resides in the data portion of the packet and precedes the actual message data. When the receiver gets the message, it separates the customized header from the message data and verifies both. You can add whatever you need to a customized header. Just remember that every byte of header reduces the share of each datagram that carries real data.
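One possible layout for such a header, sketched with Python's `struct` module: a 32-bit sequence number followed by a 32-bit CRC of the data, both in network byte order. The field sizes, the use of `zlib.crc32` as a lightweight stand-in for a full message digest, and the function names are all assumptions made for this illustration.

```python
import struct
import zlib

# Hypothetical header: sequence number and CRC-32, each an unsigned 32-bit
# integer in network (big-endian) byte order.
HEADER = struct.Struct("!II")

def pack_message(seq, data):
    """Put the customized header in front of the message data."""
    return HEADER.pack(seq, zlib.crc32(data)) + data

def unpack_message(datagram):
    """Split header from data and verify the check; raise ValueError if bad."""
    if len(datagram) < HEADER.size:
        raise ValueError("datagram shorter than header")
    seq, crc = HEADER.unpack_from(datagram)
    data = datagram[HEADER.size:]
    if zlib.crc32(data) != crc:
        raise ValueError("checksum mismatch")
    return seq, data

def payload_share(data_len):
    """Fraction of the datagram's data portion that is real message data."""
    return data_len / (HEADER.size + data_len)

datagram = pack_message(7, b"hello")
seq, data = unpack_message(datagram)
```

With this 8-byte header, a 1,000-byte message spends under 1 percent of the datagram on overhead, while an 8-byte message spends half of it. That trade-off is worth checking before you add more fields.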