Data Communications: Use the Right Medium for your Message
In this chapter
- Information as a Quantity
- Bounded Medium
- Unbounded Medium
- Effects of Bandwidth on a Transmission Channel
- Bandwidth Requirements for Signals
- Carrier Systems
- What You Have Learned
Messages represent information useful to people, but the sender and receiver might or might not be human. The medium must be suitable to convey the type of message. In this chapter, you will examine in more detail the types of messages and the media that carry them. Concerning the latter, you will also examine the impairments that affect the data-transfer capability of different media and why some media have a higher data-transfer capacity than others.
Useful communication requires four elements, as shown in Figure 3.1:
- A message (information) to be communicated
- A sender of the message
- A medium or channel over which the message can be sent
- A receiver
Figure 3.1 The elements of communication.
Information as a Quantity
Information can be defined as "a numerical quantity that measures the uncertainty in the outcome of an experiment to be performed." This definition has an application in the sending of messages. For example, suppose you have a machine that can send only two symbols: A1 and A2. You can then say that the "experiment" is the accurate recognition of the two symbols (A1 and A2) being sent from one machine to another. As far as the receiving machine is concerned, it is just as likely to receive one symbol as the other. So, you can say that the "numerical quantity" in this experiment is a unit of information that allows a selection between two equally likely choices. This quantity, or unit of information, is usually called a bit (a contraction of the term "binary digit"), and it has two possible values, 0 and 1. If these two values are used to represent A1 and A2, A1 could be represented by the bit value 0, and A2 could be represented by the bit value 1. The number of bits per symbol is 1, but you still need a way of selecting which symbol you want to send. A machine that uses only two symbols therefore needs only a 1-bit selection code (0 or 1).
A machine that uses only two symbols is not of much use for communication, but suppose the machine could use 128 symbols (like the standard ASCII character set). Then the number of equally likely choices to be handled would be 128, and the number of bits (information) required to represent each of those 128 symbols would be seven (refer to Table 1.1 in Chapter 1, "An Overview of Data Communications"). You can see, then, that if the knowledge (or intelligence) to be communicated can be represented by a set of equally likely symbols, the amount of information required per symbol depends on the total number of symbols in the set.
The standard ASCII character set is particularly useful for selecting the information to be communicated because it can select 1 of 128 ASCII symbols with only one 8-bit byte, a common bit grouping in computers (the eighth bit is not used in this case). The extended ASCII character set that uses all 8 bits per byte supports 256 symbols, doubling the number of ASCII symbols.
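As a quick check of these figures, the minimal sketch below (Python is assumed here purely for illustration; it is not part of the chapter's examples) computes the number of bits needed to select one of n equally likely symbols.

```python
import math

# Bits needed to select one of n equally likely symbols: log2(n),
# rounded up to a whole number of bits for a fixed-length code.
def bits_per_symbol(n):
    return math.ceil(math.log2(n))

print(bits_per_symbol(2))    # 1 bit  -- the two-symbol machine (A1 and A2)
print(bits_per_symbol(128))  # 7 bits -- standard ASCII
print(bits_per_symbol(256))  # 8 bits -- extended ASCII
```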
Information Content of Symbols
In many information systems, not every symbol is equally likely to be used in a given communication. The English language is a good example. In a message written in English, the letter e occurs roughly 12 percent of the time, making it far more likely to appear than a letter such as q or z. This uneven distribution is also characteristic of particular groups of letters and of words. This means that each of the 128 symbols in ASCII (or 256 symbols in extended ASCII) is not likely to occur an equal number of times in any given communication. For example, notice that the uses of the letters e and g and the letter combinations th and er are unequal in this paragraph.
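To see this unevenness directly, the short sketch below counts letter occurrences in a sample sentence; the sample text itself is an assumption chosen only for illustration.

```python
from collections import Counter

# A short English sample; any paragraph of English text could be
# substituted here (this sentence is only an illustration).
sample = ("in a message written in english the letter e occurs far more "
          "often than letters such as q or z and pairs such as th and er "
          "appear again and again")

counts = Counter(ch for ch in sample if ch.isalpha())
for letter, count in counts.most_common(5):
    print(letter, count)   # the most frequent letters dominate; q and z appear only once each
```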
In 1949, Claude Shannon published a book entitled The Mathematical Theory of Communication. In this book, he discussed the uncertainty or amount of disorder of a system, which he called entropy. Entropy can also be considered as a measure of randomness, and, as you will shortly note, it has a significant role in communications, both for verification of the accurate arrival of messages and as a mechanism for reducing the physical size of messages while retaining their meaning.
The entropy of a set of equally likely symbols (such as the digits 0 through 9 in a table of random numbers) is the logarithm to the base 2 of the number of symbols in the set. The entropy of the English alphabet, which contains 26 letters and a space, is then log2(27), or approximately 4.75 bits per symbol. Because of the uneven use of letters in the English language, however, its entropy was estimated by Shannon as 1.3 bits per symbol.
Because the probability of occurrence of each character in the English alphabet differs, entropy of the alphabet is calculated as follows:
H = -(P1 log2 P1 + P2 log2 P2 + ... + Pi log2 Pi + ... + P26 log2 P26)
Here, Pi is the probability of occurrence of the ith character in the English alphabet. Note that the symbol H is used by mathematicians to represent entropy. The preceding calculation can be simplified as follows for any code or language:
H = -Σ (from i = 1 to n) Pi log2 Pi
Here, n represents the number of possible distinct characters or symbols in the language or code.
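A minimal sketch of this calculation, again assuming Python, is shown below. The uniform case reproduces the log2(27) figure given earlier, and the four-symbol probabilities in the weighted case are made up purely for illustration.

```python
import math

# Entropy H = -sum(Pi * log2(Pi)) taken over the n symbols of a code.
def entropy(probabilities):
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# 27 equally likely symbols (26 letters plus a space):
uniform = [1 / 27] * 27
print(round(entropy(uniform), 2))   # about 4.75 bits per symbol

# A skewed four-symbol code with made-up probabilities:
skewed = [0.50, 0.25, 0.15, 0.10]
print(round(entropy(skewed), 2))    # about 1.74, well below log2(4) = 2
```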
When entropy was computed for the English language, Shannon discovered that the language is about 70 percent redundant and that it should be possible to reconstruct English text accurately if every other letter is lost or changed due to noise or distortion. Obviously, redundancy is desirable to raise the chances of receiving a good message when the medium is noisy. (The words noise and noisy in this book refer to electrical noise, that is, an electrical signal that is not supposed to be present.)
Using Redundancy in Communications
So, you wonder, what does all this have to do with the real world of data communications? Quite a bit, because almost every scheme in current use for sending data uses redundancy in an attempt to verify that the data has been received exactly as sent (that is, no errors have been introduced by the sending mechanism, the transmission medium, or the receiver). The redundant information might consist simply of a retransmission of the entire original message.
Although it's simple to implement, retransmission is not efficient. Special techniques are therefore used to generate redundant information that is related to the message in a way that is known to both the sender and the receiver. The sender generates the redundant information during transmission and sends it with the message. The receiver regenerates and checks the redundant information when the message is received. This scheme is represented in Figure 3.2. Verification usually occurs at the end of each link in the chain making up the transmission path. The details of this process and various methods in current use are described in Chapter 10, "WAN Architectures and Packet Networks."
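As a rough illustration of the idea (not of the specific methods covered in Chapter 10), the sketch below appends a CRC-32 checksum, one common form of redundant information, to a message; the receiver then recomputes the checksum and compares it with the one received.

```python
import zlib

# Sender: compute a CRC-32 checksum over the message (one common form
# of redundant information) and append it to the transmitted frame.
def send(message: bytes) -> bytes:
    return message + zlib.crc32(message).to_bytes(4, "big")

# Receiver: recompute the checksum and compare it with the received one.
def receive(frame: bytes) -> bytes:
    message, received = frame[:-4], int.from_bytes(frame[-4:], "big")
    if zlib.crc32(message) != received:
        raise ValueError("checksum mismatch: request retransmission")
    return message

frame = send(b"a message to be verified on arrival")
print(receive(frame))                        # arrives intact, the check passes
corrupted = bytes([frame[0] ^ 0x01]) + frame[1:]
# receive(corrupted) would raise ValueError: the error is detected
```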
Using Redundancy for Data Compression
A second area where entropy plays a considerable role is as a foundation for data compression. Because the entropy of an alphabet indicates the average number of bits per symbol, this information provides software and hardware developers with a goal for implementing various data-compression schemes.
Figure 3.2 Error-checking points.
For example, if the uneven use (different probabilities) of the letters in the English alphabet results in an entropy of 1.3 bits per symbol, why use an 8-bit byte to transmit each character? By compressing data, that is, by temporarily removing redundancies with one or more algorithms prior to transmission, larger quantities of data can be transmitted per unit time. Recognizing the value of entropy, almost all modems today include a built-in data-compression mechanism.
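As a minimal sketch of the payoff, the example below uses Python's zlib module as a stand-in for whatever compression algorithm a modem or application actually implements, applied to a deliberately repetitive sample text.

```python
import zlib

# A deliberately repetitive (highly redundant) sample text.
text = ("the quick brown fox jumps over the lazy dog " * 200).encode("ascii")

compressed = zlib.compress(text, level=9)
bits_per_char = len(compressed) * 8 / len(text)
print(len(text), len(compressed), round(bits_per_char, 2))
# The repetition drives the result far below 8 bits per character;
# ordinary English prose lands somewhere between the 1.3-bit entropy
# estimate and the 8 bits of an uncompressed extended ASCII byte.
```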