1.6. Standard Continuous Distributions
This section presents some standard continuous distributions. Recall from Section 1.3 that, unlike a discrete random variable, a continuous random variable takes values over a continuous subset of the real line, such as an interval.
1.6.1. Uniform Distribution
A random variable X is said to be uniformly distributed on the interval [a,b] if its density function f(x) = 1/(b − a) when x lies in [a,b] and is 0 otherwise. The expected value of a uniform random variable with parameters a and b is (a + b)/2.
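As a quick numerical check (not from the text; it assumes numpy and arbitrary parameter values), the following sketch samples from a uniform distribution and compares the sample mean against (a + b)/2 and the sample variance against the standard closed form (b − a)²/12, which the text does not state:

```python
import numpy as np

rng = np.random.default_rng(0)
a, b = 2.0, 10.0                       # arbitrary parameters
x = rng.uniform(a, b, 100_000)

print(x.mean(), (a + b) / 2)           # sample mean vs. (a + b)/2
print(x.var(), (b - a) ** 2 / 12)      # sample variance vs. (b - a)^2/12
```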
1.6.2. Gaussian, or Normal, Distribution
A random variable is Gaussian, or normally distributed, with parameters μ and σ² if its density is given by

f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-(x-\mu)^2 / 2\sigma^2}
We denote a Gaussian random variable X with parameters μ and σ² as X ~ N(μ,σ²), where we read the “~” as “is distributed as.”
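To make the density concrete, here is a minimal sketch (not from the text) that evaluates the N(μ,σ²) density directly from the formula above:

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) evaluated at x."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

print(normal_pdf(0.0, 0.0, 1.0))       # peak of the standard normal, ~0.3989
```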
The Gaussian distribution can be obtained as the limiting case of the binomial distribution as n tends to infinity while p is kept constant. That is, if a random variable measures the number of successes in a very large number of independent trials, its distribution (suitably shifted and scaled) is approximately Gaussian. Thus, Gaussian random variables naturally occur when we want to study the statistical properties of aggregates.
The Gaussian distribution is called normal because many quantities, such as the heights of people, the slight variations in the size of a manufactured item, and the time taken to complete an activity, approximately follow the well-known bell-shaped curve.
When performing experiments or simulations, it is often the case that the same quantity assumes different values during different trials. For instance, if five students were each measuring the pH of a reagent, it is likely that they would get five slightly different values. In such situations, it is common to assume that these quantities, which are supposed to be the same, are in fact normally distributed about some mean. Generally speaking, if you know that a quantity is supposed to have a certain standard value but you also know that there can be small variations in this value due to many small and independent random effects, it is reasonable to assume that the quantity is a Gaussian random variable with its mean centered on the expected value.
The expected value of a Gaussian random variable with parameters μ and σ² is μ, and its variance is σ². In practice, it is often convenient to work with a standard Gaussian distribution, which has a zero mean and a variance of 1. It is possible to convert a Gaussian random variable X with parameters μ and σ² to a Gaussian random variable Y with parameters (0,1) by choosing Y = (X − μ)/σ.
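A short simulation sketch (assuming numpy, with arbitrary μ and σ) confirming that standardizing N(μ,σ²) samples yields mean ≈ 0 and variance ≈ 1:

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma = 5.0, 2.0                   # arbitrary parameters
x = rng.normal(mu, sigma, 100_000)
y = (x - mu) / sigma                   # the standardization Y = (X - mu)/sigma

print(y.mean(), y.var())               # approximately 0 and 1
```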
The Gaussian distribution is symmetric about the mean and asymptotically approaches 0 at +∞ and −∞. The σ² parameter controls the width of the central “bell”: the larger this parameter, the wider the bell and the lower the maximum value of the density function, as shown in Figure 1.4. The probability that a Gaussian random variable X lies between μ − σ and μ + σ is approximately 68.26%; between μ − 2σ and μ + 2σ, approximately 95.44%; and between μ − 3σ and μ + 3σ, approximately 99.73%.
Figure 1.4. Gaussian distributions for different values of the mean and variance
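The 68.26%, 95.44%, and 99.73% figures are easy to verify empirically; a minimal sketch (numpy assumed):

```python
import numpy as np

rng = np.random.default_rng(2)
z = rng.normal(0.0, 1.0, 1_000_000)    # standard normal samples

for k in (1, 2, 3):
    frac = np.mean(np.abs(z) <= k)     # fraction within k standard deviations
    print(k, frac)                     # ~0.6827, ~0.9545, ~0.9973
```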
It is often convenient to use a Gaussian continuous random variable to approximately model a discrete random variable. For example, the number of packets arriving on a link to a router in a given fixed time interval will follow a discrete distribution. Nevertheless, by modeling it using a continuous Gaussian random variable, we can get quick estimates of its expected extremal values.
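As an illustration of such a quick estimate (the Poisson arrival model and its rate below are hypothetical, not from the text), we can fit a Gaussian to simulated per-interval packet counts and read off the ±3σ range:

```python
import numpy as np

rng = np.random.default_rng(3)
counts = rng.poisson(lam=1000, size=10_000)   # hypothetical packet counts per interval

m, s = counts.mean(), counts.std()            # Gaussian approximation N(m, s^2)
print(f"~99.7% of intervals carry between {m - 3*s:.0f} and {m + 3*s:.0f} packets")
```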
The MGF of the normal distribution is given by

M(t) = E[e^{tX}] = \int_{-\infty}^{\infty} e^{tx} \frac{1}{\sigma\sqrt{2\pi}} e^{-(x-\mu)^2 / 2\sigma^2} dx = e^{\mu t + \sigma^2 t^2 / 2}

where in the last step, after completing the square in the exponent, we recognize that the remaining integral is the area under a normal curve, which evaluates to 1. Note that the MGF of a normal variable with zero mean and a variance of 1 is therefore

M(t) = e^{t^2/2}
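A Monte Carlo sanity check (numpy; arbitrary μ, σ, and t) that the empirical E[e^{tX}] matches the closed form e^{μt + σ²t²/2}:

```python
import numpy as np

rng = np.random.default_rng(4)
mu, sigma, t = 1.0, 0.5, 0.7           # arbitrary parameters
x = rng.normal(mu, sigma, 1_000_000)

print(np.exp(t * x).mean())                    # empirical E[e^{tX}]
print(np.exp(mu * t + sigma**2 * t**2 / 2))    # closed-form MGF
```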
We can use the MGF of a normal distribution to prove some elementary facts about it.
- If X ~ N(μ,σ²), then a + bX ~ N(a + bμ, b²σ²), because the MGF of a + bX is

E[e^{t(a+bX)}] = e^{at} E[e^{(bt)X}] = e^{at} e^{\mu bt + \sigma^2 (bt)^2 / 2} = e^{(a+b\mu)t + (b^2\sigma^2) t^2 / 2}

which can be seen to be the MGF of a normally distributed random variable with mean a + bμ and variance b²σ².
- If X ~ N(μ,σ²), then Z = (X − μ)/σ ~ N(0,1). This is obtained trivially by substituting a = −μ/σ and b = 1/σ in the previous result. Z is called the standard normal variable.
- If X ~ N(μ₁,σ₁²) and Y ~ N(μ₂,σ₂²) and X and Y are independent, then X + Y ~ N(μ₁ + μ₂, σ₁² + σ₂²), because the MGF of their sum is the product of their individual MGFs:

e^{\mu_1 t + \sigma_1^2 t^2/2} \cdot e^{\mu_2 t + \sigma_2^2 t^2/2} = e^{(\mu_1+\mu_2)t + (\sigma_1^2+\sigma_2^2)t^2/2}

As a generalization, the sum of any number of independent normal variables is also normally distributed, with a mean equal to the sum of the individual means and a variance equal to the sum of the individual variances; the sketch after this list checks the two-variable case numerically.
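A simulation sketch (numpy; the parameter values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.normal(1.0, 2.0, 1_000_000)    # X ~ N(1, 4)
y = rng.normal(3.0, 1.0, 1_000_000)    # Y ~ N(3, 1), independent of X

s = x + y
print(s.mean(), s.var())               # ~4 and ~5, i.e., X + Y ~ N(4, 5)
```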
1.6.3. Exponential Distribution
A random variable X is exponentially distributed with parameter λ, where λ > 0, if its density function is given by

f(x) = \begin{cases} \lambda e^{-\lambda x} & x \geq 0 \\ 0 & x < 0 \end{cases}
Note that when x = 0, f(x) = λ (see Figure 1.5). The expected value of such a random variable is 1/λ and its variance is 1/λ². The exponential distribution is the continuous analog of the geometric distribution. Recall that the geometric distribution measures the number of trials until the first success. Correspondingly, the exponential distribution arises when we are trying to measure the duration of time before some event happens (i.e., achieves success). For instance, it is used to model the time between two consecutive packet arrivals on a link.
Figure 1.5. Exponentially distributed random variables with λ = {1, 0.5, 0.25}
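A quick check of these moments (numpy; note that numpy’s exponential sampler is parameterized by the scale 1/λ, and the rate chosen here is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(6)
lam = 0.5                                          # arbitrary rate
x = rng.exponential(scale=1/lam, size=1_000_000)   # numpy uses scale = 1/lambda

print(x.mean(), 1 / lam)               # sample mean vs. 1/lambda
print(x.var(), 1 / lam**2)             # sample variance vs. 1/lambda^2
```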
The cumulative distribution function of the exponential distribution, F(x), is given by

F(x) = 1 - e^{-\lambda x}, \quad x \geq 0
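The closed form can be compared against the empirical CDF of simulated samples (numpy; the rate and test points are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(7)
lam = 1.5                                          # arbitrary rate
x = rng.exponential(scale=1/lam, size=1_000_000)

for t in (0.5, 1.0, 2.0):
    print(np.mean(x <= t), 1 - np.exp(-lam * t))   # empirical vs. 1 - e^{-lambda t}
```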
An important property of the exponential distribution is that, like the geometric distribution, it is memoryless and, in fact, is the only memoryless continuous distribution. Intuitively, this means that the expected remaining time until the occurrence of an event with an exponentially distributed waiting time is independent of the time at which the observation is made. More precisely, P(X > s + t | X > s) = P(X > t) for all s, t. This follows directly from the CDF: P(X > s + t | X > s) = e^{−λ(s+t)}/e^{−λs} = e^{−λt} = P(X > t). From a geometric perspective, if we truncate the distribution to the left of any point on the positive X axis and then rescale the remaining distribution so that the area under the curve is 1, we will obtain the original distribution. The simulation sketch below illustrates this useful property.
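A simulation sketch of the memoryless property (numpy; the rate and the offsets s and t are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(8)
lam, s, t = 1.0, 0.8, 0.5              # arbitrary values
x = rng.exponential(scale=1/lam, size=2_000_000)

p_cond = np.mean(x > s + t) / np.mean(x > s)   # P(X > s+t | X > s)
print(p_cond, np.mean(x > t))                  # both ~ e^{-lambda t} ~ 0.6065
```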
1.6.4. Power-Law Distribution
A random variable described by its minimum value xmin and a scale parameter α > 1 is said to obey the power-law distribution if its density function is given by

f(x) = C x^{-\alpha}, \quad x \geq x_{min}

Typically, this function needs to be normalized for a given set of parameters to ensure that \int_{x_{min}}^{\infty} f(x)\,dx = 1, which yields C = (\alpha - 1) x_{min}^{\alpha - 1}.
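Power-law samples can be drawn by inverse-transform sampling, a standard technique not described in the text: since P(X > x) = (x/xmin)^{−(α−1)}, setting X = xmin·U^{−1/(α−1)} for U uniform on (0,1) yields the desired distribution. A sketch (numpy; parameters chosen to match Figure 1.6):

```python
import numpy as np

rng = np.random.default_rng(9)
xmin, alpha = 0.1, 2.3                 # parameters as in Figure 1.6
u = rng.random(1_000_000)
x = xmin * u ** (-1 / (alpha - 1))     # inverse-transform sample from f(x) = C x^{-alpha}

print(x.min() >= xmin)                 # support starts at xmin
# For alpha > 2 the mean is finite: E[X] = (alpha - 1)/(alpha - 2) * xmin.
# The heavy tail makes the sample mean converge slowly.
print(x.mean(), (alpha - 1) / (alpha - 2) * xmin)
```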
Note that f(x) decreases rapidly with x. However, the decline is not as rapid as with an exponential distribution (see Figure 1.6). This is why a power-law distribution is also called a heavy-tailed distribution. When plotted on a log-log scale, the graph of f(x) versus x shows a linear relationship with a slope of –α, which is often used to quickly identify a potential power-law distribution in a data set.
Figure 1.6. A typical power-law distribution with parameters xmin = 0.1 and α = 2.3 compared to an exponential distribution using a linear-linear (left) and a log-log (right) scale
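The log-log slope test can be applied to the simulated samples (numpy; the logarithmic binning here is a common practical choice, not from the text):

```python
import numpy as np

rng = np.random.default_rng(10)
xmin, alpha = 0.1, 2.3
x = xmin * rng.random(1_000_000) ** (-1 / (alpha - 1))   # power-law samples

# Histogram on logarithmically spaced bins, normalized to a density estimate.
bins = np.logspace(np.log10(xmin), np.log10(x.max()), 50)
hist, edges = np.histogram(x, bins=bins, density=True)
centers = np.sqrt(edges[:-1] * edges[1:])                # geometric bin centers
mask = hist > 0                                          # drop empty tail bins

slope, _ = np.polyfit(np.log10(centers[mask]), np.log10(hist[mask]), 1)
print(slope)                                             # approximately -alpha = -2.3
```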
Intuitively, if we have objects distributed according to an exponential or power law, a few “elephants” occur frequently and are common, whereas each of the many “mice” occurs relatively rarely. The elephants are responsible for most of the probability mass. From an engineering perspective, whenever we see such a distribution, it makes sense to build a system that deals well with the elephants, even at the expense of ignoring the mice. Two rules of thumb that reflect this are the 90/10 rule (90% of the output is derived from 10% of the input) and the dictum “optimize for the common case.”
When α ≤ 2, the expected value of the random variable is infinite. A system described by such a random variable is unstable (i.e., its value is unbounded). On the other hand, when α > 2, the tail probabilities fall rapidly enough that a power-law random variable can usually be well approximated by an exponential random variable.
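The instability for small α can be seen directly in simulation (numpy; α = 1.5 and xmin = 1 are arbitrary choices with α < 2): the sample mean keeps growing with the sample size instead of settling near a limit.

```python
import numpy as np

rng = np.random.default_rng(11)
xmin, alpha = 1.0, 1.5                 # alpha < 2: infinite expected value

for n in (10**3, 10**5, 10**7):
    x = xmin * rng.random(n) ** (-1 / (alpha - 1))   # inverse-transform samples
    print(n, x.mean())                 # grows with n rather than converging
```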
A widely studied example of a power-law distribution is the random variable that describes the number of users who visit one of a collection of Web sites on the Internet on any given day. Traces of Web site accesses almost always show that all but a microscopic fraction of Web sites get fewer than one visitor a day: Traffic is garnered mostly by a handful of well-known Web sites.