The Art of Computer Programming

By Donald E. Knuth
Oct 25, 2022

📄 Contents

␡

⎙ Print

< Back Page 8 of 11 Next >

This chapter is from the book 

Art of Computer Programming, Volume 4B, The: Combinatorial Algorithms

Learn More Buy

Mathematical Preliminaries Redux

Many parts of this book deal with discrete probabilities, namely with a finite or countably infinite set Ω of atomic events ω, each of which has a given probability Pr(ω), where

Click to view larger image

This set Ω, together with the function Pr, is called a “probability space.” For example, Ω might be the set of all ways to shuffle a pack of 52 playing cards, with Pr(ω)= 1/52! for every such arrangement.

An event is, intuitively, a proposition that can be either true or false with certain probability. It might, for instance, be the statement “the top card is an ace,” with probability 1/13. Formally, an event A is a subset of Ω, namely the set of all atomic events for which the corresponding proposition A is true; and

Click to view larger image

A random variable is a function that assigns a value to every atomic event. We typically use uppercase letters for random variables, and lowercase letters for the values that they might assume; thus, we might say that the probability of the event X = x is Pr(X = x) = ∑_ω∈Ω Pr(ω)[X(ω)= x]. In our playing card example, the top card T is a random variable, and we have Pr(T = 𝓠♠) = 1/52. (Sometimes, as here, the lowercase-letter convention is ignored.)

The random variables X₁, ... , X_k are said to be independent if

Click to view larger image

for all (x₁,...,x_k). For example, if F and S denote the face value and suit of the top card T, clearly F and S are independent. Hence in particular we have Pr(T = 𝓠♠) = Pr(F = 𝓠) Pr(S = ♠). But T is not independent of the bottom card, B; indeed, we have Pr(T = t and B = b) ≠ 1/52² for any cards t and b.

A system of n random variables is called k-wise independent if no k of its variables are dependent. With pairwise (2-wise) independence, for example, we could have variable X independent of Y, variable Y independent of Z, and variable Z independent of X; yet all three variables needn’t be independent (see exercise 6). Similarly, k-wise independence does not imply (k + 1)-wise independence. But (k + 1)-wise independence does imply k-wise independence.

The conditional probability of an event A, given an event B, is

Click to view larger image

Statements that are almost sure, or even quite sure.

In fact, we’ve proved more: We’ve shown that p_n is superpolynomially small, namely that

Click to view larger image

When the probability of an event is superpolynomially small, we say that A_n holds “quite surely,” and abbreviate that by ‘q.s.’. In other words, we’ve proved

Click to view larger image

We’ve seen that the combination of any two a.s. events is a.s.; hence the combination of any finite number of a.s. events is also a.s. That’s nice, but q.s. events are even nicer: The combination of any polynomial number of q.s. events is also q.s. For example, if n⁴ different people each toss n coins, it is quite sure that every one of them, without exception, will obtain between .49n and .51n heads!

(When making such asymptotic statements we ignore the inconvenient truth that our bound on the failure of the assertion, 2n⁴e^−.0001n in this case, becomes negligible only when n is greater than 700,000 or so.)

Exercises

1. [M21] (Nontransitive dice.) Suppose three biased dice with the respective faces

are rolled independently at random.

Show that Pr(A>B) = Pr(B>C) = Pr(C>A) = 5/9.
Find dice with Pr(A>B), Pr(B>C), Pr(C>A) all greater than 5/9.
If Fibonacci dice have F_m faces instead of just six, show that we could have

Click to view larger image

2. [M32] Prove that the previous exercise is asymptotically optimum, in the sense that min(Pr(A>B), Pr(B>C), Pr(C>A)) < 1/φ, regardless of the number of faces.

3. [22] (Lake Wobegon dice.) Continuing the previous exercises, find three dice such that . Each face of each die should be or or or or or .

4. [22] (Nontransitive Bingo.) Each player in the game of NanoBingo has a card containing four numbers from the set S = {1, 2, 3, 4, 5, 6}, arranged in two rows. An announcer calls out the elements of S, in random order; the first player whose card has a horizontal row with both numbers called shouts “Bingo!” and wins. (Or victory is shared when there are multiple Bingoes.) For example, consider the four cards

Click to view larger image

If the announcer calls “6, 2, 5, 1” when A plays against B, then A wins; but the sequence “1, 3, 2” would yield a tie. One can show that Pr(A beats B) = , Pr(B beats A) = , and Pr(A and B tie) = . Determine the probabilities of all possible outcomes when there are (a) two (b) three (c) four different players using those cards.

▶5. [HM22] (T. M. Cover, 1989.) Common wisdom asserts that longer games favor the stronger player, because they provide more evidence of the relative skills.

However, consider an n-round game in which Alice scores A₁ + ‧ ‧ ‧ + A_n points while Bob scores B₁+ ‧ ‧ ‧ + B_n points. Here each of A₁, . . . , A_n are independent random variables with the same distribution, all representing Alice’s strength; similarly, each of B₁, . . . , B_n independently represent Bob’s strength (and are independent of the A’s). Suppose Alice wins with probability P_n.

Show that it’s possible to have P₁ = .99 but P₁₀₀₀ < .0001.
Let m_k = 2^k³, n_k = 2^{k^2+k}, and q_k = 2^−k²/D, where D = 2⁻⁰ + 2⁻¹ + 2⁻⁴ + 2⁻⁹ + ‧ ‧ ‧ ≈ 1.56447. Suppose the random variable A takes the values (m₀, m₂, m₄, . . . ) with probabilities (q₀, q₂, q₄, . . . ); otherwise A = 0. Independently, the random variable B takes the values (m₁, m₃, m₅, . . .) with probabilities (q₁, q₃, q₅, . . . ); otherwise B = 0. What are Pr(A > B), Pr(A < B), and Pr(A = B)?
With the distributions in (b), prove that P_{n_k} → [k even] as k→∞.

▶6. [M22] Consider random Boolean (or binary) vectors X₁ . . . X_n, where n ≥ 2, with the following distribution: The vector x₁ . . . x_n occurs with probability 1/(n − 1)s² if x₁ + ‧ ‧ ‧ + x_n = 2, with probability (n − 2)/(2n − 2) if x₁ + ‧ ‧ ‧ + x_n = 0, and with probability 0 otherwise. Show that the components are pairwise independent (that is, X_i is independent of X_j when i ≠ j); but they are not k-wise independent for k > 2.

Also find a joint distribution, depending only on νx = x₁ +‧ ‧ ‧+ x_n, that is k-wise independent for k = 2 and k = 3 but not k = 4.

7. [M30] (Ernst Schulte-Geers, 2012.) Generalizing exercise 6, construct a νx-based distribution that has k-wise but not (k + 1)-wise independence, given k ≥ 1.

▶8. [M20] Suppose the Boolean vector x₁ + ·· ·+ x_n occurs with probability (2+(−1)^νx)/2ⁿ⁺¹, where νx = x₁ + ‧ ‧ ‧ + x_n. For what k is this distribution k-wise independent?

9. [M20] Find a distribution of Boolean vectors x₁ . . . x_n such that any two components are dependent; yet if we know the value of any x_j, the remaining components are (n−1)-wise independent. Hint: The answer is so simple, you might feel hornswoggled.

▶10. [M21] Let Y₁, . . . , Y_m be independent and uniformly distributed elements of {0, 1, . . . , p − 1}, where p is prime. Also let X_j = (j_m + Y₁j^m−1 + ‧ ‧ ‧ + Y_m) mod p, for 1 ≤ j ≤ n. For what k are the X’s k-wise independent?

11. [M20] If X₁, . . . , X_2n are independent random variables with the same discrete distribution, and if α is any real number whatsoever, prove that