Mathematical Preliminaries Redux
Many parts of this book deal with discrete probabilities, namely with a finite or countably infinite set Ω of atomic events ω, each of which has a given probability Pr(ω), where
This set Ω, together with the function Pr, is called a “probability space.” For example, Ω might be the set of all ways to shuffle a pack of 52 playing cards, with Pr(ω)= 1/52! for every such arrangement.
An event is, intuitively, a proposition that can be either true or false with certain probability. It might, for instance, be the statement “the top card is an ace,” with probability 1/13. Formally, an event A is a subset of Ω, namely the set of all atomic events for which the corresponding proposition A is true; and
A random variable is a function that assigns a value to every atomic event. We typically use uppercase letters for random variables, and lowercase letters for the values that they might assume; thus, we might say that the probability of the event X = x is Pr(X = x) = ∑ω∈Ω Pr(ω)[X(ω)= x]. In our playing card example, the top card T is a random variable, and we have Pr(T = 𝓠♠) = 1/52. (Sometimes, as here, the lowercase-letter convention is ignored.)
The random variables X1, ... , Xk are said to be independent if
for all (x1,...,xk). For example, if F and S denote the face value and suit of the top card T, clearly F and S are independent. Hence in particular we have Pr(T = 𝓠♠) = Pr(F = 𝓠) Pr(S = ♠). But T is not independent of the bottom card, B; indeed, we have Pr(T = t and B = b) ≠ 1/522 for any cards t and b.
A system of n random variables is called k-wise independent if no k of its variables are dependent. With pairwise (2-wise) independence, for example, we could have variable X independent of Y, variable Y independent of Z, and variable Z independent of X; yet all three variables needn’t be independent (see exercise 6). Similarly, k-wise independence does not imply (k + 1)-wise independence. But (k + 1)-wise independence does imply k-wise independence.
The conditional probability of an event A, given an event B, is
Statements that are almost sure, or even quite sure.
In fact, we’ve proved more: We’ve shown that pn is superpolynomially small, namely that
When the probability of an event is superpolynomially small, we say that An holds “quite surely,” and abbreviate that by ‘q.s.’. In other words, we’ve proved
We’ve seen that the combination of any two a.s. events is a.s.; hence the combination of any finite number of a.s. events is also a.s. That’s nice, but q.s. events are even nicer: The combination of any polynomial number of q.s. events is also q.s. For example, if n4 different people each toss n coins, it is quite sure that every one of them, without exception, will obtain between .49n and .51n heads!
(When making such asymptotic statements we ignore the inconvenient truth that our bound on the failure of the assertion, 2n4e−.0001n in this case, becomes negligible only when n is greater than 700,000 or so.)
Exercises
1. [M21] (Nontransitive dice.) Suppose three biased dice with the respective faces
are rolled independently at random.
Show that Pr(A>B) = Pr(B>C) = Pr(C>A) = 5/9.
Find dice with Pr(A>B), Pr(B>C), Pr(C>A) all greater than 5/9.
If Fibonacci dice have Fm faces instead of just six, show that we could have
2. [M32] Prove that the previous exercise is asymptotically optimum, in the sense that min(Pr(A>B), Pr(B>C), Pr(C>A)) < 1/φ, regardless of the number of faces.
3. [22] (Lake Wobegon dice.) Continuing the previous exercises, find three dice such that . Each face of each die should be or or or or or .
4. [22] (Nontransitive Bingo.) Each player in the game of NanoBingo has a card containing four numbers from the set S = {1, 2, 3, 4, 5, 6}, arranged in two rows. An announcer calls out the elements of S, in random order; the first player whose card has a horizontal row with both numbers called shouts “Bingo!” and wins. (Or victory is shared when there are multiple Bingoes.) For example, consider the four cards
If the announcer calls “6, 2, 5, 1” when A plays against B, then A wins; but the sequence “1, 3, 2” would yield a tie. One can show that Pr(A beats B) = , Pr(B beats A) = , and Pr(A and B tie) = . Determine the probabilities of all possible outcomes when there are (a) two (b) three (c) four different players using those cards.
▶5. [HM22] (T. M. Cover, 1989.) Common wisdom asserts that longer games favor the stronger player, because they provide more evidence of the relative skills.
However, consider an n-round game in which Alice scores A1 + ‧ ‧ ‧ + An points while Bob scores B1+ ‧ ‧ ‧ + Bn points. Here each of A1, . . . , An are independent random variables with the same distribution, all representing Alice’s strength; similarly, each of B1, . . . , Bn independently represent Bob’s strength (and are independent of the A’s). Suppose Alice wins with probability Pn.
Show that it’s possible to have P1 = .99 but P1000 < .0001.
Let mk = 2k3, nk = 2k2+k, and qk = 2−k2/D, where D = 2−0 + 2−1 + 2−4 + 2−9 + ‧ ‧ ‧ ≈ 1.56447. Suppose the random variable A takes the values (m0, m2, m4, . . . ) with probabilities (q0, q2, q4, . . . ); otherwise A = 0. Independently, the random variable B takes the values (m1, m3, m5, . . .) with probabilities (q1, q3, q5, . . . ); otherwise B = 0. What are Pr(A > B), Pr(A < B), and Pr(A = B)?
With the distributions in (b), prove that Pnk → [k even] as k→∞.
▶6. [M22] Consider random Boolean (or binary) vectors X1 . . . Xn, where n ≥ 2, with the following distribution: The vector x1 . . . xn occurs with probability 1/(n − 1)s2 if x1 + ‧ ‧ ‧ + xn = 2, with probability (n − 2)/(2n − 2) if x1 + ‧ ‧ ‧ + xn = 0, and with probability 0 otherwise. Show that the components are pairwise independent (that is, Xi is independent of Xj when i ≠ j); but they are not k-wise independent for k > 2.
Also find a joint distribution, depending only on νx = x1 +‧ ‧ ‧+ xn, that is k-wise independent for k = 2 and k = 3 but not k = 4.
7. [M30] (Ernst Schulte-Geers, 2012.) Generalizing exercise 6, construct a νx-based distribution that has k-wise but not (k + 1)-wise independence, given k ≥ 1.
▶8. [M20] Suppose the Boolean vector x1 + ·· ·+ xn occurs with probability (2+(−1)νx)/2n+1, where νx = x1 + ‧ ‧ ‧ + xn. For what k is this distribution k-wise independent?
9. [M20] Find a distribution of Boolean vectors x1 . . . xn such that any two components are dependent; yet if we know the value of any xj, the remaining components are (n−1)-wise independent. Hint: The answer is so simple, you might feel hornswoggled.
▶10. [M21] Let Y1, . . . , Ym be independent and uniformly distributed elements of {0, 1, . . . , p − 1}, where p is prime. Also let Xj = (jm + Y1jm−1 + ‧ ‧ ‧ + Ym) mod p, for 1 ≤ j ≤ n. For what k are the X’s k-wise independent?
11. [M20] If X1, . . . , X2n are independent random variables with the same discrete distribution, and if α is any real number whatsoever, prove that
12. [21] Which of the following four statements are equivalent to the statement that Pr(A|B) > Pr(A)? (i) Pr(B |A) > Pr(B); (ii) ; (iii) ; (iv) .
13. [15] True or false: Pr(A|C) > Pr(A) if Pr(A|B) > Pr(A) and Pr(B|C) > Pr(B).