The Universe of Discourse
           
Thu, 22 Mar 2007

My favorite math problem

Introduction

According to legend, the imaginary unit i = √-1 was invented by mathematicians because the equation x2 + 1 = 0 had no solution. So they introduced a new number i, defined so that i2 + 1 = 0, and investigated the consequences. Like most legends, this one has very little to do with the real history of the thing it purports to describe. But let's pretend that it's true for a while.

Descartes' theorem

If P(x) is some polynomial, and z is some number such that P(z) = 0, then we say that z is a zero of P or a root of P. Descartes' theorem (yes, that Descartes) says that if z is a zero of P, then P can be factored into P = (x-z)P', where P' is a polynomial of lower degree, and has all the same roots as P, except without z. For example, consider P = x3 - 6x2 + 11x - 6. One of the zeroes of P is 1, so Descartes theorem says that we can write P in the form (x-1)P' for some second-degree polynomial P', and if there are any other zeroes of P, they will also be zeroes of P'. And in fact we can; x3 - 6x2 + 11x - 6 = (x-1)(x2 - 5x + 6). The other zeroes of the original polynomial P are 2 and 3, and both of these (but not 1) are also zeroes of x2 - 5x + 6.

We can repeat this process. If a1, a2, ... an are the zeroes of some nth-degree polynomial P, we can factor P as (x-a1)(x-a2)...(x-an).

The only possibly tricky thing here is that some of the ai might be the same. x2 - 2x + 1 has a zero of 1, but twice, so it factors as (x-1)(x-1). x3 - 4x2 + 5x - 2 has a zero of 1, but twice, so it factors as (x-1)(x2 - 3x + 2), where (x2 - 3x + 2) also has a zero of 1, but this time only once.

Descartes' theorem has a number of important corollaries: an nth-degree polynomial has no more than n zeroes. A polynomial that has a zero can be factored. When coupled with the so-called fundamental theorem of algebra, which says that every polynomial has a zero over the complex numbers, it implies that every nth-degree polynomial can be factored into a product of n factors of the form (x-z) where z is complex, or into a product of first- and second-degree polynomials with real coefficients.

The square roots of -1

Suppose we want to know what the square roots of 1 are. We want numbers x such that x2 = 1, or, equivalently, such that x2 - 1 = 0. x2 - 1 factors into (x-1)(x+1), so the zeroes are ±1, and these are the square roots of 1. On the other hand, if we want the square roots of 0, we need the solutions of x2 = 0, and the corresponding calculation calls for us to factor x2, which is just (x-0)(x-0), So 0 has the same square root twice; it is the only number without two distinct square roots.

If we want to find the square roots of -1, we need to factor x2 + 1. There are no real numbers that do this, so we call the polynomial irreducible. Instead, we make up some square roots for it, which we will call i and j. We want (x - i)(x - j) = x2 + 1. Multiplying through gives x2 - (i + j)x + ij = x2 + 1. So we have some additional relationships between i and j: ij = 1, and i + j = 0.

Because i + j = 0, we always dispense with j and just call it -i. But it's essential to realize that neither one is more important than the other. The two square roots are identical twins, different individuals that are completely indistinguishable from each other. Our choice about which one was primary was completely arbitrary. That doesn't even do justice to the arbitrariness of the choice: i and j are so indistinguishable that we still don't know which one got the primary notation. Every property enjoyed by i is also enjoyed by j. i3 = j, but j3 = i. eiπ = -1, but so too ejπ = -1. And so on.

In fact, I once saw the branch of mathematics known as Galois theory described as being essentially the observation that i and -i were indistinguishable, plus the observation that there are no other indistinguishable pairs of complex numbers.

Anyway, all this is to set up the main point of the article, which is to describe my favorite math problem of all time. I call it the "train problem" because I work on it every time I'm stuck on a long train trip.

We are going to investigate irreducible polynomials; that is, polynomials that do not factor into products of simpler polynomials. Descartes' theorem implies that irreducible polynomials are zeroless, so we are going to start by looking for polynomials without zeroes. But we are not going to do it in the usual world of the real numbers. The answer in that world is simple: x2 + 1 is irreducible, and once you make up a zero for this polynomial and insert it into the universe, you are done; the fundamental theorem of algebra guarantees that there are no more irreducible polynomials. So instead of the reals, we're going to do this in a system known as Z2.

Z2

Z2 is much simpler than the real numbers. Instead of lots and lots of numbers, Z2 has only two. (Hence the name: The 2 means 2 and the Z means "numbers". Z is short for Zahlen, which is German.) Z2 has only 0 and 1. Addition and multiplication are as you expect, except that instead of 1 + 1 = 2, we have 1 + 1 = 0.

+01
001
110
×01
0 00
1 01

The 1+1=0 oddity is the one and only difference between Z2 and ordinary algebra. All the other laws remain the same. To be explicit, I will list them below:

  • a + 0 = a
  • a + b = b + a
  • (a + b) + c = a + (b + c)
  • a × 0 = 0
  • a × 1 = a
  • a × b = b × a
  • (a × b) × c = a × (b × c)
  • a × (b + c) = a × c + b × c

Algebra in Z2

But the 1+1=0 oddity does have some consequences. For example, a + a = a·(1 + 1) = a·0 = 0 for all a. This will continue to be true for all a, even if we extend Z2 to include more elements than just 0 and 1, just as x + x = 2x is true not only in the real numbers but also in their extension, the complex numbers.

Because a + a = 0, we have a = -a for all a, and this means that in Z2, addition and subtraction are the same thing. If you don't like subtraction—and really, who does?—then Z2 is the place for you.

As a consequence of this, there are some convenient algebraic manipulations. For example, suppose we have a + b = c and we want to solve for a. Normally, we would subtract b from both sides. But in Z2, we can add b to both sides, getting a + b + b = c + b. Then the b + b on the left cancels (because b + b = 0) leaving a = c + b. From now on, I'll call this the addition law.

If you suffered in high school algebra because you wanted (a + b)2 to be equal to a2 + b2, then suffer no more, because in Z2 they are equal. Here's the proof:

(a + b)2= 
(a + b)(a + b)= 
a(a + b) + b(a + b)= 
(a2 + ab) + (ba + b2)= 
a2 + ab + ab + b2= 
a2 + b2  

Because the ab + ab cancels in the last step.

  This happy fact, which I'll call the square law, also holds true for sums of more than two terms: (a + b + ... + z)2 = a2 + b2 + ... + z2. But it does not hold true for cubes; (a + b)3 is a3 + a2b + ab2 + b3, not a3 + b3.

Irreducible polynomials in Z2

Now let's try to find a polynomial that is irreducible in Z2. The first degree polynomials are x and x+1, and both of these have zeroes: 0 and 1, respectively.

There are four second-degree polynomials:

  1. x2
  2. x2 + 1
  3. x2 + x
  4. x2 + x + 1
If a second-degree polynomial is reducible, it must factor into a product of two first-degree polynomials, and there are three such products: x·x, (x+1)(x+1), and x(x+1). These are, respectively, polynomials 1, 2, and 3 in the previous display. The polynomials have zeroes as indicated here:

PolynomialZeroesFactorization
x2 0 (twice) x2
x2 + 11 (twice) (x+1)2
x2 + x 0, 1 x(x+1)
x2 + x + 1 none  

Note in particular that, unlike in the real numbers, x2 + 1 is not irreducible in Z2. 1 is a zero of this polynomial, which, because of the square law, factors as (x+1)(x+1). But we have found an irreducible polynomial, x2 + x + 1. The other three second-degree polynomials all can be factored, but x2 + x + 1 cannot.

Like the fictitious mathematicians who invented i, we will rectify this problem, by introducing into Z2 a new number, called b, which has the property that b2 + b + 1 = 0. What are the consequences of this?

The first thing to notice is that we haven't added only one new number to Z2. When we add i to the real numbers, we also get a lot of other stuff like i+1 and 3i-17 coming along for the ride. Here we can't add just b. We must also add b+1.

The next thing we need to do is figure out how to add and multiply with b. The addition and multiplication tables are extended as follows:

+01bb+1
001bb+1
110b+1b
bbb+101
b+1b+1b10
×01bb+1
00000
101bb+1
b0bb+11
b+10b+11b

The only items here that might require explanation are the entries in the lower-right-hand corner of the multiplication table. Why is b×b = b+1? Well, b2 + b + 1 = 0, by definition, so b2 = b + 1 by the addition law. Why is b × (b+1) = 1? Well, b × (b+1) = b2 + b = (b + 1) + b = 1 because the two b's cancel. Perhaps you can work out (b+1) × (b+1) for yourself.

From here there are a number of different things we could investigate. But what I think is most crucial is to discover the value of b3. Let's calculate that. b3 = b·b2. b2, we know, is b+1. So b3 = b(b+1) = 1. Thus b is a cube root of 1.

People with some experience with this sort of study will not be surprised by this. The cube roots of 1 are precisely the zeroes of the polynomial x3 + 1 = 0. 1 is obviously a cube root of 1, and so Descartes' theorem says that x3 + 1 must factor into (x+1)P, where P is some second-degree polynomial whose zeroes are the other two cube roots. P can't be divisible by x, because 0 is not a cube root of 1, so it must be either (x+1)(x+1), or something else. If it were the former, then we would have x3 + 1 = (x+1)3, but as I pointed out before, we don't. So it must be something else. The only candidate is x2 + x + 1. Thus the other two cube roots of 1 must be zeroes of x2 + x + 1.

The next question we must answer without delay is: b is one of the zeroes of x2 + x + 1; what is the other? There is really only one possibility: it must be b+1, because that is the only other number we have. But another way to see this is to observe that if p is a cube root of 1, then p2 must also be a cube root of 1:

b3=1
(b3)2=12
b6=1
(b2)3=1

So we now know the three zeroes of x3 + 1: they are 1, b, and b2. Since x3 + 1 factors into (x+1)(x2+x+1), the two zeroes of x2+x+1 are b and b2. As we noted before, b2 = b+1.

This means that x2 + x + 1 should factor into (x+b)(x+b+1). Multiplying through we get x2 + bx + (b+1)x + b(b+1) = x2 + x + 1, as hoped.

Actually, the fact that both b and b2 are zeroes of x2+x+1 is not a coincidence. Z2 has the delightful property that if z is a zero of any polynomial whose coefficients are all 0's and 1's, then z2 is also a zero of that polynomial. I'll call this Theorem 1. Theorem 1 is not hard to prove; it follows from the square law. Feel free to skip the demonstration which follows in the rest of the paragraph. Suppose the polynomial is P(x) = anxn + an-1xn-1 + ... + a1x + a0, and z is a zero of P, so that P(z) = 0. Then (P(z))2 = 0 also, so (anzn + an-1zn-1 + ... + a1z + a0)2 = 0. By the square law, we have an2(zn)2 + an-12(zn-1)2 + ... + a12z2 + a02 = 0. Since all the ai are either 0 or 1, we have ai2 = ai, so an(zn)2 + an-1(zn-1)2 + ... + a1z2 + a0 = 0. And finally, (zk)2 = (z2)k, so we get an(z2)n + an-1(z2)n-1 + ... + a1z2 + a0 = P(z2) = 0, which is what we wanted to prove.

Now you might be worried about something: the fact that b is a zero of x2 + x + 1 implies that b2 is also a zero; doesn't the fact that b2 is a zero imply that b4 is also a zero? And then also b8 and b16 and so forth? Then we would be in trouble, because x2 + x + 1 is a second-degree polynomial, and, by Descartes' theorem, cannot have more than two zeroes. But no, there is no problem, because b3 = 1, so b4 = b·b3 = b·1 = b, and similarly b8 = b2, so we are getting the same two zeroes over and over.

Symmetry again

Just as i and -i are completely indistinguishable from the point of view of the real numbers, so too are b and b+1 indistinguishable from the point of view of Z2. One of them has been given a simpler notation than the other, but there is no way to tell which one we gave the simpler notation to! The one and only defining property of b is that b2 + b + 1 = 0; since b+1 also shares this property, it also shares every other property of b. b and b+1 are both cube roots of 1. Each, added to the other gives 1; each, added to 1 gives the other. Each, when multiplied by itself gives the other; each when multiplied by the other gives 1.

Third-degree polynomials

We have pretty much exhausted the interesting properties of b and its identical twin b2, so let's move on to irreducible third-degree polynomials. If a third-degree polynomial is reducible, then it is either a product of three linear factors, or of a linear factor and an irreducible second-degree polynomial. If the former, it must be one of the four products x3, x2(x+1), x(x+1)2, or (x+1)3. If the latter, it must be one of x(x2+x+1) or (x+1)(x2+x+1). This makes 6 reducible polynomials. Since there are 8 third-degree polynomials in all, the other two must be irreducible:

PolynomialZeroesFactorization
x3    0, 0, 0 x3
x3   + 1 1, b, b+1 (x+1)(x2+x+1)
x3 + x  0, 1, 1 x(x+1)2
x3 + x + 1 none  
x3+ x2   0, 0, 1 x2(x+1)
x3+ x2  + 1 none  
x3+ x2+ x  0, b, b+1 x(x2+x+1)
x3+ x2+ x + 1 1, 1, 1 (x+1)3

There are two irreducible third-degree polynomials. Let's focus on x3 + x + 1, since it seems a little simpler. As before, we will introduce a new number, c, which has the property c3 + c + 1 = 0. By the addition law, this implies that c3 = c + 1. Using this, we can make a table of powers of c, which will be useful later:

c1 =  c 
c2 = c2  
c3 =  c +1
c4 = c2 +c 
c5 = c2 +c +1
c6 = c2 + 1
c7 =   1

To calculate the entries in this table, just multiply each line by c to get the line below. For example, once we know that c5 = c2 + c + 1, we multiply by c to get c6 = c3 + c2 + c. But c3 = c + 1 by definition, so c6 = (c + 1) + c2 + c = c2 + 1. Once we get as far as c7, we find that c is a 7th root of 1. Analogous to the way b and b2 were both cube roots of 1, the other 7th roots of 1 are c2, c3, c4, c5, c6, and 1. For example, c5 is a 7th root of 1 because (c5)7 = c35 = (c7)5 = 15 = 1.

We've decided that c is a zero of x3 + x + 1. What are the other two zeroes? By theorem 1, they must be c2 and (c2)2 = c4. Wait, what about c8? Isn't that a zero also, by the same theorem? Oh, but c8 = c·c7 = c, so it's really the same zero again.

What about c3, c5, and c6? Those are the zeroes of the other irreducible third-degree polynomial, x3 + x2 + 1. For example, (c3)3 + (c3)2 + 1 = c9 + c6 + 1 = c2 + (c2+1) + 1 = 0.

So when we take Z2 and adjoin a new number that is a solution of the previously unsolvable equation x3 + x + 1 = 0, we're actually forced to add six new numbers; the resulting system has 8, and includes three different solutions to both irreducible third-degree polynomials, and a full complement of 7th roots of 1. Since the zeroes of x3 + x + 1 and x3 + x2 + 1 are all 7th roots of 1, this tells us that x7 + 1 factors as (x+1)(x3 + x + 1)(x3 + x2 + 1), which might not have been obvious.

Onward

I said this was my favorite math problem, but I didn't say what the question was. But by this time we have dozens of questions. For example:

  • We've seen cube roots and seventh roots of 1. Where did the fifth roots go?

    (The square theorem ensures that even-order roots are uninteresting. The fourth roots of 1, for example, are just 1, 1, 1, and 1. The sixth roots are the same as the cube roots, but repeated: 1, 1, b, b, b2, and b2.)

    The fifth roots, it turns out, appear later, as the four zeroes of the polynomial x4 + x3 + x2 + x + 1. These four, and 1, are five of the 15 fifteenth roots of 1. Eight others appear as the zeroes of the other two irreducible fourth-degree polynomials, and the other two fifteenth roots are b and b2. If we let d be a zero of one of the other irreducible fourth-degree polynomials, say x4 + x + 1, then the zeroes of this polynomial are d, d2, d4, and d8, and the zeroes of the other are d7, d11, d13, and d14. So we have an interesting situation. We can extend Z2 by adjoining the zeroes of x3 + 1, to get a larger system with 4 elements instead of only 2. And we can extend Z2 by adjoining the zeroes of x7 + 1, to get a larger system with 8 elements instead of only 2. But we can't do the corresponding thing with x5 + 1, because the fifth roots of 1 don't form a closed system. The sum of any two of the seventh roots of 1 was another seventh root of 1; for example, c3 + c4 = c6 and c + c2 = c4. But the corresponding fact about fifth roots is false. Suppose, for example, that y5 = 1. We'd like for y + y2 to be another fifth root of 1, but it isn't; it actually turns out that (y + y2)5 = b. For which n do the nth roots of 1 form a closed system?

  • What do we get if we extend Z2 with both b and c? What's bc? What's b + c? (Spoiler: b + c turns out to be zero of x6 + x5 + x3 + x2 + 1, and is a 63rd root of 1, so we may as well call it f. bc = f48.)

    The degree of a number is the degree of the simplest polynomial of which it is a zero. What's the degree of b + c? In general, if p and q have degrees d(p) and d(q), respectively, what can we say about the degrees of pq and p + q?

  • Since there is an nth root of 1 for every odd n, that implies that for all odd n, there is some k such that n divides 2k - 1. How are n and k related?

  • For n = 2, 3, and 4, the polynomial xn + x + 1 was irreducible. Is this true in general? (No: x5 + x + 1 = (x3 + x2 + 1)(x2 + x + 1).) When is it true? Is there a good way to find irreducible polynomials? How many irreducible polynomials are there of degree n? (The sequence starts out 0, 0, 1, 2, 3, 6, 9, 18, 30, 56, 99,...)

  • The Galois theorem says that any finite field that contains Z2 must have 2n elements, and its addition component must be isomorphic to a direct product of n copies of Z2. Extending Z2 with c, for example, gives us the field GF(8), obtained as the direct product of {0, 1}, {0, c}, and {0, c2}. The addition is very simple, but the multiplication, viewed in this light, is rather difficult, since it depends on knowing the secret identity c · c2 = c + 1.

    Identify the the element a0 + a1c + a2c2 with the number a0 + 2a1 + 4a2, and let the addition operation be exclusive-or. What happens to the multiplication operation?

  • Consider a linear-feedback shift register:

            # Arguments: N, a positive integer
            #            0 <= K <= 2**N-1
            MAXBIT = 2**N
            c = 2
            until c == 1:
              c *= 2
              if (c => MAXBIT):
                c &= ~MAXBIT
                c ^= K
              print c
    
    (Except that this function runs backward relative to the way that LFSRs usually work.)

    Does this terminate for all choices of N and K? (Yes.) For any given N, the maximum number of iterations of the until loop is 2N - 1. For which choices of K is this maximum achieved?

  • Consider irreducible sixth-degree polynomials. We should expect that adjoining a zero of one of these, say f, will expand Z2 to contain 64 elements: 0, 1, f, f2, ..., f62, and f63 = 1. In fact, this is the case.

    But of these 62 new elements, some are familiar. Specifically, f9 = c, and f21 = b. Of the 62 powers of f, 8 are numbers we have seen before, leaving only 54 new numbers. Each irreducible 6th-degree polynomial has 6 of these 54 as its zeroes, so there are 9 such polynomials. What does this calculation look like in general?

  • c1, c2, and c4 are zeroes of x3 + x + 1, one of the two irreducible third-degree polynomials. This means that x3 + x + 1 = (x + c1)(x + c2)(x + c4). Multiplying out the right-hand side, we get: x3 + (c + c2 + c4)x2 + (c3 + c5 + c6)x + 1. Equating coefficients, we have c + c2 + c4 = 0 and c3 + c5 + c6 = 1. c3, c5, and c6 are the zeroes of the other irreducible third-degree polynomial. Coincidence? (No.)

  • The irreducible polynomials appear to come in symmetric pairs: if anxn + an-1xn-1 + ... + a1x + a0 is irreducible, then so is a0xn + a1xn-1 + ... + an-1x + an. Why?

  • c1, c2, and c4 are the zeroes of one of the irreducible third-degree polynomials. 1, 2, and 4 are all the three-bit binary numbers that have exactly one 1 bit. c3, c5, and c6 are the zeroes the other irreducible third-degree polynomial. 3, 5, and 6 are all the three-bit binary numbers that have exactly one 0 bit. Coincidence? (No; theorem 1 is closely related here.)

    Now consider the analogous case for fourth-degree polynomials. The 4-bit numbers that have one 1 bit are 0001, 0010, 0100, and 1000 (1, 2, 4, and 8) which are all zeroes of one of the irreducible 4th-degree polynomials. The 4-bit numbers that have one 0 bit are 1110, 1101, 1011, and 0111 (7, 11, 13, and 14) which are all zeroes of another of the irreducible 4th-degree polynomials.

    The 4-bit numbers with two 1 bits and two 0 bits fall into two groups. In one group we have the numbers where similar bits are adjacent: 0011, 0110, 1100, and 1001, or 3, 6, 9, and 12. These are the zeroes of the third irreducible 4th-degree polynomial. The remaining 4-bit numbers are 0101 and 1010, which correspond to b and b2.

    There is, therefore, a close relationship between the structure an extension of Z2 and the so-called "necklace patterns" of bits, which are patterns that imagine that the bits are joined into a circle with no distinguishable starting point.

    The exceptional values (b and b2) corresponded to necklace patterns that did not possess the full n-fold symmetry. This occurs for composite values of n. But we might also expect to find exceptions at n = 11, since then 2n-1 = 2047 is composite. However, 2047 is exceptional in another way: 2047 = 23 · 89, and neither 23 nor 89 is a divisor of 2n-1 for any smaller value of n. Can we prove from this that when p is prime and 2p-1 is not, the divisors of 2p-1 do not divide 2q-1 for any q < p?

  • Since ci are the seventh roots of 1, we have x7 + 1 = (x + 1)(x + c)(x + c2)...(x + c6). After dividing out the trivial factor of x+1, we are left with:

    x6 + x5 + ... + 1 = (x + c)(x + c2)...(x + c6)

    Multiplying through on the right side and equating coefficients gives:

    c + c2 + ... + c6=1
    cc2 + cc3 + ... + c5c6=1
    cc2c3 + cc2c4 + ... + c4c5c6=1
    ...=...
    cc2c3c4c5c6=1

    What, if anything, can one make of this?

The reason this problem works so well as a long-train-trip problem is that it can be approached from many different directions and at many different levels of difficulty. If I am tired and woozy, I can still enumerate fifth-degree polynomials looking for the irreducible ones, calculate tables of values of en, and ponder the interrelationships of the exponents until I fall asleep. When I wake up again, rested, I can consider the deep relationships with the Galois theorem, with algebra, with random number generation, and with cryptography.

And then, when I get tired of Z2, I can start all over with Z3.


[Other articles in category /math] permanent link

Sun, 11 Mar 2007

Bernoulli processes
A family has four children. Assume that the sexes of the four children are independent, and that boys and girls are equiprobable. What's the most likely distribution of boys and girls?

Well,it depends how you count. Are there three possibilities or five?

All four the same
Three the same, one different
Two-and-two
Four boys, no girls
Three boys, one girl
Two boys, two girls
One boy, three girls
No boys, four girls
If we group outcomes into five categories, as in the pink division on the right, the most likely distribution is two-and-two, as you would probably guess:

BoysGirlsProbability
040.0625
130.25
220.375
310.25
400.0625

This distribution is depicted in the graph at right. Individually, (3, 1) and (1, 3) are less likely than (2, 2). But "three-and-one" includes both (1, 3) and (3, 1), whereas "two-and-two" includes only (2, 2). So if you group outcomes into three categories, as in the green division above left, "three-and-one" comes out more frequent overall than "two-and-two":


One sexThe otherTotal
probability
400.125
310.5
220.375


It makes a difference whether you specify the sexes in the distribution. If a "distribution" is a thing like "b of the children are boys and g are girls", then the most frequent distribution is (2, 2). But if a distribution is "x of one sex and y of the other", then the most frequent distribution [3, 1], where I've used square brackets to show that the order is not important. [3, 1] is the same as [1, 3].

This is true in general. Suppose someone has 1,000 kids. What's the most likely distribution of sexes? It's 500 boys and 500 girls, which I've been writing (500, 500). This is more likely than either (499, 501) or (501, 499). But if you consider "Equal numbers" versus "501-to-499", which I've been writing as [500, 500] and [501, 499], then [501, 499] wins:

BoysGirlsProbability
5014990.02517
5005000.02522
4995010.02517


One sexThe otherTotal
probability
5014990.05035
5005000.02522
For odd numbers of kids, this anomaly doesn't occur, because there's no symmetric value like [500, 500] to get shorted.

DistributionNumber
of hands
Frequency
[4, 4, 3, 2] 10810800 0.16109347
[5, 4, 3, 1] 8648640 0.12887478
[5, 3, 3, 2] 8648640 0.12887478
[5, 4, 2, 2] 6486480 0.09665608
[4, 3, 3, 3] 4804800 0.07159710
[6, 4, 2, 1] 4324320 0.06443739
[6, 3, 2, 2] 4324320 0.06443739
[6, 3, 3, 1] 2882880 0.04295826
[5, 5, 2, 1] 2594592 0.03866243
[7, 3, 2, 1] 2471040 0.03682137
[4, 4, 4, 1] 1801800 0.02684891
[6, 4, 3, 0] 1441440 0.02147913
[5, 4, 4, 0] 1081080 0.01610935
[6, 5, 2, 0] 864864 0.01288748
[6, 5, 1, 1] 864864 0.01288748
[5, 5, 3, 0] 864864 0.01288748
[7, 4, 2, 0] 617760 0.00920534
[7, 4, 1, 1] 617760 0.00920534
[7, 2, 2, 2] 617760 0.00920534
[8, 2, 2, 1] 463320 0.00690401
[7, 3, 3, 0] 411840 0.00613689
[8, 3, 2, 0] 308880 0.00460267
[8, 3, 1, 1] 308880 0.00460267
[7, 5, 1, 0] 247104 0.00368214
[8, 4, 1, 0] 154440 0.00230134
[6, 6, 1, 0] 144144 0.00214791
[9, 2, 1, 1] 102960 0.00153422
[9, 3, 1, 0] 68640 0.00102282
[9, 2, 2, 0] 51480 0.00076711
[10, 2, 1, 0] 20592 0.00030684
[7, 6, 0, 0] 20592 0.00030684
[8, 5, 0, 0] 15444 0.00023013
[9, 4, 0, 0] 8580 0.00012785
[10, 1, 1, 1] 6864 0.00010228
[10, 3, 0, 0] 3432 0.00005114
[11, 1, 1, 0] 1872 0.00002789
[11, 2, 0, 0] 936 0.00001395
[12, 1, 0, 0] 156 0.00000232
[13, 0, 0, 0] 4 0.00000006

Similar behavior appears in related problems. What's the most likely distribution of suits in a bridge hand? People often guess (4, 3, 3, 3), and this is indeed the most likely distribution of particular suits. That is, if you consider distributions of the form "a hearts, b spades, c diamonds, and d clubs", then (4, 3, 3, 3) gives the most likely distribution. (The distributions (3, 4, 3, 3), (3, 3, 4, 3), and (3, 3, 3, 4) are of course equally frequent.) But if distributions have the form "a cards of one suit, b of another, c of another, and d of the fourth"—which is what is usually meant by a suit distribution in a bridge hand—then [4, 4, 3, 2] is the most likely distribution, and [4, 3, 3, 3] is in fifth place.

Why is this? [4, 3, 3, 3] covers the four most frequent distributions: (4, 3, 3, 3), (3, 4, 3, 3), (3, 3, 4, 3), and (3, 3, 3, 4). But [4, 4, 3, 2] covers twelve quite frequent distributions: (4, 4, 3, 2), (4, 3, 2, 4), and so on. Even though the individual distributions aren't as common as (4, 4, 4, 3), there are twelve of them instead of 4. This gives [4, 4, 3, 2] the edge.

[5, 4, 3, 1] includes 24 distributions, and ends up tied for second place. A complete table is in the sidebar at left.

(For 5-card poker hands, the situation is much simpler. [2, 2, 1, 0] is most common, followed by [2, 1, 1, 1] and [3, 1, 1, 0] (tied), then [3, 2, 0, 0], [4, 1, 0, 0], and [5, 0, 0, 0].)

This same issue arose in my recent article on Yahtzee roll probabilities. There we had six "suits", which represented the six possible rolls of a die, and I asked how frequent each distribution of "suits" was when five dice were rolled. For distribution [p1, p2, ...], we let ni be the number of p's that are equal to i. Then the expression for probability of the distribution has a factor of $\prod {n_i}!$ in the denominator, with the result that distributions with a lot of equal-sized parts tend to appear less frequently than you might otherwise expect.

I'm not sure how I got so deep into this end of the subject, since I didn't really want to compare complex distributions to each other so much as to compare simple distributions under different conditions. I had originally planned to discuss the World Series, which is a best-four-of-seven series of baseball games that we play here in the U.S. and sometimes in that other country to the north. Sometimes one team wins four games in a row ("sweeps"); other times the Series runs the full seven games.

You might expect that even splits would tend to occur when the two teams playing were evenly matched, but that when one team was much better than the other, the outcome would be more likely to be a sweep. Indeed, this is generally so. The chart below graphs the possible outcomes. The x-axis represents the probability of the Philadelphia Phillies winning any individual game. The y-axis is the probability that the Phillies win the entire series (red line), which in turn is the sum of four possible events: the Phillies win in 4 games (green), in 5 games (dark blue), in 6 games (light blue), or in 7 games (magenta). The probabilities of the Nameless Opponents winning are not shown, because they are exactly the opposite. (That is, you just flip the whole chart horizontally.)

(The Opponents are a semi-professional team that hails from Nameless, Tennessee.)

Clearly, the Phillies have a greater-than-even chance of winning the Series if and only if they have a greater-than-even chance of winning each game. If they are playing a better team, they are likely to lose, but if they do win they are most likely to do so in 6 or 7 games. A sweep is the most likely outcome only if the Opponents are seriously overmatched, and have a less than 25% chance of winning each game. (The lines for the 4-a outcome and the 4-b outcome cross at 1-(pa / pb)1/(b-a), where pi is 1, 4, 10, 20 for i = 0, 1, 2, 3.)

If we consider just the first four games of the World Series, there are five possible outcomes, ranging from a Phillies sweep, through a two-and-two split, to an Opponents sweep. Let p be the probability of the Phillies winning any single game. As p increases, so does the likelihood of a Phillies sweep. The chart below plots the likelihood of each of the five possible outcomes, for various values of p, charted here on the horizontal axis:

The leftmost red curve is the probability of an Opponents sweep; the red curve on the right is the probability of a Phillies sweep. The green curves are the probabilities of 3-1 outcomes favoring the Opponents and the Phillies, respectively, with the Phillies on the right as before. The middle curve, in dark blue, is the probability of a 2-2 split.

When is the 2-2 split the most likely outcome? Only when the Phillies and the Opponents are approximately evenly matched, with neither team no more than 60% likely to win any game.

But just as with the sexes of the four kids, we get a different result if we consider the outcomes that don't distinguish the teams. For the first four games of the World Series, there are only three outcomes: a sweep (which we've been writing [4, 0]), a [3, 1] split, and a [2, 2] split:

Here the green lines in the earlier chart have merged into a single outcome; similarly the red lines have merged. As you can see from the new chart, there is no pair of teams for which a [2, 2] split predominates; the even split is buried. When one team is grossly overmatched, winning less than about 19% of its games, a sweep is the most likely outcome; otherwise, a [3, 1] split is most likely.

Here are the corresponding charts for series of various lengths.

Series length
(games)
Distinguish teams Don't
distinguish teams
2
3
4
5
6
7
8
9
10

I have no particular conclusion to announce about this; I just thought that the charts looked cool.

Coming later, maybe: reasoning backwards: if the Phillies sweep the World Series, what can we conclude about the likelihood that they are a much better team than the Opponents? (My suspicion is that you can conclude a lot more by looking at the runs scored and runs allowed totals.)

(Incidentally, baseball players get a share of the ticket money for World Series games, but only for the first four games. Otherwise, they could have an an incentive to prolong the series by playing less well than they could, which is counter to the ideals of sport. I find this sort of rule, which is designed to prevent conflicts of interest, deeply satisfying.)


[Other articles in category /math] permanent link

Mon, 05 Mar 2007

An integer partition puzzle
Last month I wrote an article about calculating Yahtzee probabilities and another one about counting permutations in which integer partitions came up. An integer partition of some integer N is an unordered sequence of positive integers that sums to N. For example, there are 5 different integer partitions of 4:

1 1 1 1
2 1 1
2 2
3 1
4
I've spent a lot of time tinkering with partitions since then.

Here's one interesting fact: it's quite easy to calculate the number of partitions of N. Let P(n, k) be the number of partitions of n into parts that are at least k. Then it's easy to see that:

$$P(n, k) = \sum_{i=k}^{n-1} P(n-i, k)$$

And there are simple boundary conditions: P(n, n) = 1; P(n, k) = 0 when k > n, and so forth. And P(n), the number of partitions of n into parts of any size, is just P(n, 1). So a program to calculate P(n) is very simple:

        my @P;
        sub P {
          my ($n, $k) = @_;
          return 0 if $n < 0;
          return 1 if $n == 0;
          return 0 if $k > $n;
          my $r = $P[$n] ||= [];
          return $r->[$k] if defined $r->[$k];
          return $r->[$k] = P($n-$k, $k) + P($n, $k+1);
        }

        sub part {
          P($_[0], 1);
        }

        for (1..100) {
          printf "%3d %10d\n", $_, part($_);
        }
I had a funny conversation once with someone who ought to have known better: I remarked that it was easy to calculate P(n), and disagreed with me, asking why Rademacher's closed-form expression for P(n) had been such a breakthrough. But the two properties are independent; the same is true for lots of stuff. Just because you can calculate something doesn't mean you understand it. Calculating ζ(2) is quick and easy, but it was a major breakthrough when Euler discovered that it was equal to π2/6. Calculating ζ(3) is even quicker and easier, but nobody has any idea what the value represents.

Similarly, P(n) is easy to calculate, but harder to understand. Ramanujan observed, and proved, that P(5k+4) is always a multiple of 5, which had somehow escaped everyone's notice until then. And there are a couple of other similar identities which were proved later: P(7k+5) is always a multiple of 7; P(11k+6) is always a multiple of 11. Based on that information, any idiot could conjecture that P(13k+7) would always be a multiple of 13; this conjecture is wrong. (P(7) = 15.)

Anyway, all that is just leading up the the real point of this note, which is that I was tabulating the number of partitions of n into exactly k parts, which is also quite easy. Let's call this Q(n, k). And I discovered that Q(13, 4) = Q(13, 5). There are 18 ways to divide a pile of 13 beans into 4 piles, and also 18 ways to divide the beans into 5 piles.

1 1 1 10
1 1 2 9
1 1 3 8
1 1 4 7
1 1 5 6
1 2 2 8
1 2 3 7
1 2 4 6
1 2 5 5
1 3 3 6
1 3 4 5
1 4 4 4
2 2 2 7
2 2 3 6
2 2 4 5
2 3 3 5
2 3 4 4
3 3 3 4
1 1 1 1 9
1 1 1 2 8
1 1 1 3 7
1 1 1 4 6
1 1 1 5 5
1 1 2 2 7
1 1 2 3 6
1 1 2 4 5
1 1 3 3 5
1 1 3 4 4
1 2 2 2 6
1 2 2 3 5
1 2 2 4 4
1 2 3 3 4
1 3 3 3 3
2 2 2 2 5
2 2 2 3 4
2 2 3 3 3

The question I'm trying to resolve: is this just a coincidence? Or is there something in the structure of the partitions that would lead us to suspect that Q(13, 4) = Q(13, 5) even if we didn't know the value of either one?

So far, I haven't turned anything up; it seems to be a coincidence. A simpler problem of the same type is that Q(8, 3) = Q(8, 4); that seems to be a coincidence too:

1 1 6
1 2 5
1 3 4
2 2 4
2 3 3
1 1 1 5
1 1 2 4
1 1 3 3
1 2 2 3
2 2 2 2

Looking at this, one can see all sorts of fun correspondences. But on closer inspection, they turn out to be illusory. For example, any partition into 4 parts can be turned into a partition into 3 parts by taking the smallest of the 4 parts, dividing it up into 1's, and distributing the extra 1's to the largest parts. But there's no reason why that should always yield different outputs for different inputs, and, indeed, it doesn't.

Oh well, sometimes these things don't work out the way you'd like.


[Other articles in category /math] permanent link

"Go ahead, throw your vote away!"
I noticed this back in November right afer the election, when I was reading the election returns in the newspaper. There were four candidates for the office of U.S. Senator in Nevada. One of these was Brendan Trainor, running for the Libertarian party.

Trainor received a total of 5,269 votes, or 0.90% of votes cast.

A fifth choice, "None of these candidates", was available. This choice received 8,232 votes, or 1.41%.

Another candidate, David Schumann, representing the Independent American Party, was also defeated by "None of these candidates".

(Complete official results.)

I'm not sure what conclusion to draw from this. I am normally sympathetic to the attempts of independent candidates and small parties to run for office, and I frequently vote for them. But when your candidate fails to beat out "None of the above", all I can think is that you must be doing something terribly wrong.


[Other articles in category /politics] permanent link