# The Universe of Discourse

Fri, 26 Sep 2008

Sprague-Grundy theory
I'm on a small mailing list for math geeks, and there's this one guy there, Richard Penn, who knows everything. Whenever I come up with some idle speculation, he has the answer. For example, back in 2003 I asked:

Let N be any positive integer. Does there necessarily exist a positive integer k such that the base-10 representation of kN contains only the digits 0 through 4?
M. Penn was right there with the answer.

Yesterday, M. Penn asked a question to which I happened to know the answer, and I was so pleased that I wrote up the whole theory in appalling detail. Since I haven't posted a math article in a while, and since the mailing list only has about twelve people on it, I thought I would squeeze a little more value out of it by posting it here.

Richard Penn asked:

N dots are placed in a circle. Players alternate moves, where a move consists of crossing out any one of the remaining dots, and the dots on each side of it (if they remain). The winner is the player who crosses out the last dot. What is the optimal strategy with 19 dots? with 20? Can you generalize?
M. Penn observed that there is a simple strategy for the 20-dot circle, but was not able to find one for the 19-dot circle. But solving such problems in general is made easy by the Sprague-Grundy theory, which I will explain in detail.

### 0. Short Spoilers

Both positions are wins for the second player to move.

The 20-dot case is trivial, since any first-player move leaves a row of 17 dots, from which the second player can leave two disconnected rows of 7 dots each. Then any first-player move in one of these rows can be effectively answered by the second player in the other row.

The 19-dot case is harder. The first player's move leaves a row of 16 dots. The second player can win by removing 3 dots to leave disconnected rows of 6 and 7 dots. After this, the strategy is complicated, but is easily found by the Sprague-Grundy theory. It's at the end of this article if you want to skip ahead.

Sprague-Grundy theory is a complete theory of all finite impartial games, which are games like this one where the two players have exactly the same moves from every position.

The theory says:

1. Every such game position has a "value", which is a non-negative integer.
2. A position is a second-player win if and only if its value is zero.
3. The value of a position can be calculated from the values of the positions to which the players can move, in a simple way.
4. The value of a collection of disjoint positions (such as two disconnected rows of dots) can be calculated from the values of its component positions in a simple way.
 Order Winning Ways for Your Mathematical Plays, Vol. 1 with kickback no kickback
Long details follow. They are also found in "Winning Ways", Vol I, by Berlekamp, Conway, and Guy.

### 1. Nim

In the game of Nim, one has some piles of beans, and a legal move is to remove some or all of the beans from any one pile. The winner is the player who takes the last bean. Equivalently, the winner is the last player who has a legal move.

Nim is important because every position in every impartial game is somehow equivalent to a position in Nim, as we will see. In fact, every position in every impartial game is equivalent to a Nim position with at most one heap of beans! Since single Nim-heaps are trivially analyzed, one can completely analyze any impartial game position by calculating the Nim-heap to which it is equivalent.

### 2. Disjoint sums of games

Definition: The "disjoint sum" A # B of two games A and B is a new game whose rules are as follows: a legal move in A # B is either a move in A or a move in B; the winner is the last player with a legal move.

Three easy exercises:

1. # is commutative.
2. # is associative.
3. Let (a,b,c...) represent the Nim position with heaps a, b, c, etc. Then the game (a,b,c,...) is precisely (a) # (b) # (c) # ... .
Consider the trivial game with no legal moves for anyone. This game is called 0, because:

0 # a = a # 0 = a
for all games a. 0 is a win for the previous player: the next player to move has no legal moves, and loses.

We will call the next player to move "P1", and the player who just moved "P2".

Note that a Nim-heap of 0 beans is precisely the 0 game.

### 3. Sums of Nim-heaps

We usually represent a single Nim-heap with n beans as "∗n". I'll do that from now on.

We observed that ∗0 is a win for the second player. Observe now that when n is positive, ∗n is a win for the first player, by a trivial strategy.

From now on we will use the symbol "=" to mean a weaker relation on games than strict equality. Two games A and B will be equivalent if their outcomes are the same in a rather strong sense:

A = B means that for any game X, A # X is a winning position if and only if B # X is also.
Taking X = 0, the condition A = B implies that both games have the same outcome in isolation: if one is a first-player win, so is the other. But the condition is stronger than that. Both ∗1 and ∗2 are first-player wins, but ∗1 ≠ ∗2, because ∗1 # ∗1 is a second-player win, while ∗2 # ∗1 is a first-player win.

Exercise: ∗x = ∗y if and only if x = y.

It so happens that the disjoint sum of two Nim-heaps is equivalent to a single Nim-heap:

Nim-sum theorem:a # ∗b = ∗(ab), Where ⊕ is the bitwise exclusive-or operation.

I'll omit the proof, which is pretty easy to find. ⊕ is often described as "write a and b in binary, and add, ignoring all carries." For example 1 ⊕ 2 = 3, and 13 ⊕ 7 = 10. This implies that ∗1 # ∗2 = ∗3, and that ∗13 # ∗7 = ∗10.

Although I omitted the proof that # for Nim-heaps is essentially the ⊕ operation in disguise, there are many natural implications of this that you can use to verify that the claim is plausible. For example:

1. The Nim-sum theorem implies that ∗0 is a neutral element for #, which we already knew.
2. Since aa = 0, we have:
a # ∗a = ∗0 for all a
That is, ∗a # ∗a is a win for P2. And indeed, P2 has an obvious strategy: whatever P1 does in one pile, P2 does in the other pile. P2 never runs out of legal moves until after P1 does, and so must win.

3. Since aa = 0, we have, more generally:
a # ∗a # X = X for all a, X
No matter what X is, its outcome is the same as that of ∗a # ∗a # X. Why?

Suppose you are the player with a winning strategy for playing X alone. Then it is easy to see that you have a winning strategy in ∗a # ∗a # X, as follows: ignore the ∗a # ∗a component, until your opponent moves in it, when you should copy their move in the other half of that component. Eventually the ∗a # ∗a part will be used up (that is, reduced to ∗0 # ∗0 = 0) and your opponent will be forced to move in X, whereupon you can continue your winning strategy there until you win.

4. According to the ⊕ operation, ∗1 # ∗2 = ∗3, and so ∗1 # ∗2 # ∗3 = ∗3 # ∗3 = 0, so P2 should have a winning strategy in ∗1 # ∗2 # ∗3. Which he does: If P1 removes any entire heap, P2 can win by equalizing the remaining heaps, leaving ∗1 # ∗1 = 0 or ∗2 # ∗2 = 0, which he wins easily. If P1 equalizes any two heaps, P2 can remove the third heap, winning the same way.

5. Let's reconsider the game of the previous paragraph, but change the ∗1 to something else. 2 ⊕ 3 ⊕ x > 0 so if ∗x ≠ 1, ∗2 # ∗3 # ∗x = ∗y, where y>0. Since ∗y is a single nonempty Nim-heap, it is obviously a win for P1, and so ∗2 # ∗3 # ∗x should be equivalent, also a win for P1. What is P1's winning strategy in ∗2 # ∗3 # ∗x? It's easy. If x > 1, then P1 can reduce ∗x to ∗1, leaving ∗2 # ∗3 # ∗1, which we saw is a winning position. And if x = 0, then P1 can move to ∗2 # ∗2 and win.

### 4. The MEX rule

The important thing about disjoint sums is that they abstract away the strategy. If you have some complicated set of Nim-heaps ∗a # ∗b # ... # ∗z, you can ignore them and pretend instead that they are a single heap ∗(ab ⊕ ... ⊕ z). Your best move in the compound heap can be easily worked out from the corresponding best move in the fictitious single heap.

For example, how do you figure out how to play in ∗2 # ∗3 # ∗x? You consider it as (∗2 # ∗3) # ∗x = ∗1 # ∗x. That is, you pretend that the ∗2 and the ∗3 are actually a single heap of size 1. Then your strategy is to win in ∗1 # ∗x, which you obviously do by reducing ∗x to size 1, or, if ∗x is already ∗0, by changing ∗1 to ∗0.

Now, that is very facile, but ∗2 # ∗3 is not the same game as ∗1, because from ∗1 there is just one legal move, which is to ∗0. Whereas from ∗2 # ∗3 there are several moves. It might seem that your opponent could complicate the situation, say by moving from ∗2 # ∗3 to ∗3, which she could not do if it were really ∗1.

But actually this extra option can't possibly help your opponent, because you have an easy response to that move, which is to move right back to ∗1! If pretending that ∗2 # ∗3 was ∗1 was good before, it is certainly good after you make it ∗1 for real.

From ∗2 # ∗3 there are a whole bunch of moves:

Move to ∗3
Move to ∗2
Move to ∗1 # ∗3 = ∗2
Move to ∗2 # ∗1 = ∗3
Move to ∗2 # ∗2 = ∗0
But you can disregard the first four of these, because they are reversible: if some player X has a winning strategy that works by pretending that ∗2 # ∗3 is identical with ∗1, then the extra options of moving to ∗2 and ∗3 won't help X's opponent, because X can reverse those moves and turn the ∗2 # ∗3 component back into ∗1. So we can ignore these options, and say that there's just one move from ∗2 # ∗3 worth considering further, namely to ∗2 # ∗2 = 0. Since this is exactly the same set of moves that is available from ∗1, ∗2 # ∗3 behaves just like ∗1 in all situations, and have just proved that ∗2 # ∗3 = ∗1.

Unlike the other moves, the move from ∗2 # ∗3 to ∗0 is not reversible. Once someone turns ∗2 # ∗3 into ∗0, by equalizing the piles, it cannot then be turned back into ∗1, or anything else.

Considering this in more generality, suppose we have some game position P where the options are to move to one of several possible Nim-heaps, and M is the smallest Nim-heap that is not among the options. Then P = ∗M. Why? Because P has just the same options that ∗M has, namely the options of moving to one of ∗0 ... ∗(M-1). P also has some extra options, but we can ignore these because they're reversible. If you have a winning strategy in X # ∗M, then you have a winning strategy in X # P also, as follows:

• If your opponent plays in X, then follow your strategy for X # ∗M, since the same move will also be available in X # P.

• If your opponent makes P into ∗y, with y < M, then they've discarded their extra options, which are now irrelevant; play as you would if they had moved from X # ∗M to X # ∗y.

• If your opponent makes P into ∗y, with y > M, then just move from ∗y to ∗M, leaving X + ∗M, which you can win.

MEX Theorem: If all the legal moves from a position P are equivalent to Nim-heaps of sizes {s1, ..., sk}, then P itself is equivalent to a nim-heap of size MEX(s1, ..., sk), where the MEX is the "Minimal EXcluded" element of the set: the smallest nonnegative integer that is not in the set.

For example, let's consider what happens if we augment Nim by adding a special token, called ♦. A player may, in lieu of a regular move, replace ♦ by a pile of beans of any positive size. What effect does this have on Nim?

Since the legal moves from ♦ are {∗1, ∗2, ∗3, ...} and the MEX is 0, ♦ should behave like ∗0. That is, adding a ♦ token to any position should leave the outcome unaffected. And indeed it does. If you have a winning strategy in game G, then you have a winning strategy in G # ♦ also, as follows: If your opponent plays in G, reply in G. If your opponent replaces ♦ with a pile of beans, remove it, leaving only G.

Exercise: Let G be a game where all the legal moves are to Nim-heaps. Then G is a win for P1 if and only if one of the legal moves from G is to ∗0, and a win for P2 if and only if none of the legal moves from G is to ∗0.

### 5. The Sprague-Grundy theory

An "impartial game" is one where both players have the same moves from every position.

Sprague-Grundy theorem: Any finite impartial game is equivalent to some Nim-heap ∗n, which is the "Nim-value" of the game.

Now let's consider Richard Penn's game, which is impartial. A legal move is to cross out any dot, and the adjacent dot or dots, if any.

The Sprague-Grundy theorem says that every row of dots in Penn's game is equivalent to some Nim-heap. Let's tabulate the size of this heap (the Nim-value) for each row of n dots. We'll represent a row of n dots as [οοοοο...ο]. Obviously, [] = ∗0 so the Nim-value of [] is 0. Also obviously, [ο] = ∗1, since they're exactly the same game.

[οο] = ∗1 also, since the only legal move from [2] is to [] = 0, and the MEX of {0} is 1.

The legal moves from [οοο] are to [] = ∗0 and [ο] = ∗1, so {∗0, ∗1}, and the MEX is 2. So [οοο] = ∗2.

Let's check that this is working. Since the Nim-value of [οοο] is 2, the theory predicts that [οοο] # ∗2 = 0 and so should be a win for P2. P2 should be able to pretend that [οοο] is actually ∗2.

Suppose P1 turns the ∗2 into ∗1, moving to [οοο] # ∗1. Then P2 should turn [οοο] into ∗1 also, which he can do by crossing out an end dot and the adjacent one, leaving [ο] # ∗1, which he easily wins. If P1 turns ∗2 into ∗0, moving to [οοο] # ∗0, then P2 should turn [οοο] into ∗0 also, which he can do by crossing out the middle and adjacent dots, leaving [] # ∗0, which he wins immediately.

If P1 plays in the [οοο] component, she must move to [] or to [ο], each equivalent to some Nim-heap of size x < 2, and P2 can answer by reducing the true Nim-heap ∗2 to contain x beans also.

Continuing our analysis of rows of dots: In Penn's game, the legal moves from [οοοο] are to [οο] and [ο]. Both of these have Nim-value ∗1, so the MEX is 0.

Easy exercise: Since [οοοο] is supposedly equivalent to ∗0, you should be able to show that a player who has a winning strategy in some game G also has a winning strategy in G + [οοοο].

The legal moves from [οοοοο] are to [οοο], [οο], and [ο] # [ο]. The Nim-values of these three games are ∗2, ∗1, and ∗0 respectively, so the MEX is 3 and [οοοοο] = ∗3.

The legal moves from [οοοοοο] are to [οοοο], [οοο], and [ο] # [οο]. The Nim-values of these three games are 0, 2, and 0, so [οοοοοο] = ∗1.

### 6. Richard Penn's game analyzed

 Row ofn dots Nim-value Winningmove 0 0 1 1 [] 2 1 [] 3 2 [] 4 0 5 3 [ο] # [ο] 6 1 [ο] # [οο] 7 1 [οο] # [οο] 8 0 9 3 [οοο] # [οοο] 10 3 [οοοοοοοο] 11 2 [οοοο] # [οοοο] 12 2 [οο] # [οοοοοοο] 13 4 [οοοοο] # [οοοοο] 14 0 15 5 [οοοοοο] # [οοοοοο] 16 2 [ο × 14] 17 2 [οοοοοοο] # [οοοοοοο] 18 3 [οοο] # [ο × 12] 19 3 [οοοοοοοο] # [οοοοοοοο] 20 0
Continuing in this way, we get the table of Nim-values that you see at left.

The table says that a row of 19 dots should be a win for P1, if she reduces the Nim-value from 3 to 0. And indeed, P1 has an easy winning strategy, which is to cross the 3 dots in the middle of the row, replacing [οοοοοοοοοοοοοοοοοοο] with [οοοοοοοο] # [οοοοοοοο]. But no such easy strategy obtains in a row of 20 dots, which, indeed, is a win for P2.

The original question involved circles of dots, not rows. But from a circle of n dots there is only one legal move, which is to a row of n-3 dots. From a circle of 20 dots, the only legal move is to [ο × 17] = ∗2, which should be a win for P1. P1 should win by changing ∗2 to ∗0, so should look for the move from [ο × 17] to ∗0. This is the obvious solution Richard Penn discovered: move to [οοοοοοο] # [οοοοοοο]. So the circle of 20 dots is an easy win for P2, the second player.

But for the circle of 19 dots the answer is the same, a win for the second player. The first player must move to [ο × 16] = ∗2, and then the second player should win by moving to a 0 position. [ο × 16] must have such a move, because if it didn't, the MEX rule would imply that its Nim-value was 0 instead of 2. So what's the second player's zero move here? There are actually two options. The second player can win by playing to [ο × 14], or by splitting the row into [οοοοοο] # [οοοοοοο].

### 7. Complete strategy for 19-bean circle

Just for completeness, let's follow one of these purportedly winning moves in detail. I claimed that the second player could win by moving to [οοοοοο] # [οοοοοοο]. But what next?

First recall that any isolated row of four dots, [οοοο], can be disregarded, because any first-player move in such a row can be answered by a second-player move that crosses out the rest of the row. And any pair of isolated rows of one or two dots, [ο] or [οο], can be similarly disregarded, because any move that crosses out one can be answered by a move that crosses out the other. So in what follows, positions like [οο] # [ο] # [οοοο] will be assumed to have been won by the second player, and we will say that the second player "has an easy win" if he has a move to such a position.

• The first player has three possible moves in the left [οοοοοο] component, as follows:

1. If the first player moves to [οοοο] # [οοοοοοο], the second player has an easy win by moving to [οοοο] # [οοοο].

2. If the first player moves to [οοο] # [οοοοοοο] = ∗2 # ∗1, the second player should reduce the left component to ∗1, by moving to [ο] # [οοοοοοο]. Then no matter what the first player does, the second player has an easy win.

3. If the first player moves to [ο] # [οο] # [οοοοοοο] = ∗1 # ∗1 # ∗1, the second player can disregard the [ο] # [οο] component. The second player instead plays to [ο] # [οο] # [οοοο] and wins.

• The first player has four moves in the right [οοοοοοο] component, as follows:

1. If the first player moves to [οοοοοο] # [οοοοο] = ∗1 # ∗3, the second player should move from ∗3 to ∗1. There must be a move in [οοοοο] to a position with Nim-value 1. (If there weren't, [οοοοο] would have Nim-value 1 instead of 3, by the MEX rule.) Indeed, the second player can move to [οοοοοο] # [οο]. Now whatever the first player does the second player has an easy win, either to [οοοο] or to X # X for some row X.

2. If the first player moves to [οοοοοο] # [οοοο] = ∗1 # ∗0, the second player should move from ∗1 to ∗0. There must be a move in [οοοοοο] to a position with Nim-value 0, and indeed there is: the second player moves to [οοοο] # [οοοο] and wins.

3. If the first player moves to [οοοοοο] # [ο] # [οοο] = ∗1 # ∗1 # ∗2, the second player can disregard the ∗1 # ∗1 component and should move in the ∗2 component, to ∗0, which he does by eliminating it entirely, leaving the first player with [οοοοοο] # [ο]. After any move by the first player the second player has an easy win.

4. If the first player moves to [οοοοοο] # [οο] # [οο] = ∗1 # ∗1 # ∗1, the second player has a number of good choices. The simplest thing to do is to disregard the [οο] # [οο] component and move in the [οοοοοο] to some position with Nim-value 0. Moving to [οοοο] # [οο] # [οο] suffices.

So [ο × 17] is indeed a win for the next player to move, and a circle of 20 dots is therefore a win for the previous player, who is the second player.

But the important point here is not the strategy itself, which is hard to remember, and which could have been found by computer search. The important thing to notice is that computing the table of Nim-values for each row of n dots is easy, and once you have done this, the rest of the strategy almost takes care of itself. Do you need to find a good move from [οοοοοοο] # [οοοοοοοοο] # [οοοοοοοοοο]? There's no need to worry, because the table says that this can be viewed as ∗1 # ∗3 # ∗3, and so a good move is to reduce the ∗1 component, the [οοοοοοο], to ∗0, say by changing it to [οοοο] or to [οο] # [οο]. Whatever your opponent does next, calculating your reply will be similarly easy.

Thu, 18 Sep 2008

data Mu f = In (f (Mu f))
Last week I wrote about one of two mindboggling pieces of code that appears in the paper Functional Programming with Overloading and Higher-Order Polymorphism, by Mark P. Jones. Today I'll write about the other one. It looks like this:

        data Mu f = In (f (Mu f))                       -- (???)

I bet a bunch of people reading this on Planet Haskell are nodding and saying "Oh, that!"

When I first saw this I couldn't figure out what it was saying at all. It was totally opaque. I still have trouble recognizing in Haskell what tokens are types, what tokens are type constructors, and what tokens are value constructors. Code like (???) is unusually confusing in this regard.

Normally, one sees something like this instead:

        data Maybe f = Nothing | Just f

Here f is a type variable; that is, a variable that ranges over types. Maybe is a type constructor, which is like a function that you can apply to a type to get another type. The most familiar example of a type constructor is List:

        data List e = Nil | Cons e (List e)

Given any type f, you can apply the type constructor List to f to get a new type List f. For example, you can apply List to Int to get the type List Int. (The Haskell built-in list type constructor goes by the funny name of [], but works the same way. The type [Int] is a synonym for ([] Int).)

Actually, type names are type constructors also; they're argumentless type constructors. So we have type constructors like Int, which take no arguments, and type constructors like List, which take one argument. Haskell also has type constructors that take more than one argument. For example, Haskell has a standard type constructor called Either for making union types:

        data Either a b = Left a | Right b;

Then the type Either Int String contains values like Left 37 and Right "Cotton Mather".

To keep track of how many arguments a type constructor has, one can consider the, ahem, type, of the type constructor. But to avoid the obvious looming terminological confusion, the experts use the word "kind" to refer to the type of a type constructor. The kind of List is * → *, which means that it takes a type and gives you back a type. The kind of Either is * → * → *, which means that it takes two types and gives you back a type. Well, actually, it is curried, just like regular functions are, so that Either Int is itself a type constructor of kind * → * which takes a type a and returns a type which could be either an Int or an a. The nullary type constructor Int has kind *.

Continuing the "Maybe" example above, f is a type, or a constructor of kind *, if you prefer. Just is a value constructor, of type fMaybe f. It takes a value of type f and produces a value of type Maybe f.

Now here is a crucial point. In declarations of type constructors, such as these:

        data Either a b = ...
data List e = ...
data Maybe f = ...

the type variables a, b, e, and f actually range over type constructors, not over types. Haskell can infer the kinds of the type constructors Either, List, and Maybe, and also the kinds of the type variables, from the definitions on the right of the = signs. In this case, it concludes that all four variables must have kind *, and so really do represent types, and not higher-order type constructors. So you can't ask for Either Int List because List is known to have kind * → *, and Haskell needs a type constructor of kind * to serve as an argument to Either.

But with a different definition, Haskell might infer that a type variable has a higher-order kind. Here is a contrived example, which might be good for something, perhaps. I'm not sure:

        data TyCon f = ValCon (f Int)

This defines a type constructor TyCon with kind (* → *) → *, which can be applied to any type constuctor f that has kind * → *, to yield a type. What new type? The new type TyCon f is isomorphic to the type f of Int. For example, TyCon List is basically the same as List Int. The value Just 37 has type Maybe Int, and the value ValCon (Just 37) has type TyCon Maybe.

Similarly, the value [1, 2, 3] has type [Int], which, you remember, is a synonym for [] Int. And the value ValCon [1, 2, 3] has type TyCon [].

Now that the jargon is laid out, let's look at (???) again:

        data Mu f = In (f (Mu f))                       -- (???)

When I was first trying to get my head around this, I had trouble seeing what the values were going to be. It looks at first like it has no bottom. The token f here, like in the TyCon example, is a variable that ranges over type constructors with kind * → *, so could be List or Maybe or [], something that takes a type and yields a new type. Mu itself has kind (* → *) → *, taking something like f and yielding a type. But what's an actual value? You need to apply the value constructor In to a value of type f (Mu f), and it's not immediately clear where to get such a thing.

I asked on #haskell, and Cale Gibbard explained it very clearly. To do anything useful you first have to fix f. Let's take f = Maybe. In that particular case, (???) becomes:

        data Mu Maybe = In (Maybe (Mu Maybe))

So the In value constructor will take a value of type Maybe (Mu Maybe) and return a value of type Mu Maybe. Where do we get a value of type Maybe (Mu Maybe)? Oh, no problem: the value Nothing is polymorphic, and has type Maybe a for all a, so in particular it has type Maybe (Mu Maybe). Whatever Maybe (Mu Maybe) is, it is a Maybe-type, so it has a Nothing value. So we do have something to get started with.

Since Nothing is a Maybe (Mu Maybe) value, we can apply the In constructor to it, yielding the value In Nothing, which has type Mu Maybe. Then applying Just, of type a → Maybe a, to In Nothing, of type Mu Maybe, produces Just (In Nothing), of type Maybe (Mu Maybe) again. We can repeat the process as much as we want and produce as many values of type Mu Maybe as we want; they look like these:

        In Nothing
In (Just (In Nothing))
In (Just (In (Just (In Nothing))))
In (Just (In (Just (In (Just (In Nothing))))))
...

And that's it, that's the type Mu Maybe, the set of those values. It will look a little simpler if we omit the In markers, which don't really add much value. We can just agree to omit them, or we can get rid of them in the code by defining some semantic sugar:

        nothing = In Nothing
just = In . Just

Then the values of Mu Maybe look like this:
        nothing
just nothing
just (just nothing)
just (just (just nothing))
...

It becomes evident that what the Mu operator does is to close the type under repeated application. This is analogous to the way the fixpoint combinator works on values. Consider the usual definition of the fixpoint combinator:

        Y f = f (Y f)

Here f is a function of type aa. Y f is a fixed point of f. That is, it is a value x of type a such that f x = x. (Put x = Y f in the definition to see this.)

The fixed point of a function f can be computed by considering the limit of the following sequence of values:

f(⊥)
f(f(⊥))
f(f(f(⊥)))
...

This actually finds the least fixed point of f, for a certain definition of "least". For many functions f, like xx + 1, this finds the uninteresting fixed point ⊥, but for many f, like x → λ n. if n = 0 then 1 else n * x(n - 1), it's something better.

Mu is analogous to Y. Instead of operating on a function f from values to values, and producing a single fixed-point value, it operates on a type constructor f from types to types, and produces a fixed-point type. The resulting type T is the least fixed point of the type constructor f, the smallest set of values such that f T = T.

Consider the example of f = Maybe again. We want to find a type T such that T = Maybe T. Consider the following sequence:

{ ⊥ }
Maybe { ⊥ }
Maybe(Maybe { ⊥ })
Maybe(Maybe(Maybe { ⊥ }))
...

The first item is the set that contains nothing but the bottom value, which we might call t0. But t0 is not a fixed point of Maybe, because Maybe { ⊥ } also contains Nothing. So Maybe { ⊥ } is a different type from t0, which we can call t1 = { Nothing, ⊥ }.

The type t1 is not a fixed point of Maybe either, because Maybe t1 evidently contains both Nothing and Just Nothing. Repeating this process, we find that the limit of the sequence is the type Mu Maybe = { ⊥, Nothing, Just Nothing, Just (Just Nothing), Just (Just (Just Nothing)), ... }. This type is fixed under Maybe.

It might be worth pointing out that this is not the only such fixed point, but is is the least fixed point. One can easily find larger types that are fixed under Maybe. For example, postulate a special value Q which has the property that Q = Just Q. Then Mu Maybe ∪ { Q } is also a fixed point of Maybe. But it's easy to see (and to show, by induction) that any such fixed point must be a superset of Mu Maybe. Further consideration of this point might take me off to co-induction, paraconsistent logic, Peter Aczel's nonstandard set theory, and I'd never get back again. So let's leave this for now.

So that's what Mu really is: a fixed-point operator for type constructors. And having realized this, one can go back and look at the definition and see that oh, that's precisely what the definition says, how obvious:

              Y f =     f  (Y f)             -- ordinary fixed-point operator
data Mu f = In (f (Mu f))            -- (???)

Given f, a function from values to values, Y(f) calculates a value x such that x = f(x). Given f, a function from types to types, Mu(f) calculates a type T such that f(T) = T. That's why the definitions are identical. (Except for that annoying In constructor, which really oughtn't to be there.)

You can use this technique to construct various recursive datatypes. For example, Mu Maybe turns out to be equivalent to the following definition of the natural numbers:

        data Number = Zero | Succ Number;

Notice the structural similarity with the definition of Maybe:

        data Maybe a = Nothing | Just a;

One can similarly define lists:

        data Mu f = In (f (Mu f))
data ListX a b = Nil | Cons a b deriving Show
type List a = Mu (ListX a)

-- syntactic sugar
nil :: List a
nil = In Nil
cons :: a → List a → List a
cons x y = In (Cons x y)

-- for example
ls = cons 3 (cons 4 (cons 5 nil))          -- :: List Integer
lt = (cons 'p' (cons 'y' (cons 'x' nil)))  -- :: List Char

Or you could similarly do trees, or whatever. Why one might want to do this is a totally separate article, which I am not going to write today.

Here's the point of today's article: I find it amazing that Haskell's type system is powerful enough to allow one to defined a fixed-point operator for functions over types.

We've come a long way since FORTRAN, that's for sure.

A couple of final, tangential notes: Google search for "Mu f = In (f (Mu f))" turns up relatively few hits, but each hit is extremely interesting. If you're trying to preload your laptop with good stuff to read on a plane ride, downloading these papers might be a good move.

The Peter Aczel thing seems to be less well-known that it should be. It is a version of set theory that allows coinductive definitions of sets instead of inductive definitions. In particular, it allows one to have a set S = { S }, which standard set theory forbids. If you are interested in co-induction you should take a look at this. You can find a clear explanation of it in Barwise and Etchemendy's book The Liar (which I have read) and possibly also in Aczel's book Non Well-Founded Sets (which I haven't read).

Thu, 11 Sep 2008

Return return
Among the things I read during the past two months was the paper Functional Programming with Overloading and Higher-Order Polymorphism, by Mark P. Jones. I don't remember why I read this, but it sure was interesting. It is an introduction to the new, cool features of Haskell's type system, with many examples. It was written in 1995 when the features were new. They're no longer new, but they are still cool.

There were two different pieces of code in this paper that wowed me. When I started this article, I was planning to write about #2. I decided that I would throw in a couple of paragraphs about #1 first, just to get it out of the way. This article is that couple of paragraphs.

[ Addendum 20080917: Here's the article about #2. ]

Suppose you have a type that represents terms over some type v of variable names. The v type is probably strings but could possibly be something else:

	data Term v = TVar v                -- Type variable
| TInt                  -- Integer type
| TString               -- String type
| Fun (Term v) (Term v) -- Function type

There's a natural way to make the Term type constructor into an instance of Monad:

	instance Monad Term where
return v          = TVar v
TVar v   >>= f = f v
TInt     >>= f = TInt
TString  >>= f = TString
Fun d r  >>= f = Fun (d >>= f) (r >>= f)

That is, the return operation just lifts a variable name to the term that consists of just that variable, and the bind operation just maps its argument function over the variable names in the term, leaving everything else alone.

Jones wants to write a function, unify, which performs a unification algorithm over these terms. Unification answers the question of whether, given two terms, there is a third term that is an instance of both. For example, consider the two terms a → Int and String → b, which are represented by Fun (TVar "a") TInt and Fun TString (TVar "b"), respectively. These terms can be unified, since the term String → Int is an instance of both; one can assign a = TString and b = TInt to turn both terms into Fun TString TInt.

The result of the unification algorithm should be a set of these bindings, in this example saying that the input terms can be unified by replacing the variable "a" with the term TString, and the variable "b" with the term TInt. This set of bindings can be represented by a function that takes a variable name and returns the term to which it should be bound. The function will have type v → Term v. For the example above, the result is a function which takes "a" and returns TString, and which takes "b" and returns TInt. What should this function do with variable names other than "a" and "b"? It should say that the variable named "c" is "replaced" by the term TVar "c", and similarly other variables. Given any other variable name x, it should say that the variable x is "replaced" by the term TVar x.

The unify function will take two terms and return one of these substitutions, where the substition is a function of type v → Term v. So the unify function has type:

    unify :: Term v → Term v → (v → Term v)

Oh, but not quite. Because unification can also fail. For example, if you try to unify the terms ab and Int, represented by Fun (TVar "a") (TVar "b") and TInt respectively, the unfication should fail, because there is no term that is an instance of both of those; one represents a function and the other represents an integer. So unify does not actually return a substitution of type v → Term v. Rather, it returns a monad value, which might contain a substitution, if the unification is successful, and otherwise contains an error value. To handle the example above, the unify function will contain a case like this:

	unify	TInt	(Fun _ _) = fail ("Cannot unify" ....)

It will fail because it is not possible to unify functions and integers.

If unification is successful, then instead of using fail, the unify function will construct a substitution and then return it with return. Let's consider the result of unifying TInt with TInt. This unification succeeds, and produces a trivial substitition with no bindings. Or more precisely, every variable x should be "replaced" by the term TVar x. So in this case the substitution returned by unify should be the trivial one, a function which takes x and returns TVar x for all variable names x.

But we already have such a function. This is just what we decided that Term's return function should do, when we were making Term into a monad. So in this case the code for unify is:

	unify	TInt	TInt	  = return return

Yep, in this case the unify function returns the return function.

Wheee!

At this point in the paper I was skimming, but when I saw return return, I boggled. I went back and read it more carefully after that, you betcha.

That's my couple of paragraphs. I was planning to get to this point and then say "But that's not what I was planning to discuss. What I really wanted to talk about was...". But I think I'll break with my usual practice and leave the other thing for tomorrow.

Happy Diada Nacional de Catalunya, everyone!

[ Addendum 20080917: Here's the article about the other thing. ]

Wed, 10 Sep 2008

Factorials are not quite as square as I thought
(This is a followup to yesterday's article.)

Let s(n) be the smallest perfect square larger than n. Then to have n! = a2 - 1 we must have a2 = s(n!), and in particular we must have s(n!) - n! square.

This actually occurs for n in { 4, 5, 6, 7, 8, 9, 10, 11 }, and since 11 was as far as I got on the lunch line yesterday, I had an exaggerated notion of how common it is. had I worked out another example, I would have realized that after n=11 things start going wrong. The value of s(12!) is 218872, but 218872 - 12! = 39169, and 39169 is not a square. (In fact, the n=11 solution is quite remarkable; which I will discuss at the end of this note.)

So while there are (of course) solutions to 12! = a2 - b2, and indeed where b is small compared to a, as I said, the smallest such b takes a big jump between 11 and 12. For 4 ≤ n ≤ 11, the minimal b takes the values 1, 1, 3, 1, 9, 27, 15, 18. But for n = 12, the solution with the smallest b has b = 288.

Calculations with Mathematica by Mitch Harris show that one has n! = s(n!) - b2 only for n in {1, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 16}, and then not for any other n under 1,000. The likelihood that I imagine of another solution for n! = a2 - 1, which was already not very high, has just dropped precipitously.

My thanks to M. Harris, and also to Stephen Dranger, who also wrote in with the results of calculations.

Having gotten this far, I then asked OEIS about the sequence 1, 1, 3, 1, 9, 27, 15, 18, and (of course) was delivered a summary of the current state of the art in n! = a2 - 1. Here's my summary of the summary.

The question is known as "Brocard's problem", and was posed by Brocard in 1876. No solutions are known with n > 7, and it is known that if there is a solution, it must have n > 109. According to the Mathworld article on Brocard's problem, it is believed to be "virtually certain" that there are no other solutions.

The calculations for n ≤ 109 are described in this unpublished paper of Berndt and Galway, which I found linked from the Mathworld article. The authors also investigated solutions of n! = a2 - b2 for various fixed b between 2 and 50, and found no solutions with 12 ≤ n ≤ 105 for any of them. The most interesting was the 11! = 63182 - 182 I mentioned already.

[ The original version of this article contained some confusion about whether s(n) was the largest square less than n, or the largest number whose square was less than n. Thanks to Roie Marianer for pointing out the error. ]

Tue, 09 Sep 2008

Factorials are almost, but not quite, square
This weekend I happened to notice that 7! = 712 - 1. Is this a strange coincidence? Well, not exactly, because it's not hard to see that

$$n! = a^{2} - b^{2}\qquad (*)$$

will always have solutions where b is small compared to a. For example, we have 11! = 63182 - 182.

But to get b=1 might require a lot of luck, perhaps more luck than there is. (Jeremy Kahn once argued that |2x - 3y| = 1 could have no solutions other than the obvious ones, essentially because it would require much more fabulous luck than was available. I sneered at this argument at the time, but I have to admit that there is something to it.)

Anyway, back to the subject at hand. Is there an example of n! = a2 -1 with n > 7? I haven't checked yet.

In related matters, it's rather easy to show that there are no nontrivial examples with b=0.

It would be pretty cool to show that equation (*) implied n = O(f(b)) for some function f, but I would not be surprised to find out that there is no such bound.

This kept me amused for twenty minutes while I was in line for lunch, anyway. Incidentally, on the lunch line I needed to estimate √11. I described in an earlier article how to do this. Once again it was a good trick, the sort you should keep handy if you are the kind of person who needs to know √11 while standing in line on 33rd Street. Here's the short summary: √11 = √(99/9) = √((100-1)/9) = √((100/9)(1 - 1/100) = (10/3)√(1 - 1/100) ≈ (10/3)(1 - 1/200) = (10/3)(199/200) = 199/60.

[ Addendum 20080909: There is a followup article. ]