| The Universe of Discourse | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
12 recent entries Archive:
In this section:
Comments disabled |
Fri, 26 Sep 2008
Sprague-Grundy theory
Let N be any positive integer. Does there necessarily exist a positive integer k such that the base-10 representation of kN contains only the digits 0 through 4?M. Penn was right there with the answer. Yesterday, M. Penn asked a question to which I happened to know the answer, and I was so pleased that I wrote up the whole theory in appalling detail. Since I haven't posted a math article in a while, and since the mailing list only has about twelve people on it, I thought I would squeeze a little more value out of it by posting it here. Richard Penn asked:
N dots are placed in a circle. Players alternate moves, where a move consists of crossing out any one of the remaining dots, and the dots on each side of it (if they remain). The winner is the player who crosses out the last dot. What is the optimal strategy with 19 dots? with 20? Can you generalize?M. Penn observed that there is a simple strategy for the 20-dot circle, but was not able to find one for the 19-dot circle. But solving such problems in general is made easy by the Sprague-Grundy theory, which I will explain in detail.
0. Short SpoilersBoth positions are wins for the second player to move.The 20-dot case is trivial, since any first-player move leaves a row of 17 dots, from which the second player can leave two disconnected rows of 7 dots each. Then any first-player move in one of these rows can be effectively answered by the second player in the other row. The 19-dot case is harder. The first player's move leaves a row of 16 dots. The second player can win by removing 3 dots to leave disconnected rows of 6 and 7 dots. After this, the strategy is complicated, but is easily found by the Sprague-Grundy theory. It's at the end of this article if you want to skip ahead. Sprague-Grundy theory is a complete theory of all finite impartial games, which are games like this one where the two players have exactly the same moves from every position. The theory says:
1. NimIn the game of Nim, one has some piles of beans, and a legal move is to remove some or all of the beans from any one pile. The winner is the player who takes the last bean. Equivalently, the winner is the last player who has a legal move.Nim is important because every position in every impartial game is somehow equivalent to a position in Nim, as we will see. In fact, every position in every impartial game is equivalent to a Nim position with at most one heap of beans! Since single Nim-heaps are trivially analyzed, one can completely analyze any impartial game position by calculating the Nim-heap to which it is equivalent.
2. Disjoint sums of gamesDefinition: The "disjoint sum" A # B of two games A and B is a new game whose rules are as follows: a legal move in A # B is either a move in A or a move in B; the winner is the last player with a legal move.Three easy exercises:
0 # a = a # 0 = afor all games a. 0 is a win for the previous player: the next player to move has no legal moves, and loses. We will call the next player to move "P1", and the player who just moved "P2". Note that a Nim-heap of 0 beans is precisely the 0 game.
3. Sums of Nim-heapsWe usually represent a single Nim-heap with n beans as "∗n". I'll do that from now on.We observed that ∗0 is a win for the second player. Observe now that when n is positive, ∗n is a win for the first player, by a trivial strategy. From now on we will use the symbol "=" to mean a weaker relation on games than strict equality. Two games A and B will be equivalent if their outcomes are the same in a rather strong sense:
A = B means that for any game X, A # X is a winning position if and only if B # X is also.Taking X = 0, the condition A = B implies that both games have the same outcome in isolation: if one is a first-player win, so is the other. But the condition is stronger than that. Both ∗1 and ∗2 are first-player wins, but ∗1 ≠ ∗2, because ∗1 # ∗1 is a second-player win, while ∗2 # ∗1 is a first-player win. Exercise: ∗x = ∗y if and only if x = y. It so happens that the disjoint sum of two Nim-heaps is equivalent to a single Nim-heap: Nim-sum theorem: ∗a # ∗b = ∗(a ⊕ b), Where ⊕ is the bitwise exclusive-or operation. I'll omit the proof, which is pretty easy to find. ⊕ is often described as "write a and b in binary, and add, ignoring all carries." For example 1 ⊕ 2 = 3, and 13 ⊕ 7 = 10. This implies that ∗1 # ∗2 = ∗3, and that ∗13 # ∗7 = ∗10. Although I omitted the proof that # for Nim-heaps is essentially the ⊕ operation in disguise, there are many natural implications of this that you can use to verify that the claim is plausible. For example:
4. The MEX ruleThe important thing about disjoint sums is that they abstract away the strategy. If you have some complicated set of Nim-heaps ∗a # ∗b # ... # ∗z, you can ignore them and pretend instead that they are a single heap ∗(a ⊕ b ⊕ ... ⊕ z). Your best move in the compound heap can be easily worked out from the corresponding best move in the fictitious single heap.For example, how do you figure out how to play in ∗2 # ∗3 # ∗x? You consider it as (∗2 # ∗3) # ∗x = ∗1 # ∗x. That is, you pretend that the ∗2 and the ∗3 are actually a single heap of size 1. Then your strategy is to win in ∗1 # ∗x, which you obviously do by reducing ∗x to size 1, or, if ∗x is already ∗0, by changing ∗1 to ∗0. Now, that is very facile, but ∗2 # ∗3 is not the same game as ∗1, because from ∗1 there is just one legal move, which is to ∗0. Whereas from ∗2 # ∗3 there are several moves. It might seem that your opponent could complicate the situation, say by moving from ∗2 # ∗3 to ∗3, which she could not do if it were really ∗1. But actually this extra option can't possibly help your opponent, because you have an easy response to that move, which is to move right back to ∗1! If pretending that ∗2 # ∗3 was ∗1 was good before, it is certainly good after you make it ∗1 for real. From ∗2 # ∗3 there are a whole bunch of moves:
Move to ∗3But you can disregard the first four of these, because they are reversible: if some player X has a winning strategy that works by pretending that ∗2 # ∗3 is identical with ∗1, then the extra options of moving to ∗2 and ∗3 won't help X's opponent, because X can reverse those moves and turn the ∗2 # ∗3 component back into ∗1. So we can ignore these options, and say that there's just one move from ∗2 # ∗3 worth considering further, namely to ∗2 # ∗2 = 0. Since this is exactly the same set of moves that is available from ∗1, ∗2 # ∗3 behaves just like ∗1 in all situations, and have just proved that ∗2 # ∗3 = ∗1. Unlike the other moves, the move from ∗2 # ∗3 to ∗0 is not reversible. Once someone turns ∗2 # ∗3 into ∗0, by equalizing the piles, it cannot then be turned back into ∗1, or anything else. Considering this in more generality, suppose we have some game position P where the options are to move to one of several possible Nim-heaps, and M is the smallest Nim-heap that is not among the options. Then P = ∗M. Why? Because P has just the same options that ∗M has, namely the options of moving to one of ∗0 ... ∗(M-1). P also has some extra options, but we can ignore these because they're reversible. If you have a winning strategy in X # ∗M, then you have a winning strategy in X # P also, as follows:
For example, let's consider what happens if we augment Nim by adding a special token, called ♦. A player may, in lieu of a regular move, replace ♦ by a pile of beans of any positive size. What effect does this have on Nim? Since the legal moves from ♦ are {∗1, ∗2, ∗3, ...} and the MEX is 0, ♦ should behave like ∗0. That is, adding a ♦ token to any position should leave the outcome unaffected. And indeed it does. If you have a winning strategy in game G, then you have a winning strategy in G # ♦ also, as follows: If your opponent plays in G, reply in G. If your opponent replaces ♦ with a pile of beans, remove it, leaving only G. Exercise: Let G be a game where all the legal moves are to Nim-heaps. Then G is a win for P1 if and only if one of the legal moves from G is to ∗0, and a win for P2 if and only if none of the legal moves from G is to ∗0.
5. The Sprague-Grundy theoryAn "impartial game" is one where both players have the same moves from every position.Sprague-Grundy theorem: Any finite impartial game is equivalent to some Nim-heap ∗n, which is the "Nim-value" of the game. Now let's consider Richard Penn's game, which is impartial. A legal move is to cross out any dot, and the adjacent dot or dots, if any. The Sprague-Grundy theorem says that every row of dots in Penn's game is equivalent to some Nim-heap. Let's tabulate the size of this heap (the Nim-value) for each row of n dots. We'll represent a row of n dots as [οοοοο...ο]. Obviously, [] = ∗0 so the Nim-value of [] is 0. Also obviously, [ο] = ∗1, since they're exactly the same game. [οο] = ∗1 also, since the only legal move from [2] is to [] = 0, and the MEX of {0} is 1. The legal moves from [οοο] are to [] = ∗0 and [ο] = ∗1, so {∗0, ∗1}, and the MEX is 2. So [οοο] = ∗2. Let's check that this is working. Since the Nim-value of [οοο] is 2, the theory predicts that [οοο] # ∗2 = 0 and so should be a win for P2. P2 should be able to pretend that [οοο] is actually ∗2. Suppose P1 turns the ∗2 into ∗1, moving to [οοο] # ∗1. Then P2 should turn [οοο] into ∗1 also, which he can do by crossing out an end dot and the adjacent one, leaving [ο] # ∗1, which he easily wins. If P1 turns ∗2 into ∗0, moving to [οοο] # ∗0, then P2 should turn [οοο] into ∗0 also, which he can do by crossing out the middle and adjacent dots, leaving [] # ∗0, which he wins immediately. If P1 plays in the [οοο] component, she must move to [] or to [ο], each equivalent to some Nim-heap of size x < 2, and P2 can answer by reducing the true Nim-heap ∗2 to contain x beans also. Continuing our analysis of rows of dots: In Penn's game, the legal moves from [οοοο] are to [οο] and [ο]. Both of these have Nim-value ∗1, so the MEX is 0. Easy exercise: Since [οοοο] is supposedly equivalent to ∗0, you should be able to show that a player who has a winning strategy in some game G also has a winning strategy in G + [οοοο]. The legal moves from [οοοοο] are to [οοο], [οο], and [ο] # [ο]. The Nim-values of these three games are ∗2, ∗1, and ∗0 respectively, so the MEX is 3 and [οοοοο] = ∗3. The legal moves from [οοοοοο] are to [οοοο], [οοο], and [ο] # [οο]. The Nim-values of these three games are 0, 2, and 0, so [οοοοοο] = ∗1.
6. Richard Penn's game analyzed
The table says that a row of 19 dots should be a win for P1, if she reduces the Nim-value from 3 to 0. And indeed, P1 has an easy winning strategy, which is to cross the 3 dots in the middle of the row, replacing [οοοοοοοοοοοοοοοοοοο] with [οοοοοοοο] # [οοοοοοοο]. But no such easy strategy obtains in a row of 20 dots, which, indeed, is a win for P2. The original question involved circles of dots, not rows. But from a circle of n dots there is only one legal move, which is to a row of n-3 dots. From a circle of 20 dots, the only legal move is to [ο × 17] = ∗2, which should be a win for P1. P1 should win by changing ∗2 to ∗0, so should look for the move from [ο × 17] to ∗0. This is the obvious solution Richard Penn discovered: move to [οοοοοοο] # [οοοοοοο]. So the circle of 20 dots is an easy win for P2, the second player. But for the circle of 19 dots the answer is the same, a win for the second player. The first player must move to [ο × 16] = ∗2, and then the second player should win by moving to a 0 position. [ο × 16] must have such a move, because if it didn't, the MEX rule would imply that its Nim-value was 0 instead of 2. So what's the second player's zero move here? There are actually two options. The second player can win by playing to [ο × 14], or by splitting the row into [οοοοοο] # [οοοοοοο].
First recall that any isolated row of four dots, [οοοο], can be disregarded, because any first-player move in such a row can be answered by a second-player move that crosses out the rest of the row. And any pair of isolated rows of one or two dots, [ο] or [οο], can be similarly disregarded, because any move that crosses out one can be answered by a move that crosses out the other. So in what follows, positions like [οο] # [ο] # [οοοο] will be assumed to have been won by the second player, and we will say that the second player "has an easy win" if he has a move to such a position.
But the important point here is not the strategy itself, which is hard to remember, and which could have been found by computer search. The important thing to notice is that computing the table of Nim-values for each row of n dots is easy, and once you have done this, the rest of the strategy almost takes care of itself. Do you need to find a good move from [οοοοοοο] # [οοοοοοοοο] # [οοοοοοοοοο]? There's no need to worry, because the table says that this can be viewed as ∗1 # ∗3 # ∗3, and so a good move is to reduce the ∗1 component, the [οοοοοοο], to ∗0, say by changing it to [οοοο] or to [οο] # [οο]. Whatever your opponent does next, calculating your reply will be similarly easy.
[Other articles in category /math] permanent link Thu, 18 Sep 2008
Return return
There were two different pieces of code in this paper that wowed me. When I started this article, I was planning to write about #2. I decided that I would throw in a couple of paragraphs about #1 first, just to get it out of the way. This article is that couple of paragraphs. [ Addendum 20080917: Here's the article about #2. ] Suppose you have a type that represents terms over some type v of variable names. The v type is probably strings but could possibly be something else:
data Term v = TVar v -- Type variable | TInt -- Integer type | TString -- String type | Fun (Term v) (Term v) -- Function typeThere's a natural way to make the Term type constructor into an instance of Monad:
instance Monad Term where
return v = TVar v
TVar v >>= f = f v
TInt >>= f = TInt
TString >>= f = TString
Fun d r >>= f = Fun (d >>= f) (r >>= f)
That is, the return operation just lifts a variable name to
the term that consists of just that variable, and the bind
operation just maps its argument function over the variable names in
the term, leaving everything else alone.Jones wants to write a function, unify, which performs a unification algorithm over these terms. Unification answers the question of whether, given two terms, there is a third term that is an instance of both. For example, consider the two terms a → Int and String → b, which are represented by Fun (TVar "a") TInt and Fun TString (TVar "b"), respectively. These terms can be unified, since the term String → Int is an instance of both; one can assign a = TString and b = TInt to turn both terms into Fun TString TInt. The result of the unification algorithm should be a set of these bindings, in this example saying that the input terms can be unified by replacing the variable "a" with the term TString, and the variable "b" with the term TInt. This set of bindings can be represented by a function that takes a variable name and returns the term to which it should be bound. The function will have type v → Term v. For the example above, the result is a function which takes "a" and returns TString, and which takes "b" and returns TInt. What should this function do with variable names other than "a" and "b"? It should say that the variable named "c" is "replaced" by the term TVar "c", and similarly other variables. Given any other variable name x, it should say that the variable x is "replaced" by the term TVar x. The unify function will take two terms and return one of these substitutions, where the substition is a function of type v → Term v. So the unify function has type:
unify :: Term v → Term v → (v → Term v)
Oh, but not quite. Because unification can also fail. For example,
if you try to unify the terms a → b and Int,
represented by Fun (TVar "a") (TVar "b") and TInt
respectively, the unfication should fail, because there is no term
that is an instance of both of those; one represents a function and
the other represents an integer. So unify does not actually
return a substitution of type v → Term v. Rather, it
returns a monad value, which might contain a substitution, if the
unification is successful, and otherwise contains an error value. To handle
the example above, the unify function will contain a case
like this:
unify TInt (Fun _ _) = fail ("Cannot unify" ....)
It will fail because it is not possible to unify functions and
integers.If unification is successful, then instead of using fail, the unify function will construct a substitution and then return it with return. Let's consider the result of unifying TInt with TInt. This unification succeeds, and produces a trivial substitition with no bindings. Or more precisely, every variable x should be "replaced" by the term TVar x. So in this case the substitution returned by unify should be the trivial one, a function which takes x and returns TVar x for all variable names x. But we already have such a function. This is just what we decided that Term's return function should do, when we were making Term into a monad. So in this case the code for unify is:
unify TInt TInt = return returnYep, in this case the unify function returns the return function. Wheee! At this point in the paper I was skimming, but when I saw return return, I boggled. I went back and read it more carefully after that, you betcha. That's my couple of paragraphs. I was planning to get to this point and then say "But that's not what I was planning to discuss. What I really wanted to talk about was...". But I think I'll break with my usual practice and leave the other thing for tomorrow. Happy Diada Nacional de Catalunya, everyone! [ Addendum 20080917: Here's the article about the other thing. ]
[Other articles in category /prog] permanent link
data Mu f = In (f (Mu f))
data Mu f = In (f (Mu f)) -- (???)
I bet a bunch of people reading this on Planet Haskell are nodding and
saying "Oh, that!"When I first saw this I couldn't figure out what it was saying at all. It was totally opaque. I still have trouble recognizing in Haskell what tokens are types, what tokens are type constructors, and what tokens are value constructors. Code like (???) is unusually confusing in this regard. Normally, one sees something like this instead:
data Maybe f = Nothing | Just f
Here f is a type variable; that is, a variable that ranges over
types. Maybe is a type constructor, which is like a function
that you can apply to a type to get another type. The most familiar
example of a type constructor is List:
data List e = Nil | Cons e (List e)
Given any type f, you can apply the type constructor
List to f to get a new type List f.
For example, you can apply List to Int to get the
type List Int. (The Haskell built-in list type constructor
goes by the funny name of [], but works the same way. The
type [Int] is a synonym for ([] Int).)Actually, type names are type constructors also; they're argumentless type constructors. So we have type constructors like Int, which take no arguments, and type constructors like List, which take one argument. Haskell also has type constructors that take more than one argument. For example, Haskell has a standard type constructor called Either for making union types:
data Either a b = Left a | Right b;
Then the type Either Int String contains values like Left
37 and Right "Cotton Mather".To keep track of how many arguments a type constructor has, one can consider the, ahem, type, of the type constructor. But to avoid the obvious looming terminological confusion, the experts use the word "kind" to refer to the type of a type constructor. The kind of List is * → *, which means that it takes a type and gives you back a type. The kind of Either is * → * → *, which means that it takes two types and gives you back a type. Well, actually, it is curried, just like regular functions are, so that Either Int is itself a type constructor of kind * → * which takes a type a and returns a type which could be either an Int or an a. The nullary type constructor Int has kind *. Continuing the "Maybe" example above, f is a type, or a constructor of kind *, if you prefer. Just is a value constructor, of type f → Maybe f. It takes a value of type f and produces a value of type Maybe f. Now here is a crucial point. In declarations of type constructors, such as these:
data Either a b = ...
data List e = ...
data Maybe f = ...
the type variables a, b, e, and f actually
range over type constructors, not over types. Haskell can infer the
kinds of the type constructors Either, List, and
Maybe, and also the kinds of the type variables, from the
definitions on the right of the = signs. In this case, it
concludes that all four variables must have kind *, and so really do
represent types, and not higher-order type constructors. So you can't
ask for Either Int List because List is known to
have kind * → *, and Haskell needs a type constructor of kind * to
serve as an argument to Either. But with a different definition, Haskell might infer that a type variable has a higher-order kind. Here is a contrived example, which might be good for something, perhaps. I'm not sure:
data TyCon f = ValCon (f Int)
This defines a type constructor TyCon with kind (* → *) → *,
which can be applied to any type constuctor f that has kind *
→ *, to yield a type. What new type? The new type TyCon
f is isomorphic to the type f of Int. For
example, TyCon List is basically the same as List
Int. The value Just 37 has type Maybe Int,
and the value ValCon (Just 37) has type TyCon
Maybe.Similarly, the value [1, 2, 3] has type [Int], which, you remember, is a synonym for [] Int. And the value ValCon [1, 2, 3] has type TyCon []. Now that the jargon is laid out, let's look at (???) again:
data Mu f = In (f (Mu f)) -- (???)
When I was first trying to get my head around this, I had trouble
seeing what the values were going to be. It looks at first like it
has no bottom. The token f here, like in the TyCon
example, is a variable that ranges over type constructors with kind *
→ *, so could be List or Maybe or [],
something that takes a type and yields a new type. Mu itself
has kind (* → *) → *, taking something like f and yielding a
type. But what's an actual value? You need to apply the value
constructor In to a value of type f (Mu
f), and it's not immediately clear where to get such a
thing.I asked on #haskell, and Cale Gibbard explained it very clearly. To do anything useful you first have to fix f. Let's take f = Maybe. In that particular case, (???) becomes:
data Mu Maybe = In (Maybe (Mu Maybe))
So the In value constructor will take a value of type
Maybe (Mu Maybe) and return a value of type
Mu Maybe. Where do we get a value of type
Maybe (Mu Maybe)? Oh, no problem: the value Nothing
is polymorphic, and has type Maybe a for all a,
so in particular it has type Maybe (Mu Maybe). Whatever
Maybe (Mu Maybe) is, it is a Maybe-type, so it has a
Nothing value. So we do have something to get started
with.Since Nothing is a Maybe (Mu Maybe) value, we can apply the In constructor to it, yielding the value In Nothing, which has type Mu Maybe. Then applying Just, of type a → Maybe a, to In Nothing, of type Mu Maybe, produces Just (In Nothing), of type Maybe (Mu Maybe) again. We can repeat the process as much as we want and produce as many values of type Mu Maybe as we want; they look like these:
In Nothing
In (Just (In Nothing))
In (Just (In (Just (In Nothing))))
In (Just (In (Just (In (Just (In Nothing))))))
...
And that's it, that's the type Mu Maybe, the set of those
values. It will look a little simpler if we omit the In
markers, which don't really add much value. We can just agree to omit
them, or we can get rid of them in the code by defining some semantic
sugar:
nothing = In Nothing
just = In . Just
Then the values of Mu Maybe look like this:
nothing
just nothing
just (just nothing)
just (just (just nothing))
...
It becomes evident that what the Mu operator does is to close
the type under repeated application. This is analogous to the way the
fixpoint combinator works on values. Consider the usual definition of
the fixpoint combinator:
Y f = f (Y f)
Here f is a function of type a → a. Y f
is a fixed point of f. That is, it is a value x of type
a such that f x = x. (Put x = Y
f in the definition to see this.)The fixed point of a function f can be computed by considering the limit of the following sequence of values:
⊥ This actually finds the least fixed point of f, for a certain definition of "least". For many functions f, like x → x + 1, this finds the uninteresting fixed point ⊥, but for many f, like x → λ n. if n = 0 then 1 else n * x(n - 1), it's something better. Mu is analogous to Y. Instead of operating on a function f from values to values, and producing a single fixed-point value, it operates on a type constructor f from types to types, and produces a fixed-point type. The resulting type T is the least fixed point of the type constructor f, the smallest set of values such that f T = T. Consider the example of f = Maybe again. We want to find a type T such that T = Maybe T. Consider the following sequence:
{ ⊥ } The first item is the set that contains nothing but the bottom value, which we might call t0. But t0 is not a fixed point of Maybe, because Maybe { ⊥ } also contains Nothing. So Maybe { ⊥ } is a different type from t0, which we can call t1 = { Nothing, ⊥ }. The type t1 is not a fixed point of Maybe either, because Maybe t1 evidently contains both Nothing and Just Nothing. Repeating this process, we find that the limit of the sequence is the type Mu Maybe = { ⊥, Nothing, Just Nothing, Just (Just Nothing), Just (Just (Just Nothing)), ... }. This type is fixed under Maybe. It might be worth pointing out that this is not the only such fixed point, but is is the least fixed point. One can easily find larger types that are fixed under Maybe. For example, postulate a special value Q which has the property that Q = Just Q. Then Mu Maybe ∪ { Q } is also a fixed point of Maybe. But it's easy to see (and to show, by induction) that any such fixed point must be a superset of Mu Maybe. Further consideration of this point might take me off to co-induction, paraconsistent logic, Peter Aczel's nonstandard set theory, and I'd never get back again. So let's leave this for now. So that's what Mu really is: a fixed-point operator for type constructors. And having realized this, one can go back and look at the definition and see that oh, that's precisely what the definition says, how obvious:
Y f = f (Y f) -- ordinary fixed-point operator
data Mu f = In (f (Mu f)) -- (???)
Given f, a function from values to values, Y(f)
calculates a value x such that x = f(x).
Given f, a function from types to types, Mu(f) calculates
a type T such that f(T) = T. That's why the
definitions are identical. (Except for that annoying In
constructor, which really oughtn't to be there.)You can use this technique to construct various recursive datatypes. For example, Mu Maybe turns out to be equivalent to the following definition of the natural numbers:
data Number = Zero | Succ Number;
Notice the structural similarity with the definition of Maybe:
data Maybe a = Nothing | Just a;
One can similarly define lists:
data Mu f = In (f (Mu f))
data ListX a b = Nil | Cons a b deriving Show
type List a = Mu (ListX a)
-- syntactic sugar
nil :: List a
nil = In Nil
cons :: a → List a → List a
cons x y = In (Cons x y)
-- for example
ls = cons 3 (cons 4 (cons 5 nil)) -- :: List Integer
lt = (cons 'p' (cons 'y' (cons 'x' nil))) -- :: List Char
Or you could similarly do trees, or whatever. Why one might want to
do this is a totally separate article, which I am not going to write
today.Here's the point of today's article: I find it amazing that Haskell's type system is powerful enough to allow one to defined a fixed-point operator for functions over types. We've come a long way since FORTRAN, that's for sure. A couple of final, tangential notes: Google search for "Mu f = In (f (Mu f))" turns up relatively few hits, but each hit is extremely interesting. If you're trying to preload your laptop with good stuff to read on a plane ride, downloading these papers might be a good move. The Peter Aczel thing seems to be less well-known that it should be. It is a version of set theory that allows coinductive definitions of sets instead of inductive definitions. In particular, it allows one to have a set S = { S }, which standard set theory forbids. If you are interested in co-induction you should take a look at this. You can find a clear explanation of it in Barwise and Etchemendy's book The Liar (which I have read) and possibly also in Aczel's book Non Well-Founded Sets (which I haven't read).
[Other articles in category /prog] permanent link Tue, 09 Sep 2008
Factorials are not quite as square as I thought
Let s(n) be the smallest perfect square larger than n. Then to have n! = a2 - 1 we must have a2 = s(n!), and in particular we must have s(n!) - n! square. This actually occurs for n in { 4, 5, 6, 7, 8, 9, 10, 11 }, and since 11 was as far as I got on the lunch line yesterday, I had an exaggerated notion of how common it is. had I worked out another example, I would have realized that after n=11 things start going wrong. The value of s(12!) is 218872, but 218872 - 12! = 39169, and 39169 is not a square. (In fact, the n=11 solution is quite remarkable; which I will discuss at the end of this note.) So while there are (of course) solutions to 12! = a2 - b2, and indeed where b is small compared to a, as I said, the smallest such b takes a big jump between 11 and 12. For 4 ≤ n ≤ 11, the minimal b takes the values 1, 1, 3, 1, 9, 27, 15, 18. But for n = 12, the solution with the smallest b has b = 288. Calculations with Mathematica by Mitch Harris show that one has n! = s(n!) - b2 only for n in {1, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 16}, and then not for any other n under 1,000. The likelihood that I imagine of another solution for n! = a2 - 1, which was already not very high, has just dropped precipitously. My thanks to M. Harris, and also to Stephen Dranger, who also wrote in with the results of calculations. Having gotten this far, I then asked OEIS about the sequence 1, 1, 3, 1, 9, 27, 15, 18, and (of course) was delivered a summary of the current state of the art in n! = a2 - 1. Here's my summary of the summary. The question is known as "Brocard's problem", and was posed by Brocard in 1876. No solutions are known with n > 7, and it is known that if there is a solution, it must have n > 109. According to the Mathworld article on Brocard's problem, it is believed to be "virtually certain" that there are no other solutions. The calculations for n ≤ 109 are described in this unpublished paper of Berndt and Galway, which I found linked from the Mathworld article. The authors also investigated solutions of n! = a2 - b2 for various fixed b between 2 and 50, and found no solutions with 12 ≤ n ≤ 105 for any of them. The most interesting was the 11! = 63182 - 182 I mentioned already. [ The original version of this article contained some confusion about whether s(n) was the largest square less than n, or the largest number whose square was less than n. Thanks to Roie Marianer for pointing out the error. ]
[Other articles in category /math] permanent link
Factorials are almost, but not quite, square
But to get b=1 might require a lot of luck, perhaps more luck than there is. (Jeremy Kahn once argued that |2x - 3y| = 1 could have no solutions other than the obvious ones, essentially because it would require much more fabulous luck than was available. I sneered at this argument at the time, but I have to admit that there is something to it.) Anyway, back to the subject at hand. Is there an example of n! = a2 -1 with n > 7? I haven't checked yet. In related matters, it's rather easy to show that there are no nontrivial examples with b=0. It would be pretty cool to show that equation (*) implied n = O(f(b)) for some function f, but I would not be surprised to find out that there is no such bound. This kept me amused for twenty minutes while I was in line for lunch, anyway. Incidentally, on the lunch line I needed to estimate √11. I described in an earlier article how to do this. Once again it was a good trick, the sort you should keep handy if you are the kind of person who needs to know √11 while standing in line on 33rd Street. Here's the short summary: √11 = √(99/9) = √((100-1)/9) = √((100/9)(1 - 1/100) = (10/3)√(1 - 1/100) ≈ (10/3)(1 - 1/200) = (10/3)(199/200) = 199/60. [ Addendum 20080909: There is a followup article. ] [Other articles in category /math] permanent link
Why pi?
what property of a circle makes it . . . an irrational number. . . perhaps about as arbitrary a number as you can get.I thought about this pretty hard, and, to my amazement, I came up with a plausible answer. So here we are. The one-paragraph summary: My theory is that the association of the very weird and complex number π with a geometric object as simple as a circle is a reflection of the underlying fundamental complexity of Euclidean geometry: specifically, that its metric is a nonlinear function. First I'm going to spend some time arguing that π does require explanation. I expect that almost everyone will agree that π is weird; if you do agree, feel free to skip this section. Then I'll discuss Euclidean and non-Euclidean geometries. This is important, because the relation between π and circles appears to be a special property of Euclidean geometry, one which does not occur, for example, in spherical geometry. Finally, I'll look at the essential properties of Euclidean geometry, and why I think it is more complex than people usually realize.
π is complex and bizarreIn this section, I'm going to argue that the question is indeed worth asking. π is an extremely peculiar number, even by mathematical standards. You often hear π mentioned in the same breath with e, another constant of fundamental mathematical importance. But e is much more tractable than π is, and much better understood.In fact, the degree to which π is not understood is rather shocking when you consider its ubiquity. If you don't need to be persuaded that π is unusually weird, even as transcendental numbers go, you may want to skip to the next section. Really, this section is here to address people who think they know more mathematics than they do, who want to argue that π is no more or less complicated than any other number. But I think it is. It should be fairly clear that, as a representation of real numbers, decimal fractions are not very satisfactory. For example, you might like simple numbers to have simple representations. But the representation of 1/3 is 0.33333...., which isn't even finite. The fact that a complicated number like like 3674/31250 ("0.117568") has a simpler representation than a simple number like 1/3 just demonstrates that the system is defective. 3674/31250 gets a simple representation not because it is a simple number, but because 31250 happens to divide 106. This being the case, it is perhaps not too surprising that nobody can make head or tail of the representation of π, which is 3.14159265358979... . As far as I know, the state of our current understanding of this representation of π can be summed up as:
None But that might just be the fault of the representation. The representation is based on the number 10, and it is not clear that π has anything to do with 10, so our failure to find an answer here may just indicate that the question was not worth asking.There are better representations of real numbers; one such is the so-called "continued fraction representation". I don't want to explain this in detail in this article, but I can refer you to a talk I gave on the subject. But an itemization of this representation's desirable properties may be persuasive even if you don't know how it works:
But it doesn't work for π. The continued fraction representation of π is [3; 7, 15, 1, 292, 1, 1, 1, 2, 1, 3, 1, 14, ...], and as far as I know, the state of our current understanding of this representation of π can be summed up as:
None So much for continued fractions.We might hope for some understanding about why π is irrational. The proof that √2 is irrational is elementary, and dates back to the Greeks; you can understand it as being related to the fact that 2 is not a perfect square. π was not shown to be irrational until 1761, and the proof is not simple, which means that nobody knows a simple argument about why it should be the complicated thing it is, rather than a simple fraction. So π is complex and poorly-understood, even compared with other important transcendental numbers like e.
Euclidean geometryM. Cozens asked:Is pi inherent in our definition of a circle, or our particular geometry, or our planet, or could the ratio be different in different worlds?I this question is very insightful. π, at least as it relates to circles, is inherent in a particular geometry, namely, Euclidean geometry. Euclidean geometry takes its name from Euclid, who wrote Elements, an extremely influential treatise on geometry, about 2300 years ago. Most of the Elements is concerned with plane geometry, which takes place in an infinite, flat two-dimensional space. π arises naturally in this kind of space as the perimeter of a circle.
Non-Euclidean geometryIn non-Euclidean spaces, π is geometrically much less important. For example, consider geometry done on a sphere. If we keep the definition of "line" as the path of shortest distance between two points, then our "lines" turn out to be great circles on the sphere—that is, circles whose centers are at the center of the sphere; the equator is an example. The 49th parallel is not a "line" because it's not a path of shortest distance; for any two points on the 49th parallel, there's a path between them over the surface of the sphere that is shorter than the one along the parallel. (This may seem strange, but it's true, and it's why direct flights from New York to Taipei often stop off in Anchorage, Alaska.) In addition to being a line, the equator is also an example of a circle. What's a circle? Circles look pretty much the way you expect them to. A circle is the set of all points that are some fixed distance from a center point. The equator is all the points that are a certain fixed distance from the north pole. The 49th parallel is also a circle; its center is also the north pole.
Let's say that the circumference of the equator, the red line, is 1. Then the length of the equator's diameter, the blue line, is 1/2. If we were expecting to divide the circumference by the diameter and get π, we are in for a surprise, because we just got 2 instead. For the 49th parallel, the ratio of circumference to diameter is larger: I calculate about 2.88. For smaller circles, the ratio is larger still. For very small circles, the ratio is very close to π, because a small circle can't tell whether it's on a sphere or in a plane; up close the two things look the same. So the relation between π and circles is actually a special property of Euclidean geometry. Circles in non-Euclidean spaces have a perimeter-to-diameter ratio that is different from π. Euclidean metricThe single fundamental property of Euclidean geometry is that the distance between two points, say (x1, y1) and (x2, y2), is ((x2-x1)2 + (y2-y1)2)1/2. (Or, in higher-dimensional spaces, the obvious extension of this formula.) If you change the way you measure distance, you get a different kind of geometry with different kinds of circles that have different perimeter-to-diameter ratios. In an earlier article, I discussed an alternative distance function, called the Manhattan distance, which gives diamond-shaped circles whose perimeter-to-diameter ratio is always 4.The Euclidean distance function is nothing more than the familiar Pythagorean theorem. It is very difficult for me to imagine any reasonable way to do plane geometry without the Pythagorean theorem. It is just too simple. Even the proof is simple:
So the relationship of π (which is complicated) to circles (which appear to be simple) is grounded in the Euclidean distance formula. If you change the distance formula, π is no longer related to circles. So the weirdness must be due, at least in part, to some complexity in the Euclidean distance formula. But what's complex about the Euclidean distance formula? How could it be simpler? Actually, I think it only seems simple because it is so familiar. The Euclidean distance formula is, in some ways, deeply weird. I realized this a few months ago, but everyone I mentioned it to acted like I was insane. But now I'm pretty sure. I think the essence of the problem with π is that the Euclidean distance function is nonlinear in the two spatial coordinates x and y.
Nonlinearity of the Euclidean metricLinear functions are very well-behaved. If F is a linear function, then F(a+b) = F(a) + F(b), which means that you can calculate the contributions of a and b independently of each other. To calculate F of some very complicated argument, you can break the argument into simple components and deal with them all separately. With quadratic functions like the Euclidean distance function, you cannot do this; complex problems are not easily decomposable into simple ones.For the Euclidean metric, it means that the horizontalness and verticalosity are not independent, but are tangled together and cannot be separated. What do we really mean by the perimeter of a circle? The circle is the set of points (x, y) which are at distance 1 from the point (0,0). The only meaningful way I know to talk about the length of this set is to calculate it as a limit of an approximate polygonal path as the path gets more and more segments. So you are necessarily dragging in an infinite limiting process, and such processes are always complicated. If the distance function were linear, it wouldn't matter, because then you could treat the horizontal and vertical components separately, and when you did that, you would be dealing with paths in one dimension, which, being straight lines, would be simple. You can see this if you consider the Manhattan distance function: It doesn't matter how you get from (x1, y1) to (x2, y2); whatever path you take, whether you take a lot of steps or only one, the distance is always |x2-x1| + |y2-y1|, because the distance function is linear, and thus there is no interaction between the x parts and the y parts. But with a nonlinear distance function like the Euclidean metric, it does matter what path you take. I was thinking a few months ago about how peculiar this is. I cannot think of anything else that behaves this way. Suppose you have two jugs and you start filling them with milk. You find that to fill each jug separately requires one quart, but to fill both at once requires only 1.4142 quarts. Wouldn't that freak you out? But space does behave like that. To drive ten miles north takes a gallon of gas. To drive ten miles east takes a gallon of gas. North and east are perpendicular and should be completely independent of each other. To drive ten miles north and ten miles east should require two gallons of gas. But it requires only 1.4142 gallons. How the heck did that happen? I believe that this strange entanglement between north and east, two things one might have supposed were independent, is the ultimate root of what makes the circumference of a circle such a peculiar number. I was very pleased to have this confirmation that the entanglement between horizontal and vertical is strange and complex, because, as I mentioned before, when I tried to explain to people what I found strange about it, they thought I was nuts.
One-dimensional circlesMy theory is that the peculiar length of a circle's perimeter is a result of the peculiar interaction between the otherwise apparently independent spatial dimensions in Euclidean space. If this theory is correct, we should expect that the corresponding perimeter in a one-dimensional space will not be peculiar. A one-dimensional Euclidean space, having only one dimension, has no strange interactions between independent directions. And indeed, this is the case! The perimeter of a one-dimensional circle does not involve π. It's simply 2; the "area" (which is really length) is 2r. You only get difficult numbers in spaces of at least 2 dimensions.
Why 3?M. Cozens also asked me why the number came out to be around 3, rather than around 5 or 57, and there I was on much shakier ground. I did not have any clever insights, and all I could do was itemize a bunch of stuff that seemed to bear on the issue. It will probably appear here in a future article.[ Addendum: Here it is. ]
[Other articles in category /math] permanent link |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||