Archive:
In this section: Subtopics:
Comments disabled |
Mon, 21 Aug 2017
Recognizing when two arithmetic expressions are essentially the same
My first cut at writing a solver for twenty-four puzzles was a straightforward search program. It had a couple of hacks in it to cut down the search space by recognizing that !!a+E!! and !!E+a!! are the same, but other than that there was nothing special about it and I've discussed it before. It would quickly and accurately report whether any particular twenty-four
puzzle was solvable, but as it turned out that wasn't quite good
enough. The original motivation for the program was this: Toph and I
play this game in the car. Pennsylvania license plates have three
letters and four digits, and if we see a license plate But this wasn't quite good enough either, because after we would find that first solution, say !!2·(5 + 9 - 2)!!, we would wonder: are there any more? And here the program was useless: it would cheerfully report that there were three, so we would rack our brains to find another, fail, ask the program to tell us the answer, and discover to our disgust that the three solutions it had in mind were: $$ 2 \cdot (5 + (9 - 2)) \\ 2 \cdot (9 + (5 - 2)) \\ 2 \cdot ((5 + 9) - 2) $$ The computer thinks these are different, because it uses different data structures to represent them. It represents them with an abstract syntax tree, which means that each expression is either a single constant, or is a structure comprising an operator and its two operand expressions—always exactly two. The computer understands the three expressions above as having these structures: It's not hard to imagine that the computer could be taught to understand that the first two trees are equivalent. Getting it to recognize that the third one is also equivalent seems somewhat more difficult. Commutativity and associativityI would like the computer to understand that these three expressions should be considered “the same”. But what does “the same” mean? This problem is of a kind I particularly like: we want the computer to do something, but we're not exactly sure what that something is. Some questions are easy to ask but hard to answer, but this is the opposite: the real problem is to decide what question we want to ask. Fun! Certainly some of the question should involve commutativity and associativity of addition and multiplication. If the only difference between two expressions is that one has !!a + b!! where the other has !!b + a!!, they should be considered the same; similarly !!a + (b + c)!! is the same expression as !!(a + b) + c!! and as !!(b + a) + c!! and !!b + (a + c)!! and so forth. The «2 2 5 9 ⇒ 24» example above shows that commutativity and associativity are not limited to addition and multiplication. There are commutative and associative properties of subtraction also! For example, $$a+(b-c) = (a+b)-c$$ and $$(a+b)-c = (a-c)+b.$$ There ought to be names for these laws but as far as I know there aren't. (Sure, it's just commutativity and associativity of addition in disguise, but nobody explaining these laws to school kids ever seems to point out that subtraction can enter into it. They just observe that !!(a-b)-c ≠ a-(b-c)!!, say “subtraction isn't associative”, and leave it at that.) Closely related to these identities are operator inversion identities like !!a-(b+c) = (a-b)+c!!, !!a-(b-c) = (a-b)+c!!, and their multiplicative analogues. I don't know names for these algebraic laws either. One way to deal with all of this would to build a complicated comparison function for abstract syntax trees that tried to transform one tree into another by applying these identities. A better approach is to recognize that the data structure is over-specified. If we want the computer to understand that !!(a + b) + c!! and !!a + (b + c)!! are the same expression, we are swimming upstream by using a data structure that was specifically designed to capture the difference between these expressions. Instead, I invented a data structure, called an Ezpr (“Ez-pur”), that can represent expressions, but in a somewhat more natural way than abstract syntax trees do, and in a way that makes commutativity and associativity transparent. An Ezpr has a simplest form, called its “canonical” or “normal” form. Two Ezprs represent essentially the same mathematical expression if they have the same canonical form. To decide if two abstract syntax trees are the same, the computer converts them to Ezprs, simplifies them, and checks to see if resulting canonical forms are identical. The EzprSince associativity doesn't matter, we don't want to represent it. When we (humans) think about adding up a long column of numbers, we don't think about associativity because we don't add them pairwise. Instead we use an addition algorithm that adds them all at once in a big pile. We don't treat addition as a binary operation; we normally treat it as an operator that adds up the numbers in a list. The Ezpr makes this explicit: its addition operator is applied to a list of subexpressions, not to a pair. Both !!a + (b + c)!! and !!(a + b) + c!! are represented as the Ezpr
which just says that we are adding up !!a!!, !!b!!, and !!c!!. (The
Similarly the Ezpr To handle commutativity, we want those Subtraction and divisionThis doesn't yet handle subtraction and division, and the way I chose
to handle them is the only part of this that I think is at all
clever. A
and this is also the representation of !!a + c - b - d!!, of !!c + a
- d - b!!, of !!c - d+ a-b!!, and of any other expression of the
idea that we are adding up !!a!! and !!c!! and then deducting !!b!!
and !!d!!. The Either of the two bags may be empty, so for example !!a + b!! is just
Division is handled similarly. Here conventional mathematical
notation does a little bit better than in the sum case: Ezprs handle the associativity and commutativity of subtraction and
division quite well. I pointed out earlier that subtraction has an
associative law !!(a + b) - c = a +
(b - c)!! even though it's not usually called that.
Similarly there is a commutative law for subtraction: !!a + b - c = a - c + b!! and once again that same Ezpr does for both. Ezpr lawsEzprs are more flexible than binary trees. A binary tree can represent the expressions !!(a+b)+c!! and !!a+(b+c)!! but not the expression !!a+b+c!!. Ezprs can represent all three and it's easy to transform between them. Just as there are rules for building expressions out of simpler expressions, there are a few rules for combining and manipulating Ezprs. Lifting and flatteningThe most important transformation is lifting, which is the Ezpr
version of the associative law. In the canonical form of an Ezpr, a
you should lift the terms from the inner sum into the outer one:
effectively transforming !!a+(b+c)!! into !!a+b+c!!. More generally, in
we lift the terms from the inner Ezprs into the outer one:
This effectively transforms !!a + (b - c) - d - (e - f))!! to !!a + b + f - c - d - e!!. Similarly, when a Say we are converting the expression !!7 ÷ (3 ÷ (6 × 4))!! to an Ezpr. The conversion function is recursive and the naïve version computes this Ezpr:
But then at the bottom level we have a
which represents !!\frac7{\frac{3}{6\cdot 4}}!!.
Then again we have a
which we can imagine as !!\frac{7·6·4}3!!. The lifting only occurs when the sub-node has the same type as its
parent; we may not lift terms out of a Trivial nodesThe Ezpr An even simpler case is CancellationConsider the puzzle «3 3 4 6 ⇒ 24». My first solver found 49 solutions to this puzzle. One is !!(3 - 3) + (4 × 6)!!. Another is !!(4 + (3 - 3)) × 6!!. A third is !!4 × (6 + (3 - 3))!!. I think these are all the same: the solution is to multiply the 4 by the 6, and to get rid of the threes by subtracting them to make a zero term. The zero term can be added onto the rest of expression or to any of its subexpressions—there are ten ways to do this—and it doesn't really matter where. This is easily explained in terms of Ezprs: If the same subexpression appears in both of a node's bags, we can drop it. For example, the expression !!(4 + (3 -3)) × 6!! starts out as
but the duplicate threes in
The sum is now trivial, as described in the previous section, so can be eliminated and replaced with just 4:
This Ezpr records the essential feature of each of the three solutions to «3 3 4 6 ⇒ 24» that I mentioned: they all are multiplying the 6 by the 4, and then doing something else unimportant to get rid of the threes. Another solution to the same puzzle is !!(6 ÷ 3) × (4 × 3)!!. Mathematically we would write this as !!\frac63·4·3!! and we can see this is just !!6×4!! again, with the threes gotten rid of by multiplication and division, instead of by addition and subtraction. When converted to an Ezpr, this expression becomes:
and the matching threes in the two bags are cancelled, again leaving
In fact there aren't 49 solutions to this puzzle. There is only one, with 49 trivial variations. Identity elementsIn the preceding example, many of the trivial variations on the !!4×6!! solution involved multiplying some subexpression by !!\frac 33!!. When one of the input numbers in the puzzle is a 1, one can similarly obtain a lot of useless variations by choosing where to multiply the 1. Consider «1 3 3 5 ⇒ 24»: We can make 24 from !!3 × (3 + 5)!!. We then have to get rid of the 1, but we can do that by multiplying it onto any of the five subexpressions of !!3 × (3 + 5)!!: $$ 1 × (3 × (3 + 5)) \\ (1 × 3) × (3 + 5) \\ 3 × (1 × (3 + 5)) \\ 3 × ((1 × 3) + 5) \\ 3 × (3 + (1×5)) $$ These should not be considered different solutions.
Whenever we see any 1's in either of the bags of a
but then the 1 is eliminated from the
The fourth expression, !!3 × ((1 × 3) + 5)!!, is initially converted to the Ezpr
When the 1 is eliminated from the inner
which is the same Ezpr as before. Zero terms in the bags of a Multiplication by zeroOne final case is that The question about what to do when there is a zero in the denominator
is a bit of a puzzle.
In the presence of division by zero, some of our simplification rules
are questionable. For example, when we have ResultsThe Associativity is taken care of by the Ezpr structure itself, and
commutativity is not too difficult; as I mentioned, it would have been
trivial if Perl had a built-in bag structure. I find it much easier
to reason about transformations of Ezprs than abstract syntax trees.
Many operations are much simpler; for example the negation of
It took me a while to get the normalization tuned properly, but the results have been quite successful, at least for this problem domain. The current puzzle-solving program reports the number of distinct solutions to each puzzle. When it reports two different solutions, they are really different; when it fails to support the exact solution that Toph or I found, it reports one essentially the same. (There are some small exceptions, which I will discuss below.) Since there is no specification for “essentially the same” there is no hope of automated testing. But we have been using the app for several months looking for mistakes, and we have not found any. If the normalizer failed to recognize that two expressions were essentially similar, we would be very likely to notice: we would be solving some puzzle, be unable to find the last of the solutions that the program claimed to exist, and then when we gave up and saw what it was we would realize that it was essentially the same as one of the solutions we had found. I am pretty confident that there are no errors of this type, but see “Arguable points” below. A harder error to detect is whether the computer has erroneously conflated two essentially dissimilar expressions. To detect this we would have to notice that an expression was missing from the computer's solution list. I am less confident that nothing like this has occurred, but as the months have gone by I feel better and better about it. I consider the problem of “how many solutions does this puzzle really have to have?” been satisfactorily solved. There are some edge cases, but I think we have identified them. Code for my solver is on
Github. The Ezpr code
is in the Some examplesThe original program claims to find 35 different solutions to «4 6 6 6 ⇒ 24». The revised program recognizes that these are of only two types:
Some of the variant forms of the first of those include: $$
6 × (4 + (6 - 6)) \\
6 + ((4 × 6) - 6) \\
(6 - 6) + (4 × 6) \\
(6 ÷ 6) × (4 × 6) \\
6 ÷ ((6 ÷ 4) ÷ 6) \\
6 ÷ (6 ÷ (4 × 6)) \\
6 × (6 × (4 ÷ 6)) \\
(6 × 6) ÷ (6 ÷ 4) \\
6 ÷ ((6 ÷ 6) ÷ 4) \\
6 × (6 - (6 - 4)) \\
6 × (6 ÷ (6 ÷ 4)) \\
\ldots In an even more extreme case, the original program finds 80 distinct expressions that solve «1 1 4 6 ⇒ 24», all of which are trivial variations on !!4·6!!. Of the 715 puzzles, 466 (65%) have solutions; for 175 of these the solution is unique. There are 3 puzzles with 8 solutions each («2 2 4 8 ⇒ 24», «2 3 6 9 ⇒ 24», and «2 4 6 8 ⇒ 24»), one with 9 solutions («2 3 4 6 ⇒ 24»), and one with 10 solutions («2 4 4 8 ⇒ 24»). The 10 solutions for «2 4 4 8 ⇒ 24» are as follows:
A complete listing of every essentially different solution to every «a b c d ⇒ 24» puzzle is available here. There are 1,063 solutions in all. Arguable points There are a few places where we have not completely pinned down what it means for two solutions to be essentially the same; I think there is room for genuine disagreement.
It would be pretty easy to adjust the normalization process to handle these the other way if the user wanted that. Some interesting puzzles«1 2 7 7 ⇒ 24» has only one solution, quite unusual. (Spoiler) «2 2 6 7 ⇒ 24» has two solutions, both somewhat unusual. (Spoiler) Somewhat similar to «1 2 7 7 ⇒ 24» is «3 9 9 9 ⇒ 24» which also has an unusual solution. But it has two other solutions that are less surprising. (Spoiler) «1 3 8 9 ⇒ 24» has an easy solution but also a quite tricky solution. (Spoiler) One of my neighbors has the license plate What took so long?
And here we are, five months later! This article was a huge pain to write. Sometimes I sit down to write something and all that comes out is dreck. I sat down to write this one at least three or four times and it never worked. The tortured Git history bears witness. In the end I had to abandon all my earlier drafts and start over from scratch, writing a fresh outline in an empty file. But perseverance paid off! WOOOOO. [Other articles in category /math] permanent link Tue, 08 Aug 2017I should have written about this sooner, by now it has been so long that I have forgotten most of the details. I first encountered Paul Erdős in the middle 1980s at a talk by János Pach about almost-universal graphs. Consider graphs with a countably infinite set of vertices. Is there a "universal" graph !!G!! such that, for any finite or countable graph !!H!!, there is a copy of !!H!! inside of !!G!!? (Formally, this means that there is an injection from the vertices of !!H!! to the vertices of !!G!! that preserves adjacency.) The answer is yes; it is quite easy to construct such a !!G!! and in fact nearly all random graphs have this property. But then the questions become more interesting. Let !!K_\omega!! be the complete graph on a countably infinite set of vertices. Say that !!G!! is “almost universal” if it includes a copy of !!H!! for every finite or countable graph !!H!! except those that contain a copy of !!K_\omega!!. Is there an almost universal graph? Perhaps surprisingly, no! (Sketch of proof.) I enjoyed the talk, and afterward in the lobby I got to meet Ron Graham and Joel Spencer and talk to them about their Ramsey theory book, which I had been reading, and about a problem I was working on. Graham encouraged me to write up my results on the problem and submit them to Mathematics Magazine, but I unfortunately never got around to this. Graham was there babysitting Erdős, who was one of Pách's collaborators, but I did not actually talk to Erdős at that time. I think I didn't recognize him. I don't know why I was able to recognize Graham. I find the almost-universal graph thing very interesting. It is still an open research area. But none of this was what I was planning to talk about. I will return to the point. A couple of years later Erdős was to speak at the University of Pennsylvania. He had a stock speech for general audiences that I saw him give more than once. Most of the talk would be a description of a lot of interesting problems, the bounties he offered for their solutions, and the progress that had been made on them so far. He would intersperse the discussions with the sort of Erdősism that he was noted for: referring to the U.S. and the U.S.S.R. as “Sam” and “Joe” respectively; his ever-growing series of styles (Paul Erdős, P.G.O.M., A.D., etc.) and so on. One remark I remember in particular concerned the $3000 bounty he offered for proving what is sometimes known as the Erdős-Túran conjecture: if !!S!! is a subset of the natural numbers, and if !!\sum_{n\in S}\frac 1n!! diverges, then !!S!! contains arbitrarily long arithmetic progressions. (A special case of this is that the primes contain arbitrarily long arithmetic progressions, which was proved in 2004 by Green and Tao, but which at the time was a long-standing conjecture.) Although the $3000 was at the time the largest bounty ever offered by Erdős, he said it was really a bad joke, because to solve the problem would require so much effort that the per-hour payment would be minuscule. I made a special trip down to Philadelphia to attend the talk, with the intention of visiting my girlfriend at Bryn Mawr afterward. I arrived at the Penn math building early and wandered around the halls to kill time before the talk. And as I passed by an office with an open door, I saw Erdős sitting in the antechamber on a small sofa. So I sat down beside him and started telling him about my favorite graph theory problem. Many people, preparing to give a talk to a large roomful of strangers, would have found this annoying and intrusive. Some people might not want to talk about graph theory with a passing stranger. But most people are not Paul Erdős, and I think what I did was probably just the right thing; what you don't do is sit next to Erdős and then ask how his flight was and what he thinks of recent politics. We talked about my problem, and to my great regret I don't remember any of the mathematical details of what he said. But he did not know the answer offhand, he was not able solve it instantly, and he did say it was interesting. So! I had a conversation with Erdős about graph theory that was not a waste of his time, and I think I can count that as one of my lifetime accomplishments. After a little while it was time to go down to the auditorium for the the talk, and afterward one of the organizers saw me, perhaps recognized me from the sofa, and invited me to the guest dinner, which I eagerly accepted. At the dinner, I was thrilled because I secured a seat next to Erdős! But this was a beginner mistake: he fell asleep almost immediately and slept through dinner, which, I learned later, was completely typical. [Other articles in category /math] permanent link Thu, 15 Jun 2017Rik Signes brought to my attention that since version 5.1 Unicode has contained the following excitingly-named characters:
I looked into this a little and found out what they are for. It makes a lot of sense! The details were provided by “Telugu Measures and Arithmetic Marks” by Nāgārjuna Venna. Telugu is the third-most widely spoken language in India, spoken mostly in the southeast part of the country. Traditional Telugu units of measurement are often divided into four or eight subunits. For example, the tūmu is divided into four kuṁcamulu, the kuṁcamulu, into four mānikalu, and the mānikalu into four sōlalu. These days they mainly use liters like everyone else. But the traditional measurements are mostly divided into fours, so amounts are written with a base-10 integer part and a base-4 fractional part. The characters above are the base-4 fractional digits. To make the point clearer, I hope, let's imagine that we are using the Telugu system, but with the familar western-style symbols 0123456789 instead of the Telugu digits ౦౧౨౩౪౫౬౭౮౯. (The Telugu had theirs first of course.) And let's use 0-=Z as our base-four fractional digits, analogous to Telugu ౦౼౽౾. (As in Telugu, we'll use the same zero symbol for both the integer and the fractional parts.) Then to write the number of gallons (7.4805195) in a cubic foot, we say
which is 7 gallons plus one (-) quart plus three (Z) cups plus two (=) quarter-cups plus three (Z) tablespoons plus zero (0) drams, a total of 7660 drams almost exactly. Or we could just round off to 7.=, seven and a half gallons. (For the benefit of readers who might be a bit rusty on the details of these traditional European measurements, I should mention that there are four drams in a tablespoon, four tablespoons in a quarter cup, four quarter cups in a cup, four cups in a quart, and four quarts in a gallon, so 4⁵ = 1024 drams in a gallon and 7.4805195·4⁵ = 7660.052 drams in a cubic foot. Note also that these are volume (fluid) drams, not mass drams, which are different.) We can omit the decimal point (as the Telegu did) and write
and it is still clear where the integer part leaves off and the fraction begins, because we are using special symbols for the fractional part. But no, this isn't quite enough, because if we wrote 20ZZ= it might not be clear whether we meant 20.ZZ= or 2.0ZZ=. So the system has an elaboration. In the odd positions, we don't use the 0-=Z symbols; we use Q|HN instead. And we don't write 7-Z=Z0, we write
This is always unambiguous: 20.ZZ= is actually written 20NZH and 2.0ZZ= is written 2QZN=, quite different. This is all fanciful in English, but Telugu actually did this. Instead of 0-=Z they had ౦౼౽౾ as I mentioned before. And instead of Q|HN they had ౸౹౺౻. So if the Telugu were trying to write 7.4805195, where we had 7|ZHZQ they might have written ౭౹౾౺౾౸. Like us, they then appended an abbreviation for the unit of measurement. Instead of “gal.” for gallon they might have put ఘ (letter “gha”), so ౭౹౾౺౾౸ఘ. It's all reasonably straightforward, and also quite sensible. If you have ౭౹౾౺ tūmu, you can read off instantly that there are ౺ (two) sōlalu left over, just as you can see that $7.43 has three pennies left over. Notice that both sets of Telugu fraction digits are easy to remember: the digits for 3 have either three horizonal strokes ౾ or three vertical strokes ౻, and the others similarly. I have an idea that the alternating vertical-horizontal system might have served as an error-detection mechanism: if a digit is omitted, you notice right away because the next symbol is wrong. I find this delightful. A few years back I read all of The Number Concept: Its Origin and Development (1931) by Levi Leonard Conant, hoping to learn something really weird, and I was somewhat disappointed. Conant spends most of his book describing the number words and number systems used by dozens of cultures and almost all of them are based on ten, and a few on five or twenty. (“Any number system which passes the limit 10 is reasonably sure to have either a quinary, a decimal, or a vigesimal structure.”) But he does not mention Telugu! [Other articles in category /math] permanent link Sun, 05 Mar 2017
I said I had found an unusually difficult puzzle of this type, which is to make 2,5,6,6 total to 17. This is rather difficult. (I will reveal the solution later in this article.) Several people independently wrote to advise me that it is even more difficult to make 3,3,8,8 total to 24. They were right; it is amazingly difficult. After a couple of weeks I finally gave up and asked the computer, and when I saw the answer I didn't feel bad that I hadn't gotten it myself. (The solution is here if you want to give up without writing a program.) From now on I will abbreviate the two puzzles of the previous paragraph as «2 5 6 6 ⇒ 17» and «3 3 8 8 ⇒ 24», and others similarly. The article also inspired a number of people to write their own solvers and send them to me, and comparing them was interesting. My solver followed the tree search technique that I described in chapter 5 of Higher-Order Perl, and which has become so familiar to me that by now I can implement it without thinking about it very hard:
This is precisely a breadth-first search. To make it into depth-first
search, replace the queue with a stack. To make a heuristically
directed search, replace In my solver, each node contains a list of available expressions, annotated with its numerical value. Initially, the expressions are single numbers and the values are the same, say
Whether you represent expressions as strings or as something more structured depends on what you need to do with them at the end. If you just need to print them out, strings are good enough and are easy to handle. A node represents a successful search if it contains only a single expression and if the expression's value is the target sum, say 24:
From a node, the search should proceed by selecting two of the expressions, removing them from the node, selecting a legal operation, combining the two expressions into a single expression, and inserting the result back into the node. For example, from the initial node shown above, the search might continue by subtracting the fourth expression from the second:
or by multiplying the second and the third:
When the program encounters that first node it will construct both of these, and many others, and put them all into the queue to be investigated later. From
the search might proceed by dividing the first expression by the third:
Then perhaps by subtracting the first from the second:
From here there is no way to proceed, so when this node is removed from the queue, nothing is added to replace it. Had it been a winner, it would have been printed out, but since !!-\frac{35}3!! is not the target value of 24, it is silently discarded. To solve a puzzle of the «a b c d ⇒ t» sort requires examining a few thousand nodes. On modern hardware this takes approximately zero seconds. The actual code for my solver is a lot of Perl gobbledygook that may not be of general interest so I will provide a link for people who are interested in deciphering it. It also represents my second attempt: I lost the code that I described in the earlier article and had to rewrite it. It is rather bigger than I would have liked. Stuff goes wrongPeople showed me a lot of programs to solve this, and many didn't work. There are a few hard cases that several of them get wrong. FractionsSome puzzles require that some subexpressions have fractional values. Many of the programs people showed me used integer arithmetic (sometimes implicitly and unintentionally) and failed to solve those puzzles. We can detect this by asking for a solution to «2 5 6 6 ⇒ 17», which requires a fraction. The solution is !!6×(2+(5÷6))!!. A program using integer arithmetic will calculate !!5÷6 = 0!! and fail to recognize the solution. Several people on Twitter made this mistake and then mistakenly claimed that there was no solution at all. Usually it was possible to correct their programs by changing
to
or something like that. Some people also surprised me by claiming that I had lied when I stated that the puzzle could be solved without any “underhanded tricks”, and that the use of intermediate fractions was itself an underhanded trick. Your Honor, I plead not guilty. I originally described the puzzle this way:
The objectors are implicitly claiming that when you combine 5 and 6 with the “ordinary arithmetic operation” of division, you get something other than !!\frac56!!. This is an indefensible claim. I wasn't even trying to be tricky! It never occurred to me that fractions were something that some people would consider underhanded, and now that it has been suggested, I reject the suggestion. Folks, the result of division can be a fraction. Fractions are not some sort of obscure mathematical pettifoggery. They have been with us for at least 3,500 years now, so it is time everyone got used to them. Floating-point errorSome programs used floating-point arithmetic to deal with the fractions and then fell foul of floating-point error. I will defer discussion of this to a future article. I've complained about floating-point numbers on this blog before. ( 1 2 3 4 5 ) God, how I loathe them. Expression constructionA more subtle error that several programs made was to assume that all expressions can be constructed by combining a previous expression with a single input number. For example, to solve «2 3 5 7 ⇒ 24», you multiply 3 by 7 to get 21, then add 5 to get 26, then subtract 2 to get 24. But not every puzzle can be solved this way. Consider «2 3 5 7 ⇒ 41». You start by multiplying 2 by 3 to get 6, but if you try to combine the 6 with either 5 or 7 at this point you will lose. The only solution is to put the 6 aside and multiply 5 by 7 to get 35. Then add the 6 and the 35 to get 41. Another way to put this is that an unordered binary tree with 4 leaves can take two different shapes. (Imagine filling the green circles with numbers and the pink squares with operators.) The right-hand type of structure is sometimes necessary, as with «2 3 5 7 ⇒ 41». But several of the proposed solutions produced only expressions with structures like that on the left. Here's Sebastian Fischer's otherwise very elegant Haskell solution, in its entirety:
You can see the problem in the last line. Often the way these programs worked was to generate every possible permutation of the inputs and then apply the operators to the input lists stackwise: pop the first two values, combine them, push the result, and repeat. Here's a relevant excerpt from a program by Tim Dierks, this time in Python:
Here the expression structure is implicit, but the current result is always made by combining one of the input numbers with the old result. I have seen many people get caught by this and similar traps in the
past. I once posed the problem of enumerating all the strings of
balanced parentheses of a given length,
and several people assumed that all such strings have the form Division by zeroA less common error exhibited by some programs was a failure to properly deal with division by zero. «2 5 6 6 ⇒ 17» has a solution, and if a program dies while checking !!2+(5÷(6-6))!! and doesn't find the solution, that's a bug. Programs that workedIngo Blechschmidt (Haskell)Ingo Blechschmidt showed me a solution in
Haskell. The code is quite short.
M. Blechschmidt's program defines a synthetic expression type and an
evaluator for it. It defines a function By “synthetic expression type” I mean this:
Probably 80% of the Haskell programs ever written have something like this in them somewhere. This approach has a lot of boilerplate. For example, M. Blechschmidt's program then continues:
Having made up our own synonyms for the arithmetic operators ( I spent a while trying to shorten the code by using a less artificial expression type:
but I was disappointed; I was only able to cut it down by 18%, from 34 lines to 28. I hope to discuss this in a future article. By the way, “Blechschmidt” is German for “tinsmith”. Shreevatsa R. (Python)Shreevatsa R. showed me a solution in Python. It generates every possible expression and prints it out with its value. If you want to filter the voluminous output for a particular target value, you do that later. Shreevatsa wrote up an extensive blog article about this which also includes a discussion about eliminating duplicate expressions from the output. This is a very interesting topic, and I have a lot to say about it, so I will discuss it in a future article. Jeff Fowler (Ruby)Jeff Fowler of the Recurse Center wrote a compact solution in Ruby that he described as “hot garbage”. Did I say something earlier about Perl gobbledygook? It's nice that Ruby is able to match Perl's level of gobbledygookitude. This one seems to get everything right, but it fails mysteriously if I replace the floating-point constants with integer constants. He did provide a version that was not “egregiously minified” but I don't have it handy. Lindsey Kuper (Scheme)Lindsey Kuper wrote a series of solutions in the Racket dialect of Scheme, and discussed them on her blog along with some other people’s work. M. Kuper's first draft was 92 lines long (counting whitespace) and when I saw it I said “Gosh, that is way too much code” and tried writing my own in Scheme. It was about the same size. (My Perl solution is also not significantly smaller.) Martin Janecke (PHP)I saved the best for last. Martin Janecke showed me an almost flawless solution in PHP that uses a completely different approach than anyone else's program. Instead of writing a lot of code for generating permutations of the input, M. Janecke just hardcoded them:
Then three nested loops generate the selections of operators:
Expressions are constructed from templates:
(I don't think those templates are all necessary, but hey, whatever.)
Finally, another set of nested loops matches each ordering of the
input numbers with each selection of operators, uses
If loving this is wrong, I don't want to be right. It certainly satisfies Larry Wall's criterion of solving the problem before your boss fires you. The same approach is possible in most reasonable languages, and some unreasonable ones, but not in Haskell, which was specifically constructed to make this approach as difficult as possible. M. Janecke wrote up a blog article about this, in
German. He says “It's not an elegant
program and PHP is probably not an obvious choice for arithmetic
puzzles, but I think it works.” Indeed it does. Note that the use of
ThanksThanks to everyone who discussed this with me. In addition to the people above, thanks to Stephen Tu, Smylers, Michael Malis, Kyle Littler, Jesse Chen, Darius Bacon, Michael Robert Arntzenius, and anyone else I forgot. (If I forgot you and you want me to add you to this list, please drop me a note.) Coming upI have enough material for at least three or four more articles about this that I hope to publish here in the coming weeks. But the previous article on this subject ended similarly, saying
and that was in July 2016, so don't hold your breath. [ Addendum 20170820: the next article is ready. I hope you weren't holding your breath! ] [Other articles in category /math] permanent link Tue, 07 Feb 2017
How many 24 puzzles are there?
[ Note: The tables in this article are important, and look unusually crappy if you read this blog through an aggregator. The properly-formatted version on my blog may be easier to follow. ] A few months ago I wrote about puzzles of the following type: take four digits, say 1, 2, 7, 7, and, using only +, -, ×, and ÷, combine them to make the number 24. Since then I have been accumulating more and more material about these puzzles, which will eventually appear here. But meantime here is a delightful tangent. In the course of investigating this I wrote programs to enumerate the solutions of all possible puzzles, and these programs were always much faster than I expected at first. It appears as if there are 10,000 possible puzzles, from «0,0,0,0» through «9,9,9,9». But a moment's thought shows that there are considerably fewer, because, for example, the puzzles «7,2,7,1», «1,2,7,7», «7,7,2,1», and «2,7,7,1» are all the same puzzle. How many puzzles are there really? A back-of-the-envelope estimate is that only about 1 in 24 puzzles is really distinct (because there are typically 24 ways to rearrange the elements of a puzzle) and so there ought to be around !!\frac{10000}{24} \approx 417!! puzzles. This is an undercount, because there are fewer duplicates of many puzzles; for example there are not 24 variations of «1,2,7,7», but only 12. The actual number of puzzles turns out to be 715, which I think is not an obvious thing to guess. Let's write !!S(d,n)!! for the set of sequences of length !!n!! containing up to !!d!! different symbols, with the duplicates removed: when two sequences are the same except for the order of their symbols, we will consider them the same sequence. Or more concretely, we may imagine that the symbols are sorted into nondecreasing order, so that !!S(d,n)!! is the set of nondecreasing sequences of length !!n!! of !!d!! different symbols. Let's also write !!C(d,n)!! for the number of elements of !!S(d,n)!!. Then !!S(10, 4)!! is the set of puzzles where input is four digits. The claim that there are !!715!! such puzzles is just that !!C(10,4) = 715!!. A tabulation of !!C(\cdot,\cdot)!! reveals that it is closely related to binomial coefficients, and indeed that $$C(d,n)=\binom{n+d-1}{d-1}.\tag{$\heartsuit$}$$ so that the surprising !!715!! is actually !!\binom{13}{9}!!. This is not hard to prove by induction, because !!C(\cdot,\cdot)!! is easily shown to obey the same recurrence as !!\binom\cdot\cdot!!: $$C(d,n) = C(d-1,n) + C(d,n-1).\tag{$\spadesuit$}$$ To see this, observe that an element of !!C(d,n)!! either begins with a zero or with some other symbol. If it begins with a zero, there are !!C(d,n-1)!! ways to choose the remaining !!n-1!! symbols in the sequence. But if it begins with one of the other !!d-1!! symbols it cannot contain any zeroes, and what we really have is a length-!!n!! sequence of the symbols !!1\ldots (d-1)!!, of which there are !!C(d-1, n)!!.
Now we can observe that !!\binom74=\binom73!! (they are both 35) so that !!C(5,3) = C(4,4)!!. We might ask if there is a combinatorial proof of this fact, consisting of a natural bijection between !!S(5,3)!! and !!S(4,4)!!. Using the relation !!(\spadesuit)!! we have: $$ \begin{eqnarray} C(4,4) & = & C(3, 4) + & C(4,3) \\ C(5,3) & = & & C(4,3) + C(5,2) \\ \end{eqnarray}$$ so part of the bijection, at least, is clear: There are !!C(4,3)!! elements of !!S(4,4)!! that begin with a zero, and also !!C(4,3)!! elements of !!S(5, 3)!! that do not begin with a zero, so whatever the bijection is, it ought to match up these two subsets of size 20. This is perfectly straightforward; simply match up !!«0, a, b, c»!! (blue) with !!«a+1, b+1, c+1»!! (pink), as shown at right. But finding the other half of the bijection, between !!S(3,4)!! and !!S(5,2)!!, is not so straightforward. (Both have 15 elements, but we are looking for not just any bijection but for one that respects the structure of the elements.) We could apply the recurrence again, to obtain: $$ \begin{eqnarray} C(3,4) & = \color{darkred}{C(2, 4)} + \color{darkblue}{C(3,3)} \\ C(5,2) & = \color{darkblue}{C(4,2)} + \color{darkred}{C(5,1)} \end{eqnarray}$$ and since $$ \begin{eqnarray} \color{darkred}{C(2, 4)} & = \color{darkred}{C(5,1)} \\ \color{darkblue}{C(3,3)} & = \color{darkblue}{C(4,2)} \end{eqnarray}$$ we might expect the bijection to continue in that way, mapping !!\color{darkred}{S(2,4) \leftrightarrow S(5,1)}!! and !!\color{darkblue}{S(3,3) \leftrightarrow S(4,2)}!!. Indeed there is such a bijection, and it is very nice. To find the bijection we will take a detour through bitstrings. There is a natural bijection between !!S(d, n)!! and the bit strings that contain !!d-1!! zeroes and !!n!! ones. Rather than explain it with pseudocode, I will give some examples, which I think will make the point clear. Consider the sequence !!«1, 1, 3, 4»!!. Suppose you are trying to communicate this sequence to a computer. It will ask you the following questions, and you should give the corresponding answers:
At each stage the
computer asks about the identity of the next symbol. If the answer is
“yes” the computer has learned another symbol and moves on to the next
element of the sequence. If it is “no” the computer tries guessing a
different symbol. The “yes” answers become ones and “no”
answers become zeroes, so that the resulting bit string is It sometimes happens that the computer figures out all the elements of the sequence before using up its !!n+d-1!! questions; in this case we pad out the bit string with zeroes, or we can imagine that the computer asks some pointless questions to which the answer is “no”. For example, suppose the sequence is !!«0, 1, 1, 1»!!:
The bit string is We can reverse the process, simply taking over the role of the
computer. To find the sequence that corresponds to the bit string
We have recovered the sequence !!«1, 1, 2, 4»!! from the
bit string This correspondence establishes relation !!(\heartsuit)!! in a different way from before: since there is a natural bijection between !!S(d, n)!! and the bit strings with !!d-1!! zeroes and !!n!! ones, there are certainly !!\binom{n+d-1}{d-1}!! of them as !!(\heartsuit)!! says because there are !!n+d-1!! bits and we may choose any !!d-1!! to be the zeroes. We wanted to see why !!C(5,3) = C(4,4)!!. The detour above shows that there is a simple bijection between
on one hand, and between
on the other hand. And of course the bijection between the two sets of bit strings is completely obvious: just exchange the zeroes and the ones. The table below shows the complete bijection between !!S(4,4)!! and its descriptive bit strings (on the left in blue) and between !!S(5, 3)!! and its descriptive bit strings (on the right in pink) and that the two sets of bit strings are complementary. Furthermore the top portion of the table shows that the !!S(4,3)!! subsets of the two families correspond, as they should—although the correct correspondence is the reverse of the one that was displayed earlier in the article, not the suggested !!«0, a, b, c» \leftrightarrow «a+1, b+1, c+1»!! at all. Instead, in the correct table, the initial digit of the !!S(4,4)!! entry says how many zeroes appear in the !!S(5,3)!! entry, and vice versa; then the increment to the next digit says how many ones, and so forth.
Observe that since !!C(d,n) = \binom{n+d-1}{d-1} = \binom{n+d-1}{n} = C(n+1, d-1)!! we have in general that !!C(d,n) = C(n+1, d-1)!!, which may be surprising. One might have guessed that since !!C(5,3) = C(4,4)!!, the relation was !!C(d,n) = C(d+1, n-1)!! and that !!S(d,n)!! would have the same structure as !!S(d+1, n-1)!!, but it isn't so. The two arguments exchange roles. Following the same path, we can identify many similar ‘coincidences’. For example, there is a simple bijection between the original set of 715 puzzles, which was !!S(10,4)!!, and !!S(5,9)!!, the set of nondecreasing sequences of !!0\ldots 4!! of length !!9!!. [ Thanks to Bence Kodaj for a correction. ] [Other articles in category /math] permanent link Thu, 15 Dec 2016
Let's decipher a thousand-year-old magic square
The Parshvanatha temple in Madhya Pradesh, India was built around 1,050 years ago. Carved at its entrance is this magic square: The digit signs have changed in the past thousand years, but it's a quick and fun puzzle to figure out what they mean using only the information that this is, in fact, a magic square. A solution follows. No peeking until you've tried it yourself! There are 9 one-digit entries
It is tempting to imagine that is 4. But we can see it's not so. Adding up the rightmost column, we get
+ + + = so that must be an odd number. We know it isn't 1 (because is 1), and it can't be 7 or 9 because appears in the bottom row and there is no 17 or 19. So must be 3 or 5. Now if were 3, then would be 13, and the third column would be
+ + + = and then would be 10, which is too big. So must be 5, and this means that is 4 and is 8. ( appears only a as a single-digit numeral, which is consistent with it being 8.) The top row has
+ + + = so that + = 9. only appears as a single digit and we already used 8 so must be 7 or 9. But 9 is too big, so it must be 7, and then is 2. is the only remaining unknown single-digit numeral, and we already know 7 and 8, so is 9. The leftmost column tells us that is 16, and the last two entries, and are easily discovered to be 13 and 3. The decoded square is:
I like that people look at the right-hand column and immediately see 18 + 11 + 4 + 8 but it's actually 14 + 11 + 5 + 4. This is an extra-special magic square: not only do the ten rows, columns, and diagonals all add up to 34, so do all the four-cell subsquares, so do any four squares arranged symmetrically about the center, and so do all the broken diagonals that you get by wrapping around at the edges. [ Addendum: It has come to my attention that the digit symbols in the magic square are not too different from the current forms of the digit symbols in the Gujarati script. ] [ Addendum 20161217: The temple is not very close to Gujarat or to the area in which Gujarati is common, so I guess that the digit symbols in Indian languages have evolved in the past thousand years, with the Gujarati versions remaining closest to the ancient forms, or else perhaps Gujarati was spoken more widely a thousand years ago. I would be interested to hear about this from someone who knows. ] [ Addendum 20170130: Shreevatsa R. has contributed a detailed discussion of the history of the digit symbols. ] [Other articles in category /math] permanent link Sat, 30 Jul 2016
Decomposing a function into its even and odd parts
As I have mentioned before, I am not a sudden-flash-of-insight person. Every once in a while it happens, but usually my thinking style is to minutely examine a large mass of examples and then gradually synthesize some conclusion about them. I am a penetrating but slow thinker. But there have been a few occasions in my life when the solution to a problem struck me suddenly out of the blue. One such occasion was on the first day of my sophomore honors physics class in 1987. This was one of the best classes I took in my college career. It was given by Professor Stephen Nettel, and it was about resonance phenomena. I love when a course has a single overarching theme and proceeds to examine it in detail; that is all too rare. I deeply regret leaving my copy of the course notes in a restaurant in 1995. The course was very difficult, But also very satisfying. It was also somewhat hair-raising, because of Professor Nettel's habit of saying, all through the second half “Don't worry if it doesn't seem to make any sense, it will all come together for you during the final exam.” This was not reassuring. But he was right! It did all come together during the final exam. The exam had two sets of problems. The problems on the left side of the exam paper concerned some mechanical system, I think a rod fixed at one end and free at the other, or something like that. This set of problems asked us to calculate the resonant frequency of the rod, its rate of damping at various driving frequencies, and related matters. The right-hand problems were about an electrical system involving a resistor, capacitor, and inductor. The questions were the same, and the answers were formally identical, differing only in the details: on the left, the answers involved length, mass and stiffness of the rod, and on the right, the resistance, capacitance, and inductance of the electrical components. It was a brilliant exam, and I have never learned so much about a subject during the final exam. Anyway, I digress. After the first class, we were assigned homework. One of the problems was
(Maybe I should explain that an even function is one which is symmetric across the !!y!!-axis; formally it is a function !!f!! for which !!f(x) = f(-x)!! for every !!x!!. For example, the function !!x^2-4!!, shown below left. An odd function is one which is symmetric under a half-turn about the origin; formally it satisfies !!f(x) = -f(-x)!! for all !!x!!. For example !!\frac{x^3}{20}!!, shown below right.)
I found this claim very surprising, and we had no idea how to solve it. Well, not quite no idea: I knew that functions could be expanded in Fourier series, as the sum of a sine series and a cosine series, and the sine part was odd while the cosine part was even. But this seemed like a bigger hammer than was required, particularly since new sophomores were not expected to know about Fourier series. I had the privilege to be in that class with Ron Buckmire, and I remember we stood outside the class building in the autumn sunshine and discussed the problem. I might have been thinking that perhaps there was some way to replace the negative part of !!f!! with a reflected copy of the positive part to make an even function, and maybe that !!f(x) + f(-x)!! was always even, when I was hit from the blue with the solution: $$ \begin{align} f_e(x) & = \frac{f(x) + f(-x)}2 \text{ is even},\\ f_o(x) & = \frac{f(x) - f(-x)}2 \text{ is odd, and}\\ f(x) &= f_e(x) + f_o(x) \end{align} $$ So that was that problem solved. I don't remember the other three problems in that day's homework, but I have remembered that one ever since. But for some reason, it didn't occur to me until today to think about what those functions actually looked like. Of course, if !!f!! itself is even, then !!f_e = f!! and !!f_o = 0!!, and similarly if !!f!! is odd. But most functions are neither even nor odd. For example, consider the function !!2^x!!, which is neither even nor odd. Then we get $$ \begin{align} f_e(x) & = \frac{2^x + 2^{-x}}2\\ f_o(x) & = \frac{2^x - 2^{-x}}2 \end{align} $$ The graph is below left. The solid red line is !!2^x!!, and the blue and purple dotted lines are !!f_e!! and !!f_o!!. The red line is the sum of the blue and purple lines. I thought this was very interesting-looking, but a little later I realized that I had already known what these graphs would look like, because !!2^x!! is just like !!e^x!!, and for !!e^x!! the even and odd components are exactly the familiar !!\cosh!! and !!\sinh!! functions. (Below left, !!2^x!!; below right, !!e^x!!.)
I wasn't expecting polynomials to be more interesting, but they were. (Polynomials whose terms are all odd powers of !!x!!, such as !!x^{13} - 4x^5 + x!!, are always odd functions, and similarly polynomials whose terms are all even powers of !!x!! are even functions.) For example, consider !!(x-1)^2!!, which is neither even nor odd. We don't even need the !!f_e!! and !!f_o!! formulas to separate this into even and odd parts: just expand !!(x-1)^2!! as !!x^2 - 2x + 1!! and separate it into odd and even powers, !!-2x!! and !!x^2 + 1!!:
Or we could do !!\frac{(x-1)^3}3!! similarly, expanding it as !!\frac{x^3}3 - x^2 + x -\frac13!! and separating this into !!-x^2 -\frac13!! and !!\frac{x^3}3 + x!!:
I love looking at these and seeing how the even blue line and the odd purple line conspire together to make whatever red line I want. I kept wanting to try familiar simple functions, like !!\frac1x!!, but many of these are either even or odd, and so are uninteresting for this application. But you can make an even or an odd function into a neither-even-nor-odd function just by translating it horizontally, which you do by replacing !!x!! with !!x-c!!. So the next function I tried was !!\frac1{x+1}!!, which is the translation of !!\frac 1x!!. Here I got a surprise. I knew that !!\frac1{x+1}!! was undefined at !!x=-1!!, so I graphed it only for !!x>-1!!. But the even component is !!\frac12\left(\frac1{1+x}+\frac1{1-x}\right)!!, which is undefined at both !!x=-1!! and at !!x=+1!!. Similarly the odd component is undefined at two points. So the !!f = f_o + f_e!! formula does not work quite correctly, failing to produce the correct value at !!x=1!!, even though !!f!! is defined there. In general, if !!f!! is undefined at some !!x=c!!, then the decomposition into even and odd components fails at !!x=-c!! as well. The limit $$\lim_{x\to -c} f(x) = \lim_{x\to -c} \left(f_o(x) + f_e(x)\right)$$ does hold, however. The graph below shows the decomposition of !!\frac1{x+1}!!.
Vertical translations are uninteresting: they leave !!f_o!! unchanged and translate !!f_e!! by the same amount, as you can verify algebraically or just by thinking about it. Following the same strategy I tried a cosine wave. The evenness of the cosine function is one of its principal properties, so I translated it and used !!\cos (x+1)!!. The graph below is actually for !!5\cos(x+1)!! to prevent the details from being too compressed:
This reminded me of the time I was fourteen and graphed !!\sin x + \cos x!! and was surprised to see that it was another perfect sinusoid. But I realized that there was a simple way to understand this. I already knew that !!\cos(x + y) = \sin x\cos y + \sin y \cos x!!. If you take !!y=\frac\pi4!! and multiply the whole thing by !!\sqrt 2!!, you get $$\sqrt2\cos\left(x + \frac\pi4\right) = \sqrt2\sin x\cos\frac\pi4 + \sqrt2\cos x\sin\frac\pi4 = \sin x + \cos x$$ so that !!\sin x + \cos x!! is just a shifted, scaled cosine curve. The decomposition of !!\cos(x+1)!! is even simpler because you can work forward instead of backward and find that !!\cos(x+1) = \sin x\cos 1 + \cos x \sin 1!!, and the first term is odd while the second term is even, so that !!\cos(x+1)!! decomposes as a sum of an even and an odd sinusoid as you see in the graph above. Finally, I tried a Poisson distribution, which is highly asymmetric. The formula for the Poisson distribution is !!\frac{\lambda^xe^\lambda}{x!}!!, for some constant !!\lambda!!. The !!x! !! in the denominator is only defined for non-negative integer !!x!!, but you can extend it to fractional and negative !!x!! in the usual way by using !!\Gamma(x+1)!! instead, where !!\Gamma!! is the Gamma function. The !!\Gamma!! function is undefined at zero and negative integers, but fortunately what we need here is the reciprocal gamma function !!\frac1{\Gamma(x)}!!, which is perfectly well-behaved. The results are spectacular. The graph below has !!\lambda = 0.8!!.
The part of this with !!x\ge 0!! is the most interesting to me, because the Poisson distribution has a very distinctive shape, and once again I like seeing the blue and purple !!\Gamma!! functions working together to make it. I think it's just great how the red line goes gently to zero as !!x!! increases, even though the even and the odd components are going wild. (!!x! !! increases rapidly with !!x!!, so the reciprocal !!\Gamma!! function goes rapidly to zero. But the even and odd components also have a !!\frac1{\Gamma(-x)}!! part, and this is what dominates the blue and purple lines when !!x >4!!.) On the !!x\lt 0!! side it has no meaning for me, and it's just wiggly lines. It hadn't occurred to me before that you could extend the Poisson distribution function to negative !!x!!, and I still can't imagine what it could mean, but I suppose why not. Probably some statistician could explain to me what the Poisson distribution is about when !!x<0!!. You can also consider the function !!\sqrt x!!, which breaks down completely, because either !!\sqrt x!! or !!\sqrt{-x}!! is undefined except when !!x=0!!. So the claim that every function is the sum of an even and an odd function fails here too. Except perhaps not! You could probably consider the extension of the square root function to the complex plane, and take one of its branches, and I suppose it works out just fine. The geometric interpretation of evenness and oddness are very different, of course, and you can't really draw the graphs unless you have four-dimensional vision. I have no particular point to make, except maybe that math is fun, even elementary math (or perhaps especially elementary math) and it's fun to see how it works out. The beautiful graphs in this article were made with Desmos. I had dreaded having to illustrate my article with graphs from Gnuplot (ugh) or Wolfram|α (double ugh) and was thrilled to find such a handsome alternative. [ Addendum: I've just discovered that in Desmos you can include a parameter in the functions that it graphs, and attach the parameter to a slider. So for example you can arrange to have it display !!(x+k)^3!! or !!e^{-(x+k)^2}!!, with the value of !!k!! controlled by the slider, and have the graph move left and right on the plane as you adjust the slider, with its even and odd parts changing in real time to match. ] [ For example, check out travelling Gaussians or varying sinusoid. ] [Other articles in category /math] permanent link Tue, 12 Jul 2016
A simple but difficult arithmetic puzzle
Lately my kids have been interested in puzzles of this type: You are given a sequence of four digits, say 1,2,3,4, and your job is to combine them with ordinary arithmetic operations (+, -, ×, and ÷) in any order to make a target number, typically 24. For example, with 1,2,3,4, you can go with $$((1+2)+3)×4 = 24$$ or with $$4×((2×3)×1) = 24.$$ We were stumped trying to make 6,6,5,2 total 24, so I hacked up a solver; then we felt a little foolish when we saw the solutions, because it is not that hard. But in the course of testing the solver, I found the most challenging puzzle of this type that I've ever seen. It is:
There are no underhanded tricks. For example, you may not concatenate 2 and 5 to make 25; you may not say !!6÷6=1!! and !!5+2=7!! and concatenate 1 and 7 to make !!17!!; you may not interpret the 17 as a base 12 numeral, etc. I hope to write a longer article about solvers in the next week or so. [ Addendum 20170305: The next week or so, ha ha. Anyway, here it is. ] [Other articles in category /math] permanent link Wed, 20 Apr 2016A classic puzzle of mathematics goes like this:
(The puzzle is, what just happened?) It's not hard to come up with variations on this. For example, picking three fractions at random, suppose the will says that the eldest child receives half the horses, the middle child receives one-fifth, and the youngest receives one-seventh. But the estate has only 59 horses and an argument ensues. All that is required for the sage to solve the problem is to lend the estate eleven horses. There are now 70, and after taking out the three bequests, !!70 - 35 - 14 - 10 = 11!! horses remain and the estate settles its debt to the sage. But here's a variation I've never seen before. This time there are 13 horses and the will says that the three children should receive shares of !!\frac12, \frac13,!! and !!\frac14!!. respectively. Now the problem seems impossible, because !!\frac12 + \frac13 + \frac14 \gt 1!!. But the sage is equal to the challenge! She leaps into the saddle of one of the horses and rides out of sight before the astonished heirs can react. After a day of searching the heirs write off the lost horse and proceed with executing the will. There are now only 12 horses, and the eldest takes half, or six, while the middle sibling takes one-third, or 4. The youngest heir should get three, but only two remain. She has just opened her mouth to complain at her unfair treatment when the sage rides up from nowhere and hands her the reins to her last horse. [Other articles in category /math] permanent link Fri, 18 Dec 2015I only posted three answers in August, but two of them were interesting.
I did ask a question this month: I was looking for a simpler version of the dogbone space construction. The dogbone space is a very peculiar counterexample of general topology, originally constructed by R.H. Bing. I mentioned it here in 2007, and said, at the time:
I did try to read it, but I did not try very hard, and I did not understand it. So my question this month was if there was a simpler example of the same type. I did not receive an answer, just a followup comment that no, there is no such example. [Other articles in category /math/se] permanent link Sun, 16 Aug 2015My overall SE posting volume was down this month, and not only did I post relatively few interesting items, I've already written a whole article about the most interesting one. So this will be a short report.
[Other articles in category /math/se] permanent link Tue, 28 Jul 2015A few months ago I wrote an article here called an ounce of theory is worth a pound of search and I have a nice followup. When I went looking for that article I couldn't find it, because I thought it was about how an ounce of search is worth a pound of theory, and that I was writing a counterexample. I am quite surprised to discover that that I have several times discussed how a little theory can replace a lot of searching, and not vice versa, but perhaps that is because the search is my default. Anyway, the question came up on math StackExchange today:
OP opined no, but had no argument. The first answer that appeared was somewhat elaborate and outlined a computer search strategy which claimed to reduce the search space to only 14,553 items. (I think the analysis is wrong, but I agree that the search space is not too large.) I almost wrote the search program. I have a program around that is something like what would be needed, although it is optimized to deal with a few oddly-shaped tiles instead of many similar tiles, and would need some work. Fortunately, I paused to think a little before diving in to the programming. For there is an easy answer. Suppose John solved the problem. Look at just one of the 7×11 faces of the big box. It is a 7×11 rectangle that is completely filled by 1×3 and 3×3 rectangles. But 7×11 is not a multiple of 3. So there can be no solution. Now how did I think of this? It was a very geometric line of reasoning. I imagined a 7×11×9 carton and imagined putting the small boxes into the carton. There can be no leftover space; every one of the 693 cells must be filled. So in particular, we must fill up the bottom 7×11 layer. I started considering how to pack the bottommost 7×11×1 slice with just the bottom parts of the small boxes and quickly realized it couldn't be done; there is always an empty cell left over somewhere, usually in the corner. The argument about considering just one face of the large box came later; I decided it was clearer than what I actually came up with. I think this is a nice example of the Pólya strategy “solve a simpler problem” from How to Solve It, but I was not thinking of that specifically when I came up with the solution. For a more interesting problem of the same sort, suppose you have six 2×2x1 slabs and three extra 1×1×1 cubes. Can you pack the nine pieces into a 3×3x3 box? [Other articles in category /math] permanent link Sun, 19 Jul 2015[ Notice: I originally published this report at the wrong URL. I moved it so that I could publish the June 2015 report at that URL instead. If you're seeing this for the second time, you might want to read the June article instead. ] A lot of the stuff I've written in the past couple of years has been on Mathematics StackExchange. Some of it is pretty mundane, but some is interesting. I thought I might have a little meta-discussion in the blog and see how that goes. These are the noteworthy posts I made in April 2015.
[Other articles in category /math/se] permanent link Fri, 03 Jul 2015
The annoying boxes puzzle: solution
There are two boxes on a table, one red and one green. One contains a treasure. The red box is labelled "exactly one of the labels is true". The green box is labelled "the treasure is in this box."It's not too late to try to solve this before reading on. If you want, you can submit your answer here:
Results
66.52% 300 red 25.72 116 not-enough-info 3.55 16 green 2.00 9 other 1.55 7 spam 0.44 2 red-with-qualification 0.22 1 attack 100.00 451 TOTAL One-quarter of respondents got
the right answer, that there is not enough information
given to solve the problem, Two-thirds of respondents said the
treasure was in the red box.
This is wrong. The treasure
is in the green box.
What?Let me show you. I stated:
The labels are as I said. Everything I told you was literally true. The treasure is definitely not in the red box. No, it is actually in the green box. (It's hard to see, but one of the items in the green box is the gold and diamond ring made in Vienna by my great-grandfather, which is unquestionably a real treasure.) So if you said the treasure must be in the red box, you were simply mistaken. If you had a logical argument why the treasure had to be in the red box, your argument was fallacious, and you should pause and try to figure out what was wrong with it. I will discuss it in detail below.
SolutionThe treasure is undeniably in the green box. However, correct answer to the puzzle is "no, you cannot figure out which box contains the treasure". There is not enough information given. (Notice that the question was not “Where is the treasure?” but “Can you figure out…?”)
(Fallacious) Argument AMany people erroneously conclude that the treasure is in the red box, using reasoning something like the following:
What's wrong with argument A?Here are some responses people commonly have when I tell them that argument A is fallacious:
"If the treasure is in the green box, the red label is lying." Not quite, but argument A explicitly considers the possibility that the red label was false, so what's the problem?
"If the treasure is in the green box, the red label is inconsistent." It could be. Nothing in the puzzle statement ruled this out. But actually it's not inconsistent, it's just irrelevant.
"If the treasure is in the green box, the red label is meaningless." Nonsense. The meaning is plain: it says “exactly one of these labels is true”, and the meaning is that exactly one of the labels is true. Anyone presenting argument A must have understood the label to mean that, and it is incoherent to understand it that way and then to turn around and say that it is meaningless! (I discussed this point in more detail in 2007.)
"But the treasure could have been in the red box." True! But it is not, as you can see in the pictures. The puzzle does not give enough information to solve the problem. If you said that there was not enough information, then congratulations, you have the right answer. The answer produced by argument A is incontestably wrong, since it asserts that the treasure is in the red box, when it is not.
"The conditions supplied by the puzzle statement are inconsistent." They certainly are not. Inconsistent systems do not have models, and in particular cannot exist in the real world. The photographs above demonstrate a real-world model that satisfies every condition posed by the puzzle, and so proves that it is consistent.
"But that's not fair! You could have made up any random garbage at all, and then told me afterwards that you had been lying." Had I done that, it would have been an unfair puzzle. For example, suppose I opened the boxes at the end to reveal that there was no treasure at all. That would have directly contradicted my assertion that "One [box] contains a treasure". That would have been cheating, and I would deserve a kick in the ass.But I did not do that. As the photograph shows, the boxes, their colors, their labels, and the disposition of the treasure are all exactly as I said. I did not make up a lie to trick you; I described a real situation, and asked whether people they could diagnose the location of the treasure. (Two respondents accused me of making up lies. One said: There is no treasure. Both labels are lying. Look at those boxes. Do you really think someone put a treasure in one of them just for this logic puzzle?What can I say? I did put a treasure in a box just for this logic puzzle. Some of us just have higher standards.)
"But what about the labels?" Indeed! What about the labels?
The labels are worthlessThe labels are red herrings; the provide no information. Consider the following version of the puzzle:
There are two boxes on a table, one red and one green. One contains a treasure.Obviously, the problem cannot be solved from the information given. Now consider this version:
There are two boxes on a table, one red and one green. One contains a treasure. The red box is labelled "gcoadd atniy fnck z fbi c rirpx hrfyrom". The green box is labelled "ofurb rz bzbsgtuuocxl ckddwdfiwzjwe ydtd."One is similarly at a loss here. (By the way, people who said one label was meaningless: this is what a meaningless label looks like.)
There are two boxes on a table, one red and one green. One contains a treasure. The red box is labelled "exactly one of the labels is true". The green box is labelled "the treasure is in this box."The point being that in the absence of additional information, there is no reason to believe that the labels give any information about the contents of the boxes, or about labels, or about anything at all. This should not come as a surprise to anyone. It is true not just in annoying puzzles, but in the world in general. A box labeled “fresh figs” might contain fresh figs, or spoiled figs, or angry hornets, or nothing at all. Why doesn't every logic puzzle fall afoul of this problem?I said as part of the puzzle conditions that there was a treasure in one box. For a fair puzzle, I am required to tell the truth about the puzzle conditions. Otherwise I'm just being a jerk.Typically the truth or falsity of the labels is part of the puzzle conditions. Here's a typical example, which I took from Raymond Smullyan's What is the name of this book? (problem 67a):
… She had the following inscriptions put on the caskets:Notice that the problem condition gives the suitor a certification about the truth of the labels, on which he may rely. In the quotation above, the certification is in boldface. A well-constructed puzzle will always contain such a certification, something like “one label is true and one is false” or “on this island, each person always lies, or always tells the truth”. I went to What is the Name of this Book? to get the example above, and found more than I had bargained for: problem 70 is exactly the annoying boxes problem! Smullyan says:
Good heavens, I can take any number of caskets that I please and put an object in one of them and then write any inscriptions at all on the lids; these sentences won't convey any information whatsoever.(Page 65) Had I known ahead of time that Smullyan had treated the exact same topic with the exact same example, I doubt I would have written this post at all.
But why is this so surprising?I don't know.
Final notes16 people correctly said that the treasure was in the green box. This has to be counted as a lucky guess, unacceptable as a solution to a logic puzzle.One respondent referred me to a similar post on lesswrong. I did warn you all that the puzzle was annoying. I started writing this post in October 2007, and then it sat on the shelf until I got around to finding and photographing the boxes. A triumph of procrastination! [ Addendum 20150911: Steven Mazie has written a blog article about this topic, A Logic Puzzle That Teaches a Life Lesson. ] [Other articles in category /math/logic] permanent link Wed, 01 Jul 2015
The annoying boxes puzzle
There are two boxes on a table, one red and one green. One contains a treasure. The red box is labelled "exactly one of the labels is true". The green box is labelled "the treasure is in this box."Starting on 2015-07-03, the solution will be here.
[Other articles in category /math/logic] permanent link Fri, 19 Jun 2015A lot of the stuff I've written in the past couple of years has been on math.StackExchange. Some of it is pretty mundane, but some is interesting. My summary of April's interesting posts was well-received, so here are the noteworthy posts I made in May 2015.
[ Addendum 20150619: A previous version of this article included the delightful typo “mathemativicians”. ] [Other articles in category /math/se] permanent link Sun, 14 Jun 2015[ This page originally held the report for April 2015, which has moved. It now contains the report for June 2015. ]
[Other articles in category /math/se] permanent link Fri, 20 Mar 2015
Rectangles with equal area and perimeter
Wednesday while my 10-year-old daughter Katara was doing her math homework, she observed with pleasure that a !!6×3!! rectangle has a perimeter of 18 units and also an area of 18 square units. I mentioned that there was an infinite family of such rectangles, and, after a small amount of tinkering, that the only other such rectangle with integer sides is a !!4×4!! square, so in a sense she had found the single interesting example. She was very interested in how I knew this, and I promised to show her how to figure it out once she finished her homework. She didn't finish before bedtime, so we came back to it the following evening. This is just one of many examples of how she has way too much homework, and how it interferes with her education. She had already remarked that she knew how to write an equation expressing the condition she wanted, so I asked her to do that; she wrote $$(L×W) = ([L+W]×2).$$ I remember being her age and using all different shapes of parentheses too. I suggested that she should solve the equation for !!W!!, getting !!W!! on one side and a bunch of stuff involving !!L!! on the other, but she wasn't sure how to do it, so I offered suggestions while she moved the symbols around, eventually obtaining $$W = 2L\div (L-2).$$ I would have written it as a fraction, but getting the right answer is important, and using the same notation I would use is much less so, so I didn't say anything. I asked her to plug in !!L=3!! and observe that !!W=6!! popped right out, and then similarly that !!L=6!! yields !!W=3!!, and then I asked her to try the other example she knew. Then I suggested that she see what !!L=5!! did: it gives !!W=\frac{10}3!!, This was new, so she checked it by calculating the area and the perimeter, both !!\frac{50}3!!. She was very excited by this time. As I have mentioned earlier, algebra is magical in its ability to mechanically yield answers to all sorts of questions. Even after thirty years I find it astonishing and delightful. You set up the equations, push the symbols around, and all sorts of stuff pops out like magic. Calculus is somehow much less astonishing; the machinery is all explicit. But how does algebra work? I've been thinking about this on and off for a long time and I'm still not sure. At that point I took over because I didn't think I would be able to guide her through the next part of the problem without a demonstration; I wanted to graph the function !!W=2L\div(L-2)!! and she does not have much experience with that. She put in the five points we already knew, which already lie on a nice little curve, and then she asked an incisive question: does it level off, or does it keep going down, or what? We discussed what happens when !!L!! gets close to 2; then !!W!! shoots up to infinity. And when !!L!! gets big, say a million, you can see from the algebra that !!W!! is a hair more than 2. So I drew in the asymptotes on the hyperbola. Katara is not yet familiar with hyperbolas. (She has known about parabolas since she was tiny. I have a very fond memory of visiting Portland with her when she was almost two, and we entered Holladay park, which has fountains that squirt out of the ground. Seeing the water arching up before her, she cried delightedly “parabolas!”) Once you know how the graph behaves, it is a simple matter to see that there are no integer solutions other than !!\langle 3,6\rangle, \langle 4,4\rangle,!! and !!\langle6,3\rangle!!. We know that !!L=5!! does not work. For !!L>6!! the value of !!W!! is always strictly between !!2!! and !!3!!. For !!L=2!! there is no value of !!W!! that works at all. For !!0\lt L\lt 2!! the formula says that !!W!! is negative, on the other branch of the hyperbola, which is a perfectly good numerical solution (for example, !!L=1, W=-2!!) but makes no sense as the width of a rectangle. So it was a good lesson about how mathematical modeling sometimes introduces solutions that are wrong, and how you have to translate the solutions back to the original problem to see if they make sense. [ Addendum 20150330: Thanks to Steve Hastings for his plot of the hyperbola, which is in the public domain. ] [Other articles in category /math] permanent link Thu, 19 Mar 2015
An ounce of theory is worth a pound of search
The computer is really awesome at doing quick searches for numbers with weird properties, and people with an amateur interest in recreational mathematics would do well to learn some simple programming. People appear on math.stackexchange quite often with questions about tic-tac-toe, but there are only 5,478 total positions, so any question you want to ask can be instantaneously answered by an exhaustive search. An amateur showed up last fall asking “Is it true that no prime larger than 241 can be made by either adding or subtracting 2 coprime numbers made up out of the prime factors 2,3, and 5?” and, once you dig through the jargon, the question is easily answered by the computer, which quickly finds many counterexamples, such as !!162+625=787!! and !!2^{19}+3^4=524369!!. But sometimes the search appears too large to be practical, and then you need to apply theory. Sometimes you can deploy a lot of theory and solve the problem completely, avoiding the search. But theory is expensive, and not always available. A hybrid approach often works, which uses a tiny amount of theory to restrict the search space to the point where the search is easy. One of these I wrote up on this blog back in 2006:
The programmer who gave me thie problem had tried a brute-force search over all numbers, but to find all 10-digit excellent numbers, this required an infeasible search of 9,000,000,000 candidates. With the application of a tiny amount of algebra, one finds that !!a(10^k+a) = b^2+b!! and it's not hard to quickly test candidates for !!a!! to see if !!a(10^k+a)!! has this form and if so to find the corresponding value of !!b!!. (Details are in the other post.) This reduces the search space for 10-digit excellent numbers from 9,000,000,000 candidates to 90,000, which could be done in under a minute even with last-century technology, and is pretty nearly instantaneous on modern equipment. But anyway, the real point of this note is to discuss a different problem entirely. A recreational mathematician on stackexchange wanted to find distinct integers !!a,b,c,d!! for which !!a^2+b^2, b^2+c^2, c^2+d^2, !! and !!d^2+a^2!! were all perfect squares. You can search over all possible quadruples of numbers, but this takes a long time. The querent indicated later that he had tried such a search but lost patience before it yielded anything. Instead, observe that if !!a^2+b^2!! is a perfect square then !!a!! and !!b!! are the legs of a right triangle with integer sides; they are terms in what is known as a Pythagorean triple. The prototypical example is !!3^2 + 4^2 = 5^2!!, and !!\langle 3,4,5\rangle!! is the Pythagorean triple. (The querent was quite aware that he was asking for Pythagorean triples, and mentioned them specifically.) Here's the key point: It has been known since ancient times that if !!\langle a,b,c\rangle!! is a Pythagorean triple, then there exist integers !!m!! and !!n!! such that: $$\begin{align} \require{align} a & = n^2-m^2 \\ b & = 2mn \\ c & = n^2 + m^2 \end{align}$$ So you don't have to search for Pythagorean triples; you can just generate them with no searching:
This builds a hash table,
The table has only around 40,000 entries. Having constructed it, we now search it:
The outer loop runs over each !!a!! that is known to be a member of a Pythagorean triple. (Actually the !!m,n!! formulas show that every number bigger than 2 is a member of some triple, but we may as well skip the ones that are only in triples we didn't tabulate.) Then the next loop runs over every !!b!! that can possibly form a triple with !!a!!; that is, every !!b!! for which !!a^2+b^2!! is a perfect square. We don't have to search for them; we have them tabulated ahead of time. Then for each such !!b!! (and there aren't very many) we run over every !!c!! that forms a triple with !!b!!, and again there is no searching and very few candidates. Then then similarly !!d!!, and if the !!d!! we try forms a triple with !!a!!, we have a winner. The This runs in less than a second on so-so hardware and produces 11 solutions:
Only five of these are really different. For example, the last one is the same as the second, with every element multiplied by 2; the third, seventh, and eighth are similarly the same. In general if !!\langle a,b,c,d\rangle!! is a solution, so is !!\langle ka, kb,kc,kd\rangle!! for any !!k!!. A slightly improved version would require that the four numbers not have any common factor greater than 1; there are few enough solutions that the cost of this test would be completely negligible. The only other thing wrong with the program is that it produces each solution 8 times; if !!\langle a,b,c,d\rangle!! is a solution, then so are !!\langle b,c,d,a\rangle, \langle d,c,b,a\rangle,!! and so on. This is easily fixed with a little post-filtering; pipe the output through
or something of that sort. The corresponding run with !!m!! and !!n!! up to 2,000 instead of only 200 takes 5 minutes and finds 445 solutions, of which 101 are distinct, including !!\langle 3614220, 618192, 2080820, 574461\rangle!!. It would take a very long time to find this with a naïve search. [ For a much larger and more complex example of the same sort of thing, see When do !!n!! and !!2n!! have the same digits?. I took a seemingly-intractable problem and analyzed it mathematically. I used considerably more than an ounce of theory in this case, and while the theory was not enough to solve the problem, it was enough to reduce the pool of candates to the point that a computer search was feasible. ] [ Addendum 20150728: Another example ] [Other articles in category /math] permanent link Sat, 22 Nov 2014
Within this instrument, resides the Universe
When opportunity permits, I have been trying to teach my ten-year-old daughter Katara rudiments of algebra and group theory. Last night I posed this problem:
I have tried to teach Katara that these problems have several phases. In the first phase you translate the problem into algebra, and then in the second phase you manipulate the symbols, almost mechanically, until the answer pops out as if by magic. There is a third phase, which is pedagogically and practically essential. This is to check that the solution is correct by translating the results back to the context of the original problem. It's surprising how often teachers neglect this step; it is as if a magician who had made a rabbit vanish from behind a screen then forgot to take away the screen to show the audience that the rabbit had vanished. Katara set up the equations, not as I would have done, but using four unknowns, to represent the two ages today and the two ages in the future: $$\begin{align} MT & = 3ST \\ MY & = 2SY \\ \end{align} $$ (!!MT!! here is the name of a single variable, not a product of !!M!! and !!T!!; the others should be understood similarly.) “Good so far,” I said, “but you have four unknowns and only two equations. You need to find two more relationships between the unknowns.” She thought a bit and then wrote down the other two relations: $$\begin{align} MY & = MT + 2 \\ SY & = ST + 2 \end{align} $$ I would have written two equations in two unknowns: $$\begin{align} M_T & = 3S_T\\ M_T+2 & = 2(S_T + 2) \end{align} $$ but one of the best things about mathematics is that there are many ways to solve each problem, and no method is privileged above any other except perhaps for reasons of practicality. Katara's translation is different from what I would have done, and it requires more work in phase 2, but it is correct, and I am not going to tell her to do it my way. The method works both ways; this is one of its best features. If the problem can be solved by thinking of it as a problem in two unknowns, then it can also be solved by thinking of it as a problem in four or in eleven unknowns. You need to find more relationships, but they must exist and they can be found. Katara may eventually want to learn a technically easier way to do it, but to teach that right now would be what programmers call a premature optimization. If her formulation of the problem requires more symbol manipulation than what I would have done, that is all right; she needs practice manipulating the symbols anyway. She went ahead with the manipulations, reducing the system of four equations to three, then two and then one, solving the one equation to find the value of the single remaining unknown, and then substituting that value back to find the other unknowns. One nice thing about these simple problems is that when the solution is correct you can see it at a glance: Mary is six years old and Sue is two, and in two years they will be eight and four. Katara loves picking values for the unknowns ahead of time, writing down a random set of relations among those values, and then working the method and seeing the correct answer pop out. I remember being endlessly delighted by almost the same thing when I was a little older than her. In The Dying Earth Jack Vance writes of a wizard who travels to an alternate universe to learn from the master “the secret of renewed youth, many spells of the ancients, and a strange abstract lore that Pandelume termed ‘Mathematics.’”
After Katara had solved this problem, I asked if she was game for something a little weird, and she said she was, so I asked her:
“WHAAAAAT?” she said. She has a good number sense, and immediately saw that this was a strange set of conditions. (If they aren't the same age now, how can they be the same age in two years?) She asked me what would happen. I said (truthfully) that I wasn't sure, and suggested she work through it to find out. So she set up the equations as before and worked out the solution, which is obvious once you see it: Both girls are zero years old today, and zero is three times as old as zero. Katara was thrilled and delighted, and shared her discovery with her mother and her aunt. There are some powerful lessons here. One is that the method works even when the conditions seem to make no sense; often the results pop out just the same, and can sometimes make sense of problems that seem ill-posed or impossible. Once you have set up the equations, you can just push the symbols around and the answer will emerge, like a familiar building approached through a fog. But another lesson, only hinted at so far, is that mathematics has its own way of understanding things, and this is not always the way that humans understand them. Goethe famously said that whatever you say to mathematicians, they immediately translate it into their own language and then it is something different; I think this is exactly what he meant. In this case it is not too much of a stretch to agree that Mary is three times as old as Sue when they are both zero years old. But in the future I plan to give Katara a problem that requires Mary and Sue to have negative ages—say that Mary is twice as old as Sue today, but in three years Sue will be twice as old—to demonstrate that the answer that pops out may not be a reasonable one, or that the original translation into mathematics can lose essential features of the original problem. The solution that says that !!M_T=-2, S_T=-1 !! is mathematically irreproachable, and if the original problem had been posed as “Find two numbers such that…” it would be perfectly correct. But translated back to the original context of a problem that asks about the ages of two sisters, the solution is unacceptable. This is the point of the joke about the spherical cow. [Other articles in category /math] permanent link Wed, 23 Jul 2014
When do n and 2n have the same digits?
[This article was published last month on the math.stackexchange blog, which seems to have died young, despite many earnest-sounding promises beforehand from people who claimed they would contribute material. I am repatriating it here.] A recent question on math.stackexchange asks for the smallest positive integer !!A!! for which the number !!2A!! has the same decimal digits in some other order. Math geeks may immediately realize that !!142857!! has this property, because it is the first 6 digits of the decimal expansion of !!\frac 17!!, and the cyclic behavior of the decimal expansion of !!\frac n7!! is well-known. But is this the minimal solution? It is not. Brute-force enumeration of the solutions quickly reveals that there are 12 solutions of 6 digits each, all permutations of !!142857!!, and that larger solutions, such as 1025874 and 1257489 seem to follow a similar pattern. What is happening here? Stuck in Dallas-Fort Worth airport one weekend, I did some work on the problem, and although I wasn't able to solve it completely, I made significant progress. I found a method that allows one to hand-calculate that there is no solution with fewer than six digits, and to enumerate all the solutions with 6 digits, including the minimal one. I found an explanation for the surprising behavior that solutions tend to be permutations of one another. The short form of the explanation is that there are fairly strict conditions on which sets of digits can appear in a solution of the problem. But once the set of digits is chosen, the conditions on that order of the digits in the solution are fairly lax. So one typically sees, not only in base 10 but in other bases, that the solutions to this problem fall into a few classes that are all permutations of one another; this is exactly what happens in base 10 where all the 6-digit solutions are permutations of !!124578!!. As the number of digits is allowed to increase, the strict first set of conditions relaxes a little, and other digit groups appear as solutions. NotationThe property of interest, !!P_R(A)!!, is that the numbers !!A!! and !!B=2A!! have exactly the same base-!!R!! digits. We would like to find numbers !!A!! having property !!P_R!! for various !!R!!, and we are most interested in !!R=10!!. Suppose !!A!! is an !!n!!-digit numeral having property !!P_R!!; let the (base-!!R!!) digits of !!A!! be !!a_{n-1}\ldots a_1a_0!! and similarly the digits of !!B = 2A!! are !!b_{n-1}\ldots b_1b_0!!. The reader is encouraged to keep in mind the simple example of !!R=8, n=4, A=\mathtt{1042}, B=\mathtt{2104}!! which we will bring up from time to time. Since the digits of !!B!! and !!A!! are the same, in a different order, we may say that !!b_i = a_{P(i)}!! for some permutation !!P!!. In general !!P!! might have more than one cycle, but we will suppose that !!P!! is a single cycle. All the following discussion of !!P!! will apply to the individual cycles of !!P!! in the case that !!P!! is a product of two or more cycles. For our example of !!a=\mathtt{1042}, b=\mathtt{2104}!!, we have !!P = (0\,1\,2\,3)!! in cycle notation. We won't need to worry about the details of !!P!!, except to note that !!i, P(i), P(P(i)), \ldots, P^{n-1}(i)!! completely exhaust the indices !!0. \ldots n-1!!, and that !!P^n(i) = i!! because !!P!! is an !!n!!-cycle. Conditions on the set of digits in a solutionFor each !!i!! we have $$a_{P(i)} = b_{i} \equiv 2a_{i} + c_i\pmod R $$ where the ‘carry bit’ !!c_i!! is either 0 or 1 and depends on whether there was a carry when doubling !!a_{i-1}!!. (When !!i=0!! we are in the rightmost position and there is never a carry, so !!c_0= 0!!.) We can then write: $$\begin{align} a_{P(P(i))} &= 2a_{P(i)} + c_{P(i)} \\ &= 2(2a_{i} + c_i) + c_{P(i)} &&= 4a_i + 2c_i + c_{P(i)}\\ a_{P(P(P(i)))} &= 2(4a_i + 2c_i + c_{P(P(i)})) + c_{P(i)} &&= 8a_i + 4c_i + 2c_{P(i)} + c_{P(P(i))}\\ &&&\vdots\\ a_{P^n(i)} &&&= 2^na_i + v \end{align} $$ all equations taken !!\bmod R!!. But since !!P!! is an !!n!!-cycle, !!P^n(i) = i!!, so we have $$a_i \equiv 2^na_i + v\pmod R$$ or equivalently $$\big(2^n-1\big)a_i + v \equiv 0\pmod R\tag{$\star$}$$ where !!v\in\{0,\ldots 2^n-1\}!! depends only on the values of the carry bits !!c_i!!—the !!c_i!! are precisely the binary digits of !!v!!. Specifying a particular value of !!a_0!! and !!v!! that satisfy this equation completely determines all the !!a_i!!. For example, !!a_0 = 2, v = \color{darkblue}{0010}_2 = 2!! is a solution when !!R=8, n=4!! because !!\bigl(2^4-1\bigr)\cdot2 + 2\equiv 0\pmod 8!!, and this solution allows us to compute $$\def\db#1{\color{darkblue}{#1}}\begin{align} a_0&&&=2\\ a_{P(0)} &= 2a_0 &+ \db0 &= 4\\ a_{P^2(0)} &= 2a_{P(0)} &+ \db0 &= 0 \\ a_{P^3(0)} &= 2a_{P^2(0)} &+ \db1 &= 1\\ \hline a_{P^4(0)} &= 2a_{P^3(0)} &+ \db0 &= 2\\ \end{align}$$ where the carry bits !!c_i = \langle 0,0,1,0\rangle!! are visible in the third column, and all the sums are taken !!\pmod 8!!. Note that !!a_{P^n(0)} = a_0!! as promised. This derivation of the entire set of !!a_i!! from a single one plus a choice of !!v!! is crucial, so let's see one more example. Let's consider !!R=10, n=3!!. Then we want to choose !!a_0!! and !!v!! so that !!\left(2^3-1\right)a_0 + v \equiv 0\pmod{10}!! where !!v\in\{0\ldots 7\}!!. One possible solution is !!a_0 = 5, v=\color{darkblue}{101}_2 = 5!!. Then we can derive the other !!a_i!! as follows: $$\begin{align} a_0&&&=5\\ a_{P(0)} &= 2a_0 &+ \db1 &= 1\\ a_{P^2(0)} &= 2a_{P(0)} &+ \db0 &= 2 \\\hline a_{P^3(0)} &= 2a_{P^2(0)} &+ \db1 &= 5\\ \end{align}$$ And again we have !!a_{P^n(0)}= a_0!! as required. Since the bits of !!v!! are used cyclically, not every pair of !!\langle a_0, v\rangle!! will yield a different solution. Rotating the bits of !!v!! and pairing them with different choices of !!a_0!! will yield the same cycle of digits starting from a different place. In the first example above, we had !!a_0 = 2, v = 0010_2 = 2!!. If we were to take !!a_0 = 4, v = 0100_2 = 4!! (which also solves !!(\star)!!) we would get the same cycle of values of the !! a\_i !! but starting from !!4!! instead of from !!2!!, and similarly if we take !!a_0=0, v = 1000_2 = 8!! or !!a_0 = 1, v = 0001_2!!. So we can narrow down the solution set of !!(\star)!! by considering only the so-called bracelets of !!v!! rather than all !!2^n!! possible values. Two values of !!v!! are considered equivalent as bracelets if one is a rotation of the other. When a set of !!v!!-values are equivalent as bracelets, we need only consider one of them; the others will give the same cyclic sequence of digits, but starting in a different place. For !!n=4!!, for example, the bracelets are !!0000, 0001, 0011, 0101, 0111, !! and !!1111!!; the sequences !!0110, 1100,!! and !!1001!! being equivalent to !!0011!!, and so on. ExampleLet us take !!R=9, n=3!!, so we want to find 3-digit numerals with property !!P_9!!. According to !!(\star)!! we need !!7a_i + v \equiv 0\pmod{9}!! where !!v\in\{0\ldots 7\}!!. There are 9 possible values for !!a_i!!; for each one there is at most one possible value of !!v!! that makes the sum zero: $$\begin{array}{rrr} a_i & 7a_i & v \\ \hline 0 & 0 & 0 \\ 1 & 7 & 2 \\ 2 & 14 & 4 \\ 3 & 21 & 6 \\ 4 & 28 & \\ 5 & 35 & 1 \\ 6 & 42 & 3 \\ 7 & 49 & 5 \\ 8 & 56 & 7 \\ \end{array} $$ (For !!a_i=4!! there is no solution.) We may disregard the non-bracelet values of !!v!!, as these will give us solutions that are the same as those given by bracelet values of !!v!!. The bracelets are: $$\begin{array}{rl} 000 & 0 \\ 001 & 1 \\ 011 & 3 \\ 111 & 7 \end{array}$$ so we may disregard the solutions exacpt when !!v=0,1,3,7!!. Calculating the digit sequences from these four values of !!v!! and the corresponding !!a_i!! we find: $$\begin{array}{ccl} a_0 & v & \text{digits} \\ \hline 0 & 0 & 000 \\ 5 & 1 & 512 \\ 6 & 3 & 637 \\ 8 & 7 & 888 \ \end{array} $$ (In the second line, for example, we have !!v=1 = 001_2!!, so !!1 = 2\cdot 5 + 0; 2 = 1\cdot 2 + 0;!! and !!5 = 2\cdot 2 + 1!!.) Any number !!A!! of three digits, for which !!2A!! contains exactly the same three digits, in base 9, must therefore consist of exactly the digits !!000, 125, 367,!! or !!888!!. A warningAll the foregoing assumes that the permutation !!P!! is a single cycle. In general, it may not be. Suppose we did an analysis like that above for !!R=10, n=5!! and found that there was no possible digit set, other than the trivial set Something like this occurs, for example, in the !!n=4, R=8!! case. Solving the governing equation !!(2^5-1)a_0 + v \equiv 0\pmod 8!! yields only four possible digit cycles, namely !!\{0,1,2,4\}, \{1,3,6,4\}, \{2,5,2,5\}!!, and !!\{3,7,6,5\}!!. But there are several additional solutions: !!2500_8\cdot 2 = 5200_8, 2750_8\cdot 2 = 5720_8, !! and !!2775_8\cdot 2 = 5772_8!!. These correspond to permutations !!P!! with more than one cycle. In the case of !!2750_8!!, for example, !!P!! exchanges the !!5!! and the !!2!!, and leaves the !!0!! and the !!7!! fixed. For this reason we cannot rule out the possibility of an !!n!!-digit solution without first considering all smaller !!n!!. The Large Equals Odd ruleWhen !!R!! is even there is a simple condition we can use to rule out certain sets of digits from being single-cycle solutions. Recall that !!A=a_{n-1}\ldots a_0!! and !!B=b_{n-1}\ldots b_0!!. Let us agree that a digit !!d!! is large if !!d\ge \frac R2!! and small otherwise. That is, !!d!! is large if, upon doubling, it causes a carry into the next column to the left. Since !!b_i =(2a_i + c_i)\bmod R!!, where the !!c_i!! are carry bits, we see that, except for !!b_0!!, the digit !!b_i!! is odd precisely when there is a carry from the next column to the right, which occurs precisely when !!a_{i-1}!! is large. Thus the number of odd digits among !!b_1,\ldots b_{n-1}!! is equal to the number of large digits among !!a_1,\ldots a_{n-2}!!. This leaves the digits !!b_0!! and !!a_{n-1}!! uncounted. But !!b_0!! is never odd, since there is never a carry in the rightmost position, and !!a_{n-1}!! is always small (since otherwise !!B!! would have !!n+1!! digits, which is not allowed). So the number of large digits in !!A!! is exactly equal to the number of odd digits in !!B!!. And since !!A!! and !!B!! have exactly the same digits, the number of large digits in !!A!! is equal to the number of odd digits in !!A!!. Observe that this is the case for our running example !!1042_8!!: there is one odd digit and one large digit (the 4). When !!R!! is odd the analogous condition is somewhat more complicated, but since the main case of interest is !!R=10!!, we have the useful rule that: For !!R!! even, the number of odd digits in any solution !!A!! is equal to the number of large digits in !!A!!. Conditions on the order of digits in a solutionWe have determined, using the above method, that the digits !!\{5,1,2\}!! might form a base-9 numeral with property !!P_9!!. Now we would like to arrange them into a base-9 numeral that actually does have that property. Again let us write !!A = a_2a_1a_0!! and !!B=b_2b_1b_0!!, with !!B=2A!!. Note that if !!a_i = 1!!, then !!b_i = 3!! (if there was a carry from the next column to the right) or !!2!! (if there was no carry), but since !!b_i=3!! is impossible, we must have !!a_i = 2!! and therefore !!a_{i-1}!! must be small, since there is no carry into position !!i!!. But since !!a_{i-1}!! is also one of !!\{5,1,2\}!!, and it cannot also be !!1!!, it must be !!2!!. This shows that the 1, unless it appears in the rightmost position, must be to the left of the !!2!!; it cannot be to the left of the !!5!!. Similarly, if !!a_i = 2!! then !!b_i = 5!!, because !!4!! is impossible, so the !!2!! must be to the left of a large digit, which must be the !!5!!. Similar reasoning produces no constraint on the position of the !!5!!; it could be to the left of a small digit (in which case it doubles to !!1!!) or a large digit (in which case it doubles to !!2!!). We can summarize these findings as follows: $$\begin{array}{cl} \text{digit} & \text{to the left of} \\ \hline 1 & 1, 2, \text{end} \\ 2 & 5 \\ 5 & 1,2,5,\text{end} \end{array}$$ Here “end” means that the indicated digit could be the rightmost. Furthermore, the left digit of !!A!! must be small (or else there would be a carry in the leftmost place and !!2A!! would have 4 digits instead of 3) so it must be either 1 or 2. It is not hard to see from this table that the digits must be in the order !!125!! or !!251!!, and indeed, both of those numbers have the required property: !!125_9\cdot 2 = 251_9!!, and !!251_9\cdot 2 = 512_9!!. This was a simple example, but in more complicated cases it is helpful to draw the order constraints as a graph. Suppose we draw a graph with one vertex for each digit, and one additional vertex to represent the end of the numeral. The graph has an edge from vertex !!v!! to !!v'!! whenever !!v!! can appear to the left of !!v'!!. Then the graph drawn for the table above looks like this: A 3-digit numeral with property !!P_9!! corresponds to a path in this graph that starts at one of the nonzero small digits (marked in blue), ends at the red node marked ‘end’, and visits each node exactly once. Such a path is called hamiltonian. Obviously, self-loops never occur in a hamiltonian path, so we will omit them from future diagrams. Now we will consider the digit set !!637!!, again base 9. An analysis similar to the foregoing allows us to construct the following graph: Here it is immediately clear that the only hamiltonian path is !!3-7-6-\text{end}!!, and indeed, !!376_9\cdot 2 = 763_9!!. In general there might be multiple instances of a digit, and so multiple nodes labeled with that digit. Analysis of the !!0,0,0!! case produces a graph with no legal start nodes and so no solutions, unless leading zeroes are allowed, in which case !!000!! is a perfectly valid solution. Analysis of the !!8,8,8!! case produces a graph with no path to the end node and so no solutions. These two trivial patterns appear for all !!R!! and all !!n!!, and we will ignore them from now on. Returning to our ongoing example, !!1042!! in base 8, we see that !!1!! and !!2!! must double to !!2!! and !!4!!, so must be to the left of small digits, but !!4!! and !!0!! can double to either !!0!! or !!1!! and so could be to the left of anything. Here the constraints are so lax that the graph doesn't help us narrow them down much: Observing that the only arrow into the 4 is from 0, so that the 4 must follow the 0, and that the entire number must begin with 1 or 2, we can enumerate the solutions: 1042 1204 2041 2104 If leading zeroes are allowed we have also: 0412 0421 All of these are solutions in base 8. The case of !!R=10!!Now we turn to our main problem, solutions in base 10. To find all the solutions of length 6 requires an enumeration of smaller solutions, which, if they existed, might be concatenated into a solution of length 6. This is because our analysis of the digit sets that can appear in a solution assumes that the digits are permuted cyclically; that is, the permutations !!P!! that we considered had only one cycle each. There are no smaller solutions, but to prove that the length 6 solutions are minimal, we must analyze the cases for smaller !!n!! and rule them out. We now produce a complete analysis of the base 10 case with !!R=10!! and !!n\le 6!!. For !!n=1!! there is only the trivial solution of !!0!!, which we disregard. (The question asked for a positive number anyway.) !!n=2!!For !!n=2!!, we want to find solutions of !!3a_i + v \equiv 0\pmod{10}!! where !!v!! is a two-bit bracelet number, one of !!00_2, 01_2, !! or !!11_2!!. Tabulating the values of !!a_i!! and !!v\in\{0,1,3\}!! that solve this equation we get: $$\begin{array}{ccc} v& a_i \\ \hline 0 & 0 \\ 1& 3 \\ 3& 9 \\ \end{array}$$ We can disregard the !!v=0!! and !!v=3!! solutions because the former yields the trivial solution !!00!! and the latter yields the nonsolution !!99!!. So the only possibility we need to investigate further is !!a_i = 3, v = 1!!, which corresponds to the digit sequence !!36!!: Doubling !!3!! gives us !!6!! and doubling !!6!!, plus a carry, gives us !!3!! again. But when we tabulate of which digits must be left of which informs us that there is no solution with just !!3!! and !!6!!, because the graph we get, once self-loops are eliminated, looks like this: which obviously has no hamiltonian path. Thus there is no solution for !!R=10, n=2!!. !!n=3!!For !!n=3!! we need to solve the equation !!7a_i + v \equiv 0\pmod{10}!! where !!v!! is a bracelet number in !!\{0,\ldots 7\}!!, specifically one of !!0,1,3,!! or !!7!!. Since !!7!! and !!10!! are relatively prime, for each !!v!! there is a single !!a_i!! that solves the equation. Tabulating the possible values of !!a_i!! as before, and this time omitting rows with no solution, we have: $$\begin{array}{rrl} v & a_i & \text{digits}\\ \hline 0& 0 & 000\\ 1& 7 & 748 \\ 3& 1 & 125\\ 7&9 & 999\\ \end{array}$$ The digit sequences !!0,0,0!! and !!9,9,9!! yield trivial solutions or nonsolutions as usual, and we will omit them in the future. The other two lines suggest the digit sets !!1,2,5!! and !!4,7,8!!, both of which fails the “odd equals large” rule. This analysis rules out the possibility of a digit set with !!a_0 \to a_1 \to a_2 \to a_1!!, but it does not completely rule out a 3-digit solution, since one could be obtained by concatenating a one-digit and a two-digit solution, or three one-digit solutions. However, we know by now that no one- or two-digit solutions exist. Therefore there are no 3-digit solutions in base 10. !!n=4!!For !!n=4!! the governing equation is !!15a_i + v \equiv 0\pmod{10}!! where !!v!! is a 4-bit bracelet number, one of !!\{0,1,3,5,7,15\}!!. This is a little more complicated because !!\gcd(15,10)\ne 1!!. Tabulating the possible digit sets, we get: $$\begin{array}{crrl} a_i & 15a_i& v & \text{digits}\\ \hline 0 & 0 & 0 & 0000\\ 1 & 5 & 5 & 1250\\ 1 & 5 & 15 & 1375\\ 2 & 0 & 0 & 2486\\ 3 & 5 & 5 & 3749\\ 3 & 5 & 15 & 3751\\ 4 & 0 & 0 & 4862\\ 5 & 5 & 5 & 5012\\ 5 & 5 & 5 & 5137\\ 6 & 0 & 0 & 6248\\ 7 & 5 & 5 & 7493\\ 7 & 5 & 5 & 7513\\ 8 & 0 & 0 & 8624 \\ 9 & 5 & 5 & 9874\\ 9 & 5 & 15 & 9999 \\ \end{array}$$ where the second column has been reduced mod !!10!!. Note that even restricting !!v!! to bracelet numbers the table still contains duplicate digit sequences; the 15 entries on the right contain only the six basic sequences !!0000, 0125, 1375, 2486, 3749, 4987!!, and !!9999!!. Of these, only !!0000, 9999,!! and !!3749!! obey the odd equals large criterion, and we will disregard !!0000!! and !!9999!! as usual, leaving only !!3749!!. We construct the corresponding graph for this digit set as follows: !!3!! must double to !!7!!, not !!6!!, so must be left of a large number !!7!! or !!9!!. Similarly !!4!! must be left of !!7!! or !!9!!. !!9!! must also double to !!9!!, so must be left of !!7!!. Finally, !!7!! must double to !!4!!, so must be left of !!3,4!! or the end of the numeral. The corresponding graph is: which evidently has no hamiltonian path: whichever of 3 or 4 we start at, we cannot visit the other without passing through 7, and then we cannot reach the end node without passing through 7 a second time. So there is no solution with !!R=10!! and !!n=4!!. !!n=5!!We leave this case as an exercise. There are 8 solutions to the governing equation, all of which are ruled out by the odd equals large rule. !!n=6!!For !!n=6!! the possible solutions are given by the governing equation !!63a_i + v \equiv 0\pmod{10}!! where !!v!! is a 6-bit bracelet number, one of !!\{0,1,3,5,7,9,11,13,15,21,23,27,31,63\}!!. Tabulating the possible digit sets, we get: $$\begin{array}{crrl} v & a_i & \text{digits}\\ \hline 0 & 0 & 000000\\ 1 & 3 & 362486 \\ 3 & 9 & 986249 \\ 5 & 5 & 500012 \\ 7 & 1 & 124875 \\ 9 & 7 & 748748 \\ 11 & 3 & 362501 \\ 13 & 9 & 986374 \\ 15 & 5 & 500137 \\ 21 & 3 & 363636 \\ 23 & 9 & 989899 \\ 27 & 1 & 125125 \\ 31 & 3 & 363751 \\ 63 & 9 & 999999 \\ \end{array}$$ After ignoring !!000000!! and !!999999!! as usual, the large equals odd rule allows us to ignore all the other sequences except !!124875!! and !!363636!!. The latter fails for the same reason that !!36!! did when !!n=2!!. But !!142857!! , the lone survivor, gives us a complicated derived graph containing many hamiltonian paths, every one of which is a solution to the problem: It is not hard to pick out from this graph the minimal solution !!125874!!, for which !!125874\cdot 2 = 251748!!, and also our old friend !!142857!! for which !!142857\cdot 2 = 285714!!. We see here the reason why all the small numbers with property !!P_{10}!! contain the digits !!124578!!. The constraints on which digits can appear in a solution are quite strict, and rule out all other sequences of six digits and all shorter sequences. But once a set of digits passes these stringent conditions, the constraints on it are much looser, because !!B!! is only required to have the digits of !!A!! in some order, and there are many possible orders, many of which will satisfy the rather loose conditions involving the distribution of the carry bits. This graph is typical: it has a set of small nodes and a set of large nodes, and each node is connected to either all the small nodes or all the large nodes, so that the graph has many edges, and, as in this case, a largish clique of small nodes and a largish clique of large nodes, and as a result many hamiltonian paths. OnwardThis analysis is tedious but is simple enough to perform by hand in under an hour. As !!n!! increases further, enumerating the solutions of the governing equation becomes very time-consuming. I wrote a simple computer program to perform the analysis for given !!R!! and !!n!!, and to emit the possible digit sets that satisfied the large equals odd criterion. I had wondered if every base-10 solution contained equal numbers of the digits !!1,2,4,8,5,!! and !!7!!. This is the case for !!n=7!! (where the only admissible digit set is !!\{1,2,4,5,7,8\}\cup\{9\}!!), for !!n=8!! (where the only admissible sets are !!\{1,2,4,5,7,8\}\cup \{3,6\}!! and !!\{1,2,4,5,7,8\}\cup\{9,9\}!!), and for !!n=9!! (where the only admissible sets are !!\{1,2,4,5,7,8\}\cup\{3,6,9\}!! and !!\{1,2,4,5,7,8\}\cup\{9,9,9\}!!). But when we reach !!n=10!! the increasing number of bracelets has loosened up the requirements a little and there are 5 admissible digit sets. I picked two of the promising-seeming ones and quickly found by hand the solutions !!4225561128!! and !!1577438874!!, both of which wreck any theory that the digits !!1,2,4,5,8,7!! must all appear the same number of times. AcknowledgmentsThanks to Karl Kronenfeld for corrections and many helpful suggestions. [Other articles in category /math] permanent link Sat, 01 Mar 2014Intuitionistic logic is deeply misunderstood by people who have not studied it closely; such people often seem to think that the intuitionists were just a bunch of lunatics who rejected the law of the excluded middle for no reason. One often hears that intuitionistic logic rejects proof by contradiction. This is only half true. It arises from a typically classical misunderstanding of intuitionistic logic. Intuitionists are perfectly happy to accept a reductio ad absurdum proof of the following form: $$(P\to \bot)\to \lnot P$$ Here !!\bot!! means an absurdity or a contradiction; !!P\to \bot!! means that assuming !!P!! leads to absurdity, and !!(P\to \bot)\to \lnot P!! means that if assuming !!P!! leads to absurdity, then you can conclude that !!P!! is false. This is a classic proof by contradiction, and it is intuitionistically valid. In fact, in many formulations of intuitionistic logic, !!\lnot P!! is defined to mean !!P\to \bot!!. What is rejected by intuitionistic logic is the similar-seeming claim that: $$(\lnot P\to \bot)\to P$$ This says that if assuming !!\lnot P!! leads to absurdity, you can conclude that !!P!! is true. This is not intuitionistically valid. This is where people become puzzled if they only know classical logic. “But those are the same thing!” they cry. “You just have to replace !!P!! with !!\lnot P!! in the first one, and you get the second.” Not quite. If you replace !!P!! with !!\lnot P!! in the first one, you do not get the second one; you get: $$(\lnot P\to \bot)\to \lnot \lnot P$$ People familiar with classical logic are so used to shuffling the !!\lnot !! signs around and treating !!\lnot \lnot P!! the same as !!P!! that they often don't notice when they are doing it. But in intuitionistic logic, !!P!! and !!\lnot \lnot P!! are not the same. !!\lnot \lnot P!! is weaker than !!P!!, in the sense that from !!P!! one can always conclude !!\lnot \lnot P!!, but not always vice versa. Intuitionistic logic is happy to agree that if !!\lnot P!! leads to absurdity, then !!\lnot \lnot P!!. But it does not agree that this is sufficient to conclude !!P!!. As is often the case, it may be helpful to try to understand intuitionistic logic as talking about provability instead of truth. In classical logic, !!P!! means that !!P!! is true and !!\lnot P!! means that !!P!! is false. If !!P!! is not false it is true, so !!\lnot \lnot P!! and !!P!! mean the same thing. But in intuitionistic logic !!P!! means that !!P!! is provable, and !!\lnot P!! means that !!P!! is not provable. !!\lnot \lnot P!! means that it is impossible to prove that !!P!! is not provable. If !!P!! is provable, it is certainly impossible to prove that !!P!! is not provable. So !!P!! implies !!\lnot \lnot P!!. But just because it is impossible to prove that there is no proof of !!P!! does not mean that !!P!! itself is provable, so !!\lnot \lnot P!! does not imply !!P!!. Similarly, $$(P\to \bot)\to \lnot P $$ means that if a proof of !!P!! would lead to absurdity, then we may conclude that there cannot be a proof of !!P!!. This is quite valid. But $$(\lnot P\to \bot)\to P$$ means that if assuming that a proof of !!P!! is impossible leads to absurdity, there must be a proof of !!P!!. But this itself isn't a proof of !!P!!, nor is it enough to prove !!P!!; it only shows that there is no proof that proofs of !!P!! are impossible. [ Addendum 20141124: This article by Andrej Bauer says much the same thing. ] [ Addendum 20170508: This article by Robert Harper is another in the same family. ] [Other articles in category /math] permanent link Sat, 04 Jan 2014There is a famous mistake of Augustin-Louis Cauchy, in which he is supposed to have "proved" a theorem that is false. I have seen this cited many times, often in very serious scholarly literature, and as often as not Cauchy's purported error is completely misunderstood, and replaced with a different and completely dumbass mistake that nobody could have made. The claim is often made that Cauchy's Course d'analyse of 1821 contains a "proof" of the following statement: a convergent sequence of continuous functions has a continuous limit. For example, the Wikipedia article on "uniform convergence" claims:
The nLab article on "Cauchy sum theorem" states:
Cauchy never claimed to have proved any such thing, and it beggars belief that Cauchy could have made such a claim, because the counterexamples are so many and so easily located. For example, the sequence !! f_n(x) = x^n!! on the interval !![-1,1]!! is a sequence of continuous functions that converges everywhere on !![0,1]!! to a discontinuous limit. You would have to be a mathematical ignoramus to miss this, and Cauchy wasn't. Another simple example, one that converges everywhere in !!\mathbb R!!, is any sequence of functions !!f_n!! that are everywhere zero, except that each has a (continuous) bump of height 1 between !!-\frac1n!! and !!\frac1n!!. As !!n\to\infty!!, the width of the bump narrows to zero, and the limit function !!f_\infty!! is everywhere zero except that !!f_\infty(0)=1!!. Anyone can think of this, and certainly Cauchy could have. A concrete example of this type is $$f_n(x) = e^{-x^{2}/n}$$ which converges to 0 everywhere except at !! x=0 !!, where it converges to 1. Cauchy's controversial theorem is not what Wikipedia or nLab claim. It is that that the pointwise limit of a convergent series of continuous functions is always continuous. Cauchy is not claiming that $$f_\infty(x) = \lim_{i\to\infty} f_i(x)$$ must be continuous if the limit exists and the !!f_i!! are continuous. Rather, he claims that $$S(x) = \sum_{i=1}^\infty f_i(x)$$ must be continuous if the sum converges and the !!f_i!! are continuous. This is a completely different claim. It premise, that the sum converges, is much stronger, and so the claim itself is much weaker, and so much more plausible. Here the counterexamples are not completely trivial. Probably the best-known counterexample is that a square wave (which has a jump discontinuity where the square part begins and ends) can be represented as a Fourier series. (Cauchy was aware of this too, but it was new mathematics in 1821. Lakatos and others have argued that the theorem, understood in the way that continuity was understood in 1821, is not actually erroneous, but that the idea of continuity has changed since then. One piece of evidence strongly pointing to this conclusion is that nobody complained about Cauchy's controversial theorem until 1847. But had Cauchy somehow, against all probability, mistakenly claimed that a sequence of continuous functions converges to a continuous limit, you can be sure that it would not have taken the rest of the mathematical world 26 years to think of the counterexample of !!x^n!!.) The confusion about Cauchy's controversial theorem arises from a perennially confusing piece of mathematical terminology: a convergent sequence is not at all the same as a convergent series. Cauchy claimed that a convergent series of continuous functions has a continuous limit. He did not ever claim that a convergent sequence of continuous functions had a continuous limit. But I have often encountered claims that he did that, even though such such claims are extremely implausible. The claim that Cauchy thought a sequence of continuous functions converges to a continuous limit is not only false but is manifestly so. Anyone making it has at best made a silly and careless error, and perhaps doesn't really understand what they are talking about, or hasn't thought about it. [ I had originally planned to write about this controversial theorem in my series of articles about major screwups in mathematics, but the longer and more closely I looked at it the less clear it was that Cauchy had actually made a mistake. ] [Other articles in category /math] permanent link Sat, 25 Aug 2012
On the consistency of PA
Li Fu said, "The axioms are consistent because they have a model."
[Other articles in category /math] permanent link Fri, 24 Aug 2012
More about ZF's asymmetry between union and intersection
Let's review those briefly. The relevant axioms concern the operations by which sets can be constructed. There are two that are important. First is the axiom of union, which says that if !!{\mathcal F}!! is a family of sets, then we can form !!\bigcup {\mathcal F}!!, which is the union of all the sets in the family. The other is actually a family of axioms, the specification axiom schema. It says that for any one-place predicate !!\phi(x)!! and any set !!X!! we can construct the subset of !!X!! for which !!\phi!! holds: $$\{ x\in X \;|\; \phi(x) \}$$ Both of these are required. The axiom of union is for making bigger sets out of smaller ones, and the specification schema is for extracting smaller sets from bigger ones. (Also important is the axiom of pairing, which says that if !!x!! and !!y!! are sets, then so is the two-element set !!\{x, y\}!!; with pairing and union we can construct all the finite sets. But we won't need it in this article.) Conspicuously absent is an axiom of intersection. If you have a family !!{\mathcal F}!! of sets, and you want a set of every element that is in some member of !!{\mathcal F}!!, that is easy; it is what the axiom of union gets you. But if you want a set of every element that is in every member of !!{\mathcal F}!!, you have to use specification. Let's begin by defining this compact notation: $$\bigcap_{(X)} {\mathcal F}$$ for this longer formula: $$\{ x\in X \;|\; \forall f\in {\mathcal F} . x\in f \}$$ This is our intersection of the members of !!{\mathcal F}!!, taken "relative to !!X!!", as we say in the biz. It gives us all the elements of !!X!! that are in every member of !!{\mathcal F}!!. The !!X!! is mandatory in !!\bigcap_{(X)}!!, because ZF makes it mandatory when you construct a set by specification. If you leave it out, you get the Russell paradox. Most of the time, though, the !!X!! is not very important. When !!{\mathcal F}!! is nonempty, we can choose some element !!f\in {\mathcal F}!!, and consider !!\bigcap_{(f)} {\mathcal F}!!, which is the "normal" intersection of !!{\mathcal F}!!. We can easily show that $$\bigcap_{(X)} {\mathcal F}\subseteq \bigcap_{(f)} {\mathcal F}$$ for any !!X!! whatever, and this immediately implies that $$\bigcap_{(f)} {\mathcal F} = \bigcap_{(f')}{\mathcal F}$$ for any two elements of !!{\mathcal F}!!, so when !!{\mathcal F}!! contains an element !!f!!, we can omit the subscript and just write $$\bigcap {\mathcal F}$$ for the usual intersection of members of !!{\mathcal F}!!. Even the usually troublesome case of an empty family !!{\mathcal F}!! is no problem. In this case we have no !!f!! to use for !!\bigcap_{(f)} {\mathcal F}!!, but we can still take some other set !!X!! and talk about !!\bigcap_{(X)} \emptyset!!, which is just !!X!!. Now, let's return to topology. I suggested that we should consider the following definition of a topology, in terms of closed sets, but without an a priori notion of the underlying space: A co-topology is a family !!{\mathcal F}!! of sets, called "closed" sets, such that:
It now immediately follows that !!U!! itself is a closed set, since it is the intersection !!\bigcap_{(U)} \emptyset!! of the empty subfamily of !!{\mathcal F}!!. If !!{\mathcal F}!! itself is empty, then so is !!U!!, and !!\bigcap_{(U)} {\mathcal F} = \emptyset!!, so that is all right. From here on we will assume that !!{\mathcal F}!! is nonempty, and therefore that !!\bigcap {\mathcal F}!!, with no relativization, is well-defined. We still cannot prove that the empty set is closed; indeed, it might not be, because even !!M = \bigcap {\mathcal F}!! might not be empty. But as David Turner pointed out to me in email, the elements of !!M!! play a role dual to the extratoplogical points of a topological space that has been defined in terms of open sets. There might be points that are not in any open set anywhere, but we may as well ignore them, because they are topologically featureless, and just consider the space to be the union of the open sets. Analogously and dually, we can ignore the points of !!M!!, which are topologically featureless in the same way. Rather than considering !!{\mathcal F}!!, we should consider !!{\widehat{\mathcal F}}!!, whose members are the members of !!{\mathcal F}!!, but with !!M!! subtracted from each one: $${\mathcal F}HAT = \{\hat{f}\in 2^U \;|\; \exists f\in {\mathcal F} . \hat{f} = f\setminus M \}$$ So we may as well assume that this has been done behind the scenes and so that !!\bigcap {\mathcal F}!! is empty. If we have done this, then the empty set is closed. Now we move on to open sets. An open set is defined to be the complement of a closed set, but we have to be a bit careful, because ZF does not have a global notion of the complement !!S^C!! of a set. Instead, it has only relative complements, or differences. !!X\setminus Y!! is defined as: $$X\setminus Y = \{ x\in X \;|\; x\notin Y\} $$ Here we say that the complement of !!Y!! is taken relative to !!X!!. For the definition of open sets, we will say that the complement is taken relative to the universe of discourse !!U!!, and a set !!G!! is open if it has the form !!U\setminus f!! for some closed set !!f!!. Anatoly Karp pointed out on Twitter that we know that the empty set is open, because it is the relative complement of !!U!!, which we already know is closed. And if we ensure that !!\bigcap {\mathcal F}!! is empty, as in the previous paragraph, then since the empty set is closed, !!U!! is open, and we have recovered all the original properties of a topology. But gosh, what a pain it was; in contrast recovering the missing axioms from the corresponding open-set definition of a topology was painless. (John Armstrong said it was bizarre, and probably several other people were thinking that too. But I did not invent this bizarre idea; I got it from the opening paragraph of John L. Kelley's famous book General Topology, which has been in print since 1955. Here Kelley deals with the empty set and the universe in two sentences, and never worries about them again. In contrast, doing the same thing for closed sets was fraught with technical difficulties, mostly arising from ZF. (The exception was the need to repair the nonemptiness of the minimal closed set !!M!!, which was not ZF's fault.)
On the other hand, perhaps this conclusion is knocking down a straw man. I think working mathematicians probably don't concern themselves much with whether their stuff works in ZF, much less with what silly contortions are required to make it work in ZF. I think day-to-day mathematical work, to the extent that it needs to deal with set theory at all, handles it in a fairly naïve way, depending on a sort of folk theory in which there is some reasonably but not absurdly big universe of discourse in which one can take complements and intersections, and without worrying about this sort of technical detail. [ MathJax doesn't work in Atom or RSS syndication feeds, and can't be made to work, so if you are reading a syndicated version of this article, such as you would in Google Reader, or on Planet Haskell or PhillyLinux, you are seeing inlined images provided by the Google Charts API. The MathJax looks much better, and if you would like to compare, please visit my blog's home site. ]
[Other articles in category /math] permanent link Wed, 22 Aug 2012
The non-duality of open and closed sets
We can define a topology without reference to the underlying space as follows: A family !!{\mathfrak I}!! of sets is a topology if it is closed under pairwise intersections and arbitrary unions, and we call a set "open" if it is an element of !!{\mathfrak I}!!. From this we can recover the omitted axiom that says that !!\emptyset!! is open: it must be in !!{\mathfrak I}!! because it is the empty union !!\bigcup_{g\in\emptyset} g!!. We can also recover the underlying space of the topology, or at least some such space, because it is the unique maximal open set !!X=\bigcup_{g\in{\mathfrak I}} g!!. The space !!X!! might be embedded in some larger space, but we won't ever have to care, because that larger space is topologically featureless. From a topological point of view, !!X!! is our universe of discourse. We can then say that a set !!C!! is "closed" whenever !!X\setminus C!! is open, and prove all the usual theorems. If we choose to work with closed sets instead, we run into problems. We can try starting out the same way: A family !!{\mathfrak I}!! of sets is a co-topology if it is closed under pairwise unions and arbitrary intersections, and we call a set "closed" if it is an element of !!{\mathfrak I}!!. But we can no longer prove that !!\emptyset\in{\mathfrak I}!!. We can still recover an underlying space !!X = \bigcup_{c\in{\mathfrak I}} c!!, but we cannot prove that !!X!! is closed, or identify any maximal closed set analogous to the maximal open set of the definition of the previous paragraph. We can construct a minimal closed set !!\bigcap_{c\in{\mathfrak I}} c!!, but we don't know anything useful about it, and in particular we don't know whether it is empty, whereas with the open-sets definition of a topology we can be sure that the empty set is the unique minimal open set. We can repair part of this asymmetry by changing the "pairwise unions" axiom to "finite unions"; then the empty set is closed because it is a finite union of closed sets. But we still can't recover any maximal closed set. Given a topology, it is easy to identify the unique maximal closed set, but given a co-topology, one can't, and indeed there may not be one. The same thing goes wrong if one tries to define a topology in terms of a Kuratowski closure operator. We might like to go on and say that complements of closed sets are open, but we can't, because we don't have a universe of discourse in which we can take complements. None of this may make very much difference in practice, since we usually do have an a priori idea of the universe of discourse, and so we do not care much whether we can define a topology without reference to any underlying space. But it is at least conceivable that we might want to abstract away the underlying space, and if we do, it appears that open and closed sets are not as exactly symmetric as I thought they were. Having thought about this some more, it seems to me that the ultimate source of the asymmetry here is in our model of set theory. The role of union and intersection in ZF is not as symmetric as one might like. There is an axiom of union, which asserts that the union of the members of some family of sets is again a set, but there is no corresponding axiom of intersection. To get the intersection of a family of sets !!\mathcal S!!, you use a specification axiom. Because of the way specification works, you cannot take an empty intersection, and there is no universal set. If topology were formulated in a set theory with a universal set, such as NF, I imagine the asymmetry would go away. [ This is my first blog post using MathJax, which I hope will completely replace the ad-hoc patchwork of systems I had been using to insert mathematics. Please email me if you encounter any bugs. ] [ Addendum 20120823: MathJax depends on executing Javascript, and so it won't render in an RSS or Atom feed or on any page where the blog content is syndicated. So my syndication feed is using the Google Charts service to render formulas for you. If the formulas look funny, try looking at http://blog.plover.com/ directly. ] [ Addendum 20120824: There is a followup to this article. ]
[Other articles in category /math] permanent link Tue, 10 Jan 2012
Elaborations of Russell's paradox
I wasn't sure she would get this, but it succeeded much better than I expected. After I prompted her to consider what color cover it would have, she thought it out, first ruling out one color, and then, when she got to the second color, she just started laughing. A couple of days ago she asked me if I could think of anything that was like that but with three different colors. Put on the spot, I suggested she consider what would happen if there could be green catalogs that might or might not include themselves. This is somewhat interesting, because you now can have a catalog of all the blue catalogs; it can have a green cover. But I soon thought of a much better extension. I gave it to Katara like this: say you have a catalog, let's call it X. If X mentions a catalog that mentions X, it has a gold stripe on the spine. Otherwise, it has a silver stripe. Now:
Translating this into barber language is left as an exercise for the reader.
[Other articles in category /math] permanent link Sat, 11 Jun 2011 At a book sale I recently picked up Terence Tao's little book on problem solving for 50¢. One of the exercises (pp. 85–86) is the following little charmer: There are six musicians who will play a series of concerts. At each concert, some of the musicians will be on stage and some will be in the audience. What is the fewest number of concerts that can be played to that each musician gets to see the each of the others play?Obviously, no more than six concerts are required. (I have a new contribution to the long-debated meaning of the mathematical jargon term "obviously": if my six-year-old daughter Katara could figure out the answer, so can you.) And an easy argument shows that four are necessary: let's say that when a musician views another, that is a "viewing event"; we need to arrange at least 5×6 = 30 viewing events. A concert that has p performers and 6-p in the audience arranges p(6 - p) events, which must be 5, 8, or 9. Three concerts yield no more than 27 events, which is insufficient. So there must be at least 4 concerts, and we may as well suppose that each concert has three musicians in the audience and three onstage, to maximize the number of events at 9·4 = 36. (It turns out there there is no solution otherwise, but that is a digression.) Each musician must attend at least 2 concerts, or else they would see only 3 other musicians onstage. But 6 musicians attending 2 concerts each takes up all 12 audience spots, so every musician is at exactly 2 concerts. Each musician thus sees exactly six musicians onstage, and since five of them must be different, one is a repeat, and the viewing event is wasted. We knew there would be some waste, since there are 36 viewing avents available and only 30 can be useful, but now we know that each spectator wastes exactly one event. A happy side effect of splitting the musicians evenly between the stage and the audience in every concert is that we can exploit the symmetry: if we have a solution to the problem, then we can obtain a dual solution by exchanging the performers and the audience in each concert. The conclusion of the previous paragraph is that in any solution, each spectator wastes exactly one event; the duality tells us that each performer is the subject of exactly one wasted event. Now suppose the same two musicians, say A and B, perform together twice. We know that some spectator must see A twice; this spectator sees B twice also, this wasting two events. But each spectator wastes only one event. So no two musicians can share the stage twice; each two musicians share the stage exactly once. By duality, each two spectators are in the same audience together exactly once. So we need to find four 3-sets of the elements { A, B, C, D, E, F }, with each element appearing in precisely two sets, and such that each two sets have exactly one element in common. Or equivalently, we need to find four triangles in K_{4}, none of which share an edge. The solution is not hard to find:
If you generalize these arguments to 2m musicians, you find that there is a lower bound of $$\left\lceil{4m^2 - 2m \over m^2 }\right\rceil$$ concerts, which is 4. And indeed, even with as few as 4 musicians, you still need four concerts. So it's tempting to wonder if 4 concerts is really sufficient for all even numbers of musicians. Consider 8 musicians, for example. You need 56 viewing events, but a concert with half the musicians onstage and half in the audience provides 16 events, so you might only need as few as 4 concerts to provide the necessary events. The geometric formulation is that you want to find four disjoint K_{4}s in a K_{4}; or alternatively, you want to find four 4-element subsets of { 1,2,3,4,5,6,7,8 }, such that each element appears in exactly two sets and no two elements are in the same. There seemed to be no immediately obvious reason that this wouldn't work, and I spent a while tinkering around looking for a way to do it and didn't find one. Eventually I did an exhaustive search and discovered that it was impossible. But the tinkering and the exhaustive search were a waste of time, because there is an obvious reason why it's impossible. As before, each musician must be in exactly two audiences, and can share audiences with each other musician at most once. But there are only 6 ways to be in two audiences, and 8 musicians, so some pair of musicians must be in precisely the same pair of audiences, this wastes too many viewing events, and so there's no solution. Whoops! It's easy to find solutions for 8 musicians with 5 concerts, though. There is plenty of room to maneuver and you can just write one down off the top of your head. For example:
[ Addendum: For n = 1…10 musicians, the least number of concerts required is 0, 2, 3, 4, 4, 4, 5, 5, 5, 5; beyond this, I only have bounds. ]
[Other articles in category /math] permanent link Mon, 15 Nov 2010
A draft of a short introduction to topology
CS grad students often have to take classes in category theory. These classes always want to use groups and topological spaces as examples, and my experience is that at this point many of the students shift uncomfortably in their seats since they have not had undergraduate classes in group theory, topology, analysis, or anything else relevant. But you do not have to know much topology to be able to appreciate the example, so I tried to write up the minimal amount necessary. Similarly, if you already understand intuitionistic logic, you do not need to know much topology to understand the way in which topological spaces are natural models for intuitionistic logic—but you do need to know more than zero. So a couple of years ago I wrote up a short introduction to topology for first-year computer science grad students and other people who similarly might like to know the absolute minimum, and only the absolute minimum, about topology. It came out somewhat longer than I expected, 11 pages, of which 6 are the introduction, and 5 are about typical applications to computer science. But it is a very light, fluffy 11 pages, and I am generally happy with it. I started writing this shortly after my second daughter was born, and I have not yet had a chance to finish it. It contains many errors. Many, many errors. For example, there is a section at the end about the compactness principle, which can only be taken as a sort of pseudomathematical lorem ipsum. This really is a draft; it is only three-quarters finished. But I do think it will serve a useful function once it is finished, and that finishing it will not take too long. If you have any interest in this project, I invite you to help. The current draft is version 0.6 of 2010-11-14. I do not want old erroneous versions wandering around confusing people in my name, so please do not distribute this draft after 2010-12-15. I hope to have an improved draft available here before that. Please do send me corrections, suggestions, questions, advice, patches, pull requests, or anything else.
[Other articles in category /math] permanent link Mon, 08 Nov 2010
Semi-boneless ham
The term is informal, however, and it's not clear just what it should mean in all cases. For example, consider the set S of 1/n for every positive integer n. Is this set semi-infinite? It is bounded in both directions, since it is contained in [0, 1]. But as you move left through the set, you ancounter an infinite number of elements, so it ought to be semi-infinite in the same sense that S ∪ { 1-x : x ∈ S } is fully-infinite. Whatever sense that is. Informal and ill-defined it may be, but the term is widely used; one can easily find mentions in the literature of semi-infinite paths, semi-infinite strips, semi-infinite intervals, semi-infinite cylinders, and even semi-infinite reservoirs and conductors. The term has spawned an offshoot, the even stranger-sounding "quarter-infinite". This seems to refer to a geometric object that is unbounded in the same way that a quarter-plane is unbounded, where "in the same way" is left rather vague. Consider the set (depicted at left) of all points of the plane for which 0 ≤ |y/x| ≤ √3, for example; is this set quarter-infinite, or only 1/6-infinite? Is the set of points (depicted at right) with xy > 1 and x, y > 0 quarter-infinite? I wouldn't want to say. But the canonical example is simple: the product of two semi-infinite intervals is a quarter-infinite set. I was going to say that I had never seen an instance of the obvious next step, the eighth-infinite solid, but in researching this article I did run into a few. I can't say it trips off the tongue, however. And if we admit that a half of a quarter-infinite plane segment is also eighth-infinite, we could be getting ourselves into trouble. (This all reminds me of the complaint of J.H. Conway of the increasing use of the term "biunique". Conway sarcastically asked if he should expect to see "triunique" and soforth, culminating in the idiotic "polyunique".) Sometimes "semi" really does mean exactly one-half, as in "semimajor axis" (the longest segment from the center of an ellipse to its boundary), "semicubic parabola" (determined by an equation with a term kx^{3/2}), or "semiperimeter" (half the perimeter of a triangle). But just as often, "semi" is one of the dazzling supply of mathematical pejoratives. ("Abnormal, irregular, improper, degenerate, inadmissible, and otherwise undesirable", says Kelley's General Topology.) A semigroup, for example, is not half of a group, but rather an algebraic structure that possesses less structure than a group. Similarly, one has semiregular polyhedra and semidirect products. I was planning to end with a note that mathematics has so far avoided the "demisemi-" prefix. But alas! Google found this 1971 paper on Demi-semi-primal algebras and Mal'cev-type conditions.
[Other articles in category /math] permanent link Mon, 14 Dec 2009
Another short explanation of Gödel's theorem
A while back I started writing up an article titled "World's shortest explanation of Gödel's theorem". But I didn't finish it... I went and had a look to see what was wrong with it, and to my surprise, there seemed to be hardly anything wrong with it. Perhaps I just forgot to post it. So if you disliked yesterday's brief explanation of Gödel's theorem—and many people did—you'll probably dislike this one even more. Enjoy!
A reader wrote to question my characterization of Gödel's theorem in the previous article. But I think I characterized it correctly; I said:
The only systems of mathematical axioms strong enough to prove all true statements of arithmetic, are those that are so strong that they also prove all the false statements of arithmetic.I'm going to explain how this works. You start by choosing some system of mathematics that has some machinery in it for making statements about things like numbers and for constructing proofs of theorems like 1+1=2. Many such systems exist. Let's call the one we have chosen M, for "mathematics". Gödel shows that if M has enough mathematical machinery in it to actually do arithmetic, then it is possible to produce a statement S whose meaning is essentially "Statement S cannot be proved in system M." It is not at all obvious that this is possible, or how it can be done, and I am not going to get into the details here. Gödel's contribution was seeing that it was possible to do this. So here's S again:
S: Statement S cannot be proved in system M. Now there are two possibilities. Either S is in fact provable in system M, or it is not. One of these must hold. If S is provable in system M, then it is false, and so it is a false statement that can be proved in system M. M therefore proves some false statements of arithmetic. If S is not provable in system M, then it is true, and so it is a true statement that cannot be proved in system M. M therefore fails to prove some true statements of arithmetic. So something goes wrong with M: either it fails to prove some true statements, or else it succeeds in proving some false statements. List of topics I deliberately omitted from this article, that mathematicians should not write to me about with corrections: Presburger arithmetic. Dialetheism. Inexhaustibility. ω-incompleteness. Non-RE sets of axioms.
Well, I see now that left out the step where I go from "M proves a false statement" to "M proves all false statements". Oh well, another topic for another post. If you liked this post, you may enjoy Torkel Franzén's books Godel's Theorem: An Incomplete Guide to Its Use and Abuse and Inexhaustibility: A Non-Exhaustive Treatment. If you disliked this post, you are even more likely to enjoy them.
Many thanks to Robert Bond for his contribution.
[Other articles in category /math] permanent link Sun, 13 Dec 2009
World's shortest explanation of Gödel's theorem
We have some sort of machine that prints out statements in some sort of language. It needn't be a statement-printing machine exactly; it could be some sort of technique for taking statements and deciding if they are true. But let's think of it as a machine that prints out statements. In particular, some of the statements that the machine might (or might not) print look like these:
For example, NPR*FOO means that the machine will never print FOOFOO. NP*FOOFOO means the same thing. So far, so good. Now, let's consider the statement NPR*NPR*. This statement asserts that the machine will never print NPR*NPR*. Either the machine prints NPR*NPR*, or it never prints NPR*NPR*. If the machine prints NPR*NPR*, it has printed a false statement. But if the machine never prints NPR*NPR*, then NPR*NPR* is a true statement that the machine never prints. So either the machine sometimes prints false statements, or there are true statements that it never prints. So any machine that prints only true statements must fail to print some true statements. Or conversely, any machine that prints every possible true statement must print some false statements too.
The proof of Gödel's theorem shows that there are statements of pure arithmetic that essentially express NPR*NPR*; the trick is to find some way to express NPR*NPR* as a statement about arithmetic, and most of the technical details (and cleverness!) of Gödel's theorem are concerned with this trick. But once the trick is done, the argument can be applied to any machine or other method for producing statements about arithmetic. The conclusion then translates directly: any machine or method that produces statements about arithmetic either sometimes produces false statements, or else there are true statements about arithmetic that it never produces. Because if it produces something like NPR*NPR* then it is wrong, but if it fails to produce NPR*NPR*, then that is a true statement that it has failed to produce. So any machine or other method that produces only true statements about arithmetic must fail to produce some true statements. Hope this helps! (This explanation appears in Smullyan's book 5000 BC and Other Philosophical Fantasies, chapter 3, section 65, which is where I saw it. He discusses it at considerable length in Chapter 16 of The Lady or the Tiger?, "Machines that Talk About Themselves". It also appears in The Mystery of Scheherezade.)
I gratefully acknowledge Charles Colht for his generous donation to this blog.
[ Addendum 20091214: Another article on the same topic. ] [ Addendum 20150403: Reddit user cafe_anon has formalized this argument as a Coq proof. ] [ Addendum 20150406: Reddit user TezlaKoil has formalized this argument as an Agda proof. ]
[Other articles in category /math] permanent link Sun, 21 Jun 2009
Gray code at the pediatrician's office
How did the bracket know exactly what height to report? This was done in a way I hadn't seen before. It had a photosensor looking at the post, which was printed with this pattern: (Click to view the other pictures I took of the post.) The pattern is binary numerals. Each numeral is a certain fraction of a centimeter high, say 1/4 centimeter. If the sensor reads the number 433, that means that the bracket is 433/4 = 108.25 cm off the ground, and so that Katara is 108.25 cm tall. The patterned strip in the left margin of this article is a straightforward translation of binary numerals to black and white boxes, with black representing 1 and white representing 0:
0000000000If you are paying attention, you will notice that although the strip at left is similar to the pattern in the doctor's office, it is not the same. That is because the numbers on the post are Gray-coded. Gray codes solve the following problem with raw binary numbers. Suppose Katara is close to 104 = 416/4 cm tall, so that the photosensor is in the following region of the post:
...But suppose that the sensor (or the post) is slightly mis-aligned, so that instead of properly reading the (416) row, it reads the first half of the (416) row and last half of the (415) row. That makes 0110111111, which is 447 = 111.75 cm, an error of almost 7.5%. (That's three inches, for my American and Burmese readers.) Or the error could go the other way: if the sensor reads the first half of the (415) and the second half of the (416) row, it will see 0110000000 = 384 = 96 cm. Gray code is a method for encoding numbers in binary so that each numeral differs from the adjacent ones in only one position:
0000000000This is the pattern from the post, which you can also see at the right of this article. Now suppose that the mis-aligned sensor reads part of the (416) line and part of the (417) line. With ordinary binary coding, this could result in an error of up to 7.75 cm. (And worse errors for children of other heights.) But with Gray coding no error results from the misreading:
...No matter what parts of 0101110000 and 0101110001 are stitched together, the result is always either 416 or 417. Converting from Gray code to standard binary is easy: take the binary expansion, and invert every bit that is immediately to the right of a 1 bit. For example, in 1111101000, each red bit is to the right of a 1, and so is inverted to obtain the Gray code 1000011100. Converting back is also easy: of the Gray code. Replace every sequence of the form 1000...01 with 1111...10; also replace 1000... with 1111... if it appears at the end of the code. For example, Gray code 1000011100 contains two such sequences, 100001 and 11, which are replaced with 111110 and 10, to give 1111101000. [ Addendum 20110525: Every so often someone asks why the stadiometer is so sophisticated. Here is the answer. ]
[Other articles in category /math] permanent link Sat, 23 May 2009
A child is bitten by a dog every 0.07 seconds...
This is obviously nonsense, because suppose the post office employs half a million letter carriers. (The actual number is actually about half that, but we are doing a back-of-the-envelope estimate of plausibility.) Then the bite rate is six bites per thousand letter carriers per year, and if children are 900 times more likely to be bitten, they are getting bitten at a rate of 5,400 bites per thousand children per year, or 5.4 bites per child. Insert your own joke here, or use the prefabricated joke framework in the title of this article. I wrote to the reporter, who attributed the claim to the Postal Bulletin 22258 of 7 May 2009. It does indeed appear there. I am trying to track down the ultimate source, but I suspect I will not get any farther. I have discovered that the "900 times" figure appears in the Post Office's annual announcements of Dog Bite Prevention Month as far back as 2004, but not as far back as 2002. Meantime, what are the correct numbers? The Centers for Disease Control and Prevention have a superb on-line database of injury data. It immediately delivers the correct numbers for dog bite rate among children:
According to the USPS 2008 Annual Report, in 2008 the USPS employed 211,661 city delivery carriers and 68,900 full-time rural delivery carriers, a total of 280,561. Since these 280,561 carriers received 3,000 dog bites, the rate per 100,000 carriers per year is 1069.29 bites. So the correct statistic is not that children are 900 times more likely than carriers to be bitten, but rather that carriers are 6.6 times as likely as children to be bitten, 5.6 times if you consider only children under 13. Incidentally, your toddler's chance of being bitten in the course of a year is only about a quarter of a percent, ceteris paribus. Where did 900 come from? I have no idea. There are 293 times as many children as there are letter carriers, and they received a total of 44.5 times as many bites. The "900" figure is all over the Internet, despite being utterly wrong. Even with extensive searching, I was not able to find this factoid in the brochures or reports of any other reputable organization, including the American Veterinary Medical Association, the American Academy of Pediatrics, the Centers for Disease Control and Prevention, or the Humane Society of the Uniited States. It appears to be the invention of the USPS. Also in the same newspaper, the new Indian restaurant on Baltimore avenue was advertising that they "specialize in vegetarian and non-vegetarian food". It's just a cornucopia of stupidity today, isn't it?
[Other articles in category /math] permanent link Sun, 17 May 2009
Bipartite matching and same-sex marriage
However, if same-sex marriages are permitted, there may not be a stable matching, so the character of the problem changes significantly. A minimal counterexample is:
Suppose we match A–B, C–X. Then since B prefers C to A, and C prefers B to X, B and C divorce their mates and marry each other, yielding B–C, A–X. But now C can improve her situation further by divorcing B in favor of A, who is only too glad to dump the miserable X. The marriages are now A–C, B–X. B now realizes that his first divorce was a bad idea, since he thought he was trading up from A to C, but has gotten stuck with X instead. So he reconciles with A, who regards the fickle B as superior to her current mate C. The marriages are now A–B, C–X, and we are back where we started, having gone through every possible matching. This should not be taken as an argument against same-sex marriage. The model fails to generate the following obvious real-world solution: A, B, and C should all move in together and live in joyous tripartite depravity, and X should jump off a bridge.
[Other articles in category /math] permanent link Mon, 09 Mar 2009
Happy birthday
Happy birthday, girls.
[Other articles in category /math] permanent link Thu, 29 Jan 2009
A simple trigonometric identity
Here I put A at (0,0), B at (4,0), and 1 at (2,2). This is clear enough, but I wished that it were more nearly equilateral. So that night as I was waiting to fall asleep, I thought about the problem of finding lattice points that are at the vertices of an equilateral triangle. This is a sort of two-dimensional variation on the problem of finding rational approximations to surds, which is a topic that has turned up here many times over the years. Or rather, I wanted to find lattice points that are almost at the vertices of an equilateral triangle, because I was pretty sure that there were no equilateral lattice triangles. But at the time I could not remember a proof. I started doing some calculations based on the law of cosines, which was a mistake, because nobody but John Von Neumann can do calculations like that in their head as they wait to fall asleep, and I am not John Von Neumann, in case you hadn't noticed. A simple proof that there are no equilateral lattice triangles has just now occurred to me, though, and I am really pleased with it, so we are about to have a digression. The area A of an equilateral triangle is s√3/2, where s is the length of the side. And s has the form √t because of the Pythagorean theorem, so A = √(3t)/2, where t is a sum of two squares, because the endpoints of the side are lattice points. By Pick's theorem, the area of any lattice triangle is a half-integer. So 3t is a perfect square, and thus there are an odd number of threes in t's prime factorization. But t is a sum of two squares, and by the sum of two squares theorem, its prime factorization must have an even number of threes. We now have a contradiction, so there was no such triangle. Wasn't that excellent? That is just the sort of thing that I could have thought up while waiting to fall asleep, so it proves even more conclusively that starting with the law of cosines was a mistake. Okay, end of digression. Back to the law of cosines. We have a triangle with sides a, b, and c, and opposite angles A, B, and C, and you no doubt recall from high school that c^{2} = a^{2} + b^{2} - 2ab cos C. We'll call this "law C". Before I fell alseep, it occurred to me that you could take the analogous law B, which is b^{2} = a^{2} + c^{2} - 2ac cos B, and substitute the right-hand side for the b^{2} term in law C. Then a bunch of stuff will cancel out and you should either get something interesting or something tautological. Von Neumann would have known right away which it was, but I needed paper. So today I got out the paper and did the thing, and came up with the very simple relation that:
c = a cos B + b cos AWhich holds in any triangle. But somehow I had never seen this before, or, if I had, I had completely forgotten it. The thing is so simple that I thought that it must be wrong, or I would have known it already. But no, it checked out for the easy cases (right triangles, equilateral triangles, trivial triangles) and the geometric proof is easy: Just drop a perpendicular from C. The foot of the perpendicular divides the base c into two segments, which, by the simplest possible trigonometry, have lengths a cos B and b cos A, respectively. QED. Perhaps that was anticlimactic. Have I mentioned that I have a sign on the door of my office that says "Penn Institute of Lower Mathematics"? This is the kind of thing I'm talking about. I will let you all know if I come up with anything about the almost-equilateral lattice triangles. Clearly, you can approximate the equilateral triangle as closely as you like by making the lattice coordinates sufficiently large, just as you can approximate √3 as closely as you like with rationals by making the numerator and denominator sufficiently large. Proof: Your computer draws equilateral-seeming triangles on the screen all the time. I note also that it is important that the lattice is two-dimensional. In three or more dimensions the triangle (1,0,0,0...), (0,1,0,0...), (0,0,1,0...) is a perfectly equilateral lattice triangle with side √2. [ Addendum 20090130: Vilhelm Sjöberg points out that the area of an equilateral triangle is s^{2}√3/4, not s√3/2. Whoops. This spoils my lovely proof, because the theorem now follows immediately from Pick's: s^{2} is an integer by Pythagoras, so the area is irrational rather than a half-integer as Pick's theorem requires. ] [ Addendum 20140403: As a practical matter, one can draw a good lattice approximation to an equilateral triangle by choosing a good rational approximation to !!\sqrt3!!, say !!\frac ab!!, and then drawing the points !!(0,0), (b,a),!! and !!(2b, 0)!!. The rational approximations to !!\sqrt3!! quickly produce triangles that are indistinguishable from equilateral. For example, the rational approximation !!\frac74!! gives the isosceles triangle with vertices !!(0,0), (4,7), (8,0)!! which has one side of length 8 and two sides of length !!\sqrt{65}\approx 8.06!!, an error of less than one percent. The next such approximation, !!\frac{26}{15}!!, gives a triangle that is correct to about 1 part in 1800. (For more about rational approximations to !!\sqrt3!!, see my article on Archimedes and the square root of 3.) ]
[Other articles in category /math] permanent link Tue, 27 Jan 2009
Amusements in Hyperspace
This evening I tried to imagine life in a 1000-dimensional universe. I didn't get too far, but what I did get seemed pretty interesting. What's it like? Well, it's very dark. Lamps wouldn't work very well, because if the illumination one foot from the source is I, then the illumination two feet from the source is I · 9.3·10^{-302}. Actually it's even worse than that; there's a double whammy. Suppose you had a cubical room ten feet across. If you thought it was hard to light up the dark corners of a big room in Boston in February, imagine how much worse it is in hyperspace where the corners are 158 feet away. There are some upsides, however. Rooms won't have to be ten feet on a side because everything will be smaller. You take up about 70,000 cubic centimeters of space; in hyperspace that is just not a lot of room, because a box barely more than a centimeter on a side takes up 70,000 hypercentimeters. In fact, a box barely more than a centimeter on a side can hold as much as you want; an 11 millimeter box already contains 2.5·10^{41} hypercentimeters. It's hard to put people in prison in hyperspace, because there are so many directions that you can go to get out. Flatland prison cells have four walls; ours have six, if you count the ceiling and the floor. Hyperspace prison cells have 2000 walls, and each one is very expensive to build. So that's hyperspace: Big, dark, and easy to get around. [ Addenda 20120510: An anonymous commenter on Colm Mulcahy's blog observed that "high dimension cubes are qualitatively more like hedgehogs than building blocks". And recently someone asked on stackexchange.math for "What are some examples of a mathematical result being counterintuitive?"; the top-scoring reply concerned the bizarre behavior of high-dimension cubes. ]
[Other articles in category /math] permanent link Fri, 23 Jan 2009
Archimedes and the square root of 3, revisited
I pointed out that although the approximations seem to come out of thin air, a little thought reveals where they probably did come from; it's not very hard. Briefly, you tabulate a^{2} and 3b^{2}, and look for numbers from one column that are close to numbers from the other; see the previous article for details. But Dr. Chuck Lindsey, the author of a superb explanation of Archimedes' methods, and a professor at Florida Gulf Coast University, seemed mystified by the appearance of the fraction 265/153:
Throughout this proof, Archimedes uses several rational approximations to various square roots. Nowhere does he say how he got those approximations—they are simply stated without any explanation—so how he came up with some of these is anybody's guess.I left it there for a few years, but just recently I got puzzled email from a gentleman named Peter Nockolds. M. Nockolds was not puzzled by the 265/153. Rather, he wanted to know why so many noted historians of mathematics should be so puzzled by the 265/153. This was news to me. I did not know anyone else had been puzzled by the 265/153. I had assumed that nearly everyone else saw it the same way that M. Nockolds and I did. But M. Nockolds provided me with a link to an extensive discussion of the matter, which included quotations from several noted mathematicians and historians of mathematics:
It would seem...that [Archimedes] had some (at present unknown) method of extracting the square root of numbers approximately. ...the calculation [of π] starts from a greater and lesser limit to the value of √3, which Archimedes assumes without remark as known, namely 265/153 < √3 < 1351/780. How did Archimedes arrive at this particular approximation? No puzzle has exercised more fascination upon writers interested in the history of mathematics... The simplest supposition is certainly [the "Babylonian method"; see Kline below]. Another suggestion...is that the successive solutions in integers of the equations x^{2}-3y^{2}=1 and x^{2}-3y^{2}=-2 may have been found...in a similar way to...the Pythagoreans. The rest of the suggestions amount for the most part to the use of the method of continued fractions more or less disguised.Heath said "The simplest supposition is certainly ..." and then followed with the "Babylonian method", which is considerably more complicated than the extremely simple method I suggested in my earlier article. Morris Kline explains the Babylonian method:
He also obtained an excellent approximation to √3, namely 1351/780 > √3 > 265/153, but does not explain how he got this result. Among the many conjectures in the historical literature concerning its derivation the following is very plausible. Given a number A, if one writes it as a^{2} ± b where a^{2} is the rational square nearest to A, larger or smaller, and b is the remainder, then a ± b/2a > √A > a ± b/(2a±1). Several applications of this procedure do produce Archimedes' result.And finally:
Archimedes approximated √3 by the slightly smaller value 265/153... How he managed to extract his square roots with such accuracy...is one of the puzzles that this extraordinary man has bequeathed to us.Nockolds asked me "Have you had any feedback from historians of maths who explain why it wasn't so easy to arrive at 265/153 or even 1351/780? Have you any idea why they make such a big deal out of this?" No, I'm mystified. Even working with craptastic Greek numerals, it would not take Archimedes very long to tabulate kn^{2} far enough to discover that 3·780^{2} = 1351^{2} - 1. Or, if you don't like that theory, try this one: He tabulated n^{2} and 3n^{2} far enough to discover the following approximations:
I know someone wants to claim that this is nothing more than the Babylonian method. But this is missing an important point. Although this sort of numeric tinkering might well lead you to discover the Babylonian method, especially if you were Archimedes, it is not the Babylonian method, and it can be done in complete ignorance of the Babylonian method. But it yields the required approximations anyway. So I will echo Nockolds' puzzlement here. There are a lot of things that Archimedes did that were complex and puzzling, but this is not one of them. You do not need sophisticated algebraic technique to find approximations to surds. You only need to do (at most) a few hours of integer calculation. The puzzle is why people like Rouse Ball and Heath think it is puzzling. There's an explanation I'm groping for but can't quite articulate, but which goes something like this: Perhaps mathematicians of the late Victorian age lent too much weight to theory and analysis, and not enough to heuristic and simple technique. As a lifelong computer programmer, I have a great appreciation for what can be accomplished by just grinding out the numbers. See my anecdote about the square root algorithm used by the ENIAC, for example. I guessed then that perhaps computer science professors know more about mathematics than I expect, but less about computation. I can imagine the same thing of Victorian mathematicians—but not of Archimedes. One thing you often hear about pre-19th-century mathematicians is that they were great calculators. I wonder if appreciation of simple arithmetic technique might not have been sometimes lost to the mathematicans from the very end of the pre-computation age, say 1880–1940. Then again, perhaps I'm not giving them enough credit. Maybe there's something going on that I missed. I haven't checked the original sources to see what they actually say, so who knows? Perhaps Heath discusses the technique I suggested, and then rejects it for some fascinating reason that I, not being an expert in Greek mathematics, can't imagine. If I find out anything else, I will report further.
[Other articles in category /math] permanent link Tue, 20 Jan 2009
Triples and Closure
In topology, we have the idea of a "closure" of a set, which is essentially the union of the set with its boundary. For example, consider an open disk D, say the set of all points less than one mile from my house. The boundary of this set is a circle with radius one mile, centered at my house. The closure of D is the union of D with its boundary, and so is closed disk consisting of all points less than or equal to one mile from my house. Representing the closure of a set S as C(S), we have the obvious theorem that S ⊂ C(S), because the closure includes everything in S, plus the boundary. Another easy, but not quite obvious theorem is that C(C(S)) ⊂ C(S). This says that once you take the closure, you have included the boundary, and you do not get any more boundary by taking the closure again. The closure of a set is "closed"; the closure of a "closed" set C is just C. A third fundamental theorem about closures is that A⊂ B → C(A) ⊂ C(B). Now we turn to monads. A monad is first of all a functor, which, if you restrict your attention to programming languages, means that a monad is a type constructor M with an associated function fmap such that for any function f of type α → β, fmap f has type M α → M β. But a monad is also equipped with two other functions. There is a return function, which has type α → M α, and a join function, which has type M M α → M α. Haskell provides monads with a "bind" function, written >>=, which is interdefinable with join:
join x = x >>= id a >>= b = join (fmap b a)but we are going to forget about >>= for now. So the monad is equipped with three fundamental operations: fmap :: (a → b) → (M a → M b) join :: M M a → M a return :: a → M aThe three basic theorems about topological closures are:
(A ⊂ B) → (C(A) ⊂ C(B)If we imagine that ⊂ is a special kind of implication, the similarity with the monad laws is clear. And ⊂ is a special kind of implication, since (A ⊂ B) is just an abbreviation for (x ∈ A → x ∈ B). If we name the three closure theorems "fmap", "join", and "return", we might guess that "bind" also turns out to be a theorem. And it is, because >>= has the type M a → (a → M b) → M b. The corresponding theorem is: x ∈ C(A) → (A ⊂ C(B)) → x ∈ C(B)If the truth of this is hard to see, it is partly because the implications are in an unnatural order. The theorem is stated in the form P → Q → R, but it would be easier to understand as the equivalent Q → P → R:
A ⊂ C(B) → x ∈ C(A) → x ∈ C(B)Or more briefly:
A ⊂ C(B) → C(A) ⊂ C(B)This is quite true. We can prove it from the other three theorems as follows. Suppose A ⊂ C(B). Then by "fmap", C(A) ⊂ C(C(B)). By "join", C(C(B)) ⊂ C(B). By transitivity of ⊂, C(A) ⊂ C(B). This is what we wanted. Haskell defines a =<< operator which is the same as >>= except with the arguments forwards instead of backwards:
=<< :: (a → M b) → M a → M b a =<< b = b >>= aThe type of this function is analogous to the bind theorem, and I have seen claims in the literature that the argument order is in some ways more natural. Where the >>= function takes a value first, and then feeds it to a given function, the =<< function makes more sense as a curried function, taking a function of type a → M b and yielding the corresponding function of type M a → M b. I think it's also worth noticing that the structure of the proof of the bind theorem (invoke "fmap" and then "join") is exactly the same as the structure of the code that defines "bind". We can go the other way also, and prove the "join" theorem from the "bind" theorem. The definition of join in terms of >>= is:
join a = a >>= idFollowing the program again, id in the program code corresponds to the theorem that B ⊂ B for any B. A special case of this theorem is that C(B) ⊂ C(B) for any B. Then in the "bind" theorem: A ⊂ C(B) → C(A) ⊂ C(B)take A = C(B):
C(B) ⊂ C(B) → C(C(B)) ⊂ C(B)The left side of the implication is satisfied, so we conclude the consequent, C(C(B)) ⊂ C(B), which is what we wanted. But wait, monad operations are also required to satisfy some monad laws. For example, join (return x) = x. How does this work out in topological closure world? In programming language world, x here is required to have monad type. Monad types correspond to closed sets, so this is a theorem about closed sets. The theorem says that if X is a closed set, then the closure of X is the same as x. This is true. The identity between these two things can be found in (surprise) category theory. In category theory, a monad is a (categorial) functor equipped with two natural transformations, the "return" and "join" operations. The categorial version of a closure operator is essentially the same. Closure operations have a natural opposite. In topology, it is the "interior of" operation. The interior of a set is what you get if you discard the boundary of the set. The interior of a closed disc is an open disc; the interior of an open disc is the same open disc. Interior operations satisfy laws analogous but opposite to those enjoyed by closures:
Notice that the third theorem does not get turned around. I think this is because it comes from the functor itself, which goes the same way, not from the natural transformations, which go the other way. But I have not finished thinking abhout it carefully yet. Sooner or later I am going to program in Haskell with comonads, and it gives me a comfortable feeling to know that I am pre-equipped with a way to understand them as interior operations. I have an idea that the power of mathematics comes principally from the places where it succeeds in understanding two different things as aspects of the same thing. For example, why is group theory so useful? Because it understands transformations of objects (say, rotations of a polyhedron) and algebraic operations as essentially the same thing. If you have a hard problem about one, you can often make it into an easier problem about the other one. Similarly analytic geometry transforms numerical problems into geometric problems and back again. Most often the geometry is harder than the numerical problem, and you use it in that direction, but often you go in the other direction instead. It is quite possible that this notion is too vague to qualify as an actual theory. But category theory fits the description. Category theory lets you say that types are objects, type constructors are functors, and polymorphic functions are natural transformations. Then you can understand natural transformations as structure-preserving maps of something or other and get some insight into polymorphic functions, or vice-versa. Category theory is a large agglomeration of such identities. Lambek and Scott's book starts with several slogans about category theory. One of these is that many objects of interest to mathematicians form categories, such as the category of sets. Another is that many objects of interest to mathematicians are categories. (For example, each set is a discrete category.) So one of the reasons category theory is so extremely useful is that it sets up these multiple entities as different aspects of the same thing. I went to lunch and found more to say on the subject, but it will have to wait until another time.
[Other articles in category /math] permanent link Mon, 24 Nov 2008
Variations on the Goldbach conjecture
[Other articles in category /math] permanent link Tue, 11 Nov 2008
Another note about Gabriel's Horn
Presumably people think it's paradoxical that the thing should have a finite volume but an infinite surface area. But since the horn is infinite in extent, the infinite surface area should be no surprise. The surprise, if there is one, should be that an infinite object might contain a merely finite volume. But we swallowed that gnat a long time ago, when we noticed that the infinitely wide series of bars below covers only a finite area when they are stacked up as on the right. The pedigree for that paradox goes at least back to Zeno, so perhaps Gabriel's Horn merely shows that there is still some life in it, even after 2,400 years. [ Addendum 2014-07-03: I have just learned that this same analogy was also described in this math.stackexchange post of 2010. ] [Other articles in category /math] permanent link Mon, 10 Nov 2008
Gabriel's Horn is not so puzzling
The calculations themselves do not lend much insight into what is going on here. But I recently read a crystal-clear explanation that I think should be more widely known. Take out some Play-Doh and roll out a snake. The surface area of the snake (neglecting the two ends, which are small) is the product of the length and the circumference; the circumference is proportional to the diameter. The volume is the product of the length and the cross-sectional area, which is proportional to the square of the diameter. Now roll the snake with your hands so that it becomes half as thick as it was before. Its diameter decreases by half, so its cross-sectional area decreases to one-fourth. Since the volume must remain the same, the snake is now four times as long as it was before. And the surface area, which is the product of the length and the diameter, has doubled. As you continue to roll the snake thinner and thinner, the volume stays the same, but the surface area goes to infinity. Gabriel's Horn does exactly the same thing, except without the rolling, because the parts of the Horn that are far from the origin look exactly the same as very long snakes. There's nothing going on in the Gabriel's Horn example that isn't also happening in the snake example, except that in the explanation of Gabriel's Horn, the situation is obfuscated by calculus. I read this explanation in H. Jerome Keisler's caclulus textbook. Keisler's book is an ordinary undergraduate calculus text, except that instead of basing everything on limits and on limiting processes, it is based on nonstandard analysis and explicit infinitesimal quantities. Check it out; it is available online for free. (The discussion of Gabriel's Horn is in chapter 6, page 356.) [ Addendum 20081110: A bit more about this. ] [Other articles in category /math] permanent link Fri, 10 Oct 2008
Representing ordinal numbers in the computer and elsewhere
The Turner paper is a must-read. It's about functional programming in languages where every program is guaranteed to terminate. This is more useful than it sounds at first. Turner's initial point is that the presence of ⊥ values in languages like Haskell spoils one's ability to reason from the program specification. His basic example is simple: loop :: Integer -> Integer loop x = 1 + loop xTaking the function definition as an equation, we subtract (loop x) from both sides and get 0 = 1which is wrong. The problem is that while subtracting (loop x) from both sides is valid reasoning over the integers, it's not valid over the Haskell Integer type, because Integer contains a ⊥ value for which that law doesn't hold: 1 ≠ 0, but 1 + ⊥ = 0 + ⊥. Before you can use reasoning as simple and as familiar as subtracting an expression from both sides, you first have to prove that the value of the expression you're subtracting is not ⊥. By banishing nonterminating functions, one also banishes ⊥ values, and familiar mathematical reasoning is rescued. You also avoid a lot of confusing language design issues. The whole question of strictness vanishes, because strictness is solely a matter of what a function does when its argument is ⊥, and now there is no ⊥. Lazy evaluation and strict evaluation come to the same thing. You don't have to wonder whether the logical-or operator is strict in its first argument, or its second argument, or both, or neither, because it comes to the same thing regardless. The drawback, of course, is that if you do this, your language is no longer Turing-complete. But that turns out to be less of a problem in practice than one would expect. The paper was so interesting that I am following up several of its precursor papers, including Abel's paper, about which the Turner paper says "The problem of writing a decision procedure to recognise structural recursion in a typed lambda calculus with case-expressions and recursive, sum and product types is solved in the thesis of Andreas Abel." And indeed it is. But none of that is what I was planning to discuss. Rather, Abel introduces a representation for ordinal numbers that I hadn't thought much about before. I will work up to the ordinals via an intermediate example. Abel introduces a type Nat of natural numbers:
Nat = 1 ⊕ NatThe "1" here is not the number 1, but rather a base type that contains only one element, like Haskell's () type or ML's unit type. For concreteness, I'll write the single value of this type as '•'. The ⊕ operator is the disjoint sum operator for types. The elements of the type S ⊕ T have one of two forms. They are either left(s) where s∈S or right(t) where t∈T. So 1⊕1 is a type with exactly two values: left(•) and right(•). The values of Nat are therefore left(•), and right(n) for any element n of Nat. So left(•), right(left(•)), right(right(left(•))), and so on. One can get a more familiar notation by defining:
So much for the natural numbers. Abel then defines a type of ordinal numbers, as: Ord = (1 ⊕ Ord) ⊕ (Nat → Ord)In this scheme, an ordinal is either left(left(•)), which represents 0, or left(right(n)), which represents the successor of the ordinal n, or right(f), which represents the limit ordinal of the range of the function f, whose type is Nat → Ord. We can define abbreviations:
id :: Nat → Ord id 0 = Zero id (n + 1) = Succ(id n)then ω = Lim(id). Then we easily get ω+1 = Succ(ω), etc., and the limit of this function is 2ω:
plusomega :: Nat → Ord plusomega 0 = Lim(id) plusomega (n + 1) = Succ(plusomega n)We can define an addition function on ordinals:
+ :: Ord → Ord → Ord ord + Zero = ord ord + Succ(n) = Succ(ord + n) ord + Lim(f) = Lim(λx. ord + f(x))This gets us another way to make 2ω: 2ω = Lim(λx.id(x) + ω). Then this function multiplies a Nat by ω:
timesomega :: Nat → Ord timesomega 0 = Zero timesomega (n + 1) = ω + (timesomega n)and Lim(timesomega) is ω^{2}. We can go on like this. But here's what puzzled me. The ordinals are really, really big. Much too big to be a set in most set theories. And even the countable ordinals are really, really big. We often think we have a handle on uncountable sets, because our canonical example is the real numbers, and real numbers are just decimal numbers, which seem simple enough. But the set of countable ordinals is full of weird monsters, enough to convince me that uncountable sets are much harder than most people suppose. So when I saw that Abel wanted to define an arbitrary ordinals as a limit of a countable sequence of ordinals, I was puzzled. Can you really get every ordinal as the limit of a countable sequence of ordinals? What about Ω, the first uncountable ordinal? Well, maybe. I can't think of any reason why not. But it still doesn't seem right. It is a very weird sequence, and one that you cannot write down. Because suppose you had a notation for all the ordinals that you would need. But because it is a notation, the set of things it can denote is countable, and so a fortiori the limit of all the ordinals that it can denote is a countable ordinal, not Ω. And it's all very well to say that the sequence starts out (0, ω, 2ω, ω^{2}, ω^{ω}, ε_{0}, ε_{1}, ε_{ε0}, ...), or whatever, but the beginning of the sequence is totally unimportant; what is important is the end, and we have no way to write the end or to even comprehend what it looks like. So my question to set theory experts: is every limit ordinal the least upper bound of some countable sequence of ordinals? I hate uncountable sets, and I have a fantasy that in the mathematics of the 23rd Century, uncountable sets will be looked back upon as a philosophical confusion of earlier times, like Zeno's paradox, or the luminiferous aether. [ Addendum 20081106: Not every limit ordinal is the least upper bound of some countable sequence of (countable) ordinals, and my guess that Ω is not was correct, but the proof is so simple that I was quite embarrassed to have missed it. More details here. ] [ Addendum 20160716: In the 8 years since I wrote this article, the link to Turner's paper at Middlesex has expired. Fortunately, Miëtek Bak has taken it upon himself to create an archive containing this paper and a number of papers on related topics. Thank you, M. Bak! ] [Other articles in category /math] permanent link Thu, 02 Oct 2008
The Lake Wobegon Distribution
To take my favorite example: nearly everyone has an above-average number of legs. I wish I could remember who first brought this to my attention. James Kushner, perhaps? But the world abounds with less droll examples. Consider a typical corporation. Probably most of the employees make a below-average salary. Or, more concretely, consider a small company with ten employees. Nine of them are paid $40,000 each, and one is the owner, who is paid $400,000. The average salary is $76,000, and 90% of the employees' salaries are below average. The situation is familiar to people interested in baseball statistics because, for example, most baseball players are below average. Using Sean Lahman's database, I find that 588 players received at least one at-bat in the 2006 National League. These 588 players collected a total of 23,501 hits in 88,844 at-bats, for a collective batting average of .265. Of these 588, only 182 had an individual batting average higher than 265. 69% of the baseball players in the 2006 National League were below-average hitters. If you throw out the players with fewer than 10 at-bats, you are left with 432 players of whom 279, or 65%, hit worse than their collective average of 23430/88325 = .265. Other statistics, such as earned-run averages, are similarly skewed. The reason for this is not hard to see. Baseball-hitting talent in the general population is normally distributed, like this: Here the right side of the graph represents the unusually good hitters, of whom there aren't very many. The left side of the graph represents the unusually bad hitters; there aren't many of those either. Most people are somewhere in the middle, near the average, and there are about as many above-average hitters as below-average hitters in the general population. But major-league baseball players are not the general population. They are carefully selected, among the best of the best. They are all chosen from the right-hand edge of the normal curve. The people in the middle of the normal curve, people like me, play baseball in Clark Park, not in Quankee Stadium. Here's the right-hand corner of the curve above, highly magnified: As you can see here, the shape is not at all like the curve for the general population, which had the vast majority of the population in the middle, around the average. Here, the vast majority of the population is way over on the left side, just barely good enough to play in the majors, hanging on to their jobs by the skin of their teeth, subject at any moment to replacement by some kid up from the triple-A minors. The above-average players are the ones over on the right end, the few of the few. Actually I didn't present the case strongly enough. There are around 800 regular major-league ballplayers in the USA, drawn from a population of around 300 million, a ratio of one per 375,000. Well, no, the ratio is smaller, since the U.S. leagues also draw the best players from Mexico, Venezuela, Canada, the Dominican Republic, Japan, and elsewhere. The curve above is much too inclusive. The real curve for major-league ballplayers looks more like this: (Note especially the numbers on the y-axis.) This has important implications for the analysis of baseball. A player who is "merely" above average is a rare and precious resource, to be cherished; far more players are below average. Skilled analysts know that comparisons with the "average" player are misleading, because baseball is full of useful, effective players who are below average. Instead, analysts compare players to a hypothetical "replacement level", which is effectively the leftmost edge of the curve, the level at which a player can be easily replaced by one of those kids from triple-A ball. In the Historical Baseball Abstract, Bill James describes some great team, I think one of the Cincinnati Big Red Machine teams of the mid-1970s, as "possibly the only team in history that was above average at every position". That's an important thing to know about the sport, and about team sports in general: you don't need great players to completely clobber the opposition; it suffices to have players that are merely above average. But if you're the coach, you'd better learn to make do with a bunch of players who are below average, because that's what you have, and that's what the other team will beat you with. The right-skewedness of the right side of a normal distribution has implications that are important outside of baseball. Stephen Jay Gould wrote an essay about how he was diagnosed with cancer and given six months to live. This sounds awful, and it is awful. But six months was the expected lifetime for patients with his type of cancer—the average remaining lifetime, in other words—and in fact, nearly everyone with that sort of cancer lived less than six months, usually much less. The average was only skewed up as high as six months because of a few people who took years to die. Gould realized this, and then set about trying to find out how the few long-lived outliers survived and what he could do to turn himself into one of the long-lived freaks. And he succeeded, and lived for twenty years, dying eventually at age 60. My heavens, I just realized that what I've written is an article about the "long tail". I had no idea I was being so trendy. Sorry, everyone.
[Other articles in category /math] permanent link Fri, 26 Sep 2008
Sprague-Grundy theory
Let N be any positive integer. Does there necessarily exist a positive integer k such that the base-10 representation of kN contains only the digits 0 through 4?M. Penn was right there with the answer. Yesterday, M. Penn asked a question to which I happened to know the answer, and I was so pleased that I wrote up the whole theory in appalling detail. Since I haven't posted a math article in a while, and since the mailing list only has about twelve people on it, I thought I would squeeze a little more value out of it by posting it here. Richard Penn asked:
N dots are placed in a circle. Players alternate moves, where a move consists of crossing out any one of the remaining dots, and the dots on each side of it (if they remain). The winner is the player who crosses out the last dot. What is the optimal strategy with 19 dots? with 20? Can you generalize?M. Penn observed that there is a simple strategy for the 20-dot circle, but was not able to find one for the 19-dot circle. But solving such problems in general is made easy by the Sprague-Grundy theory, which I will explain in detail.
0. Short SpoilersBoth positions are wins for the second player to move.The 20-dot case is trivial, since any first-player move leaves a row of 17 dots, from which the second player can leave two disconnected rows of 7 dots each. Then any first-player move in one of these rows can be effectively answered by the second player in the other row. The 19-dot case is harder. The first player's move leaves a row of 16 dots. The second player can win by removing 3 dots to leave disconnected rows of 6 and 7 dots. After this, the strategy is complicated, but is easily found by the Sprague-Grundy theory. It's at the end of this article if you want to skip ahead. Sprague-Grundy theory is a complete theory of all finite impartial games, which are games like this one where the two players have exactly the same moves from every position. The theory says:
1. NimIn the game of Nim, one has some piles of beans, and a legal move is to remove some or all of the beans from any one pile. The winner is the player who takes the last bean. Equivalently, the winner is the last player who has a legal move.Nim is important because every position in every impartial game is somehow equivalent to a position in Nim, as we will see. In fact, every position in every impartial game is equivalent to a Nim position with at most one heap of beans! Since single Nim-heaps are trivially analyzed, one can completely analyze any impartial game position by calculating the Nim-heap to which it is equivalent.
2. Disjoint sums of gamesDefinition: The "disjoint sum" A # B of two games A and B is a new game whose rules are as follows: a legal move in A # B is either a move in A or a move in B; the winner is the last player with a legal move.Three easy exercises:
0 # a = a # 0 = afor all games a. 0 is a win for the previous player: the next player to move has no legal moves, and loses. We will call the next player to move "P1", and the player who just moved "P2". Note that a Nim-heap of 0 beans is precisely the 0 game.
3. Sums of Nim-heapsWe usually represent a single Nim-heap with n beans as "∗n". I'll do that from now on.We observed that ∗0 is a win for the second player. Observe now that when n is positive, ∗n is a win for the first player, by a trivial strategy. From now on we will use the symbol "=" to mean a weaker relation on games than strict equality. Two games A and B will be equivalent if their outcomes are the same in a rather strong sense:
A = B means that for any game X, A # X is a winning position if and only if B # X is also.Taking X = 0, the condition A = B implies that both games have the same outcome in isolation: if one is a first-player win, so is the other. But the condition is stronger than that. Both ∗1 and ∗2 are first-player wins, but ∗1 ≠ ∗2, because ∗1 # ∗1 is a second-player win, while ∗2 # ∗1 is a first-player win. Exercise: ∗x = ∗y if and only if x = y. It so happens that the disjoint sum of two Nim-heaps is equivalent to a single Nim-heap: Nim-sum theorem: ∗a # ∗b = ∗(a ⊕ b), Where ⊕ is the bitwise exclusive-or operation. I'll omit the proof, which is pretty easy to find. ⊕ is often described as "write a and b in binary, and add, ignoring all carries." For example 1 ⊕ 2 = 3, and 13 ⊕ 7 = 10. This implies that ∗1 # ∗2 = ∗3, and that ∗13 # ∗7 = ∗10. Although I omitted the proof that # for Nim-heaps is essentially the ⊕ operation in disguise, there are many natural implications of this that you can use to verify that the claim is plausible. For example:
4. The MEX ruleThe important thing about disjoint sums is that they abstract away the strategy. If you have some complicated set of Nim-heaps ∗a # ∗b # ... # ∗z, you can ignore them and pretend instead that they are a single heap ∗(a ⊕ b ⊕ ... ⊕ z). Your best move in the compound heap can be easily worked out from the corresponding best move in the fictitious single heap.For example, how do you figure out how to play in ∗2 # ∗3 # ∗x? You consider it as (∗2 # ∗3) # ∗x = ∗1 # ∗x. That is, you pretend that the ∗2 and the ∗3 are actually a single heap of size 1. Then your strategy is to win in ∗1 # ∗x, which you obviously do by reducing ∗x to size 1, or, if ∗x is already ∗0, by changing ∗1 to ∗0. Now, that is very facile, but ∗2 # ∗3 is not the same game as ∗1, because from ∗1 there is just one legal move, which is to ∗0. Whereas from ∗2 # ∗3 there are several moves. It might seem that your opponent could complicate the situation, say by moving from ∗2 # ∗3 to ∗3, which she could not do if it were really ∗1. But actually this extra option can't possibly help your opponent, because you have an easy response to that move, which is to move right back to ∗1! If pretending that ∗2 # ∗3 was ∗1 was good before, it is certainly good after you make it ∗1 for real. From ∗2 # ∗3 there are a whole bunch of moves:
Move to ∗3But you can disregard the first four of these, because they are reversible: if some player X has a winning strategy that works by pretending that ∗2 # ∗3 is identical with ∗1, then the extra options of moving to ∗2 and ∗3 won't help X's opponent, because X can reverse those moves and turn the ∗2 # ∗3 component back into ∗1. So we can ignore these options, and say that there's just one move from ∗2 # ∗3 worth considering further, namely to ∗2 # ∗2 = 0. Since this is exactly the same set of moves that is available from ∗1, ∗2 # ∗3 behaves just like ∗1 in all situations, and have just proved that ∗2 # ∗3 = ∗1. Unlike the other moves, the move from ∗2 # ∗3 to ∗0 is not reversible. Once someone turns ∗2 # ∗3 into ∗0, by equalizing the piles, it cannot then be turned back into ∗1, or anything else. Considering this in more generality, suppose we have some game position P where the options are to move to one of several possible Nim-heaps, and M is the smallest Nim-heap that is not among the options. Then P = ∗M. Why? Because P has just the same options that ∗M has, namely the options of moving to one of ∗0 ... ∗(M-1). P also has some extra options, but we can ignore these because they're reversible. If you have a winning strategy in X # ∗M, then you have a winning strategy in X # P also, as follows:
For example, let's consider what happens if we augment Nim by adding a special token, called ♦. A player may, in lieu of a regular move, replace ♦ by a pile of beans of any positive size. What effect does this have on Nim? Since the legal moves from ♦ are {∗1, ∗2, ∗3, ...} and the MEX is 0, ♦ should behave like ∗0. That is, adding a ♦ token to any position should leave the outcome unaffected. And indeed it does. If you have a winning strategy in game G, then you have a winning strategy in G # ♦ also, as follows: If your opponent plays in G, reply in G. If your opponent replaces ♦ with a pile of beans, remove it, leaving only G. Exercise: Let G be a game where all the legal moves are to Nim-heaps. Then G is a win for P1 if and only if one of the legal moves from G is to ∗0, and a win for P2 if and only if none of the legal moves from G is to ∗0.
5. The Sprague-Grundy theoryAn "impartial game" is one where both players have the same moves from every position.Sprague-Grundy theorem: Any finite impartial game is equivalent to some Nim-heap ∗n, which is the "Nim-value" of the game. Now let's consider Richard Penn's game, which is impartial. A legal move is to cross out any dot, and the adjacent dot or dots, if any. The Sprague-Grundy theorem says that every row of dots in Penn's game is equivalent to some Nim-heap. Let's tabulate the size of this heap (the Nim-value) for each row of n dots. We'll represent a row of n dots as [οοοοο...ο]. Obviously, [] = ∗0 so the Nim-value of [] is 0. Also obviously, [ο] = ∗1, since they're exactly the same game. [οο] = ∗1 also, since the only legal move from [οο] is to [] = 0, and the MEX of {0} is 1. The legal moves from [οοο] are to [] = ∗0 and [ο] = ∗1, so {∗0, ∗1}, and the MEX is 2. So [οοο] = ∗2. Let's check that this is working. Since the Nim-value of [οοο] is 2, the theory predicts that [οοο] # ∗2 = 0 and so should be a win for P2. P2 should be able to pretend that [οοο] is actually ∗2. Suppose P1 turns the ∗2 into ∗1, moving to [οοο] # ∗1. Then P2 should turn [οοο] into ∗1 also, which he can do by crossing out an end dot and the adjacent one, leaving [ο] # ∗1, which he easily wins. If P1 turns ∗2 into ∗0, moving to [οοο] # ∗0, then P2 should turn [οοο] into ∗0 also, which he can do by crossing out the middle and adjacent dots, leaving [] # ∗0, which he wins immediately. If P1 plays in the [οοο] component, she must move to [] or to [ο], each equivalent to some Nim-heap of size x < 2, and P2 can answer by reducing the true Nim-heap ∗2 to contain x beans also. Continuing our analysis of rows of dots: In Penn's game, the legal moves from [οοοο] are to [οο] and [ο]. Both of these have Nim-value ∗1, so the MEX is 0. Easy exercise: Since [οοοο] is supposedly equivalent to ∗0, you should be able to show that a player who has a winning strategy in some game G also has a winning strategy in G + [οοοο]. The legal moves from [οοοοο] are to [οοο], [οο], and [ο] # [ο]. The Nim-values of these three games are ∗2, ∗1, and ∗0 respectively, so the MEX is 3 and [οοοοο] = ∗3. The legal moves from [οοοοοο] are to [οοοο], [οοο], and [ο] # [οο]. The Nim-values of these three games are 0, 2, and 0, so [οοοοοο] = ∗1.
6. Richard Penn's game analyzed
The table says that a row of 19 dots should be a win for P1, if she reduces the Nim-value from 3 to 0. And indeed, P1 has an easy winning strategy, which is to cross the 3 dots in the middle of the row, replacing [οοοοοοοοοοοοοοοοοοο] with [οοοοοοοο] # [οοοοοοοο]. But no such easy strategy obtains in a row of 20 dots, which, indeed, is a win for P2. The original question involved circles of dots, not rows. But from a circle of n dots there is only one legal move, which is to a row of n-3 dots. From a circle of 20 dots, the only legal move is to [ο × 17] = ∗2, which should be a win for P1. P1 should win by changing ∗2 to ∗0, so should look for the move from [ο × 17] to ∗0. This is the obvious solution Richard Penn discovered: move to [οοοοοοο] # [οοοοοοο]. So the circle of 20 dots is an easy win for P2, the second player. But for the circle of 19 dots the answer is the same, a win for the second player. The first player must move to [ο × 16] = ∗2, and then the second player should win by moving to a 0 position. [ο × 16] must have such a move, because if it didn't, the MEX rule would imply that its Nim-value was 0 instead of 2. So what's the second player's zero move here? There are actually two options. The second player can win by playing to [ο × 14], or by splitting the row into [οοοοοο] # [οοοοοοο]. 7. Complete strategy for 19-bean circleJust for completeness, let's follow one of these purportedly winning moves in detail. I claimed that the second player could win by moving to [οοοοοο] # [οοοοοοο]. But what next?First recall that any isolated row of four dots, [οοοο], can be disregarded, because any first-player move in such a row can be answered by a second-player move that crosses out the rest of the row. And any pair of isolated rows of one or two dots, [ο] or [οο], can be similarly disregarded, because any move that crosses out one can be answered by a move that crosses out the other. So in what follows, positions like [οο] # [ο] # [οοοο] will be assumed to have been won by the second player, and we will say that the second player "has an easy win" if he has a move to such a position.
But the important point here is not the strategy itself, which is hard to remember, and which could have been found by computer search. The important thing to notice is that computing the table of Nim-values for each row of n dots is easy, and once you have done this, the rest of the strategy almost takes care of itself. Do you need to find a good move from [οοοοοοο] # [οοοοοοοοο] # [οοοοοοοοοο]? There's no need to worry, because the table says that this can be viewed as ∗1 # ∗3 # ∗3, and so a good move is to reduce the ∗1 component, the [οοοοοοο], to ∗0, say by changing it to [οοοο] or to [οο] # [οο]. Whatever your opponent does next, calculating your reply will be similarly easy.
[Other articles in category /math] permanent link Wed, 10 Sep 2008
Factorials are not quite as square as I thought
Let s(n) be the smallest perfect square larger than n. Then to have n! = a^{2} - 1 we must have a^{2} = s(n!), and in particular we must have s(n!) - n! square. This actually occurs for n in { 4, 5, 6, 7, 8, 9, 10, 11 }, and since 11 was as far as I got on the lunch line yesterday, I had an exaggerated notion of how common it is. had I worked out another example, I would have realized that after n=11 things start going wrong. The value of s(12!) is 21887^{2}, but 21887^{2} - 12! = 39169, and 39169 is not a square. (In fact, the n=11 solution is quite remarkable; which I will discuss at the end of this note.) So while there are (of course) solutions to 12! = a^{2} - b^{2}, and indeed where b is small compared to a, as I said, the smallest such b takes a big jump between 11 and 12. For 4 ≤ n ≤ 11, the minimal b takes the values 1, 1, 3, 1, 9, 27, 15, 18. But for n = 12, the solution with the smallest b has b = 288. Calculations with Mathematica by Mitch Harris show that one has n! = s(n!) - b^{2} only for n in {1, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 16}, and then not for any other n under 1,000. The likelihood that I imagine of another solution for n! = a^{2} - 1, which was already not very high, has just dropped precipitously. My thanks to M. Harris, and also to Stephen Dranger, who also wrote in with the results of calculations. Having gotten this far, I then asked OEIS about the sequence 1, 1, 3, 1, 9, 27, 15, 18, and (of course) was delivered a summary of the current state of the art in n! = a^{2} - 1. Here's my summary of the summary. The question is known as "Brocard's problem", and was posed by Brocard in 1876. No solutions are known with n > 7, and it is known that if there is a solution, it must have n > 10^{9}. According to the Mathworld article on Brocard's problem, it is believed to be "virtually certain" that there are no other solutions. The calculations for n ≤ 10^{9} are described in this unpublished paper of Berndt and Galway, which I found linked from the Mathworld article. The authors also investigated solutions of n! = a^{2} - b^{2} for various fixed b between 2 and 50, and found no solutions with 12 ≤ n ≤ 10^{5} for any of them. The most interesting was the 11! = 6318^{2} - 18^{2} I mentioned already. [ The original version of this article contained some confusion about whether s(n) was the largest square less than n, or the largest number whose square was less than n. Thanks to Roie Marianer for pointing out the error. ]
[Other articles in category /math] permanent link Tue, 09 Sep 2008
Factorials are almost, but not quite, square
$$n! = a^{2} - b^{2}\qquad (*)$$ will always have solutions where b is small compared to a. For example, we have 11! = 6318^{2} - 18^{2}.But to get b=1 might require a lot of luck, perhaps more luck than there is. (Jeremy Kahn once argued that |2^{x} - 3^{y}| = 1 could have no solutions other than the obvious ones, essentially because it would require much more fabulous luck than was available. I sneered at this argument at the time, but I have to admit that there is something to it.) Anyway, back to the subject at hand. Is there an example of n! = a^{2} -1 with n > 7? I haven't checked yet. In related matters, it's rather easy to show that there are no nontrivial examples with b=0. It would be pretty cool to show that equation (*) implied n = O(f(b)) for some function f, but I would not be surprised to find out that there is no such bound. This kept me amused for twenty minutes while I was in line for lunch, anyway. Incidentally, on the lunch line I needed to estimate √11. I described in an earlier article how to do this. Once again it was a good trick, the sort you should keep handy if you are the kind of person who needs to know √11 while standing in line on 33rd Street. Here's the short summary: √11 = √(99/9) = √((100-1)/9) = √((100/9)(1 - 1/100) = (10/3)√(1 - 1/100) ≈ (10/3)(1 - 1/200) = (10/3)(199/200) = 199/60. [ Addendum 20080909: There is a followup article. ] [Other articles in category /math] permanent link Sat, 12 Jul 2008
Period three and chaos
A Möbius function is simply a function of the form f : x → (ax + b) / (cx + d) for some constants a, b, c, and d. Möbius functions are of major importance in complex analysis, where they correspond to certain transformations of the Riemann sphere, but I'm mostly looking at the behavior of Möbius functions on the reals, and so restricting a, b, c, and d to be real. One nice thing about the Möbius functions is that you can identify the Möbius function f : x → (ax + b) / (cx + d) with the matrix , because then composition of Möbius functions is the same as multiplication of the corresponding matrices, and so the inverse of a Möbius function with matrix M is just the function that corresponds to M^{-1}. Determining whether a set of Möbius functions is closed under composition is the same as determining whether the corresponding matrices form a semigroup; you can figure out what happens when you iterate a Möbius function by looking at the eigenvalues of M, and so on. The matrices are not quite identical with the Möbius functions, because the matrix and the matrix !!{ 2\, 0 \choose 0\,2}!! are the same Möbius function. So you really need to consider the set of matrices modulo the equivalence relation that makes two matrices equivalent if they are the same up to a scalar factor. If you do this you get a group of matrices called the "projective linear group", PGL(2). This takes us off into classical group theory and Lie groups, which I have been intermittently trying to figure out. You can also consider various subgroups of PGL(2), such as the subgroup that leaves the set {0, 1, ∞, -1} fixed. The reciprocal function x → 1/x is one such; it leaves 1 and -1 fixed and exchanges 0 and ∞. In general a Möbius function has three degrees of freedom, since you can choose the four constants a, b, c, and d however you like, but one degree of freedom is removed because of the equivalence relation—or, to look at it another way, you get to pick b/a, c/a, and d/a however you like. So in general you can pick any p, q, and r and find the unique Möbius function m with m(0) = p, m(1) = q, m(-1) = r. These then determine m(∞), which turns out to be (4qr - 2p(q+r))/(q + r - 2p) when that is defined. And sometimes even when it isn't. You may be worrying about the infinities here, but it's really nothing much to worry about. f(∞) is nothing more than !!\lim_{x\rightarrow\infty} f(x)!!. If (4qr - 2p(q+r))/(q + r - 2p) in the presence of infinities worries you, try a few examples. For instance, consider m : x → x+1. This function has p = m(0) = 1, q = m(1) = 2, r = m(-1) = 0. Plugging into the formula, we get m(∞) = -2pq/(q - 2p) = -4 / (2-2) = -4/0 = ∞, which is just right. The only other thing you have to remember is that +∞ = -∞, because we're really living on the Riemann sphere. Or rather, we're living on the real part of the Riemann sphere, but either way there's only one ∞. We might call this space the "Riemann circle", but I've never heard it called that. And neither has Google, although it did turn up a bulletin board post in which someone else asked the same question in a similar context. There's a picture of it farther down on the right. Anyway, most choices of p, q, and r in {0, 1, ∞, -1} do not get you permutations of {0, 1, ∞, -1}, because they end up mapping ∞ outside that set. For example, if you take p = 1, q = -1, r = 0, you get m(∞) = -2/3. But obviously the identity function has the desired property, and if you think about the Riemann circle (excuse me, Riemann sphere) you immediately get the rest: any rigid motion of the Riemann sphere is a Möbius function, and some of those motions permute the four points {0, 1, ∞, -1}. In fact, there are eight such functions, because {0, 1, ∞, -1} are at the vertices of a square, so any rigid motion of the Riemann sphere that permutes {0, 1, ∞, -1} must be a rigid motion of that square, and the square has eight symmetries, namely the elements of the group D_{4}:
Here we have eight functions on the reals which make the group D_{4} under the operation of composition. For example, if f(x) = (x-1)/(x+1), then f(f(f(f(x)))) = x. Isn't that nice? Anyway, none of that was what I was really planning to talk about. (You knew that was coming, didn't you?) What I wanted to discuss was the function f : x → 1 / (1 - x). I found this function because I was considering other permutations of {0, 1, ∞, -1}. The f function takes 0 → 1 → ∞ → 0. (It also takes -1 → 1/2, and so is not one of the functions in the D_{4} table above.) We say that f has a periodic point of order 3 because f(f(f(x))) = x for some x; in this case at least for x ∈ {0, 1, ∞}. A function with a periodic point of order three is not something you see every day, and I was somewhat surprised that as simple a function as 1/(1-x) had one. But if you do the algebra and calculate f(f(f(x))) explicitly, you find that you do indeed get x, so every point is a periodic point of order 3, or possibly 1. Or you can do a simpler calculation: since f is the Möbius function that corresponds to the matrix F = !!{ \hphantom{-}0\, 1 \choose -1\,1}!!, just calculate F^{3}. You get !!{ -1\, \hphantom{-}0 \choose \hphantom{-}0\, -1}!!, which is indeed the identity function. This also gives you a simple matrix M for which M^{7} = M, if you happened to be looking for such a thing. I had noticed a couple of years ago that this 1/(1-x) function had period 3, and then forgot about it. Then I noticed it again a few weeks ago, and a nagging question came into my mind, which is reflected in a note I wrote in my notebook at that point: "WHAT ABOUT SARKOVSKY'S THEOREM?" Well, what about it? Sharkovskii's theorem (I misspelled it in the notebook) is a delightful generalization of the "Period three implies chaos" theorem of Li and Yorke. It says, among other things, that if a continuous function of the reals has a periodic point of order 3, then it also has a periodic point of order n for all positive integers n. In particular, we can take n=1, so the function f, which has a periodic point of order 3 must also have a fixed point. But it's quite easy to see that f has no fixed point on the reals: Just put f(x) = 1/(1-x) = x and solve for x; there are no real solutions. So what about Sharkovskii's theorem? Oh, it only applies to continuous functions, and f is not, because f(1) = ∞. So that's all right. The Sharkovskii thing is excellent. The Sharkovskii ordering of the integers is:
3 < 5 < 7 < 9 < ... The 1/(1-x) function led me to read more about Sharkovskii's theorem and its predecessor, the "period three implies chaos" theorem. Isn't that a great name for a theorem? And Li and Yorke knew it, because that's what they titled their paper. "Chaos" in this context means the following: say that two values a and b are "scrambled" by f if, for any given d and ε, there is some n for which |f^{n}(a) - f^{n}(b)| > d, and some m for which |f^{m}(a) - f^{m}(b)| < ε. That is, a and b are scrambled if repeated application of f drives a and b far apart, then close together, then far apart again, and so on. Then, if f is a continuous function with a periodic point of order 3, there is some uncountable set S of reals such that f scrambles all distinct pairs of values a and b from S. All that was from memory; I hope it got it more or less correct. (The Li and Yorke paper also includes an example of a continuous function with a periodic point of order 5 but no periodic point of order 3. It's pretty simple.) Reading about Sharkovskii's theorem and related matters led me to the web pages of James A. Yorke (of Li and Yorke), and then to the book Chaos: An Introduction to Dynamical Systems that he did with Alligood and Sauer, which is very readable. I was pleased to finally be studying this material, because it was a very early inspiration to me. When I was about fourteen, my cousin Alex, who is an analytic chemist, came to visit, and told me about period-doubling and chaos in the logistic map. (It was all over the news at the time.) The logistic map is just f : x → λx(1-x) for some constant λ. For small λ, the map has a single fixed point, which increases as λ does. But at a certain critical value of λ (λ=3, actually) the function's behavior changes, and it suddenly begins to have a periodic point of order 2. As λ increases further, the behavior changes again, and the periodicity changes from order 2 to order 4. As λ increases, this happens again and again, with the splits occurring at exponentially closer and closer values of λ. Eventually there is a magic value of λ at which the function goes berserk and is chaotic. Chaos continues for a while, and then the function develops a periodic point of order 3, which bifurcates... (The illustration here, which I copied from Wikipedia, uses r instead of λ.) I was deeply impressed. For some reason I got the idea that I would need to understand partial differential equations to understand the chaos and the logistic map, so I immediately set out on a program to learn what I thought I would need to know. I enrolled in differential equations courses at Columbia University instead of in something more interesting. The partial differential equations turned out to be a sidetrack, but in those days there were no undergraduate courses in iterated dynamic systems. I am happy to discover that after only twenty-five years I am finally arriving at the destination. Cousin Alex also told me to carry a notebook and pen with me wherever I went. That was good advice, and it took me rather less time to learn.
[Other articles in category /math] permanent link Wed, 23 Apr 2008
Recounting the rationals
Let b(n) be the number of ways of adding up powers of 2 to get n, with each power of 2 used no more than twice. So, for example, b(5) = 2, because there are 2 ways to get 5:
And b(10) = 5, because there are 5 ways to get 10:
The sequence of values of b(n) begins as follows:
1 1 2 1 3 2 3 1 4 3 5 2 5 3 4 1 5 4 7 3 8 5 7 2 7 5 8 3 7 4 5 ...Now consider the sequence b(n) / b(n+1). This is just what you get if you take two copies of the b(n) sequence and place one over the other, with the bottom one shifted left one place, like this:
1 1 2 1 3 2 3 1 4 3 5 2 5 3 4 1 5 4 7 3 8 5 7 2 7 5 8 3 7 4 5 ... - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 1 2 1 3 2 3 1 4 3 5 2 5 3 4 1 5 4 7 3 8 5 7 2 7 5 8 3 7 4 5 ...Reading each pair as a rational number, we get the sequence b(n) / b(n+1), which is 1/1, 1/2, 2/1, 1/3, 3/2, 2/3, 3/1, 1/4, 4/3, 3/5, 5/2, ... . Here is the punchline: This sequence contains each positive rational number exactly once. If you are just learning to read math papers, or you think you might like to learn to read them, the paper in which this is proved would be a good place to start. It is serious research mathematics, but elementary. It is very short. The result is very elegant. The proofs are straightforward. The techniques used are typical and widely applicable; there is no weird ad-hockery. The discussion in the paper is sure to inspire you to tinker around with it more on your own. All sorts of nice things turn up. The b(n) sequence satisfies a simple recurrence, the fractions organize themselves neatly into a tree structure, and everything is related to everything else. Check it out. Thanks to Brent Yorgey for bringing this to my attention. I saw it in this old blog article, but then discovered he had written a six-part series about it. I also discovered that M. Yorgey independently came to the same conclusion that I did about the paper: it would be a good first paper to read. [ Addendum 20080505: Brad Clow agrees that it was a good place to start. ]
[Other articles in category /math] permanent link Sat, 01 Mar 2008
More rational roots of polynomials
Suppose you have a polynomial P(x) = x^{n} + ...+ p = 0. If it has a rational root r, this must be an integer that divides p = P(0). So far so good. But consider P(x-1). This is a different polynomial, and if r is a root of P(x), then r+1 is a root of P(x-1). So, just as r must divide P(0), r+1 must divide P(-1). And similarly, r-1 must divide P+1. So we have an extension of the rational root theorem: instead of guessing that some factor r of P(0) is a root, and checking it to see, we first check to see if r+1 is a factor of P(-1), and if r-1 is a factor of P(1), and proceed with the full check only if these two auxiliary tests pass. My notes conclude with:
Is this really less work than just trying all the divisors of P(0) directly?Let's find out. As in the previous article, say P(x) = 3x^{2} + 6x - 45. The method only works for monic polynomials, so divide everything by 3. (It can be extended to work for non-monic polynomials, but the result is just that you have to divide everything by 3, so it comes to the same thing.) So we consider x^{2} + 2x - 15 instead. Say r is a rational root of P(x). Then:
So we need to find three consecutive integers that respectively divide 12, 15, and 16. The Britannica has no specific technique for this; it suggests doing it by eyeball. In this case, 2–3–4 jumps out pretty quickly, giving the root 3, and so does 6–5–4, which is the root -5. But the method also yields a false root: 4–3–2 suggests that -3 might be a root, and it is not. Let's see how this goes for a harder example. I wrote a little Haskell program that generated the random polynomial x^{4} - 26x^{3} + 240 x^{2} - 918x + 1215.
That required a fair amount of mental arithmetic, and I screwed up and got 502 instead of 512, which I only noticed because 502 is not composite enough; but had I been doing a non-contrived example, I would not have noticed the error. (Then again, I would have done the addition on paper instead of in my head.) Clearly this example was not hard enough because 2–3–4 and 4–5–6 are obviously solutions, and it will not always be this easy. I increased the range on my random number generator and tried again. The next time, it came up with the very delightful polynomial x^{4} - 2735x^{3} + 2712101 x^{2} - 1144435245x + 170960860950, and I decided not going to go any farther with it. The table values are easy to calculate, but they will be on the order of 170960860950, and I did not really care to try to factor that. I decided to try one more example, of intermediate difficulty. The program first gave me x^{4} - 25x^{3} + 107 x^{2} - 143x + 60, which is a lucky fluke, since it has a root at 1. The next example it produced had a root at 3. At that point I changed the program to generate polynomials that had integer roots between 10 and 20, and got x^{4} - 61x^{3} + 1364 x^{2} - 13220x + 46800.
This is just past my mental arithmetic ability; I got 34884 instead of 34864 in the first row, and balked at factoring 61446 in my head. But going ahead (having used the computer to finish the arithmetic), the 17 and 19 in the first and last rows are suggestive, and there is indeed a 17–18–19 to be found. Following up on the 19 in the first row suggests that we look for 19–20–21, which there is, and following up on the 11 in the last row, hoping for a 9–10–11, finds one of those too. All of these are roots, and I do have to admit that I don't know any better way of discovering that. So perhaps the method does have some value in some cases. But I had to work hard to find examples for which it made sense. I think it may be more reasonable with 18th-century technology than it is with 21st-century technology.
[Other articles in category /math] permanent link Thu, 28 Feb 2008
Algebra techniques that don't work, except when they do
She says "I stopped him before he factored out the x.". I was a bit surprised by this, because the work so far seemed reasonable to me. I think the only mistake was not dividing the whole thing by 3 in the first step. But it is not too late to do that, and even without it, you can still make progress. x(3x + 6) = 45, so if there are any integer solutions, x must divide 45. So try x = ±1, ±3, ±5, ±9, ±15 in roughly that order. (The "look for the wallet under the lamppost" principle.) x = 3 solves the equation, and then you can get the other root, x=-5, by further application of the same method, or by dividing the original polynomial by x-3, or whatever. If you get rid of the extra factor of 3 in the first place, the thing is even easier, because you have x(x + 2) = 15, so x = ±1, ±3, or ±5, and it is obviously solved by x=3 and x=-5. Now obviously, this is not always going to work, but it works often enough that it would have been the first thing I would have tried. It is a lot quicker than calculating b^{2} - 4ac when c is as big as 45. If anyone hassles you about it, you can get them off your back by pointing out that it is an application of the so-called rational root theorem. But probably the student did not have enough ingenuity or number sense to correctly carry off this technique (he didn't notice the 3), so that M. Hirta's advice to just use the damn quadratic formula already is probably good. Still, I wonder if perhaps such students would benefit from exposure to this technique. I can guess M. Hirta's answer to this question: these students will not benefit from exposure to anything. [ Addendum 20080228: Robert C. Helling points out that I could have factored the 45 in the first place, without any algebraic manipulations. Quite so; I completely botched my explanation of what I was doing. I meant to point out that once you have x(x+2) = 15 and the list [1, 3, 5, 15], the (3,5) pair jumps out at you instantly, since 3+2=5. I spent so much time talking about the unreduced polynomial x(3x+6) that I forgot to mention this effect, which is much less salient in the case of the unreduced polynomial. My apologies for any confusion caused by this omission. ] [ Addendum 20080301: There is a followup to this article. ]
[Other articles in category /math] permanent link Fri, 15 Feb 2008
Acta Quandalia
[Other articles in category /math] permanent link Wed, 13 Feb 2008
The least interesting number
This reads like a joke, and it is tempting to dismiss it as a trite bit of foolishness. But it has rather interesting and deep connections to other related matters, such as the Grelling-Nelson paradox and Gödel's incompleteness theorem. I plan to write about that someday. But today my purpose is only to argue that there are demonstrably uninteresting real numbers. I even have an example. Liouville's number L is uninteresting. It is defined as:
$$\sum_{i=1}^\infty {10}^{-i!} = 0.1100010000000000000001000\ldots$$ Why is this number of any concern? In 1844 Joseph Liouville showed that there was an upper bound on how closely an irrational algebraic number could be approximated by rationals. L can be approximated much more closely than that, and so must therefore be transcendental. This was the proof of the existence of transcendental numbers.The only noteworthy mathematical property possessed by L is its transcendentality. But this is certainly not enough to qualify it as interesting, since nearly all real numbers are transcendental. Liouville's theorem shows how to construct many transcendental numbers, but the construction generates many similar numbers. For example, you can replace the 10 with a 2, or the n! with floor(e^{n}) or any other fast-growing function. It appears that any potentially interesting property possessed by Liouville's number is also possessed by uncountably many other numbers. Its uninterestingness is identical to that of other transcendental numbers constructed by Liouville's method. L was neither the first nor the simplest number so constructed, so Liouville's number is not even of historical interest. The argument in Berry's paradox fails for the real numbers: since the real numbers are not well-ordered, the set of uninteresting real numbers need have no smallest element, and in fact (by Berry's argument) does not. Liouville's number is not the smallest number of its type, nor the largest, nor anything else of interest. If someone were to come along and prove that Liouville's number was the most uninteresting real number, that would be rather interesting, but it has not happened, nor is it likely.
[Other articles in category /math] permanent link Thu, 07 Feb 2008
Trivial theorems
Except of course it wasn't Acta Quandalia (which would never commit such a silly error) and it didn't concern k-quandles; it was some unspecified journal, and it concerned some property of some sort of topological space, and that was the end of the investigation of those topological spaces. This would not qualify as a major screwup under my definition in the original article, since the theorems are true, but it certainly would have been rather embarrassing. Journals are not supposed to publish papers about the properties of the empty set. Hmm, there's a thought. How about a Journal of the Properties of the Empty Set? The editors would never be at a loss for material. And the cover almost designs itself. Handsome, isn't it? I See A Great Need!Ahem. Anyway, if the folklore in question is true, I suppose the mathematicians involved might have felt proud rather than ashamed, since they could now boast of having completely solved the problem of partially uniform k-quandles. But on the other hand, suppose you had been granted a doctorate on the strength of your thesis on the properties of objects from some class which was subsequently shown to be empty. Wouldn't you feel at least a bit like a fraud? Is this story true? Are there any examples? Please help me, gentle readers.
[Other articles in category /math] permanent link Tue, 05 Feb 2008
Major screwups in mathematics: example 1
Readers suggested several examples, and I got lucky and turned up one on my own. Some of the examples were rather obscure technical matters, where Professor Snorfus publishes in Acta Quandalia that all partially uniform k-quandles have the Cosell property, and this goes unchallenged for several years before one of the other three experts in partially uniform quandle theory notices that actually this is only true for Nemontovian k-quandles. I'm not going to report on matters that sounded like that to me, although I realize that I'm running the risk that all the examples that I do report will sound that way to most of the audience. But I'm going to give it a try.
General remarksI would like to make some general remarks first, but I don't quite know yet what they are. Two readers independently suggested that I should read Proofs and Refutations by Imre Lakatos, and raised a number of interesting points that I'm sure I'd like to expand on, except that I haven't read the book. Both copies are checked out of the Penn library, which is a good sign, and the interlibrary loan copy I ordered won't be here for several days.Still, I can relate a partial secondhand understanding of the ideas, which seem worth repeating. Whether a result is "correct" may be largely a matter of definition. Consider Lakatos' principal example, Euler's theorem about polyhedra: Let F, E, and V be the number of faces, edges, and vertices in a polyhedron. Then F - E + V = 2. For example, the cube has (F, E, V) = (6, 12, 8), and 6 - 12 + 8 = 2. Sometime later, someone observed that Euler's theorem was false for polyhedra with holes in them. For example, consider the object shown at right. It has (F, E, V) = (9, 18, 9), giving F - E + V = 9 - 18 - 9 = 0. Can we say that Euler was wrong? Not really. The question hinges on the definition of "polyhedron". Euler's theorem is proved for "polyhedra", but we can see from the example above that it only holds for "simply-connected polyhedra". If Euler proved his theorem at a time when "polyhedra" was implicitly meant "simply-connected", and the generally-understood definition changed out from under him, we can't hold that against Euler. In fact, the failure of Euler's theorem for the object above suggests that maybe we shouldn't consider it to be a polyhedron, that it is somehow rather different from a polyhedron in at least one important way. So the theorem drives the definition, instead of the other way around. Okay, enough introductory remarks. My first example is unquestionably a genuine error, and from a first-class mathematician.
Mathematical backgroundSome terminology first. A "formula" is just that, for example something like this:
$$\displaylines{ ((\forall a.\lnot R(a,a)) \wedge\cr (\forall b\forall c.R(b,c)\to\lnot R(c,b))\wedge\cr (\forall d\forall e\forall f.(R(d,e)\wedge R(e,f)\to R(d,f))) \to\cr (\forall x\exists y.R(y,x)) }$$ It may contain a bunch of quantified variables (a, b, c, etc.), relations (like R), and logical connectives like ∧. A formula might also include functions and constants (which I didn't) or equality symbols (there are none here).One can ask whether the formula is true (or, in the jargon, "valid"), which means that it must hold regardless of how one chooses the set S from which the values of the variables will be drawn, and regardless of the meanings assigned to the relation symbols (and to the functions and constants, if there are any). The following formula, although not very interesting, is valid:
$$ \forall a\exists b.(P(a)\wedge P(b))\to P(a) $$ This is true regardless of the meaning we ascribe to P, and regardless of the set from which a and b are required to be drawn.The longer formula above, which requires that R be a linear order, and then that the linear order R have no minimal element, is not universally valid, but it is valid for some interpretations of R and some sets S from which a...f, x, and y may be drawn. Specifically, it is true if one takes S to be the set of integers and R(x, y) to mean x < y. Such formulas, which are true for some interpretations but not for all, are called "satisfiable". Obviously, valid formulas are satisfiable, because satisfiable formulas are true under some interpretations, but valid formulas are true under all interpretations. Gödel famously showed that it is an undecidable problem to determine whether a given formula of arithmetic is satisfiable. That is, there is no method which, given any formula, is guaranteed to tell you correctly whether or not there is some interpretation in which the formula is true. But one can limit the form of the allowable formulas to make the problem easier. To take an extreme example, just to illustrate the point, consider the set of formulas of the form:
∃a∃b... ((a=0)∨(a=1))&and((b=0)∨(b=1))∧...∧R(a,b,...) for some number of variables. Since the formula itself requires that a, b, etc. are each either 0 or 1, all one needs to do to decide whether the formula is satisfiable is to try every possible assignment of 0 and 1 to the n variables and see whether R(a,b,...) is true in any of the 2^{n} resulting cases. If so, the formula is satisfiable, if not then not.
Kurt Gödel, 1933One would like to prove decidability for a larger and more general class of formulas than the rather silly one I just described. How big can the class of formulas be and yet be decidable?It turns out that one need only consider formulas where all the quantifiers are at the front, because there is a simple method for moving quantifiers to the front of a formula from anywhere inside. So historically, attention has been focused on formulas in this form. One fascinating result concerns the class of formulas called [∃^{*}∀^{2}∃^{*}, all, (0)]. These are the formulas that begin with ∃a∃b...∃m∀n∀p∃q...∃z, with exactly two ∀ quantifiers, with no intervening ∃s. These formulas may contain arbitrary relations amongst the variables, but no functions or constants, and no equality symbol. [∃^{*}∀^{2}∃^{*}, all, (0)] is decidable: there is a method which takes any formula in this form and decides whether it is satisfiable. But if you allow three ∀ quantifiers (or two with an ∃ in between) then the set of formulas is no longer decidable. Isn't that freaky? The decidability of the class [∃^{*}∀^{2}∃^{*}, all, (0)] was shown by none other than Gödel, in 1933. However, in the last sentence of his paper, Gödel added that the same was true even if the formulas were also permitted to include equality:
In conclusion, I would still like to remark that Theorem I can also be proved, by the same method, for formulas that contain the identity sign. OopsThis was believed to be true for more than thirty years, and the result was used by other mathematicians to prove other results. But in the mid-1960s, Stål Aanderaa showed that Gödel's proof would not actually work if the formulas contained equality, and in 1983, Warren D. Goldfarb proved that Gödel had been mistaken, and the satisfiability of formulas in the larger class was not decidable.
SourcesGödel's original 1933 paper is Zum Entscheidungsproblem des logischen Funktionenkalküls (On the decision problem for the functional calculus of logic) which can be found on pages 306–327 of volume I of his Collected Works. (Oxford University Press, 1986.) There is an introductory note by Goldfarb on pages 226–231, of which pages 229–231 address Gödel's error specifically.I originally heard the story from Val Tannen, and then found it recounted on page 188 of The Classical Decision Problem, by Egon Boerger, Erich Grädel, and Yuri Gurevich. But then blog reader Jeffrey Kegler found the Goldfarb note, of which the Boerger-Grädel-Gurevich account appears to be a summary. Thanks very much to everyone who contributed, and especially to M. Kegler. (I remind readers who have temporarily forgotten, that Acta Quandalia is the quarterly journal of the Royal Uzbek Academy of Semi-Integrable Quandle Theory. Professor Snorfus, you will no doubt recall, won the that august institution's prestigious Utkur Prize in 1974.) [ Addendum 20080206: Another article in this series. ]
[Other articles in category /math] permanent link Thu, 31 Jan 2008
Ramanujan's congruences
Ramanujan's congruences state that:
Looking at this, anyone could conjecture that p(13k+7) = 0 (mod 13), but it isn't so; p(7) = 15 and p(20) = 48·13+3. But there are other such congruences. For example, according to Partition Congruences and the Andrews-Garvan-Dyson Crank:
$$ p(17\cdot41^4k + 1122838) = 0 \pmod{17} $$ Isn't mathematics awesome?
[Other articles in category /math] permanent link Fri, 25 Jan 2008
Nonstandard adjectives in mathematics
The property is not really attached to the adjective itself. Red emeralds are not emeralds, so "red" is nonstandard when applied to emeralds. Fake expressions of sympathy are still expressions of sympathy, however insincere. "Toy" often goes both ways: a toy fire engine is not a fire engine, but a toy ball is a ball and a toy dog is a dog. Adjectives in mathematics are rarely nonstandard. An Abelian group is a group, a second-countable topology is a topology, an odd integer is an integer, a partial derivative is a derivative, a well-founded order is an order, an open set is a set, and a limit ordinal is an ordinal. When mathematicians want to express that a certain kind of entity is similar to some other kind of entity, but is not actually some other entity, they tend to use compound words. For example, a pseudometric is not (in general) a metric. The phrase "pseudo metric" would be misleading, because a "pseudo metric" sounds like some new kind of metric. But there is no such term. But there is one glaring exception. A partial function is not (in general) a function. The containment is in the other direction: all functions are partial functions, but not all partial functions are functions. The terminology makes more sense if one imagines that "function" is shorthand for "total function", but that is not usually what people say. If I were more quixotic, I would propose that partial functions be called "partialfunctions" instead. Or perhaps "pseudofunctions". Or one could go the other way and call them "normal relations", where "normal" can be replaced by whatever adjective you prefer—ejective relations, anyone? I was about to write "any of these would be preferable to the current confusion", but actually I think it probably doesn't matter very much. [ Addendum 20080201: Another example, and more discussion of "partial". ] [ Addendum 20081205: A contravariant functor is not a functor. ] [ Addendum 20090121: A hom-set is not a set. ] [ Addendum 20110905: A skew field is not a field. The Wikipedia article about division rings observes that this use of "skew" is counter to the usual behavior of adjectives in mathematics. ] [ Addendum 20120819: A snub cube is not a cube. Several people have informed me that a quantum group is not a group. ] [ Addendum 20140708: nLab refers to the red herring principle, that “in mathematics, a ‘red herring’ need not, in general, be either red or a herring”. ] [ Addendum 20160505: The gaussian integers contain the integers, not vice versa, so a gaussian integer is not in general an integer. ]
[Other articles in category /math] permanent link Wed, 09 Jan 2008
Major screwups in mathematics
There are many examples of statements that were believed without proof that turned out to be false, such as any number of decidability and completeness (non-)theorems. If it turns out that P=NP, this will be one of those type, but as yet there is no generally accepted proof to the contrary, so it is not an example. Similarly, if would be quite surprising to learn that the Goldbach conjecture was false, but at present mathematicians do not generally believe that it has been proved to be true, so the Goldbach conjecture is not an example of this type, and is unlikely ever to be. There are a lot of results that could have gone one way or another, such as the three-dimensional kissing number problem. In this case some people believing they could go one way and some the other, and then they found that it was one way, but no proof to the contrary was ever widely accepted. Then we have results like the independence of the parallel postulate, where people thought for a long time that it should be implied by the rest of Euclidean geometry, and tried to prove it, but couldn't, and eventually it was determined to be independent. But again, there was no generally accepted proof that it was implied by the other postulates. So mathematics got the right answer in this case: the mathematicians tried to prove a false statement, and failed, and then eventually figured it out. Alfred Kempe is famous for producing an erroneous proof of the four-color map theorem, which was accepted for eleven years before the error was detected. But the four-color map theorem is true. I want an example of a false statement that was believed for years because of an erroneous proof. If there isn't one, that is an astonishing declaration of success for all of mathematics and for its deductive methods. 2300 years without one major screwup! It seems too good to be true. Is it?
Glossary for non-mathematicians
[ Addendum 20080206: Another article in this series, asking readers for examples of a different type of screwup. ]
[Other articles in category /math] permanent link Tue, 11 Dec 2007
More notes on power series
coss = zipWith (*) (cycle [1,0,-1,0]) (map ((1/) . fact) [0..])one has the choice to define the sine function analogously:
sins = zipWith (*) (cycle [0,1,0,-1]) (map ((1/) . fact) [0..])or in a totally different way, by reference to cosine:
sins = (srt . (add one) . neg . sqr) cossHere is a third way. Sine and cosine are solutions of the differential equation f = -f''. Since I now have enough infrastructure to get Haskell to solve differential equations, I can use this to define sine and cosine:
solution_of_equation f0 f1 = func where func = int f0 (int f1 (neg func)) sins = solution_of_equation 0 1 coss = solution_of_equation 1 0The constants f0 and f1 specify the initial conditions of the differential equation, values for f(0) and f'(0), respectively. Well, that was fun. One problem with the power series approach is that the answer you get is not usually in a recognizable form. If what you get out is [1.0,0.0,-0.5,0.0,0.0416666666666667,0.0,-0.00138888888888889,0.0,2.48015873015873e-05,0.0,...]then you might recognize it as the cosine function. But last night I couldn't sleep because I was wondering about the equation f·f' = 1, so I got up and put it in, and out came:
[1.0,1.0,-0.5,0.5,-0.625,0.875,-1.3125,2.0625,-3.3515625,5.5859375,-9.49609375,16.40234375,...]Okay, now what? Is this something familiar? I'm wasn't sure. One thing that might help a bit is to get the program to disgorge rational numbers rather than floating-point numbers. But even that won't completely solve the problem. One thing I was thinking about in the shower is doing Fourier analysis; this should at least identify the functions that are sinusoidal. Suppose that we know (or believe, or hope) that some power series a_{1}x + a_{3}x^{3} + ... actually has the form c_{1} sin x + c_{2} sin 2x + c_{3} sin 3x + ... . Then we can guess the values of the c_{i} by solving a system of n equations of the form:
$$\sum_{i=1}^n i^kc_i = k!a_k\qquad{\hbox{($k$ from 1 to $n$)}}$$ And one ought to be able to do something analogous, and more general, by including the cosine terms as well. I haven't tried it, but it seems like it might work.But what about more general cases? I have no idea. If you have the happy inspiration to square the mystery power series above, you get [1, 2, 0, 0, 0, ...], so it is √(2x+1), but what if you're not so lucky? I wasn't; I solved it by a variation of Gareth McCaughan's method of a few days ago: f·f' is the derivative of f^{2}/2, so integrate both sides of f·f' = 1, getting f^{2}/2 = x + C, and so f = √(2x + C). Only after I had solved the equation this way did I try squaring the power series, and see that it was simple. I'll keep thinking.
[Other articles in category /math] permanent link Mon, 10 Dec 2007
Lazy square roots of power series return
To do that I decided I would need a function to calculate the square root of a power series, which I did figure out; it's in the earlier article. But then I got distracted with other issues, and then folks wrote to me with several ways to solve the differential equation, and I spent a lot of time writing that up, and I didn't get back to the original problem until today, when I had to attend the weekly staff meeting. I get a lot of math work done during that meeting. At least one person wrote to ask me for the Haskell code for the power series calculations, so here's that first off. A power series a_{0} + a_{1}x + a_{2}x^{2} + a_{3}x^{3} + ... is represented as a (probably infinite) list of numbers [a_{0}, a_{1}, a_{2}, ...]. If the list is finite, the missing terms are assumed to be all 0. The following operators perform arithmetic on functions:
-- add functions a and b add [] b = b add a [] = a add (a:a') (b:b') = (a+b) : add a' b' -- multiply functions a and b mul [] _ = [] mul _ [] = [] mul (a:a') (b:b') = (a*b) : add (add (scale a b') (scale b a')) (0 : mul a' b') -- termwise multiplication of two series mul2 = zipWith (*) -- multiply constant a by function b scale a b = mul2 (cycle [a]) b neg a = scale (-1) aAnd there are a bunch of other useful utilities:
-- 0, 1, 2, 3, ... iota = 0 : zipWith (+) (cycle [1]) iota -- 1, 1/2, 1/3, 1/4, ... iotaR = map (1/) (tail iota) -- derivative of function a deriv a = tail (mul2 iota a) -- integral of function a -- c is the constant of integration int c a = c : (mul2 iotaR a) -- square of function f sqr f = mul f f -- constant function con c = c : cycle [0] one = con 1The really interesting operators perform division and evolve square roots of functions. I discussed how these work in the earlier article. The reciprocal operation is well-known; it appears in Structure and Interpretation of Computer Programs, Higher-Order Perl, and I presume elsewhere. I haven't seen the square root extractor anywhere else, but I'm sure that's just because I haven't looked.
-- reciprocal of argument function inv (s0:st) = r where r = r0 : scale (negate r0) (mul r st) r0 = 1/s0 -- divide function a by function b squot a b = mul a (inv b) -- square root of argument function srt (s0:s) = r where r = r0 : (squot s (add [r0] r)) r0 = sqrt(s0)We can define the cosine function as follows:
coss = zipWith (*) (cycle [1,0,-1,0]) (map ((1/) . fact) [0..])We could define the sine function analogously, or we can say that sin(x) = √(1 - cos^{2}(x)):
sins = (srt . (add one) . neg . sqr) cossThis works fine. Okay, so as usual that is not what I wanted to talk about; I wanted to show how to solve the differential equation. I found I was getting myself confused, so I decided to try to solve a simpler differential equation first. (Pólya says: "Can you solve a simpler problem of the same type?" Pólya is a smart guy. When the voice talking in your head is Pólya's, you better pay attention.) The simplest relevant differential equation seemed to be f = f'. The first thing I tried was observing that for all f, f = f_{0} : mul2 iotaR f'. This yields the code:
f = f0 : mul2 iotaR (deriv f)This holds for any function, and so it's unsolvable. But if you combine it with the differential equation, which says that f = f', you get:
f = f0 : mul2 iotaR f where f0 = 1 -- or whatever the initial conditions dictateand in fact this works just fine. And then you can observe that this is just the definition of int; replacing the definition with the name, we have:
f = int f0 f where f0 = 1 -- or whateverThis runs too, and calculates the power series for the exponential function, as it should. It's also transparently obvious, and makes me wonder why it took me so long to find. But I was looking for solutions of the form:
f = deriv fwhich Haskell can't figure out. It's funny that it only handles differential equations when they're expressed as integral equations. I need to meditate on that some more. It occurs to me just now that the f = f0 : mul2 iotaR (deriv f) identity above just says that the integral and derivative operators are inverses. These things are always so simple in hindsight. Anyway, moving along, back to the original problem, instead of f = f', I want f^{2} + (f')^{2} = 1, or equivalently f' = √(1 - f^{2}). So I take the derivative-integral identity as before:
f = int f0 (deriv f)and put in √(1 - f^{2}) for deriv f:
f = int f0 ((srt . (add one) . neg . sqr) f) where f0 = sqrt 0.5 -- or whateverAnd now I am done; Haskell cheerfully generates the power series expansion for f for any given initial condition. (The parameter f0 is precisely the desired value of f(0).) For example, when f(0) = √(1/2), as above, the calculated terms show the function to be exactly √(1/2)·(sin(x) + cos(x)); when f(0) = 0, the output terms are exactly those of sin(x). When f(0) = 1, the output blows up and comes out as [1, 0, NaN, NaN, ...]. I'm not quite sure why yet, but I suspect it has something to do with there being two different solutions that both have f(0) = 1. All of this also works just fine in Perl, if you build a suitable lazy-list library; see chapter 6 of HOP for complete details. Sample code is here. For a Scheme implementation, see SICP. For a Java, Common Lisp, Python, Ruby, or SML implementation, do the obvious thing. But anyway, it does work, and I thought it might be nice to blog about something I actually pursued to completion for a change. Also I was afraid that after my week of posts about Perl syntax, differential equations, electromagnetism, Unix kernel internals, and paint chips in the shape of Austria, the readers of Planet Haskell, where my blog has recently been syndicated, were going to storm my house with torches and pitchforks. This article should mollify them for a time, I hope. [ Addendum 20071211: Some additional notes about this. ]
[Other articles in category /math] permanent link Sun, 09 Dec 2007
Four ways to solve a nonlinear differential equation
which I was trying to solve by various methods. The article was actually about calculating square roots of power series; I got sidetracked on this. Before I got back to the original equation, I got interested in this a few weeks ago when I was sitting in on a freshman physics lecture at Penn. I took pretty much the same class when I was a freshman, but I've never felt like I really understood physics. Sitting in freshman physics class again confirms this. Every time I go to a class, I come out with bigger questions than I went in. The instructor was talking about LC circuits, which are simple circuits with a capacitor (that's the "C") and an inductor (that's the "L", although I don't know why). The physics people claim that in such a circuit the capacitor charges up, and then discharges again, repeatedly. When one plate of the capacitor is full of electrons, the electrons want to come out, and equalize the charge on the plates, and so they make a current flowing from the negative to the positive plate. Without the inductor, the current would fall off exponentially, as the charge on the plates equalized. Eventually the two plates would be equally charged and nothing more would happen. But the inductor generates an electromotive force that tends to resist any change in the current through it, so the decreasing current in the inductor creates a force that tends to keep the electrons moving anyway, and this means that the (formerly) positive plate of the capacitor gets extra electrons stuffed into it. As the charge on this plate becomes increasingly negative, it tends to oppose the incoming current even more, and the current does eventually come to a halt. But by that time a whole lot of electrons have moved from the negative to the positive plate, so that the positive plate has become negative and the negative plate positive. Then the electrons come out of the newly-negative plate and the whole thing starts over again in reverse. In practice, of course, all the components offer some resistance to the current, so some of the energy is dissipated as heat, and eventually the electrons stop moving back and forth. Anyway, the current is nothing more nor less than the motion of the electrons, and so it is proportional to the derivative of the charge in the capacitor. Because to say that current is flowing is exactly the same as saying that the charge in the capacitor is changing. And the magnetic flux in the inductor is proportional to rate of change of the current flowing through it, by Maxwell's laws or something. The amount of energy in the whole system is the sum of the energy stored in the capacitor and the energy stored in the magnetic field of the inductor. The former turns out to be proportional to the square of the charge in the capacitor, and the latter to the square of the current. The law of conservation of energy says that this sum must be constant. Letting f(t) be the charge at time t, then df/dt is the current, and (adopting suitable units) one has:
$$(f(x))^2 + \left(df(x)\over dx\right)^2 = 1$$ which is the equation I was considering.Anyway, the reason for this article is mainly that I wanted to talk about the different methods of solution, which were all quite different from each other. Michael Lugo went ahead with the power series approach I was using. Say that:
$$ \halign{\hfil $\displaystyle #$&$\displaystyle= #$\hfil\cr f & \sum_{i=0}^\infty a_{i}x^{i} \cr f' & \sum_{i=0}^\infty (i+1)a_{i+1}x^{i} \cr } $$ Then:
$$ \halign{\hfil $\displaystyle #$&$\displaystyle= #$\hfil\cr f^2 & \sum_{i=0}^\infty \sum_{j=0}^{i} a_{i-j} a_j x^{i} \cr (f')^2 & \sum_{i=0}^\infty \sum_{j=0}^{i} (i-j+1)a_{i-j+1}(j+1)a_{j+1} x^{i} \cr } $$ And we want the sum of these two to be equal to 1.Equating coefficients on both sides of the equation gives us the following equations:
Now take the next line from the table, 2a_{0}a_{2} + a_{1}^{2} + 6a_{1}a_{3} + 4a_{2}^{2}. This can be separated into the form 2a_{2}(a_{0} + 2a_{2}) + a_{1}(a_{1} + 6a_{3}). The left-hand term is zero, by the previous paragraph, and since the whole thing equals zero, we have a_{3} = -a_{1}/6. Continuing in this way, we can conclude that a_{0} = -2!a_{2} = 4!a_{4} = -6!a_{6} = ..., and that a_{1} = -3!a_{3} = 5!a_{5} = ... . These should look familiar from first-year calculus, and together they imply that f(x) = a_{0} cos(x) + a_{1} sin(x), where (according to the first line of the table) a_{0}^{2} + a_{1}^{2} = 1. And that is the complete solution of the equation, except for the case we omitted, when either a_{0} or a_{1} is zero; these give the trivial solutions f(x) = ±1. Okay, that was a lot of algebra grinding, and if you're not as clever as M. Lugo, you might not notice that the even terms of the series depend only on a_{0} and the odd terms only on a_{1}; I didn't. I thought they were all mixed together, which is why I alluded to "a bunch of not-so-obvious solutions" in the earlier article. Is there a simpler way to get the answer? Gareth McCaughan wrote to me to point out a really clever trick that solves the equation right off. Take the derivative of both sides of the equation; you immediately get 2ff' + 2f'f'' = 0, or, factoring out f', f'(f + f'') = 0. So there are two solutions: either f'=0 and f is a constant function, or f + f'' = 0, which even the electrical engineers know how to solve. David Speyer showed a third solution that seems midway between the two in the amount of clever trickery required. He rewrote the equation as:
$${df\over dx} = \sqrt{1 - f^2}$$ $${df\over\sqrt{1 - f^2} } = dx$$ The left side is an old standby of calculus I classes; it's the derivative of the arcsine function. On integrating both sides, we have:$$\arcsin f = x + C$$ so f = sin(x + C). This is equivalent to the a_{0} cos(x) + a_{1} sin(x) form that we got before, by an application of the sum-of-angles formula for the sine function. I think M. McCaughan's solution is slicker, but M. Speyer's is the only one that I feel like I should have noticed myself.Finally, Walt Mankowski wrote to tell me that he had put the question to Maple, which disgorged the following solution after a few seconds:
f(x) = 1, f(x) = -1, f(x) = sin(x - _C1), f(x) = -sin(x - _C1).This is correct, except that the appearance of both sin(x + C) and -sin(x + C) is a bit odd, since -sin(x + C) = sin(x + (C + π)). It seems that Maple wasn't clever enough to notice that. Walt says he will ask around and see if he can find someone who knows what Maple did to get the solution here. I would like to add a pithy and insightful conclusion to this article, but I've been working on it for more than a week now, and also it's almost lunch time, so I think I'll have to settle for observing that sometimes there are a lot of ways to solve math problems. Thanks again to everyone who wrote in about this.
[Other articles in category /math] permanent link Sat, 01 Dec 2007
19th-century elementary arithmetic
They have a tradition that one or two of the pies are "Jonahs": they look the same on the outside, but instead of being filled with fruit, they are filled with something you don't want to eat, in this case a mixture of bran and cayenne pepper. If you get the Jonah pie, you must either eat the whole thing, or crawl under the table to be a footstool for the rest of the meal. Just as they are about to serve, a stranger knocks at the door. He is an old friend of Grandpa's. They invite him to lunch, of course removing the Jonahs from the platter. But he insists that they be put back, and he gets the Jonah, and crawls under the table, marching it around the dining room on his back. The ice is broken, and the rest of the afternoon is filled with laughter and stories. Later on, when the grandparents return, the kids learn that the elderly visitor was none other than Hannibal Hamlin, formerly Vice-President of the United States. A few years ago I tried to track this down, and thanks to the Wonders of the Internet, I was successful. Then this month I had the library get me some other C. A. Stephens stories, and they were equally delightful and amusing. In one of these, the narrator leaves the pump full of water overnight, and the pipe freezes solid. He then has to carry water for forty head of cattle, in buckets from the kitchen, in sub-freezing weather. He does eventually manage to thaw the pipe. But why did he forget in the first place? Because of fractions:
I had been in a kind of haze all day over two hard examples in complex fractions at school. One of them I still remember distinctly:At that point I had to stop reading and calculate the answer, and I recommend that you do the same. I got the answer wrong, by the way. I got 25/64 or 64/25 or something of the sort, which suggests that I flipped over an 8/5 somewhere, because the correct answer is exactly 1. At first I hoped perhaps there was some 19th-century precedence convention I was getting wrong, but no, it was nothing like that. The precedence in this problem is unambiguous. I just screwed up. Entirely coincidentally (I was investigating the spelling of the word "canceling") I also recently downloaded (from Google Books) an arithmetic text from the same period, The National Arithmetic, on the Inductive System, by Benjamin Greenleaf, 1866. Here are a few typical examples:
Some of these are rather easy, but others are a long slog. For example, #1 and #3 here (actually #1 and #25 in the book) can be solved right off, without paper. But probably very few people have enough skill at mental arithmetic to carry off $612/(83 3/7) * (14 7/10) in their heads. The "complex fractions" section, which the original problem would have fallen under, had it been from the same book, includes problems like this: "Add 1/9, 2 5/8, 45/(94 7/11), and (47 5/9)/(314 3/5) together." Such exercises have gone out of style, I think. In addition to the complicated mechanical examples, there is some good theory in the book. For example, pages 227–229 concern continued fraction expansions of rational numbers, as a tool for calculating simple rational approximations of rationals. Pages 417–423 concern radix-n numerals, with special attention given to the duodecimal system. A typical problem is "How many square feet in a floor 48 feet 6 inches long, and 24 feet 3 inches broad?" The remarkable thing here is that the answer is given in the form 1176 sq. feet. 1' 6'', where the 1' 6'' actually means 1/12 + 6/144 square feet— that is, it is a base-12 "decimal". I often hear people bemoaning the dumbing-down of the primary and secondary school mathematics curricula, and usually I laugh at those people, because (for example) I have read a whole stack of "College Algebra" books from the early 20th century, which deal in material that is usually taken care of in 10th and 11th grades now. But I think these 19th-century arithmetics must form some part of an argument in the other direction. On the other hand, those same people often complain that students' time is wasted by a lot of "new math" nonsense like base-12 arithmetic, and that we should go back to the tried and true methods of the good old days. I did not have an example in mind when I wrote this paragraph, but two minutes of Google searching turned up the following excellent example:
Most forms of life develop random growths which are best pruned off. In plants they are boles and suckerwood. In humans they are warts and tumors. In the educational system they are fashionable and transient theories of education created by a variety of human called, for example, "Professor Of The Teaching Of Mathematics."(Smart Machines, by Lawrence J. Kamm; chapter 11, "Smart Machines in Education".) Pages 417–423 of The National Arithmetic, with their problems on the conversion from base-6 to base-11 numerals, suggest that those people may not know what they are talking about.
[Other articles in category /math] permanent link Fri, 30 Nov 2007
Lazy square roots of power series
$$(f(x))^2 + (f'(x))^2 = 1$$ where f' is the derivative of f. This equation has a couple of obvious solutions (f(x) = 1; f(x) = sin(x)) and a bunch of not-so-obvious ones. Since I couldn't solve the equation symbolically, I decided to fall back on power series. Representing f(x) as a_{0} + a_{1}x + a_{2}x^{2} + ... one can manipulate the power series and solve for a_{0}, a_{1}, a_{2}, etc. In fact, this is exactly the application for which mathematicians first became intersted in power series. The big question is "once you have found a_{0}, a_{1}, etc., do these values correspond to a real function? And for what x does the power series expression actually make sense?" This question, springing from a desire to solve intractable differential equations, motivates a lot of the theoretical mathematics of the last hundred and fifty years.I decided to see if I could use the power series methods of chapter 6 of Higher-Order Perl to calculate a_{0}, etc. So far, not yet, although I am getting closer. The key is that if $series is the series you want, and if you can calculate at least one term at the front of the series, and then express the rest of $series in terms of $series, you win. For example:
# Perl my $series; $series = node(1, promise { scale(2, $series) } );This is perfectly well-defined code; it runs fine and sets $series to be the series [1,2,4,8,16...]. In Haskell this is standard operating procedure:
-- Haskell series = 1 : scale 2 seriesBut in Perl it's still a bit outré. Similarly, the book shows, on page 323, how to calculate the reciprocal of a series s. Any series can be expressed as the sum of the first term and the rest of the terms:
s = head(s) + x·tail(s)Now suppose that r = 1/s.
r = head(r) + x·tail(r)we have:
rs = 1This shows (equating the constant terms on both sides) that head(r) = 1/head(s). And equating the non-constant terms then gives:
x·(1/head(s))·tail(s) + x·tail(r)·head(s) + x^{2}·tail(r)tail(s) = 0and we win. This same calculation appears on page 323, in a somewhat more confused form. (Also, I considered only the special case where head(s) = 1.) The code works just fine. To solve the differential equation f^{2} + (f')^{2} = 1, I want to do something like this:
$$f = \sqrt{1 - {(f')}^{2}}$$ so I need to be able to take the square root of a power series. This does not appear in the book, and I have not seen it elsewhere. Here it is.Say we want r^{2} = s, where s is known. Then write, as usual:
s = head(s) + x·tail(s)as before, and, since r^{2} = s, we have:
(head(r))^{2} + 2x head(r) tail(r) + x^{2}(tail(r))^{2} = head(s) + x·tail(s)so, equating coefficients on both sides, (head(r))^{2} = head(s), and head(r) = √(head(s)). Subtracting the head(s) from both sides, and dividing by x:
2·head(r) tail(r) + x·(tail(r))^{2} = tail(s)and we win. Or rather, we win once we write the code, which would be something like this:
# Perl sub series_sqrt { my $s = shift; my ($s0, $st) = (head($s), tail($s)); my $r0 = sqrt($s0); my $r; $r = node($r0, promise { divide($st, add2(node($r0, undef), $r)) }); return $r; }I confess I haven't tried this in Perl yet, but I have high confidence that it will work. I actually did the implementation in Haskell:
-- Haskell series_sqrt (s0:st) = r where r = r0 : (divide st (add [r0] r)) r0 = sqrt(s0)And when I asked it for the square root of [1,1,0,0,0,...] (that is, of 1+x) it gave me back [1, 0.5, -0.125, -0.0625, ...], which is indeed correct. The Perl code is skankier than I wish it were. A couple of years ago I said in an interview that "I wish Perl's syntax were less verbose." Some people were surprised by this at the time, since Perl programmers consider Perl's syntax to be quite terse. But comparison of the Perl and Haskell code above demonstrates the sort of thing I mean. Part of ths issue here, of course, is that the lazy list data structure is built in to Haskell, but I have to do it synthetically in Perl, and so every construction of a lazy list structure in Perl is accompanied by a syntactic marker (such as node(...) or promise { ... }) that is absent, or nearly absent, from the Haskell. But when I complained about Perl's verbose syntax in 2005, one thing I had specifically in mind was Perl's argument acquisition syntax, here represented by my $s = shift;. Haskell is much terser, with no loss of expressiveness. Haskell gets another win in the automatic destructuring bind: instead of explicitly calling head() and tail() to acquire the values of s0 and st, as in the Perl code, they are implicitly called by the pattern match (s0:st) in the Haskell code, which never mentions s at all. It is quite fair to ascribe this to a failure of Perl's syntax, since there's no reason in principle why Perl couldn't support this, at least for built-in data structures. For example, consider the Perl code:
sub blah { my $href = shift(); my $a = $href->{this}; my $tmp = $href->{that}; my $b = $tmp->[0]; my $c = $tmp->[2]; # Now do something with $a, $b, $c }It would be much more convenient to write this as: sub blah { my { this => $a, that => [$b, undef, $c] } = shift(); # Now do something with $a, $b, $c }This is a lot easier to understand. There are a number of interesting user-interface issues to ask about here: What if the assigned value is not in the expected form? Are $a, $b, and $c copied from $href or are they aliases into it? And so on. One easy way to dispense with all of these interesting questions (perhaps not in the best way) is to assert that this notation is just syntactic sugar for the long version. I talked to Chip Salzenberg about this at one time, and he said he thought it would not be very hard to implement. But even if he was right, what is not very hard for Chip Salzenberg to do can turn out to be nearly impossible for us mortals. [ Addendum 20071209: There's a followup article that shows several different ways of solving the differential equation, including the power-series method. ] [ Addendum 20071210: I did figure out how to get Haskell to solve the differential equation. ] [Other articles in category /math] permanent link Fri, 12 Oct 2007
The square of the Catalan sequence
The idea is that when you calculate derived data in a database, such as a view or a selection, you can simultaneously calculate exactly which input tuples contributed to each output tuple's presence in the output. Each input tuple is annotated with an identifier that says who was responsible for putting it there, and the output annotations are polynomials in these identifiers. (The complete paper is here.) A simple example may make this a bit clearer. Suppose we have the following table R:
If you annotate the tuples of R with identifiers like this:
This assignment of polynomials generalizes a lot of earlier work on tuple annotation. For example, suppose each tuple in R is annotated with a probability of being correct. You can propagate the probabilities to S just by substituting the appropriate numbers for the variables in the polynomials. Or suppose each tuple in R might appear multiple times and is annotated with the number of times it appears. Then ditto. If your queries are recursive, then the polynomials might be infinite. For example, suppose you are calculating the transitive closure T of relation R. This is like the previous example, except that instead of having S(x, z) = R(x, y) and R(y, z), we have T(x, z) = R(x, z) or (T(x, y) and R(y, z)). This is a recursive equation, so we need to do a fixpoint solution for it, using certain well-known techniques. The result in this example is:
Anyway, in his talk, Val referred to the sequence as "bizarre", and I had to jump in to point out that it was not at all bizarre, it was the Catalan numbers, which are just what you would expect from a relation like V = s + V^{2}, blah blah, and he cut me off, because of course he knows all about the Catalan numbers. He only called them bizarre as a rhetorical flourish, meant to echo the presumed puzzlement of the undergraduates in the room. (I never know how much of what kind of math to expect from computer science professors. Sometimes they know things I don't expect at all, and sometimes they don't know things that I expect everyone to know. (This was indeed what was going on, and the professor seemed to think it was a surprising insight. I am not relating this boastfully, because I truly don't think it was a particularly inspired guess. (Now that I think about it, maybe the answer here is that computer science professors know more about math than I expect, and less about computation.) Anyway, I digress, and the whole article up to now was not really what I wanted to discuss anyway. What I wanted to discuss was that when I started blathering about Catalan numbers, Val said that if I knew so much about Catalan numbers, I should calculate the coefficient of the x^{59} term in V^{2}, which also appeared as one of the annotations in his example. So that's the puzzle, what is the coefficient of the x^{59} term in V^{2}, where V = 1 + s + 2s^{2} + 5s^{3} + 14s^{4} + ... ? After I had thought about this for a couple of minutes, I realized that it was going to be much simpler than it first appeared, for two reasons. The first thing that occurred to me was that the definition of multiplication of polynomials is that the coefficient of the x^{n} term in the product of A and B is Σa_{i}b_{n-i}. When A=B, this reduces to Σa_{i}a_{n-i}. Now, it just so happens that the Catalan numbers obey the relation c_{n+1} = Σ c_{i}c_{n-i}, which is exactly the same form. Since the coefficients of V are the c_{i}, the coefficients of V^{2} are going to have the form Σc_{i}c_{n-i}, which is just the Catalan numbers again, but shifted up by one place. The next thing I thought was that the Catalan numbers have a pretty simple generating function f(x). This just means that you pretend that the sequence V is a Taylor series, and figure out what function it is the Taylor series of, and use that as a shorthand for the whole series, ignoring all questions of convergence and other such analytic fusspottery. If V is the Taylor series for f(x), then V^{2} is the Taylor series for f(x)^{2}. And if f has a compact representation, say as sin(x) or something, it might be much easier to square than the original V was. Since I knew in this case that the generating function is simple, this seemed likely to win. In fact the generating function of V is not sin(x) but (1-√(1-4x))/2x. When you square this, you get almost the same thing back, which matches my prediction from the previous paragraph. This would have given me the right answer, but before I actually finished that calculation, I had an "oho" moment. The generating function is known to satisfy the relation f(x) = 1 + xf(x)^{2}. This relation is where the (1-√(1-4x))/2x thing comes from in the first place; it is the function that satisfies that relation. (You can see this relation prefigured in the equation that Val had, with V = s + V^{2}. There the notation is a bit different, though.) We can just rearrange the terms here, putting the f(x)^{2} by itself, and get f(x)^{2} = (f(x)-1)/x. Now we are pretty much done, because f(x) = V = 1 + x + 2x^{2} + 5x^{3} + 14x^{4} + ... , so f(x)-1 = x + 2x^{2} + 5x^{3} + 14x^{4} + ..., and (f(x)-1)/x = 1 + 2x + 5x^{2} + 14x^{3} + ... . Lo and behold, the terms are the Catalan numbers again. So the answer is that the coefficient of the x^{59} term is just c(60), calculation of which is left as an exercise for the reader. I don't know what the point of all that was, but I thought it was fun how the hairy-looking problem seemed likely to be simple when I looked at it a little more carefully, and then how it did turn out to be quite simple. This blog has had a recurring dialogue between subtle technique and the sawed-off shotgun method, and I often favor the sawed-off shotgun method. Often programmers' big problem is that they are very clever and learned, and so they want to be clever and learned all the time, even when being a knucklehead would work better. But I think this example provides some balance, because it shows a big win for the clever, learned method, which does produce a lot more understanding. Then again, it really doesn't take long to whip up a program to multiply infinite polynomials. I did it in chapter 6 of Higher-Order Perl, and here it is again in Haskell:
data Poly a = P [a] deriving Show instance (Eq a) => Eq (Poly a) where (P x) == (P y) = (x == y) polySum x [] = x polySum [] y = y polySum (x:xs) (y:ys) = (x+y) : (polySum xs ys) polyTimes [] _ = [] polyTimes _ [] = [] polyTimes (x:xs) (y:ys) = (x*y) : more where more = (polySum (polySum (map (x *) ys) (map (* y) xs)) (0 : (polyTimes xs ys))) instance (Num a) => Num (Poly a) where (P x) + (P y) = P (polySum x y) (P x) * (P y) = P (polyTimes x y) [Other articles in category /math] permanent link Tue, 09 Oct 2007
Relatively prime polynomials over Z2
Polynomials over Z_{2} are one of my favorite subjects, and the answer to the question turned out to be beautiful. So I thought I'd write about it here. First, what does it mean for two polynomials to be relatively prime? It's analogous to the corresponding definition for integers. For any numbers a and b, there is always some number d such that both a and b are multiples of d. (d = 1 is always a solution.) The greatest such number is called the greatest common divisor or GCD of a and b. The GCD of two numbers might be 1, or it might be some larger number. If it's 1, we say that the two numbers are relatively prime (to each other). For example, the GCD of 100 and 28 is 4, so 100 and 28 are not relatively prime. But the GCD of 100 and 27 is 1, so 100 and 27 are relatively prime. One can prove theorems like these: If p is prime, then either a is a multiple of p, or a is relatively prime to p, but not both. And the equation ap + bq = 1 has a solution (in integers) if and only if p and q are relatively prime. The definition for polynomials is just the same. Take two polynomials over some variable x, say p and q. There is some polynomial d such that both p and q are multiples of d; d(x) = 1 is one such. When the only solutions are trivial polynomials like 1, we say that the polynomials are relatively prime. For example, consider x^{2} + 2x + 1 and x^{2} - 1. Both are multiples of x+1, so they are not relatively prime. But x^{2} + 2x + 1 is relatively prime to x^{2} - 2x + 1. And one can prove theorems that are analogous to the ones that work in the integers. The analog of "prime integer" is "irreducible polynomial". If p is irreducible, then either a is a multiple of p, or a is relatively prime to p, but not both. And the equation a(x)p(x) + b(x)q(x) = 1 has a solution for polynomials a and b if and only if p and q are relatively prime. One uses Euclid's algorithm to calculate the GCD of two integers. Euclid's algorithm is simple: To calculate the GCD of a and b, just subtract the smaller from the larger, repeatedly, until one of the numbers becomes 0. Then the other is the GCD. One can use an entirely analogous algorithm to calculate the GCD of two polynomials. Two polynomials are relatively prime just when their GCD, as calculated by Euclid's algorithm, has degree 0. Anyway, that was more introduction than I wanted to give. The article in Mathematics Magazine concerned polynomials over Z_{2}, which means that the coefficients are in the field Z_{2}, which is just like the regular integers, except that 1+1=0. As I explained in the earlier article, this implies that a=-a for all a, so there are no negatives and subtraction is the same as addition. I like this field a lot, because subtraction blows. Do you have trouble because you're always dropping minus signs here and there? You'll like Z_{2}; there are no minus signs. Here is a table that shows which pairs of polynomials over Z_{2} are relatively prime. If you read this blog through some crappy aggregator, you are really missing out, because the table is awesome, and you can't see it properly. Check out the real thing.
A pink square means that the polynomials are relatively prime; a white square means that they are not. Another version of this table appeared on the cover of Mathematics Magazine. It's shown at right. The thin black lines in the diagram above divide the polynomials of different degrees. Suppose you pick two degrees, say 2 and 2, and look at the corresponding black box in the diagram:
The proof of this is delightful. If you run Euclid's algorithm on two relatively prime polynomials over Z_{2}, you get a series of intermediate results, terminating in the constant 1. Given the intermediate results and the number of steps, you can run the algorithm backward and find the original polynomials. If you run the algorithm backward starting from 0 instead of from 1, for the same number of steps, you get two non-relatively-prime polynomials of the same degrees instead. This establishes a one-to-one correspondence between pairs of relatively prime polynomials and pairs of non-relatively-prime polynomials of the same degrees. End of proof. (See the paper for complete details.) You can use basically the same proof to show that the probability that two randomly-selected polynomials over Z_{p} is 1-1/p. The argument is the same: Euclid's algorithm could produce a series of intermediate results terminating in 0, in which case the polynomials are not relatively prime, or it could produce the same series of intermediate results terminating in something else, in which case they are relatively prime. The paper comes to an analogous conclusion about monic polynomials over Z.
[ Puzzle: The (11,12) white squares in the picture are connected to the others via row and column 13, which doesn't appear. Suppose the quilt were extended to cover the entire quarter-infinite plane. Would the white area be connected? ]
[Other articles in category /math] permanent link Mon, 08 Oct 2007 Reduces your risk of auto theft by 400%.
[Other articles in category /math] permanent link Sat, 08 Sep 2007
The missing deltahedron
The number of edges that meet at a vertex is its valence. Vertices in convex deltahedra have valences of 3, 4, or 5. The valence can't be larger than 5 because only six equilateral triangles will fit, and if you fit 6 then they lie flat and the polyhedron is not properly convex. Let V_{3}, V_{4}, and V_{5} be the number of vertices of valences 3, 4, and 5, respectively. Then:
Well, this is all oversubtle, I realized later, because you don't need to do the V_{3}–V_{4}–V_{5} analysis to see that something is missing. There are convex deltahedra with 4, 6, 8, 10, 12, 14, and 20 faces; what happened to 18? Still, I did a little work on a more careful analysis that might shed some light on the 18-hedron situation. I'm still in the middle of it, but I'm trying to continue my policy of posting more frequent, partial articles. Let V be the number of vertices in a convex deltahedron, E be the number of edges, and F be the number of faces. We then have V = V_{3} + V_{4} + V_{5}. We also have E = ½(3V_{3} + 4V_{4} + 5V_{5}). And since each face has exactly 3 edges, we have 3F = 2E. By Euler's formula, F + V = E + 2. Plugging in the stuff from the previous paragraph, we get 3V_{3} + 2V_{4} + V_{5} = 12. It is very easy to enumerate all possible solutions of this equation. There are 19:
(3,1,1) fails completely because to have V_{5} > 0 you need V ≥ 6. There isn't even a graph with (V_{3}, V_{4}, V_{5}) = (3,1,1), much less a polyhedron. There is a graph with (3,0,3), but it is decidedly nonplanar: it contains K_{3,3}, plus an additional triangle. But the graph of any polyhedron must be planar, because you can make a little hole in one of the faces of the polyhedron and flatten it out without the edges crossing. Another way to think about (3,0,3) is to consider it as a sort of triangular tripyramid. Each of the V_{5}s shares an edge with each of the other five vertices, so the three V_{5}s are all pairwise connected by edges and form a triangle. Each of the three V_{3}s must be connected to each of the three vertices of this triangle. You can add two of the required V_{3}s, by erecting a triangular pyramid on the top and the bottom of the triangle. But then you have nowhere to put the third pyramid. On Thursday I didn't know what went wrong with (2,2,2); it seemed fine. (I found it a little challenging to embed it in the plane, but I'm not sure if it would still be challenging if it hadn't been the middle of the night.) I decided that when I got into the office on Friday I would try making a model of it with my magnet toy and see what happened. It turned out that nothing goes wrong with (2,2,2). It makes a perfectly good non-convex deltahedron. It's what you get when you glue together three tetrahedra, face-to-face-to-face. The concavity is on the underside in the picture.
There is a planarity failure, which is also topological, but less severe, as with (3,0,3). (3,0,3) also fails because you can't embed it into R^{3}. (I mean that you cannot embed its 3-skeleton. Of course you can embed its 1-skeleton in R^{3}, but that is not sufficient for the thing to be a polyhedron.) I'm not sure if this is really different from the previous failure; I need to consider more examples. And (3,0,3) fails in yet another way: you can't even embed its 1-skeleton in R^{3} without violating the constraint that says that the edges must all have unit length. The V_{5}s must lie at the vertices of an equilateral triangle, and then the three unit spheres centered at the V_{5}s intersect at exactly two points of R^{3}. You can put two of the V_{3}s at these points, but this leaves nowhere for the third V_{3}. Again, I'm not sure that this is a fundamentally different failure mode than the other two. Another failure mode is that the graph might be embeddable into R^{3}, and might satisfy the unit-edge constraint, but in doing so it might determine a concave polyhedron, like (2,2,2) does, or a non-polyhedron, like (2,0,6) does. I still have six (V_{3},V_{4},V_{5}) triples to look into. I wonder if there are any other failure modes? I should probably think about (0,1,10) first, since the whole point of all this was to figure out what happened to D_{18}. But I'm trying to work up from the simple cases to the harder ones. I suppose the next step is to look up the proof that there are only eight convex deltahedra and see how it goes. I suspect that (2,1,4) turns out to be nonplanar, but I haven't looked at it carefully enough to actually find a forbidden minor. One thing that did occur to me today was that a triple (V_{3}, V_{4}, V_{5}) doesn't necessarily determine a unique graph, and I need to look into that in more detail. I'll be taking a plane trip on Sunday and I plan to take the magnet toy with me and continue my investigations on the plane. In other news, Katara and I went to my office this evening to drop off some books and pick up some stuff for the trip, including the magnet toy. Katara was very excited when she saw the collection of convex deltahedron models on my desk, each in a different color, and wanted to build models just like them. We got through all of them, except D_{10}, because we ran out of ball bearings. By the end Katara was getting pretty good at building the models, although I think she probably wouldn't be able to do it without directions yet. I thought it was good work, especially for someone who always skips from 14 to 16 when she counts. On the way home in the car, we were talking about how she was getting older and I rhapsodized about how she was learning to do more things, learning to do the old things better, learning to count higher, and so on. Katara then suggested that when she is older she might remember to include 15.
[Other articles in category /math] permanent link Thu, 06 Sep 2007
Followup notes about dice and polyhedra
[Other articles in category /math] permanent link Tue, 07 Aug 2007
Different arrangements for standard dice
To understand just what is being asked for here, first observe that a standard pair of dice throws a 2 exactly 1/36 of the time, a 3 exactly 2/36 of the time, and so forth:
The standard dice have faces numbered 1, 2, 3, 4, 5, and 6. It should be clear that if one die had {0,1,2,3,4,5} instead, and the other had {2,3,4,5,6,7}, then the probabilities would be exactly the same. Similarly you could subtract 3.7 from every face of one die, giving it labels {-2.7, -1.7, -0.7, 0.3, 1.3, 2.3}, and if you added the 3.7 to every face of the other die, giving labels {4.7, 5.7, 6.7, 7.7, 8.7, 9.7}, you'd still have the same chance of getting any particular total. For example, there are still exactly 2 ways out of 36 possible rolls to get the total 3: you can roll -2.7 + 5.7, or you can roll -1.7 + 4.7. But the question is to find a nontrivial relabeling. Like many combinatorial problems, this one is best solved with generating functions. Suppose we represent a die as a polynomial. If the polynomial is Σa_{i}x^{i}, it represents a die that has a_{i} chances to produce the value i. A standard die is x^{6} + x^{5} + x^{4} + x^{3} + x^{2} + x, with one chance to produce each integer from 1 to 6. (We can deal with probabilities instead of "chances" by requiring that Σa_{i} = 1, but it comes to pretty much the same thing.) The reason it's useful to adopt this representation is that rolling the dice together corresponds to multiplication of the polynomials. Rolling two dice together, we multiply (x^{6} + x^{5} + x^{4} + x^{3} + x^{2} + x) by itself and get P(x) = x^{12} + 2x^{11} + 3x^{10} + 4x^{9} + 5x^{8} + 6x^{7} + 5x^{6} + 4x^{5} + 3x^{4} + 2x^{3} + x^{2}, which gives the chances of getting any particular sum; the coefficient of the x^{9} term is 4, so there are 4 ways to roll a 9 on two dice. What we want is a factorization of this 12th-degree polynomial into two polynomials Q(x) and R(x) with non-negative coefficients. We also want Q(1) = R(1) = 6, which forces the corresponding dice to have 6 faces each. Since we already know that P(x) = (x^{6} + x^{5} + x^{4} + x^{3} + x^{2} + x)^{2}, it's not hard; we really only have to factor x^{6} + x^{5} + x^{4} + x^{3} + x^{2} + x and then see if there's any suitable way of rearranging the factors. x^{6} + x^{5} + x^{4} + x^{3} + x^{2} + x = x(x^{4} + x^{2} + 1)(x + 1) = x(x^{2} + x + 1)(x^{2} - x + 1)(x + 1). So P(x) has eight factors:
We want to combine these into two products Q(x) and R(x) such that Q(1) = R(1) = 6. If we calculate f(1) for each of these, we get 1, 3 (pink), 1, and 2 (blue). So each of Q and R will require one of the factors that has f(1) = 3 and one that has f(1) = 2; we can distribute the f(1) = 1 factors as needed. For normal dice the way we do this is to assign all the factors in each row to one die. If we want alternative dice, our only real choice is what to do with the x^{2} - x + 1 and x factors. Redistributing the lone x factors just corresponds to subtracting 1 from all the faces of one die and adding it back to all the faces of the other, so we can ignore them. The only interesting question is what to do with the x^{2} - x + 1 factors. The normal distribution assigns one to each die, and the only alternative is to assign both of them to a single die. This gives us the two polynomials:
And so the solution is that one die has faces {1,2,2,3,3,4} and the other has faces {1,3,4,5,6,8}:
Counting up entries in the table, we see that there are indeed 6 ways to throw a 7, 4 ways to throw a 9, and so forth. One could apply similar methods to the problem of making a pair of dice that can't roll 7. Since there are six chances in 36 of rolling 7, we need to say what will happen instead in these 6 cases. We might distribute them equally among some of the other possibilities, say 2, 4, 6, 8, 10, and 12, so that we want the final distribution of results to correspond to the polynomial 2x^{12} + 2x^{11} + 4x^{10} + 4x^{9} + 6x^{8} + 6x^{6} + 4x^{5} + 4x^{4} + 2x^{3} + 2x^{2}. The important thing to notice here is that the coefficient of the x^{7} term is 0. Now we want to factor this polynomial and proceed as before. Unfortunately, it is irreducible. (Except for the trivial factor of x^{2}.) Several other possibilities are similarly irreducible. It's tempting to reason from the dice to the algebra, and conjecture that any reducible polynomial that has a zero x^{7} term must be rather exceptional in other ways, such as by having only even exponents. But I'm not sure it will work, because the polynomials are more general than the dice: the polynomials can have negative coefficients, which are meaningless for the dice. Still, I can fantasize that there might be some result of this type available, and I can even imagine a couple of ways of getting to this result, one combinatorial, another based on Fourier transforms. But I've noticed that I have a tendency to want to leave articles unpublished until I finish exploring all possible aspects of them, and I'd like to change that habit, so I'll stop here, for now. [ Addendum 20070905: There are some followup notes. ]
[Other articles in category /math] permanent link Mon, 06 Aug 2007
Standard analytic polyhedra
(0,0,0)And you can see at a glance whether two vertices share an edge (they are the same in two of their three components) or are opposite (they differ in all three components). Last week I was reading the Wikipedia article about the computer game "Hunt the Wumpus", which I played as a small child. For the Guitar Hero / WoW generation I should explain Wumpus briefly. The object of "Wumpus" is to kill the Wumpus, which hides in a network of twenty caves arranged in a dodecahedron. Each cave is thus connected to three others. On your turn, you may move to an adjacent cave or shoot a crooked arrow. The arrow can pass through up to five connected caves, and if it enters the room where the Wumpus is, it kills him and you win. Two of the caves contain bottomless pits; to enter these is death. Two of the caves contain giant bats, which will drop you into another cave at random; if it contains a pit, too bad. If you are in a cave adjacent to a pit, you can feel a draft; if you are adjacent to bats, you can hear them. If you are adjacent to the Wumpus, you can smell him. If you enter the Wumpus's cave, he eats you. If you shoot an arrow that fails to kill him, he wakes up and moves to an adjacent cave; if he enters you cave, he eats you. You have five arrows. I did not learn until much later that the caves are connected in a dodecahedron; indeed, at the time I probably didn't know what a dodecahedron was. The twenty caves were numbered, so that cave 1 was connected to 2, 5, and 8. This necessitated a map, because otherwise it was too hard to remember which room was connected to which. Or did it? If the map had been a cube, the eight rooms could have been named 000, 001, 010, etc., and then it would have been trivial to remember: 011 is connected to 111, 001, and 010, obviously, and you can see it at a glance. It's even easy to compute all the paths between two vertices: the paths from 011 to 000 are 011–010–000 and 011–001–000; if you want to allow longer paths you can easily come up with 011–111–110–100–000 for example. And similarly, the Wumpus source code contains a table that records which caves are connected to which, and consults this table in many places. If the caves had been arranged in a cube, no table would have been required. Or if one was wanted, it could have been generated algorithmically. So I got to wondering last week if there was an analogous nomenclature for the vertices of a dodecahedron that would have obviated the Wumpus map and the table in the source code. I came up with a very clever proof that there was none, which would have been great, except that the proof also worked for the tetrahedron, and the tetrahedron does have such a convenient notation: you can name the vertices (0,0,0), (0,1,1), (1,0,1), and (1,1,0), where there must be an even number of 1 components. (I mentioned this yesterday in connection with something else and promised to come back to it. Here it is.) So the proof was wrong, which was good, and I kept thinking about it. The next-simplest case is the octahedron, and I racked my brains trying to come up with a convenient notation for the vertices that would allow one to see at a glance which were connected. When I finally found it, I felt like a complete dunce. The octahedron has six vertices, which are above, below, to the left of, to the right of, in front of, and behind the center. Their coordinates are therefore (1,0,0), (-1,0,0), (0,1,0), (0,-1,0), (0,0,1) and (0,0,-1). Two vertices are opposite when they have two components the same (necessarily both 0) and one different (necessarily negatives). Otherwise, they are connected by an edge. This is really simple stuff. Still no luck with the dodecahedron. There are nice canonical representations of the coordinates of the vertices—see the Wikipedia article, for example—but I still haven't looked at it closely enough to decide if there is a simple procedure for taking two vertices and determining their geometric relation at a glance. Obviously, you can check for adjacent vertices by calculating the distance between them and seeing if it's the correct value, but that's not "at a glance"; arithmetic is forbidden. It's easy to number the vertices in layers, say by calling the top five vertices A_{1} ... A_{5}, then the five below that B_{1} ... B_{5}, and so on. Then it's easy to see that A_{3} will be adjacent to A_{2}, A_{4}, and B_{3}, for example. But this nomenclature, unlike the good ones above, is not isometric: it has a preferred orientation of the dodecahedron. It's obvious that A_{1}, A_{2}, A_{3}, A_{4}, and A_{5} form a pentagonal face, but rather harder to see that A_{2}, A_{3}, B_{2}, B_{3}, and C_{5} do. With the cube, it's easy to see what a rotation or a reflection looks like. For example, rotation of 120° around an axis through a pair of vertices of the cube takes vertex (a, b, c) to (c, a, b); rotation of 90° around an axis through a face takes it to (1-b, a, c). Similarly, rotations and reflections of the tetrahedron correspond to simple permutations of the components of the vertices. Nothing like this exists for the A-B-C-D nomenclature for the dodecahedron. I'll post if I come up with anything nice.
[Other articles in category /math] permanent link Sun, 05 Aug 2007
The 123456 die
That got me thinking about the problem in general. For some reason I've been trying to construct a die whose faces come up with probabilities 1/21, 2/21, 3/21, 4/21, 5/21, and 6/21 respectively. Unless there is a clever insight I haven't had, I think this will be rather difficult to do explicitly. (Approximation methods will probably work fairly easily though, I think.) I started by trying to make a hexahedron with faces that had areas 1, 2, 3, 4, 5, 6, and even this has so far evaded me. This will not be sufficient to solve the problem, because the probability that the hexahedron will land on face F is not proportional to the area of F, but rather to the solid angle subtended by F from the hexahedron's center of gravity. Anyway, I got interested in the idea of making a hexahedron whose faces had areas 1..6. First I tried just taking a bunch of simple shapes (right triangles and the like) of the appropriate sizes and fitting them together geometrically; so far that hasn't worked. So then I thought maybe I could get what I wanted by taking a tetrahedron or a disphenoid or some such and truncating a couple of the corners. As Polya says, if you can't solve the problem, you should try solving a simpler problem of the same sort, so I decided to see if it was possible to take a regular tetrahedron and chop off one vertex so that the resulting pentahedron had faces with areas 1, 2, 3, 4, 5. The regular tetrahedron is quite tractable, geometrically, because you can put its vertices at (0,0,0), (0,1,1), (1,0,1), and (1,1,0), and then a plane that chops off the (0,0,0) vertex cuts the three apical edges at points (0,a,a), (b,0,b), and (c,c,0), for some 0 ≤ a, b, c ≤ 1. The chopped-off areas of the three faces are simply ab√3/4, bc√3/4, ca√3/4, and the un-chopped base has area √3/4, so if we want the three chopped faces to have areas of 2/5, 3/5 and 4/5 times √3/4, respectively, we must have ab = 3/5, bc = 2/5, and ca = 1/5, and we can solve for a, b, c. (We want the new top face to have area 1/5 · √3/4, but that will have to take care of itself, since it is also determined by a, b, and c.) Unfortunately, solving these equations gives b = √6/√5, which is geometrically impossible. We might fantasize that there might be some alternate solution, say with the three chopped faces having areas of 1/5, 2/5 and 4/5 times √3/4, and the top face being 3/5 · √3/4 instead of 1/5 · √3/4, but none of those will work either. Oh well, it was worth a shot. I do think it's interesting that if you know the areas of the bottom four faces of a truncated regular tetrahedron, that completely determines the apical face. Because you can solve for the lengths of the truncated apical edges, as above, and then that gives you the coordinates of the three apical vertices. I had a brief idea about truncating a square pyramid to get the hexahedron I wanted in the first place, but that's more difficult, because you can't just pick the lengths of the four apical edges any way you want; their upper endpoints must be coplanar. The (0,a,a), (b,0,b), (c,c,0) thing has been on my mind anyway, and I hope to write tomorrow's blog article about it. But I've decided that my articles are too long and too intermittent, and I'm going to try to post some shorter, more casual ones more frequently. I recently remembered that in the early days of the blog I made an effort to post every day, and I think I'd like to try to resume that. [ Addendum 20070905: There are some followup notes. ]
[Other articles in category /math] permanent link Wed, 01 Aug 2007
The snub disphenoid
Anyway, earlier this week I was visiting John Batzel, who works upstairs from me, and discovered that he had obtained a really cool toy. It was a collection of large steel ball bearings and colored magnetic rods, which could be assembled into various polyhedra and trusses. This is irresistible to me. The pictures at right, taken around 2002, show me modeling a dodecahedron with less suitable materials. The first thing I tried to make out of John's magnetic sticks and balls was a regular dodecahedron, because it is my favorite polyhedron. (Isn't it everyone's?) This was unsuccessful, because it wasn't rigid enough, and kept collapsing. It's possible that if I had gotten the whole thing together it would have been stable, but holding the 50 separate magnetic parts in the right place long enough to get it together was too taxing, so I tried putting together some other things. A pentagonal dipyramid worked out well, however. To understand this solid, imagine a regular pyramid, such as the kind that entombs the pharaohs or collects mystical energy. This sort of pyramid is known as a square pyramid, because it has a square base, and thus four triangular sides. Imagine that the base was instead a pentagon, so that there were five triangular sides sides instead of only four. Then it would be a pentagonal pyramid. Now take two such pentagonal pyramids and glue the pentagonal bases together. You now have a pentagonal dipyramid. The success of the pentagonal dipyramid gave me the idea that rigid triangular lattices were the way to go with this toy, so I built an octahedron (square dipyramid) and an icosahedron to be sure. Even the icosahedron (thirty sticks and twelve balls) held together and supported its own weight. So I had John bring up the Wikipedia article about deltahedra. A deltahedron is just a polyhedron whose faces are all equilateral triangles.
Four of the deltahedra are the tetrahedron (triangular pyramid, with 4 faces), triangular dipyramid (6 faces), octahedron (square dipyramid, 8 faces), and pentagonal dipyramid (10 faces).
Another deltahedron is the "gyroelongated square dipyramid". You get this by taking two square pyramids, as with the octahedron. But instead of gluing their square bases together directly, you splice a square antiprism in between. The two square faces of the antiprism are not aligned; they are turned at an angle of 45° to each other, so that when you are looking at the top pyramid face-on, you are looking at the bottom pyramid edge-on, and this is the "gyro" in "gyroelongated". (The icosahedron is a gyroelongated pentagonal dipyramid.) I made one of these in John's office, but found it rather straightforward. The last deltahedron, however, was quite a puzzle. Wikipedia calls it a "snub disphenoid", and as I mentioned before, the name did not help me out at all. It took me several tries to build it correctly. It contains 12 faces and 8 vertices. When I finally had the model I still couldn't figure it out, and spent quite a long time rotating it and examining it. It has a rather strange symmetry. It is front-back and left-right symmetric. And it is almost top-bottom symmetric: If you give it a vertical reflection, you get the same thing back, but rotated 90° around the vertical axis. When I planned this article I thought I understood it better. Imagine sticking together two equilateral triangles. Call the common edge the "rib". Fold the resulting rhombus along the rib so that the edges go up, down, up, down in a zigzag. Let's call the resulting shape a "wing"; it has a concave side and a convex side. Take two wings. Orient them with the concave sides facing each other, and with the ribs not parallel, but at right angles. So far, so good. But this is where I started to get it wrong. The two wings have between them eight edges, and I had imagined that you could glue a rhombic antiprism in between them. I'm not convinced that there is such a thing as a rhombic antiprism, but I'll have to do some arithmetic to be sure. Anyway, supposing that there were such a thing, you could glue it in as I said, but if you did the wings would flatten out and what you would get would not be a proper polyhedron because the two triangles in each wing would be coplanar, and polyhedra are not allowed to have abutting coplanar faces. (The putative gyroelongated triangular dipyramid fails for this reason, I believe.) To make the snub disphenoid, you do stick eight triangles in between the two wings, but the eight triangles do not form a rhombic antiprism. Even supposing that such a thing exists. I hope to have some nice renderings for you later. I have been doing some fun work in rendering semiregular polyhedra, and I am looking forward to discussing it here. Advance peek: suppose you know how the vertices are connected by edges. How do you figure out where the vertices are located in 3-space? If you would like to investigate this, the snub disphenoid has 8 vertices, which we can call A, B, ... H. Then:
Here is a list of the eight deltahedra, with links to the corresponding Wikipedia articles:
[ Addendum 20070908: More about deltahedra. ]
[Other articles in category /math] permanent link Thu, 19 Jul 2007
More about fixed points and attractors
I picked a few example functions, some of which worked and some of which didn't. One glaring omission from the article was that I forgot to mention the so-called "Babylonian method" for calculating square roots. The Babylonian method for calculating √n is simply to iterate the function x → ½(x + n/x). (This is a special case of the Newton-Raphson method for finding the zeroes of a function. In this case the function whose zeroes are being found is is x → x^{2} - n.) The Babylonian method converges quickly for almost all initial values of x. As I was writing the article, at 3 AM, I had the nagging feeling that I was leaving out an important example function, and then later on realized what it was. Oops. But there's a happy outcome, which is that the Babylonian method points the way to a nice general extension of this general technique. Suppose you've found a function f that has your target value, say √2, as a fixed point, but you find that iterating f doesn't work for some reason. For example, one of the functions I considered in the article was x → 2/x. No matter what initial value you start with (other than √2 and -√2) iterating the function gets you nowhere; the values just hop back and forth between x and 2/x forever. But as I said in the original article, functions that have √2 as a fixed point are easy to find. Suppose we have such a function, f, which is badly-behaved because the fixed point repels, or because of the hopping-back-and-forth problem. Then we can perturb the function by trying instead x → ½(x + f(x)), which has the same fixed points, but which might be better-behaved. (More generally, x → (ax + bf(x)) / (a + b) has the same fixed points as f for any nonzero a and b, but in this article we'll leave a = b = 1.) Applying this transformation to the function x → 2/x gives us the Babylonian method. I tried applying this transform to the other example I used in the original article, which was x → x^{2} + x - 2. This has √2 as a fixed point, but the √2 is a repelling fixed point. √2 ± ε → √2 ± (1 + 2√2)ε, so the error gets bigger instead of smaller. I hoped that perturbing this function might improve its behavior, and at first it seemed that it didn't. The transformed version is x → ½(x + x^{2} + x - 2) = x^{2}/2 + x - 1. That comes to pretty much the same thing. It takes √2 ± ε → √2 + (1 + √2)ε, which has the same problem. So that didn't work; oh well. But actually things had improved a bit. The original function also has -√2 as a fixed point, and again it's one that repels from both sides, because -√2 ± ε → -√2 ± (1 - 2√2)ε, and |1 - 2√2| > 1. But the transformed function, unlike the original, has -√2 as an attractor, since it takes -√2 ± ε → -√2 ± (1 - √2)ε and |1 - √2| < 1. So the perturbed function works for calculating √2, in a slightly backwards way; you pick a value close to -√2 and iterate the function, and the iterated values get increasingly close to -√2. Or you can get rid of the minus signs entirely by transforming the function again, and considering -f(-x) instead of f(x). This turns x^{2}/2 + x - 1 into -x^{2}/2 + x + 1. The fixed points change places, so now √2 is the attractor, and -√2 is the repeller, since √2 ± ε → √2 ± (1 - √2)ε. Starting with x = 1, we get:
I had meant to write about Möbius transformations, but that will have to wait until next week, I think.
[Other articles in category /math] permanent link Sat, 30 Jun 2007
How to calculate the square root of 2
I said that this formula comes from consideration of continued fractions. But I was thinking about it a little more, and I realized that there is a way to get such a recurrence for pretty much any algebraic constant you want. Consider for a while the squaring function s : x → x^{2}. This function has two obvious fixed points, namely 0 and 1, by which I mean that s(0) = 0 and s(1) = 1. Actually it has a third fixed point, ∞. If you consider the behavior on some x in the interval (0, 1), you see that s(x) is also in the same interval. But also, s(x) < x on this interval. Now consider what happens when you iterate s on this interval, calculating the sequence s(x), s(s(x)), and so on. The values must stay in (0, 1), but must always decrease, so that no matter what x you start with, the sequence converges to 0. We say that 0 is an "attracting" fixed point of s, because any starting value x, no matter how far from 0 it is (as long as it's still in (0, 1)), will eventually be attracted to 0. Similarly, 1 is a "repelling" fixed point, because any starting value of x, no matter how close to 1, will be repelled to 0. Consideration of the interval (1, ∞) is similar. 1 is a repeller and ∞ is an attractor. Fixed points are not always attractors or repellers. The function x → 1/x has fixed points at ±1, but these points are neither attractors nor repellers. Also, a fixed point might attract from one side and repel from the other. Consider x → x/(x+1). This has a fixed point at 0. It maps the interval (0, ∞) onto (0, 1), which is a contraction, so that 0 attracts values on the right. On the other hand, 0 repels values on the left, because 1/-n goes to 1/(-n+1). -1/4 goes to -1/3 goes to -1/2 goes to -1, at which point the whole thing blows up and goes to -∞. The idea about the fixed point attractors is suggestive. Suppose we were to pick a function f that had √2 as a fixed point. Then √2 might be an attractor, in which case iterating f will get us increasingly accurate approximations to √2. So we want to find some function f such that f(√2) = √2. Such functions are very easy to find! For example, take √2. square it, and divide by 2, and add 1, and take the square root, and you have √2 again. So x → √(1+x^{2}/2) is such a function. Or take √2. Take the reciprocal, double it, and you have √2 again. So x → 2/x is another such function. Or take √2. Add 1 and take the reciprocal. Then add 1 again, and you are back to √2. So x → 1 + 1/(x+1) is a function with √2 as a fixed point. Or we could look for functions of the form ax^{2} + bx + c. Suppose √2 were a fixed point of this function. Then we would have 2a + b√2 + c = √2. We would like a, b, and c to be simple, since the whole point of this exercise is to calculate √2 easily. So let's take a=b=1, c=-2. The function is now x → x^{2} + x - 2. Which one to pick? It's an embarrasment of riches. Let's start with the polynomial, x → x^{2} + x - 2. Well, unfortunately this is the wrong choice. √2 is a fixed point of this function, but repels on both sides: √2 ± ε → √2 ± ε(1 + 2√2), which is getting farther away. The inverse function of x → x^{2} + x - 2 will have √2 as an attractor on both sides, but it is not so convenient to deal with because it involves taking square roots. Still, it does work; if you iterate ½(-1 + √(9 + 4x)) you do get √2. Of the example functions I came up with, x → 2/x is pretty simple too, but again the fixed points are not attractors. Iterating the function for any initial value other than the fixed points just gets you in a cycle of length 2, bouncing from one side of √2 to the other forever, and not getting any closer. But the next function, x → 1 + 1/(x+1), is a winner. (0, ∞) is crushed into (1, 2), with √2 as the fixed point, so √2 attracts from both sides. Writing x as a/b, the function becomes a/b → 1 + 1/(a/b+1), or, simplifying, a/b → (a + 2b) / (a + b). This is exactly the recurrence I gave at the beginning of the article. We did get a little lucky, since the fixed point of interest, √2, was the attractor, and the other one, -√2, was the repeller. ((-∞, -1) is mapped onto (-∞, 1), with -√2 as the fixed point; -√2 repels on both sides.) But had it been the other way around we could have exchanged the behaviors of the two fixed points by considering -f(-x) instead. Another way to fix it is to change the attractive behavior into repelling behavior and vice versa by running the function backwards. When we tried this for x → x^{2} + x - 2 it was a pain because of the square roots. But the inverse of x → 1 + 1/(x+1) is simply x → (-x + 2) / (x - 1), which is no harder to deal with. The continued fraction stuff can come out of the recurrence, instead of the other way around. Let's iterate the function x → 1 + 1/(1+x) formally, repeatedly replacing x with 1 + 1/(1+x). We get:
1 + 1/(1+x)So we might expect the fixed point, if there is one, to be 1 + 1/(2 + 1/(2 + 1/(2 + ...))), if this makes sense. Not all such expressions do make sense, but this one is a continued fraction, and continued fractions always make sense. This one is eventually periodic, and a theorem says that such continued fractions always have values that are quadratic surds. The value of this one happens to be √2. I hope you are not too surprised. In the course of figuring all this out over the last two weeks or so, I investigated many fascinating sidetracks. The x → 1 + 1/(x+1) function is an example of a "Möbius transformation", which has a number of interesing properties that I will probably write about next month. Here's a foretaste: a Möbius transformation is simply a function x → (ax + b) / (cx + d) for some constants a, b, c, and d. If we agree to abbreviate this function as !!{ a\, b \choose c\,d}!!, then the inverse function is also a Möbius transformation, and is in fact !!{a\, b\choose c\,d}^{-1}!!. [ Addendum 20070719: There is a followup article to this one. ] [Other articles in category /math] permanent link Sun, 17 Jun 2007
Square triangular numbers
I no longer remember how I solved the problem the first time around, but I was tinkering around with it today and came up with an approach that I think is instructive, or at least interesting. We want to find non-negative integers a and b such that ½(a^{2} + a) = b^{2}. Or, equivalently, we want a and b such that √(a^{2} + a) = b√2. Now, √(a^{2} + a) is pretty nearly a + ½. So suppose we could find p and q with a + ½ = b·p/q, and p/q a bit larger than √2. a + ½ is a bit too large to be what we want on the left, but p/q is a bit larger than what we want on the right too. Perhaps the fudging on both sides would match up, and we would get √(a^{2} + a) = b√2 anyway. If this magic were somehow to occur, then a and b would be the numbers we wanted. Finding p/q that is a shade over √2 is a well-studied problem, and one of the things I have in my toolbox, because it seems to come up over and over in the solution of other problems, such as this one. It has interesting connections to several other parts of mathematics, and I have written about it here before. The theoretical part of finding p/q close to √2 is some thing about continued fractions that I don't want to get into today. But the practical part is very simple. The following recurrence generates all the best rational approximations to √2; the farther you carry it, the better the approximation:
Now, we want a + ½ = b·p/q, or equivalently (2a + 1)/2b = p/q. This means we can restrict our attention to the rows of the table that have q even. This is a good thing, because we need p/q a bit larger than √2, and those are precisely the rows with even q. The rows that have q odd have p/q a bit smaller than √2, which is not what we need. So everything is falling into place. Let's throw away the rows with q odd, put a = (p - 1)/2 and b = q/2, and see what we get:
I have two points to make about this. One is that I have complained in the past about mathematical pedagogy, how the convention is to come up with some magic-seeming guess ahead of time, as when pulling a rabbit from a hat, and then at the end it is revealed to be the right choice, but what really happened was that the author worked out the whole thing, then saw at the end what he would need at the beginning to make it all work, and went back and filled in the details. That is not what happened here. My apparent luck was real luck. I really didn't know how it was going to come out. I was really just exploring, trying to see if I could get some insight into the answer without necessarily getting all the way there; I thought I might need to go back and do a more careful analysis of the fudge factors, or something. But sometimes when you go exploring you stumble on the destination by accident, and that is what happened this time. The other point I want to make is that I've written before about how a mixture of equal parts of numerical sloppiness and algebraic tinkering, with a dash of canned theory, can produce useful results, in a sort of alchemical transmutation that turns base metals into gold, or at least silver. Here we see it happen again.
[Other articles in category /math] permanent link Wed, 13 Jun 2007
How to calculate binomial coefficients, again
A couple of people pointed out that, contrary to what I asserted, the algorithm I described can in fact overflow even when the final result is small enough to fit in a machine word. Consider for example. The algorithm, as I wrote it, calculates intermediate values 8, 8, 56, 28, 168, 56, 280, 70, and 70 is the final answer. If your computer has 7-bit machine integers, the answer (70) will fit, but the calculation will overflow along the way at the 168 and 280 steps. Perhaps more concretely, !!35\choose11!! is 417,225,900, which is small enough to fit in a 32-bit unsigned integer, but the algorithm I wrote wants to calculate this as !!35{34\choose10}\over11!!, and the numerator here is 4,589,484,900, which does not fit. One Reddit user suggested that you can get around this as follows: To multiply r by a/b, first check if b divides r. If so, calculate (r/b)·a; otherwise calculate (r·a)/b. This should avoid both overflow and fractions. Unfortunately, it does not. A simple example is !!{14\choose4} = {11\over1}{12\over2}{13\over3}{14\over4}!!. After the first three multiplications one has 286. One then wants to multiply by 14/4. 4 does not divide 286, so the suggestion calls for multiplying 286 by 14/4. But 14/4 is 3.5, a non-integer, and the goal was to use integer arithmetic throughout. Fortunately, this is not hard to fix. Say we want to multiply r by a/b without overflow or fractions. First let g be the greatest common divisor of r and b. Then calculate ((r/g) · a)/(b/g). In the example above, g is 2, and we calculate (286/2) · (14/2) = 143 · 7; this is the best we can do. I haven't looked, but it is hard to imagine that Volume II of Knuth doesn't discuss this in exhaustive detail, including all the stuff I just said, plus a bunch of considerations that hadn't occurred to any of us. A few people also pointed out that you can save time when n > m/2 by calculating !!m\choose m-n!! instead of . For example, instead of calculating !!100\choose98!!, calculate . I didn't mention this in the original article because it was irrelevant to the main point, and because I thought it was obvious.
[Other articles in category /math] permanent link Tue, 12 Jun 2007
How to calculate binomial coefficients
$${n\choose k} = {n!\over k!(n-k)!}$$ This is a fine definition, brief, closed-form, easy to prove theorems about. But these good qualities seduce people into using it for numerical calculations:
fact 0 = 1 fact (n+1) = (n+1) * fact n choose n k = (fact n) `div` ((fact k)*(fact (n-k)))(Is it considered bad form among Haskellites to use the n+k patterns? The Haskell Report is decidedly ambivalent about them.) Anyway, this is a quite terrible way to calculate binomial coefficients. Consider calculating !!100\choose 2!!, for example. The result is only 4950, but to get there the computer has to calculate 100! and 98! and then divide these two 150-digit numbers. This requires the use of bignums in languages that have bignums, and causes an arithmetic overflow in languages that don't. A straightforward implementation in C, for example, drops dead with an arithmetic exception; using doubles instead, it claims that the value of is -2147483648. This is all quite sad, since the correct answer is small enough to fit in a two-byte integer. Even in the best case, !!2n\choose n!!, the result is only on the order of 4^{n}, but the algorithm has to divide a numerator of about 4^{n}n^{2n} by a denominator of about n^{2n} to get it. A much better way to calculate values of is to use the following recurrence:
$${n+1\choose k+1} = {n+1\over k+1}{n\choose k}$$ This translates to code as follows:
choose n 0 = 1 choose 0 k = 0 choose (n+1) (k+1) = (choose n k) * (n+1) `div` (k+1)This calculates !!8\choose 4!! as !!{5\over1}{6\over2}{7\over3}{8\over4} !!. None of the intermediate results are larger than the final answer. An iterative version is also straightforward:
unsigned choose(unsigned n, unsigned k) { unsigned r = 1; unsigned d; if (k > n) return 0; for (d=1; d <= k; d++) { r *= n--; r /= d; } return r; }This is speedy, and it cannot cause an arithmetic overflow unless the final result is too large to be represented. It's important to multiply by the numerator before dividing by the denominator, since if you do this, all the partial results are integers and you don't have to deal with fractions or floating-point numbers or anything like that. I think I may have mentioned before how much I despise floating-point numbers. They are best avoided. I ran across this algorithm last year while I was reading the Lilavati, a treatise on arithmetic written about 850 years ago in India. The algorithm also appears in the article on "Algebra" from the first edition of the Encyclopaedia Britannica, published in 1768. So this algorithm is simple, ancient, efficient, and convenient. And the problems with the other algorithm are obvious, or should be. Why isn't this better known? [ Addendum 20070613: There is a followup article to this one. ] [Other articles in category /math] permanent link Fri, 08 Jun 2007
Counting transitive relations
A relation is transitive if, whenever it has both (a, b) and (b, c), it also has (a, c). For the last week I've been trying to find a good way to calculate the number of transitive relations on a set with three elements. There are 13 transitive relations on a set with 2 elements. This is easy to see. There are 16 relations in all. The only way a relation can fail to be transitive is to contain both (1, 2) and (2, 1). There are clearly four such relations. Of these four, the only one that is transitive has (1, 1) and (2, 2) also. Similarly it's quite easy to see that there are only 2 relations on a 1-element set, and both are transitive. There are 512 relations on a set with 3 elements. How many are transitive? It would be very easy to write a computer program to check them all and count the transitive ones. That is not what I am after here. In fact, it would also be easy to enumerate the transitive relations by hand; 512 is not too many. That is not what I am after either. I am trying to find some method or technique that scales reasonably well, well enough that I could apply it for larger n. No luck so far. Relations on 3-sets can fail to be transitive in all sorts of interesting ways. Say that a relation has the F_{abc} property if it contains (a,b) and (b,c) but not (a,c). Such a relation is intransitive. Now clearly there are 64 F_{abc} relations for each distinct choice of a, b, and c. But some of these properties overlap. For example, {(a,b), (b,c), (c,a)} has not only the F_{abc} property but also the F_{bca} and F_{cab} properties. Of the 64 relations with the F_{abc} property, 16 have the F_{bca} property also. 16 have the F_{aba} property. None have the F_{acb} property. There are 12 of these properties, and they overlap in a really complicated way. After a week I gave in and looked in the literature. I have a couple of papers in my bag I haven't read yet. But it seems that there is no simple solution, which is reassuring. One problem is that the number of relations on n elements grows very rapidly (it's 2^{n2}) and the number of transitive relations is a good-sized fraction of these.
[Other articles in category /math] permanent link Sun, 29 Apr 2007
Your age as a fraction, again
I discussed several methods of finding the answer, including a clever but difficult method that involved fiddling with continued fractions, and some dead-simple brute force methods that take nominally longer but are much easier to do. But a few days ago on IRC, a gentleman named Mauro Persano said he thought I could use the Stern-Brocot tree to solve the problem, and he was absolutely right. Application of a bit of clever theory sweeps away all the difficulties of the continued-fraction approach, leaving behind a solution that is clever and simple and fast. Here's the essence of it: We consider a list of intervals that covers all the positive rational numbers; initially, the list contains only the interval (0/1, 1/0). At each stage we divide each interval in the list in two, by chopping it at the simplest fraction it contains. To chop the interval (a/b, c/d), we split it into the two intervals (a/b, (a+c)/(b+d)), ((a+c)/(b+d)), c/d). The fraction (a+c)/(b+d) is called the mediant of a/b and c/d. It's not obvious that the mediant is always the simplest possible fraction in the interval, but it is true. So we start with the interval (0/1, 1/0), and in the first step we split it at (0+1)/(1+0) = 1/1. It is now two intervals, (0/1, 1/1) and (1/1, 1/0). At the next step, we split these two intervals at 1/2 and 2/1, respectively; the resulting four intervals are (0/1, 1/2), (1/2, 1/1), (1/1, 2/1), and (2/1, 1/0). We split these at 1/3, 2/3, 3/2, and 3/1. The process goes on from there:
Or, omitting the repeated items at each step:
If we disregard the two corners, 0/1 and 1/0, we can see from this diagram that the fractions naturally organize themselves into a tree. If a fraction is introduced at step N, then the interval it splits has exactly one endpoint that was introduced at step N-1, and this is its parent in the tree; conversely, a fraction introduced at step N is the parent of the two step-N+1 fractions that are introduced to split the two intervals of which it is an endpoint. This process has many important and interesting properties. The splitting process eventually lists every positive rational number exactly once, as a fraction in lowest terms. Every fraction is simpler than all of its descendants in the tree. And, perhaps most important, each time an interval is split, it is divided at the simplest fraction that the interval contains. ("Simplest" just means "has the smallest denominator".) This means that we can find the simplest fraction in some interval simply by doing binary tree search until we find a fraction in that interval. For example, Placido Polanco had a .368 batting average last season. What is the smallest number of at-bats he could have had? We are asking here for the denominator of the simplest fraction that lies in the interval [.3675, .3685).
Calculation of mediants is incredibly simple, even easier than adding fractions. Tree search is simple, just compare and then go left or right. Calculating whether a fraction is in an interval is simple too. Everything is simple simple simple. Our program wants to find the simplest fraction in some interval, say (L, R). To do this, it keeps track of l and r, initially 0/1 and 1/0, and repeatedly calculates the mediant m of l and r. If the mediant is in the target interval, the function is done. If the mediant is too small, set l = m and continue; if it is too large set r = m and continue:
# Find and return numerator and denominator of simplest fraction # in the range [$Ln/$Ld, $Rn/$Rd) # sub find_simplest_in { my ($Ln, $Ld, $Rn, $Rd) = @_; my ($ln, $ld) = (0, 1); my ($rn, $rd) = (1, 0); while (1) { my ($mn, $md) = ($ln + $rn, $ld + $rd); # print " $ln/$ld $mn/$md $rn/$rd\n"; if (isin($Ln, $Ld, $mn, $md, $Rn, $Rd)) { return ($mn, $md); } elsif (isless($mn, $md, $Ln, $Ld)) { ($ln, $ld) = ($mn, $md); } elsif (islessequal($Rn, $Rd, $mn, $md)) { ($rn, $rd) = ($mn, $md); } else { die; } } }(In this program, rn and rd are the numerator and the denominator of r.) The isin, isless, and islessequal functions are simple utilities for comparing fractions.
# Return true iff $an/$ad < $bn/$bd sub isless { my ($an, $ad, $bn, $bd) = @_; $an * $bd < $bn * $ad; } # Return true iff $an/$ad <= $bn/$bd sub islessequal { my ($an, $ad, $bn, $bd) = @_; $an * $bd <= $bn * $ad; } # Return true iff $bn/$bd is in [$an/$ad, $cn/$cd). sub isin { my ($an, $ad, $bn, $bd, $cn, $cd) = @_; islessequal($an, $ad, $bn, $bd) and isless($bn, $bd, $cn, $cd); }The asymmetry between isless and islessequal is because I want to deal with half-open intervals. Just add a trivial scaffold to run the main function and we are done:
#!/usr/bin/perl my $D = shift || 10; for my $N (0 .. $D-1) { my $Np1 = $N+1; my ($mn, $md) = find_simplest_in($N, $D, $Np1, $D); print "$N/$D - $Np1/$D : $mn/$md\n"; }Given the argument 10, the program produces this output:
0/10 - 1/10 : 1/11 1/10 - 2/10 : 1/6 2/10 - 3/10 : 1/4 3/10 - 4/10 : 1/3 4/10 - 5/10 : 2/5 5/10 - 6/10 : 1/2 6/10 - 7/10 : 2/3 7/10 - 8/10 : 3/4 8/10 - 9/10 : 4/5 9/10 - 10/10 : 9/10This says that the simplest fraction in the range [0/10, 1/10) is 1/11; the simplest fraction in the range [3/10, 4/10) is 1/3, and so forth. The simplest fractions that do not appear are 1/5, which is beaten out by the simpler 1/4 in the [2/10, 3/10) range, and 3/5, which is beaten out by 2/3 in the [6/10, 7/10) range. Unlike the programs from the previous article, this program is really fast, even in principle, even for very large arguments. The code is brief and simple. But we had to deploy some rather sophisticated number theory to get it. It's a nice reminder that the sawed-off shotgun doesn't always win. This is article #200 on my blog. Thanks for reading.
[Other articles in category /math] permanent link Sat, 21 Apr 2007
Degrees of algebraic numbers
For example, all rational numbers have degree 1, since the rational number a/b is a zero of the first-degree polynomial bx - a. √2 has degree 2, since it is a zero of x^{2} - 2, but (as the Greeks showed) not of any first-degree polynomial. It's often pretty easy to guess what degree some number has, just by looking at it. For example, the nth root of a prime number p has degree n. !!\sqrt{1 + \sqrt 2}!! has a square root of a square root, so it's fourth-degree number. If you write !!x = \sqrt{1 + \sqrt 2}!! then eliminate the square roots, you get x^{4} - 2x^{2} - 1, which is the 4th-degree polynomial satisfied by this 4th-degree number. But it's not always quite so simple. One day when I was in high school, I bumped into the fact that !!\sqrt{7 + 4 \sqrt 3}!!, which looks just like a 4th-degree number, is actually a 2nd-degree number. It's numerically equal to !!2 + \sqrt 3!!. At the time, I was totally boggled. I couldn't believe it at first, and I had to get out my calculator and calculate both values numerically to be sure I wasn't hallucinating. I was so sure that the nested square roots in would force it to be 4th-degree. If you eliminate the square roots, as in the other example, you get the 4th-degree polynomial x^{4} - 14x^{2} + 1, which is satisfied by . But unlike the previous 4th-degree polynomial, this one is reducible. It factors into (x^{2} + 4x + 1)(x^{2} - 4x + 1). Since is a zero of the polynomial, it must be a zero of one of the two factors, and so it is second-degree. (It is a zero of the second factor.) I don't know exactly why I was so stunned to discover this. Clearly, the square of any number of the form a + b√c is another number of the same form (namely (a^{2} + b^{2}c) + 2ab√c), so it must be the case that lots of a + b√c numbers must be squares of other such, and so that lots of !!\sqrt{a + b \sqrt c}!! numbers must be second-degree. I must have known this, or at least been capable of knowing it. Socrates says that the truth is within us, and we just don't know it yet; in this case that was certainly true. I think I was so attached to the idea that the nested square roots signified fourth-degreeness that I couldn't stop to realize that they don't always. In the years since, I came to realize that recognizing the degree of an algebraic number could be quite difficult. One method, of course, is the one I used above: eliminate the radical signs, and you have a polynomial; then factor the polynomial and find the irreducible factor of which the original number is a root. But in practice this can be very tricky, even before you get to the "factor the polynomial" stage. For example, let x = 2^{1/2} + 2^{1/3}. Now let's try to eliminate the radicals. Proceeding as before, we do x - 2^{1/3} = 2^{1/2} and then square both sides, getting x^{2} - 2·2^{1/3}x + 2^{2/3} = 2, and then it's not clear what to do next. So we try the other way, starting with x - 2^{1/2} = 2^{1/3} and then cube both sides, getting x^{3} - 3·2^{1/2}x^{2} + 6x - 2·2^{1/2} = 2. Then we move all the 2^{1/2} terms to the other side: x^{3} + 6x - 2 = (3x^{2} + 2)·2^{1/2}. Now squaring both sides eliminates the last radical, giving us x^{6} + 12x^{4} - 4x^{3} + 36x^{2} - 24x + 4 = 18x^{4} + 12x^{2} + 8. Collecting the terms, we see that 2^{1/2} + 2^{1/3} is a root of x^{6} - 6x^{4} - 4x^{3} + 12x^{2} - 24x - 4. Now we need to make sure that this polynomial is irreducible. Ouch. In the course of writing this article, though, I found a much better method. I'll work a simpler example first, √2 + √3. The radical-eliminating method would have us put x - √2 = √3, then x^{2} - 2√2x + 2 = 3, then x^{2} - 1 = 2√2x, then x^{4} - 2x^{2} + 1 = 8x^{2}, so √2 + √3 is a root of x^{4} - 10x^{2} + 1. The new improved method goes like this. Let x = √2 + √3. Now calculate powers of x:
That's a lot of calculating, but it's totally mechanical. All of the powers of x have the form a_{6}√6 + a_{2}√2 + a_{3}√3 + a_{1}. This is easy to see if you write p for √2 and q for √3. Then x = p + q and powers of x are polynomials in p and q. But any time you have p^{2} you replace it with 2, and any time you have q^{2} you replace it with 3, so your polynomials never have any terms in them other than 1, p, q, and pq. This means that you can think of the powers of x as being vectors in a 4-dimensional vector space whose canonical basis is {1, √2, √3, √6}. Any four vectors in this space, such as {1, x, x^{2}, x^{3}}, are either linearly independent, and so can be combined to total up to any other vector, such as x^{4}, or else they are linearly dependent and three of them can be combined to make the fourth. In the former case, we have found a fourth-degree polynomial of which x is a root, and proved that there is no simpler such polynomial; in the latter case, we've found a simpler polynomial of which x is a root. To complete the example above, it is evident that {1, x, x^{2}, x^{3}} are linearly independent, but if you don't believe it you can use any of the usual mechanical tests. This proves that √2 + √3 has degree 4, and not less. Because if √2 + √3 were of degree 2 (say) then we would be able to find a, b, c such that ax^{2} + bx + c = 0, and then the x^{2}, x^{1}, and x^{0} vectors would be dependent. But they aren't, so we can't, so it isn't. Instead, there must be a, b, c, and d such that x^{4} = ax^{3} + bx^{2} + cx + d. To find these we need merely solve a system of four simultaneous equations, one for each column in the table:
And we immediately get a=0, b=10, c=0, d=-1, so x^{4} = 10x^{2} - 1, and our polynomial is x^{4} - 10x^{2} + 1, as before. Yesterday's draft of this article said: I think [2^{1/2} + 2^{1/3}] turns out to be degree 6, but if you try to work it out in the straightforward way, by equating it to x and then trying to get rid of the roots, you get a big mess. I think it turns out that if two numbers have degrees a and b, then their sum has degree at most ab, but I wouldn't even want to swear to that without thinking it over real carefully.Happily, I'm now sure about all of this. I can work through the mechanical method on it. Putting x = 2^{1/2} + 2^{1/3}, we get:
Where the vector [a, b, c, d, e, f] is really shorthand for a2^{1/2}·2^{2/3} + b2^{2/3} + c2^{1/2}·2^{1/3} + d2^{1/3} + e2^{1/2} + f. x^{0}...x^{5} turn out to be linearly independent, almost by inspection, so 2^{1/2} + 2^{1/3} has degree 6. To express x^{6} as a linear combination of x^{0}...x^{5}, we set up the following equations:
Solving these gives [a, b, c, d, e, f]= [0, 6, 4, -12, 24, 4], so x^{6} = 6x^{4} + 4x^{3} - 12x^{2} + 24x + 4, and 2^{1/2} + 2^{1/3} is a root of x^{6} - 6x^{4} - 4x^{3} + 12x^{2} - 24x - 4, which is irreducible. And similarly, using this method, one can calculate in a few minutes that 2^{1/2} + 2^{1/4} has degree 4 and is a root of x^{4} - 4x^{2} - 8x + 2. I wish I had figured this out in high school; it would have delighted me.
[Other articles in category /math] permanent link Thu, 22 Mar 2007
Symmetric functions
The most difficult question on the Algebra III exam presented the examinee with some intractable third degree polynomial—say x^{3} + 4x^{2} - 2x + 6—and asked for the sum of the cubes of its roots. You might like to match your wits against the Algebra III students before reading the solution below. In the three summers I taught, only about two students were able to solve this problem, which is rather tricky. Usually they would start by trying to find the roots. This is doomed, because the Algebra III course only covers how to find the roots when they are rational, and the roots here are totally bizarre. Even clever students didn't solve the problem, which required several inspired tactics. First you must decide to let the roots be p, q, and r, and, using Descartes' theorem, say that
x^{3} + bx^{2} + cx + d = (x - p)(x - q)(x - r) This isn't a hard thing to do, and a lot of the kids probably did try it, but it's not immediately clear what the point is, or that it will get you anywhere useful, so I think a lot of them never took it any farther.But expanding the right-hand side of the equation above yields:
x^{3} + bx^{2} + cx + d = x^{3} - (p + q + r)x^{2} + (pq + pr + qr)x - pqr And so, equating coefficients, you have:
Or, to take an example that we can actually check, consider x^{3} - 6x^{2} + 11x - 6, whose roots are 1, 2, and 3. The sum of the cubes is 1 + 8 + 27 = 36, and indeed -b^{3} + 3bc - 3d = 6^{3} + 3·(-6)·11 + 18 = 216 - 198 + 18 = 36. This was a lot of algebra III, but once you have seen this example, it's not hard to solve a lot of similar problems. For instance, what is the sum of the squares of the roots of x^{2} + bx + c? Well, proceeding as before, we let the roots be p and q, so x^{2} + bx + c = (x - p)(x - q) = x^{2} - (p + q)x + pq, so that b = -(p + q) and c = pq. Then b^{2} = p^{2} + 2pq+ q^{2}, and b^{2} - 2c = p^{2} + q^{2}. In general, if F is any symmetric function of the roots of a polynomial, then F can be calculated from the coefficients of the polynomial without too much difficulty. Anyway, I was tinkering around with this at breakfast a couple of days ago, and I got to thinking about b^{2} - 2c = p^{2} + q^{2}. If roots p and q are both integers, then b^{2} - 2c is the sum of two squares. (The sum-of-two-squares theorem is one of my favorites.) And the roots are integers only when the discriminant of the original polynomial is itself a square. But the discriminant in this case is b^{2} - 4c. So we have the somewhat odd-seeming statement that when b^{2} - 4c is a square, then b^{2} - 2c is a sum of two squares. I found this surprising because it seemed so underconstrained: it says that you can add some random even number to a fairly large class of squares and the result must be a sum of two squares, even if the even number you added wasn't a square itself. But after I tried a few examples to convince myself I hadn't made a mistake, I was sure there had to be a very simple, direct way to get to the same place. It took some fiddling, but eventually I did find it. Say that b^{2} - 4c = a^{2}. Then b and a must have the same parity, so p = (b + a)/2 is an integer, and we can write b = p + q and a = p - q where p and q are both integers. Then c = (b^{2} - a^{2})/4 is just pq, and b^{2} - 2c = p^{2} + q^{2}. So that's where that comes from. It seems like there ought to be an interesting relationship between the symmetric functions of roots of a polynomial and their expression in terms of the coefficients of the polynomial. The symmetric functions of degree N are all linear combinations of a finite set of symmetric functions. For example, any second-degree symmetric function of two variables has the form a(p^{2} + q^{2}) + 2bpq. We can denote these basic symmetric functions of two variables as F_{i,j}(p, q) = Σp^{i}q^{j}. Then we have identities like (F_{1,0})^{2} = F_{2,0} + F_{1,1} and (F_{1,0})^{3} = F_{3,0} + 3F_{2,1}. Maybe I'll do an article about this in a week or two.
[Other articles in category /math] permanent link Mon, 19 Mar 2007
Your age as a fraction
However, these reports are not quite accurate. On January 2, 1973, exactly 3 years and 9 months from your birthday, you would be 1,371 days old, or 3 years plus 275 days. 275/365 = 0.7534. On January 1, you were only 3 + 274/365 days old, which is 3.7507 years, and so January 1 is the day on which you should have been allowed to start reporting your age as "three and three quarters". This slippage between days and months occurs in the other direction as well, so there may be kids wandering around declaring themselves as "three and a half" a full day before they actually reach that age. Clearly this is one of the major problems facing our society, so I wanted to make up a table showing, for each number of days d from 1 to 365, what is the simplest fraction a/b such that when it is d days after your birthday, you are (some whole number and) a/b years. That is, I wanted a/b such that d/365 ≤ a/b < (d+1)/365. Then, by consulting the table each day, anyone could find out what new fraction they might have qualified for, and, if they preferred the new fraction to the old, they might start reporting their age with that fraction. There is a well-developed branch of mathematics that deals with this problem. To find simple fractions that approximate any given rational number, or lie in any range, we first expand the bounds of the range in continued fraction form. For example, suppose it has been 208 days since your birthday. Then today your age will range from y plus 208/365 years up to y plus 209/365 years. Then we expand 208/365 and 209/365 as continued fractions:
208/365 = [0; 1, 1, 3, 12, 1, 3]Where [0; 1, 1, 3, 12, 1, 3] is an abbreviation for the typographically horrendous expression:
$$ 0 + {1\over \displaystyle 1 + {\strut 1\over\displaystyle 1 + {\strut 1\over\displaystyle 3 + {\strut 1\over\displaystyle 12 + {\strut 1\over\displaystyle 1 + {\strut 1\over\displaystyle 3 }}}}}}$$ And similarly the other one. (Oh, the suffering!)Then you need to find a continued fraction that lies numerically in between these two but is as short as possible. (Shortness of continued fractions corresponds directly to simplicity of the rational numbers they represent.) To do this, take the common initial segment, which is [0; 1, 1], and then apply an appropriate rule for the next place, which depends on whether the numbers in the next place differ by 1 or by more than 1, whether the first difference occurs in an even position or an odd one, mumble mumble mumble; in this case the rules say we should append 3. The result is [0; 1, 1, 3], or, in conventional notation:
$$ 0 + {1\over \displaystyle 1 + {\strut 1\over\displaystyle 1 + {\strut 1\over\displaystyle 3 }}} $$ which is equal to 4/7. And indeed, 4/7 of a year is 208.57 days, so sometime on the 208th day of the year, you can start reporting your age as (y and) 4/7 years.Since I already had a library for calculating with continued fractions, I started extending it with functions to handle this problem, to apply all the fussy little rules for truncating the continued fraction in the right place, and so on. Then I came to my senses, and realized there was a better way, at least for the cases I wanted to calculate. Given d, we want to find the simplest fraction a/b such that d/365 ≤ a/b < (d+1)/365. Equivalently, we want the smallest integer b such that there is some integer a with db/365 ≤ a < (d+1)b/365. But b must be in the range (2 .. 365), so we can easily calculate this just by trying every possible value of b, from 2 on up:
use POSIX 'ceil', 'floor'; sub approx_frac { my ($n, $d) = @_; for my $b (1 .. $d) { my ($lb, $ub) = ($n*$b/$d, ($n+1)*$b/$d); if (ceil($lb) < ceil($ub) && ceil($ub) > $ub) { return (int($ub), $b); } } return ($n, $d); }The fussing with ceil() in the main test is to make the ranges open on the upper end: 2/5 is not in the range [3/10, 4/10), but it is in the range [4/10, 5/10). Then we can embed this in a simple report-printing program:
my $N = shift || 365; for my $i (1..($N-1)) { my ($a, $b) = approx_frac($i, $N); print "$i/$N: $a/$b\n"; }For tenths, the simplest fractions are:
This works fine, and it is a heck of a lot simpler than all the continued fraction stuff. The more so because the continued fraction library is written in C. For the application at hand, an alternative algorithm is to go through all fractions, starting with the simplest, placing each one into the appropriate d/365 slot, unless that slot is already filled by a simpler fraction:
my $N = shift || 365; my $unfilled = $N; DEN: for my $d (2 .. $N) { for my $n (1 .. $d-1) { my $a = int($n * $N / $d); unless (defined $simple[$a]) { $simple[$a] = [$n, $d]; last DEN if --$unfilled == 0; } } } for (1 .. $N-1) { print "$_/$N: $simple[$_][0]/$simple[$_][1]\n"; }A while back I wrote an article about using the sawed-off shotgun approach instead of the subtle technique approach. This is another case where the simple algorithm wins big. It is an n^{2} algorithm, whereas I think the continued fraction one is n log n in the worst case. But unless you're preparing enormous tables, it really doesn't matter much. And the proportionality constant on the O() is surely a lot smaller for the simple algorithms. (It might also be that you could optimize the algorithms to go faster: you can skip the body of the loop in the slot-filling algorithm whenever $n and $d have a common factor, which means you are executing the body only n log n times. But testing for common factors takes time too...) I was going to paste in a bunch of tabulations, but once again I remembered that it makes more sense to just let you run the program for yourself. Here is a form that will generate the table for all the fractions 1/N .. (N-1)/N; use N=365 to generate a table of year fractions for common years, and N=366 to generate the table for leap years:
[ Addendum 20070429: There is a followup to this article. ]
[Other articles in category /math] permanent link Fri, 09 Mar 2007
Bernoulli processes
Well,it depends how you count. Are there three possibilities or five?
This distribution is depicted in the graph at right. Individually, (3, 1) and (1, 3) are less likely than (2, 2). But "three-and-one" includes both (1, 3) and (3, 1), whereas "two-and-two" includes only (2, 2). So if you group outcomes into three categories, as in the green division above left, "three-and-one" comes out more frequent overall than "two-and-two":
This is true in general. Suppose someone has 1,000 kids. What's the most likely distribution of sexes? It's 500 boys and 500 girls, which I've been writing (500, 500). This is more likely than either (499, 501) or (501, 499). But if you consider "Equal numbers" versus "501-to-499", which I've been writing as [500, 500] and [501, 499], then [501, 499] wins:
Why is this? [4, 3, 3, 3] covers the four most frequent distributions: (4, 3, 3, 3), (3, 4, 3, 3), (3, 3, 4, 3), and (3, 3, 3, 4). But [4, 4, 3, 2] covers twelve quite frequent distributions: (4, 4, 3, 2), (4, 3, 2, 4), and so on. Even though the individual distributions aren't as common as (4, 4, 4, 3), there are twelve of them instead of 4. This gives [4, 4, 3, 2] the edge. [5, 4, 3, 1] includes 24 distributions, and ends up tied for second place. A complete table is in the sidebar at left. (For 5-card poker hands, the situation is much simpler. [2, 2, 1, 0] is most common, followed by [2, 1, 1, 1] and [3, 1, 1, 0] (tied), then [3, 2, 0, 0], [4, 1, 0, 0], and [5, 0, 0, 0].) This same issue arose in my recent article on Yahtzee roll probabilities. There we had six "suits", which represented the six possible rolls of a die, and I asked how frequent each distribution of "suits" was when five dice were rolled. For distribution [p_{1}, p_{2}, ...], we let n_{i} be the number of p's that are equal to i. Then the expression for probability of the distribution has a factor of in the denominator, with the result that distributions with a lot of equal-sized parts tend to appear less frequently than you might otherwise expect. I'm not sure how I got so deep into this end of the subject, since I didn't really want to compare complex distributions to each other so much as to compare simple distributions under different conditions. I had originally planned to discuss the World Series, which is a best-four-of-seven series of baseball games that we play here in the U.S. and sometimes in that other country to the north. Sometimes one team wins four games in a row ("sweeps"); other times the Series runs the full seven games. You might expect that even splits would tend to occur when the two teams playing were evenly matched, but that when one team was much better than the other, the outcome would be more likely to be a sweep. Indeed, this is generally so. The chart below graphs the possible outcomes. The x-axis represents the probability of the Philadelphia Phillies winning any individual game. The y-axis is the probability that the Phillies win the entire series (red line), which in turn is the sum of four possible events: the Phillies win in 4 games (green), in 5 games (dark blue), in 6 games (light blue), or in 7 games (magenta). The probabilities of the Nameless Opponents winning are not shown, because they are exactly the opposite. (That is, you just flip the whole chart horizontally.) (The Opponents are a semi-professional team that hails from Nameless, Tennessee.) Clearly, the Phillies have a greater-than-even chance of winning the Series if and only if they have a greater-than-even chance of winning each game. If they are playing a better team, they are likely to lose, but if they do win they are most likely to do so in 6 or 7 games. A sweep is the most likely outcome only if the Opponents are seriously overmatched, and have a less than 25% chance of winning each game. (The lines for the 4-a outcome and the 4-b outcome cross at 1-(p_{a} / p_{b})^{1/(b-a)}, where p_{i} is 1, 4, 10, 20 for i = 0, 1, 2, 3.) If we consider just the first four games of the World Series, there are five possible outcomes, ranging from a Phillies sweep, through a two-and-two split, to an Opponents sweep. Let p be the probability of the Phillies winning any single game. As p increases, so does the likelihood of a Phillies sweep. The chart below plots the likelihood of each of the five possible outcomes, for various values of p, charted here on the horizontal axis: The leftmost red curve is the probability of an Opponents sweep; the red curve on the right is the probability of a Phillies sweep. The green curves are the probabilities of 3-1 outcomes favoring the Opponents and the Phillies, respectively, with the Phillies on the right as before. The middle curve, in dark blue, is the probability of a 2-2 split. When is the 2-2 split the most likely outcome? Only when the Phillies and the Opponents are approximately evenly matched, with neither team no more than 60% likely to win any game. But just as with the sexes of the four kids, we get a different result if we consider the outcomes that don't distinguish the teams. For the first four games of the World Series, there are only three outcomes: a sweep (which we've been writing [4, 0]), a [3, 1] split, and a [2, 2] split: Here the green lines in the earlier chart have merged into a single outcome; similarly the red lines have merged. As you can see from the new chart, there is no pair of teams for which a [2, 2] split predominates; the even split is buried. When one team is grossly overmatched, winning less than about 19% of its games, a sweep is the most likely outcome; otherwise, a [3, 1] split is most likely. Here are the corresponding charts for series of various lengths.
I have no particular conclusion to announce about this; I just thought that the charts looked cool. Coming later, maybe: reasoning backwards: if the Phillies sweep the World Series, what can we conclude about the likelihood that they are a much better team than the Opponents? (My suspicion is that you can conclude a lot more by looking at the runs scored and runs allowed totals.) (Incidentally, baseball players get a share of the ticket money for World Series games, but only for the first four games. Otherwise, they could have an an incentive to prolong the series by playing less well than they could, which is counter to the ideals of sport. I find this sort of rule, which is designed to prevent conflicts of interest, deeply satisfying.)
[Other articles in category /math] permanent link Mon, 05 Mar 2007
An integer partition puzzle
Here's one interesting fact: it's quite easy to calculate the number of partitions of N. Let P(n, k) be the number of partitions of n into parts that are at least k. Then it's easy to see that:
$$P(n, k) = \sum_{i=k}^{n-1} P(n-i, k)$$ And there are simple boundary conditions: P(n, n) = 1; P(n, k) = 0 when k > n, and so forth. And P(n), the number of partitions of n into parts of any size, is just P(n, 1). So a program to calculate P(n) is very simple:
my @P; sub P { my ($n, $k) = @_; return 0 if $n < 0; return 1 if $n == 0; return 0 if $k > $n; my $r = $P[$n] ||= []; return $r->[$k] if defined $r->[$k]; return $r->[$k] = P($n-$k, $k) + P($n, $k+1); } sub part { P($_[0], 1); } for (1..100) { printf "%3d %10d\n", $_, part($_); }I had a funny conversation once with someone who ought to have known better: I remarked that it was easy to calculate P(n), and disagreed with me, asking why Rademacher's closed-form expression for P(n) had been such a breakthrough. But the two properties are independent; the same is true for lots of stuff. Just because you can calculate something doesn't mean you understand it. Calculating ζ(2) is quick and easy, but it was a major breakthrough when Euler discovered that it was equal to π^{2}/6. Calculating ζ(3) is even quicker and easier, but nobody has any idea what the value represents. Similarly, P(n) is easy to calculate, but harder to understand. Ramanujan observed, and proved, that P(5k+4) is always a multiple of 5, which had somehow escaped everyone's notice until then. And there are a couple of other similar identities which were proved later: P(7k+5) is always a multiple of 7; P(11k+6) is always a multiple of 11. Based on that information, any idiot could conjecture that P(13k+7) would always be a multiple of 13; this conjecture is wrong. (P(7) = 15.) Anyway, all that is just leading up to the real point of this note, which is that I was tabulating the number of partitions of n into exactly k parts, which is also quite easy. Let's call this Q(n, k). And I discovered that Q(13, 4) = Q(13, 5). There are 18 ways to divide a pile of 13 beans into 4 piles, and also 18 ways to divide the beans into 5 piles.
So far, I haven't turned anything up; it seems to be a coincidence. A simpler problem of the same type is that Q(8, 3) = Q(8, 4); that seems to be a coincidence too:
Oh well, sometimes these things don't work out the way you'd like.
[Other articles in category /math] permanent link Wed, 21 Feb 2007
A polynomial trivium
$${9\over 8}x^4 - {45\over 4}x^3 + 39{3\over8}x^2 - 54{1\over4}x + 27$$ The property this polynomial was designed to have is this: at x = 1, 2, 3, 4, it takes the values 2, 4, 6, 8. But at x=5 it gives not 10 but 37.
[Other articles in category /math] permanent link
Addenda to Apostol's proof that sqrt(2) is irrational
[Other articles in category /math] permanent link Mon, 19 Feb 2007
A new proof that the square root of 2 is irrational
In short, if √2 were rational, we could construct an isosceles right triangle with integer sides. Given one such triangle, it is possible to construct another that is smaller. Repeating the construction, we could construct arbitrarily small integer triangles. But this is impossible since there is a lower limit on how small a triangle can be and still have integer sides. Therefore no such triangle could exist in the first place, and √2 is irrational. In hideous detail: Suppose that √2 is rational. Then by scaling up the isosceles right triangle with sides 1, 1, and √2 appropriately, we obtain the smallest possible isosceles right triangle whose sides are all integers. (If √2 = a/b, where a/b is in lowest terms, then the desired triangle has legs with length b and hypotenuse a.) This is ΔOAB in the diagram below: By hypothesis, OA, OB, and AB are all integers. Now construct arc BC, whose center is at A. AC and AB are radii of the same circle, so AC = AB, and thus AC is an integer. Since OC = OA - CA, OC is also an integer. Let CD be the perpendicular to OA at point C. Then ΔOCD is also an isosceles right triangle, so OC = CD, and CD is an integer. CD and BD are tangents to the same arc from the same point D, so CD = BD, and BD is an integer. Since OB and BD are both integers, so is OD. Since OC, CD, and OD are all integers, ΔOCD is another isosceles right triangle with integer sides, which contradicts the assumption that OAB was the smallest such. The thing I find amazing about this proof is not just how simple it is, but how strongly geometric. The Greeks proved that √2 was irrational a long time ago, with an argument that was essentially arithmetical. The Greeks being who they were, their essentially arithmetical argument was phrased in terms of geometry, with all the numbers and arithmetic represented by operations on line segments. The Tom Apostol proof is much more in the style of the Greeks than is the one that the Greeks actually found! [ 20070220: There is a short followup to this article. ]
[Other articles in category /math] permanent link Fri, 16 Feb 2007
Yahtzee probability
A related problem is to calculate the probability of certain poker hands. Early in the history of poker, rules varied about whether a straight beat a flush; players weren't sure which was more common. Eventually it was established that straights were more common than flushes. This problem is complicated by the fact that the deck contains a finite number of each card. With cards, drawing a 6 reduces the likelihood of drawing another 6; this is not true when you roll a 6 at dice. With three dice, it's quite easy to calculate the likelihood of rolling various patterns:
A high school student would have no trouble with this. For pattern AAA, there are clearly only six possibilities. For pattern AAB, there are 6 choices for what A represents, times 5 choices for what B represents, times 3 choices for which die is B; this makes 90. For pattern ABC, there are 6 choices for what A represents times 5 choices for what B represents times 4 choices for what C represents; this makes 120. Then you check by adding up 6+90+120 to make sure you get 6^{3} = 216. It is perhaps a bit surprising that the majority of rolls of three dice have all three dice different. Then again, maybe not. In elementary school I was able to amaze some of my classmates by demonstrating that I could flip three coins and get a two-and-one pattern most of the time. Anyway, it should be clear that as the number of dice increases, the chance of them all showing all different numbers decreases, until it hits 0 for more than 6 dice. The three-die case is unusually simple. Let's try four dice:
There are obviously 6 ways to throw the pattern AAAA. For pattern AAAB there are 6 choices for A × 5 choices for B × 4 choices for which die is the B = 120. So far this is no different from the three-die case. But AABB has an added complication, so let's analyze AAAA and AAAB a little more carefully. First, we count the number of ways of assigning numbers of pips on the dice to symbols A, B, and so on. Then we count the number of ways of assigning the symbols to actual dice. The total is the product of these. For AAAA there are 6 ways of assigning some number of pips to A, and then one way of assigning A's to all four dice. For AAAB there are 6×5 ways of assigning pips to symbols A and B, and then four ways of assigning A's and B's to the dice, namely AAAB, AABA, ABAA, and BAAA. With that in mind, let's look at AABB and AABC. For AABB, There are 6 choices for A and 5 for B, as before. And there are !!4\choose2!! = 6 choices for which dice are A and which are B. This would give 6·5·6 = 180 total. But of the 6 assignments of A's and B's to the dice, half are redundant. Assignments AABB and BBAA, for example, are completely equivalent. Taking A=2 B=4 with pattern AABB yields the same die roll as A=4 B=2 with pattern BBAA. So we have double-counted everything, and the actual total is only 90, not 180. Similarly, for AABC, we get 6 choices for A × 5 choices for B × 4 choices for C = 120. And then there seem to be 12 ways of assigning dice to symbols:
But no, actually there are only 6, because B and C are entirely equivalent, and so the patterns in the left column cover all the situations covered by the ones in the right column. The total is not 120×12 but only 120×6 = 720. Then similarly for ABCD we have 6×5×4×3 = 360 ways of assigning pips to the symbols, and 24 ways of assigning the symbols to the dice, but all 24 ways are equivalent, so it's really only 1 way of assigning the symbols to the dice, and the total is 360. The check step asks if 6 + 120 + 90 + 720 + 360 = 6^{4} = 1296, which it does, so that is all right. Before tackling five dice, let's try to generalize. Suppose the we have N dice and the pattern has k ≤ N distinct symbols which occur (respectively) p_{1}, p_{2}, ... p_{k} times each. There are !!{6\choose k}k!!! ways to assign the pips to the symbols. (Note for non-mathematicians: when k > 6, !!{6\choose k}!! is zero.) Then there are !!N\choose p_1 p_2 \ldots p_k!! ways to assign the symbols to the dice, where denotes the so-called multinomial coefficient, equal to !!{N!\over p_1!p_2!\ldots p_k!}!!. But some of those p_{i} might be equal, as with AABB, where p_{1} = p_{2} = 2, or with AABC, where p_{2} = p_{3} = 1. In such cases case some of the assignments are redundant. So rather than dealing with the p_{i} directly, it's convenient to aggregate them into groups of equal numbers. Let's say that n_{i} counts the number of p's that are equal to i. Then instead of having p_{i} = (3, 1, 1, 1, 1) for AAABCDE, we have n_{i} = (4, 0, 1) because there are 4 symbols that appear once, none that appear twice, and one ("A") that appears three times. We can re-express in terms of the n_{i}: $$N!\over {1!}^{n_1}{2!}^{n_2}\ldots{k}!^{n_k}$$ And the reduced contribution from equivalent patterns is easy to express too; we need to divide by !!\prod {n_i}!!!. So we can write the total as:
$$ {6\choose k}k! {N!\over \prod {i!}^{n_i}{n_i}!} \qquad \text{where $k = \sum n_i$} $$ Note that k, the number of distinct symbols, is merely the sum of the n_{i}.To get the probability, we just divide by 6^{N}. Let's see how that pans out for the Yahtzee example, which is the N=5 case:
6 + 150 + 300 + 1,200 + 1,800 + 3,600 + 720 = 7,776, so this checks out. The table is actually not quite right for Yahtzee, which also recognizes "large straight" (12345 or 23456) and "small straight" (1234X, 2345X, or 3456X.) I will continue to disregard this. The most common Yahtzee throw is one pair, by a large margin. (Any Yahtzee player could have told you that.) And here's a curiosity: a full house (AAABB), which scores 25 points, occurs twice as often as four of a kind (AAAAB), which scores at most 29 points and usually less. The key item in the formula is the factor of !!{N!\over \prod {i!}^{n_i}{n_i}!}!! on the right. This was on my mind because of the article I wrote a couple of days ago about counting permutations by cycle class. The key formula in that article was: which has a very similar key item. The major difference is that instead of i!^{ni} we have i^{pi}. The common term arises because both formulas are intimately concerned with the partition structure of the things being counted. I should really go back and reread the stuff in Concrete Mathematics about the Stirling numbers of the first kind, which count the number of partitions of various sizes, but maybe that's a project for next week. Anyway, I digress. We can generalize the formula above to work for S-sided dice; this is a simple matter of replacing the 6 with an S. We don't even need to recalculate the n_{i}. And since the key factor of does not involve S, we can easily precalculate it for some pattern and then plug it into the rest of the formula to get the likelihood of rolling that pattern with different kinds of dice. For example, consider the two-pairs pattern AABBC. This pattern has n_{1} = 1, n_{2} = 2, so the key factor comes out to be 15. Plugging this into the rest of the formula, we see that the probability of rolling AABBC with five S-sided dice is !!90 {S \choose 3} S^{-5}!!. Here is a tabulation:
The graph is quite typical, and each pattern has its own favorite kind of dice. Here's the corresponding graph and table for rolling the AABBCDEF pattern on eight dice:
Returning to the discussion of poker hands, we might ask what the ranking of poker hands whould be, on the planet where a poker hand contains six cards instead of five. Does four of a kind beat three pair? Using the methods in this article, we can get a quick approximation. It will be something like this:
I was going to end the article with tabulations of the number of different ways to roll each possible pattern, and the probabilities of getting them, but then I came to my senses. Instead of my running the program and pasting in the voluminous output, why not just let you run the program yourself, if you care to see the answers?
[Other articles in category /math] permanent link Tue, 13 Feb 2007
Cycle classes of permutations
This also makes it a good way to pass the time on trains and in boring meetings. I've written before about the time-consuming math problems I use to pass time on trains. Today's article is about another entertainment I've been using lately in meetings: count the number of permutations in each cycle class.In case you have forgotten, here is a brief summary: a permutation is a mapping from a set to itself. A cycle of a permutation is a subset of the set for which the elements fall into a single orbit. For example, the permutation: $$ \pmatrix{1&2&3&4&5&6&7&8\cr 1&4&2&8&5&7&6&3\cr}$$ can be represented by the following diagram:And, since it contains four cycles (the closed loops), it is the product of the four cycles (1), (2 4 8 3), (5), and (6 7). We can sort the permutations into cycle classes by saying that two permutations are in the same cycle class if the lengths of the cycles are all the same. This effectively files the numeric labels off the points in the diagrams. So, for example, the permutations of {1,2,3} fall into the three following cycle classes:
It is not too hard a problem, and would probably only take fifteen or twenty minutes outside of a meeting, but this is exactly what makes it a good problem for meetings, where you can give the problem only partial and intermittent attention. Now that I have a simple formula, the enumeration of cycle classes loses all its entertainment value. That's the way the cookie crumbles.
Here's the formula. Suppose we want to know how many permutations of {1,...,n} are in the cycle class C. C is a partition of the number n, which is to say it's a multiset of positive integers whose sum is n. If C contains p_{1} 1's, p_{2} 2's, and so forth, then the number of permutations in cycle class C is: $$ N(C) = {n! \over {\prod i^{p_i}{p_i}!}} $$ This can be proved by a fairly simple counting argument, plus a bit of algebraic tinkering. Note that if any of the p_{i} is 0, we can disregard it, since it will contribute a factor of i^{0}·0! = 1 in the denominator.For example, how many permutations of {1,2,3,4,5} have one 3-cycle and one 2-cycle? The cycle class is therefore {3,2}, and all the p_{i} are 0 except for p_{2} = p_{3} = 1. The formula then gives 5! in the numerator and factors 2 and 3 in the denominator, for a total of 120/6 = 20. And in fact this is right. (It's equal to !!2{5\choose3}!!: choose three of the five elements to form the 3-cycle, and then the other two go into the 2-cycle. Then there are two possible orders for the elements of the 3-cycle.) How many permutations of {1,2,3,4,5} have one 2-cycle and three 1-cycles? Here we have p_{1} = 3, p_{2} = 1, and the other p_{i} are 0. Then the formula gives 120 in the numerator and factors of 6 and 2 in the denominator, for a total of 10. Here are the breakdowns of the number of partitions in each cycle class for various n:
Incidentally, the thing about the average permutation having exactly one fixed point is quite easy to prove. Consider a permutation of N things. Each of the N things is left fixed by exactly (N-1)! of the permutations. So the total number of fixed points in all the permutations is N!, and we are done. A similar but slightly more contorted analysis reveals that the average number of 2-cycles per permutation is 1/2, the average number of 3-cycles is 1/3, and so forth. Thus the average number of total cycles per permutation is !!\sum_{i=1}^n{1\over i} = H_n!!. For example, for n=4, examination of the table above shows that there is 1 permutation with 4 independent cycles (the identity permutation), 6 with 3 cycles, 11 with 2 cycles, and 6 with 1 cycle, for an average of (4+18+22+6)/24 = 50/24 = 1 + 1/2 + 1/3 + 1/4. The 1, 6, 11, 6 are of course the Stirling numbers of the first kind; the identity !!\sum{n\brack i}i = n!H_n!! is presumably well-known.
[Other articles in category /math] permanent link Fri, 09 Feb 2007What to make of this? Many answers are possible. The point of this note is to refute one particular common answer, which is that the whole thing is just meaningless. This view is espoused by many people who, it seems, ought to know better. There are two problems with this view. The first problem is that it involves a theory of meaning that appears to have nothing whatsoever to do with pragmatics. You can certainly say that something is meaningless, but that doesn't make it so. I can claim all I want to that "jqgc ihzu kenwgeihjmbyfvnlufoxvjc sndaye" is a meaningful utterance, but that does not avail me much, since nobody can understand it. And conversely, I can say as loudly and as often as I want to that the utterance "Snow is white" is meaningless, but that doesn't make it so; the utterance still means that snow is white, at least to some people in some contexts. Similarly, asserting that the sentences are meaningless is all very well, but the evidence is against this assertion. The meaning of the utterance "sentence 2 is false" seems quite plain, and so does the meaning of the utterance "sentence 1 is true". A theory of meaning in which these simple and plain-seeming sentences are actually meaningless would seem to be at odds with the evidence: People do believe they understand them, do ascribe meaning to them, and, for the most part, agree on what the meaning is. Saying that "snow is white" is meaningless, contrary to the fact that many people agree that it means that snow is white, is foolish; saying that the example sentences above are meaningless is similarly foolish. I have heard people argue that although the sentences are individually meaningful, they are meaningless in conjunction. This position is even more problematic. Let us refer to a person who holds this position as P. Suppose sentence 1 is presented to you in isolation. You think you understand its meaning, and since P agrees that it is meaningful, he presumably would agree that you do. But then, a week later, someone presents you with sentence 2; according to P's theory, sentence 1 now becomes meaningless. It was meaningful on February 1, but not on February 8, even though the speaker and the listener both think it is meaningful and both have the same idea of what it means. But according to P, as midnight of February 8, they are suddenly mistaken. The second problem with the notion that the sentences are meaningless comes when you ask what makes them meaningless, and how one can distinguish meaningful sentences from sentences like these that are apparently meaningful but (according to the theory) actually meaningless. The answer is usually something along the lines that sentences that contain self-reference are meaningless. This answer is totally inadequate, as has been demonstrated many times by many people, notably W.V.O. Quine. In the example above, the self-reference objection is refuted simply by observing that neither sentence is self-referent. One might try to construct an argument about reference loops, or something of the sort, but none of this will avail, because of Quine's example:
"snow is white" is false when you change "is" to "is not".Or similarly: If a sentence is false, then its negation is true.Nevertheless, Quine's sentence is an antinomy of the same sort as the example sentences at the top of the article. But all of this is peripheral to the main problem with the argument that sentences that contain self-reference are meaningless. The main problem with this argument is that it cannot be true. The sentence "sentences that contain self-reference are meaningless" is itself a sentence, and therefore refers to itself, and is therefore meaningless under its own theory. If the assertion is true, then the sentence asserting it is meaningless under the assertion itself; the theory deconstructs itself. So anyone espousing this theory has clearly not thought through the consequences. (Graham Priest says that people advancing this theory are subject to a devastating ad hominem attack. He doesn't give it specifically, but many such come to mind.) In fact, the self-reference-implies-meaninglessness theory obliterates not only itself, but almost all useful statements of logic. Consider for example "The negation of a true sentence is false and the negation of a false sentence is true." This sentence, or a variation of it, is probably found in every logic textbook ever written. Such a sentence refers to itself, and so, in the self-reference-implies-meaninglessness theory, is meaningless. So too with most of the other substantive assertions of our logic textbooks, which are principally composed of such self-referent sentences about properties of sentences; so much for logic. The problems with ascribing meaninglessness to self-referent sentences run deeper still. If a sentence is meaningless, it cannot be self-referent, because, being meaningless, it cannot refer to anything at all. Is "jqgc ihzu kenwgeihjmbyfvnlufoxvjc sndaye" self-referent? No, because it is meaningless. In order to conclude that it was self-referent, we would have to understand it well enough to ascribe a meaning to it, and this would prove that it was meaningful. So the position that the example sentences 1 and 2 are "meaningless" has no logical or pragmatic validity at all; it is totally indefensible. It is the philosophical equivalent of putting one's fingers in one's ears and shouting "LA LA LA I CAN'T HEAR YOU!" There are better positions. Priest's position is that the sentences are both true and false. This would seem to be just as defensible as the position that they are neither true nor false, but in fact the two positions are neither equivalent nor symmetric. For fuller details, see the article on "dialetheism" in The Stanford Encyclopedia of Philosophy (Summer 2004 Edition); for fullest details, see Priest's book In Contradiction.
[Other articles in category /math/logic] permanent link Tue, 06 Feb 2007
Mnemonics
The Wikipedia article about the number e mentions a very silly mnemonic for remembing the digits of e: "2.7-Andrew Jackson-Andrew Jackson-Isosceles Right Triangle". Apparently, Andrew Jackson was elected President in 1828. When I saw this, my immediate thought was "that's great; from now on I'll always remember when Andrew Jackson was elected President." In high school, I had a math teacher who pointed out that a mnemonic for the numerical value of √3 was to recall that George Washington was born in the year 1732. And indeed, since that day I have never forgotten that Washington was born in 1732.
[Other articles in category /math] permanent link Tue, 23 Jan 2007 In need of some bathroom reading last week, I grabbed my paperback copy of Thomas Hobbes' Leviathan, which is always a fun read. The thing that always strikes me about Leviathan is that almost every sentence makes me nod my head and mutter "that is so true," and then want to get in an argument with someone in which I have the opportunity to quote that sentence to refute them. That may sound like a lot to do on every sentence, but the sentences in Leviathan are really long.Here's a random example:
And as in arithmetic unpractised men must, and professors themselves may often, err, and cast up false; so also in any other subject of reasoning, the ablest, most attentive, and most practised men may deceive themselves, and infer false conclusions; not but that reason itself is always right reason, as well as arithmetic is a certain and infallible art: but no one man's reason, nor the reason of any one number of men, makes the certainty; no more than an account is therefore well cast up because a great many men have unanimously approved it. And therefore, as when there is a controversy in an account, the parties must by their own accord set up for right reason the reason of some arbitrator, or judge, to whose sentence they will both stand, or their controversy must either come to blows, or be undecided, for want of a right reason constituted by Nature; so is it also in all debates of what kind soever: and when men that think themselves wiser than all others clamour and demand right reason for judge, yet seek no more but that things should be determined by no other men's reason but their own, it is as intolerable in the society of men, as it is in play after trump is turned to use for trump on every occasion that suit whereof they have most in their hand. For they do nothing else, that will have every of their passions, as it comes to bear sway in them, to be taken for right reason, and that in their own controversies: bewraying their want of right reason by the claim they lay to it.Gosh, that is so true. Leviathan is of course available online at many locations; here is one such. Anyway, somewhere in the process of all this I learned that Hobbes had some mathematical works, and spent a little time hunting them down. The Penn library has links to online versions of some, so I got to read a little with hardly any investment of effort. One that particularly grabbed my attention was "Three papers presented to the Royal Society against Dr. Wallis". Wallis was a noted mathematician of the 17th century, a contemporary of Isaac Newton, and a contributor to the early development of the calculus. These days he is probably best known for the remarkable formula:
$${\pi\over2} = {2\over1}{2\over3}{4\over3}{4\over5}{6\over5}{6\over7}{8\over7}\cdots$$ So I was reading this Hobbes argument against Wallis, and I hardly got through the first page, because it was so astounding. I will let Hobbes speak for himself:
Hobbes says, in short, that the square root of 100 squares is not 10 unit lengths, but 10 squares. That is his whole argument. Hobbes, of course, is totally wrong here. He's so totally wrong that it might seem hard to believe that he even put such a totally wrong notion into print. One wants to imagine that maybe we have misunderstood Hobbes here, that he meant something other than what he said. But no, he is perfectly lucid as always. That is a drawback of being such an extremely clear writer: when you screw up, you cannot hide in obscurity. Here is the original document, in case you cannot believe it. I picture the members of the Royal Society squirming in their seats as Hobbes presents this "confutation" of Wallis. There is a reason why John Wallis is a noted mathematician of the 17th century, and Hobbes is not a noted mathematician at all. Oh well! Wallis presented a rebuttal sometime later, which I was not going to mention, since I think everyone will agree that Hobbes is totally wrong. But it was such a cogent rebuttal that I wanted to quote a bit from it:
Like as 10 dozen is the root, not of 100 dozen, but of 100 dozen dozen. ... But, says he, the root of 100 soldiers, is 10 soldiers. Answer: No such matter, for 100 soldiers is not the product of 10 soldiers into 10 soldiers, but of 10 soldiers into the number 10: And therefore neither 10, nor 10 soldiers, is the root of it.Post scriptum: The remarkable blog Giornale Nuovo recently had an article about engraved title pages of English books, and mentioned Leviathan's famous illustration specifically. Check it out.
[Other articles in category /math] permanent link Tue, 09 Jan 2007
R3 is not a square
It would be rather surprising if there were, since you could then describe any point in space unambiguously by giving its two coordinates from X. This would mean that in some sense, R^{3} could be thought of as two-dimensional. You would expect that any such X, if it existed at all, would have to be extremely peculiar. I had been wondering about this rather idly for many years, but last week a gentleman on IRC mentioned to me that there had been a proof in the American Mathematical Monthly a couple of years back that there was in fact no such X. So I went and looked it up. The paper was "Another Proof That R^{3} Has No Square Root", Sam B. Nadler, Jr., American Mathematical Monthly vol 111 June–July 2004, pp. 527–528. The proof there is straightforward enough, analyzing the topological dimension of X and arriving at a contradiction. But the Nadler paper referenced an earlier paper which has a much better proof. The proof in "R^{3} Has No Root", Robbert Fokkink, American Mathematical Monthly vol 109 March 2002, p. 285, is shorter, simpler, and more general. Here it is. A linear map R^{n} → R^{n} can be understood to preserve or reverse orientation, depending on whether its determinant is +1 or -1. This notion of orientation can be generalized to arbitrary homeomorphisms, giving a "degree" deg(m) for every homeomorphism which is +1 if it is orientation-preserving and -1 if it is orientation-reversing. The generalization has all the properties that one would hope for. In particular, it coincides with the corresponding notions for linear maps and differentiable maps, and it is multiplicative: deg(f o g) = deg(f)·deg(g) for all homeomorphisms f and g. In particular ("fact 1"), if h is any homeomorphism whatever, then h o h is an orientation-preserving map. Now, suppose that h : X^{2} → R^{3} is a homeomorphism. Then X^{4} is homeomorphic to R^{6}, and we can view quadruples (a,b,c,d) of elements of X as equivalent to sextuples (p,q,r,s,t,u) of elements of R. Consider the map s on X^{4} which takes (a,b,c,d) → (d,a,b,c). Then s o s is the map (a,b,c,d) → (c,d,a,b). By fact 1 above, s o s must be an orientation-preserving map. But translated to the putatively homeomorphic space R^{6}, the map (a,b,c,d) → (c,d,a,b) is just the linear map on R^{6} that takes (p,q,r,s,t,u) → (s,t,u,p,q,r). This map is orientation-reversing, because its determinant is -1. This is a contradiction. So X^{4} must not be homeomorphic to R^{6}, and X^{2} therefore not homeomorphic to R^{3}. The same proof goes through just fine to show that R^{2n+1} = X^{2} is false for all n, and similarly for open subsets of R^{2n+1}. The paper also refers to an earlier paper ("The cartesian product of a certain nonmanifold and a line is E^{4}", R.H. Bing, Annals of Mathematics series 2 vol 70 1959 pp. 399–412) which constructs an extremely pathological space B, called the "dogbone space", not even a manifold, which nevertheless has B × R^{3} = R^{4}. This is on my desk, but I have not read this yet, and I may never.
[Other articles in category /math] permanent link Wed, 18 Oct 2006
A statistical puzzle
Answer:
Can you explain this?
Explanation:
[Other articles in category /math] permanent link Tue, 17 Oct 2006
It's not pi
I think a wall display of "boring lava" is really funny. Yes, I know this means I'm a doofus. The inbound platform walls have a bunch of mathematics displays, including a display of Pascal's triangle. Here's a picture of one of them that I found extremely puzzling: Bona fide megageeks will see the problem at once: it appears to be π, but it isn't. π is 3.14159265358979323846... ., not 3.1415926535821480865144... as graven in stone above. So what's the deal? Did they just screw up? Did they think nobody would notice? Is it a coded message? Or is there something else going on that I didn't get? [ Addendum 20061017: The answer! ]
[Other articles in category /math] permanent link
Why it was the wrong pi
I asked for an explanantion. Thanks to the Wonders of the Internet, an explanantion was not long in coming. In short, the artist screwed up. He used a table of digits in the following format:
3.1415926535 8979323846 2643383279 5028841971 6939937510 5820974944 5923078164 0628620899 8628034825 3421170679 8214808651 3282306647 0938446095 5058223172 5359408128 4811174502 8410270193 8521105559 6446229489 5493038196 4428810975 6659334461 2847564823 3786783165 2712019091 4564856692 3460348610 4543266482 1339360726 0249141273 7245870066 0631558817 4881520920 9628292540 9171536436 7892590360 0113305305 4882046652 1384146951 9415116094 3305727036 5759591953 0921861173 8193261179 3105118548 0744623799 6274956735 1885752724 8912279381 8301194912 9833673362 4406566430 8602139494 6395224737 1907021798 6094370277 0539217176 2931767523 8467481846 7669405132 0005681271 4526356082 7785771342 7577896091 7363717872 1468440901 2249534301 4654958537 1050792279 6892589235 4201995611 2129021960 8640344181 5981362977 4771309960 5187072113 4999999837 2978049951 0597317328 1609631859 5024459455 3469083026 4252230825 3344685035 2619311881 7101000313 7838752886 5875332083 8142061717 7669147303 5982534904 2875546873 1159562863 8823537875 9375195778 1857780532 1712268066 1300192787 6611195909 2164201989 897932The digits are meant to be read across, so that after the "535" in the first group, the next digits are "8979..." on the same line. But instead, the artist has skipped down to the first group in the second line, "8214...". Whoops. This explanation was apparently discovered by Oregonians for Rationality. I found it by doing Google search for "portland 'washington park' max station pi wrong value". Thank you, Google. One thing that struck me about the digits as written is that there seemed to be too many repeats; this is what made me wonder if the digits were invented by the artist out of laziness or an attempt to communicate a secret message. We now know that they weren't. But I wondered if my sense that there were an unusually large number of duplicates was accurate, so I counted. If the digits are normal, we would expect exactly 1/10 of the digits to be the same as the previous digit. In fact, of 97 digit pairs, 14 are repeats; we would expect 9.7. So this does seem to be on the high side. Calculating the likelihood of 14 repeats appearing entirely by chance seems like a tedious chore, without using somewhat clever methods. I'm in the middle of reading some books by Feller and Gnedenko about probability theory, and they do explain the clever methods, but they're at home now, so perhaps I'll post about this further tomorrow.
[Other articles in category /math] permanent link Wed, 13 Sep 2006
Russell and Whitehead or Whitehead and Russell?
Everyone always says "Russell and Whitehead". Google results for "Russell and Whitehead" outnumber those for "Whitehead and Russell" by two to one, for example. Why? The cover and the title page [of Principia Mathematica] say "Alfred North Whitehead and Bertrand Russell, F.R.S.". How and when did Whitehead lose out on top billing? I was going to write that I thought the answer was that when Whitehead died, he left instructions to his family that they destroy his papers; this they did. So Whitehead's work was condemned to a degree of self-imposed obscurity that Russell's was not. I was planning to end this article there. But now, on further reflection, I think that this theory is oversubtle. Russell was a well-known political and social figure, a candidate for political office, a prolific writer, a celebrity, a famous pacifist. Whitehead was none of these things; he was a professor of philosophy, about as famous as other professors of philosophy. The obvious answer to my question above would be "Whitehead lost out on top billing on 10 December, 1950, when Russell was awarded the Nobel Prize." Oh, yeah. That. I'm reminded of the advertising for the movie Space Jam. The posters announced that it starred Bugs Bunny and Michael Jordan, in that order. I reflected for a while on the meaning of this. Was Michael Jordan incensed at being given second billing to a fictitious rabbit? (Probably not, I think; I imagine that Michael Jordan is entirely unthreatened by the appurtenances of any else's fame, and least of all by the fame of a fictitious rabbit.) Why does Bugs Bunny get top billing over Michael Jordan? I eventually decided that while Michael Jordan is a hero, Bugs Bunny is a god, and gods outrank heroes.
[Other articles in category /math] permanent link
Automorphisms of the complex numbers
Robert C. Helling points out that there is a much simpler proof that this is the case. Suppose that f is an automorphism, and that x^{2} = y. Then f(x^{2}) = (f(x))^{2} = f(y), so that if x is a square root of y, then f(x) is a square root of f(y). As I pointed out, f(1) = 1. Since -1 is a square root of 1, f(-1) must be a square root of 1, and so it must be -1. (It can't be 1, since automorphisms may not map two different arguments to the same value.) Since i is a square root of -1, f(i) must also be a square root of -1. So f(i) must be either ±i, and the theorem is proved. This is a nice example of why I am not a mathematician. When I want to find the automorphisms of C, my first idea is to explicitly write down the general automorphism and then start bashing away on the algebra. This sort of mathematical pig-slaughtering gets the pig cut up all right, but mathematicians are not interested in slaughtering pigs. By which I mean that the approach gets the result I want, usually, but not new or mathematically interesting results. In computer programming, the pig-slaughtering approach often works really well. Most programs are oversubtle, and can be easily improved by doing the necessary tasks in the simplest and most straightforward possible way, rather than in whatever baroque way the original programmer dreamed up. [ Previous articles in this series: Part 1 Part 2 Part 3 Followup article: Part 5 ] [Other articles in category /math] permanent link
More about automorphisms
My proof started by referring to a previous result that any such automorphism f must have f(1) = 1. But actually, I had only proved this for automorphisms that must preserve multiplication. For automorphisms that preserve addition only, f(1) need not be 1; it can be anything. In fact, x → kx is an automorphism of R for all k except zero. It is not hard to show, following the technique in the earlier article, that every continuous automorphism has this form. In hopes of salvaging something from my embarrassing error, I thought I'd spend a little time talking about the other automorphisms of R, the ones that aren't "reasonable". They are unreasonable in at least two ways: they are everywhere discontinuous, and they cannot be exhibited explicitly. To manufacture the function, we first need a mathematical horror called a Hamel basis. A Hamel basis is a set of real numbers H_{α} such that every real number r has a unique representation in the form:
$$r = \sum_{i=1}^n q_i H_{\alpha_i}$$ where all the q_{i} are rational. (It is called a Hamel basis because it makes the real numbers into a vector space over the rationals. If this explanation makes no sense to you, please ignore it.)The sum here is finite, so only a finite number of the uncountably many H_{α} are involved for any particular r; this is what characterizes it as a Hamel basis. Leaving aside the proof that the Hamel basis exists at all, if we suppose we have one, we can easily construct an automorphism of R. Just pick some rational numbers m_{α}, one for each H_{α}. Then if as above, we have: The automorphism is:
$$f(r) = \sum_{i=1}^n q_i H_{\alpha_i}m_{\alpha_i}$$ At this point I should probably prove that this is an automorphism. But it seems unwise, because I think that in the unlikely case that you have understood everything so far, you will find the statement that this is an automorphism both clear and obvious, and will be able to imagine the proof yourself, and for me to spell it out will only confuse the issue. And I think that if you have not understood everything so far, the proof will not help. So I should probably just say "clearly, this is an automorphism" and move on.
But against my better judgement, I'll give it a try. Let r and s be real numbers. We want to show that f(s) + f(r) = f(s + r). Represent r and s using the Hamel basis. For each element H of the Hamel basis, let's say that c_{H}(r) is the (rational) coefficient of H in the representation of r. That is, it's the q_{i} in the definition above. By a simple argument involving commutativity and associativity of addition, c_{H}(r+s) = c_{H}(r) + c_{H}(s) for all r, s, and H. Also, c_{H}(f(r)) = m·c_{H}(r), for all r and H, where m is the multiplier we chose for H back when we were making up the automorphism, because that's how we defined f. Then c_{H}(f(r+s)) = m·c_{H}(r+s) = m·(c_{H}(r) + c_{H}(s)) = m·c_{H}(r) + m·c_{H}(s) = c_{H}(f(r)) + c_{H}(f(s)) = c_{H}(f(r) + f(s)), for all H. This means that f(r+s) and f(r) + f(s) have the same Hamel basis representation. They are therefore the same number. This is what we wanted to show. If anyone actually found that in the least enlightening, I would be really interested to hear about it.
One property of a Hamel basis is that exactly one of its uncountably many elements is rational. Say it's H_{0}. Then every rational number q is represented as q = (q/H_{0})·H_{0}. Then f(q) = (q/H_{0})·H_{0}m_{0} = m_{0}q for all rational numbers q. But in general, an irrational number x will not have f(x) = m_{0}x, so the automorphism is discontinuous everywhere, unless all the m_{α} are equal, in which case it's just x → mx again. The problem with this construction is that it is completely abstract. Nobody can exhibit an example of a Hamel basis, being, as it is, an uncountably infinite set of mainly irrational numbers. So the discontinuous automorphisms constructed here are among the most utterly useless of all mathematical examples. I think that is the full story about additive automorphisms of R. I hope I got everything right this time. I should add, by the way, that there seems to be some disagreement about what is called a Hamel basis. Some people say it is what I said: a basis for the reals over the rationals, with the properties I outlined above. However, some people, when you say this, will sniff, adjust their pocket protectors, and and correct you, saying that that a Hamel basis is any basis for any vector space, as long as it has the analogous property that each vector is representable as a combination of a finite subset of the basis elements. Some say one, some the other. I have taken the definition that was convenient for the exposition of this article. [ Thanks to James Wetterau for pointing out the error in the earlier article. ] [ Previous articles in this series: Part 1 Part 2 Part 3 Part 4 ] [Other articles in category /math] permanent link Tue, 12 Sep 2006
Imaginary units, again
The proof of the theorem is not too hard. What we're looking for is what's called an automorphism of the complex numbers. This is a function, f, which "relabels" the complex numbers, so that arithmetic on the new labels is the same as the arithmetic on the old labels. For example, if 3×4 = 12, then f(3) × f(4) should be f(12). Let's look at a simpler example, and consider just the integers, and just addition. The set of even integers, under addition, behaves just like the set of all integers: it has a zero; there's a smallest positive number (2, whereas it's usually 1) and every number is a multiple of this smallest positive number, and so on. The function f in this case is simply f(n) = 2n, and it does indeed have the property that if a + b = c, then f(a) + f(b) = f(c) for all integers a, b, and c. Another automorphism on the set of integers has g(n) = -n. This just exchanges negative and positive. As far as addition is concerned, these are interchangeable. And again, for all a, b, and c, g(a) + g(b) = g(c). What we don't get with either of these examples is multiplication. 1 × 1 = 1, but f(1) × f(1) = 2 × 2 = 4 ≠ f(1) = 2. And similarly g(1) × g(1) = -1 × -1 = 1 ≠ g(1) = -1. In fact, there are no interesting automorphisms on the integers that preserve both addition and multiplication. To see this, consider an automorphism f. Since f is an automorphism that preserves multiplication, f(n) = f(1 × n) = f(1) × f(n) for all integers n. The only way this can happen is if f(1) = 1 or if f(n) = 0 for all n. The latter is clearly uninteresting, and anyway, I neglected to mention that the definition of automorphism rules out functions that throw away information, as this one does. Automorphisms must be reversible. So that leaves only the first possibility, which is that f(1) = 1. But now consider some positive integer n. f(n) = f(1 + 1 + ... + 1) = f(1) + f(1) + ... + f(1) = 1 + 1 + ... + 1 = n. And similarly for 0 and negative integers. So f is the identity function. One can go a little further: there are no interesting automorphisms of the real numbers that preserve both addition and multiplication. In fact, there aren't even any reasonable ones that preserve addition. The proof is similar. First, one shows that f(1) = 1, as before. Then this extends to a proof that f(n) = n for all integers n, as before. Then suppose that a and b are integers. b·f(a/b) = f(b)f(a/b) = f(b·a/b) = f(a) = a, so f(a/b) = a/b for all rational numbers a/b. Then if you assume that f is continuous, you can fill in f(x) = x for the irrational numbers also. (Actually this is enough to show that the only continuous addition-preserving automorphism of the reals is the identity function. There are discontinuous addition-preserving functions, but they are very weird. I shouldn't need to drag in the continuity issue to show that the only addition-and-multiplication-preserving automorphism is the identity, but it's been a long day and I'm really fried.) [ Addendum 20060913: This previous paragraph is entirely wrong; any function x → kx is an addition-preserving automorphism, except of course when k=0. For more complete details, see this later article. ] But there is an interesting automorphism of the complex numbers; it has f(a + bi) = a - bi for all real a and b. (Note that it leaves the real numbers fixed, as we just showed that it must.) That this function f is an automorphism is precisely the content of the statement that i and -i are numerically indistinguishable. The proof that f is an automorphism is very simple. We need to show that if f(a + bi) + f(c + di) = f((a + bi) + (c + di)) for all complex numbers a+bi and c+di, and similarly f(a + bi) × f(c + di) = f((a + bi) × (c + di)). This is really easy; you can grind out the algebra in about two steps. What's more interesting is that this is the only nontrivial automorphism of the complex numbers. The proof of this is also straightforward, but a little more involved. The purpose of this article is to present the proof. Let's suppose that f is an automorphism of the complex numbers that preserves both addition and multiplication. Let's say that f(i) = p + qi. Then f(a + bi) = f(a) + f(b)f(i) = a + bf(i) (because f must leave the real numbers fixed) = a + b(p + qi) = (a + bp) + bqi. Now we want f(a + bi) + f(c + di) = f((a + bi) + (c + di)) for all real numbers a, b, c, and d. That is, we want (a + bp + bqi) + (c + dp + dqi) = (a + c) + (b + d)(p + qi). It is, so that part is just fine. We also want f(a + bi) × f(c + di) = f((a + bi) × (c + di)) for all real numbers a, b, c, and d. That means we need:
Equating the real and imaginary parts gives us two equations:
Equation 2 implies that either p or q is 0. If they're both zero, then f(a + bi) = a, which is not reversible and so not an automorphism. Trying q=0 renders equation 1 insoluble because there is no real number p with p^{2} = -1. But p=0 gives two solutions. One has p=0 and q=1, so f(a+bi) = a+bi, which is the identity function, and not interesting. The other has p=0 and q=-1, so f(a+bi) = a-bi, which is the one we already knew about. But we now know that there are no others, which is what I wanted to show. [ Previous articles in this series: Part 1 Part 2 Followup articles: Part 4 Part 5 ] [Other articles in category /math] permanent link Sat, 09 Sep 2006
Imaginary units, revisited
The two square roots of -1 are indistinguishable in the same way that the top and bottom faces of a cube are. Sure, one is the top, and one is the bottom, but it doesn't matter, and it could just as easily be the other way around. Sure, you could say something like this: "If you embed the cube in R^{3}, then the top face is the set of points that have z-coordinate +1, and the bottom face is the set of points that have z-coordinate -1." And indeed, once you arbitrarily designate that one face is on the top and the other is on the bottom, then one is on the top, and one is on the bottom—but that doesn't mean that the two faces had any a priori difference, that one of them was intrinsically the top, or that the designation wasn't completely arbitrary; trying to argue that the faces are distinguishable, after having made an arbitrary designation to distinguish them, is begging the question. Now can you imagine anyone seriously arguing that the top and bottom faces of a cube are mathematically distinguishable? [ Previous article in this series: Part 1 Followup articles: Part 3 Part 4 Part 5 ] [Other articles in category /math] permanent link Fri, 08 Sep 2006
Imaginary units
I should back up and discuss square roots in more detail. The square root of x, written √x, is defined to be the number y such that y^{2} = x. Well, no, that actually contains a subtle error. The error is in the use of the word "the". When we say "the number y such that...", we imply that there is only one. But every number (except zero) has two square roots. For example, the square roots of 16 are 4 and -4. Both of these are numbers y with the property that y^{2} = 16. In many contexts, we can forget about one of the square roots. For example, in geometry problems, all quantities are positive. (I'm using "positive" here to mean "≥ 0".) When we consider a right triangle whose legs have lengths a and b, we say simply that the hypotenuse has length √(a^{2} + b^{2}), and we don't have to think about the fact that there are actually two square roots, because one of them is negative, and is nonsensical when discussing hypotenuses. In such cases we can talk about the square root function, sqrt(x), which is defined to be the positive number y such that y^{2} = x. There the use of "the" is justified, because there is only one such number. But pinning down which square root we mean has a price: the square root function applies only to positive arguments. We cannot ask for sqrt(-1), because there is no positive number y such that y^{2} = -1. For negative arguments, this simplification is not available, and we must fall back to using √ in its full generality. In high school algebra, we all learn about a number called i, which is defined to be the square root of -1. But again, the use of the word "the" here is misleading, because "the" square root is not unique; -1, like every other number (except 0) has two square roots. We cannot avail ourselves of the trick of taking the positive one, because neither root is positive. And in fact there is no other trick we can use to distinguish the two roots; they are mathematically indistinguishable. The annoying discussion was whether it was correct to say that the two roots are mathematically indistinguishable. It was annoying because it's so obviously true. The number i is, by definition, a number such that i^{2} = -1. This is its one and only defining property. Since there is another number which shares this single defining property, it stands to reason that this other root is completely interchangeable with i—mathematically indistinguishable from it, in other words. This other square root is usually written "-i", which suggests that it's somehow secondary to i. But this is not the case. Every numerical property possessed by i is possessed by -i as well. For example, i^{3} = -i. But we can replace i with -i and get (-i)^{3} = -(-i), which is just as true. Euler's famous formula says that e^{ix} = cos x + i sin x. But replacing i with -i here we get e^{-ix} = cos x + -i sin x, which is also true. Well, one of them is i, and the other is -i, so can't you distinguish them that way? No; those are only expressions that denote the numbers, not the numbers themselves. There is no way to know which of the numbers is denoted by which expression, and, in fact, it does not even make much sense to ask which number is denoted by which expression, since the two numbers are entirely interchangeable. One is i, and one is -i, sure, but this is just saying that one is the negative of the other. But so too is the other the negative of the one. One of the #math people pointed out that there is a well-known Im() function, the "imaginary part" function, such that Im(i) = 1, but Im(-i) = -1, and suggested, rather forcefully, that they could be distinguished that way. This, of course, is hopeless. Because in order to define the "imaginary part" function in the first place, you must start by making an entirely arbitrary choice of which square root of -1 you are using as the unit, and then define Im() in terms of this choice. For example, one often defines Im(z) as !!z - \bar{z} \over 2i!!. But in order to make this definition, you have to select one of the imaginary units and designate it as i and use it in the denominator, thus begging the question. Had you defined Im() with -i in place of i, then Im(i) would have been -1, and vice versa. Similarly, one #math inhabitant suggested that if one were to define the complex numbers as pairs of reals (a, b), such that (a, b) + (c, d) = (a + c, b + d), (a, b) × (c, d) = (ac - bd, ad + bc), then i is defined as (0,1), not (0,-1). This is even more clearly begging the question, since the definition of i here is solely a traditional and conventional one; defining i as (0, -1) instead of (0,1) works exactly as well; we still have i^{2} = -1 and all the other important properties. As IRC discussions do, this one then started to move downwards into straw man attacks. The #math folks then argued that i ≠ -i, and so the two numbers are indeed distinguishable. This would have been a fine counterargument to the assertion that i = -i, but since I was not suggesting anything so silly, it was just stupid. When I said that the numbers were indistinguishable, I did not mean to say that they were numerically equal. If they were, then -1 would have only one square root. Of course, it does not; it has two unequal, but entirely interchangeable, square roots. The that the square roots of -1 are indistinguishable has real content. 1 has two square roots that are not interchangeable in this way. Suppose someone tells you that a and b are different square roots of 1, and you have to figure out which is which. You can do that, because among the two equations a^{2} = a, b^{2} = b, only one will be true. If it's the former, then a=1 and b=-1; if the latter, then it's the other way around. The point about the square roots of -1 is that there is no corresponding criterion for distinguishing the two roots. This is a theorem. But the result is completely obvious if you just recall that i is merely defined to be a square root of -1, no more and no less, and that -1 has two square roots. Oh well, it's IRC. There's no solution other than to just leave. [ Addenda: Part 2 Part 3 Part 4 Part 5 ] [Other articles in category /math] permanent link Thu, 20 Jul 2006
Flipping coins, corrected
After a million tosses of a fair coin, you can expect that the numbers of heads and tails will differ by about 1,000.In fact, the expected difference is actually !!\sqrt{2n/\pi}!!. For n=1,000,000, this gives an expected difference of about 798, not 1,000 as I said. I correctly remembered that the expected difference is on the order of √n, but forgot that the proportionality constant was not 1. The main point of my article, however, is still correct. I said that the following assertion is not quite right (although not quite wrong either):
Over a million tosses you'll have almost the same amount of heads as tails I pointed out that although the relative difference tends to get small, the absolute difference tends to infinity. This is still true. Thanks to James Wetterau for pointing out my error.
[Other articles in category /math] permanent link Wed, 19 Jul 2006
Flipping coins
Over a million tosses you'll have almost the same amount of heads as tails Well, yes, and no. It depends on how you look at it. After a million tosses of a fair coin, you can expect that the numbers of heads and tails will differ by about 1,000. This is a pretty big number. On the other hand, after a million tosses of a fair coin, you can expect that the numbers of heads and tails will differ by about 0.1%. This is a pretty small number. In general, if you flip the coin n times, the expected difference between the numbers of heads and tails will be about √n. As n gets larger, so does √n. So the more times you flip the coin, the larger the expected difference in the two totals. But the relative difference is the quotient of the difference and the total number of flips; that is, √n/n = 1/√n. As n gets larger, 1/√n goes to zero. So the more times you flip the coin, the smaller the expected difference in the two totals. It's not quite right to say that you will have "almost the same amount of heads as tails". But it's not quite wrong either. As you flip the coin more and more, you can expect the totals to get farther and farther apart—but the difference between them will be less and less significant, compared with the totals themselves. [ Addendum 20060720: Although the main point of this article is correct, I made some specific technical errors. A correction is available. ]
[Other articles in category /math] permanent link Tue, 20 Jun 2006
484848 is excellent
So, are all concatenations of odd numbers of "48" excellent? I demand a proof!So okay, it wasn't a request. I have been given no choice but to comply. I hear and I obey, O mighty one! First let's define the items we're talking about:
In order to show that c_{n} is excellent, it then suffices to show that c_{n} = b_{n}^{2} - a_{n}^{2} for all n. First we'll prove the following lemma: for all n, 4b_{n} - 7a_{n} = 4. This follows easily by induction. For n=0, we have 4·8 - 7·4 = 32 - 28 = 4. Now suppose the lemma is proved for n=i; we want to show that it is true for n=i+1. That is, we want to calculate:
Now the main theorem, again by induction. We want to show that:
b_{n}^{2} - a_{n}^{2} = c_{n} for all n. For n=0 this is trivial, since we have 8^{2} - 4^{2} = 64 - 16 = 48. Now suppose we know it is true for n=i; we will show that it is true for n = i+1 as well:
I may have more to say about this later. I have a half-written article that complains about homework questions of the form "Solve problem X using technique Z," where Z is something like induction. The article was inspired by a particularly odious problem of this type:
if n + 1 balls are put inside n boxes, then at least one box will contain more than one ball. prove this principle by induction.Nobody in his right mind would prove this principle by induction. You prove it by pointing out that if the conclusion were to fail, no box would have more than one ball; since there are n boxes, each of which has no more than one ball, then there are no more than n balls, and this contradicts the hypothesis. Using induction is idiotic. A student faced with this kind of question will conclude (correctly) that he or she is being forced to jump through a pointless hoop, and may conclude (incorrectly) that induction is useless. And students are frequently confused by pointless applications of principles. People learn better when they understand why things are happening; when students feel that they don't understand the point of what is being done, they feel that they don't understand the mechanics either. In the real world—by which I mean what real scientists, mathematicians, and engineers do, in addition to what people in the grocery store do—I am excluding only homework assignments—we almost never get a problem of the form "solve X using technique Y". Problems we face in the real world always have the form "solve X, by hook or by crook." The closest we ever see to a prescribed technique are mere suggestions like "Well, Y might work here, so you could try that." Questions that prescribe techniques are either lazy pedagogy or bad curriculum design. If technique Z—say, induction—is a useful technique, then it is because there is some problem Y such that Z is superior to all other techniques for solving Y. If all such Y are outside the scope of the class, then Z is outside the scope of the class too. If, on the other hand, there is some Y that is in the scope of the class, it is the instructor's job to find it and present it to the students, as an instructive example. To fail in this, and to make up a contrived and irrelevant problem in place of Y, is a failure of the instructor's principal duty, which is to illustrate the subject matter by realistic and relevant examples. For the theorem above about 484848, induction is clearly a good way to solve it; to solve the problem by direct calculation is painful. There are other things to learn from the demonstration above. It serves as a wonderful example of what is wrong with standard mathematical style for writing up proofs. A student seeing this proof might well ask "where the heck did you get that lemma about 4b - 7a = 4? Is that something you knew from before? Did you just guess? Was it in the book somewhere?" But no, I did not guess, I did not know that before, and I did not get it from the book. The answer is that I did the main demonstration first, starting with b_{i}^{2} - a_{i}^{2} and trying to get from there to c_{i} by using algebraic manipulations and the definitions of a, b, and c. And just when everything seemed to be going along well, I got stuck. I had:
10000c_{i} + 2400(4b_{i} - 7a_{i}) - 4752 This looked something like what I was trying to manufacture, which was:
10000c_{i} + 4848 but it was not quite right. The 10000c_{i} part was fine, but instead of 2400(4b_{i} - 7a_{i}) - 4752 I needed 4848.So if it was going to work, I needed to have:
2400(4b_{i} - 7a_{i}) - 4752 = 4848 or equivalently:
2400(4b_{i} - 7a_{i}) = 9600 which is equivalent to:
4b_{i} - 7a_{i} = 4. So I had better have 4b_{i} - 7a_{i} = 4; if this turns out false, the whole thing falls apart.But a quick check of a couple of examples shows that 4b_{i} - 7a_{i} = 4 does work, at least for i=0 and 1, so maybe it would worth trying to prove in the general case. And indeed, the proof went through fine, and I won. But in the presentation of the proof, everything is backwards: I pull the mystery lemma out of my ass at the beginning for no apparent reason, and then later on it happens to be what what I need at the crucial moment. Almost as if I knew beforehand what was going to happen! There are a lot of things wrong with mathematics pedagogy, and those were two of them: artifically prescribed techniques to solve homework problems, and the ass-extraction of lemmas backwards in time.
[Other articles in category /math] permanent link Sun, 18 Jun 2006 Whitehead and Russell's Principia Mathematica is famous for taking a thousand pages to prove that 1+1=2. Of course, it proves a lot of other stuff, too. If they had wanted to prove only that 1+1=2, it would probably have taken only half as much space.Principia Mathematica is an odd book, worth looking into from a historical point of view as well as a mathematical one. It was written around 1910, and mathematical logic was still then in its infancy, fresh from the transformation worked on it by Peano and Frege. The notation is somewhat obscure, because mathematical notation has evolved substantially since then. And many of the simple techniques that we now take for granted are absent. Like a poorly-written computer program, a lot of Principia Mathematica's bulk is repeated code, separate sections that say essentially the same things, because the authors haven't yet learned the techniques that would allow the sections to be combined into one. For example, section ∗22, "Calculus of Classes", begins by defining the subset relation (∗22.01), and the operations of set union and set intersection (∗22.02 and .03), the complement of a set (∗22.04), and the difference of two sets (∗22.05). It then proves the commutativity and associativity of set union and set intersection (∗22.51, .52, .57, and .7), various properties like !!\alpha\cap\alpha = \alpha!! (∗22.5) and the like, working up to theorems like ∗22.92: !!\alpha\subset\beta \rightarrow \alpha\cup(\beta - \alpha)!!. Section ∗23 is "Calculus of Relations" and begins in almost exactly the same way, defining the subrelation relation (∗23.01), and the operations of relational union and intersection (∗23.02 and .03), the complement of a relation (∗23.04), and the difference of two relations (∗23.05). It later proves the commutativity and associativity of relational union and intersection (∗23.51, .52, .57, and .7), various properties like !!\alpha\dot\cap\alpha = \alpha!! (∗22.5) and the like, working up to theorems like ∗23.92: !!\alpha\dot\subset\beta \rightarrow \alpha\dot\cup(\beta \dot- \alpha)!!. The section ∗24 is about the existence of sets, the null set !!\Lambda!!, the universal set !!{\rm V}!!, their properties, and so on, and then section ∗24 is duplicated in ∗25 in a series of theorems about the existence of relations, the null relation !!\dot\Lambda!!, the universal relation !!\dot {\rm V}!!, their properties, and so on. That is how Whitehead and Russell did it in 1910. How would we do it today? A relation from S to T is defined as a subset of S × T and is therefore a set. Union, intersection, difference, and the other operations are precisely the same for relations as they are for sets, because relations are sets. All the theorems about unions and intersections of relations, like , just go away, because we already proved them for sets and relations are sets. The null relation is is the null set. The universal relation is the universal set. A huge amount of other machinery goes away in 2006, because of the unification of relations and sets. Principia Mathematica needs a special notation and a special definition for the result of restricting a relation to those pairs whose first element is a member of a particular set S, or whose second element is a member of S, or both of whose elements are members of S; in 2006 we would just use the ordinary set intersection operation and talk about R ∩ (S×B) or whatever. Whitehead and Russell couldn't do this in 1910 because a crucial piece of machinery was missing: the ordered pair. In 1910 nobody knew how to build an ordered pair out of just logic and sets. In 2006 (or even 1956), we would define the ordered pair <a, b> as the set {{a}, {a, b}}. Then we would show as a theorem that <a, b> = <c, d> if and only if a=c and b=d, using properties of sets. Then we would define A×B as the set of all p such that p = <a, b> ∧ a ∈ A ∧ b ∈ B. Then we would define a relation on the sets A and B as a subset of A×B. Then we would get all of ∗23 and ∗25 and a lot of ∗33 and ∗35 and ∗36 for free, and probably a lot of other stuff too. (By the way, the {{a}, {a, b}} thing was invented by Kuratowski. It is usually attributed to Norbert Wiener, but Wiener's idea, although similar, was actually more complicated.) There are no ordered pairs in Principia Mathematica, except implicitly. There are barely even any sets. Whitehead and Russell want to base everything on logic. For Whitehead and Russell, the fundamental notion is the "propositional function", which is a function φ whose output is a truth value. For each such function, there is a corresponding set, which they denote by !!\hat x\phi(x)!!, the set of all x such that φ(x) is true. For Whitehead and Russell, a relation is implied by a propositional function of two variables, analogous to the way that a set is implied by a propositional function of one variable. In 2006, we dispense with "functions of two variables", and just talk about functions whose (single) argument is an ordered pair; a relation then becomes the set of all ordered pairs for which a function is true. Russell is supposed to have said that the discovery of the Sheffer stroke (a single logical operator from which all the other logical operators can be built) was a tremendous advance, and would change everything. This seems strange to us now, because the discovery of the Sheffer stroke seems so simple, and it really doesn't change anything important. You just need to append a note to the beginning of chapter 1 that says that ∼p and p∨q are abbreviations for p|p and p|p.|.q|q, respectively, prove the five fundamental axioms, and leave everything else the same. But Russell might with some justice have said the same thing about the discovery that ordered pairs can be interpreted as sets, a simple discovery that truly would have transformed the Principia Mathematica into quite a different work. Anyway, with that background in place, we can discuss the Principia Mathematica proof of 1+1=2. This occurs quite late in Principia Mathematica, in section ∗102. My abridged version only goes to ∗56, but that is far enough to get to the important precursor theorem, ∗54.43, scanned below: The notation can be overwhelming, so let's focus just on the statement of the theorem, ignore everything else, even the helpful remark at the bottom: This is the theorem that is being proved; what follows is the proof. Now I should explain the notation, which has changed somewhat since 1910. First off, Principia Mathematica uses Peano's "dots" notation to disambiguate precedence, where we now use parentheses instead. The dot notation takes some getting used to, but has some distinct advantages over the parentheses. The idea is just that you indicate grouping by putting in dots, so that (1+2)×(3+4)×(5+6) is written as 1+2.×.3+4.×.5+6. The middle sub-formula is between a pair of dots. The (1+2) sub-formula is between a pair of dots also, but the dot on the left end is superfluous, and we omit it; similarly, the sub-formula (5+6) is delimited by a dot on the left and by the end of the formula on the right. What if you need to nest parentheses? Then you use more dots. A double dot (:) is like a single dot, but stronger. For example, we write ((1+2)×3)+4 as 1+2 . × 3 : + 4, and the double dot isolates the entire 1+2 . × 3 expression into a single sub-formula to which the +4 applies. Sometimes you need more levels of precedence, and then you use triple dots (.: and :.) and quadruple (::). This formula, as you see, has double and triple dots. Translating the dots into standard parenthesis notation, we have $$\ast54.43. \vdash ((\alpha, \beta \in 1 ) \supset (( \alpha\cap\beta = \Lambda) \equiv (\alpha\cup\beta \in 2)))$$. This is rather more cluttered-looking than the version with the dots, and in complicated formulas you can have trouble figuring out which parentheses match with which. With the dots, it's always easy. So I think it's a bit unfortunate that this convention has fallen out of use. The !!\vdash!! symbol has not changed; it means that the formula to which it applies is asserted to be true. !!\supset!! is logical implication, and !!\equiv!! is logical equivalence. Λ is the empty set, which we write nowadays as ∅. ∩ ∪ and ∈ have their modern meanings: ∩ and ∪ are the set intersection and the union operators, and x∈y means that x is an element of set y. The remaining points are semantic. α and β are sets. 1 denotes the set of all sets that have exactly one element. That is, it's the set { c : there exists a such that c = { a } }. Theorems about 1 include, for example:
So here is theorem ∗54.43 again: It asserts that if sets α and β each have exactly one element, then they are disjoint (that is, have no elements in common) if and only if their union has exactly two elements.The proof, which appears in the scan above following the word "Dem." (short for "demonstration") goes like this: "Theorem ∗54.26 implies that if α = {x} and β = {y}, then α∪β has 2 elements if and only if x is different from y." "By theorem ∗51.231, this last bit (x is different from y) is true if and only if {x} and {y} are disjoint." "By ∗13.12, this last bit ({x} and {y} are disjoint) is true if and only if α and β themselves are disjoint." The partial conclusion at this point, which is labeled (1), is that if α = {x} and β = {y}, then α∪β ∈ 2 if and only if α∩β = Λ. The proof continues: "Conclusion (1), with theorems ∗11.11 and ∗11.35, implies that if there exists x and y so that α is {x} and β is {y}, then α∪β ∈ 2 if and only if α and β are disjoint." This conclusion is labeled (2). Finally, conclusion (2), together with theorems ∗11.54 and ∗52.1, implies the theorem we were trying to prove. Maybe the thing to notice here is how very small the steps are. ∗54.26, on which this theorem heavily depends, is almost the same; it asserts that {x}∪{y} ∈ 2 if and only if x≠y. ∗54.26, in turn, depends on ∗54.101, which says that α has 2 elements if and only if there exist x and y, not the same, such that α = {x} ∪ {y}. ∗54.101 is just a tiny bit different from the definition of 2. Theorem ∗51.231 says that {x} and {y} are disjoint if and only if x and y are different. ∗52.1 is a basic property of 1; we saw it before. The other theorems cited in the demonstration are very tiny technical matters. ∗11.54 says that you can take an assertion that two things exist and separate it into two assertions, each one asserting that one of the things exists. ∗11.11 is even slimmer: it says that if φ(x, y) is always true, then you can attach a universal quantifier, and assert that φ(x, y) is true for all x and y. ∗13.12 concerns the substitution of equals for equals: if x and y are the same, then x possesses a property ψ if and only if y does too. I haven't seen the later parts of Principia Mathematica, because my copy stops after section ∗56, and the arithmetic stuff is much later. But this theorem clearly has the sense of 1+1=2 in it, and the later theorem (∗110.643) that actually asserts 1+1=2 depends strongly on this one. Although I am not completely sure what is going to happen later on (I've wasted far too much time on this already to put in more time to get the full version from the library) I can make an educated guess. Principia Mathematica is going to define the number 17 as being the set of all 17-element sets, and similarly for every other number; the use of the symbol 2 to represent the set of all 2-element sets prefigures this. These sets-of-all-sets-of-a-certain-size will then be identified as the "cardinal numbers". The Principia Mathematica will define the sum of cardinal numbers p and q something like this: take a representative set a from p; a has p elements. Take a representative set b from q; b has q elements. Let c = a∪b. If c is a member of some cardinal number r, and if a and b are disjoint, then the sum of p and q is r. With this definition, you can prove the usual desirable properties of addition, such as x + 0 = x, x + y = y + x, and 1 + 1 = 2. In particular, 1+1=2 follows directly from theorem ∗54.43; it's just what we want, because to calculate 1+1, we must find two disjoint representatives of 1, and take their union; ∗54.43 asserts that the union must be an element of 2, regardless of which representatives we choose, so that 1+1=2. Post scriptum: Peter Norvig says that the circumflex in the Principia Mathematica notation is the ultimate source of the use of the word lambda to denote an anonymous function in the Lisp and Python programming languages. I am sure you know that these languages get "lambda" from the use of the Greek letter λ by Alonzo Church to represent function abstraction in his "λ-calculus": In Lisp, (lambda (u) B) is a function that takes an argument u and returns the value of B; in the λ-calculus, λu.B is a function that takes an argument u and returns the value of B. Norvig says that Church was originally planning to write the function λu.B as û.B, but his printer could not do circumflex accents. So he considered moving the circumflex to the left and using a capital lambda instead: Λu.B. The capital Λ looked too much like logical and ∧, which was confusing, so he used lowercase lambda λ instead. Post post scriptum: Everyone always says "Russell and Whitehead". Google results for "Russell and Whitehead" outnumber those for "Whitehead and Russell" by two to one, for example. Why? The cover and the title page say "Alfred North Whitehead and Bertrand Russell, F.R.S.". How and when did Whitehead lose out on top billing? [ Addendum 20060913: I figured out how and when Whitehead lost out on top billing: 10 December 1950. ]
[Other articles in category /math] permanent link Sat, 17 Jun 2006
Excellent numbers
This other programmer had written a program to do brute-force search, which wasn't very successful, because there aren't very many excellent numbers. He was presenting it to the Perl Mongers because in the course of trying to speed up the program, he had learned something interesting: using the Memoize module to memoize the squaring function makes a program slower, not faster. But that is another matter for another article. A slightly less brutal brute-force search locates excellent numbers quite easily. Suppose the number n has 2k digits, and that it is the concatenation of a and b, each with k digits. We want b^{2} - a^{2} = n. Since n is the concatenation of a and b, it is equal to a·10^{k} + b. So we want:
a·10^{k} + b = b^{2} - a^{2} Or equivalently:
a·(10^{k} + a) = b^{2} - b Let's say that a number of the form b^{2} - b is "rectangular". So our problem is simply to find k-digit numbers a for which a·(10^{k} + a) is rectangular.That is, instead of brute-forcing n, which has 2k digits, we need only do a brute-force search on a, which has only k digits. This is a lot faster. To find all 10-digit excellent numbers, we are now searching only 90,000 values of a instead of 9,000,000,000 values of n. This is feasible, whereas the larger search isn't. All we need is some way to determine whether a given number q is rectangular, and that is not very hard to do. One way is simply to precompute a table of all rectangular numbers up to a certain size, and look up q in the table. But another way is to notice that it is sufficient to be able to extract the "rectangular root" of q. That's the number b such that b^{2} - b = q. If we can make a good guess about what b might be, then we can calculate b^{2}-b and see if we get back q. This is analogous to checking whether a number s is a perfect square by guessing its square root r and then checking to see if r^{2} = s. (Of course, it only works if you can guess right!) But how can we guess the rectangular root of q? The computer has a built-in square root function, but no rectangular root function. But that's okay, because they are very nearly the same thing. Rectangular numbers are very nearly squares: when b is large, b^{2} - b is relatively very close to b^{2}. So if we are given some number q, and we want to know if it has the form b^{2} - b, we can get a very good estimate of the size of b just by taking √q. For example, is 29750 a rectangular number? √29750 = 172.48, so if 29750 is rectangular, it will be 173^{2} - 173 = 29756. So, no. Is 1045506 rectangular? √1045506 = 1022.4998, so if 1045506 is rectangular, it will be 1023^{2}-1023 = 1045506—check! The only possible problem is that our assumption that b^{2}-b and b^{2} will be close may not hold when b is too small. So we might need to put a special case in our program to use a different test for rectangularity when q is too small. But it turns out that the test works fine even for q as small as 2: Is 2 rectangular? √2 = 1.4, so if 2 is rectangular, it will be 2^{2}-2—check! So we code up an is_rectangular function:
sub is_rectangular { my $q = shift; my $b = 1 + int(sqrt($q)); # Guess if ($b * $b - $b == $q) { # Check guess return $b; # Aha! } else { return; # Not rectangular } }This function returns false if its argument is not rectangular; and it returns the rectangular root otherwise. We will need the rectangular root, because that's exactly the lower half of an excellent number. Now we need a function that tells us whether we have some number a that might be the upper half of an excellent number:
my $k = shift || 3; my $p = 10**$k; sub is_upper_half_of_excellent_number { my $a = shift; return is_rectangular($a * ($p + $a)); }$k here is the number of digits in a. We'll let it be a command-line argument, and default to 3. We could infer k from a itself, but we'll hardwire it for two reasons. One reason is speed. The other is that we might want to accept a number like 0003514284 as being excellent, where here a has some leading zeroes; fixing k is one way to do this. We also precalculate the constant p = 10^{k}, which we need for the calculation. Now we just write the brute-force loop:
for my $a (0 .. $p-1) { if(my $b = is_upper_half_of_excellent_number($a)) { print "$a$b\n"; } }The program instantaneously coughs up:
01 10101 16128 34188 140400 190476 216513 300625 334668 416768 484848 530901 6401025 8701276Most of these are correct. For example, 901^{2} - 530^{2} = 811801 - 280900 = 530901. 513^{2} - 216^{2} = 263169 - 46656 = 216513. The ones at the top of the list are correct, if you remember about the leading zeroes: 188^{2} - 034^{2} = 35344 - 1156 = 034188, and 001^{2} - 000^{2} = 1 - 0 = 000001. The seven-digit numbers at the bottom don't work, because the program has a bug: we forgot to enforce the restriction that b must also have exactly k digits; the last two numbers in the display have four-digit b's. So we need to add another test to the main loop:
for my $a (0 .. $p-1) { if(my $b = is_upper_half_of_excellent_number($a)) { print "$a$b\n" if length($b) == $k; } }This eliminates the wrong answers from the tail of the list. It also skips over cases where b is too short, and needs leading zeroes, such as a=000, b=001, but fixing that is both easy and unimportant. The ten-digit solutions are:
0101010101 3333466668 4848484848 4989086476I think this is interesting because it shows how a very little bit of mathematical analysis, and some very sloppy numerical work, can turn an intractable problem into a tractable one. We could have worked real hard and maybe come up with some way to generate excellent numbers with no search. But instead, we did just a little work, and that was enough to narrow down the search enormously, to the point where it's practical. The other thing that I think is interesting is that the other programmer was trying to solve his performance problem with optimization tricks. But when you are searching over 9,000,000,000 cases, optimization tricks don't work. At 1,000 cases per second, 9,000,000,000 cases takes about 104 days to finish. If you can optimize your program to speed it up by 50%, it will still take 52 days to finish. But cutting the 9,000,000,000 cases to 90,000 cuts the run time from 104 days to 90 seconds. Not to say that optimizing programs never works, of course. But you have to know when optimization might work and when it can't. In this case, micro-optimization wasn't going to help; the only way to fix an algorithm that does a brute-force search of 9,000,000,000 cases is to find a better algorithm. The point of this article is to show that sometimes finding a better algorithm can be pretty easy. [ Addendum 20060620: I wrote a followup article about how it's not a coincidence that 4848484848 is excellent. ] [ Addendum 20160105: brian d foy has a whole website about his work on this problem. ] [Other articles in category /math] permanent link Fri, 16 Jun 2006
The envelope paradox
This is on my mind because someone asked about it in IRC yesterday and I was surprised at how coherently I was able to explain it on the spur of the moment. There are several versions of this paradox. My favorite version goes like this: you're going to play a game with an adversary. The adversary writes two different numbers on slips of paper and puts them in an envelope. The numbers are completely arbitrary; they could be absolutely any numbers whatsoever: zero, or π, or -1428573901823.00013, or anything else. You pick one slip at random from the envelope and examine the number written on it. You then make a prediction about whether the other number is larger or smaller. If your prediction is correct, you win a dollar; if it is incorrect, you lose a dollar. Clearly, you can break even in the long run simply by making your prediction at random. And it seems just just as clear that there is no strategy you can use that does better than breaking even. But this is the paradox: there is a strategy you can use that does better than breaking even. (This is what W.V.O. Quine calls a "veridical paradox": it's something that seems impossible, but is nevertheless true.) Spoilers follow, so you might want to stop reading here for twenty-four hours and try to figure out a winning strategy yourself.
Let's call the number you get from the envelope A and the number still in the envelope B. You can see A, and you are trying to predict whether B is larger or smaller than A. Here's your winning strategy. Before you see A, choose a random number R. If A < R, then conclude that A is "small", and predict that B is larger. If A > R, make the opposite prediction. There are three possibilities. Either (1) A and B are both less than R, or (2) they are both greater than R, or else (3) one is less than R and one is greater. In case 1, you predict that B > A, and you have a 50% probability of being correct. In case 2, you predict that B < A, and you have a 50% probability of being correct. But in case 3, you win every time! If A < R < B then you see A, conclude that A is "small", and predict that B > A, which is correct; if B < R < A then you see A, conclude that A is "large", and predict that B < A, which is correct. Since you're breaking even in cases 1 and 2, and you have a guaranteed win in case 3, you have a better-than-even chance of winning overall. There's some positive probability p (which depends on the method you use to choose R) that you have case 3, and if so, then your expected positive return on the game is p dollars per game. The paradoxical part is that it initially seems as though you can't get any idea, just from looking at A, of whether it's larger or smaller than the unknown number B. But you can get such an idea, because you can tell from looking at A how big it is, and big numbers are more likely to be larger than B than small numbers are. What you've done with R is to invent a definition of "large" and "small" numbers: numbers larger than R are "large" and those smaller than R are "small". It's an arbitrary definition, and it doesn't always succeed in distinguishing large from small numbers—it thinks that R+1 and 1000000R+1000000 are both "large"—but it can distinguish some large numbers from some small numbers, and it never gets confused and concludes that x is large and y is small when x is actually smaller than y. So it may be arbitrary, and extremely coarse, but it is never actually wrong. In the cases where this very coarse method of deciding "large" from "small" fails to distinguish A from B, you get no new information, but that's okay, because you can still break even. But if you get lucky and the adversary has chosen numbers that you can distinguish, then you win. Another way to look at the paradox is like this: suppose the adversary is required to choose his two numbers at random. Then you have a simple winning strategy: if A is positive, predict that B is smaller, and if A is negative, predict that B is larger. Even when both numbers are positive or both are negative, you win half the time; if one is positive and one is negative, you are guaranteed a win. If the adversary knows that this is what you are doing, he can cut you back to merely breaking even, by limiting himself to always choosing positive numbers. But you can foil this strategy of his by choosing your "positive" and "negative" classes to be divided somewhere other than at 0: instead of "positive" being "> 0" and "negative" being "< 0", you make them mean "> R" and "< R". The adversary still wants to choose two numbers that are always positive, but since he doesn't know how big R is, hw doesn't know how large he has to make his own numbers to get them both to be "positive". Still, this suggests the best strategy for the adversary: choose two very very large numbers that are close together. By doing this, he can make your expected win close to zero. The envelope paradox is often presented in a different form: you are given two envelopes. One contains a bunch of money, say x dollars. The other contains twice as much. You open one envelope at random and examine its contents. Then you choose one envelope to keep. A naïve analysis goes like this: I open the first envelope and see x. I can keep this envelope and collect amount x. If I switch, I have a 50% chance of ending up with 2x and a 50% chance of ending up with x/2, for an expected outcome of 5x/4. Since 5x/4 > x, I should always switch. This is what Quine calls a "falsidical paradox": the reasoning seems good, but leads to an impossible conclusion. The strategy of always switching can't possibly be correct, because you could apply it with without even seeing what is in the envelope. You could keep switching back and forth all day, never opening either envelope, and increasing your expected winnings to infinity. The tricky part, again, is that having seen x in the envelope, you cannot conclude that there is exactly a 50% chance of x being the larger of the two amounts. You get some information from the size of x, and if x is a large amount of money, then the probability that x is the larger of the two amounts is thereby greater than 0.5. To do a full analysis, one has to ask the question of how the original amounts were selected. Say that the two amounts are b and 2b; let's call b the "base amount". How did the adversary select b? Let's say that the probability of the base amount being any particular amount x is P(x). It is impossible that b has an equal probability of being every number, because $$\int_{-\infty}^\infty P(x) dx$$ is required to be 1, and if P(b) is the same for every possible base amount b, then it is a constant function, and constant functions do not have the required property. When you see x in the envelope, you know that one of two situations occurred. Either x is the base amount, and so is smaller, which occurs with probability P(x), or x/2 is the base amount, and x itself is the larger, which occurs with probability P(x/2). Since these are the only possibilities, the a posteriori likelihood that x is the smaller number is P(x)/(P(x/2) + P(x)). This is equal to 1/2 only if P(x) = P(x/2). Although this can occur for particular values of x, it can't be true for every x. As x increases, P(x) approaches zero, so for sufficiently large x, we must have P(x/2) > P(x), so P(x/2) + P(x) > 2P(x), and P(x)/(P(x/2) + P(x)) < P(x)/2P(x) = 1/2.
[Other articles in category /math] permanent link Wed, 14 Jun 2006
Worst mathematical notation ever
I was reading some lecture notes today, and I encountered what I think must be the single worst line of mathematical notation I have ever seen. Here it is:
$$\equiv_{i_0+1} = \equiv_{i_0} = \equiv$$ Isn't that just astonishing?The explanation is that the author is constructing a sequence of equivalence relations, each one ≡_{n} derived from the previous one ≡_{n-1}. Eventually (after i_{0} iterations of the procedure), the construction has nothing left to do, and the new relation is the same as the one from which it was constructed. At this point the constructed relation turns out to be a certain one ≡ with desirable properties.
[Other articles in category /math] permanent link Mon, 29 May 2006A puzzle for high school math studentsFactor x^{4} + 1.
SolutionAs I mentioned in yesterday's very long article about GF(2^{n}), there is only one interesting polynomial over the reals that is irreducible, namely x^{2} + 1. Once you make up a zero for this irreducible polynomial, name it i, and insert it into the reals, you have the complex numbers, and you are done, because the fundmental theorem of algebra says that every polynomial has a zero in the complex numbers, and therefore no polynomials are irreducible. Once you've inserted i, there is nothing else you need to insert.This implies, however, that there are no irreducible real polynomials of degree higher than 2. Every real polynomial of degree 3 or higher factors into simpler factors. For odd-degree polynomials, this is not surprising, since every such polynomial has a real zero. But I remember the day, surprisingly late in my life, when I realized that it also implies that x^{4} + 1 must be factorable. I was so used to x^{2} + 1 being irreducible, and I had imagined that x^{4} + 1 must be irreducible also. x^{2} + bx + 1 factors if and only if |b| ≥ 2, and I had imagined that something similar would hold for x^{4} + 1—But no, I was completely wrong. x^{4} + 1 has four zeroes, which are the fourth roots of -1, or, if you prefer, the two square roots of each of i and -i. When I was in elementary school I used to puzzle about the square root of i; I hadn't figured out yet that it was ±(1 + i)/√2; let's call these two numbers j and -j. The square roots of -i are the conjugates of j and -j: ±(1 - i)/√2, which we can call k and -k because my typewriter doesn't have an overbar. So x^{4} + 1 factors as (x + j)(x - j)(x + k)(x - k). But this doesn't count because j and k are complex. We get the real factorization by remembering that j and k are conjugate, so that (x - j)(x - k) is completely real; it equals x^{2} - √2x + 1. And similarly (x + j)(x + k) = x^{2} + √2x + 1. So x^{4} + 1 factors into (x^{2} - √2x + 1)(x^{2} + √2x + 1). Isn't that nice?
[Other articles in category /math] permanent link Sat, 27 May 2006IntroductionAccording to legend, the imaginary unit i = √-1 was invented by mathematicians because the equation x^{2} + 1 = 0 had no solution. So they introduced a new number i, defined so that i^{2} + 1 = 0, and investigated the consequences. Like most legends, this one has very little to do with the real history of the thing it purports to describe. But let's pretend that it's true for a while.
Descartes' theoremIf P(x) is some polynomial, and z is some number such that P(z) = 0, then we say that z is a zero of P or a root of P. Descartes' theorem (yes, that Descartes) says that if z is a zero of P, then P can be factored into P = (x-z)P', where P' is a polynomial of lower degree, and has all the same roots as P, except without z. For example, consider P = x^{3} - 6x^{2} + 11x - 6. One of the zeroes of P is 1, so Descartes theorem says that we can write P in the form (x-1)P' for some second-degree polynomial P', and if there are any other zeroes of P, they will also be zeroes of P'. And in fact we can; x^{3} - 6x^{2} + 11x - 6 = (x-1)(x^{2} - 5x + 6). The other zeroes of the original polynomial P are 2 and 3, and both of these (but not 1) are also zeroes of x^{2} - 5x + 6.We can repeat this process. If a_{1}, a_{2}, ... a_{n} are the zeroes of some nth-degree polynomial P, we can factor P as (x-a_{1})(x-a_{2})...(x-a_{n}). The only possibly tricky thing here is that some of the a_{i} might be the same. x^{2} - 2x + 1 has a zero of 1, but twice, so it factors as (x-1)(x-1). x^{3} - 4x^{2} + 5x - 2 has a zero of 1, but twice, so it factors as (x-1)(x^{2} - 3x + 2), where (x^{2} - 3x + 2) also has a zero of 1, but this time only once. Descartes' theorem has a number of important corollaries: an nth-degree polynomial has no more than n zeroes. A polynomial that has a zero can be factored. When coupled with the so-called fundamental theorem of algebra, which says that every polynomial has a zero over the complex numbers, it implies that every nth-degree polynomial can be factored into a product of n factors of the form (x-z) where z is complex, or into a product of first- and second-degree polynomials with real coefficients.
The square roots of -1Suppose we want to know what the square roots of 1 are. We want numbers x such that x^{2} = 1, or, equivalently, such that x^{2} - 1 = 0. x^{2} - 1 factors into (x-1)(x+1), so the zeroes are ±1, and these are the square roots of 1. On the other hand, if we want the square roots of 0, we need the solutions of x^{2} = 0, and the corresponding calculation calls for us to factor x^{2}, which is just (x-0)(x-0), So 0 has the same square root twice; it is the only number without two distinct square roots.If we want to find the square roots of -1, we need to factor x^{2} + 1. There are no real numbers that do this, so we call the polynomial irreducible. Instead, we make up some square roots for it, which we will call i and j. We want (x - i)(x - j) = x^{2} + 1. Multiplying through gives x^{2} - (i + j)x + ij = x^{2} + 1. So we have some additional relationships between i and j: ij = 1, and i + j = 0. Because i + j = 0, we always dispense with j and just call it -i. But it's essential to realize that neither one is more important than the other. The two square roots are identical twins, different individuals that are completely indistinguishable from each other. Our choice about which one was primary was completely arbitrary. That doesn't even do justice to the arbitrariness of the choice: i and j are so indistinguishable that we still don't know which one got the primary notation. Every property enjoyed by i is also enjoyed by j. i^{3} = j, but j^{3} = i. e^{iπ} = -1, but so too e^{jπ} = -1. And so on. In fact, I once saw the branch of mathematics known as Galois theory described as being essentially the observation that i and -i were indistinguishable, plus the observation that there are no other indistinguishable pairs of complex numbers. Anyway, all this is to set up the main point of the article, which is to describe my favorite math problem of all time. I call it the "train problem" because I work on it every time I'm stuck on a long train trip. We are going to investigate irreducible polynomials; that is, polynomials that do not factor into products of simpler polynomials. Descartes' theorem implies that irreducible polynomials are zeroless, so we are going to start by looking for polynomials without zeroes. But we are not going to do it in the usual world of the real numbers. The answer in that world is simple: x^{2} + 1 is irreducible, and once you make up a zero for this polynomial and insert it into the universe, you are done; the fundamental theorem of algebra guarantees that there are no more irreducible polynomials. So instead of the reals, we're going to do this in a system known as Z_{2}.
Z_{2}Z_{2} is much simpler than the real numbers. Instead of lots and lots of numbers, Z_{2} has only two. (Hence the name: The 2 means 2 and the Z means "numbers". Z is short for Zahlen, which is German.) Z_{2} has only 0 and 1. Addition and multiplication are as you expect, except that instead of 1 + 1 = 2, we have 1 + 1 = 0.
The 1+1=0 oddity is the one and only difference between Z_{2} and ordinary algebra. All the other laws remain the same. To be explicit, I will list them below:
Algebra in Z_{2}But the 1+1=0 oddity does have some consequences. For example, a + a = a·(1 + 1) = a·0 = 0 for all a. This will continue to be true for all a, even if we extend Z_{2} to include more elements than just 0 and 1, just as x + x = 2x is true not only in the real numbers but also in their extension, the complex numbers.Because a + a = 0, we have a = -a for all a, and this means that in Z_{2}, addition and subtraction are the same thing. If you don't like subtraction—and really, who does?—then Z_{2} is the place for you. As a consequence of this, there are some convenient algebraic manipulations. For example, suppose we have a + b = c and we want to solve for a. Normally, we would subtract b from both sides. But in Z_{2}, we can add b to both sides, getting a + b + b = c + b. Then the b + b on the left cancels (because b + b = 0) leaving a = c + b. From now on, I'll call this the addition law. If you suffered in high school algebra because you wanted (a + b)^{2} to be equal to a^{2} + b^{2}, then suffer no more, because in Z_{2} they are equal. Here's the proof:
Because the ab + ab cancels in the last step. This happy fact, which I'll call the square law, also holds true for sums of more than two terms: (a + b + ... + z)^{2} = a^{2} + b^{2} + ... + z^{2}. But it does not hold true for cubes; (a + b)^{3} is a^{3} + a^{2}b + ab^{2} + b^{3}, not a^{3} + b^{3}.
Irreducible polynomials in Z_{2}Now let's try to find a polynomial that is irreducible in Z_{2}. The first degree polynomials are x and x+1, and both of these have zeroes: 0 and 1, respectively.There are four second-degree polynomials:
Note in particular that, unlike in the real numbers, x^{2} + 1 is not irreducible in Z_{2}. 1 is a zero of this polynomial, which, because of the square law, factors as (x+1)(x+1). But we have found an irreducible polynomial, x^{2} + x + 1. The other three second-degree polynomials all can be factored, but x^{2} + x + 1 cannot. Like the fictitious mathematicians who invented i, we will rectify this problem, by introducing into Z_{2} a new number, called b, which has the property that b^{2} + b + 1 = 0. What are the consequences of this? The first thing to notice is that we haven't added only one new number to Z_{2}. When we add i to the real numbers, we also get a lot of other stuff like i+1 and 3i-17 coming along for the ride. Here we can't add just b. We must also add b+1. The next thing we need to do is figure out how to add and multiply with b. The addition and multiplication tables are extended as follows:
The only items here that might require explanation are the entries in the lower-right-hand corner of the multiplication table. Why is b×b = b+1? Well, b^{2} + b + 1 = 0, by definition, so b^{2} = b + 1 by the addition law. Why is b × (b+1) = 1? Well, b × (b+1) = b^{2} + b = (b + 1) + b = 1 because the two b's cancel. Perhaps you can work out (b+1) × (b+1) for yourself. From here there are a number of different things we could investigate. But what I think is most crucial is to discover the value of b^{3}. Let's calculate that. b^{3} = b·b^{2}. b^{2}, we know, is b+1. So b^{3} = b(b+1) = 1. Thus b is a cube root of 1. People with some experience with this sort of study will not be surprised by this. The cube roots of 1 are precisely the zeroes of the polynomial x^{3} + 1 = 0. 1 is obviously a cube root of 1, and so Descartes' theorem says that x^{3} + 1 must factor into (x+1)P, where P is some second-degree polynomial whose zeroes are the other two cube roots. P can't be divisible by x, because 0 is not a cube root of 1, so it must be either (x+1)(x+1), or something else. If it were the former, then we would have x^{3} + 1 = (x+1)^{3}, but as I pointed out before, we don't. So it must be something else. The only candidate is x^{2} + x + 1. Thus the other two cube roots of 1 must be zeroes of x^{2} + x + 1. The next question we must answer without delay is: b is one of the zeroes of x^{2} + x + 1; what is the other? There is really only one possibility: it must be b+1, because that is the only other number we have. But another way to see this is to observe that if p is a cube root of 1, then p^{2} must also be a cube root of 1:
So we now know the three zeroes of x^{3} + 1: they are 1, b, and b^{2}. Since x^{3} + 1 factors into (x+1)(x^{2}+x+1), the two zeroes of x^{2}+x+1 are b and b^{2}. As we noted before, b^{2} = b+1. This means that x^{2} + x + 1 should factor into (x+b)(x+b+1). Multiplying through we get x^{2} + bx + (b+1)x + b(b+1) = x^{2} + x + 1, as hoped. Actually, the fact that both b and b^{2} are zeroes of x^{2}+x+1 is not a coincidence. Z_{2} has the delightful property that if z is a zero of any polynomial whose coefficients are all 0's and 1's, then z^{2} is also a zero of that polynomial. I'll call this Theorem 1. Theorem 1 is not hard to prove; it follows from the square law. Feel free to skip the demonstration which follows in the rest of the paragraph. Suppose the polynomial is P(x) = a_{n}x^{n} + a_{n-1}x^{n-1} + ... + a_{1}x + a_{0}, and z is a zero of P, so that P(z) = 0. Then (P(z))^{2} = 0 also, so (a_{n}z^{n} + a_{n-1}z^{n-1} + ... + a_{1}z + a_{0})^{2} = 0. By the square law, we have a_{n}^{2}(z^{n})^{2} + a_{n-1}^{2}(z^{n-1})^{2} + ... + a_{1}^{2}z^{2} + a_{0}^{2} = 0. Since all the a_{i} are either 0 or 1, we have a_{i}^{2} = a_{i}, so a_{n}(z^{n})^{2} + a_{n-1}(z^{n-1})^{2} + ... + a_{1}z^{2} + a_{0} = 0. And finally, (z^{k})^{2} = (z^{2})^{k}, so we get a_{n}(z^{2})^{n} + a_{n-1}(z^{2})^{n-1} + ... + a_{1}z^{2} + a_{0} = P(z^{2}) = 0, which is what we wanted to prove. Now you might be worried about something: the fact that b is a zero of x^{2} + x + 1 implies that b^{2} is also a zero; doesn't the fact that b^{2} is a zero imply that b^{4} is also a zero? And then also b^{8} and b^{16} and so forth? Then we would be in trouble, because x^{2} + x + 1 is a second-degree polynomial, and, by Descartes' theorem, cannot have more than two zeroes. But no, there is no problem, because b^{3} = 1, so b^{4} = b·b^{3} = b·1 = b, and similarly b^{8} = b^{2}, so we are getting the same two zeroes over and over.
Symmetry againJust as i and -i are completely indistinguishable from the point of view of the real numbers, so too are b and b+1 indistinguishable from the point of view of Z_{2}. One of them has been given a simpler notation than the other, but there is no way to tell which one we gave the simpler notation to! The one and only defining property of b is that b^{2} + b + 1 = 0; since b+1 also shares this property, it also shares every other property of b. b and b+1 are both cube roots of 1. Each, added to the other gives 1; each, added to 1 gives the other. Each, when multiplied by itself gives the other; each when multiplied by the other gives 1.
Third-degree polynomialsWe have pretty much exhausted the interesting properties of b and its identical twin b^{2}, so let's move on to irreducible third-degree polynomials. If a third-degree polynomial is reducible, then it is either a product of three linear factors, or of a linear factor and an irreducible second-degree polynomial. If the former, it must be one of the four products x^{3}, x^{2}(x+1), x(x+1)^{2}, or (x+1)^{3}. If the latter, it must be one of x(x^{2}+x+1) or (x+1)(x^{2}+x+1). This makes 6 reducible polynomials. Since there are 8 third-degree polynomials in all, the other two must be irreducible:
There are two irreducible third-degree polynomials. Let's focus on x^{3} + x + 1, since it seems a little simpler. As before, we will introduce a new number, c, which has the property c^{3} + c + 1 = 0. By the addition law, this implies that c^{3} = c + 1. Using this, we can make a table of powers of c, which will be useful later:
To calculate the entries in this table, just multiply each line by c to get the line below. For example, once we know that c^{5} = c^{2} + c + 1, we multiply by c to get c^{6} = c^{3} + c^{2} + c. But c^{3} = c + 1 by definition, so c^{6} = (c + 1) + c^{2} + c = c^{2} + 1. Once we get as far as c^{7}, we find that c is a 7th root of 1. Analogous to the way b and b^{2} were both cube roots of 1, the other 7th roots of 1 are c^{2}, c^{3}, c^{4}, c^{5}, c^{6}, and 1. For example, c^{5} is a 7th root of 1 because (c^{5})^{7} = c^{35} = (c^{7})^{5} = 1^{5} = 1. We've decided that c is a zero of x^{3} + x + 1. What are the other two zeroes? By theorem 1, they must be c^{2} and (c^{2})^{2} = c^{4}. Wait, what about c^{8}? Isn't that a zero also, by the same theorem? Oh, but c^{8} = c·c^{7} = c, so it's really the same zero again. What about c^{3}, c^{5}, and c^{6}? Those are the zeroes of the other irreducible third-degree polynomial, x^{3} + x^{2} + 1. For example, (c^{3})^{3} + (c^{3})^{2} + 1 = c^{9} + c^{6} + 1 = c^{2} + (c^{2}+1) + 1 = 0. So when we take Z_{2} and adjoin a new number that is a solution of the previously unsolvable equation x^{3} + x + 1 = 0, we're actually forced to add six new numbers; the resulting system has 8, and includes three different solutions to both irreducible third-degree polynomials, and a full complement of 7th roots of 1. Since the zeroes of x^{3} + x + 1 and x^{3} + x^{2} + 1 are all 7th roots of 1, this tells us that x^{7} + 1 factors as (x+1)(x^{3} + x + 1)(x^{3} + x^{2} + 1), which might not have been obvious.
OnwardI said this was my favorite math problem, but I didn't say what the question was. But by this time we have dozens of questions. For example:
And then, when I get tired of Z_{2}, I can start all over with Z_{3}.
[Other articles in category /math] permanent link Thu, 11 May 2006
Egyptian Fractions
A simple algorithm for calculating this so-called "Egyptian fraction representation" is the greedy algorithm: To represent n/d, find the largest unit fraction 1/a that is less than n/d. Calculate a representation for n/d - 1/a, and append 1/a. This always works, but it doesn't always work well. For example, let's use the greedy algorithm to find a representation for 2/9. The largest unit fraction less than 2/9 is 1/5, and 2/9 - 1/5 = 1/45, so we get 2/9 = 1/5 + 1/45 = [5, 45]. But it also happens that 2/9 = [6, 18], which is much more convenient to calculate with because the numbers are smaller. Similarly, for 19/20 the greedy algorithm produces 19/20 = [2] + 9/20 = [2, 3] + 7/60 = [2, 3, 9, 180]. But even 7/60 can be more simply written than as [9, 180]; it's also [10, 60], [12, 30], and, best of all, [15, 20]. So similarly, for 3/7 this time, the greedy methods gives us 3/7 = 1/3 + 2/21, and that 2/21 can be expanded by the greedy method as [11, 231], so 3/7 = [3, 11, 231]. But even 2/21 has better expansions: it's also [12, 84], [14, 42], and, best of all, [15, 35], so 3/7 = [3, 15, 35]. But better than all of these is 3/7 = [4, 7, 28], which is optimal. Anyway, while I was tinkering with all this, I got an answer to a question I had been wondering about for years, which is: why did Ahmes come up with a table of representations of fractions of the form 2/n, rather than the representations of all possible quotients? Was there a table somewhere else, now lost, of representations of fractions of the form 3/n? The answer, I think, is "probably not"; here's why I think so. Suppose you want 3/7. But 3/7 = 2/7 + 1/7. You look up 2/7 in the table and find that 2/7 = [4, 28]. So 3/7 = [4, 7, 28]. Done. OK, suppose you want 4/7. You look up 2/7 in the table and find that 2/7 = [4, 28]. So 4/7 = [4, 4, 28, 28] = [2, 14]. Done. Similarly, 5/7 = [2, 7, 14]. Done. To calculate 6/7, you first calculate 3/7, which is [4, 7, 28]. Then you double 3/7, and get 6/7 = 1/2 + 2/7 + 1/14. Now you look up 2/7 in the table and get 2/7 = [4, 28], so 6/7 = [2, 4, 14, 28]. Whether this is optimal or not is open to argument. It's longer than [2, 3, 42], but on the other hand the denominators are smaller. Anyway, the table of 2/n is all you need to produce Egyptian representations of arbitrary rational numbers. The algorithm in general is:
An alternative algorithm is to expand the numerator as a sum of powers of 2, which the Egyptians certainly knew how to do. For 19/20 this gives us 19/20 = 16/20 + 2/20 + 1/20 = 4/5 + [10, 20]. Now we need to figure out 4/5, which we do as above, getting 4/5 = [2, 6, 12, 20], or 4/5 = 2/3 + [12, 20] if we are Egyptian, or 4/5 = [2, 4, 20] if we are clever. Supposing we are neither, we have 19/20 = [2, 6, 12, 20, 10, 20] = [2, 6, 12, 10, 10] = [2, 6, 12, 5] as before. (It is not clear to me, by the way, that either of these algorithms is guaranteed to terminate. I need to think about it some more.) Getting the table of good-quality representations of 2/n is not trivial, and requires searching, number theory, and some trial and error. It's not at all clear that 2/105 = [90, 126]. Once you have the table of 2/n, however, you can grind out the answer to any division problem. This might be time-consuming, but it's nevertheless trivial. So Ahmes needed a table of 2/n, but once he had it, he didn't need any other tables.
[Other articles in category /math] permanent link Sat, 22 Apr 2006
Counting squares
But there is a better way. It's called the Pólya-Burnside counting lemma. (It's named after George Pólya and William Burnside. The full Pólya counting theorem is more complex and more powerful. The limited version in this article is more often known just as the Burnside lemma. But Burnside doesn't deserve the credit; it was known much earlier to other mathematicians.) Let's take a slightly simpler example, and count the squares that have two colors, say blue and black only. We can easily pick them out from the list above:
Remember way back at the beginning where we decided that and and were the same because differences of a simple rotation didn't count? Well, the first thing you do is you make a list of all the kinds of motions that "don't count". In this case, there are four motions:
Now we temporarily forget about the complication that says that some squares are essentially the same as other squares. All squares are now different. and are now different because they are colored differently. This is a much simpler point of view. There are clearly 2^{4} such squares, shown below:
Which of these 16 squares is left unchanged by motion #3, a counterclockwise quarter-turn? All four wedges would have to be the same color. Of the 16 possible colorings, only the all-black and all-blue ones are left entirely unchanged by motion #3. Motion #1, the clockwise quarter-turn, works the same way; only the 2 solid-colored squares are left unchanged.
4 colorings are left unchanged by
a 180° rotation. The top wedge and the bottom wedges switch
places, so they must be the same color, and the left and right wedges
change places, so they must be the same color. But the top-and-bottom
wedges need not be the same color as the left-and-right wedges. We
have two independent choices of how to color a square so that it will
remain unchanged by a 180° rotation, and there are 2^{2} =
4 colorings that are left unchanged by a 180° rotation. These are
shown at right. So we have counted the number of squares left unchanged by each motion:
Next we take the counts for each motion, add them up, and average them. That's 2 + 4 + 2 + 16 = 24, and divide by 4 motions, the average is 6. So now what? Oh, now we're done. The average is the answer. 6, remember? There are 6 distinguishable squares. And our peculiar calculation gave us 6. Waaa! Surely that is a coincidence? No, it's not a coincidence; that is why we have the theorem. Let's try that again with three colors, which gave us so much trouble before. We hope it will say 24. There are now 3^{4} basic squares to consider. For motions #1 and #3, only completely solid colorings are left unchanged, and there are 3 solid colorings, one in each color. For motion 2, there are 3^{2} colorings that are left unchanged, because we can color the top-and-bottom wedges in any color and then the left-and-right wedges in any color, so that's 3·3 = 9. And of course all 3^{4} colorings are left unchanged by motion #4, because it does nothing.
The average is (3 + 9 + 3 + 81) / 4 = 96 / 4 = 24. Which is right. Hey, how about that? That was so easy, let's skip doing four colors and jump right to the general case of N colors:
Add them up and divide by 4, and you get (N^{4} + N^{2} + 2N)/4. So if we allow four colors, we should expect to have 70 different squares. I'm glad we didn't try to count them by hand! (Digression: Since the number of different colorings must be an integer, this furnishes a proof that N^{4} + N^{2} + 2N is always a multiple of 4. It's a pretty heavy proof if it were what we were really after, but as a freebie it's not too bad.) One important thing to notice is that each motion of the square divides the wedges into groups called orbits, which are groups of wedges that change places only with other wedges in the same orbit. For example, the 180° rotation divided the wedges into two orbits of two wedges each: the top and bottom wedges changed places with each other, so they were in one orbit; the left and right wedges changed places, so they were in another orbit. The "do nothing" motion induces four orbits; each wedge is in its own private orbit. Motions 1 and 3 put all the wedges into a single orbit; there are no smaller private cliques. For a motion to leave a square unchanged, all the wedges in each orbit must be the same color. For example, the 180° rotation leaves a square unchanged only when the two wedges in the top-bottom orbit are colored the same and the two wedges in the left-right orbit are colored the same. Wedges in different orbits can be different colors, but wedges in the same orbit must be the same color. Suppose a motion divides the wedges into k orbits. Since there are N^{k} ways to color the orbits (N colors for each of the k orbits), there are N^{k} colorings that are left unchanged by the motion. Let's try a slightly trickier problem. Let's go back to using 3 colors, and see what happens if we are allowed to flip over the squares, so that and are now considered the same. In addition to the four rotary motions we had before, there are now four new kinds of motions that don't count:
The diagonal reflections each have two orbits, and so leave 9 of the 81 squares unchanged. The horizontal and vertical reflections each have three orbits, and so leave 27 of the 81 squares unchanged. So the eight magic numbers are 3, 3, 9, and 81, from before, and now the numbers for the reflections, 9, 9, 27, and 27. The average of these eight numbers is 168/8 = 21. This is correct. It's almost the same as the 24 we got earlier, but instead of allowing both representatives of each pair like , we allow only one, since they are now considered "the same". There are three such pairs, so this reduces our count by exactly 3. Okay, enough squares. Lets do, um, cubes! How many different ways are there to color the faces of a cube with N colors? Well, this is a pain in the ass even with the Pólya-Burnside lemma, because there are 24 motions of the cube. (48 if you allow reflections, but we won't.) But it's less of a pain in the ass than if one tried to do it by hand. This is a pain for two reasons. First, you have to figure out what the 24 motions of the cube are. Once you know that, you then have to calculate the number of orbits of each one. If you are a combinatorics expert, you have already solved the first part and committed the solution to memory. The rest of the world might have to track down someone who has already done this—but that is not as hard as it sounds, since here I am, ready to assist. Fortunately the 24 motions of the cube are not all entirely different from each other. They are of only four or five types:
Unfortunately, the Pólya-Burnside technique does not tell you what the ten colorings actually are; for that you have to do some more work. But at least the P-B lemma tells you when you have finished doing the work! If you set about to enumerate ways of painting the faces of the cube, and you end up with 9, you know you must have missed one. And it tells you how much toil to expect if you do try to work out the colorings. 10 is not so many, so let's give it a shot:
Care to try it out? There are 4 ways to color the sides of a triangle with two colors, 10 ways if you use three colors, and N(N+1)(N+2)/6 if you use N colors. There are 140 different ways to color a the squares of a 3×3 square array, counting reflections as different. If reflected colorings are not counted separately, there are only 102 colorings. (This means that 38 of the colorings have some reflective symmetry.) If the two colors are considered interchangeable (so for example and are considered the same) there are 51 colorings. You might think it is obvious that allowing an exchange of the two colors cuts the number of colorings in half from 102 to 51, but it is not so for 2×2 squares. There are 6 ways to color a 2×2 array, whether or not you count reflections as different; if you consider the two colors interchangeable then there are 4 colorings, not 3. Why the difference?
[Other articles in category /math] permanent link Sat, 15 Apr 2006
Doubling productivity and diminishing returns
Centralisation of the means of communication and transport in the hands of the State.Well, OK. Perhaps in 1848 that looked like a good idea. Sure, it might be a reasonable thing to try. Having tried it, we now know that it is a completely terrible idea. I was planning a series of essays about crackpot ideas, how there are different sorts. Some crackpot ideas are obviously terrible right from the get-go. But other crackpot ideas, like that it would be good for the State to control all communication and transportation, are not truly crackpot; they only seem so in hindsight, after they are tried out and found totally hopeless.
To accomplish all this, Wilkins must first taxonomize all the things, actions, and properties in the entire universe. (I mentioned this to the philosopher Bryan Frances a couple of weeks ago, and he said "Gosh! That could take all morning!") The words are then assigned to the concepts according to their place in this taxonomy. When I mentioned this to my wife, she immediately concluded that he was a crackpot. But I don't think he was. He was a learned bishop, a scientist, and philosopher. None of which are inconsistent with being a crackpot, of course. But Wilkins presented his idea to the Royal Society, and the Royal Society had it printed up as a 450-page quarto book by their printer. Looking back from 2006, it looks like a crackpot idea—of course it was never going to work. But in 1668, it wasn't obvious that it was never going to work. It might even be that the reason we know now that it doesn't work is precisely that Wilkins tried it in 1668. (Roget's Thesaurus, published in 1852, is a similar attempt to taxonomize the universe. Roget must have been aware of Wilkins' work, and I wonder what he thought about it.) Anyway, I seem to have digressed. The real point of my article is to mention this funny thing from the Rosenfelder article. Here it is:
You can double your workforce participation from 27% to 51% of the population, as Singapore did; you can't double it again.Did you laugh? The point here is that it's easy for developing nations to get tremendous growth rates. They can do that because their labor forces and resources were so underused before. Just starting using all the stuff you have, and you get a huge increase in productivity and wealth. To get further increases is not so easy. So why is this funny? Well, if an increase from 27% to 51% qualifies as a doubling of workforce participation, then Singapore could double participation a second time. If the double of 27% is 51%, then the double of 51% is 96.3%. It's funny because M. Rosenfelder is trying to make an argument from pure mathematics, and doesn't realize that if you do that, you have to get the mathematics right. Sure, once your workforce participation, or anything else, is at 51%, you cannot double it again; it is mathematically impossible. But mathematics has strict rules. It's OK to report your numbers with an error of 5% each, but if you do, then it no longer becomes mathematically impossible to have 102% participation. By rounding off, you run the risk that your mathematical argument will collapse spectacularly, as it did here. (Addendum: I don't think that the conclusion collapses; I think that Rosenfelder is obviously correct.) OK, so maybe it's not funny. I told you I have a strange sense of humor. The diminishing returns thing reminds me of the arguments that were current a while back purporting that women's foot race times would surpass those of men. This conclusion was reached by looking at historical rates at which men's and women's times were falling. The women's times were falling faster; ergo, the women's times would eventually become smaller than the men's. Of course, the reason that the women's times were falling faster was that racing for women had been practiced seriously for a much shorter time, and so the sport was not as far past the point of diminishing returns as it was for men. When I first started bowling, my average scores increased by thirty points each week. But I was not foolish enough to think that after 10 weeks I would be able to score a 360.
[Other articles in category /math] permanent link Fri, 07 Apr 2006
Robert Recorde invents the equals sign
I once gave a conference talk about how it was a good idea to go dig up original materials, and why. Someday I may write a blog article about this. One of the best reasons is that these materials are the original materials because they are the ones that are so brilliant and penetrating and incisive that they inspired other people to follow them. So I thought it might be fun to read Leibnitz's original papers and see what I might find out that I did not already know. Also, there is an element of touristry in it: I would like to gaze upon the world's first use of the integral sign with my own eyes, in the same way that I would like to gaze on the Grand Canyon with my own eyes. The trip to the library is a lot more convenient, this month. Anyway, this sparked a discussion with M. Schoen about original mathematic manuscripts, and he mentioned to me that he had seen the page of Robert Recorde's The Whetstone of Witte that contains the world's first use, in 1557, of the equals sign. He had a scan of this handy; I have extracted the relevant portion of the page, and here it is: Here is a transcription, in case you find the font difficult to decipher:
Howbeit, for easie alteration of equations. I will propounde a fewe exanples, bicause the extraction of their rootes, maie the more aptly bee wroughte. And to avoide the tediouse repetition of these woordes : is equalle to : I will sette as I doe often in woorke use, a pair of paralleles, or Gemowe lines of one lengthe, thus: =====, bicause noe .2. thynges, can be moare equalle.If you are still having trouble reading this, try reading it aloud. The only tricky things are the spelling and the word "Gemowe". Reading aloud will solve the spelling problem. "Gemowe" means "twin", like in the astrological sign of Gemini. (I had to look this up in the big dictionary.) Reading 16th-century books takes a little time to get used to, but once you know the tricks, it is surprisingly easy, given how uncouth they appear at first look. I must say, compared with the writing of the Baroque period, which just goes on and on and on, this is extremely concise and to the point. It does not read all that differently from modern technical material. One reason I like to visit original documents is that I never know what I am going to find. If you visit someone else's account of the documents, you can only learn a subset of whatever that person happened to notice and think was important. This time I learned something surprising. I knew that the German "umlaut" symbol was originally a small letter "e". A word like schön ("beautiful") was originally spelled schoen, and then was written as schon with a tiny "e" over the "o", and eventually the tiny "e" dwindled away to nothing but two dots. I have a German book printed around 1800 in which the little "e"s are quite distinct. And I had recently learned that the twiddle in the Spanish ñ character was similarly a letter "n". A word like "año" was originally "anno" (as it is in Latin) and the second "n" was later abbreviated to a diacritic over the first "n". (This makes a nice counterpoint to the fact that the mathematical logical negation symbol $$\sim$$ was selected because of its resemblance to the letter "N".) But I had no idea that anything of the sort was ever done in English. Recorde's book shows clearly that it was, at least for a time. The short passage illustrated above contains two examples. One is the word "examples" itself, which is written "exãples", with a tilde over the "a". The other is "alteration", which is written "alteratiõ", with a tilde over the "o". More examples abound: "cõpendiousnesse", "nõbers", "denominatiõ", and, I think, "reme~ber". (The print is unclear.) I had never seen this done before in English. I will investigate further and see what I can find out. Would I have learned about this if I hadn't returned to the original document? Unlikely. Here's another interesting fact about this book: It coined the bizarre word "zenzizenzizenzike", which, of all the words in the big dictionary, is the one with the most "z"s. Recorde uses the word "zenzike" to refer to the square of a number, or to a term in an expression with a square power. "Zenzizenzike" is similarly a fourth power, and "zenzizenzizenzike" an eighth power. I uploaded a scan of the relevant pages to Wikipedia, where you can see them; the word appears at the very top of the right-hand page. That page also contains the delightful phrase "zzzz Betokeneth a Square of squares, squaredly squared." Squares, squares, squares, squares, squares, squares, squares, baked beans, squares, squares, squares, and squares!
[Other articles in category /math] permanent link Thu, 06 Apr 2006
Pick's theorem
Pick's theorem concerns the area of so-called "lattice polygons". These are simply polygons whose vertices all lie at points whose coordinates are integers. Such points are called "lattice points". Here it is:
Let P be a lattice polygon. Let b be the number of lattice points that lie on the edges of the polygon, and i be the number of lattice points inside the polygon. Then the area of the polygon is exactly b/2 + i - 1.I think this should be at least a bit surprising. It implies that every lattice polygon has an area that is an integer multiple of ½, which I would not have thought was obvious. Some examples now. Pick's theorem is obviously true for rectangles: In the example above, b = 14 and i = 6, so Pick's theorem says that the area is 14/2 + 6 - 1 = 12, which is correct. In general, an m×n rectangle has (m-1)(n-1) lattice points inside, and 2m + 2n on the edges, so by Pick's theorem its area is (2m + 2n)/2 + (mn - m - n + 1) - 1 = mn.
We can cut the rectangle in half, and it still works: b is now 8 and i is 3, so Pick's theorem predicts an area of 8/2 + 3 - 1 = 6, which is still correct. This works even when the diagonal cuts through some lattice points:
Here b is still 8, but i is only 1, so Pick predicts an area of 8/2 + 1 - 1 = 4. It works for figures without right angles: Here b=5 and i=0, for a total area of 3/2. We can check the area manually as follows: The entire square has area 9. Regions A, B, and C each have area 1, and D has area 4½. That leaves X = 1½ as Pick's theorem says. It works for more complicated figures too. Here we have b=7 and i=5 for a total area of 7½: It works for non-convex polygons: It does not, however, work for polygons with holes. The figure below is the same as the first triangle in this article (which had an area of of 6) except that it has a hole of size ½ chopped out of it. It also has three fewer interior points and three more boundary points than the original triangle, for a total net loss of 3 - 3/2 = 3/2.To fix Pick's theorem for non-simply-connected polygons, you need to say that each hole adds an extra -1 to the total area. The proof of Pick's theorem isn't hard. You start by proving it that it holds for all triangles that have two sides parallel to the x and y axes. Then you prove it for all triangles, using a subtraction argument like the one I used above for triangle X. Finally, you use induction to prove it for more complicated regions, which can be built up from triangles.But the funny thing about Pick's theorem is that you can guess it even without a proof. Suppose someone told you that there was a formula for the size of a polygon in terms of the number of lattice points on the boundary and in the interior. Well, each interior point is surrounded by a square of area 1, which is typically inside the polygon: So each such point should contribute about 1 to the area. Of course, not all do; some will contribute a little less: But there will also be parts of the polygon that are not near any interior points: The points outside the polygon whose squares are partly inside (which count less than they should) will tend to balance out the contributions of the points inside whose squares are partly outside (which count more than they should.) The squares around a point on the edge (but not the vertex) of the polygon will always be half inside, half outside, so that such a point will contribute exactly half of a square to the total: The vertices are a little funny. They also contribute about ½, for the same reason that the edge points do. But convex vertices contribute rather less than ½: While concave vertices contribute a bit more: So we need to adjust the contribution of the vertices away from the ideal of ½ each. The adjustment is positive when angle is more than 180°, in which case the path at the vertex turns clockwise, and negative when the angle is less than 180°, when the path turns counterclockwise. But any polygon makes exactly one complete counterclockwise turn, so the total adjustment is exactly one square's-worth, or -1, and this is where the fudge factor of -1 comes from. (I've known about Pick's theorem for twenty years, but I never knew where the -1 came from until just now.) Viewed in this light, it makes perfect sense that a hole in the polygon should subtract an extra unit of space, since it's adding an extra complete turn.I made the diagrams for this article with a picture-drawing tool, originally designed at Bell Labs, called pic. The source code files for the illustrations were all named things like pick.pic, and the command to compile them to PostScript was pic pick.pic. This was really confusing.
[Other articles in category /math] permanent link Tue, 04 Apr 2006
Hero's formula
this mathematician's best known work is the formula for the area of a triangle in terms of the lengths of it's sides. who is this and when did they live?This is the kind of question that always trips me up in Jeopardy. (And doesn't that first sentence sounds just like a Jeopardy question? "I'll take triangles for $600, Alex. . .") The ears hear "mathematician", "triangle", "formula", and the mouth says "Who was Pythagoras!" without any conscious intervention. But the answer here is almost certainly Hero of Alexandria. Hero's formula, or Heron's formula, has always seemed to me like something of an oddity. It doesn't look like any other formula in geometry. Usually to get anything useful from a triangle you need to involve the angles, and express the results in terms of trigonometric functions. This will be obvious if you pause to remember what the phrase "trigonometric function" means. The few rare cases in which you can avoid trigonometry (there's that word again) usually involve special cases, such as right triangles. But Hero's formula expresses the area of a triangle in terms of the lengths of its sides, with no angles and no trigonometric functions. If we were trying to discover such a formula, we might proceed trigonometrically, and then hope we could somehow eliminate the trigonometry by the end. So we might proceed as follows: The magnitude of the cross product a×b is the area of the parallelogram with sides a and b, so the area of the triangle is half that. And |a × b| = |a||b| sin θ, where θ is the angle between the two vectors. So if a and b are sides of a triangle and the included angle is θ, the area A is ab/2 · sin θ. Now we want θ is in terms of the third side c. The law of cosines generalizes the Pythagorean theorem to non-right triangles: c^{2} = a^{2} + b^{2} - 2ab cos θ, where θ is the included angle between sides a and b. (In a right triangle, θ = 90°, and the cosine term drops out.) So we have cos θ = (a^{2} + b^{2} - c^{2}) / 2ab. But we want sine instead of cosine, so convert cosine to sine using sin θ = √(1-cos^{2}θ), which yields:
$$\sin\theta = {1\over 2ab}\sqrt{4a^2b^2 - {(a^2 + b^2 - c^2)}^2}$$ Expanding out the right-hand side, and remembering that what we really want is A = ab/2 sin θ. we get:
$$A = {1\over4}\sqrt{2a^2b^2 + 2a^2c^2 + 2b^2c^2 - a^4 - b^4 - c^4}$$. Now, that might satisfy anyone, but Hero's answer is better. Hero says: Let p be the "semiperimeter" of the triangle; that is, p = (a+b+c)/2. Then the area of the triangle is just √(p(p-a)(p-b)(p-c)). This is "Hero's formula".Hero's formula is simple and easy to remember, which (2a^{2}b^{2} + 2a^{2}c^{2} + 2b^{2}c^{2} - a^{4} - b^{4}- c^{4})/16 is not. If you expand out Hero's formula, you find that that the two formulas are the same, as of course they must be. But if you have only the complicated fourth-degree polynomial, how would you get the idea that putting it in terms of p will simplify it? There is some technique or insight here that I am missing. Even though such technique is the indispensable tool of the working mathematician, mathematical writing customarily scorns such technique,and has numerous phrases for disparaging it or kicking it under the carpet. For example, having gotten the fourth-degree polynomial, we might say "obviously, this is equivalent to Hero's formula", or "it is left to the student to prove that the two formulations are equivalent." Or we could just skip direct from the polynomial to Hero's formula, and say what Wikipedia says in the same situation: "Here the simple algebra in the last step was omitted." Indeed, if you are trying to prove that Hero's formula is true, it's tedious but straightforward to grovel over the algebra and grind out that the two formulations are the same. But all this ignores the real problem, which is that nobody looking at the trigonometric formulation would know to guess Hero's formula if they had not seen it before. A proof serves two purposes. it is supposed to persuade you that the theorem is true. But more importantly, it is supposed to help you understand why it is true, and to give you some insight that may help you solve similar problems. The proof-by-handwaving-away-complex-and-unexpected-algebra technique serves the first purpose, but not the second. And Hero did not discover the formula using this approach anyway, since he did not take a trig class when he was in high school. Another way to proceed is to drop a perpendicular from the vertex opposite side c. If the length of the perpendicular is h, the area of the triangle is hc/2. But if the foot of the perpendicular divides c into x + y, we also have x^{2} + h^{2} = a^{2} and y^{2} + h^{2} = b^{2}. Grovelling over the algebra again yields something equivalent to Hero's formula. I don't know a good proof of Hero's formula. I haven't seen Hero's, about which MathWorld says:
Heron's proof is ingenious but extremely convoluted, bringing together a sequence of apparently unrelated geometric identities and relying on the properties of cyclic quadrilaterals and right triangles.A "cyclic quadrilateral" is a quadrilateral whose vertices all lie on a circle. Any triangle can be considered a degenerate cyclic quadrilateral, which happens to have two of its vertices in the same place. Indeed, there's a formula, called "Bretschneider's formula", just like Hero's, but for the area of a cyclic quadrilateral: A = √((p-a)(p-b)(p-c)(p-d)), where a, b, c, d are the lengths of the sides and p is the semiperimeter. If you consider that a triangle is just a cyclic quadrilateral one of whose sides has length 0, you put d = 0 in Bretschneider's formula and you get out Hero's formula.
[Other articles in category /math] permanent link Sun, 26 Mar 2006
Approximations and the big hammer
A lot of people I know would be tempted to invoke calculus for this, or might even think that calculus was required. They see the phrase "when ε is small" or that the statement is one about limits, and that immediately says calculus. Calculus is a powerful tool for producing all sorts of results like that one, but for that one in particular, it is a much bigger, heavier hammer than one needs. I think it's important to remember how much can be accomplished with more elementary methods. The thing about √(1-ε) is simple. First-year algebra tells us that (1 - ε/2)^{2} = 1 - ε + ε^{2}/4. If ε is small, then ε^{2}/4 is really small, so we won't lose much accuracy by disregarding it. This gives us (1 - ε/2)^{2} ≈ 1 - ε, or, equivalently, 1 - ε/2 ≈ √(1 - ε). Wasn't that simple?
[Other articles in category /math] permanent link Sat, 25 Mar 2006
Achimedes and the square root of 3
Throughout this proof, Archimedes uses several rational approximations to various square roots. Nowhere does he say how he got those approximations—they are simply stated without any explanation—so how he came up with some of these is anybody's guess.It's a bit strange that Dr. Lindsey seems to find this mysterious, because I think there's only one way to do it, and it's really easy to find, so long as you ask the question "how would Archimedes go about calculating rational approximations to √3", rather than "where the heck did 265/153 come from?" It's like one of those pencil mazes they print in the Sunday kids' section of the newspaper: it looks complicated, but if you work it in the right direction, it's trivial. Suppose you are a mathematician and you do not have a pocket calculator. You are sure to need some rational approximations to √3 somewhere along the line. So you should invest some time and effort into calculating some that you can store in the cupboard for when you need them. How can you do that? You want to find pairs of integers a and b with a/b ≈ √3. Or, equivalently, you want a and b with a^{2} ≈ 3b^{2}. But such pairs are easy to find: Simply make a list of perfect squares 1 4 9 16 25 36 49..., and their triples 3 12 27 48 75 108 147..., and look for numbers in one list that are close to numbers in the other list. 2^{2} is close to 3·1^{2}, so √3 ≈ 2/1. 7^{2} is close to 3·4^{2}, so √3 ≈ 7/4. 19^{2} is close to 3·11^{2}, so √3 ≈ 19/11. 97^{2} is close to 3·56^{2}, so √3 ≈ 97/56. Even without the benefits of Hindu-Arabic numerals, this is not a very difficult or time-consuming calculation. You can carry out the tabulation to a couple of hundred entries in a few hours, and if you do you will find that 265^{2} = 70225, and 3·153^{2} is 70227, so that √3 ≈ 265/153. Once you understand this, it's clear why Archimedes did not explain himself. By saying that √3 was approximately 265/153, had had exhausted the topic. By saying so, you are asserting no more and no less than that 3·153^{2} ≈ 265^{2}; if the reader is puzzled, all they have to do is spend a minute carrying out the multiplication to see that you are right. The only interesting point that remains is how you found those two integers in the first place, but that's not part of Archimedes' topic, and it's pretty obvious anyway. [ Addendum 20090122: Dr. Lindsey was far from the only person to have been puzzled by this. More here. ] In my article about the peculiarity of π, I briefly mentioned continued fractions, saying that if you truncate the continued fraction representation of a number, you get a rational number that is, in a certain sense, one of the best possible rational approximations to the original number. I'll eventually explain this in detail; in the meantime, I just want to point out that 265/153 is one of these best-possible approximations; the mathematics jargon is that 265/153 is one of the "convergents" of √3. The approximation of √n by rationals leads one naturally to the so-called "Pell's equation", which asks for integer solutions to ax^{2} - by^{2} = ±1; these turn out to be closely related to the convergents of √(a/b). So even if you know nothing about continued fractions or convergents, you can find good approximations to surds. Here's a method that I learned long ago from Patrick X. Gallagher of Columbia University. For concreteness, let's suppose we want an approximation to √3. We start by finding a solution of Pell's equation. As noted above, we can do this just by tabulating the squares. Deeper theory (involving the continued fractions again) guarantees that there is a solution. Pick one; let's say we have settled on 7 and 4, for which 7^{2} ≈ 3·4^{2}. Then write √3 = √(48/16) = √(49/16·48/49) = 7/4·√(48/49). 48/49 is close to 1, and basic algebra tells us that √(1-ε) ≈ 1 - ε/2 when ε is small. So √3 ≈ 7/4 · (1 - 1/98). 7/4 is 1.75, but since we are multiplying by (1 - 1/98), the true approximation is about 1% less than this, or 1.7325. Which is very close—off by only about one part in 4000. Considering the very small amount of work we put in, this is pretty darn good. For a better approximation, choose a larger solution to Pell's equation. More generally, Gallagher's method for approximating √n is: Find integers a and b for which a^{2} ±1 = nb^{2}; such integers are guaranteed to exist unless n is a perfect square. Then write √n = √(nb^{2} / b^{2}) = √((a^{2} ± 1) / b^{2}) = √(a^{2}/b^{2} · (a^{2} ± 1)/a^{2}) = a / b · √((a^{2} ± 1) / a^{2}) = a/b · √(1 ± 1/a^{2}) ≈ a/b · (1 ± 1 / 2a^{2}). Who was Pell? Pell was nobody in particular, and "Pell's equation" is a complete misnomer. The problem was (in Europe) first studied and solved by Lord William Brouncker, who, among other things, was the founder and the first president of the Royal Society. The name "Pell's equation" was attached to the problem by Leonhard Euler, who got Pell and Brouncker confused—Pell wrote up and published an account of the work of Brouncker and John Wallis on the problem. G.H. Hardy says that even in mathematics, fame and history are sometimes capricious, and gives the example of Rolle, who "figures in the textbooks of elementary calculus as if he had been a mathematician like Newton." Other examples abound: Kuratowski published the theorem that is now known as Zorn's Lemma in 1923, Zorn published different (although related) theorem in 1935. Abel's theorem was published by Ruffini in 1799, by Abel in 1824. Pell's equation itself was first solved by the Indian mathematician Brahmagupta around 628. But Zorn did discover, prove and publish Zorn's lemma, Abel did discover, prove and publish Abel's theorem, and Brouncker did discover, prove and publish his solution to Pell's equation. Their only failings are to have been independently anticipated in their work. Pell, in contrast, discovered nothing about the equation that carries his name. Hardy might have mentioned Brouncker, whose significant contribution to number theory was attributed to someone else, entirely in error. I know of no more striking mathematical example of the capriciousness of fame and history.
[Other articles in category /math] permanent link Wed, 15 Mar 2006
Why pi is 3
Simon [Cozens] also asked me why the number came out to be around 3, rather than around 5 or 57, and there I was on much shakier ground. I did not have any clever insights, and all I could do was itemize a bunch of stuff that seemed to bear on the issue. It will probably appear here in a future article.1. The most obviously germane fact I came up with was this: Inscribe a regular hexagon in the unit circle. Such a hexagon obviously has a perimeter of 6. The circle goes through the same six points, but instead of taking direct paths between them, it takes a circuitous route, so its perimeter is a bit more than 6. Therefore pi is a bit more than 3. Several people have written to me to point this out, and nobody has pointed out anything different, which I think supports my contention that this is the most obviously germane fact available. Also, as I replied to M. Cozens:
I do not know any way to calculate the perimeter of a circle without considering it as a limiting case of a polygon with a lot of very short sides, so I think any investigation of why pi is 3.14 and not something else will have to start here.If you circumscribe a hexagon around the circle, a little basic geometry reveals an upper bound: π < 2√3. By using polygons with more sides, you get better bounds. With a square, you get only that 2√2 < π < 4, for example; the hexagon improves this to 3 < π < 2√3. About 2200 years ago Archimedes did the calculation for 96-gons and got the value correct to two decimal places: 3 + 10/71 < π < 3 + 1/7. (This raises an interesting question: with a 96-gon, you would expect the bounds to involve things like √3, like the hexagon does. Where do the weird fractions 10/71 and 1/7 come from? Answer: A bound of the type 2√3 was of limited use to the Greeks, because it replaces the poorly-understood number π with another poorly-understood number √3. So Archimedes replaced surds with rational approximations; for example, early on he replaces √3 with the rational approximation 265/153. (See Dr. Chuck Lindsey's detailed explanation of Archimedes' calculation, and my explanation of where 265/153 comes from.) I'd like to work through this and see what he would have come up with if he had done the exact calculation, but it'll take me some time.) Anyway, the other items I sent to M. Cozens were:
2. The shortest curve that can enclose a unit area has length 2π. (Or conversely, the largest area that can be enclosed by a unit path is 1/4π.)This is going to depend strongly on the Euclidean metric again. I don't know how to extend a general metric to give an area measure; indeed, I'm not sure yet of a sensible way to ask the question. I have to think about it. (Yes, I'm sure someone has already studied this, and I could simply look it up, but I will get a lot more out of the answer if I think about the question myself for a while before peeking in the back of the book. There is, as they say, no royal road to geometry.) The "wordless" proof of the Pythagorean theorem shows that I'll have to be very careful in making the extension from length to area in Manhattan:
Independent of the metric, this proof demonstrates that the two small white thingies on the left have the same area as the large white thingy on the right. In Euclidean space, this equality establishes the Pythagorean theorem. It had better not do so in Manhattan, because the Pythagorean theorem is false in Manhattan; in the diagram, c is not equal to √(a^{2} + b^{2}) but to a + b. I think I've convinced myself that a square with side s in Manhattan still has area s^{2}. And I'm pretty sure that those two white thingies on the left are squares, and so have areas a^{2} and b^{2}, respectively. But this implies that the large white thing on the right has area a^{2} + b^{2}, and therefore that it is not a square, and does not have area c^{2} as labeled, because c = a + b, and c^{2} is not equal to a^{2} + b^{2}. Squares in Manhattan are required to have edges that are parallel to the coordinate axes. I think. I don't know where to go next; maybe I'll figure it out on the way home from work today. The next item is my second favorite, after the observation about the inscribed hexagon:
3. If you put a penny on the table, then you can get at most six other pennies to touch it at the same time. This is closely related to the fact that 6 is the largest integer less than 2π. Analogous results hold in higher dimensions. The area of a sphere is 4π, or about 12.5; you can get 12 spheres to touch another sphere at the same time, but not 13.This business of the spheres touching a central sphere is known as the "kissing number problem"; we say that the "kissing number in two dimensions" is 6, and the "kissing number in three dimensions" is 12. After this item, I was pretty much out of circle-related facts. So I switched tactics and tried to look at things that seemed completely unrelated to circles: 4. &pi satisfies the equation:M. Cozens had observed that you can get π by using integral calculus to calculate the area of a unit circle:x - x^{3}/6 + x^{5}/120 - x^{7}/5040 + ... = 0This was the only thing I could come up with that seemed both fairly elementary, and at the same time a good way to get π out without putting it in to begin with. But he complained (rightly, I think) that this gives you no insight at all as to where the π comes from, because the π sneaks in as arccos(1) when you do the trig substitution. And since the π was one of the principal ingredients in the recipe for the cosine function to begin with, all that has happened is that you got out what you put in. (The cosine function relates the length of a circular arc to the length of a related straight segment. π gets in there because the segment has length 0 when the arc has length π/2. But then you're right back at the mystery of why a complete circular arc has length 2π.) The integral acquires the π from the cosine function, and the cosine function got it directly from the length of the circumference. So I started trying to come up with ways to get π that seem to have nothing to do with circles. The infinite polynomial was the first thing I came up with.
It can be related to the circles, but not easily, which I think is an advantage. You need to be able to relate it to circles, or else it doesn't tell you anything about why the perimeter of the circle is pi. But it mustn't be too closely related, because I think that items 1-3 probably exhaust what can be gotten directly from the circles.I just know that some smart person out there is itching to point out that the polynomial is just the Maclaurin expansion for sin(π), and of course that is how I came up with it. (What, did you think it was just a lucky guess?) But if you did not know about the Maclaurin series, you might be quite shocked to discover that π was a zero of this expression. The terms are already starting to get small by the time you get to π^{9}/362880, so in spite of the transcendentality of π we have a 9th-degree polynomial of which it is almost a zero, a polynomial that is based on elementary notions, in which there is no obvious circle. Item 5 was the Buffon's needle problem, but I said that the appearance of π there appeared to be an obvious consequence of its appearing as the perimeter of a unit circle, so let's pass on to the next thing. 6. The probability that two randomly-selected integers are relatively prime is 6/π^{2}. I said:
This gets π out, without putting it in anywhere obvious, but does not seem to me to be elementary. And how you could relate it to the circle, I have no idea.But now, the relationship with circles seems somewhat clearer to me. You can turn this into a geometry problem like this: You are standing at the origin, looking out on an infinite orchard of apple trees. There is a tree at (a, b) for every pair of integers. The trees have zero width, but when one tree is directly behind another tree, it is blocked and you cannot see it. What fraction of the trees are visible? There is one visible tree for each (rational) direction you can look in. So there's a relationship between the points on a circle and the visible trees. (Digression: if the trees have positive diameter, only a finite number are visible from the origin. If the diameter is d, let the number of visible trees be v(d). Estimate v. I believe this problem is still open.) 7. The next item I mentioned was that 1 + 1/4 + 1/9 + ... = π^{2}/6. This is probably related to the orchard thing somehow. It might be that a good understanding of this identity will lead one to a good understanding of why π is a bit more than 3. It might also be that it has some relationship with the circle. But I told M. Cozens that if he wanted someone to make sense out of this, "you really need to be talking to someone with expertise in analytic number theory, instead of to me." I'll stand by that. 8. Finally, I pointed out that π does not appear only in circles; it also appears in spheres. For example, the volume of a unit sphere is 4π/3. By this time I was scraping the barrel. It is pretty obvious that π is going to get into spheres because spheres are just stacks of circles, and π is already in the circles. Adding together a bunch of line segments that have no relation more complicated than a square root is one thing; it is surprising to see π come out of that. But adding together a bunch of circles that all involve π and getting out something that involves π again is no surprise. As I said, the inscribed hexagon thing sweems the most germane, followed closely by the kissing number. A couple of people have written to me to point out that π also appears in a number of constants and laws from physics, such as Coulomb's law. I believe that these appearances are invariably derived from the appearance of π as the circumference, and, in many cases, that this is quite obvious. I'll address this in detail in a future article. The inclusion of π in these formulas signals their dependence on Euclidean space, which has some interesting implications, since general relativity claims that real space is non-Euclidean: we shouldn't expect Coulomb's law to hold over large distances, for example. I imagine that this is old news to the astrophysicists, but it might be a surprise to the physics graduate students.
[Other articles in category /math] permanent link Mon, 13 Mar 2006
Why pi?
what property of a circle makes it . . . an irrational number. . . perhaps about as arbitrary a number as you can get.I thought about this pretty hard, and, to my amazement, I came up with a plausible answer. So here we are. The one-paragraph summary: My theory is that the association of the very weird and complex number π with a geometric object as simple as a circle is a reflection of the underlying fundamental complexity of Euclidean geometry: specifically, that its metric is a nonlinear function. First I'm going to spend some time arguing that π does require explanation. I expect that almost everyone will agree that π is weird; if you do agree, feel free to skip this section. Then I'll discuss Euclidean and non-Euclidean geometries. This is important, because the relation between π and circles appears to be a special property of Euclidean geometry, one which does not occur, for example, in spherical geometry. Finally, I'll look at the essential properties of Euclidean geometry, and why I think it is more complex than people usually realize.
π is complex and bizarreIn this section, I'm going to argue that the question is indeed worth asking. π is an extremely peculiar number, even by mathematical standards. You often hear π mentioned in the same breath with e, another constant of fundamental mathematical importance. But e is much more tractable than π is, and much better understood.In fact, the degree to which π is not understood is rather shocking when you consider its ubiquity. If you don't need to be persuaded that π is unusually weird, even as transcendental numbers go, you may want to skip to the next section. Really, this section is here to address people who think they know more mathematics than they do, who want to argue that π is no more or less complicated than any other number. But I think it is. It should be fairly clear that, as a representation of real numbers, decimal fractions are not very satisfactory. For example, you might like simple numbers to have simple representations. But the representation of 1/3 is 0.33333...., which isn't even finite. The fact that a complicated number like like 3674/31250 ("0.117568") has a simpler representation than a simple number like 1/3 just demonstrates that the system is defective. 3674/31250 gets a simple representation not because it is a simple number, but because 31250 happens to divide 10^{6}. This being the case, it is perhaps not too surprising that nobody can make head or tail of the representation of π, which is 3.14159265358979... . As far as I know, the state of our current understanding of this representation of π can be summed up as:
None But that might just be the fault of the representation. The representation is based on the number 10, and it is not clear that π has anything to do with 10, so our failure to find an answer here may just indicate that the question was not worth asking.There are better representations of real numbers; one such is the so-called "continued fraction representation". I don't want to explain this in detail in this article, but I can refer you to a talk I gave on the subject. But an itemization of this representation's desirable properties may be persuasive even if you don't know how it works:
But it doesn't work for π. The continued fraction representation of π is [3; 7, 15, 1, 292, 1, 1, 1, 2, 1, 3, 1, 14, ...], and as far as I know, the state of our current understanding of this representation of π can be summed up as:
None So much for continued fractions.We might hope for some understanding about why π is irrational. The proof that √2 is irrational is elementary, and dates back to the Greeks; you can understand it as being related to the fact that 2 is not a perfect square. π was not shown to be irrational until 1761, and the proof is not simple, which means that nobody knows a simple argument about why it should be the complicated thing it is, rather than a simple fraction. So π is complex and poorly-understood, even compared with other important transcendental numbers like e.
Euclidean geometryM. Cozens asked:Is pi inherent in our definition of a circle, or our particular geometry, or our planet, or could the ratio be different in different worlds?I think this question is very insightful. π, at least as it relates to circles, is inherent in a particular geometry, namely, Euclidean geometry. Euclidean geometry takes its name from Euclid, who wrote Elements, an extremely influential treatise on geometry, about 2300 years ago. Most of the Elements is concerned with plane geometry, which takes place in an infinite, flat two-dimensional space. π arises naturally in this kind of space as the perimeter of a circle.
Non-Euclidean geometryIn non-Euclidean spaces, π is geometrically much less important. For example, consider geometry done on a sphere. If we keep the definition of "line" as the path of shortest distance between two points, then our "lines" turn out to be great circles on the sphere—that is, circles whose centers are at the center of the sphere; the equator is an example. The 49th parallel is not a "line" because it's not a path of shortest distance; for any two points on the 49th parallel, there's a path between them over the surface of the sphere that is shorter than the one along the parallel. (This may seem strange, but it's true, and it's why direct flights from New York to Taipei often stop off in Anchorage, Alaska.) In addition to being a line, the equator is also an example of a circle. What's a circle? Circles look pretty much the way you expect them to. A circle is the set of all points that are some fixed distance from a center point. The equator is all the points that are a certain fixed distance from the north pole. The 49th parallel is also a circle; its center is also the north pole.The "diameter" of a circle is the longest possible "line" you can draw from one point on the circle to another. A diameter of a circle has the property that it always goes through the center of the circle, as you would hope and expect. For the equator, a diameter goes through the north pole. The picture to the right shows the equator in red, its center, the north pole, in yellow, and a diameter of the equator in blue. The 49th parallel is in green. Let's say that the circumference of the equator, the red line, is 1. Then the length of the equator's diameter, the blue line, is 1/2. If we were expecting to divide the circumference by the diameter and get π, we are in for a surprise, because we just got 2 instead. For the 49th parallel, the ratio of circumference to diameter is larger: I calculate about 2.88. For smaller circles, the ratio is larger still. For very small circles, the ratio is very close to π, because a small circle can't tell whether it's on a sphere or in a plane; up close the two things look the same. So the relation between π and circles is actually a special property of Euclidean geometry. Circles in non-Euclidean spaces have a perimeter-to-diameter ratio that is different from π. Euclidean metricThe single fundamental property of Euclidean geometry is that the distance between two points, say (x_{1}, y_{1}) and (x_{2}, y_{2}), is ((x_{2}-x_{1})^{2} + (y_{2}-y_{1})^{2})^{1/2}. (Or, in higher-dimensional spaces, the obvious extension of this formula.) If you change the way you measure distance, you get a different kind of geometry with different kinds of circles that have different perimeter-to-diameter ratios. In an earlier article, I discussed an alternative distance function, called the Manhattan distance, which gives diamond-shaped circles whose perimeter-to-diameter ratio is always 4.The Euclidean distance function is nothing more than the familiar Pythagorean theorem. It is very difficult for me to imagine any reasonable way to do plane geometry without the Pythagorean theorem. It is just too simple. Even the proof is simple:
So the relationship of π (which is complicated) to circles (which appear to be simple) is grounded in the Euclidean distance formula. If you change the distance formula, π is no longer related to circles. So the weirdness must be due, at least in part, to some complexity in the Euclidean distance formula. But what's complex about the Euclidean distance formula? How could it be simpler? Actually, I think it only seems simple because it is so familiar. The Euclidean distance formula is, in some ways, deeply weird. I realized this a few months ago, but everyone I mentioned it to acted like I was insane. But now I'm pretty sure. I think the essence of the problem with π is that the Euclidean distance function is nonlinear in the two spatial coordinates x and y.
Nonlinearity of the Euclidean metricLinear functions are very well-behaved. If F is a linear function, then F(a+b) = F(a) + F(b), which means that you can calculate the contributions of a and b independently of each other. To calculate F of some very complicated argument, you can break the argument into simple components and deal with them all separately. With quadratic functions like the Euclidean distance function, you cannot do this; complex problems are not easily decomposable into simple ones.For the Euclidean metric, it means that the horizontalness and verticalosity are not independent, but are tangled together and cannot be separated. What do we really mean by the perimeter of a circle? The circle is the set of points (x, y) which are at distance 1 from the point (0,0). The only meaningful way I know to talk about the length of this set is to calculate it as a limit of an approximate polygonal path as the path gets more and more segments. So you are necessarily dragging in an infinite limiting process, and such processes are always complicated. If the distance function were linear, it wouldn't matter, because then you could treat the horizontal and vertical components separately, and when you did that, you would be dealing with paths in one dimension, which, being straight lines, would be simple. You can see this if you consider the Manhattan distance function: It doesn't matter how you get from (x_{1}, y_{1}) to (x_{2}, y_{2}); whatever path you take, whether you take a lot of steps or only one, the distance is always |x_{2}-x_{1}| + |y_{2}-y_{1}|, because the distance function is linear, and thus there is no interaction between the x parts and the y parts. But with a nonlinear distance function like the Euclidean metric, it does matter what path you take. I was thinking a few months ago about how peculiar this is. I cannot think of anything else that behaves this way. Suppose you have two jugs and you start filling them with milk. You find that to fill each jug separately requires one quart, but to fill both at once requires only 1.4142 quarts. Wouldn't that freak you out? But space does behave like that. To drive ten miles north takes a gallon of gas. To drive ten miles east takes a gallon of gas. North and east are perpendicular and should be completely independent of each other. To drive ten miles north and ten miles east should require two gallons of gas. But it requires only 1.4142 gallons. How the heck did that happen? I believe that this strange entanglement between north and east, two things one might have supposed were independent, is the ultimate root of what makes the circumference of a circle such a peculiar number. I was very pleased to have this confirmation that the entanglement between horizontal and vertical is strange and complex, because, as I mentioned before, when I tried to explain to people what I found strange about it, they thought I was nuts.
One-dimensional circlesMy theory is that the peculiar length of a circle's perimeter is a result of the peculiar interaction between the otherwise apparently independent spatial dimensions in Euclidean space. If this theory is correct, we should expect that the corresponding perimeter in a one-dimensional space will not be peculiar. A one-dimensional Euclidean space, having only one dimension, has no strange interactions between independent directions. And indeed, this is the case! The perimeter of a one-dimensional circle does not involve π. It's simply 2; the "area" (which is really length) is 2r. You only get difficult numbers in spaces of at least 2 dimensions.
Why 3?M. Cozens also asked me why the number came out to be around 3, rather than around 5 or 57, and there I was on much shakier ground. I did not have any clever insights, and all I could do was itemize a bunch of stuff that seemed to bear on the issue. It will probably appear here in a future article.[ Addendum: Here it is. ]
[Other articles in category /math] permanent link Wed, 01 Mar 2006
What is topology?
Over the years I've spent a lot of time thinking about how to briefly explain topology to someone with only an ordinary math background, say one year of college calculus. Usually when I try hard to find good short explanations of things like this, I'm successful. So far, I haven't found any such explanation of topology. I haven't given up, though. The difficulty is that topology was invented to give mathematicians a better understanding of analysis and the structure of the real numbers. Analysis was invented to give mathematicians a better understanding of calculus and limit processes. Calculus was invented to solve physics problems. So topology is three degrees removed from anything real. Contrast this with, say, graph theory, where the central object of study, the graph, is only one degree removed from something real. If you understand the vertices of a graph as computers and the edges as network connections, you can immediately see the point of graph theory. To understand the point of topology, you need to understand the point of analysis, and to understand the point of analysis, you need to understand the point of calculus. That's a lot of stuff to pack into a short explanation. On top of that, you have the difficulty that topology has become a field of study in itself, with sub-branches that have nothing to do with the original goal of better understanding of the reals. The structure of the Tychonoff corkscrew (a particularly bizarre and counterintuitive topological object) may illuminate certain facts in set theory, but it has nothing to do with better understanding of the real numbers. I think I can explain topology clearly, just not briefly. This is the first in what I hope will be a series of articles in which I'll try to do that. The first thing to know is that mathematicians think of the real numbers as being points in an infinite line, with zero in the middle, and the positive numbers stretching away to the right and negative numbers to the left. Real numbers are points on this line, with a number n to the right of m if n > m . So mathematicians have a visual and spatial conception of numbers as well as a quantitative conception. This visualization is extremely important if you want to understand topology, or indeed most of analysis. To a mathematician, the numeric and the spatial objects are the same thing, just viewed in different ways. The number 3.78 doesn't merely correspond with a point on the line; it is a point on the line, one which lies physically between the points 3.75 and 4.08. One consequence of this view is that a set of numbers is not merely an arithmetic object. It also has geometric properties, such as a length (find the smallest and largest numbers in the set, and subtract the smaller from the larger) and whether the set is a single connected piece or the union of smaller, disconnected components. Another consequence of this view of numbers as points on a line is that mathematicians refer to the numbers as "points" and to the set of numbers as a "space". One idea that appears over and over again in analysis, and which is the source of the single driving idea of topology, is the notion of an open interval. An open interval is very simple: it's just the set of all numbers in between two points. The notation (a, b) represents the set of all numbers greater than a and less than b. In the mathematician's mental picture of the real numbers, the interval (1, 5/2) looks like this: The definition of an open interval implies that the interval (a, b) omits the points a and b themselves; the open interval does not include the two endpoints. The curvy lines in the picture above are just notation intended to symbolize that the endpoints of the interval are not part of the interval. There is a different notation for an interval that does include the endpoints, a so-called closed interval; [a, b] represents the set of all points greater than or equal to a and less than or equal to b. So [a, b] includes all the points of (a, b), and, additionally, a and b themselves: Again, the square brackets in the picture are fictitious; they're just a notation to tell you that in this picture, the interval does includes its endpoints. This matter of the endpoints is crucial. Open and closed intervals have very different behaviors, stemming from the fact that a closed interval contains its boundary points and an open interval does not. A point inside an open interval can be close to the edge, but not at the edge, because the open interval omits the edge. But a closed interval does include the edge. In a closed interval, the points a and b are clearly special, and quite different from the other points in the interval, in a way that I will make more precise in a moment. But in an open interval, there is a certain sense in which all the points behave the same. Here's the essential property of an open interval, as identified by topologists. If you choose any point p in the open interval, you can draw a little circle around p so that all the points inside the circle are also in the interval. For example, consider the open interval (1, 5/2) and the point 2, which is inside the interval: If we draw a circle with radius 1/4 around the point, as shown, everything inside the circle is also inside the interval. We can do this for any point at all that is in the interval, as long as we make the circle sufficiently small: Because no matter how close a point in the interval is to the end of the interval, there's still a little extra space before the end. This is not true of closed intervals. A circle around the point 1 will include some points outside of the closed interval [1, 5/2], no matter how small we make the circle: Other differences flow from this essential property. For example, a point outside a closed interval must be separated from it by some positive distance, and you can always draw a circle around the point that is completely outside the closed interval:But that is not true of open intervals. For the open interval (1, 5/2), any circle drawn around the point 1 will also enclose some points in the interval: Open intervals can abut without intersecting. For example, the intervals (1, 2) and (2, 3) have no points in common. But any circle around 2 itself encloses points of both intervals: In contrast, the closed intervals [1, 2] and [2,3] also share the property that any circle around 2 encloses points of both intervals, but that's not surprising, since any such circle encloses the point 2, and 2 is in both intervals. With the open intervals, you can't say ahead of time what points of the two intervals will be inside the circle, until you find out how big the circle is. Mathematicians sum up all these properties by saying that a closed interval contains all the points of its boundary, whereas an open interval contains none of them. This business of the small circle drawn around a particular point p is clearly important, so it's good to have a name for it. The idea is that we're trying to look at what happens "close to" p. But to pin that down, we need a notion of what "close to" means, so we need a way of measuring distances. In the real numbers, measuring distances is easy. The distance between a and b is just |a - b|. |x| denotes the absolute value of x, which means that if x is negative, you make it positive instead. For |a - b|, it simply means that you should subtract the smaller one from the larger, and not the other way around. This is because you don't want to have negative distances, and you want the distance between 4 and 3 to be the same as the distance between 3 and 4. Then mathematicians formalize those little circles this way: they say that the "ball" of radius ε around some point p, symbolized as B_{ε}(p), is just the set of all points whose distance from p is less than ε. The use of the odd word "ball" here should tip you off that we're soon going to generalize this from one to three dimensions. In one dimension, balls are actually open intervals, and B_{ε}(p) is precisely the interval (p - ε, p + ε). In two dimensions, the balls are discs, and in three dimensions they are actually ball-shaped. The balls capture a flexible notion of "closeness": two points are "close" if they are inside the same ball. By making the balls small enough, we can make the notion of "close" as restrictive as we want. In this sense, we can see that no matter how restrictive we make the notion of closeness, there are always some points in an open interval that are close to the end of the interval—but no point is "close" for all possible notions of closeness; there is always some sufficiently restrictive definition under which a particular point is far from the end. Closed sets are different, and contain points that are close to the end for any definition of "close". Similarly, the sets (1, 2) and (2, 3) are close together, for any definition of "close", although they don't actually intersect, or even touch, since you can't get from one to the other without leaving the sets entirely. The essential property of open intervals that I mentioned before can be phrased in terms of the balls in this way: a set G is said to be open if, for any point p of G, there is some positive number ε such that B_{ε}(p) is entirely contained in G. Open intervals are open, but they are are not the only open sets. Let S be set of all points in either (1, 2) or (3, 4). S is open, but it's not an interval. But open sets all look pretty much the same; all open sets are unions of non-overlapping intervals, for example. There are some unbounded open sets, such as the set of all positive numbers, but we can think of that as the interval (0, ∞). Similarly (-∞, 3), the set of all numbers less than 3, is open. And the set of all real numbers is open, since it contains every ball whatsoever, but we can think of that set as (-∞, ∞). Other concepts can be defined in terms of the balls. For example, a "limit point" of a set S is a point p for which any ball around p must contain some point of S other than p. 5/2 is a limit point of both the open interval (1, 5/2) and the closed interval [1, 5/2] because every ball B_{ε}(5/2) contains points of both intervals other than 5/2 itself. The open interval omits 5/2, but the closed set includes it. One can define a closed set as a set that includes all of its limit points. With these definitions, plenty of useful stuff follows, such as: Every open set is a union of non-overlapping open intervals. For every closed set C, there is some open set G such that every real number is in either C or in G but not both. The union or intersection of any two open sets is another open set. The union or intersection of any two closed sets is another closed set. Every ball is an open set. Every finite set is closed. Other geometric notions come out of this too. For example, an "interior point" of a set S is a point p for which there's some B_{ε}(p) is completely contained in S. Then an open set is precisely a set whose points are all interior points. Every point in [1, 5/2] is an interior point except the boundary points 1 and 5/2. We can define the "interior" of a set as the set of all its interior points, a "boundary point" as a point in a set that isn't an interior point, and the "boundary" of a set as the set of all its boundary points. To generalize these notions to more complex spaces, we can generalize the idea of distance. Consider points in the plane, for example. These points have a well-known distance function, the so-called Pythagorean distance, which says that the distance between (a, b) and (c, d) is, as usual, √((c-a)^{2} + (d-b)^{2}). If we use this as our definition of distance, and defined balls as before, we find that B_{ε}(p) is just a disc of radius ε centered at p. The definitions still make sense. Consider the set of points in the plane that are on or inside the circle of radius 1 centered at a particular point p. The boundary of this set, even under the very abstract definitions above, is just what you would expect: the boundary is the circle itself. The interior is the disc that lies inside the circle, including the center but not the edge. Once again, open sets are sets in the plane that omit their boundaries, and closed sets are those sets that include their boundaries. Most theorems still work; for example, the intersection of two open sets is still an open set, and balls are still examples of open sets. We've started with a very thin, weak-looking base, which was simply the idea of measuring distances. From the simple idea of measurement, we get the balls, and from the balls we get ways to understand geometric notions like boundaries and interiors, and analytic notions like limits. A metric space is a generalization of the idea of measuring distance. The "space" is now a set of things, which could be anything at all: points, or numbers, or train stations, or whatever. The things are customarily called "points". The "metric" describes how you are planning to measure the distance between two points. To be a sensible distance function, the metric needs to have a few simple properties. It must never be negative, must be zero when measuring the distance from a point to itself, and must be positive when measuring the distance between two different points. It must be symmetric, which means that the distance from a to b must be the same as the distance from b to a in all cases. And the metric must satisfy the triangle inequality, which just means that the length of a route from a to b that stops off at c in between must be at least as long as the one that goes directly from a to b. In mathematical notation, we write d(a, b) to represent the distance from a to b. We then want d to have the following properties:
Once you have the metric, you can define the balls: the ball B_{ε}(p) is still the set of all points q for which d(p, q) < ε. (This ball is sometimes written as B_{d,ε}(p), because it depends on the metric.) And once you have defined the balls, you can define the open and closed sets. One thing this gets you is that you can talk about notions of closeness and nearness and limits in spaces that are much less tractable than lines and planes. For example, if you want to do analysis on the surface of a torus (a doughnut shape) this gives you a theoretical basis for treating it as a subset of ordinary three-dimensional space, and helps you understand which theorems of analysis will work on the torus and which ones won't. Another thing it gets you is the opportunity to consider analogous notions in spaces that are nothing at all like lines or planes. Sometimes this might lead to insights that are useful in analysis. Sometimes it might not. But it does keep mathematicians employed. Here's one example—I'm not sure whether it's genuinely useful or whether it serves primarily to keep mathematicians employed. Let's consider a plane, but instead of using the usual Pythagorean distance formula, let's say that the distance between (a, b) and (c, d) is |a-c| + |b-d|. This corresponds to a world in which you can only travel due north, south, east, and west; for this reason it is sometimes called the "Manhattan distance". In Manhattan, when you want to go from (a, b) to (c, d), you first walk east or west on b street, for a distance of |a-c|, until you get to c avenue. You then turn ninety degrees and walk north (or south) on c avenue, for a distance of |b-d|, until you reach the intersection with d street. The path is shown below: This peculiar distance function is indeed a metric. The balls are no longer circular; they are diamond-shaped. The illustration also shows a point P and the diamond-shaped ball B_{1}(P), along with three of the paths (each with length exactly 1) from P to the boundary of the ball.But one interesting thing about the Manhattan metric is that it doesn't affect which sets are open or closed. So in a certain way, it doesn't matter whether you measure distance according to the usual function or according to the Manhattan metric. A metric that is different is the "discrete" metric. In this metric, the distance between points p and q is 1, unless they are the same point, in which case it is 0. You may want to check to make sure that this metric has the requisite properties. Balls in the discrete metric are even weirder than the diamond-shaped balls of the Manhattan metric. A ball around p either includes p and nothing else (if its radius is less than or equal to 1) or else it includes the entire universe (if its radius is bigger than 1). In a space measured with the discrete metric, nothing is close to anything else. Our typical example of closeness was the interval (1, 5/2) and the point 1, which was not inside the interval, but was close to it, because any B_{ε}(1) overlapped the interval, no matter how small ε was. But in the discrete metric, the point is not close to the interval, because the ball B_{1/2}(1) does not overlap the interval—it contains only the point 1, and nothing else! In this article, I've tried to give a motivated and historical account of some of the basic notions of topology, and how they are generalizations of ideas of analysis, for the purpose of better understanding analysis. But I should break the news now that topology, when studied on its own, starts from a somewhat more abstract place. Instead of starting with a metric, and getting the balls from the metric, and the open sets from the balls, it starts with the open sets, and formulates the properties directly in terms of the open sets. This allows you to clean away a lot of unnecessary complication involving arithmetic and questions about Pythagorean vs. Manhattan distances and so on. The usual properties, like "interior" and "boundary" and "connected" can be formulated entirely in terms of the open sets. Now suppose you have two sets, A and B, with two different definitions of open sets. And suppose you know some way to transform A into B and back again, say by rotating it or something, so that open sets in A are transformed into open sets in B, and vice versa on the way back. Then in a certain sense the open sets of A and B are the same, and anything that will be true of A's open sets will be true of B's as well. Since all the properties of interest are defined solely in terms of the open sets, any of these properties possessed by A will also be possessed by B and vice versa, so in terms of topological properties, A |