The Universe of Discourse
           
Sun, 30 Nov 2008

License plate sabotage
A number of years ago I was opening a new bank account, and the bank clerk asked me what style of checks I wanted, with pictures of clowns or balloons or whatever. I said I wanted them to be pale blue, possibly with wavy lines. I reasoned that there was no circumstance under which it would be a benefit to me to have my checks be memorable or easily-recognized.

So it is too with car license plates, and for a number of years I have toyed with the idea of getting a personalized plate with II11I11I or 0OO0OO00 or some such, on the theory that there is no possible drawback to having the least legible plate number permitted by law. (If you are reading this post in a font that renders 0 and O the same, take my word for it that 0OO0OO00 contains four letters and four digits.)

A plate number like O0OO000O increases the chance that your traffic tickets (or convictions!) will be thrown out because your vehicle has not been positively identified, or that some trivial clerical error will invalidate them.

Recently a car has appeared in my neighborhood that seems to be owned by someone with the same idea:

In case it's hard to make out in the picture—and remember that that's the whole idea—the license number is 00O0O0. (That's two letters and four digits.) If you are reading this in a font in which 0 and O are difficult or impossible to distinguish—well, remember that that's the whole idea.

Other Pennsylvanians should take note. Consider selecting OO0O0O, 00O00, and other plate numbers easily confused with this one. The more people with easily-confused license numbers, the better the protection.


[Other articles in category /misc] permanent link

Mon, 24 Nov 2008

Variations on the Goldbach conjecture

  1. Every prime number is the sum of two even numbers.
  2. Every odd number is the sum of two primes.
  3. Every even number is the product of two primes.
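None of the three statements survives a quick brute-force check. Here is a minimal Python sketch, assuming that "numbers" means positive integers and searching only up to an arbitrary bound of 30. It lists the counterexamples to each statement:

```python
def is_prime(n):
    """Trial-division primality test; adequate for small n."""
    if n < 2:
        return False
    return all(n % d for d in range(2, int(n ** 0.5) + 1))

N = 30  # arbitrary search bound

# 1. "Every prime number is the sum of two even numbers."
#    Try p = a + b with a, b both positive and even.
bad1 = [p for p in range(2, N + 1) if is_prime(p)
        and not any((p - a) % 2 == 0 for a in range(2, p - 1, 2))]

# 2. "Every odd number is the sum of two primes."
bad2 = [n for n in range(1, N + 1, 2)
        if not any(is_prime(a) and is_prime(n - a) for a in range(2, n - 1))]

# 3. "Every even number is the product of two primes."
bad3 = [n for n in range(2, N + 1, 2)
        if not any(n % a == 0 and is_prime(a) and is_prime(n // a)
                   for a in range(2, n))]

print("1 fails for", bad1)
print("2 fails for", bad2)
print("3 fails for", bad3)
```

Every prime fails the first statement, since a sum of two positive even numbers is even and at least 4.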


[Other articles in category /math] permanent link

Wed, 12 Nov 2008

Flag variables in Bourne shell programs
Who the heck still programs in Bourne shell? Old farts like me, occasionally. Of course, almost every time I do I ask myself why I didn't write it in Perl. Well, maybe this will be of some value to some fart even older than me.

Suppose you want to set a flag variable, and then later you want to test it. You probably do something like this:

        if some condition; then
            IS_NAKED=1
        fi

        ...

        # POSIX test uses "=" for string comparison; "==" is a bash extension
        if [ "$IS_NAKED" = "1" ]; then
          echo "flag is set"
        else
          echo "flag is not set"
        fi
Or maybe you use ${IS_NAKED:-0} or some such instead of "$IS_NAKED". Whatever.

Today I invented a different technique. Try this on instead:

        IS_NAKED=false
        if some condition; then
            IS_NAKED=true
        fi

        ...

        # $IS_NAKED expands to the command "true" or "false"
        if $IS_NAKED; then
          echo "flag is set"
        else
          echo "flag is not set"
        fi
The arguments both for and against it seem to be obvious, so I won't make them.

I have never seen this done before, but, as I concluded and R.J.B. Signes independently agreed, it is obvious once you see it.
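To check that the idiom works in a plain POSIX sh and not only in bash, here is a small Python harness (Python is used only as test scaffolding, and it assumes an sh on the PATH). The condition [ "$1" = yes ] is a hypothetical stand-in for "some condition":

```python
import subprocess

# The true/false flag idiom; [ "$1" = yes ] stands in for "some condition".
SCRIPT = r'''
IS_NAKED=false
if [ "$1" = yes ]; then
    IS_NAKED=true
fi

if $IS_NAKED; then
    echo "flag is set"
else
    echo "flag is not set"
fi
'''

def run_flag_script(arg):
    """Run SCRIPT under sh with one positional argument; return its output."""
    # With "sh -c SCRIPT NAME ARG", NAME becomes $0 and ARG becomes $1.
    result = subprocess.run(["sh", "-c", SCRIPT, "flagdemo", arg],
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()

print(run_flag_script("yes"))
print(run_flag_script("no"))
```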

[ Addendum 20090107: some followup notes ]


[Other articles in category /prog] permanent link

Tue, 11 Nov 2008

Another note about Gabriel's Horn
I forgot to mention in the original article that I think referring to Gabriel's Horn as "paradoxical" is straining at a gnat and swallowing a camel.

Presumably people think it's paradoxical that the thing should have a finite volume but an infinite surface area. But since the horn is infinite in extent, the infinite surface area should be no surprise.

The surprise, if there is one, should be that an infinite object might contain a merely finite volume. But we swallowed that gnat a long time ago, when we noticed that the infinitely wide series of bars below covers only a finite area when they are stacked up as on the right.

The pedigree for that paradox goes at least back to Zeno, so perhaps Gabriel's Horn merely shows that there is still some life in it, even after 2,400 years.


[Other articles in category /math] permanent link

Mon, 10 Nov 2008

Gabriel's Horn is not so puzzling
Take the curve y = 1/x for x ≥ 1. Revolve it around the x-axis, generating a trumpet-shaped surface, "Gabriel's Horn".

Elementary calculations, with calculus, allow one to show that although the Horn has finite volume, it has an infinite surface area. This is considered paradoxical, because it says that although an infinite amount of paint is required to cover the interior surface of the horn, the entire interior can be filled up with a finite amount of paint.
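For reference, here is that elementary calculation, using the standard formulas for the volume and surface area of a solid of revolution, with y = 1/x and y′ = −1/x²:

$$V = \pi\int_1^\infty \frac{dx}{x^2} = \pi
\qquad\text{and}\qquad
A = 2\pi\int_1^\infty \frac{1}{x}\sqrt{1+\frac{1}{x^4}}\,dx \;\ge\; 2\pi\int_1^\infty \frac{dx}{x} = \infty.$$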

The calculations themselves do not lend much insight into what is going on here. But I recently read a crystal-clear explanation that I think should be more widely known.

Take out some Play-Doh and roll out a snake. The surface area of the snake (neglecting the two ends, which are small) is the product of the length and the circumference; the circumference is proportional to the diameter. The volume is the product of the length and the cross-sectional area, which is proportional to the square of the diameter.

Order
Elementary Calculus: An Infinitesimal Approach
Elementary Calculus: An Infinitesimal Approach
with kickback
no kickback
Now roll the snake with your hands so that it becomes half as thick as it was before. Its diameter decreases by half, so its cross-sectional area decreases to one-fourth. Since the volume must remain the same, the snake is now four times as long as it was before. And the surface area, which is proportional to the product of the length and the diameter, has doubled.

As you continue to roll the snake thinner and thinner, the volume stays the same, but the surface area goes to infinity.
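The snake argument is easy to check numerically. This is a minimal sketch that treats the snake as a cylinder and ignores its ends, as above; the 100 cm³ volume and 2 cm starting diameter are arbitrary choices. Each halving of the diameter quadruples the length and doubles the side area:

```python
import math

def snake(volume, diameter):
    """Length and side surface area (end caps ignored) of a cylinder
    with the given volume and diameter."""
    cross_section = math.pi * diameter ** 2 / 4
    length = volume / cross_section
    side_area = math.pi * diameter * length  # circumference times length
    return length, side_area

volume = 100.0   # cm^3 of Play-Doh; any fixed value works
diameter = 2.0   # cm, starting thickness
for _ in range(6):
    length, area = snake(volume, diameter)
    print(f"d = {diameter:7.4f}  length = {length:10.2f}  area = {area:8.1f}")
    diameter /= 2
```

Algebraically the side area works out to 4V/d, so it grows without bound as d shrinks while V never changes.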

Gabriel's Horn does exactly the same thing, except without the rolling, because the parts of the Horn that are far from the origin look exactly the same as very long snakes.
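The "long snake" picture can also be checked numerically. Cut the horn at x = 1, 2, 4, 8, and so on. In the minimal sketch below (Simpson's rule and its step count are my own choices, not Keisler's), each piece holds half the volume of the one before, π/2^(k+1), but its surface area never drops below 2π ln 2 ≈ 4.36:

```python
import math

def horn_segment(a, b, steps=10_000):
    """Volume and surface area of the part of Gabriel's Horn between
    x = a and x = b.  The volume is exact; the area is integrated
    numerically with Simpson's rule (steps must be even)."""
    volume = math.pi * (1 / a - 1 / b)

    def integrand(x):
        # 2*pi*y*sqrt(1 + y'^2) with y = 1/x and y' = -1/x^2
        return 2 * math.pi / x * math.sqrt(1 + 1 / x ** 4)

    h = (b - a) / steps
    total = integrand(a) + integrand(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(a + i * h)
    return volume, total * h / 3

for k in range(6):
    a, b = 2 ** k, 2 ** (k + 1)
    v, s = horn_segment(a, b)
    print(f"x from {a:2d} to {b:2d}: volume = {v:.5f}  area = {s:.5f}")
```

Infinitely many pieces, each with area at least 2π ln 2, give the infinite total area, while their volumes form the geometric series π/2 + π/4 + ... = π.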

There's nothing going on in the Gabriel's Horn example that isn't also happening in the snake example, except that in the explanation of Gabriel's Horn, the situation is obfuscated by calculus.

I read this explanation in H. Jerome Keisler's calculus textbook. Keisler's book is an ordinary undergraduate calculus text, except that instead of basing everything on limits and on limiting processes, it is based on nonstandard analysis and explicit infinitesimal quantities. Check it out; it is available online for free. (The discussion of Gabriel's Horn is in chapter 6, page 356.)

[ Addendum 20081110: A bit more about this. ]


[Other articles in category /math] permanent link

Thu, 06 Nov 2008

Addenda to recent articles 200810

  • I discussed representing ordinal numbers in the computer and expressed doubt that the following representation truly captured the awesome complexity of the ordinals:

            data Nat = Z | S Nat
            data Ordinal = Zero
                         | Succ Ordinal
                         | Lim (Nat → Ordinal)
    
    In particular, I asked "What about Ω, the first uncountable ordinal?" Several readers pointed out that the answer to this is quite obvious: Suppose S is some countable sequence of (countable) ordinals. Then the limit of the sequence is a countable union of countable sets, and so is countable, and so is not Ω. Whoops! At least my intuition was in the right direction.

    Several people helpfully pointed out that the notion I was looking for here is the "cofinality" of the ordinal, which I had not heard of before. Cofinality is fairly simple. Consider some ordered set S. Say that an element b is an "upper bound" for an element a if a ≤ b. A subset of S is cofinal if it contains an upper bound for every element of S. The cofinality of S is the minimum cardinality of its cofinal subsets, or, what is pretty much the same thing, the minimum order type of its cofinal subsets.

    So, for example, the cofinality of ω is ℵ_0, or, in the language of order types, ω. But the cofinality of ω + 1 is only 1 (because the subset {ω} is cofinal), as is the cofinality of any successor ordinal. My question, phrased in terms of cofinality, is simply whether any ordinal has uncountable cofinality. As we saw, Ω certainly does.

    But some uncountable ordinals have countable cofinality. For example, let ω_n be the smallest ordinal with cardinality ℵ_n for each n. In particular, ω_0 = ω, and ω_1 = Ω. Then ω_ω is uncountable, but has cofinality ω, since it contains a countable cofinal subset {ω_0, ω_1, ω_2, ...}. This is the kind of bullshit that set theorists use to occupy their time.

    A couple of readers brought up George Boolos, who is disturbed by extremely large sets in something of the same way I am. Robin Houston asked me to consider the ordinal number which is the least fixed point of the ℵ operation, that is, the smallest ordinal number κ such that |κ| = ℵ_κ. Another way to define this is as the limit of the sequence 0, ℵ_0, ℵ_{ℵ_0}, ... . M. Houston describes κ as "large enough to be utterly mind-boggling, but not so huge as to defy comprehension altogether". I agree with the "utterly mind-boggling" part, anyway. And yet it has countable cofinality, as witnessed by the limiting sequence I just gave.

    M. Houston says that Boolos uses κ as an example of a set that is so big that he cannot agree that it really exists. Set theory says that it does exist, but somewhere at or before that point, Boolos and set theory part ways. M. Houston says that a relevant essay, "Must we believe in set theory?" appears in Logic, Logic, and Logic. I'll have to check it out.

    My own discomfort with uncountable sets is probably less nuanced, and certainly less well thought through. This is why I presented it as a fantasy, rather than as a claim or an argument. Just the sort of thing for a future blog post, although I suspect that I don't have anything to say about it that hasn't been said before, more than once.

    Finally, a pseudonymous Reddit user brought up a paper of Coquand, Hancock, and Setzer that discusses just which ordinals are representable by the type defined above. The answer turns out to be all the ordinals less than ω^ω. But in Martin-Löf's type theory (about which more this month, I hope) you can actually represent up to ε_0. The paper is Ordinals in Type Theory and is linked from here.

    Thanks to Charles Stewart, Robin Houston, Luke Palmer, Simon Tatham, Tim McKenzie, János Krámar, Vedran Čačić, and Reddit user "apfelmus" for discussing this with me.

    [ Meta-addendum 20081130: My summary of Coquand, Hancock, and Setzer's results was utterly wrong. Thanks to Charles Stewart and Peter Hancock (one of the authors) for pointing this out to me. ]

  • Regarding homophones of numeral words, several readers pointed out that in non-rhotic dialects, "four" already has four homophones, including "faw" and "faugh". To which I, as a smug rhotician, reply "feh".

    One reader wondered what should be done about homophones of "infinity", while another observed that a start has already been made on "googol". These are just the sort of issues my proposed Institute is needed to investigate.

    One clever reader pointed out that "half" has the homophone "have". Except that it's not really a homophone. Which is just right!


[Other articles in category /addenda] permanent link

Election results
Regardless of how you felt about the individual candidates in the recent American presidential election, and regardless of whether you live in the United States of America, I hope you can appreciate the deeply-felt sentiment that pervades this program:

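        #!/usr/bin/perl
        use strict;
        use warnings;

        # 1232470800 is 2009-01-20 17:00:00 UTC: noon Eastern, Inauguration Day
        my $remain = 1232470800 - time();
        $remain > 0 or print("It's finally over.\n"), exit;

        # Peel off seconds, minutes, hours, then days;
        # @dur ends up as (days, hours, minutes, seconds)
        my @dur;
        for (60, 60, 24, 100000) {
          unshift @dur, $remain % $_;
          $remain -= $dur[0];
          $remain /= $_;
        }

        my @time = qw(day days hour hours minute minutes second seconds);
        my @s;
        for (0 .. $#dur) {
          my $n = $dur[$_] or next;             # skip units that are zero
          my $unit = $time[$_*2 + ($n != 1)];   # singular or plural
          $s[$_] = "$n $unit";
        }
        @s = grep defined, @s;

        $s[-1] = "and $s[-1]" if @s > 2;
        print join ", ", @s;
        print "\n";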


[Other articles in category /politics] permanent link

Mon, 03 Nov 2008

Atypical Typing
I just got back from Nashville, Tennessee, where I delivered a talk at OOPSLA 2008, my first talk as an "invited speaker". This post is a bunch of highly miscellaneous notes about the talk.

If you want to skip the notes and just read the talk, here it is.

Talk abstract

Many of the shortcomings of Java's type system were addressed by the addition of generics to Java 5.0. Java's generic types are a direct outgrowth of research into strong type systems in languages like SML and Haskell. But the powerful, expressive type systems of research languages like Haskell are capable of feats that exceed the dreams of programmers familiar only with mainstream languages.

In this talk I'll give a brief retrospective on the history of type systems and an introduction to the type system of the Haskell language, including a remarkable example where the Haskell type checker diagnoses an infinite loop bug at compile time.

I did not say in the abstract that the talk was a retread of a talk I gave for the Perl mongers in 1999 titled "Strong Typing Doesn't Have to Suck." Nobody wants to hear that. Still, the talk underwent a major rewrite, for all the obvious reasons.

In 1999, the claim that strong typing does not have to suck was surprising news, and particularly so to Perl Mongers. In 2008, however, this argument has been settled by Java 5, whose type system demonstrates pretty conclusively that strong typing doesn't have to suck. I am not saying that you must like it, and I am not saying that there is no room for improvement. Indeed, it's obvious that the Java 5 type system has room for improvement: if you take the SML type system of 15 years ago, and whack on it with a hammer until it's chipped and dinged all over, you get the Java 5 type system; the SML type system of the early 1990s is ipso facto an improvement. But that type system didn't suck, and neither does Java's.

So I took out the arguments about how static typing didn't have to suck, figuring that most of the OOPSLA audience was already sold on this point, and took a rather different theme: "Look, this ivory-tower geekery turned out to be important and useful, and its current incarnation may turn out to be important and useful in the same way and for the same reasons."

In 1999, I talked about Hindley-Milner type systems, and although it was far from clear at the time that mainstream languages would follow the path blazed by the HM languages, that was exactly what happened. So the HM languages, and Haskell in particular, contained some features of interest, and, had you known then how things would turn out, would have been worth looking at. But Haskell has continued to evolve, and perhaps it still is worth looking at.

Or maybe another way to put it: If the adoption of functional programming ideas into the mainstream took you by surprise, fair enough, because sometimes these things work out and sometimes they don't, and sometimes they get adopted and sometimes they don't. But if it happens again and takes you by surprise again, you're going to look like a dumbass. So start paying attention!

Haskell types are hard to explain

I spent most of the talk time running through some simple examples of Haskell's type inference algorithm, and finished with a really spectacular example that I first saw in a talk by Andrew R. Koenig at San Antonio USENIX where the type checker detects an infinite-loop bug in a sorting function at compile time. The goal of the 1999 talk was to explain enough of the ML type system that the audience would appreciate this spectacular example. The goal of the 2008 talk was the same, except I wanted to do the examples in Haskell, because Haskell is up-and-coming but ML is down-and-going.

It is a lot easier to explain ML's type system than it is to explain Haskell's. Partly it's because ML is simpler to begin with, but also it's because Haskell is so general and powerful that there are very few simple examples! For example, in SML one can demonstrate:

        (* SML *)
        3 : int;
        3.5 : real;
which everyone can understand.

But in Haskell, 3 has the type (Num t) ⇒ t, and 3.5 has the type (Fractional t) ⇒ t. So you can't explain the types of literal numeric constants without first getting into type classes.

The benefit of this, of course, is that you can write 3 + 3.5 in Haskell, and it does the right thing, whereas in ML you get a type error. But it sure does make it a devil to explain.

Similarly, in SML you can demonstrate some simple monomorphic functions:

	  not : bool → bool
	 real : int → real
	 sqrt : real → real
	floor : real → int
Of these, only not is simple in Haskell:

            not :: Bool → Bool
    fromInteger :: (Num a) ⇒ Integer → a    -- analogous to 'real'
           sqrt :: (Floating a) ⇒ a → a
          floor :: (RealFrac a, Integral b) ⇒ a → b
There are very few monomorphic functions in the Haskell standard prelude.

Slides

I'm still using the same slide-generation software I used in 1999, which makes me happy. It's a giant pile of horrible hacks, possibly the worst piece of software I've ever written. I'd like to hold it up as an example of "worse is better", but actually I think it only qualifies for "bad is good enough". I should write a blog article about this pile of hacks, just to document it for future generations.

Conference plenary sessions

This was the first "keynote session" I had been to at a conference in several years. One of the keynote speakers at a conference I attended was such a tedious, bloviating windbag that I walked out and swore I would never attend another conference plenary session. And I kept that promise until last week, when I had to attend, because now I was not only the bloviating windbag behind the lectern, but an oath-breaker to boot. This is the "shameful confession" alluded to on slide 3.

On the other hand...

One of the highest compliments I've ever received. It says "John McCarthy will be there. Mark Jason Dominus, too." Wow, I'm almost in the same paragraph with John McCarthy.

McCarthy didn't actually make it, unfortunately. But I did get to meet Richard Gabriel and Gregor Kiczales. And Daniel Weinreb, although I didn't know who he was before I met him. But now I'm glad I met Daniel Weinreb. During my talk I digressed to say that anyone who is appalled by Perl's regular expression syntax should take a look at Common Lisp's format feature, which is even more appalling, in much the same way. And Weinreb, who had been sitting in the front row and taking copious notes, announced "I wrote format!".

More explaining of jokes

As I get better at giving conference talks, the online slides communicate less and less of the amusing part of the content. You might find it interesting to compare the 1999 version of this talk with the 2008 version.

One joke, however, is too amusing to leave out. At the start of the talk, I pretended to have forgotten my slides. "No problem," I said. "All my talks these days are generated automatically by the computer anyway. I'll just rebuild it from scratch." I then displayed this form, which initially looked like this:

I then filled out the form appropriately for OOPSLA:

I pushed the button, and poof! Instant slides.

Wadler's anecdote

I had the chance to talk to Philip Wadler, one of the designers of Haskell and of the Java generics system, before the talk. I asked him about the history of the generics feature, and he told me the following story. After he and Odersky had done the work on generics in their gj and "Pizza" projects, Odersky was hired by Sun to write the new Java compiler. Odersky thought the generics were a good idea, so he put them into the compiler. At first the Sun folks always ran the compiler with the generics turned off. But they couldn't rip out the generics support completely, because they needed it in the compiler in order to get it to compile its own source code, which Odersky had written with generics. So Sun had to leave the feature in, and eventually they started using it, and eventually they decided they liked it. I related this story in the talk, but it didn't make it onto the slides, so I'm repeating it here.

I had never been to OOPSLA, so I also asked Wadler what the OOPSLA people would want to hear about. He mentioned STM, but since I don't know anything about STM I didn't say anything about it.

View it online

The slides are online.

[ Addendum 20081031: Thanks to Max Rabkin for pointing out that Haskell's analogue of real is fromInteger. I don't know why this didn't occur to me, since I mentioned it in the talk. Oh well. ]


[Other articles in category /talk] permanent link

Fri, 31 Oct 2008

A proposed correction to an inconsistency in English orthography
English contains exactly zero homophones of "zero", if one ignores the trivial homophone "zero", as is usually done.

English also contains exactly one homophone of "one", namely "won".

English does indeed contain two homophones of "two": "too" and "to".

However, the expected homophones of "three" are missing. I propose to rectify this inconsistency. This is sure to make English orthography more consistent and therefore easier for beginners to learn.

I suggest the following:

  • thrie
  • threigh
  • thurry
I also suggest the founding of a well-funded institute with the following mission:

  1. Determine the meanings of these three new homophones
  2. Conduct a public education campaign to establish them in common use
  3. Lobby politicians to promote these new words by legislation, educational standards, public funding, or whatever other means are appropriate
  4. Investigate the obvious sequel issues: "four" has only "for" and "fore" as homophones; what should be done about this?
Obviously, the director of this institute should be a thoughtful, far-seeing individual who will not allow his good judgement to be clouded by the generous salary. I refer, of course, to myself.

Happy Halloween. All Hail Discordia.

[ Addendum 20081106: Some readers inexplicably had nothing better to do than to respond to this ridiculous article. ]


[Other articles in category /lang] permanent link

Mon, 13 Oct 2008

The speed of electricity
For some reason I have needed to know this several times in the past few years: what is the speed of electricity? And for some reason, good answers are hard to come by.

(Warning: as with all my articles on physics, readers are cautioned that I do not know what I am talking about, but that I can talk a good game and make up plenty of plausible-sounding bullshit that sounds so convincing that I believe it myself. Beware of bullshit.)

If you do a Google search for "speed of electricity", the top hit is Bill Beaty's long discourse on the subject. In this brilliantly obtuse article, Beaty manages to answer just about every question you might have about everything except the speed of electricity, and does so in a way that piles confusion on confusion.

Here's the funny thing about electricity. To have electricity, you need moving electrons in the wire, but the electrons are not themselves the electricity. It's the motion, not the electrons. It's like that joke about the two rabbinical students who are arguing about what makes tea sweet. "It's the sugar," says the first one. "No," disagrees the other, "it's the stirring." With electricity, it really is the stirring.

We can understand this a little better with an analogy. Actually, several analogies, each of which, I think, illuminates the others. They will get progressively closer to the real truth of the matter, but readers are cautioned that these are just analogies, and so may be misleading, particularly if overextended. Also, even the best one is not really very good. I am introducing them primarily to explain why I think M. Beaty's answer is obtuse.

  1. Consider a garden hose a hundred feet long. Suppose the hose is already full of water. You turn on the hose at one end, and water starts coming out the other end. Then you turn off the hose, and the water stops coming out. How long does it take for the water to stop coming out? It probably happens pretty darn fast, almost instantaneously.

    This shows that the "signal" travels from one end of the hose to the other at a high speed—and here's the key idea—at a much higher speed than the speed of the water itself. If the hose is one square inch in cross-section, its total volume is about 5.2 gallons. So if you're getting two gallons per minute out of it, that means that water that enters the hose at the faucet end doesn't come out the nozzle end until 156 seconds later, which is pretty darn slow. But it certainly isn't the case that you have to wait 156 seconds for the water to stop coming out after you turn off the faucet. That's just how long it would take to empty the hose. And similarly, you don't have to wait that long for water to start coming out when you turn the faucet on, unless the hose was empty to begin with.

    The water is like the electrons in the wire, and electricity is like that signal that travels from the faucet to the nozzle when you turn off the water. The electrons might be travelling pretty slowly, but the signal travels a lot faster.

  2. You're waiting in the check-in line at the airport. One of the clerks calls "Can I help who's next?" and the lady at the front of the line steps up to the counter. Then the next guy in line steps up to the front of the line. Then the next person steps up. Eventually, the last person in line steps up. You can imagine that there's a "hole" that opens up at the front of the line, and the hole travels backwards through the line to the back end.

    How fast does the hole travel? Well, it depends. But one thing is sure: the speed at which the hole moves backward is not the same as the speed at which the people move forward. It might take the clerks another hour to process the sixty people in line. That does not mean that when they call "next", it will take an hour for the hole to move all the way to the back. In fact, the rate at which the hole moves is to a large extent independent of how fast the people in the line are moving forward.

    The people in the line are like electrons. The place at which the people are actually moving—the hole—is the electricity itself.

  3. In the ocean, the waves start far out from shore, and then roll in toward the shore. But if you look at a cork bobbing on the waves, you see right away that even though the waves move toward the shore, the water is staying in pretty much the same place. The cork is not moving toward the shore; it's bobbing up and down, and it might well stay in the same place all day, bobbing up and down. It should be pretty clear that the speed with which the water and the cork are moving up and down is only distantly related to the speed with which the waves are coming in to shore. The water is like the electrons, and the wave is like the electricity.

  4. A bomb explodes on a hill (call it hill A), and some time later Ike on the next hill over (hill B) hears the bang. This is because the exploding bomb compresses the air nearby, and then the compressed air expands, compressing the air a little way away again, and the compressed air expands and compresses the air a little way farther still, and so there's a wave of compression that spreads out from the bomb until eventually the air on the next hill is compressed and presses on Ike's eardrums. It's important to realize that no individual air molecule has traveled from hill A to hill B. Each air molecule stays in pretty much the same place, moving back and forth a bit, like the water in the water waves or the people in the airport queue. Each person in the airport line stays in pretty much the same place, even though the "hole" moves all the way from the front of the line to the back. Similarly, the air molecules all stay in pretty much the same place even as the compression wave goes from hill A to hill B. When you speak to someone across the room, the sound travels to them at roughly 770 miles per hour, but they are not bowled over by hurricane-force winds. (Thanks to Aristotle Pagaltzis for suggesting that I point this out.) Here the air molecules are like the electrons in the wire, and the sound is like the electricity.
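The hose arithmetic from the first analogy, worked out in a short Python sketch. The figures are the ones used above (100 feet, one square inch, two gallons per minute); the US gallon of 231 cubic inches is assumed:

```python
CUBIC_INCHES_PER_GALLON = 231   # one US liquid gallon

hose_length_in = 100 * 12       # 100 feet, in inches
cross_section_sq_in = 1.0       # one square inch
flow_gal_per_min = 2.0

hose_volume_gal = hose_length_in * cross_section_sq_in / CUBIC_INCHES_PER_GALLON
transit_time_s = hose_volume_gal / flow_gal_per_min * 60

print(f"hose holds {hose_volume_gal:.1f} gallons")
print(f"water takes about {transit_time_s:.0f} s to go from faucet to nozzle")
```

The 156 seconds is the speed of the water, not of the signal; the signal that the faucet has been shut off arrives far sooner.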

OK, where did all these analogies get us? I wanted to make clear that in each of these phenomena, there are two kinds of moving things. There is a motion of the concrete particles in the medium itself: water, or air, or people in the queue. And then there is a more subtle phenomenon, which is the motion of the wave of change through the medium. And the speeds of these two things are related in a complex way, if at all.

I believe that when someone asks for the speed of electricity, what they are typically after is something like: When I flip the switch on the wall, how long before the light goes on? Or: the ALU in my computer emits some bits. How long before those bits get to the output bus? Or again: I send a telegraph message from Nova Scotia to Ireland on an undersea cable. How long before the message arrives in Ireland? Or again: computers A and B are on the same branch of an ethernet, 10 meters apart. How long before a packet emitted by A's ethernet hardware gets to B's ethernet hardware?

M. Beaty's answer about the speed of the electrons is totally useless as an answer to this kind of question. It's a really detailed, interesting answer to a question that hardly anyone was asking.

Here the analogy with the speed of sound really makes clear what is wrong with M. Beaty's answer. I set off a bomb on one hill. How long before Ike on the other hill a mile away hears the bang? Or, in short, "what is the speed of sound?" M. Beaty doesn't know what the speed of sound is, but he is glad to tell you about the speed at which the individual air molecules are moving back and forth, although this actually has very little to do with the speed of sound. He isn't going to tell you how long before the tsunami comes and sweeps away your village, but he has plenty to say about how fast the cork is bobbing up and down on the water.

That's all fine, but I don't think it's what people are looking for when they want the speed of electricity. So the individual charges in the wire are moving at 2.3 mm/s; who cares? As M. Beaty was at some pains to point out, the moving charges are not themselves the electricity, so why bring it up?

I wanted to end this article with a correct and pertinent answer to the question. For a while, I was afraid I was going to have to give up. At first, I just tried looking it up on the web. Many people said that the electricity travels at the speed of light, c. This seemed rather implausible to me, for various reasons. (That's another essay for another day.) And there was widespread disagreement about how fast it really was. For example:

But then I found this page on the characteristic impedance of coaxial cables and other wires, which seems rather more to the point than most of the pages I have found that purport to discuss the "speed of electricity" directly.

From this page, we learn that what I have been calling the "speed of electricity" is usually given, in electrical engineering jargon, as the "velocity factor" of the wire: the signal speed expressed as a fraction of c. And it is a simple function of the "dielectric constant" not of the wire material itself, but of the insulation between the two current-carrying parts of the wire! (In typical physics fashion, the dielectric "constant" is anything but; it depends on the material of which the insulation is made, the temperature, and who knows what other stuff they aren't telling me. Dielectric constants in the rest of the article are for substances at room temperature.) The function is simply:

$$V = {c\over\sqrt{\varepsilon_r}}$$

where V is the velocity of electricity in the wire, and $\varepsilon_r$ is the dielectric constant of the insulating material, relative to that of vacuum. Amazingly, the shape, material, and configuration of the wire don't come into it; for example, it doesn't matter whether the wire is coaxial or twin parallel wires. (Remember the warning from the top of the page: I don't know what I am talking about.) Dielectric constants range from 1 up to infinity, so velocity ranges from c down to zero, as one would expect. This explains why we find so many inconsistent answers about the speed of electricity: it depends on a specific physical property of the wire. But we can consider some common examples.

Wikipedia says that the dielectric constant of rubber is about 7 (and this website specifies 6.7 for neoprene) so we would expect the speed of electricity in rubber-insulated wire to be about 0.38c. This is not quite accurate, because the wires are also insulated by air and by the rest of the universe. But it might be close to that. (Remember that warning!)

The dielectric constant of air is very close to 1 (Wikipedia says 1.0005, and the other site gives 1.0548 for air at 100 atmospheres pressure), so if the wires are insulated only by air, the speed of electricity in the wires should be very close to the speed of light.

We can also work the calculation the other way: this web page says that signal propagation in an ethernet cable is about 0.66c, so we infer that the dielectric constant for the insulator is around 1/0.66² ≈ 2.3. We look up this number in a table of dielectric constants and guess from that that the insulator might be polyethylene or something like it. (This inference would be correct.)

What's the lower limit on signal propagation in wires? I found a reference to a material with a dielectric constant of 2880. Such a material, used as an insulator between two wires, would result in a velocity of about 2% of c (1/√2880 ≈ 0.019), which is still about 5600 km/s. This page mentions cement pastes with "effective dielectric constants" up around 90,000, yielding an effective velocity of 1/300 c (since √90,000 = 300), or 1000 km/s.
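To make the arithmetic above easy to check, here is a small Haskell sketch of the formula $V = c/\sqrt{\varepsilon_r}$ and its inverse. The function names are my own, the speed of light is taken in km/s, and the only inputs worth trying are the dielectric constants and the velocity factor quoted in the text; it ignores all the complications of real cables.

```haskell
-- Speed of light in vacuum, in km/s.
speedOfLight :: Double
speedOfLight = 299792.458

-- V = c / sqrt(eps_r), where epsR is the insulation's dielectric
-- constant relative to vacuum (assumed to be at least 1).
signalSpeed :: Double -> Double
signalSpeed epsR = speedOfLight / sqrt epsR

-- The velocity factor: the same speed expressed as a fraction of c.
velocityFactor :: Double -> Double
velocityFactor epsR = 1 / sqrt epsR

-- Inverting the formula: the dielectric constant implied by a
-- measured velocity factor vf.
dielectricFromFactor :: Double -> Double
dielectricFromFactor vf = 1 / (vf * vf)
```

In GHCi, velocityFactor 7 comes out near 0.378, dielectricFromFactor 0.66 near 2.3, and signalSpeed 90000 near 999 km/s.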

Finally, I should add that the formula above only applies to direct currents. For varying currents, such as are typical in AC power lines, the dielectric constant apparently varies with the frequency of the current (some constant!) and the analysis is more complicated.


[Other articles in category /physics] permanent link

Fri, 10 Oct 2008

Representing ordinal numbers in the computer and elsewhere
Lately I have been reading Andreas Abel's paper "A semantic analysis of structural recursion", because it was referred to by David Turner's paper on total functional programming.

The Turner paper is a must-read. It's about functional programming in languages where every program is guaranteed to terminate. This is more useful than it sounds at first.

Turner's initial point is that the presence of ⊥ values in languages like Haskell spoils one's ability to reason from the program specification. His basic example is simple:

        loop :: Integer -> Integer
        loop x = 1 + loop x
Taking the function definition as an equation, we subtract (loop x) from both sides and get
0 = 1
which is wrong. The problem is that while subtracting (loop x) from both sides is valid reasoning over the integers, it's not valid over the Haskell Integer type, because Integer contains a ⊥ value for which that law doesn't hold: 1 ≠ 0, but 1 + ⊥ = 0 + ⊥.

Before you can use reasoning as simple and as familiar as subtracting an expression from both sides, you first have to prove that the value of the expression you're subtracting is not ⊥.

By banishing nonterminating functions, one also banishes ⊥ values, and familiar mathematical reasoning is rescued.

You also avoid a lot of confusing language design issues. The whole question of strictness vanishes, because strictness is solely a matter of what a function does when its argument is ⊥, and now there is no ⊥. Lazy evaluation and strict evaluation come to the same thing. You don't have to wonder whether the logical-or operator is strict in its first argument, or its second argument, or both, or neither, because it comes to the same thing regardless.

The drawback, of course, is that if you do this, your language is no longer Turing-complete. But that turns out to be less of a problem in practice than one would expect.

The paper was so interesting that I am following up several of its precursor papers, including Abel's paper, about which the Turner paper says "The problem of writing a decision procedure to recognise structural recursion in a typed lambda calculus with case-expressions and recursive, sum and product types is solved in the thesis of Andreas Abel." And indeed it is.

But none of that is what I was planning to discuss. Rather, Abel introduces a representation for ordinal numbers that I hadn't thought much about before.

I will work up to the ordinals via an intermediate example. Abel introduces a type Nat of natural numbers:

Nat = 1 ⊕ Nat
The "1" here is not the number 1, but rather a base type that contains only one element, like Haskell's () type or ML's unit type. For concreteness, I'll write the single value of this type as '•'.

The ⊕ operator is the disjoint sum operator for types. The elements of the type S ⊕ T have one of two forms. They are either left(s) where s ∈ S or right(t) where t ∈ T. So 1 ⊕ 1 is a type with exactly two values: left(•) and right(•).

The values of Nat are therefore left(•), and right(n) for any element n of Nat. So left(•), right(left(•)), right(right(left(•))), and so on. One can get a more familiar notation by defining:

0 = left(•)
Succ(n) = right(n)
And then one just considers 3 to be an abbreviation for Succ(Succ(Succ(0))) as usual. (In this explanation, I omitted some technical details about recursive types.)
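The definition of Nat can be transcribed almost literally into Haskell, taking () as the one-element type "1" and Either as the disjoint sum ⊕. This is a sketch of my own, not code from Abel's paper; the names zero, successor, natToInteger, and three are made up for the illustration.

```haskell
-- Nat = 1 ⊕ Nat, with () as the one-element type "1" and Either as
-- the disjoint sum (Left plays the role of left, Right of right).
newtype Nat = Nat (Either () Nat)

-- 0 = left(•)
zero :: Nat
zero = Nat (Left ())

-- Succ(n) = right(n)
successor :: Nat -> Nat
successor n = Nat (Right n)

-- Read a Nat back as an Integer by counting the right(...) layers.
natToInteger :: Nat -> Integer
natToInteger (Nat (Left ()))  = 0
natToInteger (Nat (Right n)) = 1 + natToInteger n

-- 3 as an abbreviation for Succ(Succ(Succ(0)))
three :: Nat
three = successor (successor (successor zero))
```

Here natToInteger three evaluates to 3.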

So much for the natural numbers. Abel then defines a type of ordinal numbers, as:

Ord = (1 ⊕ Ord) ⊕ (Nat → Ord)
In this scheme, an ordinal is either left(left(•)), which represents 0, or left(right(n)), which represents the successor of the ordinal n, or right(f), which represents the limit ordinal of the range of the function f, whose type is Nat → Ord.

We can define abbreviations:

Zero = left(left(•))
Succ(n) = left(right(n))
Lim(f) = right(f)
So 0 = Zero, 1 = Succ(0), 2 = Succ(1), and so on. If we define a function id which maps Nat into Ord in the obvious way:

        id :: Nat → Ord
        id 0       = Zero
        id (n + 1) = Succ(id n)
then ω = Lim(id). Then we easily get ω+1 = Succ(ω), etc., and the limit of this function is 2ω:

        plusomega :: Nat → Ord
        plusomega 0       = Lim(id)
        plusomega (n + 1) = Succ(plusomega n)
We can define an addition function on ordinals:

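        + :: Ord → Ord → Ord
        ord + Zero    = ord
        ord + Succ(n) = Succ(ord + n)
        ord + Lim(f)  = Lim(λx. ord + f(x))
This gets us another way to make 2ω: 2ω = ω + ω = ω + Lim(id) = Lim(λx. ω + id(x)). (The order matters, since ordinal addition is not commutative: n + ω = ω for every finite n.)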

Then this function multiplies a Nat by ω:

        timesomega :: Nat → Ord
        timesomega 0       = Zero
        timesomega (n + 1) = ω + (timesomega n)
and Lim(timesomega) is ω². We can go on like this.
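Here is a rough Haskell rendering of the ordinal notations above, for experimenting. It is my own sketch, not Abel's: the type is called Ordinal to avoid a clash with Haskell's Ord class, Integer (assumed non-negative) stands in for Nat, the text's id is renamed fromNat, and + is renamed plus. Since a Lim holds a function, notations can't be compared for equality; the made-up helpers succDepth and finiteValue only let you poke at them.

```haskell
-- Abel-style ordinal notations (a sketch).
data Ordinal
  = Zero
  | Succ Ordinal
  | Lim (Integer -> Ordinal)   -- the limit of f 0, f 1, f 2, ...

-- The text's id : Nat -> Ord, renamed so it doesn't shadow Prelude.id.
-- Assumes n >= 0.
fromNat :: Integer -> Ordinal
fromNat 0 = Zero
fromNat n = Succ (fromNat (n - 1))

omega :: Ordinal
omega = Lim fromNat

-- Ordinal addition, by recursion on the right-hand argument.
plus :: Ordinal -> Ordinal -> Ordinal
plus a Zero     = a
plus a (Succ n) = Succ (plus a n)
plus a (Lim f)  = Lim (\x -> plus a (f x))

-- timesOmega n = omega + omega + ... (n copies); its limit is omega^2.
timesOmega :: Integer -> Ordinal
timesOmega 0 = Zero
timesOmega n = plus omega (timesOmega (n - 1))

omegaSquared :: Ordinal
omegaSquared = Lim timesOmega

-- Number of Succ layers on the outside of a notation.
succDepth :: Ordinal -> Integer
succDepth (Succ n) = 1 + succDepth n
succDepth _        = 0

-- Just n for a finite ordinal, Nothing as soon as a Lim is reached.
finiteValue :: Ordinal -> Maybe Integer
finiteValue Zero     = Just 0
finiteValue (Succ n) = fmap (+ 1) (finiteValue n)
finiteValue (Lim _)  = Nothing
```

For instance, plus (fromNat 5) omega is Lim applied to the sequence 5, 6, 7, ..., whose limit is just ω, while plus omega omega is the limit of ω, ω+1, ω+2, ..., which is 2ω.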

But here's what puzzled me. The ordinals are really, really big. Much too big to be a set in most set theories. And even the countable ordinals are really, really big. We often think we have a handle on uncountable sets, because our canonical example is the real numbers, and real numbers are just decimal numbers, which seem simple enough. But the set of countable ordinals is full of weird monsters, enough to convince me that uncountable sets are much harder than most people suppose.

So when I saw that Abel wanted to define an arbitrary ordinal as the limit of a countable sequence of ordinals, I was puzzled. Can you really get every ordinal as the limit of a countable sequence of ordinals? What about Ω, the first uncountable ordinal?

Well, maybe. I can't think of any reason why not. But it still doesn't seem right. It is a very weird sequence, and one that you cannot write down. Suppose you had a notation for all the ordinals that you would need. Because it is a notation, the set of things it can denote is countable, and so a fortiori the limit of all the ordinals that it can denote is a countable ordinal, not Ω.

And it's all very well to say that the sequence starts out $(0, \omega, 2\omega, \omega^2, \omega^\omega, \varepsilon_0, \varepsilon_1, \varepsilon_{\varepsilon_0}, \ldots)$, or whatever, but the beginning of the sequence is totally unimportant; what is important is the end, and we have no way to write the end or to even comprehend what it looks like.

So my question to set theory experts: is every limit ordinal the least upper bound of some countable sequence of ordinals?

I hate uncountable sets, and I have a fantasy that in the mathematics of the 23rd Century, uncountable sets will be looked back upon as a philosophical confusion of earlier times, like Zeno's paradox, or the luminiferous aether.

[ Addendum 20081106: Not every limit ordinal is the least upper bound of some countable sequence of (countable) ordinals, and my guess that Ω is not was correct, but the proof is so simple that I was quite embarrassed to have missed it. More details here. ]


[Other articles in category /math] permanent link

Wed, 01 Oct 2008

The Lake Wobegon Distribution
Michael Lugo mentioned a while back that most distributions are normal. He does not, of course, believe any such silly thing, so please do not rush to correct him (or me). But the remark reminded me of how many people do seem to believe that most distributions are normal. More than once on internet mailing lists I have encountered people who ridiculed others for asserting that "nearly all x are above [or below] average". This is a recurring joke on Prairie Home Companion, broadcast from the fictional town of Lake Wobegon, where "all the women are strong, all the men are good looking, and all the children are above average." And indeed, they can't all be above average. But they could nearly all be above average. And this is actually an extremely common situation.

To take my favorite example: nearly everyone has an above-average number of legs. I wish I could remember who first brought this to my attention. James Kushner, perhaps?

But the world abounds with less droll examples. Consider a typical corporation. Probably most of the employees make a below-average salary. Or, more concretely, consider a small company with ten employees. Nine of them are paid $40,000 each, and one is the owner, who is paid $400,000. The average salary is $76,000, and 90% of the employees' salaries are below average.
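The arithmetic is easy to check mechanically. This Haskell sketch (the function names are mine) computes the mean and the fraction of values strictly below it, applied to the hypothetical ten-person company just described.

```haskell
-- Arithmetic mean of a non-empty list.
mean :: [Double] -> Double
mean xs = sum xs / fromIntegral (length xs)

-- Fraction of the values that lie strictly below the mean.
fractionBelowMean :: [Double] -> Double
fractionBelowMean xs =
  fromIntegral (length (filter (< m) xs)) / fromIntegral (length xs)
  where
    m = mean xs

-- The company from the text: nine at $40,000 and the owner at $400,000.
salaries :: [Double]
salaries = replicate 9 40000 ++ [400000]
```

Here mean salaries is 76000.0 and fractionBelowMean salaries is 0.9. A list of leg counts, mostly 2s with a few smaller values, shows the opposite skew: almost every entry sits above the mean.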

The situation is familiar to people interested in baseball statistics because, for example, most baseball players are below average. Using Sean Lahman's database, I find that 588 players received at least one at-bat in the 2006 National League. These 588 players collected a total of 23,501 hits in 88,844 at-bats, for a collective batting average of .265. Of these 588, only 182 had an individual batting average higher than .265. 69% of the baseball players in the 2006 National League were below-average hitters. If you throw out the players with fewer than 10 at-bats, you are left with 432 players of whom 279, or 65%, hit worse than their collective average of 23430/88325 = .265. Other statistics, such as earned-run averages, are similarly skewed.

The reason for this is not hard to see. Baseball-hitting talent in the general population is normally distributed, like this:

Here the right side of the graph represents the unusually good hitters, of whom there aren't very many. The left side of the graph represents the unusually bad hitters; there aren't many of those either. Most people are somewhere in the middle, near the average, and there are about as many above-average hitters as below-average hitters in the general population.

But major-league baseball players are not the general population. They are carefully selected, among the best of the best. They are all chosen from the right-hand edge of the normal curve. The people in the middle of the normal curve, people like me, play baseball in Clark Park, not in Quankee Stadium.

Here's the right-hand corner of the curve above, highly magnified:

As you can see here, the shape is not at all like the curve for the general population, which had the vast majority of the population in the middle, around the average. Here, the vast majority of the population is way over on the left side, just barely good enough to play in the majors, hanging on to their jobs by the skin of their teeth, subject at any moment to replacement by some kid up from the triple-A minors. The above-average players are the ones over on the right end, the few of the few.

Actually I didn't present the case strongly enough. There are around 800 regular major-league ballplayers in the USA, drawn from a population of around 300 million, a ratio of one per 375,000. Well, no, the ratio is smaller, since the U.S. leagues also draw the best players from Mexico, Venezuela, Canada, the Dominican Republic, Japan, and elsewhere. The curve above is much too inclusive. The real curve for major-league ballplayers looks more like this:

(Note especially the numbers on the y-axis.)
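One can check the shape of the tail numerically. The sketch below rests on two simplifying assumptions of mine, not established facts about baseball: that hitting talent is exactly standard-normal, and that the major leagues consist of exactly the people above some cutoff a, measured in standard deviations. It integrates the normal density with Simpson's rule and reports the mean talent of the selected group together with the fraction of that group that falls below its own mean.

```haskell
-- Standard normal density.
phi :: Double -> Double
phi x = exp (-x * x / 2) / sqrt (2 * pi)

-- Composite Simpson's rule on [lo, hi] with n subintervals (n even).
simpson :: (Double -> Double) -> Double -> Double -> Int -> Double
simpson f lo hi n = h / 3 * (f lo + f hi + 4 * odds + 2 * evens)
  where
    h     = (hi - lo) / fromIntegral n
    x i   = lo + fromIntegral i * h
    odds  = sum [f (x i) | i <- [1, 3 .. n - 1]]
    evens = sum [f (x i) | i <- [2, 4 .. n - 2]]

-- For the slice of a standard normal population above cutoff a, return
-- (mean of that slice, fraction of the slice lying below that mean).
tailStats :: Double -> (Double, Double)
tailStats a = (m, below / mass)
  where
    top   = a + 12                -- the density is negligible beyond this
    steps = 2000
    mass  = simpson phi a top steps
    m     = simpson (\x -> x * phi x) a top steps / mass
    below = simpson phi a m steps
```

For a = 2, roughly the top 2.3% of the population, the group's mean comes out near 2.37, and a little over 60% of the group is below it; higher cutoffs push that fraction slightly higher, toward about 63%.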

This has important implications for the analysis of baseball. A player who is "merely" above average is a rare and precious resource, to be cherished; far more players are below average. Skilled analysts know that comparisons with the "average" player are misleading, because baseball is full of useful, effective players who are below average. Instead, analysts compare players to a hypothetical "replacement level", which is effectively the leftmost edge of the curve, the level at which a player can be easily replaced by one of those kids from triple-A ball.

In the Historical Baseball Abstract, Bill James describes some great team, I think one of the Cincinnati Big Red Machine teams of the mid-1970s, as "possibly the only team in history that was above average at every position". That's an important thing to know about the sport, and about team sports in general: you don't need great players to completely clobber the opposition; it suffices to have players that are merely above average. But if you're the coach, you'd better learn to make do with a bunch of players who are below average, because that's what you have, and that's what the other team will beat you with.

The right-skewedness of the right side of a normal distribution has implications that are important outside of baseball. Stephen Jay Gould wrote an essay about how he was diagnosed with cancer and given six months to live. This sounds awful, and it is awful. But six months was the expected lifetime for patients with his type of cancer—the average remaining lifetime, in other words—and in fact, nearly everyone with that sort of cancer lived less than six months, usually much less. The average was only skewed up as high as six months because of a few people who took years to die. Gould realized this, and then set about trying to find out how the few long-lived outliers survived and what he could do to turn himself into one of the long-lived freaks. And he succeeded, and lived for twenty years, dying eventually at age 60.

My heavens, I just realized that what I've written is an article about the "long tail". I had no idea I was being so trendy. Sorry, everyone.


[Other articles in category /math] permanent link

Fri, 26 Sep 2008

Sprague-Grundy theory
I'm on a small mailing list for math geeks, and there's this one guy there, Richard Penn, who knows everything. Whenever I come up with some idle speculation, he has the answer. For example, back in 2003 I asked:

Let N be any positive integer. Does there necessarily exist a positive integer k such that the base-10 representation of kN contains only the digits 0 through 4?
M. Penn was right there with the answer.

Yesterday, M. Penn asked a question to which I happened to know the answer, and I was so pleased that I wrote up the whole theory in appalling detail. Since I haven't posted a math article in a while, and since the mailing list only has about twelve people on it, I thought I would squeeze a little more value out of it by posting it here.

Richard Penn asked:

N dots are placed in a circle. Players alternate moves, where a move consists of crossing out any one of the remaining dots, and the dots on each side of it (if they remain). The winner is the player who crosses out the last dot. What is the optimal strategy with 19 dots? with 20? Can you generalize?
M. Penn observed that there is a simple strategy for the 20-dot circle, but was not able to find one for the 19-dot circle. But solving such problems in general is made easy by the Sprague-Grundy theory, which I will explain in detail.

0. Short Spoilers

Both positions are wins for the second player to move.

The 20-dot case is trivial, since any first-player move leaves a row of 17 dots, from which the second player can leave two disconnected rows of 7 dots each. Then any first-player move in one of these rows can be effectively answered by the second player in the other row.

The 19-dot case is harder. The first player's move leaves a row of 16 dots. The second player can win by removing 3 dots to leave disconnected rows of 6 and 7 dots. After this, the strategy is complicated, but is easily found by the Sprague-Grundy theory. It's at the end of this article if you want to skip ahead.

Sprague-Grundy theory is a complete theory of all finite impartial games, which are games like this one where the two players have exactly the same moves from every position.

The theory says:

  1. Every such game position has a "value", which is a non-negative integer.
  2. A position is a second-player win if and only if its value is zero.
  3. The value of a position can be calculated from the values of the positions to which the players can move, in a simple way.
  4. The value of a collection of disjoint positions (such as two disconnected rows of dots) can be calculated from the values of its component positions in a simple way.
Long details follow. They are also found in "Winning Ways", Vol I, by Berlekamp, Conway, and Guy.

1. Nim

In the game of Nim, one has some piles of beans, and a legal move is to remove some or all of the beans from any one pile. The winner is the player who takes the last bean. Equivalently, the winner is the last player who has a legal move.

Nim is important because every position in every impartial game is somehow equivalent to a position in Nim, as we will see. In fact, every position in every impartial game is equivalent to a Nim position with at most one heap of beans! Since single Nim-heaps are trivially analyzed, one can completely analyze any impartial game position by calculating the Nim-heap to which it is equivalent.

2. Disjoint sums of games

Definition: The "disjoint sum" A # B of two games A and B is a new game whose rules are as follows: a legal move in A # B is either a move in A or a move in B; the winner is the last player with a legal move.

Three easy exercises:

  1. # is commutative.
  2. # is associative.
  3. Let (a,b,c,...) represent the Nim position with heaps a, b, c, etc. Then the game (a,b,c,...) is precisely (a) # (b) # (c) # ... .
Consider the trivial game with no legal moves for anyone. This game is called 0, because:

0 # a = a # 0 = a
for all games a. 0 is a win for the previous player: the next player to move has no legal moves, and loses.

We will call the next player to move "P1", and the player who just moved "P2".

Note that a Nim-heap of 0 beans is precisely the 0 game.

3. Sums of Nim-heaps

We usually represent a single Nim-heap with n beans as "∗n". I'll do that from now on.

We observed that ∗0 is a win for the second player. Observe now that when n is positive, ∗n is a win for the first player, by a trivial strategy.

From now on we will use the symbol "=" to mean a weaker relation on games than strict equality. Two games A and B will be equivalent if their outcomes are the same in a rather strong sense:

A = B means that for any game X, A # X and B # X have the same outcome: one is a win for the player to move if and only if the other is.
Taking X = 0, the condition A = B implies that both games have the same outcome in isolation: if one is a first-player win, so is the other. But the condition is stronger than that. Both ∗1 and ∗2 are first-player wins, but ∗1 ≠ ∗2, because ∗1 # ∗1 is a second-player win, while ∗2 # ∗1 is a first-player win.

Exercise: ∗x = ∗y if and only if x = y.

It so happens that the disjoint sum of two Nim-heaps is equivalent to a single Nim-heap:

Nim-sum theorem: ∗a # ∗b = ∗(a ⊕ b), where ⊕ is the bitwise exclusive-or operation.

I'll omit the proof, which is pretty easy to find. ⊕ is often described as "write a and b in binary, and add, ignoring all carries." For example 1 ⊕ 2 = 3, and 13 ⊕ 7 = 10. This implies that ∗1 # ∗2 = ∗3, and that ∗13 # ∗7 = ∗10.
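The Nim-sum is easy to compute. In Haskell, bitwise exclusive-or is xor from Data.Bits; the sketch below also spells out the "add in binary, ignoring carries" description digit by digit, so the two can be compared. The function names are mine, and addNoCarry assumes non-negative arguments.

```haskell
import Data.Bits (xor)

-- Nim-sum of two heap sizes: bitwise exclusive-or.
nimSum :: Int -> Int -> Int
nimSum = xor

-- The single heap equivalent to a whole collection of heaps.
nimSumAll :: [Int] -> Int
nimSumAll = foldr xor 0

-- The same operation following the description literally: add the
-- binary digits of a and b position by position, discarding every
-- carry. Assumes a, b >= 0.
addNoCarry :: Int -> Int -> Int
addNoCarry 0 0 = 0
addNoCarry a b =
  (a `mod` 2 + b `mod` 2) `mod` 2 + 2 * addNoCarry (a `div` 2) (b `div` 2)
```

So nimSum 13 7 and addNoCarry 13 7 are both 10, and nimSumAll [1, 2, 3] is 0.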

Although I omitted the proof that # for Nim-heaps is essentially the ⊕ operation in disguise, there are many natural implications of this that you can use to verify that the claim is plausible. For example:

  1. The Nim-sum theorem implies that ∗0 is a neutral element for #, which we already knew.
  2. Since a ⊕ a = 0, we have:
    ∗a # ∗a = ∗0 for all a
    That is, ∗a # ∗a is a win for P2. And indeed, P2 has an obvious strategy: whatever P1 does in one pile, P2 does in the other pile. P2 never runs out of legal moves until after P1 does, and so must win.

  3. Since a ⊕ a = 0, we have, more generally:
    ∗a # ∗a # X = X for all a, X
    No matter what X is, its outcome is the same as that of ∗a # ∗a # X. Why?

    Suppose you are the player with a winning strategy for playing X alone. Then it is easy to see that you have a winning strategy in ∗a # ∗a # X, as follows: ignore the ∗a # ∗a component, until your opponent moves in it, when you should copy their move in the other half of that component. Eventually the ∗a # ∗a part will be used up (that is, reduced to ∗0 # ∗0 = 0) and your opponent will be forced to move in X, whereupon you can continue your winning strategy there until you win.

  4. According to the ⊕ operation, ∗1 # ∗2 = ∗3, and so ∗1 # ∗2 # ∗3 = ∗3 # ∗3 = 0, so P2 should have a winning strategy in ∗1 # ∗2 # ∗3. Which he does: If P1 removes any entire heap, P2 can win by equalizing the remaining heaps, leaving ∗1 # ∗1 = 0 or ∗2 # ∗2 = 0, which he wins easily. If P1 equalizes any two heaps, P2 can remove the third heap, winning the same way.

  5. Let's reconsider the game of the previous paragraph, but change the ∗1 to something else. If x ≠ 1, then 2 ⊕ 3 ⊕ x = 1 ⊕ x > 0, so ∗2 # ∗3 # ∗x = ∗y, where y > 0. Since ∗y is a single nonempty Nim-heap, it is obviously a win for P1, and so ∗2 # ∗3 # ∗x, being equivalent, should also be a win for P1. What is P1's winning strategy in ∗2 # ∗3 # ∗x? It's easy. If x > 1, then P1 can reduce ∗x to ∗1, leaving ∗2 # ∗3 # ∗1, which we saw is a win for the player who has just moved. And if x = 0, then P1 can move to ∗2 # ∗2 and win.
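Claims like these can also be checked by brute force. The sketch below (my own; the names are made up) plays small Nim positions out exhaustively, with no theory at all, and a second function states what the Nim-sum theorem predicts, so the two can be compared on every position with small heaps.

```haskell
import Data.Bits (xor)

-- Every Nim position reachable in one move: pick a heap and leave
-- k beans in it, for any k smaller than its current size.
nimMoves :: [Int] -> [[Int]]
nimMoves heaps =
  [ before ++ (k : after)
  | i <- [0 .. length heaps - 1]
  , (before, h : after) <- [splitAt i heaps]
  , k <- [0 .. h - 1] ]

-- Exhaustive search, with the rule that whoever cannot move loses:
-- the player to move wins iff some move leaves a losing position.
firstPlayerWins :: [Int] -> Bool
firstPlayerWins heaps = any (not . firstPlayerWins) (nimMoves heaps)

-- What the Nim-sum theorem predicts: P1 wins iff the Nim-sum is nonzero.
predictedFirstPlayerWin :: [Int] -> Bool
predictedFirstPlayerWin heaps = foldr xor 0 heaps /= 0
```

For example, firstPlayerWins [1, 2, 3] should be False, matching item 4. The search is exponential, so keep the heaps small; three heaps of at most three beans each make a quick comparison.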

4. The MEX rule

The important thing about disjoint sums is that they abstract away the strategy. If you have some complicated set of Nim-heaps ∗a # ∗b # ... # ∗z, you can ignore them and pretend instead that they are a single heap ∗(a ⊕ b ⊕ ... ⊕ z). Your best move in the compound heap can be easily worked out from the corresponding best move in the fictitious single heap.

For example, how do you figure out how to play in ∗2 # ∗3 # ∗x? You consider it as (∗2 # ∗3) # ∗x = ∗1 # ∗x. That is, you pretend that the ∗2 and the ∗3 are actually a single heap of size 1. Then your strategy is to win in ∗1 # ∗x, which you obviously do by reducing ∗x to size 1, or, if ∗x is already ∗0, by changing ∗1 to ∗0.

Now, that is very facile, but ∗2 # ∗3 is not the same game as ∗1, because from ∗1 there is just one legal move, which is to ∗0. Whereas from ∗2 # ∗3 there are several moves. It might seem that your opponent could complicate the situation, say by moving from ∗2 # ∗3 to ∗3, which she could not do if it were really ∗1.

But actually this extra option can't possibly help your opponent, because you have an easy response to that move, which is to move right back to ∗1! If pretending that ∗2 # ∗3 was ∗1 was good before, it is certainly good after you make it ∗1 for real.

From ∗2 # ∗3 there are a whole bunch of moves:

Move to ∗3
Move to ∗2
Move to ∗1 # ∗3 = ∗2
Move to ∗2 # ∗1 = ∗3
Move to ∗2 # ∗2 = ∗0
But you can disregard the first four of these, because they are reversible: if some player X has a winning strategy that works by pretending that ∗2 # ∗3 is identical with ∗1, then the extra options of moving to ∗2 and ∗3 won't help X's opponent, because X can reverse those moves and turn the ∗2 # ∗3 component back into ∗1. So we can ignore these options, and say that there's just one move from ∗2 # ∗3 worth considering further, namely to ∗2 # ∗2 = 0. Since this is exactly the same set of moves that is available from ∗1, ∗2 # ∗3 behaves just like ∗1 in all situations, and we have just proved that ∗2 # ∗3 = ∗1.

Unlike the other moves, the move from ∗2 # ∗3 to ∗0 is not reversible. Once someone turns ∗2 # ∗3 into ∗0, by equalizing the piles, it cannot then be turned back into ∗1, or anything else.

Considering this in more generality, suppose we have some game position P where the options are to move to one of several possible Nim-heaps, and M is the smallest Nim-heap that is not among the options. Then P = ∗M. Why? Because P has just the same options that ∗M has, namely the options of moving to one of ∗0 ... ∗(M-1). P also has some extra options, but we can ignore these because they're reversible. If you have a winning strategy in X # ∗M, then you have a winning strategy in X # P also, as follows:

  • If your opponent plays in X, then follow your strategy for X # ∗M, since the same move will also be available in X # P.

  • If your opponent makes P into ∗y, with y < M, then they've discarded their extra options, which are now irrelevant; play as you would if they had moved from X # ∗M to X # ∗y.

  • If your opponent makes P into ∗y, with y > M, then just move from ∗y to ∗M, leaving X # ∗M, which you can win.

MEX Theorem: If all the legal moves from a position P are equivalent to Nim-heaps of sizes {s₁, ..., sₖ}, then P itself is equivalent to a Nim-heap of size MEX(s₁, ..., sₖ), where the MEX is the "Minimal EXcluded" element of the set: the smallest nonnegative integer that is not in the set.
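The MEX itself is a one-liner. This Haskell sketch assumes the options are given as a finite list of Nim-heap sizes; the name mex is just the abbreviation from the theorem.

```haskell
-- MEX ("minimal excluded"): the smallest non-negative integer that does
-- not occur in the list. The list of options must be finite, or the
-- search may never finish.
mex :: [Int] -> Int
mex options = until (`notElem` options) (+ 1) 0
```

Using the moves from ∗2 # ∗3 listed earlier, mex [3, 2, 2, 3, 0] is 1, matching ∗2 # ∗3 = ∗1. For a position like ♦, whose options are every positive heap, a finite prefix such as mex [1 .. 100] already gives 0.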

For example, let's consider what happens if we augment Nim by adding a special token, called ♦. A player may, in lieu of a regular move, replace ♦ by a pile of beans of any positive size. What effect does this have on Nim?

Since the legal moves from ♦ are {∗1, ∗2, ∗3, ...} and the MEX is 0, ♦ should behave like ∗0. That is, adding a ♦ token to any position should leave the outcome unaffected. And indeed it does. If you have a winning strategy in game G, then you have a winning strategy in G # ♦ also, as follows: If your opponent plays in G, reply in G. If your opponent replaces ♦ with a pile of beans, remove it, leaving only G.

Exercise: Let G be a game where all the legal moves are to Nim-heaps. Then G is a win for P1 if and only if one of the legal moves from G is to ∗0, and a win for P2 if and only if none of the legal moves from G is to ∗0.

5. The Sprague-Grundy theory

An "impartial game" is one where both players have the same moves from every position.

Sprague-Grundy theorem: Any finite impartial game is equivalent to some Nim-heap ∗n, which is the "Nim-value" of the game.

Now let's consider Richard Penn's game, which is impartial. A legal move is to cross out any dot, and the adjacent dot or dots, if any.

The Sprague-Grundy theorem says that every row of dots in Penn's game is equivalent to some Nim-heap. Let's tabulate the size of this heap (the Nim-value) for each row of n dots. We'll represent a row of n dots as [οοοοο...ο]. Obviously, [] = ∗0 so the Nim-value of [] is 0. Also obviously, [ο] = ∗1, since they're exactly the same game.

[οο] = ∗1 also, since the only legal move from [οο] is to [] = ∗0, and the MEX of {0} is 1.

The legal moves from [οοο] are to [] = ∗0 and [ο] = ∗1, so {∗0, ∗1}, and the MEX is 2. So [οοο] = ∗2.

Let's check that this is working. Since the Nim-value of [οοο] is 2, the theory predicts that [οοο] # ∗2 = 0 and so should be a win for P2. P2 should be able to pretend that [οοο] is actually ∗2.

Suppose P1 turns the ∗2 into ∗1, moving to [οοο] # ∗1. Then P2 should turn [οοο] into ∗1 also, which he can do by crossing out an end dot and the adjacent one, leaving [ο] # ∗1, which he easily wins. If P1 turns ∗2 into ∗0, moving to [οοο] # ∗0, then P2 should turn [οοο] into ∗0 also, which he can do by crossing out the middle and adjacent dots, leaving [] # ∗0, which he wins immediately.

If P1 plays in the [οοο] component, she must move to [] or to [ο], each equivalent to some Nim-heap of size x < 2, and P2 can answer by reducing the true Nim-heap ∗2 to contain x beans also.

Continuing our analysis of rows of dots: In Penn's game, the legal moves from [οοοο] are to [οο] and [ο]. Both of these have Nim-value ∗1, so the MEX is 0.

Easy exercise: Since [οοοο] is supposedly equivalent to ∗0, you should be able to show that a player who has a winning strategy in some game G also has a winning strategy in G # [οοοο].

The legal moves from [οοοοο] are to [οοο], [οο], and [ο] # [ο]. The Nim-values of these three games are ∗2, ∗1, and ∗0 respectively, so the MEX is 3 and [οοοοο] = ∗3.

The legal moves from [οοοοοο] are to [οοοο], [οοο], and [ο] # [οο]. The Nim-values of these three games are 0, 2, and 0, so [οοοοοο] = ∗1.

6. Richard Penn's game analyzed

        Row of   Nim-    Winning
        n dots   value   move
          0        0     none
          1        1     []
          2        1     []
          3        2     []
          4        0     none
          5        3     [ο] # [ο]
          6        1     [ο] # [οο]
          7        1     [οο] # [οο]
          8        0     none
          9        3     [οοο] # [οοο]
         10        3     [οοοοοοοο]
         11        2     [οοοο] # [οοοο]
         12        2     [οο] # [οοοοοοο]
         13        4     [οοοοο] # [οοοοο]
         14        0     none
         15        5     [οοοοοο] # [οοοοοο]
         16        2     [ο × 14]
         17        2     [οοοοοοο] # [οοοοοοο]
         18        3     [οοο] # [ο × 12]
         19        3     [οοοοοοοο] # [οοοοοοοο]
         20        0     none
Continuing in this way, we get the table of Nim-values above. For each row with a nonzero Nim-value, the last column shows a move to a position of Nim-value 0.
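The whole table can be generated mechanically from the MEX rule and the Nim-sum theorem. Here is a Haskell sketch of that calculation (my own, with made-up names); afterMove encodes the rule that crossing out a dot also removes its neighbours.

```haskell
import Data.Bits (xor)

-- MEX: the smallest non-negative integer missing from the list.
mex :: [Int] -> Int
mex xs = until (`notElem` xs) (+ 1) 0

-- Crossing out dot i (counting from 1) of a row of n dots also removes
-- dots i-1 and i+1 where they exist, leaving a row of i-2 dots on the
-- left and n-i-1 on the right (either may be empty).
afterMove :: Int -> Int -> (Int, Int)
afterMove n i = (max 0 (i - 2), max 0 (n - i - 1))

-- rowValues !! n is the Nim-value of a row of n dots. The list refers
-- to itself lazily: each entry only needs entries for shorter rows.
-- Two leftover rows form a disjoint sum, so their values are xor-ed.
rowValues :: [Int]
rowValues = map value [0 ..]
  where
    value n = mex [ (rowValues !! l) `xor` (rowValues !! r)
                  | i <- [1 .. n], let (l, r) = afterMove n i ]
```

Evaluating take 21 rowValues should reproduce the Nim-value column above: 0, 1, 1, 2, 0, 3, 1, 1, 0, 3, 3, 2, 2, 4, 0, 5, 2, 2, 3, 3, 0.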

The table says that a row of 19 dots should be a win for P1, if she reduces the Nim-value from 3 to 0. And indeed, P1 has an easy winning strategy, which is to cross out the 3 dots in the middle of the row, replacing [οοοοοοοοοοοοοοοοοοο] with [οοοοοοοο] # [οοοοοοοο]. But no such easy strategy obtains in a row of 20 dots, which, indeed, is a win for P2.

The original question involved circles of dots, not rows. But from a circle of n dots there is essentially only one legal move, which is to a row of n-3 dots. From a circle of 20 dots, the first player's only move is to [ο × 17] = ∗2, which is a win for the player to move next, the original second player. That player should win by changing ∗2 to ∗0, so should look for the move from [ο × 17] to a position of Nim-value 0. This is the obvious solution Richard Penn discovered: move to [οοοοοοο] # [οοοοοοο]. So the circle of 20 dots is an easy win for P2, the second player.

But for the circle of 19 dots the answer is the same, a win for the second player. The first player must move to [ο × 16] = ∗2, and then the second player should win by moving to a 0 position. [ο × 16] must have such a move, because if it didn't, the MEX rule would imply that its Nim-value was 0 instead of 2. So what's the second player's zero move here? There are actually two options. The second player can win by playing to [ο × 14], or by splitting the row into [οοοοοο] # [οοοοοοο].
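The circle analysis can be mechanized the same way. This self-contained Haskell sketch (the names are mine) recomputes the row values with the MEX rule, gives the value of a circle from the fact that its only moves lead to a row of n - 3 dots, and lists the moves from a row that leave Nim-value 0.

```haskell
import Data.Bits (xor)

-- MEX: the smallest non-negative integer missing from the list.
mex :: [Int] -> Int
mex xs = until (`notElem` xs) (+ 1) 0

-- Rows left behind by crossing out dot i (1-based) of a row of n dots.
afterMove :: Int -> Int -> (Int, Int)
afterMove n i = (max 0 (i - 2), max 0 (n - i - 1))

-- rowValues !! n is the Nim-value of a row of n dots.
rowValues :: [Int]
rowValues = map value [0 ..]
  where
    value n = mex [ nimValue (afterMove n i) | i <- [1 .. n] ]

-- Nim-value of two disjoint rows: the Nim-sum of their values.
nimValue :: (Int, Int) -> Int
nimValue (l, r) = (rowValues !! l) `xor` (rowValues !! r)

-- From a circle of n >= 3 dots every move leaves a row of n - 3 dots.
circleValue :: Int -> Int
circleValue n = mex [rowValues !! (n - 3)]

-- Moves from a row of n dots that leave Nim-value 0, listed as
-- (left row, right row), with mirror-image moves omitted.
zeroMoves :: Int -> [(Int, Int)]
zeroMoves n =
  [ (l, r) | i <- [1 .. n], let (l, r) = afterMove n i
           , l <= r, nimValue (l, r) == 0 ]
```

Here circleValue 19 and circleValue 20 should both be 0, and zeroMoves 16 should give [(0,14),(6,7)], the two second-player replies described above.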


7. Complete strategy for the 19-dot circle

Just for completeness, let's follow one of these purportedly winning moves in detail. I claimed that the second player could win by moving to [οοοοοο] # [οοοοοοο]. But what next?

First recall that any isolated row of four dots, [οοοο], can be disregarded, because any first-player move in such a row can be answered by a second-player move that crosses out the rest of the row. And any pair of isolated rows of one or two dots, [ο] or [οο], can be similarly disregarded, because any move that crosses out one can be answered by a move that crosses out the other. So in what follows, positions like [οο] # [ο] # [οοοο] will be assumed to have been won by the second player, and we will say that the second player "has an easy win" if he has a move to such a position.

  • The first player has three possible moves in the left [οοοοοο] component, as follows:

    1. If the first player moves to [οοοο] # [οοοοοοο], the second player has an easy win by moving to [οοοο] # [οοοο].

    2. If the first player moves to [οοο] # [οοοοοοο] = ∗2 # ∗1, the second player should reduce the left component to ∗1, by moving to [ο] # [οοοοοοο]. Then no matter what the first player does, the second player has an easy win.

    3. If the first player moves to [ο] # [οο] # [οοοοοοο] = ∗1 # ∗1 # ∗1, the second player can disregard the [ο] # [οο] component. The second player instead plays to [ο] # [οο] # [οοοο] and wins.

  • The first player has four moves in the right [οοοοοοο] component, as follows:

    1. If the first player moves to [οοοοοο] # [οοοοο] = ∗1 # ∗3, the second player should move from ∗3 to ∗1. There must be a move in [οοοοο] to a position with Nim-value 1. (If there weren't, [οοοοο] would have Nim-value 1 instead of 3, by the MEX rule.) Indeed, the second player can move to [οοοοοο] # [οο]. Now whatever the first player does the second player has an easy win, either to [οοοο] or to X # X for some row X.

    2. If the first player moves to [οοοοοο] # [οοοο] = ∗1 # ∗0, the second player should move from ∗1 to ∗0. There must be a move in [οοοοοο] to a position with Nim-value 0, and indeed there is: the second player moves to [οοοο] # [οοοο] and wins.

    3. If the first player moves to [οοοοοο] # [ο] # [οοο] = ∗1 # ∗1 # ∗2, the second player can disregard the ∗1 # ∗1 component and should move in the ∗2 component, to ∗0, which he does by eliminating it entirely, leaving the first player with [οοοοοο] # [ο]. After any move by the first player the second player has an easy win.

    4. If the first player moves to [οοοοοο] # [οο] # [οο] = ∗1 # ∗1 # ∗1, the second player has a number of good choices. The simplest thing to do is to disregard the [οο] # [οο] component and move in the [οοοοοο] to some position with Nim-value 0. Moving to [οοοο] # [οο] # [οο] suffices.

So [ο × 16] is indeed a win for the next player to move, and a circle of 19 dots is therefore a win for the previous player, who is the second player.

But the important point here is not the strategy itself, which is hard to remember, and which could have been found by computer search. The important thing to notice is that computing the table of Nim-values for each row of n dots is easy, and once you have done this, the rest of the strategy almost takes care of itself. Do you need to find a good move from [οοοοοοο] # [οοοοοοοοο] # [οοοοοοοοοο]? There's no need to worry, because the table says that this can be viewed as ∗1 # ∗3 # ∗3, and so a good move is to reduce the ∗1 component, the [οοοοοοο], to ∗0, say by changing it to [οοοο] or to [οο] # [οο]. Whatever your opponent does next, calculating your reply will be similarly easy.
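To make "computing the table is easy" concrete, here's a minimal Haskell sketch. It assumes the rule as reconstructed from the moves listed above: a move picks one dot and crosses it out together with its immediate neighbours, so an end dot takes only one neighbour with it and a lone dot goes by itself. The names rowMoves, nimValues, circleValue and goodMoves are made up for this example.

```haskell
import Data.Bits (xor)

-- Assumed rule (reconstructed from the moves in the text): choosing dot i
-- (0-based) in a row of n crosses out dots i-1, i and i+1, leaving rows of
-- i-1 and n-i-2 dots (clamped at 0; empty rows are dropped).
rowMoves :: Int -> [[Int]]
rowMoves n =
  [ filter (> 0) [max 0 (i - 1), max 0 (n - i - 2)] | i <- [0 .. n - 1] ]

-- Minimum excludant: the least non-negative integer missing from the list.
mex :: [Int] -> Int
mex xs = until (`notElem` xs) (+ 1) 0

-- nimValues !! n is the Nim-value of one row of n dots.  The list is lazy,
-- so each entry is computed once, from the entries for shorter rows.
nimValues :: [Int]
nimValues = map value [0 ..]
  where
    value n = mex (map positionValue (rowMoves n))

-- A sum (#) of independent rows has the xor of the rows' Nim-values.
positionValue :: [Int] -> Int
positionValue = foldr (xor . (nimValues !!)) 0

-- A circle of n >= 3 dots has only one kind of move: to a row of n - 3.
circleValue :: Int -> Int
circleValue n = mex [nimValues !! (n - 3)]

-- Every position reachable in one move from a sum of rows.
positionMoves :: [Int] -> [[Int]]
positionMoves rows =
  [ before ++ rest ++ after
  | k <- [0 .. length rows - 1]
  , (before, r : after) <- [splitAt k rows]
  , rest <- rowMoves r
  ]

-- Good moves: those that leave a position with Nim-value 0.
goodMoves :: [Int] -> [[Int]]
goodMoves = filter ((== 0) . positionValue) . positionMoves
```

With these definitions, goodMoves [16] contains the two replies [14] and [6,7] mentioned earlier (splits show up in both orders), and goodMoves [7,9,10] contains [4,9,10] and [2,2,9,10] along with some moves in the other components.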


[Other articles in category /math] permanent link

Thu, 18 Sep 2008

Return return
Among the things I read during the past two months was the paper Functional Programming with Overloading and Higher-Order Polymorphism, by Mark P. Jones. I don't remember why I read this, but it sure was interesting. It is an introduction to the new, cool features of Haskell's type system, with many examples. It was written in 1995 when the features were new. They're no longer new, but they are still cool.

There were two different pieces of code in this paper that wowed me. When I started this article, I was planning to write about #2. I decided that I would throw in a couple of paragraphs about #1 first, just to get it out of the way. This article is that couple of paragraphs.

[ Addendum 20080917: Here's the article about #2. ]

Suppose you have a type that represents terms over some type v of variable names. The v type is probably strings but could possibly be something else:

	data Term v = TVar v                -- Type variable
	            | TInt                  -- Integer type
	            | TString               -- String type
		    | Fun (Term v) (Term v) -- Function type
There's a natural way to make the Term type constructor into an instance of Monad:

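        instance Monad Term where
            return v       = TVar v
            TVar v   >>= f = f v
            TInt     >>= _ = TInt
            TString  >>= _ = TString
            Fun d r  >>= f = Fun (d >>= f) (r >>= f)

        -- Modern GHC also wants these two instances
        -- (import ap and liftM from Control.Monad):
        instance Functor Term where fmap = liftM
        instance Applicative Term where { pure = TVar; (<*>) = ap }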
That is, the return operation just lifts a variable name to the term that consists of just that variable, and the bind operation just maps its argument function over the variable names in the term, leaving everything else alone.

Jones wants to write a function, unify, which performs a unification algorithm over these terms. Unification answers the question of whether, given two terms, there is a third term that is an instance of both. For example, consider the two terms a → Int and String → b, which are represented by Fun (TVar "a") TInt and Fun TString (TVar "b"), respectively. These terms can be unified, since the term String → Int is an instance of both; one can assign a = TString and b = TInt to turn both terms into Fun TString TInt.

The result of the unification algorithm should be a set of these bindings, in this example saying that the input terms can be unified by replacing the variable "a" with the term TString, and the variable "b" with the term TInt. This set of bindings can be represented by a function that takes a variable name and returns the term to which it should be bound. The function will have type v → Term v. For the example above, the result is a function which takes "a" and returns TString, and which takes "b" and returns TInt. What should this function do with variable names other than "a" and "b"? It should say that the variable named "c" is "replaced" by the term TVar "c", and similarly other variables. Given any other variable name x, it should say that the variable x is "replaced" by the term TVar x.

The unify function will take two terms and return one of these substitutions, where the substitution is a function of type v → Term v. So the unify function has type:

    unify :: Term v → Term v → (v → Term v)
Oh, but not quite. Because unification can also fail. For example, if you try to unify the terms a → b and Int, represented by Fun (TVar "a") (TVar "b") and TInt respectively, the unification should fail, because there is no term that is an instance of both of those; one represents a function and the other represents an integer. So unify does not actually return a substitution of type v → Term v. Rather, it returns a monad value, which might contain a substitution, if the unification is successful, and otherwise contains an error value. To handle the example above, the unify function will contain a case like this:

	unify	TInt	(Fun _ _) = fail ("Cannot unify" ....)
It will fail because it is not possible to unify functions and integers.

If unification is successful, then instead of using fail, the unify function will construct a substitution and then return it with return. Let's consider the result of unifying TInt with TInt. This unification succeeds, and produces a trivial substitution with no bindings. Or more precisely, every variable x should be "replaced" by the term TVar x. So in this case the substitution returned by unify should be the trivial one, a function which takes x and returns TVar x for all variable names x.

But we already have such a function. This is just what we decided that Term's return function should do, when we were making Term into a monad. So in this case the code for unify is:

	unify	TInt	TInt	  = return return
Yep, in this case the unify function returns the return function.

Wheee!

At this point in the paper I was skimming, but when I saw return return, I boggled. I went back and read it more carefully after that, you betcha.
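For readers who want to see return return inside a complete program, here's a minimal sketch of a whole unify in modern Haskell. It is not Jones's code: the occurs check, the way the two substitutions are composed in the Fun case, and the helper names vars and varBind are assumptions of this sketch. It works in any MonadFail, so with Maybe a failed unification comes back as Nothing.

```haskell
import Control.Monad (ap, liftM)

data Term v = TVar v                -- Type variable
            | TInt                  -- Integer type
            | TString               -- String type
            | Fun (Term v) (Term v) -- Function type
  deriving (Eq, Show)

instance Functor Term where
  fmap = liftM

instance Applicative Term where
  pure  = TVar                      -- so Term's return is TVar
  (<*>) = ap

instance Monad Term where
  TVar v  >>= f = f v
  TInt    >>= _ = TInt
  TString >>= _ = TString
  Fun d r >>= f = Fun (d >>= f) (r >>= f)

-- Variable names occurring in a term (used for the occurs check).
vars :: Term v -> [v]
vars (TVar v)  = [v]
vars (Fun d r) = vars d ++ vars r
vars _         = []

-- Sketch, not Jones's code: bind x to t, unless x occurs inside t.
varBind :: (Eq v, MonadFail m) => v -> Term v -> m (v -> Term v)
varBind x t
  | x `elem` vars t = fail "Cannot unify: occurs check"
  | otherwise       = return (\v -> if v == x then t else TVar v)

unify :: (Eq v, MonadFail m) => Term v -> Term v -> m (v -> Term v)
unify (TVar x) (TVar y) | x == y = return return  -- identity substitution
unify (TVar x) t                 = varBind x t
unify t (TVar x)                 = varBind x t
unify TInt TInt                  = return return
unify TString TString            = return return
unify (Fun d r) (Fun d' r')      = do
  s1 <- unify d d'
  s2 <- unify (r >>= s1) (r' >>= s1)
  return (\v -> s1 v >>= s2)      -- assumed composition: s1 first, then s2
unify _ _                        = fail "Cannot unify: mismatched constructors"
```

At type Maybe, unify (Fun (TVar "a") TInt) (Fun TString (TVar "b")) should give a substitution sending "a" to TString, "b" to TInt, and any other name x to TVar x, while unify TInt (Fun (TVar "a") (TVar "b")) gives Nothing.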

That's my couple of paragraphs. I was planning to get to this point and then say "But that's not what I was planning to discuss. What I really wanted to talk about was...". But I think I'll break with my usual practice and leave the other thing for tomorrow.

Happy Diada Nacional de Catalunya, everyone!

[ Addendum 20080917: Here's the article about the other thing. ]


[Other articles in category /prog] permanent link

data Mu f = In (f (Mu f))
Last week I wrote about one of two mindboggling pieces of code that appears in the paper Functional Programming with Overloading and Higher-Order Polymorphism, by Mark P. Jones. Today I'll write about the other one. It looks like this:

        data Mu f = In (f (Mu f))                       -- (???)
I bet a bunch of people reading this on Planet Haskell are nodding and saying "Oh, that!"

When I first saw this I couldn't figure out what it was saying at all. It was totally opaque. I still have trouble recognizing in Haskell what tokens are types, what tokens are type constructors, and what tokens are value constructors. Code like (???) is unusually confusing in this regard.

Normally, one sees something like this instead:

        data Maybe f = Nothing | Just f
Here f is a type variable; that is, a variable that ranges over types. Maybe is a type constructor, which is like a function that you can apply to a type to get another type. The most familiar example of a type constructor is List:

        data List e = Nil | Cons e (List e)
Given any type f, you can apply the type constructor List to f to get a new type List f. For example, you can apply List to Int to get the type List Int. (The Haskell built-in list type constructor goes by the funny name of [], but works the same way. The type [Int] is a synonym for ([] Int).)

Actually, type names are type constructors also; they're argumentless type constructors. So we have type constructors like Int, which take no arguments, and type constructors like List, which take one argument. Haskell also has type constructors that take more than one argument. For example, Haskell has a standard type constructor called Either for making union types:

        data Either a b = Left a | Right b;
Then the type Either Int String contains values like Left 37 and Right "Cotton Mather".

To keep track of how many arguments a type constructor has, one can consider the, ahem, type, of the type constructor. But to avoid the obvious looming terminological confusion, the experts use the word "kind" to refer to the type of a type constructor. The kind of List is * → *, which means that it takes a type and gives you back a type. The kind of Either is * → * → *, which means that it takes two types and gives you back a type. Well, actually, it is curried, just like regular functions are, so that Either Int is itself a type constructor of kind * → * which takes a type a and returns a type which could be either an Int or an a. The nullary type constructor Int has kind *.

Continuing the "Maybe" example above, f is a type, or a constructor of kind *, if you prefer. Just is a value constructor, of type f → Maybe f. It takes a value of type f and produces a value of type Maybe f.

Now here is a crucial point. In declarations of type constructors, such as these:

        data Either a b = ...
        data List e = ...
        data Maybe f = ...
the type variables a, b, e, and f actually range over type constructors, not over types. Haskell can infer the kinds of the type constructors Either, List, and Maybe, and also the kinds of the type variables, from the definitions on the right of the = signs. In this case, it concludes that all four variables must have kind *, and so really do represent types, and not higher-order type constructors. So you can't ask for Either Int List because List is known to have kind * → *, and Haskell needs a type constructor of kind * to serve as an argument to Either.

But with a different definition, Haskell might infer that a type variable has a higher-order kind. Here is a contrived example, which might be good for something, perhaps. I'm not sure:

        data TyCon f = ValCon (f Int)
This defines a type constructor TyCon with kind (* → *) → *, which can be applied to any type constructor f that has kind * → *, to yield a type. What new type? The new type TyCon f is isomorphic to the type f Int. For example, TyCon List is basically the same as List Int. The value Just 37 has type Maybe Int, and the value ValCon (Just 37) has type TyCon Maybe.

Similarly, the value [1, 2, 3] has type [Int], which, you remember, is a synonym for [] Int. And the value ValCon [1, 2, 3] has type TyCon [].
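Both examples can be checked directly. A small sketch (viaMaybe, viaList and unTyCon are names made up for illustration); asking GHCi for :kind TyCon shows the higher-order kind it infers:

```haskell
data TyCon f = ValCon (f Int)

-- f = Maybe: a TyCon Maybe wraps a Maybe Int.
viaMaybe :: TyCon Maybe
viaMaybe = ValCon (Just 37)

-- f = []: a TyCon [] wraps an [Int].
viaList :: TyCon []
viaList = ValCon [1, 2, 3]

-- Unwrapping gives back the f Int inside; together with ValCon this is
-- the isomorphism between TyCon f and f Int.
unTyCon :: TyCon f -> f Int
unTyCon (ValCon x) = x
```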

Now that the jargon is laid out, let's look at (???) again:

        data Mu f = In (f (Mu f))                       -- (???)
When I was first trying to get my head around this, I had trouble seeing what the values were going to be. It looks at first like it has no bottom. The token f here, like in the TyCon example, is a variable that ranges over type constructors with kind * → *, so could be List or Maybe or [], something that takes a type and yields a new type. Mu itself has kind (* → *) → *, taking something like f and yielding a type. But what's an actual value? You need to apply the value constructor In to a value of type f (Mu f), and it's not immediately clear where to get such a thing.

I asked on #haskell, and Cale Gibbard explained it very clearly. To do anything useful you first have to fix f. Let's take f = Maybe. In that particular case, (???) becomes:

        data Mu Maybe = In (Maybe (Mu Maybe))
So the In value constructor will take a value of type Maybe (Mu Maybe) and return a value of type Mu Maybe. Where do we get a value of type Maybe (Mu Maybe)? Oh, no problem: the value Nothing is polymorphic, and has type Maybe a for all a, so in particular it has type Maybe (Mu Maybe). Whatever Maybe (Mu Maybe) is, it is a Maybe-type, so it has a Nothing value. So we do have something to get started with.

Since Nothing is a Maybe (Mu Maybe) value, we can apply the In constructor to it, yielding the value In Nothing, which has type Mu Maybe. Then applying Just, of type a → Maybe a, to In Nothing, of type Mu Maybe, produces Just (In Nothing), of type Maybe (Mu Maybe) again. We can repeat the process as much as we want and produce as many values of type Mu Maybe as we want; they look like these:

        In Nothing
        In (Just (In Nothing))
        In (Just (In (Just (In Nothing))))
        In (Just (In (Just (In (Just (In Nothing))))))
        ...
And that's it, that's the type Mu Maybe, the set of those values. It will look a little simpler if we omit the In markers, which don't really add much value. We can just agree to omit them, or we can get rid of them in the code by defining some syntactic sugar:

        nothing = In Nothing
        just = In . Just
Then the values of Mu Maybe look like this:
        nothing
        just nothing
        just (just nothing)
        just (just (just nothing))
        ...
It becomes evident that what the Mu operator does is to close the type under repeated application. This is analogous to the way the fixpoint combinator works on values. Consider the usual definition of the fixpoint combinator:

        Y f = f (Y f)
Here f is a function of type a → a. Y f is a fixed point of f. That is, it is a value x of type a such that f x = x. (Put x = Y f in the definition to see this.)

The fixed point of a function f can be computed by considering the limit of the following sequence of values:


f(⊥)
f(f(⊥))
f(f(f(⊥)))
...

This actually finds the least fixed point of f, for a certain definition of "least". For some functions f, like x → x + 1, this finds the uninteresting fixed point ⊥, but for others, like x → λ n. if n = 0 then 1 else n * x(n - 1), it's something better.
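Transcribing Y directly into Haskell gives a working fixed-point combinator, because lazy evaluation unrolls f (Y f) only as far as it is needed. A small sketch (the name has to be lowercase y in Haskell; the standard library's version is Data.Function.fix):

```haskell
-- Y f = f (Y f), transcribed directly.
y :: (a -> a) -> a
y f = f (y f)

-- The functional from the text: given an approximation x to factorial,
-- it produces a better one.  Its least fixed point is factorial itself.
factorial :: Integer -> Integer
factorial = y (\x n -> if n == 0 then 1 else n * x (n - 1))

-- By contrast, y (\x -> x + 1 :: Integer) has only the fixed point ⊥:
-- evaluating it never produces a value.
```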

Mu is analogous to Y. Instead of operating on a function f from values to values, and producing a single fixed-point value, it operates on a type constructor f from types to types, and produces a fixed-point type. The resulting type T is the least fixed point of the type constructor f, the smallest set of values such that f T = T.

Consider the example of f = Maybe again. We want to find a type T such that T = Maybe T. Consider the following sequence:

{ ⊥ }
Maybe { ⊥ }
Maybe(Maybe { ⊥ })
Maybe(Maybe(Maybe { ⊥ }))
...

The first item is the set that contains nothing but the bottom value, which we might call t0. But t0 is not a fixed point of Maybe, because Maybe { ⊥ } also contains Nothing. So Maybe { ⊥ } is a different type from t0, which we can call t1 = { Nothing, ⊥ }.

The type t1 is not a fixed point of Maybe either, because Maybe t1 evidently contains both Nothing and Just Nothing. Repeating this process, we find that the limit of the sequence is the type Mu Maybe = { ⊥, Nothing, Just Nothing, Just (Just Nothing), Just (Just (Just Nothing)), ... }. This type is fixed under Maybe.

It might be worth pointing out that this is not the only such fixed point, but it is the least fixed point. One can easily find larger types that are fixed under Maybe. For example, postulate a special value Q which has the property that Q = Just Q. Then Mu Maybe ∪ { Q } is also a fixed point of Maybe. But it's easy to see (and to show, by induction) that any such fixed point must be a superset of Mu Maybe. Further consideration of this point might take me off to co-induction, paraconsistent logic, Peter Aczel's nonstandard set theory, and I'd never get back again. So let's leave this for now.

So that's what Mu really is: a fixed-point operator for type constructors. And having realized this, one can go back and look at the definition and see that oh, that's precisely what the definition says, how obvious:

              Y f =     f  (Y f)             -- ordinary fixed-point operator
        data Mu f = In (f (Mu f))            -- (???)
Given f, a function from values to values, Y(f) calculates a value x such that x = f(x). Given f, a function from types to types, Mu(f) calculates a type T such that f(T) = T. That's why the definitions are identical. (Except for that annoying In constructor, which really oughtn't to be there.)

You can use this technique to construct various recursive datatypes. For example, Mu Maybe turns out to be equivalent to the following definition of the natural numbers:

        data Number = Zero | Succ Number;
Notice the structural similarity with the definition of Maybe:

        data Maybe a = Nothing | Just a;
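The claimed equivalence can be checked by writing the conversions in both directions. A small sketch (toNumber and fromNumber are illustrative names; nothing and just are the sugar defined earlier, repeated so the block stands alone):

```haskell
data Mu f = In (f (Mu f))

data Number = Zero | Succ Number
  deriving (Eq, Show)

-- The sugar from earlier in the article.
nothing :: Mu Maybe
nothing = In Nothing

just :: Mu Maybe -> Mu Maybe
just = In . Just

-- Each In (Just ...) layer corresponds to one Succ.
toNumber :: Mu Maybe -> Number
toNumber (In Nothing)  = Zero
toNumber (In (Just m)) = Succ (toNumber m)

fromNumber :: Number -> Mu Maybe
fromNumber Zero     = nothing
fromNumber (Succ n) = just (fromNumber n)
```

So just (just nothing) corresponds to Succ (Succ Zero), and converting there and back again returns what you started with.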
One can similarly define lists:

        data Mu f = In (f (Mu f)) 
        data ListX a b = Nil | Cons a b deriving Show
        type List a = Mu (ListX a)

        -- syntactic sugar
        nil :: List a
        nil = In Nil
        cons :: a -> List a -> List a
        cons x y = In (Cons x y)

        -- for example
        ls = cons 3 (cons 4 (cons 5 nil))          -- :: List Integer
        lt = (cons 'p' (cons 'y' (cons 'x' nil)))  -- :: List Char
Or you could similarly do trees, or whatever. Why one might want to do this is a totally separate article, which I am not going to write today.
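Trees go the same way. A minimal sketch, assuming binary trees with values at the nodes (TreeX, leaf, node and inorder are names invented for this example):

```haskell
data Mu f = In (f (Mu f))

-- One layer of a binary tree; b marks the places where subtrees go.
data TreeX a b = Leaf | Node b a b

type Tree a = Mu (TreeX a)

-- Sugar, in the same style as nil and cons.
leaf :: Tree a
leaf = In Leaf

node :: Tree a -> a -> Tree a -> Tree a
node l x r = In (Node l x r)

-- In-order traversal, to check that the structure is what we expect.
inorder :: Tree a -> [a]
inorder (In Leaf)         = []
inorder (In (Node l x r)) = inorder l ++ [x] ++ inorder r

example :: Tree Int
example = node (node leaf 1 leaf) 2 (node leaf 3 leaf)
```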

Here's the point of today's article: I find it amazing that Haskell's type system is powerful enough to allow one to define a fixed-point operator for functions over types.

We've come a long way since FORTRAN, that's for sure.

A couple of final, tangential notes: A Google search for "Mu f = In (f (Mu f))" turns up relatively few hits, but each hit is extremely interesting. If you're trying to preload your laptop with good stuff to read on a plane ride, downloading these papers might be a good move.

The Peter Aczel thing seems to be less well-known than it should be. It is a version of set theory that allows coinductive definitions of sets instead of inductive definitions. In particular, it allows one to have a set S = { S }, which standard set theory forbids. If you are interested in co-induction you should take a look at this. You can find a clear explanation of it in Barwise and Etchemendy's book The Liar (which I have read) and possibly also in Aczel's book Non Well-Founded Sets (which I haven't read).


[Other articles in category /prog] permanent link

Tue, 09 Sep 2008

Factorials are not quite as square as I thought
(This is a followup to yesterday's article.)

Let s(n) be the smallest perfect square larger than n. Then to have n! = a² - 1 we must have a² = s(n!), and in particular s(n!) - n! must be a square.

This actually occurs for n in { 4, 5, 6, 7, 8, 9, 10, 11 }, and since 11 was as far as I got on the lunch line yesterday, I had an exaggerated notion of how common it is. Had I worked out another example, I would have realized that after n=11 things start going wrong. The value of s(12!) is 21887², but 21887² - 12! = 39169, and 39169 is not a square. (In fact, the n=11 solution is quite remarkable; I will discuss it at the end of this note.)

So while there are (of course) solutions to 12! = a² - b², including ones where b is small compared to a, as I said, the smallest such b takes a big jump between 11 and 12. For 4 ≤ n ≤ 11, the minimal b takes the values 1, 1, 3, 1, 9, 27, 15, 18. But for n = 12, the solution with the smallest b has b = 288.

Calculations with Mathematica by Mitch Harris show that one has n! = s(n!) - b² only for n in {1, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 16}, and then not for any other n under 1,000. My estimate of the likelihood of another solution to n! = a² - 1, which was already not very high, has just dropped precipitously.
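The small cases are easy to reproduce. Here's a minimal sketch in Haskell (isqrt, s and minimalB are illustrative names); the square root uses Newton's method on exact integers rather than floating point, so large factorials are handled without rounding trouble:

```haskell
-- Integer square root: the largest r with r * r <= n, by Newton's method.
isqrt :: Integer -> Integer
isqrt 0 = 0
isqrt n = go n
  where
    go x = let x' = (x + n `div` x) `div` 2
           in if x' >= x then x else go x'

isSquare :: Integer -> Bool
isSquare m = m >= 0 && isqrt m ^ 2 == m

factorial :: Integer -> Integer
factorial n = product [1 .. n]

-- s(n): the smallest perfect square larger than n.
s :: Integer -> Integer
s n = (isqrt n + 1) ^ 2

-- The smallest b >= 0 with n! = a^2 - b^2, trying a = isqrt (n!), then
-- isqrt (n!) + 1, ... until a^2 - n! is a square.  Only use this for
-- n >= 4: for n = 2 and 3, n! is 2 mod 4, so no solution exists and the
-- search would never stop.
minimalB :: Integer -> Integer
minimalB n = head [ isqrt d | a <- [isqrt f ..], let d = a * a - f, d >= 0, isSquare d ]
  where
    f = factorial n
```

Here map minimalB [4 .. 11] gives 1, 1, 3, 1, 9, 27, 15, 18, and minimalB 12 gives 288. Filtering [1 .. 30] for n with factorial n + 1 a perfect square leaves just 4, 5 and 7.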

My thanks to M. Harris, and also to Stephen Dranger, who also wrote in with the results of calculations.

Having gotten this far, I then asked OEIS about the sequence 1, 1, 3, 1, 9, 27, 15, 18, and (of course) was delivered a summary of the current state of the art in n! = a² - 1. Here's my summary of the summary.

The question is known as "Brocard's problem", and was posed by Brocard in 1876. No solutions are known with n > 7, and it is known that if there is a solution, it must have n > 10⁹. According to the Mathworld article on Brocard's problem, it is believed to be "virtually certain" that there are no other solutions.

The calculations for n ≤ 10⁹ are described in this unpublished paper of Berndt and Galway, which I found linked from the Mathworld article. The authors also investigated solutions of n! = a² - b² for various fixed b between 2 and 50, and found no solutions with 12 ≤ n ≤ 10⁵ for any of them. The most interesting was the 11! = 6318² - 18² I mentioned already.

[ The original version of this article contained some confusion about whether s(n) was the largest square less than n, or the largest number whose square was less than n. Thanks to Roie Marianer for pointing out the error. ]


[Other articles in category /math] permanent link

Factorials are almost, but not quite, square
This weekend I happened to notice that 7! = 71² - 1. Is this a strange coincidence? Well, not exactly, because it's not hard to see that

$$n! = a^{2} - b^{2}\qquad (*)$$

will always have solutions where b is small compared to a. For example, we have 11! = 6318² - 18².

But to get b=1 might require a lot of luck, perhaps more luck than there is. (Jeremy Kahn once argued that |2ˣ - 3ʸ| = 1 could have no solutions other than the obvious ones, essentially because it would require much more fabulous luck than was available. I sneered at this argument at the time, but I have to admit that there is something to it.)

Anyway, back to the subject at hand. Is there an example of n! = a² - 1 with n > 7? I haven't checked yet.

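In related matters, it's rather easy to show that there are no nontrivial examples with b=0: by Bertrand's postulate, for n ≥ 2 there is a prime p with n/2 < p ≤ n, and such a p divides n! exactly once, so n! cannot be a perfect square.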

It would be pretty cool to show that equation (*) implied n = O(f(b)) for some function f, but I would not be surprised to find out that there is no such bound.

This kept me amused for twenty minutes while I was in line for lunch, anyway. Incidentally, on the lunch line I needed to estimate √11. I described in an earlier article how to do this. Once again it was a good trick, the sort you should keep handy if you are the kind of person who needs to know √11 while standing in line on 33rd Street. Here's the short summary: √11 = √(99/9) = √((100 - 1)/9) = √((100/9)(1 - 1/100)) = (10/3)√(1 - 1/100) ≈ (10/3)(1 - 1/200) = (10/3)(199/200) = 199/60.
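The same arithmetic can be checked exactly with Haskell's Rational type. A small sketch; the names estimate and relativeError are made up for the example:

```haskell
-- The lunch-line estimate: sqrt 11 = (10/3) * sqrt (1 - 1/100), and
-- sqrt (1 - x) is approximately 1 - x/2 when x is small.
estimate :: Rational
estimate = (10 / 3) * (1 - (1 / 100) / 2)   -- works out to 199/60

-- How far off the estimate is, relative to the floating-point sqrt 11.
relativeError :: Double
relativeError = abs (fromRational estimate - sqrt 11) / sqrt 11
```

The relative error comes out at roughly 0.001%, which matches the size of the first neglected term of the expansion, x²/8 with x = 1/100.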

[ Addendum 20080909: There is a followup article. ]


[Other articles in category /math] permanent link

Why pi?
Simon Cozens wrote to me yesterday to ask what the heck was up with π:

what property of a circle makes it . . . an irrational number. . . perhaps about as arbitrary a number as you can get.
I thought about this pretty hard, and, to my amazement, I came up with a plausible answer. So here we are.

The one-paragraph summary: My theory is that the association of the very weird and complex number π with a geometric object as simple as a circle is a reflection of the underlying fundamental complexity of Euclidean geometry: specifically, that its metric is a nonlinear function.

First I'm going to spend some time arguing that π does require explanation. I expect that almost everyone will agree that π is weird; if you do agree, feel free to skip this section. Then I'll discuss Euclidean and non-Euclidean geometries. This is important, because the relation between π and circles appears to be a special property of Euclidean geometry, one which does not occur, for example, in spherical geometry. Finally, I'll look at the essential properties of Euclidean geometry, and why I think it is more complex than people usually realize.

π is complex and bizarre

In this section, I'm going to argue that the question is indeed worth asking. π is an extremely peculiar number, even by mathematical standards. You often hear π mentioned in the same breath with e, another constant of fundamental mathematical importance. But e is much more tractable than π is, and much better understood.

In fact, the degree to which π is not understood is rather shocking when you consider its ubiquity.

If you don't need to be persuaded that π is unusually weird, even as transcendental numbers go, you may want to skip to the next section. Really, this section is here to address people who think they know more mathematics than they do, who want to argue that π is no more or less complicated than any other number. But I think it is.

It should be fairly clear that, as a representation of real numbers, decimal fractions are not very satisfactory. For example, you might like simple numbers to have simple representations. But the representation of 1/3 is 0.33333...., which isn't even finite. The fact that a complicated number like 3674/31250 ("0.117568") has a simpler representation than a simple number like 1/3 just demonstrates that the system is defective. 3674/31250 gets a simple representation not because it is a simple number, but because 31250 happens to divide 10⁶.

This being the case, it is perhaps not too surprising that nobody can make head or tail of the representation of π, which is 3.14159265358979... . As far as I know, the state of our current understanding of this representation of π can be summed up as:

None

But that might just be the fault of the representation. The representation is based on the number 10, and it is not clear that π has anything to do with 10, so our failure to find an answer here may just indicate that the question was not worth asking.

There are better representations of real numbers; one such is the so-called "continued fraction representation". I don't want to explain this in detail in this article, but I can refer you to a talk I gave on the subject. But an itemization of this representation's desirable properties may be persuasive even if you don't know how it works:

  • In continued fraction notation, a number has a finite representation when, and only when, it is rational.

  • The representation is not inappropriately snuggly with the number 10, or with any other number.

  • Simple rationals have simple representations and more complicated rationals have more complicated representations. For example, 1/3 is represented as [0; 3] and 3674/31250 is represented as [0; 8, 1, 1, 43, 4, 5].

  • Some irrational numbers have simple (although infinite) representations. For example, in the customary system, √2 is an incomprehensible soup of digits starting 1.414213562... . In the continued fraction system, it is [1; 2, 2, 2, 2, ...].

  • If you turn an irrational number into a rational one by chopping off the infinite tail of the continued fraction representation, you get a very closely-related rational result, one that is as close to the original number as any fraction with a denominator that small can be. This is not true of the decimal fraction. If you chop off √2 after a couple of terms of decimal fraction, you get 1.41, which is 141/100. If you chop off √2 after a couple of terms of continued fraction, you get [1; 2, 2], which is 7/5. This is less accurate than 141/100, but the denominator is twenty times smaller. If you chop a little later, you get [1; 2, 2, 2], which is 17/12, which is more accurate than 141/100, even though the denominator is only 12.
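For rationals, the continued fraction is easy to compute: take the integer part, then repeat on the reciprocal of what is left. A minimal sketch (cf and fromCF are names made up for this example):

```haskell
import Data.Ratio ((%))

-- The continued fraction terms [a0; a1, a2, ...] of a non-negative rational.
-- Peel off the integer part, then repeat on the reciprocal of the rest.
cf :: Rational -> [Integer]
cf x
  | r == 0    = [q]
  | otherwise = q : cf (1 / r)
  where
    (q, r) = properFraction x

-- Evaluate a finite continued fraction back to the rational it denotes.
fromCF :: [Integer] -> Rational
fromCF []       = error "fromCF: empty continued fraction"
fromCF [a]      = fromInteger a
fromCF (a : as) = fromInteger a + 1 / fromCF as

example :: [Integer]
example = cf (3674 % 31250)   -- [0, 8, 1, 1, 43, 4, 5]
```

Here fromCF [1, 2, 2] is 7/5 and fromCF [1, 2, 2, 2] is 17/12. Applied to toRational (sqrt 2 :: Double), cf begins 1, 2, 2, 2, ..., though only the first several terms can be trusted, since a Double is itself just a rational approximation.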

So maybe our problems with π will be solved by considering its continued fraction representation, which we might hope would be simple and tractable. Sometimes this works, as with √2. The decimal expansion of e is incomprehensible (2.7182818284590452...) but it has a very nice continued fraction representation: [2; 1, 2, 1, 1, 4, 1, 1, 6, 1, 1, 8, 1, 1, 10, 1, ...].

But it doesn't work for π. The continued fraction representation of π is [3; 7, 15, 1, 292, 1, 1, 1, 2, 1, 3, 1, 14, ...], and as far as I know, the state of our current understanding of this representation of π can be summed up as:

None

So much for continued fractions.

We might hope for some understanding about why π is irrational. The proof that √2 is irrational is elementary, and dates back to the Greeks; you can understand it as being related to the fact that 2 is not a perfect square. π was not shown to be irrational until 1761, and the proof is not simple, which means that nobody knows a simple argument about why it should be the complicated thing it is, rather than a simple fraction.

So π is complex and poorly-understood, even compared with other important transcendental numbers like e.

Euclidean geometry

M. Cozens asked:
Is pi inherent in our definition of a circle, or our particular geometry, or our planet, or could the ratio be different in different worlds?
I think this question is very insightful. π, at least as it relates to circles, is inherent in a particular geometry, namely, Euclidean geometry.

Euclidean geometry takes its name from Euclid, who wrote Elements, an extremely influential treatise on geometry, about 2300 years ago. Most of the Elements is concerned with plane geometry, which takes place in an infinite, flat two-dimensional space. π arises naturally in this kind of space as the ratio of a circle's perimeter to its diameter.

Non-Euclidean geometry

In non-Euclidean spaces, π is geometrically much less important. For example, consider geometry done on a sphere. If we keep the definition of "line" as the path of shortest distance between two points, then our "lines" turn out to be great circles on the sphere—that is, circles whose centers are at the center of the sphere; the equator is an example. The 49th parallel is not a "line" because it's not a path of shortest distance; for any two points on the 49th parallel, there's a path between them over the surface of the sphere that is shorter than the one along the parallel. (This may seem strange, but it's true, and it's why direct flights from New York to Taipei often stop off in Anchorage, Alaska.) In addition to being a line, the equator is also an example of a circle. What's a circle? Circles look pretty much the way you expect them to. A circle is the set of all points that are some fixed distance from a center point. The equator is all the points that are a certain fixed distance from the north pole. The 49th parallel is also a circle; its center is also the north pole.

The "diameter" of a circle is the longest possible "line" you can draw from one point on the circle to another. A diameter of a circle has the property that it always goes through the center of the circle, as you would hope and expect. For the equator, a diameter goes through the north pole. The picture to the right shows the equator in red, its center, the north pole, in yellow, and a diameter of the equator in blue. The 49th parallel is in green.

Let's say that the circumference of the equator, the red line, is 1. Then the length of the equator's diameter, the blue line, is 1/2. If we were expecting to divide the circumference by the diameter and get π, we are in for a surprise, because we just got 2 instead.

For the 49th parallel, the ratio of circumference to diameter is larger: I calculate about 2.88. For smaller circles, the ratio is larger still. For very small circles, the ratio is very close to π, because a small circle can't tell whether it's on a sphere or in a plane; up close the two things look the same.
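The numbers above follow from a short formula. On a unit sphere, take a circle of angular radius θ: the great-circle distance, in radians, from its center to each of its points. Its circumference is 2π sin θ and its diameter is 2θ, so the ratio is π sin θ / θ. The Java sketch below assumes that setup; the class name SphereCircles is illustrative. It reproduces the equator's 2 and the 49th parallel's ≈ 2.88, where θ = 90° − 49° = 41°.

```java
// Perimeter-to-diameter ratio of a circle drawn on a sphere (a sketch).
class SphereCircles {
    // On a unit sphere, a circle of angular radius theta (radians, measured
    // along great circles from its center) has circumference 2*pi*sin(theta)
    // and a great-circle diameter of 2*theta.
    static double ratio(double theta) {
        double circumference = 2 * Math.PI * Math.sin(theta);
        double diameter = 2 * theta;
        return circumference / diameter;
    }

    public static void main(String[] args) {
        // Equator: angular radius 90 degrees from the north pole.
        System.out.println("equator:       " + ratio(Math.toRadians(90)));
        // 49th parallel: angular radius 90 - 49 = 41 degrees.
        System.out.println("49th parallel: " + ratio(Math.toRadians(41)));
        // A tiny circle looks planar, so the ratio is very close to pi.
        System.out.println("tiny circle:   " + ratio(1e-4));
    }
}
```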

So the relation between π and circles is actually a special property of Euclidean geometry. Circles in non-Euclidean spaces have a perimeter-to-diameter ratio that is different from π.

Euclidean metric

The single fundamental property of Euclidean geometry is that the distance between two points, say $(x_1, y_1)$ and $(x_2, y_2)$, is $\sqrt{(x_2-x_1)^2 + (y_2-y_1)^2}$. (Or, in higher-dimensional spaces, the obvious extension of this formula.) If you change the way you measure distance, you get a different kind of geometry with different kinds of circles that have different perimeter-to-diameter ratios. In an earlier article, I discussed an alternative distance function, called the Manhattan distance, which gives diamond-shaped circles whose perimeter-to-diameter ratio is always 4.
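The Manhattan figure is easy to check. A Manhattan circle of radius r around the origin is the diamond with corners (r, 0), (0, r), (-r, 0), and (0, -r). Each side has Manhattan length 2r, so the perimeter is 8r and the diameter is 2r. This Java sketch measures the diamond with the Manhattan metric; the class and method names are illustrative.

```java
// Perimeter-to-diameter ratio of a circle under the Manhattan distance.
class ManhattanCircle {
    static double manhattan(double x1, double y1, double x2, double y2) {
        return Math.abs(x2 - x1) + Math.abs(y2 - y1);
    }

    // The Manhattan "circle" of radius r is a diamond; sum its four sides.
    static double ratio(double r) {
        double[][] corners = { {r, 0}, {0, r}, {-r, 0}, {0, -r} };
        double perimeter = 0;
        for (int i = 0; i < corners.length; i++) {
            double[] p = corners[i];
            double[] q = corners[(i + 1) % corners.length];
            perimeter += manhattan(p[0], p[1], q[0], q[1]);
        }
        // The longest distance between two points of the diamond is 2r.
        double diameter = manhattan(r, 0, -r, 0);
        return perimeter / diameter;
    }

    public static void main(String[] args) {
        System.out.println(ratio(1.0)); // 4.0
        System.out.println(ratio(2.5)); // 4.0
    }
}
```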

The Euclidean distance function is nothing more than the familiar Pythagorean theorem. It is very difficult for me to imagine any reasonable way to do plane geometry without the Pythagorean theorem. It is just too simple. Even the proof is simple:


Each blue triangle has area ab/2. The left-hand large square is made of four triangles and two smaller squares, for a total area of $4(ab/2) + a^2 + b^2$; the right-hand large square is made of four triangles and one smaller square, for a total area of $4(ab/2) + c^2$. Each large square has edges whose lengths are a + b, so the two large squares are the same size, and $4(ab/2) + a^2 + b^2 = 4(ab/2) + c^2$, or $a^2 + b^2 = c^2$. End of proof.

So the relationship of π (which is complicated) to circles (which appear to be simple) is grounded in the Euclidean distance formula. If you change the distance formula, π is no longer related to circles. So the weirdness must be due, at least in part, to some complexity in the Euclidean distance formula.

But what's complex about the Euclidean distance formula? How could it be simpler?

Actually, I think it only seems simple because it is so familiar. The Euclidean distance formula is, in some ways, deeply weird. I realized this a few months ago, but everyone I mentioned it to acted like I was insane. But now I'm pretty sure. I think the essence of the problem with π is that the Euclidean distance function is nonlinear in the two spatial coordinates x and y.

Nonlinearity of the Euclidean metric

Linear functions are very well-behaved. If F is a linear function, then F(a+b) = F(a) + F(b), which means that you can calculate the contributions of a and b independently of each other. To calculate F of some very complicated argument, you can break the argument into simple components and deal with them all separately. With a nonlinear function like the Euclidean distance function, which is the square root of a quadratic, you cannot do this; complex problems are not easily decomposable into simple ones.

For the Euclidean metric, it means that the horizontalness and verticalosity are not independent, but are tangled together and cannot be separated.

What do we really mean by the perimeter of a circle? The circle is the set of points (x, y) which are at distance 1 from the point (0,0). The only meaningful way I know to talk about the length of this set is to calculate it as a limit of an approximate polygonal path as the path gets more and more segments. So you are necessarily dragging in an infinite limiting process, and such processes are always complicated.
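Here is that limiting process carried out numerically. This Java sketch inscribes a regular n-gon in the unit circle and adds up the Euclidean lengths of its sides; the names are illustrative. The vertices are placed with cos and sin, which is a convenience and not part of the argument. The total approaches 2π ≈ 6.2832 as n grows, so the perimeter-to-diameter ratio approaches π.

```java
// Approximate the Euclidean perimeter of the unit circle by a polygonal path.
class PolygonPerimeter {
    static double perimeter(int n) {
        double total = 0;
        for (int k = 0; k < n; k++) {
            double a1 = 2 * Math.PI * k / n;
            double a2 = 2 * Math.PI * (k + 1) / n;
            double dx = Math.cos(a2) - Math.cos(a1);
            double dy = Math.sin(a2) - Math.sin(a1);
            total += Math.sqrt(dx * dx + dy * dy); // Euclidean length of one side
        }
        return total;
    }

    public static void main(String[] args) {
        for (int n : new int[] {4, 6, 100, 100000}) {
            double p = perimeter(n);
            System.out.println(n + " sides: perimeter " + p + ", ratio " + p / 2);
        }
    }
}
```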

If the distance function were linear, it wouldn't matter, because then you could treat the horizontal and vertical components separately, and when you did that, you would be dealing with paths in one dimension, which, being straight lines, would be simple. You can see this if you consider the Manhattan distance function: it doesn't matter how you get from $(x_1, y_1)$ to $(x_2, y_2)$. Whatever path you take, as long as it never doubles back, and whether you take a lot of steps or only one, the distance is always $|x_2-x_1| + |y_2-y_1|$, because the distance function is linear, and thus there is no interaction between the x parts and the y parts. But with a nonlinear distance function like the Euclidean metric, it does matter what path you take.
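A small experiment makes the path dependence concrete. The Java sketch below measures two paths from (0, 0) to (1, 1) under both metrics: a staircase of k east steps alternating with k north steps, and the straight diagonal. The names are illustrative. Under the Manhattan distance both paths have length 2. Under the Euclidean distance the staircase has length 2 however fine its steps, while the diagonal has length √2 ≈ 1.4142.

```java
import java.util.function.DoubleBinaryOperator;

// Compare path lengths under the Manhattan and Euclidean distances.
class PathLengths {
    // Each metric is expressed in terms of the displacement (dx, dy).
    static final DoubleBinaryOperator MANHATTAN = (dx, dy) -> Math.abs(dx) + Math.abs(dy);
    static final DoubleBinaryOperator EUCLIDEAN = (dx, dy) -> Math.sqrt(dx * dx + dy * dy);

    // Length of a staircase from (0,0) to (1,1): k east steps alternating with k north steps.
    static double staircase(int k, DoubleBinaryOperator dist) {
        double step = 1.0 / k, total = 0;
        for (int i = 0; i < k; i++) {
            total += dist.applyAsDouble(step, 0); // east
            total += dist.applyAsDouble(0, step); // north
        }
        return total;
    }

    // Length of the straight segment from (0,0) to (1,1).
    static double diagonal(DoubleBinaryOperator dist) {
        return dist.applyAsDouble(1, 1);
    }

    public static void main(String[] args) {
        for (int k : new int[] {1, 10, 1000}) {
            System.out.printf("k=%d: Manhattan %.4f, Euclidean %.4f%n",
                    k, staircase(k, MANHATTAN), staircase(k, EUCLIDEAN));
        }
        System.out.printf("diagonal: Manhattan %.4f, Euclidean %.4f%n",
                diagonal(MANHATTAN), diagonal(EUCLIDEAN));
    }
}
```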

I was thinking a few months ago about how peculiar this is. I cannot think of anything else that behaves this way. Suppose you have two jugs and you start filling them with milk. You find that to fill each jug separately requires one quart, but to fill both at once requires only 1.4142 quarts. Wouldn't that freak you out? But space does behave like that. To drive ten miles north takes a gallon of gas. To drive ten miles east takes a gallon of gas. North and east are perpendicular and should be completely independent of each other. To drive to a point ten miles north and ten miles east should require two gallons of gas. But if you drive straight there, it requires only 1.4142 gallons. How the heck did that happen?

I believe that this strange entanglement between north and east, two things one might have supposed were independent, is the ultimate root of what makes the circumference of a circle such a peculiar number. I was very pleased to have this confirmation that the entanglement between horizontal and vertical is strange and complex, because, as I mentioned before, when I tried to explain to people what I found strange about it, they thought I was nuts.

One-dimensional circles

My theory is that the peculiar length of a circle's perimeter is a result of the peculiar interaction between the otherwise apparently independent spatial dimensions in Euclidean space. If this theory is correct, we should expect that the corresponding perimeter in a one-dimensional space will not be peculiar. A one-dimensional Euclidean space, having only one dimension, has no strange interactions between independent directions. And indeed, this is the case! The perimeter of a one-dimensional circle does not involve π. It's simply 2; the "area" (which is really length) is 2r. You only get difficult numbers in spaces of at least 2 dimensions.

Why 3?

M. Cozens also asked me why the number came out to be around 3, rather than around 5 or 57, and there I was on much shakier ground. I did not have any clever insights, and all I could do was itemize a bunch of stuff that seemed to bear on the issue. It will probably appear here in a future article.

[ Addendum: Here it is. ]


[Other articles in category /math] permanent link

Thu, 17 Jul 2008

Closed file descriptors
I wasn't sure whether to file this in the /oops section. It is a mistake, and I spent a lot longer chasing the bug than I should have, given that it's actually a simple bug. But it isn't the kind of big conceptual screwup I like to feature in the /oops section. It concerns a program that I'll discuss in detail tomorrow. In the meantime, here's a stripped-down summary, and a stripped-down version of the code:

        my $command = shift;
        for my $file (@ARGV) {
          if ($file =~ /\.gz$/) {
            my $fh;
            unless (open $fh, "<", $file) {
              warn "Couldn't open $file: $!; skipping\n";
              next;
            }
            my $fd = fileno $fh;
            $file = "/proc/self/fd/$fd";
          }
        }

        exec $command, @ARGV;
        die "Couldn't run command '$command': $!\n";
The idea here is that this program, called z, will preprocess the arguments of some command, and then run the command with the modified arguments. For some of the command-line arguments, here the ones named *.gz, the original file will be replaced by the output of some file descriptor. In the example above, the descriptor is attached to the original file, which is pointless. But once this part of the program was working, I planned to change the code so that the descriptor would be attached to a pipe instead.

Having written something like this, I then ran a test, which failed:

% z cat foo.gz
cat: /proc/self/fd/3: No such file or directory
"Aha," I said instantly. "I know what is wrong. Perl set the close-on-exec flag on file descriptor 3."

You see, after a successful exec, the kernel will automatically close all file descriptors that have the close-on-exec flag set, before the exec'ed image starts running. Perl normally sets the close-on-exec flag on all open files except for standard input, standard output, and standard error. Actually it sets it on all open files whose file descriptor is greater than the value of $^F, but $^F defaults to 2.

So there is an easy fix for the problem: I just set $^F = 100000 at the top of the program. That is not the best solution, but it can be replaced with a better one once the program is working properly. Which I expected it would be:

% z cat foo.gz
cat: /proc/self/fd/3: No such file or directory
Huh, something is still wrong.

Maybe I misspelled /proc/self/fd? No, it is there, and contains the special files that I expected to find.

Maybe $^F did not work the way I thought it did? I checked the manual, but it looked okay.

Nevertheless I put in use Fcntl and used the fcntl function to remove the close-on-exec flags explicitly. The code to do that looks something like this:

    use Fcntl;

    ....

    my $flags = fcntl($fh, F_GETFD, 0);
    fcntl($fh, F_SETFD, $flags & ~FD_CLOEXEC);
And I tried it again:

% z cat foo.gz
cat: /proc/self/fd/3: No such file or directory
Huh.

I then wasted a lot of time trying to figure out an easy way to tell if the file descriptor was actually open after the exec call. (The answer turns out to be something like this: perl -MPOSIX=fstat -le 'print "file descriptor 3 is ", fstat(3) ? "open" : "closed"'.) This told me whether the error from cat meant what I thought it meant. It did: descriptor 3 was indeed closed after the exec.

Now your job is to figure out what is wrong. It took me a shockingly long time. No need to email me about it; I have it working now. I expect that you will figure it out faster than I did, but I will also post the answer on the blog tomorrow. Sometime on Friday, 21 March 2008, this link will start working and will point to the answer.

[ Addendum 20080321: I posted the answer. ]


[Other articles in category /prog/perl] permanent link

Sat, 12 Jul 2008

runN revisited
Exactly one year ago I discussed runN, a utility that I invented for running the same command many times, perhaps in parallel. The program continues to be useful to me, and now Aaron Crane has reworked it and significantly improved the interface. I found his discussion enlightening. He put his finger on a lot of problems that had been bothering me that I had not quite been able to pin down.

Check it out. Thank you, M. Crane.


[Other articles in category /prog] permanent link

Another useful utility
Every couple of years I get a good idea for a simple utility that will make my life easier. Last time it was the following triviality, which I call f:

	#!/usr/bin/perl

	my $field = shift or usage();
	$field -= 1 if $field > 0;
	$|=1;

	while (<>) {
		chomp;
		my @f = split;
		print $f[$field], "\n";
	}

	sub usage {
		print STDERR "$0 fieldnumber\n"; 
		exit 1;
	}
I got tired of writing awk '{print $11}' when I wanted to extract the 11th field of some stream of data in a Unix pipeline, which is something I do about six thousand times a day. So I wrote this tiny thing. It was probably the most useful piece of software I wrote in that calendar year, and as you can see from the length, it certainly had the best cost-to-benefit ratio. I use it every day.

The point here is that you can replace awk '{print $11}' with just f 11. For example, f 11 access_log finds out the referrer URLs from my Apache httpd log. I also frequently use f -1, which prints the last field in each line. ls -l | grep '^l' | f -1 prints out the targets of all the symbolic links in the current directory.
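For readers who don't use Perl, here is roughly what f does, as a Java sketch; the class and method names are hypothetical. It mimics Perl's argument-less split by trimming surrounding whitespace and splitting on runs of whitespace. A positive n counts fields from 1, and a negative n counts back from the end. A missing field produces an empty string, much as the Perl version prints an empty line. (n = 0 is not meaningful; the Perl version rejects it.)

```java
// Extract one whitespace-separated field from a line, in the manner of f.
class Field {
    // n > 0 counts from the start (1 = first field); n < 0 counts from the end
    // (-1 = last field). Returns "" when the line has no such field.
    static String field(String line, int n) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) return "";
        String[] fields = trimmed.split("\\s+");
        int i = n > 0 ? n - 1 : fields.length + n;
        return (i >= 0 && i < fields.length) ? fields[i] : "";
    }

    public static void main(String[] args) {
        String line = "alpha beta   gamma delta";
        System.out.println(field(line, 2));   // beta
        System.out.println(field(line, -1));  // delta
    }
}
```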

Programs like this won't win me any prizes, but they certainly are useful.

Anyway, today's post was inspired by another tiny utility, which I just finished and which I expect will be similarly useful. It's called runN:

	#!/usr/bin/perl

	use Getopt::Std;
	my %opt;
	getopts('rn:c:v', \%opt) or usage();
	$opt{n} or usage();
	$opt{c} or usage();

	@ARGV = shuffle(@ARGV) if $opt{r};

	my $N = $opt{n};
	my %pid;
	while (@ARGV) {
	  if (keys(%pid) < $N) {
	    $pid{spawn($opt{c}, split /\s+/, shift @ARGV)} = 1;
	  } else {
	    delete $pid{wait()};
	  }
	}

	1 while wait() >= 0;

	sub spawn {
	  my $pid = fork;
	  die "fork: $!" unless defined $pid;
	  return $pid if $pid;
	  exec @_;
	  die "exec: $!";
	}
You can tell I just finished it because the shuffle() and usage() functions are unimplemented.

The idea is that you execute the program like this:

	runN -n 3 -c foo arg1 arg2 arg3 arg4...
and it runs the commands foo arg1, foo arg2, foo arg3, foo arg4, etc., simultaneously, but with no more than 3 running at a time.

The -n option says how many commands to run simultaneously; after running that many the main control waits until one has exited before starting another.
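The scheduling rule is the same one a fixed-size worker pool implements. Here is a rough Java analogue, with threads standing in for child processes; BoundedRunner and its methods are hypothetical names. A pool of N threads starts tasks in order, and starts the next one only when one of the N finishes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Run tasks with at most n in progress at once, in the spirit of runN -n.
class BoundedRunner {
    static void runAll(List<Runnable> tasks, int n) {
        ExecutorService pool = Executors.newFixedThreadPool(n);
        for (Runnable task : tasks) {
            pool.execute(task);      // queued until one of the n threads is free
        }
        pool.shutdown();             // no more tasks will be submitted
        try {
            // Wait for everything to finish, like "1 while wait() >= 0".
            pool.awaitTermination(Long.MAX_VALUE, TimeUnit.NANOSECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
    }

    // Demo: run `count` short tasks with limit n; return {completed, peak concurrency}.
    static int[] demo(int count, int n) {
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        AtomicInteger done = new AtomicInteger();
        List<Runnable> tasks = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            tasks.add(() -> {
                peak.accumulateAndGet(running.incrementAndGet(), Math::max);
                try {
                    Thread.sleep(5);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                running.decrementAndGet();
                done.incrementAndGet();
            });
        }
        runAll(tasks, n);
        return new int[] {done.get(), peak.get()};
    }

    public static void main(String[] args) {
        int[] result = demo(20, 3);
        System.out.println("completed " + result[0] + ", peak " + result[1]);
    }
}
```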

If I had implemented shuffle(), then -r would run the commands in random order, instead of in the order specified. Probably I should get rid of -c and just have the program take the first argument as the command name, so that the invocation above would become runN -n 3 foo arg1 arg2 arg3 arg4.... The -v flag, had I implemented it, would put the program into verbose mode.

I find that it's best to defer the implementation of features like -r and -v until I actually need them, which might be never. In the past I've done post-analyses of the contents of ~mjd/bin, and what I found was that my tendency was to implement a lot more features than I needed or used.

In the original implementation, the -n option is mandatory, because I couldn't immediately think of a reasonable default. The only obvious choice is 1, but since the point of the program was to run programs concurrently, 1 is not reasonable. But it occurs to me now that if I let -n default to 1, then this command would replace many of my current invocations of:

	for i in ...; do
	  cmd $i
	done
which I do quite a lot. Typing runN cmd ... would be a lot quicker and easier. As I've written before, when a feature you put in turns out to have unanticipated uses, it's a sign of a good, modular design.

The code itself makes me happy for two reasons. One is that the program worked properly on the first try, which does not happen very often for me. When I was in elementary school, my teachers always complained that although I was very bright, I made a lot of careless mistakes because I was not methodical enough. They tried hard to fix this personality flaw. They did not succeed.

The other thing I like about the code is that it's so very brief. Not to say that it is any briefer than it should be; I think it's just about perfect. One of the recurring themes of my study of programming for the last few years is that beginner programmers use way more code than is necessary, just like beginning writers use way too many words. The process and concurrency management turned out to be a lot easier than I thought they would be: the default Unix behavior was just exactly what I needed. I am particularly pleased with delete $pid{wait()}. Sometimes these things just come together.

The 1 while wait() >= 0 line is a non-obfuscated version of something I wrote in my prize-winning obfuscated program, of all places. Sometimes the line between the sublime and the ridiculous is very fine indeed.

Despite my wariness of adding unnecessary features, there is at least one that I will put in before I deploy this to ~mjd/bin and start using it. I'll implement usage(), since experience has shown that I tend to forget how to invoke these things, and reading the usage message is a quicker way to figure it out than is rereading the source code. In the past, usage messages have been good investments.

I'm tempted to replace the cut-rate use of split here with something more robust. The problem I foresee is that I might want to run a command with an argument that contains a space. Consider:

	runN -n 2 -c ls foo bar "-l baz"
This runs ls foo, then ls bar, then ls -l baz. Without the split() or something like it, the third command would be equivalent to ls "-l baz" and would fail with something like -l baz: no such file or directory. (Actually it tries to interpret the space as an option flag, and fails for that reason instead.) So I put the split in to enable this usage. (Maybe this was a you-ain't-gonna-need-it moment; I'm not sure.) But this design makes it difficult or impossible to apply the command to an argument with a space in it. Suppose I'm trying to do ls on three directories, one of which is called old stuff. The natural thing to try is:

	runN -n 2 -c ls foo bar "old stuff"
But the third command turns into ls old stuff and produces:

	ls: old: No such file or directory
	ls: stuff: No such file or directory
If the split() were omitted, it would just work, but then the ls -l baz example above would fail. If the split() were replaced by the correct logic, I would be able to get what I wanted by writing something like this:

	runN -n 2 -c ls foo bar "'old stuff'"
But as it is this just produces another error:

	ls: 'old: No such file or directory
	ls: stuff': No such file or directory
Perl comes standard with a library, Text::ParseWords, whose shellwords() function is probably close to what I want here. I didn't use it because I wasn't sure I'd actually need it (only time will tell), and because shell parsing is very complicated and error-prone, more so when it is done synthetically rather than by the shell, and even more so when it is done multiple times; you end up with horrible monstrosities like this:

	s='q=`echo "$s" | sed -e '"'"'s/'"'"'"'"'"'"'"'"'/'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'/g'"'"'`; echo "s='"'"'"$q"'"'"'"; echo $s'
	q=`echo "$s" | sed -e 's/'"'"'/'"'"'"'"'"'"'"'"'/g'`; echo "s='"$q"'"; echo $s
So my fear was that by introducing a double set of shell-like interpretation, I'd be opening a horrible can of escape character worms and weird errors, and my hope was that if I ignored the issue the problems might be simpler, and might never arise in practice. We'll see.
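To make the quoting question concrete, here is a minimal Java sketch of one possible "correct logic"; the name splitWords is hypothetical. It splits on whitespace, but treats text between single quotes as part of a single word. It deliberately ignores double quotes, backslashes, and everything else a real shell does. It is not Text::ParseWords or any shell's actual algorithm.

```java
import java.util.ArrayList;
import java.util.List;

// Split a string into words on whitespace, honoring single quotes only.
class Words {
    static List<String> splitWords(String s) {
        List<String> words = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inWord = false;   // have we started a word (possibly an empty '' one)?
        boolean inQuote = false;  // are we between single quotes?
        for (char ch : s.toCharArray()) {
            if (inQuote) {
                if (ch == '\'') inQuote = false;   // closing quote
                else current.append(ch);           // everything else is literal
            } else if (ch == '\'') {
                inQuote = true;
                inWord = true;
            } else if (Character.isWhitespace(ch)) {
                if (inWord) {                      // whitespace ends the current word
                    words.add(current.toString());
                    current.setLength(0);
                    inWord = false;
                }
            } else {
                current.append(ch);
                inWord = true;
            }
        }
        if (inQuote) throw new IllegalArgumentException("unterminated quote: " + s);
        if (inWord) words.add(current.toString());
        return words;
    }

    public static void main(String[] args) {
        System.out.println(splitWords("-l baz"));       // [-l, baz]
        System.out.println(splitWords("'old stuff'"));  // [old stuff]
    }
}
```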

[ Addendum 20080712: Aaron Crane wrote a thoughtful followup. Thank you, M. Crane. ]


[Other articles in category /prog] permanent link

Period three and chaos
In the copious spare time I have around my other major project, I am tinkering with various stuff related to Möbius functions. Like all the best tinkering projects, the Möbius functions are connected to other things, and when you follow the connections you can end up in many faraway places.

A Möbius function is simply a function of the form f : x → (ax + b) / (cx + d) for some constants a, b, c, and d with ad - bc ≠ 0. Möbius functions are of major importance in complex analysis, where they correspond to certain transformations of the Riemann sphere, but I'm mostly looking at the behavior of Möbius functions on the reals, and so restricting a, b, c, and d to be real.

One nice thing about the Möbius functions is that you can identify the Möbius function f : x → (ax + b) / (cx + d) with the matrix ${ a\, b \choose c\,d}$, because then composition of Möbius functions is the same as multiplication of the corresponding matrices, and so the inverse of a Möbius function with matrix M is just the function that corresponds to $M^{-1}$. Determining whether a set of Möbius functions is closed under composition is the same as determining whether the corresponding matrices form a semigroup; you can figure out what happens when you iterate a Möbius function by looking at the eigenvalues of M, and so on.
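The correspondence is easy to check numerically. In this Java sketch, the Mobius class is an illustrative name, and doubles are used for convenience. compose multiplies the two matrices. inverse uses the adjugate matrix ${ d\, -b \choose -c\, a}$, which equals det M times $M^{-1}$; a nonzero scalar factor gives the same function.

```java
// Möbius functions x -> (ax + b)/(cx + d) represented by their 2x2 matrices.
class Mobius {
    final double a, b, c, d;

    Mobius(double a, double b, double c, double d) {
        this.a = a; this.b = b; this.c = c; this.d = d;
    }

    double apply(double x) {
        return (a * x + b) / (c * x + d);
    }

    // this.compose(g) is the function x -> this(g(x)); its matrix is the
    // product (this matrix) * (g's matrix).
    Mobius compose(Mobius g) {
        return new Mobius(a * g.a + b * g.c, a * g.b + b * g.d,
                          c * g.a + d * g.c, c * g.b + d * g.d);
    }

    // The adjugate is det(M) times the inverse matrix; a nonzero scalar
    // factor does not change the function, so this is the inverse function.
    Mobius inverse() {
        return new Mobius(d, -b, -c, a);
    }

    public static void main(String[] args) {
        Mobius f = new Mobius(2, 1, 1, 3);   // (2x + 1)/(x + 3)
        Mobius g = new Mobius(1, -1, 1, 1);  // (x - 1)/(x + 1)
        double x = 0.7;
        System.out.println(f.apply(g.apply(x)) + " = " + f.compose(g).apply(x));
        System.out.println(f.inverse().apply(f.apply(x)) + " = " + x);
    }
}
```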

The matrices are not quite identical with the Möbius functions, because the matrix ${ 1\, 0 \choose 0\,1}$ and the matrix ${ 2\, 0 \choose 0\,2}$ are the same Möbius function. So you really need to consider the set of matrices modulo the equivalence relation that makes two matrices equivalent if they are the same up to a scalar factor. If you do this you get a group of matrices called the "projective linear group", PGL(2). This takes us off into classical group theory and Lie groups, which I have been intermittently trying to figure out.

You can also consider various subgroups of PGL(2), such as the subgroup that maps the set {0, 1, ∞, -1} to itself. The reciprocal function x → 1/x is one such; it leaves 1 and -1 fixed and exchanges 0 and ∞.

In general a Möbius function has three degrees of freedom, since you can choose the four constants a, b, c, and d however you like, but one degree of freedom is removed because of the equivalence relation; or, to look at it another way, you get to pick b/a, c/a, and d/a however you like. So in general you can pick any distinct p, q, and r and find the unique Möbius function m with m(0) = p, m(1) = q, m(-1) = r. These then determine m(∞), which turns out to be (2qr - p(q+r))/(q + r - 2p) when that is defined. And sometimes even when it isn't.

You may be worrying about the infinities here, but it's really nothing much to worry about. f(∞) is nothing more than $\lim_{x\rightarrow\infty} f(x)$.

If (2qr - p(q+r))/(q + r - 2p) in the presence of infinities worries you, try a few examples. For instance, consider m : x → x+1. This function has p = m(0) = 1, q = m(1) = 2, r = m(-1) = 0. Plugging into the formula, we get m(∞) = -pq/(q - 2p) = -2 / (2-2) = -2/0 = ∞, which is just right.

The only other thing you have to remember is that +∞ = -∞, because we're really living on the Riemann sphere. Or rather, we're living on the real part of the Riemann sphere, but either way there's only one ∞. We might call this space the "Riemann circle", but I've never heard it called that. And neither has Google, although it did turn up a bulletin board post in which someone else asked the same question in a similar context. There's a picture of it farther down on the right.

Anyway, most choices of p, q, and r in {0, 1, ∞, -1} do not get you permutations of {0, 1, ∞, -1}, because they end up mapping ∞ outside that set. For example, if you take p = 1, q = -1, r = 0, you get m(∞) = -1/3. But obviously the identity function has the desired property, and if you think about the Riemann circle (excuse me, Riemann sphere) you immediately get the rest: any rigid motion of the Riemann sphere is a Möbius function, and some of those motions permute the four points {0, 1, ∞, -1}. In fact, there are eight such functions, because {0, 1, ∞, -1} are at the vertices of a square, so any rigid motion of the Riemann sphere that permutes {0, 1, ∞, -1} must be a rigid motion of that square, and the square has eight symmetries, namely the elements of the group D4:

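| D4 element | m(0) | m(1) | m(∞) | m(-1) | m(x) | M |
|---|---|---|---|---|---|---|
| Identity | 0 | 1 | ∞ | -1 | x | ${ 1\, 0 \choose 0\, 1}$ |
| Rotate clockwise | 1 | ∞ | -1 | 0 | (x + 1) / (-x + 1) | ${ 1\, 1 \choose -1\, 1}$ |
| Rotate 180° | ∞ | -1 | 0 | 1 | -(1/x) | ${ 0\, -1 \choose 1\, 0}$ |
| Rotate counterclockwise | -1 | 0 | 1 | ∞ | (x - 1) / (x + 1) | ${ 1\, -1 \choose 1\, 1}$ |
| Reflect horizontally | 0 | -1 | ∞ | 1 | -x | ${ -1\, 0 \choose 0\, 1}$ |
| Reflect vertically | ∞ | 1 | 0 | -1 | 1/x | ${ 0\, 1 \choose 1\, 0}$ |
| Reflect diagonally (1) | 1 | 0 | -1 | ∞ | (-x + 1) / (x + 1) | ${ -1\, 1 \choose 1\, 1}$ |
| Reflect diagonally (2) | -1 | ∞ | 1 | 0 | (x + 1) / (x - 1) | ${ 1\, 1 \choose 1\, -1}$ |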

Here we have eight functions on the reals which make the group D4 under the operation of composition. For example, if f(x) = (x+1)/(-x+1), then f(f(f(f(x)))) = x. Isn't that nice?
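The table can be checked mechanically, infinity included, by working in homogeneous coordinates. Represent a point x as the pair (x, 1) and ∞ as (1, 0), apply the matrix to the pair, and treat two pairs as the same point when they are proportional. The Java sketch below follows the clockwise rotation, whose matrix has rows (1, 1) and (-1, 1), around the square 0 → 1 → ∞ → -1 → 0; the class and method names are hypothetical.

```java
import java.util.Arrays;

// Apply Möbius matrices to points of the real projective line, including ∞.
class ProjectiveLine {
    // A point is a pair (num, den) meaning num/den; (1, 0) is infinity.
    static long[] point(long num, long den) { return new long[] {num, den}; }

    static final long[] INF = point(1, 0);

    // Apply the matrix (a b; c d) to the pair (n, m): result (a n + b m, c n + d m).
    static long[] apply(long[][] m, long[] p) {
        return point(m[0][0] * p[0] + m[0][1] * p[1],
                     m[1][0] * p[0] + m[1][1] * p[1]);
    }

    // Two pairs name the same point exactly when they are proportional.
    static boolean same(long[] p, long[] q) {
        return p[0] * q[1] == q[0] * p[1];
    }

    public static void main(String[] args) {
        long[][] rotate = { {1, 1}, {-1, 1} };   // x -> (x + 1)/(-x + 1)
        long[] p = point(0, 1);                  // start at 0
        for (int i = 0; i < 4; i++) {
            long[] next = apply(rotate, p);
            System.out.println(Arrays.toString(p) + " -> " + Arrays.toString(next));
            p = next;
        }
        System.out.println("back to 0: " + same(p, point(0, 1)));
    }
}
```

Four steps print (0, 1), (1, 1), (2, 0), (2, -2) and return to (0, -4), which is proportional to (0, 1), that is, the point 0.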

Anyway, none of that was what I was really planning to talk about. (You knew that was coming, didn't you?)

What I wanted to discuss was the function f : x → 1 / (1 - x). I found this function because I was considering other permutations of {0, 1, ∞, -1}. The f function takes 0 → 1 → ∞ → 0. (It also takes -1 → 1/2, and so is not one of the functions in the D4 table above.) We say that f has a periodic point of order 3 because f(f(f(x))) = x for some x; in this case at least for x ∈ {0, 1, ∞}.

A function with a periodic point of order three is not something you see every day, and I was somewhat surprised that as simple a function as 1/(1-x) had one. But if you do the algebra and calculate f(f(f(x))) explicitly, you find that you do indeed get x, so every point is a periodic point of order 3, or possibly 1.

Or you can do a simpler calculation: since f is the Möbius function that corresponds to the matrix F = ${ \hphantom{-}0\, 1 \choose -1\,1}$, just calculate $F^3$. You get ${ -1\, \hphantom{-}0 \choose \hphantom{-}0\, -1}$, which is a scalar multiple of the identity matrix and so is indeed the identity function.

This also gives you a simple matrix M for which $M^7 = M$, if you happened to be looking for such a thing.
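The matrix arithmetic is small enough to do by hand. Here it is as a Java sketch using exact integer arithmetic; the names are illustrative. $F^2$ has rows (-1, 1) and (-1, 0), and $F^3 = -I$, so $F^6 = I$ and $F^7 = F$.

```java
import java.util.Arrays;

// Powers of the matrix F for x -> 1/(1 - x), with exact integer arithmetic.
class MatrixPowers {
    static long[][] multiply(long[][] x, long[][] y) {
        long[][] z = new long[2][2];
        for (int i = 0; i < 2; i++)
            for (int j = 0; j < 2; j++)
                for (int k = 0; k < 2; k++)
                    z[i][j] += x[i][k] * y[k][j];
        return z;
    }

    static long[][] power(long[][] m, int n) {
        long[][] result = { {1, 0}, {0, 1} };   // identity matrix
        for (int i = 0; i < n; i++) result = multiply(result, m);
        return result;
    }

    public static void main(String[] args) {
        long[][] f = { {0, 1}, {-1, 1} };
        System.out.println(Arrays.deepToString(power(f, 3)));   // [[-1, 0], [0, -1]]
        System.out.println(Arrays.deepEquals(power(f, 7), f));  // true
    }
}
```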

I had noticed a couple of years ago that this 1/(1-x) function had period 3, and then forgot about it. Then I noticed it again a few weeks ago, and a nagging question came into my mind, which is reflected in a note I wrote in my notebook at that point: "WHAT ABOUT SARKOVSKY'S THEOREM?"

Well, what about it? Sharkovskii's theorem (I misspelled it in the notebook) is a delightful generalization of the "Period three implies chaos" theorem of Li and Yorke. It says, among other things, that if a continuous function of the reals has a periodic point of order 3, then it also has a periodic point of order n for all positive integers n. In particular, we can take n=1, so the function f, which has a periodic point of order 3 must also have a fixed point. But it's quite easy to see that f has no fixed point on the reals: Just put f(x) = 1/(1-x) = x and solve for x; there are no real solutions.
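Spelled out for x ≠ 1, the fixed-point equation reduces to a quadratic with negative discriminant. (The point ∞ is not fixed either, since f(∞) = 0.)

```latex
\frac{1}{1-x} = x
\;\Longrightarrow\; 1 = x - x^2
\;\Longrightarrow\; x^2 - x + 1 = 0,
\qquad \Delta = (-1)^2 - 4 \cdot 1 \cdot 1 = -3 < 0 .
```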

So what about Sharkovskii's theorem? Oh, it only applies to continuous functions, and f is not, because f(1) = ∞. So that's all right.

The Sharkovskii thing is excellent. The Sharkovskii ordering of the integers is:

3 < 5 < 7 < 9 < ...
  < 6 < 10 < 14 < 18 < ...
  < 12 < 20 < 28 < 36 < ...
...
... < 16 < 8 < 4 < 2 < 1.

And the theorem says that if a continuous function of the reals has a periodic point of order n, then it also has a periodic point of order m for all m > n in the Sharkovskii ordering. So if the function has a periodic point of order 2, it must also have a fixed point; if it has a periodic point of order 4, it must also have a periodic point of order 2; if it has a periodic point of order 17, it must also have periodic points of all even orders and all odd orders greater than 17, and so on.
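The ordering has a compact description. Write n = 2^k · m with m odd. Numbers with m > 1 come first, ordered by k and then by m. The powers of two (m = 1) come last, in decreasing order. Here is that rule as a Java sketch; precedes is a hypothetical name. precedes(a, b) means that a comes before b, so a periodic point of order a forces one of order b.

```java
// The Sharkovskii ordering on positive integers.
class Sharkovskii {
    // True when a comes strictly before b in the ordering
    // 3, 5, 7, ..., 6, 10, 14, ..., 12, 20, 28, ..., ..., 8, 4, 2, 1.
    static boolean precedes(int a, int b) {
        if (a <= 0 || b <= 0) throw new IllegalArgumentException("positive integers only");
        if (a == b) return false;
        int ka = Integer.numberOfTrailingZeros(a), kb = Integer.numberOfTrailingZeros(b);
        int ma = a >> ka, mb = b >> kb;           // odd parts
        boolean aPow2 = ma == 1, bPow2 = mb == 1;
        if (aPow2 && bPow2) return ka > kb;       // powers of two: ..., 16, 8, 4, 2, 1
        if (aPow2) return false;                  // powers of two come after everything else
        if (bPow2) return true;
        if (ka != kb) return ka < kb;             // odds, then 2 x odds, then 4 x odds, ...
        return ma < mb;                           // same power of two: smaller odd part first
    }

    public static void main(String[] args) {
        System.out.println(precedes(3, 5));   // true
        System.out.println(precedes(17, 6));  // true
        System.out.println(precedes(2, 4));   // false
    }
}
```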

The 1/(1-x) function led me to read more about Sharkovskii's theorem and its predecessor, the "period three implies chaos" theorem. Isn't that a great name for a theorem? And Li and Yorke knew it, because that's what they titled their paper. "Chaos" in this context means the following: say that two values a and b are "scrambled" by f if there is some d > 0 such that, for any given ε > 0, there are arbitrarily large n for which $|f^n(a) - f^n(b)| > d$, and arbitrarily large m for which $|f^m(a) - f^m(b)| < ε$. That is, a and b are scrambled if repeated application of f drives a and b far apart, then close together, then far apart again, and so on. Then, if f is a continuous function with a periodic point of order 3, there is some uncountable set S of reals such that f scrambles all distinct pairs of values a and b from S. All that was from memory; I hope I got it more or less correct.

(The Li and Yorke paper also includes an example of a continuous function with a periodic point of order 5 but no periodic point of order 3. It's pretty simple.)

Reading about Sharkovskii's theorem and related matters led me to the web pages of James A. Yorke (of Li and Yorke), and then to the book Chaos: An Introduction to Dynamical Systems that he did with Alligood and Sauer, which is very readable.

I was pleased to finally be studying this material, because it was a very early inspiration to me. When I was about fourteen, my cousin Alex, who is an analytic chemist, came to visit, and told me about period-doubling and chaos in the logistic map. (It was all over the news at the time.) The logistic map is just f : x → λx(1-x) for some constant λ. For small λ, the map has a single fixed point, which increases as λ does. But at a certain critical value of λ (λ=3, actually) the function's behavior changes, and it suddenly begins to have a periodic point of order 2. As λ increases further, the behavior changes again, and the periodicity changes from order 2 to order 4. As λ increases, this happens again and again, with the splits occurring at exponentially closer and closer values of λ. Eventually there is a magic value of λ at which the function goes berserk and is chaotic. Chaos continues for a while, and then the function develops a periodic point of order 3, which bifurcates...
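The period-doubling story is easy to reproduce. This Java sketch iterates the logistic map long enough to settle onto its attractor, then looks for the smallest p with f^p(x) ≈ x. The method attractorPeriod and its settling length, tolerance, and period limit are illustrative choices, not a standard method. With these settings it finds period 1 at λ = 2.8, period 2 at λ = 3.2, and period 4 at λ = 3.5. At λ = 3.832 it finds period 3; that value lies inside the period-three window, which opens at λ = 1 + √8 ≈ 3.8284.

```java
// Estimate the period of the attracting cycle of the logistic map x -> λ x (1 - x).
class Logistic {
    static double step(double lambda, double x) {
        return lambda * x * (1 - x);
    }

    // Iterate past the transient, then return the smallest p <= maxPeriod with
    // |f^p(x) - x| < tol, or -1 if none is found (as in the chaotic regime).
    static int attractorPeriod(double lambda) {
        final int settle = 100_000;
        final int maxPeriod = 64;
        final double tol = 1e-9;
        double x = 0.3;
        for (int i = 0; i < settle; i++) x = step(lambda, x);
        double y = x;
        for (int p = 1; p <= maxPeriod; p++) {
            y = step(lambda, y);
            if (Math.abs(y - x) < tol) return p;
        }
        return -1;
    }

    public static void main(String[] args) {
        for (double lambda : new double[] {2.8, 3.2, 3.5, 3.832, 3.9}) {
            System.out.println("lambda = " + lambda + ": period " + attractorPeriod(lambda));
        }
    }
}
```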

(The illustration here, which I copied from Wikipedia, uses r instead of λ.)

I was deeply impressed. For some reason I got the idea that I would need to understand partial differential equations to understand chaos and the logistic map, so I immediately set out on a program to learn what I thought I would need to know. I enrolled in differential equations courses at Columbia University instead of in something more interesting. The partial differential equations turned out to be a sidetrack, but in those days there were no undergraduate courses in iterated dynamic systems.

I am happy to discover that after only twenty-five years I am finally arriving at the destination.

Cousin Alex also told me to carry a notebook and pen with me wherever I went. That was good advice, and it took me rather less time to learn.


[Other articles in category /math] permanent link

Sun, 29 Jun 2008

Freshman electromagnetism questions: answer 3
Last year I asked a bunch of basic questions about electromagnetism. Many readers wrote in with answers and explanations, which I still hope to write up in detail. In the meantime, however, I figured out the answer to one of the questions by myself.

I had asked:

  3. Any beam of light has a time-varying electric field, perpendicular to the direction that the light is travelling. If I shine a light on an electron, why doesn't the electron vibrate up and down in the varying electric field? Or does it?

And one day a couple of months ago it occurred to me that yes, of course the electron vibrates up and down, because that is how radio antennas work. The EM wave comes travelling by, and the electrons bound in the metal antenna vibrate up and down. When electrons vibrate up and down in a metal wire, it is called an alternating current. Some gizmo at the bottom end of the antenna detects the alternating current and turns it back into the voice of Don Imus.

I thought about it a little more, and I realized that this vibration effect is also how microwave ovens work. The electromagnetic microwave comes travelling by, and it makes the electrons in the burrito vibrate up and down. But these electrons are bound into water molecules, and cannot vibrate freely. Instead, the vibrational energy is dissipated as heat, so the burrito gets warm.

So that's one question out of the way. Probably I have at least three reader responses telling me this exact same thing. And perhaps someday we will all find out together...


[Other articles in category /physics] permanent link

Tue, 17 Jun 2008

Defunctionalization and Java
A couple of weeks ago I was introduced to the notion of defunctionalization by this article on Ken Knowles' blog. Defunctionalization is a program transformation that removes the higher-order functions from a program. The idea is that you replace something like λx.x+y with a data structure that encapsulates a value of y somewhere, say (HOLD y). And instead of using the language's built-in function application to apply this object directly to an argument x, you write a synthetic applicator that takes (HOLD y) and x and returns x + y. And anyone who wanted to apply λx.x+y to some argument x in some context in which y was bound should first construct (HOLD y), then use the synthetic applicator on (HOLD y) and x.

Consider, for example, the following Haskell program:

        -- Haskell
        aux f = f 1 + f 10
        res x = aux (\z -> z + x)
The defunctionalization of this example is:

        -- Haskell
        data Hold = HOLD Int
        fake_apply (HOLD a) b = a + b
        aux held = fake_apply held 1 + fake_apply held 10
        res x = aux (HOLD x)
I hope this will make the idea clear.
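The example has only one λ-expression, so Hold needs only one constructor. When a program contains several different λ-expressions, the general recipe, as I understand it, is to give the data type one constructor per λ-expression and to have a single applicator dispatch on the constructor. Here is a small Java sketch of that idea. It assumes a second function, res2 y = aux (λz -> z * y), that is not in the original example. The names Fun, Tag, Defunc, apply, and res2 are all made up for illustration.

```java
// Hypothetical, illustrative sketch (names are my own): defunctionalizing
// two different lambdas,  λz -> z + x  and  λz -> z * y,
// into one first-order data type plus one dispatching applicator.
final class Fun {
    enum Tag { ADD, MUL }      // one tag per lambda in the original program

    final Tag tag;
    final int free;            // the captured free variable (x or y)

    Fun(Tag tag, int free) {
        this.tag = tag;
        this.free = free;
    }
}

class Defunc {
    // The synthetic applicator: stands in for built-in function application.
    static int apply(Fun f, int b) {
        switch (f.tag) {
            case ADD: return f.free + b;
            case MUL: return f.free * b;
            default:  throw new IllegalStateException("unknown tag " + f.tag);
        }
    }

    // aux f = f 1 + f 10, where f is now a Fun record instead of a function
    static int aux(Fun f) {
        return apply(f, 1) + apply(f, 10);
    }

    static int res(int x)  { return aux(new Fun(Fun.Tag.ADD, x)); }  // aux (λz -> z + x)
    static int res2(int y) { return aux(new Fun(Fun.Tag.MUL, y)); }  // aux (λz -> z * y)

    public static void main(String[] args) {
        System.out.println(res(5));   // (5+1) + (5+10) = 21
        System.out.println(res2(3));  // 3*1 + 3*10 = 33
    }
}
```

No function values are passed around anywhere in this version. Only Fun records move through the program, and all the "calling" happens in the one switch inside apply.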

M. Knowles cites the paper Defunctionalization at work by Olivier Danvy and Lasse R. Nielsen, which was lots of fun. (My Haskell example above is a simplification of the example from page 5 of Danvy and Nielsen.) Among other things, Danvy and Nielsen point out that this defunctionalization transformation is in a certain sense dual to the transformation that turns ordinary data structures into λ-terms in Church encoding. Church encoding turns data items like pairs or booleans into higher-order functions; defunctionalization turns them back again.

Section 1.4 of the Danvy and Nielsen paper lists a whole bunch of contexts in which this technique has been studied and used, but one thing I don't think I saw there is that this is essentially the transformation that Java programmers use when they want to use closures.

For example, suppose a Java programmer wants to write something like aux in:

        -- Haskell
        aux f = f 1 + f 10
        res x = aux (\z -> z + x)
But they can't, because Java doesn't have closures.

So instead, they do this:

        /* Java */

        class Hold {
          private int a;

          public Hold(int a) {
            this.a = a;
          }

          public int fake_apply(int b) {
            return this.a + b;
          }
        }

        private static int aux(Hold h) {
          return h.fake_apply(1) + h.fake_apply(10);
        }

        static int res(int x) {
          Hold h = new Hold(x);
          return aux(h);
        }
Here the class Hold corresponds directly to the data type Hold in the defunctionalized Haskell code.

Here is a real example. Consider GNU Emacs. When I enter text-mode in Emacs, I want a bunch of subsystems to be notified. Emacs has a text-mode-hook variable, which is basically a list of functions, and when an Emacs buffer is put into text-mode, Emacs invokes the hooks. Any subsystem that wants to be notified puts its own hook function into that variable. If I wanted to accomplish something similar in Haskell or SML, I would similarly use a list of functions.

In Java, the corresponding facility is called java.util.Observable. Were one implementing Emacs in Java (perish the thought!), the mode object would inherit from Observable, and so would provide an addObserver method for adding a hook to a list somewhere. When the mode was switched to text-mode, the mode object would call notifyObservers, which would loop over the hook list, calling the hooks. So far this is just like Emacs Lisp.

But in Java the hooks are not functions, as they are in Emacs, because in Java functions are not first-class entities. Instead, the hooks are objects which conform to the Observer interface specification, and instead of invoking functions directly, the notifyObservers method calls the update method on each hook object.
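To make the comparison with Emacs hooks concrete, here is a rough sketch using the real java.util.Observable and java.util.Observer types. Both have been deprecated since Java 9, so a modern compiler will warn about them, but they still exist. The classes Mode, CountingHook, and HookDemo, and the switchTo method, are hypothetical. They stand in for whatever an Emacs-in-Java would actually call them.

```java
import java.util.Observable;
import java.util.Observer;

// Hypothetical sketch: a mode object that notifies "hooks" when it switches modes.
// Observable/Observer are real (deprecated since Java 9) JDK types.
class Mode extends Observable {
    private String name = "fundamental-mode";

    String name() { return name; }

    void switchTo(String newMode) {
        name = newMode;
        setChanged();              // without this, notifyObservers does nothing
        notifyObservers(newMode);  // calls update() on every registered hook
    }
}

// A "hook" is an object, not a function; what a closure would capture
// (here, the mode name it cares about) becomes a field.
class CountingHook implements Observer {
    private final String wanted;
    int calls = 0;

    CountingHook(String wanted) { this.wanted = wanted; }

    @Override
    public void update(Observable source, Object arg) {
        if (wanted.equals(arg)) {
            calls++;
        }
    }
}

class HookDemo {
    public static void main(String[] args) {
        Mode mode = new Mode();
        CountingHook hook = new CountingHook("text-mode");
        mode.addObserver(hook);          // roughly like add-hook
        mode.switchTo("text-mode");
        mode.switchTo("c-mode");
        System.out.println(hook.calls);  // 1
    }
}
```

Where Emacs would store a function in text-mode-hook, this version stores an object. Its update method plays the role of the function, and its wanted field plays the role of a captured variable.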

Here's another example. I wrote a recursive descent parser in Java a while back. An ActionParser is just like a Parser, except that if its parse succeeds, it invokes a callback. If I were programming in SML or Haskell or Perl, an ActionParser would be nothing but a Parser with an associated closure, something like this:

        # Perl        
        package ActionParser;

        sub new {
          my ($class, $parser, $action) = @_;
          bless { Parser => $parser,
                  Action => $action } => $class;
        }

        # Just like the embedded parser, but invoke the action on success
        sub parse {
          my $self = shift;
          my $input = shift;
          my $result = $self->{Parser}->parse($input);
          if ($result->success) {
            $self->{Action}->($result);   # Invoke action
          }
          return $result;          
        }
Here the Action member is expected to be a closure, which is automatically invoked if the parse succeeds. To use this, I would write something like this:

        # Perl        
        my $missiles;        
        ...
        my $parser = ActionParser->new($otherParser, 
                                       sub { $missiles->launch() }
                                      );
        $parser->parse($input);
And then if the input parses correctly, the parser launches the missiles from the anonymous closure, which has captured the local $missiles object.

But in Java, you have no closures. Instead, you defunctionalize, and represent closures with objects:

        /* Java */
        abstract class Action {
          abstract void invoke(ParseResults results);
        }

        class ActionParser extends Parser {
          Action action;
          Parser parser;

          ActionParser(Parser p, Action a) {
            action = a;
            parser = p;
          }

          ParseResults parse(Input input) {
            ParseResults res = this.parser.parse(input);
            if (res.isSuccess) {
              this.action.invoke(res);
            }
            return res;
          }
        }
To use this, one writes something like this:

        /* Java */

        class LaunchMissilesAction extends Action {
          Missiles m;

          LaunchMissilesAction(Missiles m) { this.m = m; }
          void invoke(ParseResults results) {
            m.launch();
          }
        }

        ...

        Action a = new LaunchMissilesAction(missiles);
        Parser p = new ActionParser(otherParser, a);
        p.parse(input);
The constructor argument missiles takes the place of a free variable in a closure. The closure itself has been replaced with an object from an ad hoc class, just as in Danvy and Nielsen's formulation, the closure is replaced with a synthetic data object that holds the values of the free variables. The invoke method plays the role of fake_apply.

Now, it's not a particularly interesting observation that this can be done. The interesting part, I think, is that this is what Java programmers actually do. And also, perhaps, that Danvy and Nielsen didn't mention it in their paper, because I think the technique is pretty widespread.


[Other articles in category /prog] permanent link

Fri, 06 Jun 2008

484848 is excellent
Brad Murray wrote to request a proof that the number 4848...4848 is excellent:

So, are all concatenations of odd numbers of "48" excellent? I demand a proof!
So okay, it wasn't a request. I have been given no choice but to comply. I hear and I obey, O mighty one!

First let's define the items we're talking about:

  • $a_0 = 4$, $a_{n+1} = 100a_n + 84$, so that the $a_n$ are 4, 484, 48484, etc.

  • Similarly, $b_0 = 8$, $b_{n+1} = 100b_n + 48$, so that the $b_n$ are 8, 848, 84848, etc.

  • Similarly, $c_0 = 48$, $c_{n+1} = 10000c_n + 4848$, so that the $c_n$ are 48, 484848, 4848484848, etc.

I now assert that $c_n$ is the concatenation of $a_n$ and $b_n$, by one or more of the following arguments:

  • It's obvious, or even if not,
  • it follows from an easy inductive proof, which
  • is left as an exercise for the reader.
(This sounds like a joke, but it isn't; I honestly expect people will find it obvious, or that even if not they will agree that it should follow from an easy induction.)

In order to show that $c_n$ is excellent, it then suffices to show that $c_n = b_n^2 - a_n^2$ for all n.

First we'll prove the following lemma: for all n, 4bn - 7an = 4. This follows easily by induction. For n=0, we have 4·8 - 7·4 = 32 - 28 = 4. Now suppose the lemma is proved for n=i; we want to show that it is true for n=i+1. That is, we want to calculate:
$$\begin{aligned}
4b_{i+1} - 7a_{i+1} &= 4(100b_i + 48) - 7(100a_i + 84) \\
&= 400b_i + 192 - 700a_i - 588 \\
&= 400b_i - 700a_i + 192 - 588 \\
&= 100(4b_i - 7a_i) - 396 \\
&= 100 \cdot 4 - 396 \\
&= 4
\end{aligned}$$
And we are done.
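A proof by induction doesn't need any numerical checking, but a quick spot-check is a cheap way to catch an arithmetic slip in the recurrences. This Java sketch checks the lemma $4b_n - 7a_n = 4$ for the first few dozen values of n. It uses BigInteger because the numbers quickly outgrow a long. It is a sanity check, not a substitute for the proof, and the class and method names are my own.

```java
import java.math.BigInteger;

// Illustrative spot-check (names are my own): verifies 4*b_n - 7*a_n = 4
// for n = 0..maxN, using a_{n+1} = 100 a_n + 84 and b_{n+1} = 100 b_n + 48.
class LemmaCheck {
    static boolean lemmaHoldsUpTo(int maxN) {
        BigInteger a = BigInteger.valueOf(4);   // a_0
        BigInteger b = BigInteger.valueOf(8);   // b_0
        BigInteger four = BigInteger.valueOf(4);
        BigInteger seven = BigInteger.valueOf(7);
        BigInteger hundred = BigInteger.valueOf(100);
        for (int n = 0; n <= maxN; n++) {
            BigInteger lhs = four.multiply(b).subtract(seven.multiply(a));
            if (!lhs.equals(four)) {
                return false;
            }
            a = a.multiply(hundred).add(BigInteger.valueOf(84));
            b = b.multiply(hundred).add(BigInteger.valueOf(48));
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(lemmaHoldsUpTo(50)); // true
    }
}
```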

Now the main theorem, again by induction. We want to show that:

$$b_n^2 - a_n^2 = c_n$$

for all n. For n=0 this is trivial, since we have $8^2 - 4^2 = 64 - 16 = 48$. Now suppose we know it is true for n=i; we will show that it is true for n=i+1 as well:

$$\begin{aligned}
b_{i+1}^2 - a_{i+1}^2 &= (100b_i + 48)^2 - (100a_i + 84)^2 \\
&= 10000b_i^2 + 9600b_i + 2304 - 10000a_i^2 - 16800a_i - 7056 \\
&= 10000(b_i^2 - a_i^2) + 2400(4b_i - 7a_i) + 2304 - 7056 \\
&= 10000c_i + 2400(4b_i - 7a_i) - 4752 \\
&= 10000c_i + 2400 \cdot 4 - 4752 \\
&= 10000c_i + 4848 \\
&= c_{i+1}
\end{aligned}$$
Q.E.D.
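A similar BigInteger spot-check can confirm, for small n, both the claim left to the reader (that $c_n$ is $a_n$ followed by $b_n$) and the main identity $c_n = b_n^2 - a_n^2$. Again, this is a sanity check written for illustration, not part of the proof, and the names are hypothetical.

```java
import java.math.BigInteger;

// Illustrative spot-check (names are my own), for n = 0..maxN:
//   (1) c_n is the decimal concatenation of a_n and b_n, and
//   (2) c_n = b_n^2 - a_n^2, so c_n is "excellent".
class ExcellentCheck {
    static boolean holdsUpTo(int maxN) {
        BigInteger a = BigInteger.valueOf(4);    // a_0
        BigInteger b = BigInteger.valueOf(8);    // b_0
        BigInteger c = BigInteger.valueOf(48);   // c_0
        BigInteger hundred = BigInteger.valueOf(100);
        BigInteger tenThousand = BigInteger.valueOf(10000);
        for (int n = 0; n <= maxN; n++) {
            boolean concatenated = c.toString().equals(a.toString() + b.toString());
            boolean excellent = c.equals(b.multiply(b).subtract(a.multiply(a)));
            if (!concatenated || !excellent) {
                return false;
            }
            a = a.multiply(hundred).add(BigInteger.valueOf(84));
            b = b.multiply(hundred).add(BigInteger.valueOf(48));
            c = c.multiply(tenThousand).add(BigInteger.valueOf(4848));
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(holdsUpTo(50)); // true
    }
}
```

For n=1, for instance, the check compares "484" + "848" with "484848" and $848^2 - 484^2 = 719104 - 234256$ with 484848.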

I may have more to say about this later. I have a half-written article that complains about homework questions of the form "Solve problem X using technique Z," where Z is something like induction. The article was inspired by a particularly odious problem of this type:

if n + 1 balls are put inside n boxes, then at least one box will contain more than one ball. prove this principle by induction.
Nobody in his right mind would prove this principle by induction. You prove it by pointing out that if the conclusion were to fail, no box would have more than one ball; since there are n boxes, each of which has no more than one ball, then there are no more than n balls, and this contradicts the hypothesis. Using induction is idiotic.

A student faced with this kind of question will conclude (correctly) that he or she is being forced to jump through a pointless hoop, and may conclude (incorrectly) that induction is useless. And students are frequently confused by pointless applications of principles. People learn better when they understand why things are happening; when students feel that they don't understand the point of what is being done, they feel that they don't understand the mechanics either.

In the real world (by which I mean what real scientists, mathematicians, and engineers do, in addition to what people in the grocery store do; I am excluding only homework assignments), we almost never get a problem of the form "solve X using technique Z". Problems we face in the real world always have the form "solve X, by hook or by crook." The closest we ever see to a prescribed technique are mere suggestions like "Well, Z might work here, so you could try that."

Questions that prescribe techniques are either lazy pedagogy or bad curriculum design. If technique Z—say, induction—is a useful technique, then it is because there is some problem Y such that Z is superior to all other techniques for solving Y. If all such Y are outside the scope of the class, then Z is outside the scope of the class too. If, on the other hand, there is some Y that is in the scope of the class, it is the instructor's job to find it and present it to the students, as an instructive example. To fail in this, and to make up a contrived and irrelevant problem in place of Y, is a failure of the instructor's principal duty, which is to illustrate the subject matter by realistic and relevant examples.

For the theorem above about 484848, induction is clearly a good way to solve it; to solve the problem by direct calculation is painful.

There are other things to learn from the demonstration above. It serves as a wonderful example of what is wrong with standard mathematical style for writing up proofs. A student seeing this proof might well ask "where the heck did you get that lemma about 4b - 7a = 4? Is that something you knew from before? Did you just guess? Was it in the book somewhere?" But no, I did not guess, I did not know that before, and I did not get it from the book.

The answer is that I did the main demonstration first, starting with $b_i^2 - a_i^2$ and trying to get from there to $c_i$ by using algebraic manipulations and the definitions of a, b, and c. And just when everything seemed to be going along well, I got stuck. I had:

$$10000c_i + 2400(4b_i - 7a_i) - 4752$$

This looked something like what I was trying to manufacture, which was:

$$10000c_i + 4848$$

but it was not quite right. The $10000c_i$ part was fine, but instead of $2400(4b_i - 7a_i) - 4752$ I needed 4848.

So if it was going to work, I needed to have:

$$2400(4b_i - 7a_i) - 4752 = 4848$$

or equivalently:

$$2400(4b_i - 7a_i) = 9600$$

which is equivalent to:

$$4b_i - 7a_i = 4.$$

So I had better have $4b_i - 7a_i = 4$; if this turns out false, the whole thing falls apart.

But a quick check of a couple of examples shows that $4b_i - 7a_i = 4$ does work, at least for i=0 and 1, so maybe it would be worth trying to prove in the general case. And indeed, the proof went through fine, and I won.

But in the presentation of the proof, everything is backwards: I pull the mystery lemma out of my ass at the beginning for no apparent reason, and then later on it happens to be exactly what I need at the crucial moment. Almost as if I knew beforehand what was going to happen!

There are a lot of things wrong with mathematics pedagogy, and those were two of them: artificially prescribed techniques to solve homework problems, and the ass-extraction of lemmas backwards in time.


[Other articles in category /math] permanent link

Sun, 01 Jun 2008

Addenda to recent articles 200805

  • Regarding the bicameral mind theory put forth in Julian Jaynes' book The Origin of Consciousness in the Breakdown of the Bicameral Mind, Carl Witty informs me that the story "Sour Note on Palayata", by James Schmitz, features a race of bicameral aliens whose mentality is astonishingly similar to the bicameral mentality postulated by Julian Jaynes. M. Witty describes it as follows:

    The story features a race of humanoid aliens with a "public" and a "private" mind. The "public" mind is fairly stupid, and handles all interactions with the real world; and the "private" mind is intelligent and psychic. The private mind communicates psychically with the private minds of other members of the race, but has only limited influence over the public mind; this influence manifests as visions and messages from God.
    This would not be so remarkable, since Jaynes' theories have been widely taken up by some science fiction authors. For example, they appear in Neal Stephenson's novel Snow Crash, and even more prominently in his earlier novel The Big U, so much so that I wondered when reading it how anyone could understand it without having read Jaynes first. But Schmitz's story was published in 1956, twenty years before the publication of The Origin of Consciousness.

  • Also in connection with Jaynes: I characterized his theory as "either a work of profound genius, or of profound crackpottery". I should have mentioned that this characterization was not lost on Jaynes himself. In his book, he referred to his own theory as "preposterous".

  • Many people wrote in with more commentary about my articles on artificial Finnish [1] [2]:
    • I had said that "[The one-letter word 'i'] appears in my sample in connection with Sukselaisen I hallitus, whatever that is". Several people explained that this "I" is actually a Roman numeral 1, denoting the ordinal number "first", and that Sukselaisen I hallitus is the first government headed by V. J. Sukselainen ("Sukselaisen" is the genitive form of the name).

      I had almost guessed this—I saw "Sukselaisen I" in the source material and guessed that the "I" was an ordinal, and supposed that "Sukselaisen I" was analogous to "Henry VIII" in English. But when my attempts to look up the putative King Sukselaisen I met with failure, and I discovered that "Sukselaisen I" never appeared without the trailing "hallitus", I decided that there must be more going on than I had supposed, as indeed there was. Thanks to everyone who explained this.

    • Marko Heiskanen says that the (fictitious) word yhdysvalmistämistammonit is "almost correct", at least up to the nonsensical plural component "tammonit". The vowel harmony failure can be explained away because compound words in Finnish do not respect the vowel harmony rules anyway.

    • Several people objected to my program's generation of the word "klee": Jussi Heinonen said "Finnish has quite few words that begin with two consonants", and Jarkko Hietaniemi said "No word-initial "kl":s possible in native Finnish words". I checked, and my sample Finnish input contains "klassisesta", which Jarkko explained was a loanword, I suppose from Russian.

      Had I used a larger input sample, oddities like "klassisesta" would have had less influence on the output.

    • I acquired my input sample by selecting random articles from Finnish Wikipedia, but my random sampling was rather unlucky, since it included articles about Mikhail Baryshnikov (not Finnish), Dmitry Medvedev (not Finnish), and Los Angeles (also not Finnish). As a result, the input contained too many strange un-Finnish letters, like B, D, š, and G, and so therefore did the output. I could have been more careful in selecting the input data, but I didn't want to take the time.

      Medvedev was also the cause of that contentious "klassisesta", since, according to Wikipedia, "Medvedev pitää klassisesta rock-musiikista". The Medvedev presidency is not even a month old and already he has this international incident to answer for. What catastrophes could be in the future?

    • Another serious problem with my artificial Finnish is that the words were too long; several people complained about this, and the graph below shows the problem fairly clearly:

      The x-axis is word length, and the y-axis is frequency, on a logarithmic scale, so that if 1/100 of the words have 17 letters, the graph will include the point (17, -2). The red line, "in.dat", traces the frequencies for my 6 kilobyte input sample, and the blue line, "pseudo.dat", the data for the 1000-character sample I published in the article. ("Ävivät mena osakeyhti...") The green line, "out.dat", is a similar trace for a 6 kb N=3 text I generated later. The long right tail is clearly visible. My sincere apologies to color-blind (and blind) readers.

      I am not sure exactly what happened here, but I can guess. The Markov process has a limited memory, 3 characters in this case, so in particular it has essentially no idea how long the words are that it is generating. This means that the word lengths that it generates should appear in roughly an exponential distribution, with the probability of a word of length N approximately equal to $\lambda e^{-\lambda N} $, where 1/λ is the mean word length.

      But there is no particular reason why word lengths in Finnish (or any other language) should be exponentially distributed. Indeed, one would expect that the actual distribution would differ from exponential in several ways. For example, extremely short words are relatively uncommon compared with what the exponential distribution predicts. (In the King James Bible, the most common word length is 3, then 4, with 1 and 8 tied for a distant seventh place.) This will tend to push the mean rightwards, and so it will skew the Markov process' exponential distribution rightwards as well.

      I can investigate the degree to which both real text and Markov process output approximate a theoretical exponential distribution, but not today. Perhaps later this month.

    My thanks again to the many helpful Finnish speakers who wrote in on these and other matters, including Marko Heiskanen, Shae Erisson, Antti-Juhani Kaijanaho, Ari Loytynoja, Ilmari Vacklin, Jarkko Hietaniemi, Jussi Heinonen, Nuutti-Iivari Meriläinen, and any others I forgot to mention.

  • My explanation of Korean vowel harmony rules in that article is substantively correct, but my description of the three vowel groups was badly wrong. I have apparently forgotten most of the tiny bit I once knew about Middle Korean. For a correct description, see the Wikipedia article or this blog post. My thanks to the anonymous author of the blog post for his correction.

  • Regarding the transitivity of related-by-blood-ness, Toth András told me about a (true!) story from the life of Hungarian writer Karinthy Frigyes:

    Karinthy Frigyes got married two times, the Spanish flu epidemic took his first wife away. A son of his was born from his first marriage, then his second wife brought a boy from his previous husband, and a common child was born to them. The memory of this the reputed remark: "Aranka, your child and my child beats our child."

    (The original Hungarian appears on this page, and the surprisingly intelligible translation was provided by M. Toth and the online translation service at webforditas.hu. Thank you, M. Toth.)

  • Chung-chieh Shan tells me that the missing document-viewer feature that I described is available in recent versions of xdvi. Tanaeem M. Moosa says that it is also available in Adobe Reader 8.1.2.
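Regarding my guess, in the artificial-Finnish item above, that word lengths from the Markov process should be roughly exponentially distributed, here is a toy Java simulation of the "no memory of word length" argument. It does not model the real N=3 generator. Instead it assumes, as a deliberate simplification, that after every letter the word ends with the same fixed probability q. Under that assumption word lengths have a geometric distribution, which is the discrete cousin of the exponential: length N has probability $(1-q)^{N-1}q$, and the mean length is 1/q. The class name, the seed, and the value q = 1/7 are arbitrary choices for illustration.

```java
import java.util.Random;

// Toy model (an assumption, not the real Markov generator): after each
// letter, the word ends with a fixed probability q, regardless of its length.
class MemorylessWords {
    // One simulated word length: at least 1 letter, then one coin flip per letter.
    static int wordLength(Random rng, double q) {
        int len = 1;
        while (rng.nextDouble() >= q) {   // continue with probability 1 - q
            len++;
        }
        return len;
    }

    // Mean length over many simulated words; should be close to 1/q.
    static double meanLength(long seed, double q, int words) {
        Random rng = new Random(seed);
        long total = 0;
        for (int i = 0; i < words; i++) {
            total += wordLength(rng, q);
        }
        return (double) total / words;
    }

    // Exact geometric probability that a word has exactly n letters.
    static double probLength(int n, double q) {
        return Math.pow(1 - q, n - 1) * q;
    }

    public static void main(String[] args) {
        double q = 1.0 / 7;   // hypothetical: mean word length of 7 letters
        System.out.println(meanLength(42L, q, 200_000));
        for (int n = 1; n <= 20; n++) {
            System.out.printf("%2d %.5f%n", n, probLength(n, q));
        }
    }
}
```

Under this model, length 1 is always the single most likely length, and on a logarithmic scale the frequencies fall along a straight line. Both features contrast with the King James Bible figures mentioned above, where length 3 is the most common.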


[Other articles in category /addenda] permanent link

Fri, 30 May 2008

Glade
Last week I needed to mock up a dialog box I was talking about in this article:

I wasn't sure how to do this, and my first draft just had a description. But the day before, I had happened to notice a new item that had appeared in the "Programming" menu on my Ubuntu computer: It said "Glade Interface Designer". I had started it up, for no particular reason, and tinkered with it for about two minutes.

Glade lets you design a window interface, by positioning buttons and sliders and things, and then does something or other. At the time I didn't know what it would do, but I knew I could mock up the window I wanted, and I thought maybe I could screenshot the mockup for the blog article.

The Glade thing was so easy to use that the easiest way to get a mockup of the dialog was to have Glade generate a complete, working windowing application, compile and run the application, and then screenshot the application. I got this done in about fifteen minutes.

The application I made doesn't actually do anything, but it does compile, run, and pop up the dialog box I designed. I'm confident that I could get it to do something pretty easily, if I wanted. The auto-generated code, and some of the Glade controls, are very suggestive.

I give Glade a big gold star. I went from having never heard of it to a working (although trivial) window application in one two-minute session and one fifteen-minute session. Maybe two big gold stars and a "Good work!" sticker.

[ Addendum 20080530: I went ahead with making an application that actually does something. It worked. ]


[Other articles in category /prog] permanent link

More Glade
After writing about Glade Interface Designer today, I decided to go ahead and see if it would be as easy to make a working application as I hoped it would be.

The outcome: big success.

The application has a window with two input fields, a "+" button, and an output field that shows the sum of the input fields when you press the "+" button. It took about half an hour from start to finish, and the only thing I had to look up in the manual was the names of the functions that read and write the values of the text fields. Everything else I got through bricolage and tinkering with the autogenerated monkey code.

The biggest problem that I encountered was that the application didn't exit when I clicked the close box, although the window disappeared. I figured out that the close box was sending a "delete" event and not a "destroy" event and fixed it up right quick.

Gtk+ and Glade Interface Designer get at least two gold stars. Maybe three. Maybe fifty-three.


[Other articles in category /prog] permanent link

A missing feature in document viewers
It often happens that I'm looking at some multi-page document, such as a large PDF file, with a viewer program, say Adobe's Acrobat Reader, or Gnome Document Viewer, and the page numbers don't match.

Typically, the viewer numbers all the pages sequentially, starting with 1. But many documents have some front matter, such as a table of contents, that is outside the normal numbering. For example, there might be a front cover page, and then a table of contents labeled with page numbers i through xviii, and then the main content of the document follows on pages 1 through 263.

Computer programmers, I just realized, have a nice piece of jargon to describe this situation, which is very common. They speak of "logical" and "physical" pages. The "physical" page numbers are the real, honest-to-goodness numbers of the pages, what you get if you start at 1 and count up. The "logical" page numbers are the names by which the pages are referred to. In the example document I described, physical page 1 is the front cover, physical page 2 is logical page i, physical page 19 is logical page xviii, physical page 20 is logical page 1, and so forth. The document has 282 physical pages, and the last one is logical page 263.

Let's denote physical pages with square brackets and logical pages with curvy brackets. So "(xviii)" and "[19]" denote the same page in this document. Page (1) is page [20], and page (20) is page [39]. Page [1] has no logical designation, or perhaps it is something like "(front cover sheet)".

Now the problem I want to discuss is as follows: Every viewer program has a little box where it displays the current page number, and the little box is usually editable. You scan the table of contents, find the topic you want to read about, and the table says that it's on page (165). Then you tell the document viewer to go to page 165, and it does, but it's not the page 165 you want, because the viewer gives you [165], which is actually (146). You actually wanted (165), which is page [184].

Then you curse, mentally subtract 146 (what you got) from 165 (what you wanted), add the result, 19, back to 165, getting 184, and then you ask for 184 to get 165. And if you're me you probably mess up one time in three and have to do it over, because subtraction is hard.

But it would be extremely easy for viewer programs to mostly fix this. They need to support an option where you can click on the box and tell it "your page number is wrong here". Maybe you would right-click the little page-number box, and the process would pop up a dialog:

Then you would type in 146 (which you can see at the bottom of the page you're viewing) and click "OK". From then on the process would know that the logical and physical page numbers differed by 19, and it would subtract 19 from the number in the little box until you told it something else. You could then type 165 into the little box, and the process would think "well, you asked for (165), and I know that (165) is really [184] because you told me earlier that [165] is really (146)" and then you would get [184], which is what you wanted. And when you scrolled down from (165) to (166), the program would think "ho, you just went from [184] to [185], so I will change the display in the little box and display [185]-19 = (166) there".
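Here is a minimal Java sketch of the bookkeeping the viewer would need. It assumes a single constant offset between physical and logical page numbers, which is exactly the situation described above. It ignores roman-numeral front matter, and the names PageOffset, calibrate, toPhysical, and toLogical are invented for this sketch, not taken from any real viewer.

```java
// Hypothetical sketch of the proposed viewer feature: remember one
// (physical, logical) correspondence and convert using a fixed offset.
// Arabic-numbered logical pages only; roman-numeral front matter is ignored.
class PageOffset {
    private int offset = 0;   // physical - logical; 0 until the user calibrates

    // "You say physical page [physical] is really logical page (logical)."
    void calibrate(int physical, int logical) {
        offset = physical - logical;
    }

    // The user typed a logical page number, e.g. from the table of contents.
    int toPhysical(int logical) {
        return logical + offset;
    }

    // The number to display in the little page-number box.
    int toLogical(int physical) {
        return physical - offset;
    }

    public static void main(String[] args) {
        PageOffset viewer = new PageOffset();
        viewer.calibrate(165, 146);                 // [165] is really (146)
        System.out.println(viewer.toPhysical(165)); // 184: asking for (165) goes to [184]
        System.out.println(viewer.toLogical(185));  // 166: scrolling to [185] shows (166)
    }
}
```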

But no, none of them do this.

The document itself should carry this information, and some of them do, sometimes. But not every document will, so viewers should support this feature, which is useful anyway.

Some document formats support internal links, but most documents do not use those features, and anyway they are useless when what you are trying to do is look up a reference from someone else's bibliography: "(See Ogul, pp. 662–664.)"

This is not a complete solution, but it's an almost complete solution, and it can be implemented unilaterally, by which I mean that the document author and the viewer program author need not agree on anything. It's really easy to do.

[ Addendum 20080521: Chung-chieh Shan informs me that current versions of xdvi have this feature. I was unaware of this, because the version installed on my machine was compiled in celebration of the 1926 Philadelphia Sesquicentennial Exhibition and so predates the addition of this feature. ]

[ Addendum 20080530: How I made the dialog box graphic. ]


[Other articles in category /tech] permanent link

Thu, 29 May 2008

Trivial calculations
Back in September, I wrote about how I tend to plunge ahead with straightforward calculations whenever possible, grinding through the algebra, ignoring the clever shortcut. I'll go back and look for the shortcut, but only if the hog-slaughtering approach doesn't get me what I want. This is often an advantage in computer programming, and often a disadvantage in mathematics.

This occasionally puts me in the position of feeling like a complete ass, because I will grind through some big calculation to reach a simple answer which, in hindsight, is completely obvious.

One early instance of this that I remember occurred more than twenty years ago when a friend of mine asked me how many spins of a slot machine would be required before you could expect to hit the jackpot; assume that the machine has three wheels, each of which displays one of twenty symbols, so the chance of hitting the jackpot on any particular spin is 1/8,000. The easy argument goes like this: since the expected number of jackpots per spin is 1/8,000, and expectations are additive, 8,000 spins are required to get the expected number of jackpots to 1, and this is in fact the answer.

But as a young teenager, I did the calculation the long way. The chance of getting a jackpot in one spin is 1/8000. The chance of getting one in exactly two spins is $(1 - 1/8000)\cdot(1/8000)$. The chance of getting one in exactly three spins is $(1 - 1/8000)^2\cdot(1/8000)$. And so on; so, writing k for 8000, you sum the infinite series:

$$\sum_{i=1}^\infty {i{\left({k-1\over k}\right)^{i-1}} {1\over k}}$$

And if you do it right, you do indeed get exactly k. Well, I was young, and I didn't know any better.
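For the skeptical, here is a quick Java check that the series really does sum to k. It simply adds up a large but finite number of terms. The cutoff of 400,000 terms for k = 8000 is my own choice, picked so that the neglected tail is far smaller than floating-point rounding error.

```java
// Numerically sums  sum_{i>=1} i * ((k-1)/k)^(i-1) * (1/k),
// the expected number of spins until the first jackpot, for comparison with k.
class JackpotSeries {
    static double expectedSpins(double k, int terms) {
        double r = (k - 1) / k;   // chance of missing on a single spin
        double power = 1.0;       // r^(i-1), starting with r^0
        double sum = 0.0;
        for (int i = 1; i <= terms; i++) {
            sum += i * power / k;
            power *= r;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Should print a value extremely close to 8000.
        System.out.println(expectedSpins(8000, 400_000));
    }
}
```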

I'd like to be able to say I never forgot this. I wish it were true. I did remember it a lot of the time. But I can think of a couple of times when I forgot, and then felt like an ass when I did the problem the long way.

One time was in 1996. There is a statistic in baseball called the on-base percentage or "OBP"—please don't all go to sleep at once. This statistic measures, for each player, the fraction of his plate appearances in which he is safe on base, typically by getting a hit, or by being walked. It is typically around 1/3; exceptional players have an OBP as high as 2/5 or even higher. You can also talk about the OBP of a team as a whole.

A high OBP is a very important determiner of the number of runs a baseball team will score, and therefore of how many games it will win. Players with a higher OBP are more likely to reach base, and when a batter reaches base, another batter comes to the plate. Teams with a high overall OBP therefore tend to bring more batters to the plate than teams with a low overall OBP, and so have more chances to score runs, and so tend to score more of them.

I wanted to calculate the relationship between team OBP and the expected number of batters coming to the plate each inning. I made the simplifying assumption that every batter on the team had an OBP of p, and calculated the expected number of batters per inning. After a lot of algebra, I had the answer: 3/(1-p). Which makes perfect sense: There are only two possible outcomes for a batter in each plate appearance: he can reach base, or he can be put out; these are exclusive. A batter who reaches base p of the time is put out 1-p of the time, consuming, on average, 1-p of an out. A team gets three outs in an inning; the three outs are therefore consumed after 3/(1-p) batters. Duh.
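The long way round can at least be mechanized. Under the same simplifying assumption (every batter reaches base independently with probability p), the inning ends on batter n exactly when batter n makes an out and exactly two of the first n-1 batters did. That happens with probability $\binom{n-1}{2}(1-p)^3p^{n-3}$. This Java sketch sums n times that probability and can be compared with 3/(1-p). The cutoff of 500 batters is an arbitrary choice of mine.

```java
// Expected batters per inning, assuming each batter independently reaches
// base with probability p and the inning ends on the third out.
// Sums n * P(the third out is made by batter n) for n up to a cutoff.
class BattersPerInning {
    static double expectedBatters(double p, int maxBatters) {
        double q = 1 - p;   // probability that a batter is put out
        double sum = 0.0;
        for (int n = 3; n <= maxBatters; n++) {
            // Exactly 2 outs among the first n-1 batters, then batter n is out:
            // C(n-1, 2) * q^2 * p^(n-3) * q
            double ways = (n - 1) * (n - 2) / 2.0;
            sum += n * ways * Math.pow(q, 3) * Math.pow(p, n - 3);
        }
        return sum;
    }

    public static void main(String[] args) {
        double p = 1.0 / 3;
        // The two values should agree to many decimal places (both about 4.5).
        System.out.println(expectedBatters(p, 500));
        System.out.println(3 / (1 - p));
    }
}
```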

This isn't the only baseball-related mistake I've made. I once took a whole season's statistics and wrote a program to calculate the average number of innings each pitcher pitched. Okay, no problem yet. But then I realized that the average was being depressed by short relievers; I really wanted the average only for starting pitchers. But how to distinguish starting pitchers from relievers? Simple: the statistics record the number of starts for each pitcher, and relievers never start games. But then I got too clever. I decided to weight each pitcher's contribution to the average by his number of starts. Since relievers never start, this ignores relievers; it allows pitchers who do start a lot of games to influence the result more than those who only pitched a few times. I reworked the program to calculate the average number of innings pitched per start.

The answer was 9. (Not exactly, but very close.)

It was obviously 9. There was no other possible answer. There is exactly one start per game, and there are 9 innings per game,1 and some pitcher pitches in every inning, so the number of innings pitched per start is 9. Duh.

Anyway, the real point of this article is to describe a more sophisticated mistake of the same general sort that I made a couple of weeks ago. I was tinkering around with a problem in pharmacology. Suppose Joe has some snakeoleum pills, and is supposed to take one capsule every day. He would like to increase the dosage to 1.5 pills every day, but he cannot divide the capsules. On what schedule should he take the pills to get an effect as close as possible to the effect of 1.5 pills daily?

I assumed that we could model the amount of snakeoleum in Joe's body as a function f(t), which normally decayed exponentially, following $f(t) = ae^{-kt}$ for some constant k that expresses the rate at which Joe's body metabolizes and excretes the snakeoleum. Every so often, Joe takes a pill, and at these times $d_0, d_1$, etc., the function f is discontinuous, jumping up by 1. The value a here is the amount of snakeoleum in Joe's body at time t=0. If Joe takes one pill every day, the (maximum) amount of snakeoleum in his body will tend toward $1/(1-e^{-k})$ over time, as in the graph below:
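A short Python sketch of the once-a-day case, assuming time is measured in days and each pill is taken exactly at an integer time. Just after each dose, the amount is the previous peak decayed for one day plus one new pill, so iterating that step shows the peaks climbing toward $1/(1-e^{-k})$. The value k = 0.5 and the name `peak_after_doses` are arbitrary choices for illustration.

```python
import math

def peak_after_doses(k, doses):
    """Amount of snakeoleum just after the last of `doses` daily pills,
    starting from an empty body and decaying as exp(-k t) between pills."""
    amount = 0.0
    for _ in range(doses):
        amount = amount * math.exp(-k) + 1  # decay for one day, then add one pill
    return amount

k = 0.5
limit = 1 / (1 - math.exp(-k))
for doses in (1, 2, 5, 10, 50):
    print(doses, peak_after_doses(k, doses), limit)
```

The recurrence is a geometric series in disguise: after n doses the peak is $1 + e^{-k} + \cdots + e^{-(n-1)k}$.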


I wanted to compare this with what happens when Joe takes the pill on various other schedules. For some reason I decided it would be a good idea to add up the total amount of pill-minutes for a particular dosage schedule, by integrating f. That is, I was calculating $\int_a^b f(t) dt$ for various a and b; for want of better values, I calculated the total amount $\int_0^\infty f(t) dt$.

Doing this for the case in which Joe takes a single pill at time 0 is simple; it's just $\int_0^\infty e^{-kt} dt$, which is 1/k.

But then I wanted to calculate what happens when Joe takes a second pill, say at time M. At time M, the amount of snakeoleum left in Joe's body from the first pill is $e^{-kM}$, so the function f has $f(t) = e^{-kt}$ for $0 \le t < M$ and $f(t) = (e^{-kM}+1)e^{-k(t-M)}$ for $M \le t < \infty$. The graph looks like this:

After the second pill, at t=600, it decays gradually to zero. I wanted to integrate this function. Since it's continuous on the two pieces of the interval, we break the integral into two:

$$\begin{aligned} \int_0^\infty f(t)\,dt &= \int_0^M f(t)\,dt + \int_M^\infty f(t)\,dt \\ &= \int_0^M e^{-kt}\,dt + \int_M^\infty (e^{-kM} + 1)e^{-k(t-M)}\,dt \\ &= {1 \over k}(1-e^{-kM}) + {1 \over k}(e^{-kM} + 1) \\ &= \ldots \end{aligned}$$

Well, you get the idea. The answer was 2/k.
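A numerical sanity check of the 2/k result, as a minimal Python sketch. It integrates the piecewise f(t) defined above with the midpoint rule, splitting at the jump and truncating the infinite range where the remaining tail is negligible. The rate k = 0.005, the step count, and the helper names are arbitrary; M = 600 is the value mentioned above, and the other values of M are there to show that M makes no difference.

```python
import math

def f_two_pills(t, k, M):
    """f(t) for one pill at t=0 and a second at t=M, written piecewise."""
    if t < M:
        return math.exp(-k * t)
    return (math.exp(-k * M) + 1) * math.exp(-k * (t - M))

def midpoint_integral(g, a, b, n=20_000):
    """Approximate the integral of g over [a, b] by the midpoint rule."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

def total_two_pills(k, M, tail=40.0):
    """Approximate the integral of f from 0 to infinity.

    The range is split at the jump t=M so no subinterval straddles it, and
    cut off at M + tail/k, which drops at most a fraction exp(-tail) of the total."""
    def g(t):
        return f_two_pills(t, k, M)
    return midpoint_integral(g, 0, M) + midpoint_integral(g, M, M + tail / k)

k = 0.005
for M in (100, 600, 1500):
    print(M, total_two_pills(k, M), 2 / k)
```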

In retrospect, this is completely obvious.

This is because of the way I modeled the pills. When the decay of f(t) is exponential, as it is here, the rate at which the snakeoleum is metabolized is proportional to the amount: twice as much snakeoleum means that the decay is twice as fast. Or, looked at another way, each pill decays independently of the rest. Put two pills in, and an hour later you'll find that you have twice as much left as if you had only put one pill in.

Since the two pills are acting independently, you can calculate their effect independently. That is, f(t) can be decomposed into $f_1(t) + f_2(t)$, where $f_1$ is the contribution from the first pill:

And $f_2$ is the contribution from the second pill:

But the second pill is exactly like the first pill. The only difference is that Joe takes it a little later. But since the integrals are considering the total effect from now to eternity, the time at the beginning is unimportant, and the effect of the second pill might as well be:

Just shift your head over a bit. (Mathematically, this is equivalent to substituting y = t-M in the second integral.) So of course it is the same as the first pill, and the result is exactly 2/k. When I saw the 2/k, I scratched my head for a minute, then figured it out and said "duh".
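The same shift works for any number of pills. As a sketch, write $d_0, d_1, \ldots, d_{n-1}$ for the times at which Joe takes his n pills (all at or after time 0), let each pill decay independently as above, and substitute $y = t - d_i$ in each term:

$$\int_0^\infty f(t)\,dt = \sum_{i=0}^{n-1} \int_{d_i}^\infty e^{-k(t-d_i)}\,dt = \sum_{i=0}^{n-1} \int_0^\infty e^{-ky}\,dy = {n \over k}$$

So under this model the total depends only on how many pills Joe takes, not on when he takes them.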

Anyway, I think what I really want is to find $\int_0^\infty {\left(f_1(t) - f_{1.5}(t, d)\right)}^2 dt$, where $f_1$ is the function that describes the amount of snakeoleum when Joe takes one pill a day, and $f_{1.5}$ is the function that describes the amount when Joe takes 1.5 pills every d days. But if there's one thing I think you should learn from a dumbass mistake like this, it's that it's time to step back and try to consider the larger picture for a while, so I've decided to do that before I go on.

1[ Addendum 20070124: There is a brief explanation of why the average baseball game has almost exactly 9 innings. ]


[Other articles in category /oops] permanent link

Thu, 15 May 2008

Luminous band-aids
Last night after bedtime Iris asked for a small band-aid for her knee. I went into the bathroom to get one, and unwrapped it in the dark.

The band-aid itself is circular, about 1.5 cm in diameter. It is sealed between two pieces of paper, each about an inch square, that have been glued together along the four pairs of edges. There is a flap at one edge that you pull, and then you can peel the two glued-together pieces of paper apart to get the band-aid out.

As I peeled apart the two pieces of paper in the dark, there was a thin luminous greenish line running along the inside of the wrapper at the place the papers were being pulled away from each other. The line moved downward following the topmost point of contact between the papers as I pulled the papers apart. It was clearly visible in the dark.

I've never heard of anything like this; the closest I can think of is the thing about how wintergreen Life Savers glow in the dark when you crush them.

My best guess is that it's a static discharge, but I don't know. I don't have pictures of the phenomenon itself, and I'm not likely to be able to get any. But the band-aids look like this:

Have any of my Gentle Readers seen anything like this before? A cursory Internet search has revealed nothing of value.


[Other articles in category /physics] permanent link

Wed, 14 May 2008

More artificial Finnish
Several Finns wrote to me to explain in some detail what was wrong with the artificial Finnish in yesterday's article. As I surmised, the words "ssän" and "kkeen" are lexically illegal in Finnish. There were a number of similar problems. For example, my sample output included the non-word "t". I don't know how this could have happened, since the input probably didn't include anything like that, and the Markov process I used to generate it shouldn't have done so. But the code is lost, so I suppose I'll never know.

Of the various comments I received, perhaps the most interesting was from Ilmari Vacklin. ("Vacklin", huh? If my program had generated "Vacklin", the Finns would have been all over the error.) M. Vacklin pointed out that a number of words in my sample output violated the Finnish rules of vowel harmony.

(M. Vacklin also suggested that my article must have been inspired by this comic, but it wasn't. I venture to guess that the Internet is full of places that point out that you can manufacture pseudo-Finnish by stringing together a lot of k's and a's and t's; it's not that hard to figure out. Maybe this would be a good place to mention the word "saippuakauppias", the Finnish term for a soap-dealer, which was in the Guinness Book of World Records as the longest commonly-used palindromic word in any language.)

Anyway, back to vowel harmony. Vowel harmony is a phenomenon found in certain languages, including Finnish. These languages class vowels into two antithetical groups. Vowels from one group never appear in the same word as vowels from the other group. When one has a prefix or a suffix that normally has a group A vowel, and one wants to join it to a word with group B vowels, the vowel in the suffix changes to match. This happens a lot in Finnish, which has a zillion suffixes. In many languages, including Finnish, there is also a third group of vowels which are "neutral" and can be mixed with either group A or with group B.

Modern Korean does not have vowel harmony, mostly, but Middle Korean did have it, up until the early 16th century. The Korean alphabet was invented around 1443, and the notation for the vowels reflected the vowel harmony:

[ Addendum 20080517: The following paragraph about vowel harmony contains significant errors of fact. I got the groups wrong. ]

The first four vowels in this illustration, with the vertical lines, were incompatible with the second four vowels, the ones with the horizontal lines. The last two vowels were neutral, as was another one, not shown here, which was written as a single dot and which has since fallen out of use. Incidentally, vowel harmony is an unusual feature of languages, and its presence in Korean has led some people to suggest that it might be distantly related to Turkish.

The vowel harmony thing is interesting in this context for the following reason. My pseudo-Finnish was generated by a Markov process: each letter was selected at random so as to make the overall frequency of the output match that of real Finnish. Similarly, the overall frequency of two- and three-letter sequences in pseudo-Finnish should match that in real Finnish. Is this enough to generate plausible (although nonsensical) Finnish text? For English, we might say maybe. But for Finnish the answer is no, because this process does not respect the vowel harmony rules. The Markov process doesn't remember, by the time it gets to the end of a long word, whether it is generating a word in vowel category A or B, and so it doesn't know which vowels it should be generating. It will inevitably generate words with mixed vowels, which is forbidden. This problem does not come up in the generation of pseudo-English.
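A sketch of a checker for the simple harmony rule, assuming the standard Finnish grouping: a, o, u are back vowels; ä, ö, y are front vowels; e and i are neutral. The function name is made up, and the rule is only the simplified one described here (real compounds and loanwords can mix the groups). The two words that fail are taken from the generated samples in these articles.

```python
BACK = set("aou")    # back vowels
FRONT = set("äöy")   # front vowels
# e and i are neutral and may appear with either group.

def is_harmonic(word):
    """True if the word does not mix back vowels with front vowels.

    This is only the simple rule; genuine Finnish compounds can mix them."""
    letters = set(word.lower())
    return not (letters & BACK and letters & FRONT)

for w in ("talossa", "kylässä", "kilometsästymistettäjän", "taisiä"):
    print(w, is_harmonic(w))
```

A Markov generator that only looks back two or three letters has no way to consult anything like `is_harmonic` for the word it is in the middle of, which is why it produces words like "taisiä".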

None of that was what I was planning to write about, however. What I wanted to do was to present samples of pseudo-Finnish generated with various tunings of the Markov process.

The basic model is this: you choose a number N, say 2, and then you look at some input text. For each different sequence of N characters, you count how many times that sequence is followed by "a", how many times it is followed by "b", and so on.

Then you start generating text at random. You pick a sequence of N characters arbitrarily to start, and then you generate the next character according to the probabilities that you calculated. Then you look at the last N characters (the last N-1 from before, plus the new one) and repeat. You keep doing that until you get tired.
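The whole procedure fits in a few lines. Here is a minimal Python sketch; the function names are made up for illustration, the sample input is only the opening words of Genesis (KJV, lowercased) rather than the whole book, and every character, including spaces and punctuation, is treated as a symbol. If it reaches a sequence that occurred only at the very end of the input, it has no follower to choose, so it simply stops early.

```python
import random
from collections import Counter, defaultdict

def build_table(text, n):
    """Map each n-character sequence in `text` to a Counter of the
    characters that follow it."""
    table = defaultdict(Counter)
    for i in range(len(text) - n):
        table[text[i:i + n]][text[i + n]] += 1
    return table

def generate(table, n, start, length, rng=random):
    """Extend `start` (an n-character sequence from the input) by up to
    `length` characters, each drawn with the observed frequencies."""
    out = start
    for _ in range(length):
        key = out[len(out) - n:]  # the last n characters ("" when n == 0)
        followers = table.get(key)
        if not followers:  # sequence only occurred at the very end of the input
            break
        chars = list(followers)
        out += rng.choices(chars, weights=[followers[c] for c in chars])[0]
    return out

sample = ("in the beginning god created the heaven and the earth. "
          "and the earth was without form, and void")
table = build_table(sample, 2)
print(generate(table, 2, "th", 60, random.Random(1)))
```

Every three-character window of the output occurs somewhere in the input, which is exactly the property the N=2 model promises and nothing more.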

For example, suppose we have N=2. Then we have a big table whose keys are 2-character strings like "ab", and associated with each such string is a table that looks something like this:
    next char   percent
    r             54.52
    a             15.89
    i             10.41
    o              7.95
    l              4.11
    e              3.01
    u              1.10
    space          0.82
    :              0.55
    t              0.55
    ,              0.27
    .              0.27
    b              0.27
    s              0.27
So in the input to this process, "ab" was followed by "r" more than 54% of the time, by "a" about 16% of the time, and so on. And when generating the output, every time our process happens to generate "ab", it will follow by generating an "r" 54.52% of the time, an "a" 15.89% of the time, and so on.
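The percentages in a table like this are just follower counts normalized to 100. A minimal Python sketch; `follower_percentages` is a hypothetical helper, and "abracadabra" is a toy input rather than the Genesis text used here.

```python
from collections import Counter

def follower_percentages(text, key):
    """Percentages of the characters that follow each occurrence of `key`
    in `text`, most common first."""
    n = len(key)
    counts = Counter(text[i + n] for i in range(len(text) - n)
                     if text[i:i + n] == key)
    total = sum(counts.values())
    if not total:
        return []  # key never occurs with a follower
    return [(c, round(100 * count / total, 2)) for c, count in counts.most_common()]

for char, pct in follower_percentages("abracadabra", "a"):
    print("space" if char == " " else char, pct)
```

In "abracadabra" the letter "a" is followed by "b" twice and by "c" and "d" once each; the final "a" has no follower and is not counted.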

Whether to count capital letters as the same as lowercase, and what to do about punctuation and spaces and so forth, are up to the designer.

Here, as examples, are some samples of pseudo-English, generated with various N. The input text was the book of Genesis, which is not entirely typical. In each case, I deleted the initial N characters and the final partial word, cleaned up the capitalization by hand, and appended a final period.

N=0
Lt per f idd et oblcs hs hae:uso ar w aaolt y tndh rl ohn otuhrthpboleel.ee n synenihbdrha,spegn.
N=1
Cachand t wim, heheethas anevem blsant ims, andofan, ieahrn anthaye s, lso iveeti alll t tand, w.
N=2
Ged hich callochbarthe of th to tre said nothem, and rin ing of brom. My and he behou spend the.
N=3
Sack one eved of and refor ther of the hand he will there that in the ful, when it up unto rangers.
It should be clear that the quality improves as one increases the N parameter. The N=3 sample has mostly real words, and the few nonsense ones it contains ("eved", "ful") are completely plausible English. N=2, on the other hand, is mostly nonsense, although it's mostly plausible nonsense. Even "callochbarthe" is almost plausible. (The unfortunate "chb" in the middle is just bad luck. It occurs because Genesis 36 mentions Baalhanan the son of Achbor.) The N=1 sample is recognizably bogus; no English word looks like "ieahrn", and the triple "l" in "alll" is nearly impossible. (I did once write to Jesse Sheidlower, an editor of the Big Dictionary, to ask his advice about whether "ballless" should be hyphenated.)

I have prepared samples of pseudo-Finnish of various qualities. The input here was a bunch of text I copied out of Finnish Wikipedia. (Where else? If you need Finnish text in 1988, you get it from the Usenet fi.talk group; if you need Finnish text in 2008, you get it from Finnish Wikipedia.) I did a little bit of manual cleanup, as with the English, but not too much.

N=0
Vtnnstäklun so so rl sieesjo.Aiijesjeäyuiotiannorin traäl.N vpojanti jonn oteaanlskmt enhksaiaaiiv oenlulniavas. Rottlatutsenynöisu iikännam e lavantkektann eaagla admikkosulssmpnrtinrkudilsorirumlshsmoti,anlosa anuioessydshln.Atierisllsjnlu e.Itatlosyhi vnko ättr otneän akho smalloailäi jiaat kajvtaopnasneilstio tntin einteaonaiimotn:r apoya oruasnainttotne wknaiossäelaäinoev aobrs,vteorlokynv. Aevsrikhanä tp s s oälnlke rvmi il ynae nara ign ssm lkimttbhineaatismäi tst lli ahaltineshne kr keöunv ah s itenh s .Ia pa elstpnanmnuiksriil anaalnttt mr ti.Ooa ka eee eiiei,tnees äusee a nanhetv.Iopkijeatatits,i l eklbiik suössmap tioaotaktdiir rkeaviohiesotkeagarihv nnadvö jlape öt kaeakmjkhykoto tnt iunnuyknnelu rutliie.Leva eiriaösnaj,rk oyumtsle,iioa,aspa aeiaä wsuinn eta y tvati klssviutkuaktmlpnheomi.T akapskushhnuksnhnnheaaaaussitseminmpnamäiaä pät.Kaaaabl unnionuhnpa iaes,outka.Cväinvkshvrnlteeoea rmi re suodmpr autlysa tnliaanäass. Srs rnvrtsita kmidusvjn tii.
N=1
Ava pän svun kerekent lsita batävomenasttenerga kovosuujalules rma punntäni rtraliksainoi van eukällä. Enäkukänesinntampalä ttan kolpäsäkyönsllvitivenestakkesenelussivaliite kuuksä kttteni einsuekeita kuterissalietäkilpöikalit ojatäjä pinsin atollukole idoitenn kkaorhjajasteden en vuolynkoiverojaa hta puon ehalan vaivä ihoshäositi. Hde setua tämpitydi makta jasyn sää oinncgrkai jeeten. Ljalanekikeri toiskkksypohoin ta yö atenesällväkeesaatituuun. Paait pukata tuon ktusumitttan zagaleskli va kkanäsin siikutytowhenttvosa veste eten vunovivä. Vorytellkeeni stan jä taa eka kaine ja kurenntonsin kyn o nta ja. Aisst urksetaka. Hotimivaa ta mppussternallai ja. Hdä on koraleerermohtydelen on jon. Rgienon kulinoilisälsa ja holälimmpa vitin, kukausoompremänn ra, palestollebilsen kaalesta, oina. Blilullaushoingiötideispaanoksiton, mulurklimi kermalli pota atebau lmomarymin kypa hta vanon tin kela vanaspoita s kulitekkäjen jäleetuolpan, veesalekäilin oii. Häreli. Ymialisstermimpriekaksst on.
N=2
Omaalis onino osa josa hormastaaraktse tyi altäänä tyntellevääostoidesenä, la siä vuansilliana inöön akalkuulukempellys kisä nen myöhelyaminenkiemostamahti omuonsa onite oni kusissa. Kungin sykynteillalkaai ellahasiteisuunnaja eroniemmin javai musuuasinä, sittan tusuovatkryt tormon vuolisenitiivansaliuotkietjuuta sensa. Kutumppalvinen. Vaikintolat hän ja kilkuossa osa koiseuvo keyhdysvisakeemppolowistoisijouliuodosijolasissän muoli ogro soluksi valuksasverix intetormon patlantaan et muiksen paiettaatulun kan vuomesyklees ovain pun. Sesva sa hänerittämpiraun tyi vuoden sälisen sän yhtiit, set tämpiraalletä. Senssaikanoje leemp:tabeten ain raa olliukettyi su. Solulukuuttellerrotolit hee säkinessa hän sekketäärinenvaikeihakti umallailuksin sestunno klossi ilunuta. Klettisaa osen vua vuola, jani ja hinangia en ta kaineemonimien polin barkiviäliukkuta joseseva. Ebb rautta onistärään on ml jokoulistä oheksi anoton allysvallelsiliineuvoja kutuko ala ulkietutablohitkain. Ituno.
N=3
Ävivät mena osakeyhti yhdysvalmiininäkin rakenne tuliitä hermoni ja umpirauhastui liin baryshnikoneja. Ain viljelukuullisää olisäke spesideksyylikoliittu latvia. Helsina hän solukeskuksen kannumme, peri palkin vieskeinä sisään on orgaan poikanssisäätelukauno klee laisenäläinen tavastui kauno on länteen muttava hän voimista kilometsästymistettäjän lehtiöiksitoreisö. Sitoutuvat mukalle. Ainettiin sisäke suomaihin, jouluun. Verenkilpalveli valtaineen opisteri poli ohjasionee rakennuttikolan aivastisenäläistuu kehittisetoja, rajahormaailmanajan kulkopuolesti kuluu mooliitoutuvat ovat olle. Ainen yhdysvaltai valiolähtiöiksi vasta, S. Muidentilaisteri jotka verenkirovin verenkiehumistä nelle väliaivoittynyt baleviiliukoisiin maailmestavarasta, jokakuudessa laisu. Sai rakeyhti yhtiö eli gluksessa. Ebbin, ja linnosakkeen hormonien I hallistehtiin kilpirasvua jaajana hormaailusta kunnetteluskäyttöön suomalaivat yhdysvalmistämistammonit veteet olimistuvatta. Hormon oli rautta.
Before anyone objects to the non-word "ml" in the N=2 sample, let me explain that this is the standard abbreviation for "millilitra". The "i" in the N=3 sample was a puzzle, since Marko Heiskanen assures me that Finnish has no one-letter words. But it appears in my sample in connection with Sukselaisen I hallitus, whatever that is, so I capitalized it.

I must say that I found "yhdysvalmistämistammonit" rather far-fetched, even in Finnish. But then I discovered that "yhdeksänkymmenvuotiaaksi" and "yhdysvalloissakaan" are genuine, so who am I to judge?

[ Addendum 20080601: Some additional notes. ]


[Other articles in category /lang] permanent link

Mon, 12 May 2008

Artificial Finnish

By 1988 or 1989 I had read in several places, most recently in J. R. Pierce's Symbols, Signals, and Noise, that if you compile a table of the relative frequencies of three-letter sequences (trigraphs) in English text, and then generate random text with the same trigraph frequencies, the result cannot be distinguished from meaningful English text except by people who actually know English. Examples were provided, containing weird but legitimate-sounding words like "deamy" and "grocid", and the claim seemed plausible. But since I did actually know English, I could not properly evaluate it.

But around that time the Internet was just beginning to get into full swing. The Finnish government was investing a lot of money in networking infrastructure, and a lot of people in Finland were starting to appear on the Internet.

I have a funny story about that: Around the same time, a colleague named Marc Edgar approached me in the computer lab to ask if I knew of any Internet-based medium he could use to chat with his friend at the University of Oulu. I thought at first that he was putting me on (and maybe he was) because in 1989 the University of Oulu was just about the only place in the world where a large number of people were accessible via internet chat, IRC having been invented there the previous autumn.

A new set of Finnish-language newsgroups had recently appeared on Usenet, and people posted to them in Finnish. So I had access to an unlimited supply of computer-readable Finnish text, something which would have been unthinkable a few years before, and I could do the experiment in Finnish.


I wrote up the program, which is not at all difficult, gathered Finnish news articles, and produced the following sample:

Uttavalon estaa ain pahalukselle? Min omatunu selle menneet hy, toista. Palveljen alh tkö an välin oli ei alkohol pisten jol elenin. Että, ille, ittavaikki oli nim tor taisuuristä usein an sie a in sittä asia krista sillo si mien loinullun, herror os; riitä heitä suurinteen palve in kuk usemma. Tomalle, äs nto tai sattia yksin taisiä isiäk isuuri illää hetorista. Varsi kaikenlaineet ja pu distoja paikelmai en tulissa sai itsi mielim ssän jon sn ässäksi; yksen kos oihin! Jehovat oli kukahdol ten on teistä vak kkiasian aa itse ee eik tse sani olin mutta todistanut t llisivat oisessa sittä on raaj a vaisen opinen. Ihmisillee stajan opea tajat ja jumalang, sitten per sa ollut aantutta että voinen opeten. Ettuj, jon käs iv telijoitalikantaminun hä seen jälki yl nilla, kkeen, vaaraajil tuneitteistamaan same?

In those days, the world was 7-bit, and Finnish text was posted in a Finnish national variant of ASCII that caused words like "tkö an välin" to look like "tk| an v{lin". The presence of the curly braces heightened the apparent similarity, because that was all you could see at first glance.
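To see the effect, here is a minimal Python sketch that replaces the six letters Ä Ö Å ä ö å with the ASCII characters that occupy the same 7-bit code points in the Finnish/Swedish national variant of ISO 646. Only those six letters are mapped; everything else passes through unchanged, and `as_seen_in_ascii` is a made-up name.

```python
# In the Finnish/Swedish 7-bit variant, the code points that US-ASCII uses
# for [ \ ] { | } were reassigned to these national letters.
TO_SEVEN_BIT = str.maketrans({
    "Ä": "[", "Ö": "\\", "Å": "]",
    "ä": "{", "ö": "|", "å": "}",
})

def as_seen_in_ascii(text):
    """Show how text in the national variant looks on a US-ASCII display."""
    return text.translate(TO_SEVEN_BIT)

print(as_seen_in_ascii("Palveljen alh tkö an välin"))  # Palveljen alh tk| an v{lin
```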

At the time I was pleased, but now I think I see some defects. There are some vowelless words, such as "sn" and "t", which I think don't occur in Finnish. Some other words look defective: "ssän" and "kkeen", for example. Also, my input sample wasn't big enough, so once the program generated "alk" it was stuck doing the rest of "alkohol". Still, I think this could pass for Finnish if the reader wasn't paying much attention. I was satisfied with the results of the experiment, and was willing to believe that randomly-constructed English really did look enough like English to fool a non-English-speaking observer.

[ Addendum 20080514: There is a followup to this article. ]

[ Addendum 20080601: Some additional notes. ]


[Other articles in category /lang] permanent link

Thu, 08 May 2008

Recounting the rationals
I just read a really excellent math paper, Recounting the rationals, by Calkin and Wilf.

Let b(n) be the number of ways of adding up powers of 2 to get n, with each power of 2 used no more than twice. So, for example, b(5) = 2, because there are 2 ways to get 5:
5 = 4 + 1
 = 2 + 2 + 1

And b(10) = 5, because there are 5 ways to get 10:
10 = 8 + 2
  = 8 + 1 + 1
  = 4 + 4 + 2
  = 4 + 4 + 1 + 1
  = 4 + 2 + 2 + 1 + 1

The sequence of values of b(n) begins as follows:

1 1 2 1 3 2 3 1 4 3 5 2 5 3 4 1 5 4 7 3 8 5 7 2 7 5 8 3 7 4 5 ...
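A brute-force Python sketch that counts these representations straight from the definition. It indexes from b(0) = 1 (the empty sum), which is how the sequence above is indexed, so that the sixth term is b(5) = 2. The helper names are made up.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def count_reps(n, power=1):
    """Ways to write n as a sum of powers of 2, each >= `power`,
    using each power of 2 at most twice."""
    if n == 0:
        return 1
    if power > n:
        return 0
    # Use the current power 0, 1, or 2 times, then move on to the next power.
    return sum(count_reps(n - c * power, power * 2)
               for c in range(3) if c * power <= n)

def b(n):
    return count_reps(n)

print([b(n) for n in range(31)])
```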
Now consider the sequence b(n) / b(n+1). This is just what you get if you take two copies of the b(n) sequence and place one over the other, with the bottom one shifted left one place, like this:

    1 1 2 1 3 2 3 1 4 3 5 2 5 3 4 1 5 4 7 3 8 5 7 2 7 5 8 3 7 4 5 ...
    - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
    1 2 1 3 2 3 1 4 3 5 2 5 3 4 1 5 4 7 3 8 5 7 2 7 5 8 3 7 4 5 ...
Reading each pair as a rational number, we get the sequence b(n) / b(n+1), which is 1/1, 1/2, 2/1, 1/3, 3/2, 2/3, 3/1, 1/4, 4/3, 3/5, 5/2, ... .

Here is the punchline: This sequence contains each positive rational number exactly once.
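A quick empirical check in Python, not a proof. Instead of brute force, this sketch generates b(n) with the recurrence b(2m+1) = b(m) and b(2m+2) = b(m) + b(m+1): an odd number must use the 1 exactly once, and an even number uses it either zero times or twice, and halving what remains gives the smaller cases. It then checks the first 511 fractions: all are in lowest terms, none repeats, and every positive rational p/q with p + q ≤ 10 has already appeared.

```python
from fractions import Fraction
from math import gcd

def b_values(count):
    """First `count` values of b(n), starting from b(0) = 1."""
    b = [1]
    for n in range(1, count):
        if n % 2 == 1:
            b.append(b[(n - 1) // 2])          # odd n: the 1 is used once
        else:
            b.append(b[n // 2 - 1] + b[n // 2])  # even n: 1 used twice or not at all
    return b

def ratios(count):
    """The first `count` pairs (b(n), b(n+1))."""
    b = b_values(count + 1)
    return [(b[n], b[n + 1]) for n in range(count)]

pairs = ratios(511)  # 511 = 2**9 - 1 terms
print(", ".join(f"{p}/{q}" for p, q in pairs[:11]))
# 1/1, 1/2, 2/1, 1/3, 3/2, 2/3, 3/1, 1/4, 4/3, 3/5, 5/2

assert all(gcd(p, q) == 1 for p, q in pairs)
assert len({Fraction(p, q) for p, q in pairs}) == len(pairs)
wanted = {Fraction(p, s - p) for s in range(2, 11) for p in range(1, s)}
assert wanted <= {Fraction(p, q) for p, q in pairs}
```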

If you are just learning to read math papers, or you think you might like to learn to read them, the paper in which this is proved would be a good place to start. It is serious research mathematics, but elementary. It is very short. The result is very elegant. The proofs are straightforward. The techniques used are typical and widely applicable; there is no weird ad-hockery. The discussion in the paper is sure to inspire you to tinker around with it more on your own. All sorts of nice things turn up. The b(n) sequence satisfies a simple recurrence, the fractions organize themselves neatly into a tree structure, and everything is related to everything else. Check it out.

Thanks to Brent Yorgey for bringing this to my attention. I saw it in this old blog article, but then discovered he had written a six-part series about it. I also discovered that M. Yorgey independently came to the same conclusion that I did about the paper: it would be a good first paper to read.

[ Addendum 20080505: Brad Clow agrees that it was a good place to start. ]


[Other articles in category /math] permanent link

The Origin of Consciousness

One of my favorite books is The Origin of Consciousness in the Breakdown of the Bicameral Mind, by Julian Jaynes, a psychologist at Princeton University. Nearly everyone seems to agree that this is either a work of profound genius, or of profound crackpottery, and also that they aren't sure which it is. Jaynes' theory, as nearly as I can summarize the book, is something like this:

Human consciousness (which Jaynes describes and defines in considerable detail) is a relatively recent development, dating back only about 3,000 years.

That is the shocking part of the theory. Most people probably imagine consciousness arising much, much earlier, perhaps before language. Jaynes disagrees. In his theory, language, and in particular its mediation of thought through the use of metaphors, is an essential prerequisite for consciousness. And his date for the development of consciousness means that human consciousness would postdate several other important developments, such as metalworking, large-scale agriculture, complex hierarchical social structures, and even writing. Jaynes thinks that the development of consciousness is a historical event and is attested to by written history. He tries to examine the historical record to find evidence not only of preconscious culture, but of the tremendous upheavals that both caused and were the result of the arrival of consciousness.

If preconscious humans farmed, built temples and granaries, and kept records, they must have had some sort of organizing behavior that sufficed in place of consciousness. Jaynes believes that prior to the development of consciousness, humans had a very different mentality. When you or I need to make a decision, we construct a mental narrative, in which we imagine ourselves trying several courses of action, and attempt to predict the possible consequences. Jaynes claims that Bronze Age humans did not do this. What then?

Instead, says Jaynes, the two halves of the brain were less well-integrated in preconscious humans than they are today. The preconscious mentality was "bicameral", with the two halves of the brain operating more independently, and sometimes at odds with each other. The left hemisphere, as today, was usually dominant. Faced with a difficult decision, preconscious humans would wait, possibly undergoing (and perhaps even encouraging) an increasingly agitated physical state, until they heard the voice of a god telling them what to do. These hallucinated voices were generated by the right hemisphere of the brain, and projected internally into the left hemisphere.

For example, when the Iliad says that the goddess Athena spoke to Achilles, and commanded and physically restrained him from killing Agamemnon, it is not fabulating: Achilles' right brain hallucinated the voice of the Goddess and restrained him.

In Jaynes' view, there is a large amount of varied literary, anthropological, and neurological evidence supporting this admittedly bizarre hypothesis. For example, he compares the language used in the Biblical Book of Amos (bicameral) with that in Ecclesiastes (conscious). He finds many examples of records from the right period of history bewailing the loss of the guidance of the gods, the stilling of their voices, and the measures that people took, involving seers and prophets, to try to bring the guiding voices back.

Jaynes speculates that mental states such as schizophrenia, which are frequently accompanied by irresistible auditorily hallucinated commands, may be throwbacks to the older, "bicameral" mental state.

Whether you find the theory amazingly brilliant or amazingly stupid, I urge you to withhold judgment until you have read the book. It is a fat book, and there is a mass of fascinating detail. As I implied, it's either a work of profound genius or of profound crackpottery, and I'm not sure which. (Yaakov Sloman tells me that the response to Wittgenstein's Tractatus Logico-Philosophicus was similarly ambivalent when it was new. I think the consensus is now on the genius side.) Either way, it is quite fascinating. There needs to be some theory to account for the historical development of consciousness, and as far as I know, this is the only one on offer.

Anyway, I did not mean to get into this in so much detail. The reason I brought this up is that because of my continuing interest in Jaynes' theory, and how it is viewed by later scholars, I am reading Muses, Madmen, and Prophets: Rethinking the History, Science, and Meaning of Auditory Hallucination by Daniel B. Smith. I am not very far into it yet, but Smith has many interesting things to say about auditory hallucinations, their relationship to obsessive-compulsive disorder, and other matters.

On page 37 Smith mentions a paper, which as he says, has a wonderful title: "Involuntary Masturbation as a Manifestation of Stroke-Related Alien Hand Syndrome". Isn't that just awesome? It gets you coming and going, like a one-two punch. First there's the involuntary masturbation, and while you're still reeling from that it follows up with "alien hand syndrome".

To save you the trouble of reading the paper, I will summarize. The patient is a 72-year-old male. He has lesions in his right frontal lobe. He is experiencing "alien hand syndrome", where his hand seems to be under someone else's control, grabbing objects, like the TV remote control, or grabbing pieces of chicken off his plate and feeding them to him, when what he wanted to do was feed himself with the fork in his right hand. "During his hospital stay, the patient expressed frustration and dismay when he realized that he was masturbating publicly and with his inability to voluntarily release his grasp of objects in the left hand."

Reaction time tests of his hands revealed that when the left hand was under his conscious control, it suffered from a reaction time delay, but when it was under the alien's control, it didn't.

Whee, freaky.


[Other articles in category /brain] permanent link

Thu, 01 May 2008

At that moment, the novice was enlightened...
Presented without further comment, a conversation I had yesterday on IRC. I am yrlnry:

--> You are now talking on #ubuntu
23:37<yrlnry>I upgraded to HH this afternoon. Since the upgrade, when I select a URL in gnome-terminal and then pick the "open this link" menu item, the link doesn't open in my browser. Instead, I get a dialog that says "Could not open the address "http://...": There was an error launching the default action command associated with this location." How can I fix this, or find out what the "error" was?
23:38<lpkmgj> yrlnry: this happeds in Windows
   yrlnry: i get that in Windows 2
23:39<yrlnry> lpkmgj: thanks! that fixed my problem!
 <lpkmgj>yrlnry: sarcasm?
 <yrlnry>lpkmgj: No!
 <lpkmgj>yrlnry: right ....
23:40<yrlnry>lpkmgj: WHen you said that, I realized that the problem was that HH had installed Firefox 3, and that the terminal program wants to use the default browser, which is FF2, which is no longer present since the upgrade.
 <yrlnry>lpkmgj: so I told FF3 to make itself the default browser, and the problem went away.
 <lpkmgj>yrlnry: oh, well glad i helped : )

(I have changed the name of the other person.)


[Other articles in category /tech] permanent link

Fri, 18 Apr 2008

Is blood a transitive relation?
When you're first teaching high school students about the idea of a relation, you give examples of the important properties of relations. Relations can be some, none, or all of reflexive, symmetric, antisymmetric, or transitive. You start with examples that everyone is already familiar with, such as the identity relation, which is reflexive, symmetric, and transitive, and the ≤ relation, which is reflexive, antisymmetric, and transitive. Other good examples include familial relations: "sister-in-law of" is symmetric on the set of women, but not on the larger set of people; "ancestor of" is transitive but not symmetric.

It might seem at first glance that "is related to" is transitive, but, at least under conventional definitions, it isn't, because my wife is not related to my cousins.

(I was once invited to speak at Haverford College, and, since I have no obvious qualifications in the topic on which I was speaking, I was asked how I had come to be there. I explained that it was because my wife's mother's younger brother's daughter's husband's older brother's wife was the chair of the mathematics department. Remember, it's not what you know, it's who you know.)

I think I had sometimes tried to turn "related to" into a transitive relation by restricting it to "is related to by blood". This rules out the example above of my wife being unrelated to my cousins, because my relationship with my wife is not one of blood. I don't quite remember using "related by blood" as an example of a transitive relation, but I think I might have, because I was quite surprised when I realized that it didn't work. I spent a lot of time that morning going over my counterexample in detail, writing it up in my head, as it were. I was waiting around in Trevose for the mechanic to finish examining my car, and had nothing better to do that morning. If I had had a blog then, I would probably have posted it. But it is a good thing that I didn't, because in spite of all my thought about it, I missed something important.

The example is as follows. A and B have a child, X. (You may suppose that they marry beforehand, and divorce afterward, if your morality requires it.) Similarly, C and D have a child, Z. Then B and C have a child Y. Y is now the half-sibling of both X and Z, and so is unquestionably a blood relative of both, but X and Z are entirely unrelated. They are not even step-siblings.

Well, this is all very well, but the reason I have filed it under oops/, and the reason it's a good thing I didn't post it on my (then nonexistent) blog is that this elaborate counterexample contains a much simpler one: X is the child and hence the blood relative of both A and B, who are not in general related to each other. C, D, Y, and Z are wholly unnecessary.
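The smaller counterexample is easy to check mechanically. Here is a minimal Perl sketch under an assumed model (a simplification, not a genealogist's definition): two people are related by blood when one descends from the other or they share an ancestor, and everyone counts as related to themselves. The names %parents, lineage, related, and find_counterexample are made up for the illustration.

```perl
use strict;
use warnings;

# The simpler counterexample: X is the child of A and B.
my %parents = (X => [qw(A B)]);
my @people  = qw(A B X);

# Everyone reachable by following parent links, including the person.
sub lineage {
    my ($p) = @_;
    my %seen = ($p => 1);
    my @todo = ($p);
    while (defined(my $q = shift @todo)) {
        for my $parent (@{ $parents{$q} || [] }) {
            push @todo, $parent unless $seen{$parent}++;
        }
    }
    return \%seen;
}

# Assumed model of "related by blood": the two lineages overlap, so one
# person descends from the other or they share an ancestor. Everyone
# counts as related to themselves.
sub related {
    my ($p, $q) = @_;
    my $lp = lineage($p);
    return scalar grep { $lp->{$_} } keys %{ lineage($q) };
}

# Search for p, q, r with p~q and q~r but not p~r.
sub find_counterexample {
    for my $p (@people) {
        for my $q (@people) {
            next unless related($p, $q);
            for my $r (@people) {
                return ($p, $q, $r) if related($q, $r) && !related($p, $r);
            }
        }
    }
    return;
}

my @triple = find_counterexample();
print(@triple ? "not transitive: @triple\n" : "no counterexample\n");
# prints "not transitive: A X B"
```

A is related to X and X is related to B, but A and B share no lineage, so transitivity fails without any need for C, D, Y, or Z.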

I wish I had some nice conclusion to draw here, but if there's something I could learn from it, I can't think what it might be.


[Other articles in category /oops] permanent link

Suffering from "make install"
I am writing application X, which uses the nonstandard Perl modules DBI, DBD::SQLite, and Template. These might not be available on the target system, so I got the idea to include them in the distribution for X and have the build process for X build and install the modules. X already carries its own custom Perl modules in X/lib anyway, so I can just install DBI and the others into X/lib and everything will Just Work. Or so I thought.

After building DBI, for example, how do you get it to install itself into X/lib instead of the default system-wide location, which only the super-user has permission to modify?

There are at least five solutions to this common problem.

Uh-oh. If solution #1 had worked, people would not have needed to invent solution #2. If solution #2 had worked, people would not have needed to invent solution #3. Since there are five solutions, there is a good chance that none of them work.

You can, I am informed:

  • Set PREFIX=X when building the Makefile
  • Set INSTALLDIRS=vendor and VENDORPREFIX=X when building the Makefile
    • Or maybe instead of VENDORPREFIX you need to set INSTALLVENDORLIB or something
    • Or maybe instead of setting them while building the Makefile you need to set them while running the make install target
  • Set LIB=X/lib when building the Makefile
  • Use PAR
  • Use local::lib

Some of these fail by being excessively complicated. Others fail by addressing a problem much larger than the one I have. For example, I do not want to do whatever PAR does; I just want to install the damn modules into X/lib where the application can find them.

Some of these items fail because they just plain fail. For example, the first thing everyone says is that you can just set PREFIX to X. No, because then the module Foo does not go into X/lib/Foo.pm. It goes into X/Foo/lib/perl5/site_perl/5.12.23/Foo.pm. Which means that if X does use lib 'X/lib'; it will not be able to find Foo.

The manual (which goes by the marvelously obvious and easily-typed name of ExtUtils::MakeMaker, by the way) is of limited help. It recommends solving the problem by travelling to Paterson, NJ, gouging your eyes out with your mom's jewelry, and then driving over the Passaic River falls. Ha ha, just kidding. That would be a big improvement on what it actually suggests, for three reasons. First, it is clear and straightforward. Second, it would feel better than the stuff it does suggest. And third, it would actually solve your problem, although obliquely.

It turns out there is a simple solution that doesn't involve travelling to New Jersey. The first thing you have to do is give up entirely on trying to use make install to install the modules. It is completely broken for this application, because even if the destination could somehow be forced to be what you wanted—and, after all, why would you expect that make install would let you configure the destination directory in a simple fashion?—it would still install not only the contents of MODULE/lib, but also the contents of MODULE/bin, MODULE/man, MODULE/share, MODULE/pus, MODULE/dork, MODULE/felch, and MODULE/scrotum, some of which you probably didn't want.

So no. But the solution is actually simple. The normal module build process (as distinct from the install process) puts all this crap under MODULE/blib. The test suite is run against the blib installation. So the test programs have the same problem that X has. If they can find the stuff under blib, so can X, by replicating the layout under blib and then doing what the test suite does.

In fact, the modules are installed into the proper subdirectories of MODULE/blib/lib. So the simple solution is just to build the module and then, instead of trying to get the installer to put the right stuff in the right place, use cp -pr MODULE/blib/lib/* X/lib. Problem solved.

For modules with a shared library, you need to copy MODULE/blib/arch/auto/* into X/lib/auto also.
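The same copying step can be written in Perl with core modules only. This is a minimal sketch under stated assumptions: install_from_blib and copy_tree are made-up helper names, and the module is assumed to have been built already (perl Makefile.PL && make) so that its blib directory exists.

```perl
use strict;
use warnings;
use File::Find qw(find);
use File::Copy qw(copy);
use File::Path qw(make_path);
use File::Spec;

# Copy every regular file under $src to the same relative place under
# $dst, roughly what "cp -pr $src/* $dst" does (mtimes are not kept).
sub copy_tree {
    my ($src, $dst) = @_;
    return unless -d $src;
    find({
        no_chdir => 1,
        wanted   => sub {
            return unless -f $_;
            my $rel    = File::Spec->abs2rel($_, $src);
            my $target = File::Spec->catfile($dst, $rel);
            my (undef, $dir) = File::Spec->splitpath($target);
            make_path($dir);
            copy($_, $target) or die "copy $_ -> $target: $!\n";
            chmod((stat $_)[2] & 07777, $target);   # keep permission bits
        },
    }, $src);
}

# Hypothetical helper: move a built module out of its blib directory
# and into the application's private library directory.
sub install_from_blib {
    my ($module_dir, $app_lib) = @_;
    copy_tree("$module_dir/blib/lib", $app_lib);
    # XS modules keep their shared libraries under blib/arch/auto.
    copy_tree("$module_dir/blib/arch/auto", "$app_lib/auto");
}
```

Copying the pure-Perl files and the arch/auto tree separately mirrors the two cp commands in the text.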

I remember suffering over this at least ten years ago, when a student in a class I was teaching asked me how to do it and I let ExtUtils::MakeMaker make a monkey of me. I was amazed to find myself suffering over it once again. I am relieved to have found the right answer.

This is one of those days when I am not happy with software. It sometimes surprises me how many of those days involve make.

Dennis Ritchie once said that "make is like Pascal. Everybody likes it, so they go in and change it." I never really thought about this before, but it now occurs to me that probably Ritchie meant that they like make in about the same way that they like bladder stones. Because Dennis Ritchie probably does not like Pascal, and actually nobody else likes Pascal either. They may say they do, and they may even think they do, but if you look a little closer it always turns out that the thing they like is not actually Pascal, but some language that more or less resembles Pascal. Unfortunately, the changes people make to make tend to make it bigger and wartier, and this improves make about as much as it would improve a bladder stone.

I would like to end this article on a positive note. If you haven't already, please read Recursive make Considered Harmful and be prepared to be blinded by the Glorious Truth therein.


[Other articles in category /prog] permanent link

A few notes on "The Manticore"
This past week I reread Robertson Davies' 1972 novel The Manticore, which is a sequel to his much more famous novel Fifth Business (1970). I've read Fifth Business and its other sequel, World of Wonders (1975), several times each, but I found The Manticore much less compelling, and this is only the second time I have read it.

Here are a few miscellaneous notes about The Manticore.

Early memories

Here is David Staunton's earliest memory, from chapter 2, section 1. (Page 87 in my Penguin paperback edition.)

Dr. Von Haller: What is the earliest recollection you can honestly vouch for?

Myself: Oh, that's easy. I was standing in my grandmother's garden, in warm sunlight, looking into a deep red peony. As I recall it, I wasn't much taller than the peony. It was a moment of very great—perhaps I shouldn't say happiness, because it was really an intense absorption. The whole world, the whole of life, and I myself, became a warm, rich, peony-red.

Here is the earliest memory of Francis Cornish, the protagonist of Davies' novel What's Bred in the Bone (1985):

It was in a garden that Francis Cornish first became truly aware of himself as a creature observing a world apart from himself. He was almost three years old, and he was looking deep into a splendid red peony.
That is the opening sentence of part two, page 63 in my Penguin Books copy.

The sideboard

This is from chapter 3 of The Manticore, David's diary entry of Dec. 20:

Inside, it is filled with ... gigantic pieces of furniture on which every surface has been carved within an inch of its life with fruits, flowers, birds, hares, and even, on one thing which seems to be an altar to greed but is more probably a sideboard, full-sized hounds; six of them with real bronze chains on their collars.

The following quotation is from Davies' 1984 New York Times article "In a Welsh Border House, the Legacy of the Victorians", a reminiscence of the house his father lived in after his retirement in 1950:

Until my father had it dismantled and removed to a stable, the Great Hall was dominated by what I can only call an altar to gluttony against the south wall. It was a German sideboard of monumental proportions that the Naylors had acquired at the Great Exhibition of 1851. Every fruit, flower, meat, game, and edible was carved on it in life size, including four large hounds, chained to the understructure with wooden chains, so cunningly wrought that they could be moved, like real chains.
This is reprinted in The Enthusiasms of Robertson Davies, Judith Skelton Grant, ed.

What do Canadians think of Saints?

Davies has said on a number of occasions that in Fifth Business he wanted to write about the nature of sainthood, and in particular how Canadians would respond if they found that they had a true saint among them. For example, in his talk "What May Canada Expect from Her Writers?" (reprinted in One Half of Robertson Davies, pp. 139–140) he says:

For many years the question occurred to me at intervals: What would Canada do with a saint, if such a strange creature were to appear within our borders? I thought Canada would reject the saint because Canada has no use for saints, because saints hold unusual opinions, and worst of all, saints do not pay. So in 1970 I wrote a book, called Fifth Business, in which that theme played a part.
Fifth Business does indeed treat this theme extensively and subtly. In The Manticore he is somewhat less subtle. A perpetual criticism I have of Davies is that he is never content to trust the reader to understand him. He always gets worried later that the reader is not clever enough, and he always comes back to hammer in his point a little more obviously.

For example, Fifth Business ends with the question "Who killed Boy Staunton?" and a cryptic, oracular answer. But Davies was unable to resist the temptation to explain his answer for the benefit of people unable or unwilling to puzzle out their own answers, and the end of The Manticore includes a detailed explanation. I think there might be an even plainer explanation in World of Wonders, but I forget. I have a partly-finished essay in progress discussing this tendency in Davies' writing, but I don't know when it will be done; perhaps never.

What would Canada think of a saint? Fifth Business is one answer, a deep and brilliant one. But Davies was not content to leave it there. He put a very plain answer into The Manticore. This is again from David's diary entry of Dec. 20 (p. 280):

Eisengrim's mother had been a dominant figure in his own life. He spoke of her as "saintly," which puzzles me. Wouldn't Netty have mentioned someone like that?

David's old nurse Netty did indeed mention Eisengrim's mother, although David didn't know that that was who was being mentioned. The mention appears in chapter 2, section 6, p. 160:

She had some awful piece of lore from Deptford to bring out. It seems there had been some woman there when she was a little girl who had always been "at it" and had eventually been discovered in a gravel pit, "at it" with a tramp; of course this woman had gone stark, staring mad and had had to be kept in her house, tied up.

If you want to know what Robertson Davies thinks that Canada would make of a saint, but you don't want to read and ponder Fifth Business to find out, there you have it in one sentence.

[ Addendum: The New York Times review of The Manticore is interesting for several reasons. The title is misspelled in the headline: "The Manitcore". The review was written by a then-unknown William Kennedy, who later became the author of Ironweed (which won the Pulitzer Prize) and other novels. Check it out. ]


[Other articles in category /book] permanent link

Wed, 26 Mar 2008

The "z" command: output filtering
My last few articles ([1] [2] [p] [p-2]) have been about this z program. The first part of this article is a summary of that discussion, which you can skip if you remember it.

The idea of z is that you can do:

        z grep pattern files...
and it does approximately the same as:
        zgrep pattern files...
or you could do:
        z sed script files...
and it would do the same as:
        zsed script files...
if there were a zsed command, although there isn't.

Much of the discussion has concerned a problem with the implementation, which is that the names of the original compressed files are not available to the command, due to the legerdemain z must perform in order to make the uncompressed data available to the command. The problem is especially apparent with wc:

        % z wc *  
            411    2611   16988 ctime.blog
             71     358    2351 /proc/self/fd/3
            121     725    5053 /proc/self/fd/4
             51     380    2381 files-talk.blog
             48     145     885 find-uniq.pl
            288    2159   12829 /proc/self/fd/5
             95     665    4337 ssh-agent-revisted.blog
            221     941    6733 struct-inode.blog
            106     555    3976 sync-2.blog
            115     793    4904 sync.blog
            124     624    4208 /proc/self/fd/6
           1651    9956   64645 total

Here /proc/self/fd/3 and the rest should have been names ending in .gz, such as env-2.blog.gz.

Another possible solution

At the time I wrote the first article, it occurred to me briefly that it would be possible to have z capture the output of the command and attempt to translate /proc/self/fd/3 back to env-2.blog.gz or whatever is appropriate, because although the subcommand does not know the original filenames, z itself does. The code would look something like this. Instead of ending by execing the command, as the original version of z did:
  exec $command, @ARGV;
  die "Couldn't run '$command': $!.\n";
this revised version of z, which we might call zz, would end with the code to translate back to the original filenames:

  open my($out), "-|", $command, @ARGV
    or die "Couldn't run '$command': $!.\n";
  while (<$out>) {
    s{/proc/self/fd/(\d+)}{$old[$1]}g;
    print;
  }
Here @old is an array that translates from file descriptors back to the original filename.
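For concreteness, here is a small self-contained version of that substitution. The contents of @old are invented for the example; in zz they would be recorded as the pipes are opened. Unlike the loop above, this sketch leaves any /proc/self/fd/N that z did not create untouched, instead of replacing it with an empty string.

```perl
use strict;
use warnings;

# Hypothetical table of the kind z could build while opening its pipes:
# index = file descriptor number, value = original filename.
my @old;
$old[3] = 'env-2.blog.gz';
$old[5] = 'qmail-throttle.blog.gz';

# Translate /proc/self/fd/N back to the original name; descriptors that
# z did not set up are left exactly as they were.
sub unmangle {
    my ($line) = @_;
    $line =~ s{(/proc/self/fd/(\d+))}{defined $old[$2] ? $old[$2] : $1}ge;
    return $line;
}

print unmangle("     71     358    2351 /proc/self/fd/3\n");
print unmangle("/proc/self/fd/9 is not one of ours\n");
```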

At the time, I thought of doing this, and my immediate thought was "well, that is so obviously a terrible idea that it is not worth even mentioning", so I left it out. But since then at least five people have written to me to suggest it, so it appears that it is not obviously a terrible idea. I had to think a little deeper about why I thought it was a terrible idea.

Really the question is why I think this is a more terrible idea than the original z program was in the first place. Because one could say that z is garbling the output of its command, and the filtering code in zz is only un-garbling it. But I think this isn't the right way to look at it.

The output of the command has a certain format, a certain structure. We don't know ahead of time what that structure is, but it can be described for any particular command. For instance, the output of wc is a sequence of lines, each with four whitespace-separated fields, of which the first three are numerals and the last is a filename, followed by a final total line when there is more than one file.

Similarly, the output of tar is a file in a complicated binary format, one which is documented somewhere and which is intelligible to other instances of the tar command that are trying to decode it.

The original behavior of z may alter the content of the command output to some extent, replacing some filenames with others. But it cannot disrupt the structure or the format of the file, ever. This is because the output of z tar is the output of tar, unmodified. The z program tampers with the arguments it gives to tar, but having done that it runs tar and lets tar do what it wants, and tar then must produce a tar-format output, possibly not the one it would have normally produced—the content might be a little different—but a properly-formatted one for sure. In particular, any program written to deal properly with the output of tar will still work with the output of z tar. The output might not have the same meaning, but we can say very particularly what the extent of the differences might be: if the output mentions filenames, then some of these might have changed from the true filenames to filenames of the form /proc/self/fd/37.

With zz, we cannot make any such guarantee. The output of zz tar zc foo.gz, for example, might be in proper .tar.gz format. But suppose that tar zc foo.gz produces compressed binary output that just happens to contain the byte sequence 2f70 726f 632f 7365 6c66 2f66 642f 33 (that is, "/proc/self/fd/3"). Then zz will silently replace these 15 bytes with the six bytes 666f 6f2e 677a (that is, "foo.gz").

What if the original sequence was understood as part of a sequence of 2-byte integers? The result is not even properly aligned, because replacing 15 bytes with 6 shifts everything after them by 9 bytes, an odd number. What if that initial 2f was a count? The resulting count (hex 66, or 102, instead of hex 2f, or 47) is much too long. The result would be utterly garbled and unintelligible to tar zx. What the tar command will do with a garbled input is not well-defined: it might dump core, or it might write out random garbage data, or overwrite essential files in the filesystem. We are into nasal demon territory. With the original z, we never get anywhere near the nasal demons.
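The byte arithmetic is easy to check. This sketch builds a made-up record (a 2-byte length, the path, and a 2-byte trailer; the layout is invented for illustration, not taken from tar) and applies the same kind of substitution zz would.

```perl
use strict;
use warnings;

# Invented record layout: 2-byte big-endian length, the payload, 2-byte trailer.
my $path = "/proc/self/fd/3";
my $blob = pack("n", length $path) . $path . pack("n", 0xBEEF);

# What zz's output filter would do to it.
(my $filtered = $blob) =~ s{/proc/self/fd/3}{foo.gz}g;

printf "path bytes: %s\n", unpack("H*", $path);
printf "length: %d -> %d bytes\n", length $blob, length $filtered;       # 19 -> 10
printf "length field still says %d\n", unpack("n", $filtered);          # 15
printf "trailer now at offset %d, was %d\n",
    length($filtered) - 2, length($blob) - 2;                          # 8, was 17
```

The length field still promises 15 bytes of payload, but only 6 remain, and everything after the payload has moved by 9 bytes.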

I suppose the short summary here is that z treats its command as a black box, while zz pretends to understand what comes out of it. But zz's understanding is a false pretense. My experience says that programs should not screw around with things they don't understand, and this is why I instantly rejected the idea when I thought of it before.

One correspondent argued that the garbling is very unlikely, and proposed various techniques to make it even less likely, mostly by rewriting the input filenames to various long random strings. But I felt then that this was missing the point, and I still do. He says it is unlikely, but he doesn't know that it is unlikely, and indeed the unlikeliness depends on the format of the output of the command, which is precisely the unknown here. In my view, the difference between z and zz is that the changes that z makes are bounded, because you can describe them briefly, as I did above, and the changes that zz makes are unbounded, because there is no limit to what could happen as a result.

On the other hand, this correspondent made a good point that if the output of zz is not consumed by anything other than human eyeballs, there may be no real problem. And for some particular commands, such as wc, there is never any problem at all. So perhaps it's a good idea to add a command-line option to z to enable the zz behavior. I did this in my version, and I'm going to try it out and see how it goes.

Complete modified source code is available. (Diffs from previous version.)


[Other articles in category /Unix] permanent link

Tue, 25 Mar 2008

z-commands
The gzip distribution includes a command called zcat. Its command-line arguments can include any number of filenames, compressed or not, and it prints out the contents, uncompressing them on the fly if necessary. Sometime later a zgrep command appeared, which was similar but which also performed a grep search.

But for anything else, you either need to uncompress the files, or build a special tool. I have a utility that scans the web logs of blog.plover.com, and extracts a report about new referrers. The historical web logs are normally kept compressed, so I recently built in support for decompression. This is quite easy in Perl. Normally one scans a sequence of input files something like this:

        while (<>) {
          ... do something with $_ ...
        }
The <> operator implicitly scans all the lines in all the files named in the command-line arguments, opening a new file each time the previous one is exhausted.

To decompress the files on the fly, one can preprocess the command-line arguments:

        for (@ARGV) {
          if (/\.gz$/) {
            $_ = "gzip -dc $_ |";
          }
        }

        while (<>) {
          ... do something with $_ ...
        }
The for loop scans the command-line arguments, replacing each one that has the form foo.gz with gzip -dc foo.gz |. Perl's magic open semantics treat filenames specially if they end with a pipe symbol: a pipe to a command is opened instead. Of course, anyone can think of half a dozen ways in which this can go wrong. But Larry Wall's skill in making such tradeoffs has been a large factor in Perl's success.
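One way to avoid the shell entirely is to give up the magic and open the pipe with the list form of open, which runs gzip directly. This is a sketch rather than a drop-in replacement: it loses the convenience of while (<>), and open_input and count_lines are hypothetical names. (Perl 5.22 and later also offer the <<>> operator, which opens each argument as a plain file and so disables the pipe magic altogether.)

```perl
use strict;
use warnings;

# Open one input for reading, decompressing .gz files through gzip.
# The list form of a piped open runs gzip without a shell, so names
# containing spaces or characters like ';' and '|' are passed through safely.
sub open_input {
    my ($file) = @_;
    my $fh;
    if ($file =~ /\.gz$/) {
        open($fh, "-|", "gzip", "-dc", "--", $file)
            or die "Couldn't run gzip on '$file': $!\n";
    } else {
        open($fh, "<", $file)
            or die "Couldn't open '$file': $!\n";
    }
    return $fh;
}

# Stand-in for the body of a while (<>) loop: here it just counts lines.
sub count_lines {
    my $n = 0;
    for my $file (@_) {
        my $fh = open_input($file);
        $n++ while <$fh>;
        close $fh or warn "Trouble reading '$file'\n";
    }
    return $n;
}

print count_lines(@ARGV), "\n" if @ARGV;
```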

But it bothered me to have to make this kind of change in every program that wanted to handle compressed files. We have zcat and zgrep; where are zcut, zpr, zrev, zwc, zcol, zbc, zsed, zawk, and so on? Echh.

But after I got to thinking about it, I decided that I could write a single z utility that would do a lot of the same things. Instead of this:

        zsed -e 's/:.*//' * | ...
where the * matches some files that have .gz suffixes and some that haven't, one would write:

        z sed -e 's/:.*//' * | ...
and it would Just Work. That's the idea, anyway.

If sed were written in Perl, z would have an easy job. It could rely on Perl's magic open, and simply preprocess the arguments before running sed:

        # hypothetical implementation of z
        #
        my $command = shift;
        for (@ARGV) {
          if (/\.gz$/) {
            $_ = "gzip -dc $_ |";
          }
        }
        exec $command, @ARGV;
        die "Couldn't run command '$command': $!\n";
But sed is not written in Perl, and has no magic open. So I have to play a trickier trick:

        my $command = shift;
        my @fhs;
        for my $file (@ARGV) {
          if ($file =~ /\.gz$/) {
            unless (open($fhs[@fhs], "-|", "gzip", "-cd", $file)) {
              warn "Couldn't open file '$file': $!; skipping\n";
              next;
            }
            my $fd = fileno $fhs[-1];
            $file = "/proc/self/fd/$fd";
          }
        }

        # warn "running $command @ARGV\n";
        exec $command, @ARGV;
        die "Couldn't run command '$command': $!\n";
This is a stripped-down version to illustrate the idea. For various reasons that I explained yesterday, it does not actually work. The complete, working source code is here.

The idea, as before, is that the program preprocesses the command-line arguments. But instead of replacing the arguments with pipe commands, which are not supported by open(2), the program sets up the pipes itself, and then directs the command to take its input from the pipes by specifying the appropriate items from /proc/self/fd.

The trick depends crucially on having /proc/self/fd, or /dev/fd, or something of the sort, because otherwise there's no way to trick the command into reading from a pipe when it thinks it is opening a file. (Actually there is at least one other way, involving FIFOs, which I plan to discuss tomorrow.) Most modern systems do have /proc/self/fd. That feature postdates my earliest involvement with Unix, so it isn't a ready part of my mental apparatus as perhaps it ought to be. But this utility seems to me like a sort of canonical application of /proc/self/fd, in the sense that, if you couldn't think what /proc/self/fd might be good for, then you could read this example and afterwards have a pretty clear idea.
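The core of the trick fits in a few lines. This sketch assumes Linux's /proc/self/fd (other systems may spell it /dev/fd); it opens a pipe from a child process and then opens the same pipe a second time by name, just as a command run by z would. read_via_proc_fd is a made-up name.

```perl
use strict;
use warnings;

# Open a pipe from a child process, then reopen that pipe by name
# through /proc/self/fd and read everything from the second handle.
sub read_via_proc_fd {
    my ($text) = @_;
    open(my $pipe, "-|", "printf", "%s", $text)
        or die "Couldn't start printf: $!\n";
    my $path = "/proc/self/fd/" . fileno($pipe);
    open(my $by_name, "<", $path)
        or die "Couldn't open $path: $!\n";
    local $/;                       # slurp mode
    my $data = <$by_name>;
    close $by_name;
    close $pipe;
    return ($path, $data);
}

my ($path, $data) = read_via_proc_fd("hello from a pipe\n");
print "read ", length($data), " bytes through $path\n";
```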

The z utility has a number of flaws. Principally, the original filenames are gone. Here's a typical run with regular zgrep:

        % zgrep immediately *  
        ctime.blog:we want to update.  It is immediately copied into a register, and
        env-2.blog.gz:All five people who wrote to me about this immediately said "oh, yes,
        qmail-throttle.blog.gz:program continues immediately, possibly posting its message.  (It
        struct-inode.blog:is a symbolic link, its inode is returned immediately; iname() would
        sync.blog:and reports success back to the process immediately, even though the
But here's the same thing with z:

        % z grep immediately *  
        ctime.blog:we want to update.  It is immediately copied into a register, and
        /proc/self/fd/3:All five people who wrote to me about this immediately said "oh, yes,
        /proc/self/fd/5:program continues immediately, possibly posting its message.  (It
        struct-inode.blog:is a symbolic link, its inode is returned immediately; iname() would
        sync.blog:and reports success back to the process immediately, even though the
The problem is even more glaring in the case of commands like wc:

        % z wc *  
            411    2611   16988 ctime.blog
             71     358    2351 /proc/self/fd/3
            121     725    5053 /proc/self/fd/4
             51     380    2381 files-talk.blog
             48     145     885 find-uniq.pl
            288    2159   12829 /proc/self/fd/5
             95     665    4337 ssh-agent-revisted.blog
            221     941    6733 struct-inode.blog
            106     555    3976 sync-2.blog
            115     793    4904 sync.blog
            124     624    4208 /proc/self/fd/6
           1651    9956   64645 total

So perhaps z will not turn out to be useful enough to be more than a curiosity. But I'm not sure yet.

This is article #300 on my blog. Thanks for reading.

[ Addendum 20080322: There is a followup to this article. ]

[ Addendum 20080325: Another followup. ]


[Other articles in category /Unix] permanent link

Closed file descriptors: the answer
This is the answer to yesterday's article about a small program that had a mysterious error.

        my $command = shift;
        for my $file (@ARGV) {
          if ($file =~ /\.gz$/) {
            my $fh;
            unless (open $fh, "<", $file) {
              warn "Couldn't open $file: $!; skipping\n";
              next;
            }
            my $fd = fileno $fh;
            $file = "/proc/self/fd/$fd";
          }
        }

        exec $command, @ARGV;
        die "Couldn't run command '$command': $!\n";
When the loop exits, $fh is out of scope, and the filehandle it contains is garbage-collected, closing the file.

"Duh."

Several people suggested that it was because open files are not preserved across an exec, or because the meaning of /proc/self would change after an exec, perhaps because the command was being run in a separate process; this is mistaken. There is only one process here. The exec call does not create a new process; it reuses the same one, and it does not affect open files, unless they have been flagged with FD_CLOEXEC.
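One caveat is specific to Perl: perl itself sets FD_CLOEXEC on every descriptor numbered above $^F (normally 2) at the moment the descriptor is opened. So keeping the handles in scope is not quite enough; $^F also has to be raised before the pipes are opened. A minimal sketch that does both, assuming Linux's /proc/self/fd; it is an illustration rather than the complete z source, and rewrite_args and @keep are made-up names.

```perl
use strict;
use warnings;

my @keep;   # file scope, so the pipes stay open until exec

# Replace each .gz argument with a /proc/self/fd/N path that reads from
# a gzip pipe; other arguments pass through unchanged.
sub rewrite_args {
    my @args = @_;
    # Perl decides close-on-exec when a descriptor is opened: anything
    # above $^F gets FD_CLOEXEC. Raise $^F so the pipes survive exec.
    local $^F = 1023;
    for my $arg (@args) {
        next unless $arg =~ /\.gz$/;
        my $fh;
        unless (open($fh, "-|", "gzip", "-dc", "--", $arg)) {
            warn "Couldn't decompress '$arg': $!; skipping\n";
            next;
        }
        push @keep, $fh;   # the handle outlives this loop iteration
        $arg = "/proc/self/fd/" . fileno($fh);
    }
    return @args;
}

if (@ARGV) {
    my $command = shift @ARGV;
    exec $command, rewrite_args(@ARGV);
    die "Couldn't run command '$command': $!\n";
}
```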

Abhijit Menon-Sen ran a slightly different test than I did:

        % z cat foo.gz bar.gz
        cat: /proc/self/fd/3: No such file or directory
        cat: /proc/self/fd/3: No such file or directory
As he said, this makes it completely obvious what is wrong, since the two files are both represented by the same file descriptor.
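The descriptor reuse behind that result is easy to reproduce. Unix open returns the lowest unused descriptor, so once each iteration's $fh is freed and its file closed, the next open gets the same number back. A small sketch (fds_seen is a made-up name):

```perl
use strict;
use warnings;

# Mimic the puzzle's loop: each iteration's lexical $fh is freed when the
# iteration ends, which closes its file, so the next open gets the same
# (lowest unused) descriptor number back.
sub fds_seen {
    my @fds;
    for my $file (@_) {
        open(my $fh, "<", $file) or die "Couldn't open $file: $!\n";
        push @fds, fileno $fh;
    }
    return @fds;
}

my @fds = fds_seen("/dev/null", "/dev/null");
print "descriptors: @fds\n";   # the same number twice, e.g. "descriptors: 3 3"
```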


[Other articles in category /prog/perl] permanent link

The "z" command: alternative implementations
In yesterday's article I discussed a possibly-useful utility program named z, which has a flaw. To jog your memory, here is a demonstration:

        % z grep immediately *  
        ctime.blog:we want to update.  It is immediately copied into a register, and
        /proc/self/fd/3:All five people who wrote to me about this immediately said "oh, yes,
        /proc/self/fd/5:program continues immediately, possibly posting its message.  (It
        struct-inode.blog:is a symbolic link, its inode is returned immediately; iname() would
        sync.blog:and reports success back to the process immediately, even though the
For a detailed discussion, see the previous article.

Fixing this flaw seems difficult-to-impossible. As I said earlier, the trick is to fool the command into reading from a pipe when it thinks it is opening a file, and this is precisely what /proc/self/fd is for. But there is an older, even more widely-implemented Unix feature that does the same thing, namely the FIFO. So an alternative implementation creates one FIFO for each compressed file, with a gzip process writing to the FIFO, and tells the command to read from the FIFO. Since we have some limited control over the name of the FIFO, we can ameliorate the missing-filename problem to some extent. Say, for example, we create the FIFOs in /tmp/PID. Then the broken zgrep example above might look like this instead:

        % z grep immediately *  
        ctime.blog:we want to update.  It is immediately copied into a register, and
        /tmp/7516/env-2.blog.gz:All five people who wrote to me about this immediately said "oh, yes,
        /tmp/7516/qmail-throttle.blog.gz:program continues immediately, possibly posting its message.  (It
        struct-inode.blog:is a symbolic link, its inode is returned immediately; iname() would
        sync.blog:and reports success back to the process immediately, even though the
The output is an improvement, but it is not completely solved, and the cost is that the process and file management are much more complicated. In fact, the cost is so high that you have to wonder if it might not be simpler to replace z with a shell script that copies the data to a temporary directory, uncompresses the files, and runs the command on the uncompressed files, perhaps something along these lines:

        #!/bin/sh
        DIR=$(mktemp -d) || exit 1

        COMMAND=$1
        shift
        cp -p "$@" "$DIR"

        cd "$DIR" || exit 1
        gzip -d -- *.gz
        "$COMMAND" *
This has problems too, but my point is that if you are willing to accept a crappy, semi-working solution along the lines of the FIFO one, simpler ones are at hand. You can compare the FIFO version directly with the shell script, and I think the FIFO version loses. The z implementation I have goes in a different direction, with different tradeoffs, and so might be preferable to the script in a number of ways.
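For comparison with the shell script, here is roughly what the FIFO approach involves. This Perl sketch rests on assumptions of its own: one FIFO per .gz argument, named after the original file, in a private temporary directory; fifo_args is a made-up name. Even stripped down, it needs a fork per file and careful reaping, and it still has failure modes, such as a gzip child that blocks forever if the command never opens its FIFO.

```perl
use strict;
use warnings;
use POSIX qw(mkfifo);
use File::Basename qw(basename);
use File::Temp qw(tempdir);

# For each .gz argument, create a FIFO in $dir named after the original
# file, and start a gzip child that writes the decompressed data into it.
# Returns a reference to the child PIDs, then the rewritten arguments.
sub fifo_args {
    my ($dir, @args) = @_;
    my @pids;
    for my $arg (@args) {
        next unless $arg =~ /\.gz$/;
        my $fifo = "$dir/" . basename($arg);   # same-named inputs would collide
        mkfifo($fifo, 0600) or die "mkfifo $fifo: $!\n";
        my $pid = fork;
        die "fork: $!\n" unless defined $pid;
        if ($pid == 0) {
            # Child: this open blocks until a reader opens the FIFO.
            open(STDOUT, ">", $fifo) or die "open $fifo: $!\n";
            exec("gzip", "-dc", "--", $arg) or die "exec gzip: $!\n";
        }
        push @pids, $pid;
        $arg = $fifo;
    }
    return (\@pids, @args);
}

if (@ARGV) {
    my $command = shift @ARGV;
    my $dir = tempdir(CLEANUP => 1);
    my ($pids, @args) = fifo_args($dir, @ARGV);
    my $status = system($command, @args);
    # If the command never opened some FIFO, its gzip child is still
    # blocked in open, and this waitpid hangs.
    waitpid($_, 0) for @$pids;
    exit($status == 0 ? 0 : 1);
}
```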

But as I said, I don't know yet.

[ Addendum 20080325: Several people suggested a fix that I had considered so unwise that I didn't even mention it. But after receiving the suggestion repeatedly, I wrote an article about it. ]


[Other articles in category /Unix] permanent link

Sun, 16 Mar 2008

Drawing lines
As part of this thing I sometimes do when I'm not writing in my blog—what is it called?—oh, now I remember.

As part of my job I had to produce the following display:

The idea here is that the user can fill in the names of three organisms into the form blanks, and the application will find all the studies in its database which conclude that those organisms are related in the indicated way. For example, the user can put "whale" and "hippo" in the top two blanks and "cow" in the bottom one, and the result will be all the studies that conclude (perhaps among other things) that whales and hippos are more closely related to each other than either is to cows. (I think "cothurnocystis bifida" is biologist jargon for cows.)

If you wanted to hear more about phylogeny, Java programming, or tree algorithms, you are about to be disappointed. The subject of my article today is those fat black lines.

The first draft of the page did not have the fat black lines. It had some incredibly awful ASCII-art that was not even properly aligned. Really it was terrible; it would have been better to have left it out completely. I will not make you look at it.

I needed the lines, so I popped down the "graphics" menu on my computer and looked for something suitable. I tried the Gimp first. It seems that the Gimp has no tool for drawing straight lines. If someone wants to claim that it does, I will not dispute the claim. The Gimp has a huge and complex control panel covered with all sorts of gizmos, and maybe one of those gizmos draws a straight line. I did not find one. I gave up after a few minutes.

Next I tried Dia. It kept selecting the "move the line around on the page" tool when I thought I had selected the "draw another line" tool. The lines were not constrained to a grid by default, and there was no obvious way to tell it that I wanted to draw a diagram smaller than a whole page. I would have had to turn the thing into a bitmap and then crop the bitmap. "By Zeus's Beard," I cried, "does this have to be so difficult?" Except that the oath I actually uttered was somewhat coarser and less erudite than I have indicated. I won't repeat it, but it started with "fuck" and ended with "this".

Here's what I did instead. I wrote a program that would read an input like this:

        >-v-<
        '-+-`
and produce a jpeg file that looks like this:

Or similarly this:

        .---,    
        |   >--, 
        '---`  '-
Becomes this:

You get the idea.

Now I know some of you are just itching to write to me and ask "why didn't you just use...?", so before you do that, let me remind you of two things. First, I had already wasted ten or fifteen minutes on "just use..." that didn't work. And second, this program only took twenty minutes to write.

The program depends on one key insight, which is that it is very, very easy to write a Perl program that generates a graphic output in "PBM" ("portable bitmap") format. Here is a typical PBM file:

        P1
        10 10
        1111111111
        1000000001
        1000000001
        1001111001
        1001111001
        1001111001
        1001111001
        1000000001
        1000000001
        1111111111
The P1 is a magic number that identifies the file format; it is always the same. The 10 10 warns the processor that the upcoming bitmap is 10 pixels wide and 10 pixels high. The following characters are the bitmap data. I'm not going to insult you by showing the 10×10 bitmap image that this represents.
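To make the "very, very easy" claim concrete, here is a minimal Perl sketch that prints the 10×10 example above as a plain ("P1") PBM file. The frame-and-square geometry is hard-coded to match that example; it is an illustration, not code from the line-drawing program. Piping its output through one of the netpbm converters would turn it into a GIF, XBM, and so on.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Plain PBM ("P1"): 1 is a black pixel, 0 is white.
my $n = 10;
my @rows;
for my $y (0 .. $n - 1) {
    my $row = '';
    for my $x (0 .. $n - 1) {
        # one-pixel frame around the edge of the image
        my $frame  = $x == 0 || $y == 0 || $x == $n - 1 || $y == $n - 1;
        # solid 4x4 square in the middle (columns and rows 3..6)
        my $square = $x >= 3 && $x <= 6 && $y >= 3 && $y <= 6;
        $row .= ($frame || $square) ? '1' : '0';
    }
    push @rows, $row;
}

print "P1\n";
print "$n $n\n";            # width, then height
print "$_\n" for @rows;
```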

PBM was invented about twenty years ago by Jef Poskanzer. It was intended to be an interchange format: say you want to convert images from format X to format Y, but you don't have a converter. You might, however, have a converter that turns X into PBM and then one that turns PBM into Y. Or if not, it might not be too hard to produce such converters. It is, in the words of the Extreme Programming guys, the Simplest Thing that Could Possibly Work.

There are also PGM (portable graymap) and PPM (portable pixmap) formats, for grayscale and 24-bit color images respectively. They are only fractionally more complicated.

Because these formats are so very, very simple, they have been widely adopted. For example, the JPEG reference implementation includes a sample cjpeg program, for converting an input to a JPEG file. The input it expects is a PGM or PPM file.

Writing a Perl program to generate a P?M file, and then feeding the output to pbmtoxbm or ppmtogif or cjpeg is a good trick, and I have used it many times. For example, I used this technique to generate a zillion little colored squares in this article about the Pólya-Burnside counting lemma. Sure, I could have drawn them one at a time by hand, and probably gone insane and run amuck with an axe immediately after, but the PPM technique was certainly much easier. It always wins big, and this time was no exception.
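The colored squares use the same trick with the PPM format. This sketch writes a plain ("P3") PPM: the magic number, the width and height, the maximum color value, then one "R G B" triple per pixel. The palette and the 20-pixel square size are made up for illustration. Putting one triple per line keeps the output under the 70-character line limit that the plain netpbm formats recommend; the result could then be fed to ppmtogif or cjpeg.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up palette: one square per color, each $side x $side pixels.
my @colors = ([255, 0, 0], [0, 160, 0], [0, 0, 255], [255, 200, 0]);
my $side   = 20;

my $width  = $side * @colors;   # squares laid out left to right
my $height = $side;

# Plain PPM header: magic number, width and height, maximum color value.
my @lines = ("P3", "$width $height", "255");

# Pixels go row by row, left to right, one "R G B" triple per line.
for my $y (1 .. $height) {
    for my $c (@colors) {
        push @lines, ("@$c") x $side;
    }
}

print "$_\n" for @lines;
```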

The program may be interesting as an example of this technique, and possibly also as a reminder of something else. The Perl community luminaries invest a lot of effort in demonstrating that not every Perl program looks like a garbage heap, that Perl can be as bland and aseptic as Java, that Perl is not necessarily the language that most closely resembles quick-drying shit in a tube, from which you can squirt out the contents into any shape you want and get your complete, finished artifact in only twenty minutes and only slightly smelly.

No, sorry, folks. Not everything we do is a brilliant, diamond-like jewel, polished to a luminous gloss with pages torn from one of Donald Knuth's books. This line-drawing program was squirted out of a tube, and a fine brown piece of engineering it is.

        #!/usr/bin/perl

        my ($S) = shift || 50;
$S here is "size". The default is to turn every character in the input into a 50×50 pixel tile. Here's the previous example with $S=10:

        my ($h, $w);
        my $output = [];
        while (<>) {
          chomp;
          $w ||= length();
          $h++;
          push @$output, convert($_);
        }  
The biggest defect in the program is right here: it assumes that each line will have the same width $w. Lines all must be space-padded to the same width. Fixing this is left as an easy exercise, but it wasn't as easy as padding the inputs, so I didn't do it.

The magic happens here:

        open STDOUT, "| pnmscale 1 | cjpeg" or die $!;
        print "P1\n", $w * $S, " ", $h * $S, "\n";
        print $_, "\n" for @$output;
        exit;
The output is run through cjpeg to convert the PBM data to JPEG. For some reason, however, cjpeg doesn't accept PBM data, only PGM or PPM, so the output first goes through pnmscale, which resizes a P?M input. Here the scale factor is 1, which is a no-op, except that pnmscale happens to turn a PBM input into a PGM output. This is what is known in the business as a "trick". (There is a pbmtopgm program, but it does something different.)

If we wanted gif output, we could have used "| ppmtogif" instead. If we wanted output in Symbolics Lisp Machine format, we could have used "| pgmtolispm" instead. Ah, the glories of interchange formats.
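The equal-width defect mentioned earlier could be removed by reading the whole input first and padding it. Here is a sketch under the assumption that the input fits in memory; `pad_lines` is a hypothetical helper, not part of the original program.

```perl
use strict;
use warnings;

# Hypothetical helper: pad every line with trailing spaces to the width of
# the longest line, so the rest of the program can keep assuming equal widths.
sub pad_lines {
    my @lines = @_;
    chomp @lines;
    my $w = 0;
    for my $line (@lines) {
        $w = length($line) if length($line) > $w;
    }
    return map { $_ . ' ' x ($w - length($_)) } @lines;
}

my @padded = pad_lines(".---,\n", "|   >--,\n", "'---`  '-\n");
print "[$_]\n" for @padded;
```

In the main program, the `while (<>)` loop would then iterate over `pad_lines(<>)` instead, and `$w` would be the same for every line.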

I'm going to omit the details of convert, which just breaks each line into characters, calls convert_ch on each character, and assembles the results. (The complete source code is here if you want to see it anyway.) The business end of the program is convert_ch:

        # 
        sub convert_ch {
          my @rows;
          my $ch = shift;
          my $up = $ch =~ /[<|>^'`+]/i;
          my $dn = $ch =~ /[<|>V.,+]/i;
          my $lt = $ch =~ /[-<V^,`+]/i;
          my $rt = $ch =~ /[->V^.'+]/i;
These last four variables record whether the tile has a line from its center going up, down, left, or right respectively. For example, "|" produces a tile with lines coming up and down from the center, but not left or right. The /i in the regexes is because I kept writing v instead of V in the inputs.

          my $top = int($S * 0.4);
          my $mid = int($S * 0.2);
          my $bot = int($S * 0.4);
The tile is divided into three bands, of the indicated widths. This probably looks bad, or fails utterly, unless $S is a multiple of 5. I haven't tried it. Do you think I care? Hint: I haven't tried it.

          my $v0 = "0" x $S;
          my $v1 = "0" x $top . "1" x $mid . "0" x $bot;
          push @rows, ($up ? $v1 : $v0) x $top;
This assembles the top portion of the tile, including the "up" line, if there is one. Note that despite their names, $top also determines the width of the left portion of the tile, and $bot determines the width of the right portion. The letter "v" here is for "vertical".

Perhaps I should explain for the benefit of the readers of Planet Haskell (if any of them have read this far and not yet fainted with disgust) that "$a x $b" in Perl is like concat (replicate b a) in the better sorts of languages.

          my $ls = $lt ? "1" : "0";
          my $ms = ($lt || $rt || $up || $dn) ? "1" : "0";
          my $rs = $rt ? "1" : "0";
          push @rows, ($ls x $top . $ms x $mid . $rs x $bot) x $mid;
This assembles the middle section, including the "left" and "right" lines.

          push @rows, ($dn ? $v1 : $v0) x $bot;
This does the bottom section.

          return @rows;
        }
And we are done. Nothing to it. Adding diagonal lines would be a fairly simple matter.
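To see what `convert_ch` actually produces, here it is again with `$S` fixed at 5, alongside a hypothetical `convert`. The article omits the real `convert`, so this version is only a guess at the behavior it describes: split the line into characters, convert each one, and glue the tiles together row by row. Note that Perl's `x` does two jobs here: `"0" x $top` repeats a string, while `($up ? $v1 : $v0) x $top`, with a parenthesized left operand in list context, repeats a list and so pushes `$top` separate rows.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $S = 5;   # tile size for this demo (the program's default is 50)

# convert_ch as in the article: one character -> $S rows of 0s and 1s
sub convert_ch {
  my @rows;
  my $ch = shift;
  my $up = $ch =~ /[<|>^'`+]/i;
  my $dn = $ch =~ /[<|>V.,+]/i;
  my $lt = $ch =~ /[-<V^,`+]/i;
  my $rt = $ch =~ /[->V^.'+]/i;

  my $top = int($S * 0.4);
  my $mid = int($S * 0.2);
  my $bot = int($S * 0.4);

  my $v0 = "0" x $S;                                # string repetition
  my $v1 = "0" x $top . "1" x $mid . "0" x $bot;    # x binds tighter than .
  push @rows, ($up ? $v1 : $v0) x $top;             # list repetition: $top rows

  my $ls = $lt ? "1" : "0";
  my $ms = ($lt || $rt || $up || $dn) ? "1" : "0";
  my $rs = $rt ? "1" : "0";
  push @rows, ($ls x $top . $ms x $mid . $rs x $bot) x $mid;

  push @rows, ($dn ? $v1 : $v0) x $bot;
  return @rows;
}

# Hypothetical convert(): glue the tiles for one input line side by side.
sub convert {
  my ($line) = @_;
  my @out = ('') x $S;
  for my $ch (split //, $line) {
    my @tile = convert_ch($ch);
    $out[$_] .= $tile[$_] for 0 .. $#tile;
  }
  return @out;
}

print "$_\n" for convert("-+");
```

Run as is, it prints the five rows for the two-character line `-+`: a horizontal line running into a cross.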

Download the complete source code if you haven't seen enough yet.

There is no part of this program of which I am proud. Rather, I am proud of the thing as a whole. It did the job I needed, and it did it by 5 PM. Larry Wall once said that "a Perl script is correct if it's halfway readable and gets the job done before your boss fires you." Thank you, Larry.

No, that is not quite true. There is one line in this program that I'm proud of. I noticed after I finished that there is exactly one comment in this program, and it is blank. I don't know how that got in there, but I decided to leave it in. Who says program code can't be funny?


[Other articles in category /prog/perl] permanent link

Fri, 14 Mar 2008

Throttling qmail
This may well turn out to be another oops. Sometimes when I screw around with the mail system, it's a big win, and sometimes it's a big lose. I don't know yet how this will turn out.

Since I moved house, I have all sorts of internet-related problems that I didn't have before. I used to do business with a small ISP, and I ran my own web server, my own mail service, and so on. When something was wrong, or I needed them to do something, I called or emailed and they did it. Everything was fine.

Since moving, my ISP is Verizon. I have great respect for Verizon as a provider of telephone services. They have been doing it for over a hundred years, and they are good at it. Maybe in a hundred years they will be good at providing computer network services too. Maybe it will take less than a hundred years. But I'm not as young as I once was, and whenever that glorious day comes, I don't suppose I'll be around to see it.

One of the unexpected problems that arose when I switched ISPs was that Verizon helpfully blocks incoming access to port 80. I had moved my blog to outside hosting anyway, because the blog was consuming too much bandwidth, so I moved the other plover.com web services to the same place. There are still some things that don't work, but I'm dealing with them as I have time.

Another problem was that a lot of sites now rejected my SMTP connections. My address was in a different netblock. A Verizon DSL netblock. Remote SMTP servers assume that anybody who is dumb enough to sign up with Verizon is also too dumb to run their own MTA. So any mail coming from a DSL connection in Verizonland must be spam, probably generated by some Trojan software on some infected Windows box.

The solution here (short of getting rid of Verizon) is to relay the mail through Verizon's SMTP relay service. mail.plover.com sends to outgoing.verizon.net, and lets outgoing.verizon.net forward the mail to its final destination. Fine.

But but but.

If my machine sends more than X messages per Y time, outgoing.verizon.net will assume that mail.plover.com has been taken over by a Trojan spam generator, and cut off access. All outgoing mail will be rejected with a permanent failure.

So what happens if someone sends a message to one of the 500-subscriber email lists that I host here? mail.plover.com generates 500 outgoing messages, sends the first hundred or so through Verizon. Then Verizon cuts off my mail service. The mailing list detects 400 bounce messages, and unsubscribes 400 subscribers. If any mail comes in for another mailing list before Verizon lifts my ban, every outgoing message will bounce and every subscriber will be unsubscribed.

One solution is to get a better mail provider. Lorrie has an Earthlink account that comes with outbound mail relay service. But they do the same thing for the same reason. My Dreamhost subscription comes with an outbound mail relay service. But they do the same thing for the same reason. My Pobox.com account comes with an unlimited outbound mail relay service. But they require SASL authentication. If there's a SASL patch for qmail, I haven't been able to find it. I could implement it myself, I suppose, but I don't wanna.

So far there are at least five solutions that are on the "eh, maybe, if I have to" list:

  • Get a non-suck ISP
  • Find a better mail relay service
  • Hack SASL into qmail and send mail through Pobox.com
  • Do some skanky thing with serialmail
  • Get rid of qmail in favor of postfix, which presumably supports SASL
(Yeah, I know the Postfix weenies in the audience are shaking their heads sadly and wondering when the scales will fall from my eyes. They show up at my door every Sunday morning in their starched white shirts and their pictures of DJB with horns and a pointy tail...)

It also occurred to me in the shower this morning that the old ISP might be willing to sell me mail relaying and nothing else, for a small fee. That might be worth pursuing. It's gotta be easier than turning qmail-remote into a SASL mail client.

The serialmail thing is worth a couple of sentences, because there's an autoresponder on the qmail-users mailing-list that replies with "Use serialmail. This is discussed in the archives." whenever someone says the word "throttle". The serialmail suite, also written by Daniel J. Bernstein, takes a maildir-format directory and posts every message in it to some remote server, one message at a time. Say you want to run qmail on your laptop. Then you arrange to have qmail deliver all its mail into a maildir, and then when your laptop is connected to the network, you run serialmail, and it delivers the mail from the maildir to your mail relay host. serialmail is good for some throttling problems. You can run serialmail under control of a daemon that will cut off its network connection after it has written a certain amount of data, for example. But there seems to be no easy way to do what I want with serialmail, because it always wants to deliver all the messages from the maildir, and I want it to deliver one message.

There have been some people on the qmail-users mailing-list asking for something close to what I want, and sometimes the answer was "qmail was designed to deliver mail as quickly and efficiently as possible, so it won't do what you want." This is a variation of "Our software doesn't do what you want, so I'll tell you that you shouldn't want to do it." That's another rant for another day. Anyway, I shouldn't badmouth qmail-users mailing-list, because the archives did get me what I wanted. It's only a stopgap solution, and it might turn out to be a big mistake, but so far it seems okay, and so at last I am coming to the point of this article.

I hacked qmail to support outbound message rate throttling. Following a suggestion of Richard Lyons from the qmail-users mailing-list, it was much easier to do than I had initially thought.

Here's how it works. Whenever qmail wants to try to deliver a message to a remote address, it runs a program called qmail-remote. qmail-remote is responsible for looking up the MX records for the host, contacting the right server, conducting the SMTP conversation, and returning a status code back to the main component. Rather than hacking directly on qmail-remote, I've replaced it with a wrapper. The real qmail-remote is now in qmail-remote-real. The qmail-remote program is now written in Perl. It maintains a log file recording the times at which the last few messages were sent. When it runs, it reads the log file, and a policy file that says how quickly it is allowed to send messages. If it is okay to send another message, the Perl program appends the current time to the log file and invokes the real qmail-remote. Otherwise, it sleeps for a while and checks again.
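The policy check at the heart of the wrapper can be sketched as two small Perl subroutines. This is an illustrative reconstruction, not the wrapper's actual code: the names `prune` and `may_send` are hypothetical, the history is passed in as an array of epoch timestamps instead of being read from the log file, and the final step of running qmail-remote-real with the original arguments is left out.

```perl
use strict;
use warnings;

# @$last: epoch timestamps of recent sends, oldest first.
# Drop the ones that are more than $time seconds old.
sub prune {
    my ($last, $time, $now) = @_;
    @$last = grep { $_ >= $now - $time } @$last;
    return;
}

# True if sending one more message now stays within $msgs per $time seconds.
sub may_send {
    my ($last, $msgs, $time, $now) = @_;
    prune($last, $time, $now);
    return @$last < $msgs;
}

my @history = (100, 130, 150);
print may_send(\@history, 3, 60, 170) ? "send\n" : "wait\n";
```

The wrapper would loop around this: while `may_send` is false, sleep, reload the policy, and check again.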

The program is not strictly correct. It has some race conditions. Suppose the policy limits qmail to sending 8 messages per minute. Suppose 7 messages have been sent in the last minute. Then six instances of qmail-remote might all run at once, decide that it is OK to send a message, and send one. Then 13 messages have been sent in the last minute, which exceeds the policy limit. So far this has not been much of a problem. It's happened twice in the last few hours that the system sent 9 messages in a minute instead of 8. If it worries me too much, I can tell qmail to run only one qmail-remote at a time, instead of 10. On a normal qmail system, qmail speeds up outbound delivery by running multiple qmail-remote processes concurrently. On my crippled system, speeding up outbound delivery is just what I'm trying to avoid. Running at most one qmail-remote at a time will cure all race conditions. If I were doing the project over, I think I'd take out all the file locking and such, and just run one qmail-remote. But I didn't think of it in time, and for now I think I'll live with the race conditions and see what happens.

So let's see? What else is interesting about this program? I made at least one error, and almost made at least one more.

The almost-error was this: The original design for the program was something like:

  1. do
    • lock the history file, read it, and unlock it
    until it's time to send a message
  2. lock the history file, update it, and unlock it
  3. send the message
This is a classic mistake in writing programs that run concurrently and update a file. The problem is that process A might update the file after process B reads it but before B updates it. Then B's update will destroy A's.

One way to fix this is to have the processes append to the history file, but never remove anything from it. That is clearly not a sustainable strategy. Someone must remove expired entries from the history file.

Another fix is to have the read and the update in the same critical section:

  1. lock the history file
  2. do
    • read the history file
    until it's time to send a message
  3. update the history file and unlock it
  4. send the message
But that loop could take a long time, during which no other qmail-remote process can make progress. I had decided that I wanted to try to retain the concurrency, and so I wasn't willing to accept this.

Cleaning the history file could be done by a separate process that periodically locks the file and rewrites it. But instead, I have the qmail-remote processes do it on the fly:

  1. do
    • lock the history file, read it, and unlock it
    until it's time to send a message
  2. lock the history file, read it, update it, and unlock it
  3. send the message
I'm happy that I didn't actually make this mistake. I only thought about it.
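Step 2 of the final design, the locked read-update-unlock, might look like this in Perl. It assumes a history file holding one epoch timestamp per line, which is a guess at the format, and `record_send` is a hypothetical name. The point is that the re-read and the rewrite happen under the same exclusive `flock`, so a concurrent process cannot slip its update in between.

```perl
use strict;
use warnings;
use Fcntl qw(:DEFAULT :flock :seek);

# Hypothetical step 2: lock the history file, re-read it, drop entries older
# than $time seconds, append $now, write it back, and unlock, all while
# holding a single exclusive lock. Assumes one epoch timestamp per line.
sub record_send {
    my ($file, $time, $now) = @_;
    sysopen(my $fh, $file, O_RDWR | O_CREAT) or die "open $file: $!";
    flock($fh, LOCK_EX) or die "flock $file: $!";

    my @last = grep { /^\d+$/ && $_ >= $now - $time }
               map  { chomp; $_ } <$fh>;
    push @last, $now;

    seek($fh, 0, SEEK_SET) or die "seek $file: $!";
    truncate($fh, 0)       or die "truncate $file: $!";
    print {$fh} "$_\n" for @last;
    close($fh) or die "close $file: $!";   # flushes output, then releases the lock
    return @last;
}
```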

Here's a mistake that I did make. This is the block of code that sleeps until it's time to send the message:

          while (@last >= $msgs) {
            my $oldest = $last[0];
            my $age = time() - $oldest;
            my $zzz = $time - $age + int(rand(3));
            $zzz = 1 if $zzz < 1;
       #    Log("Sleeping for $zzz secs");
            sleep $zzz;
            shift @last while $last[0] < time() - $time;
            load_policy();
          }
The throttling policy is expressed by two numbers, $msgs and $time, and the program tries to send no more than $msgs messages per $time seconds. The @last array contains a list of Unix epoch timestamps of the times at which the messages of the last $time seconds were sent. So the loop condition checks whether at least $msgs messages were sent in the last $time seconds. If not, the program continues immediately, possibly posting its message. (It rereads the history file first, in case some other messages have been posted while it was asleep.)

Otherwise the program will sleep for a while. The first four lines in the loop calculate how long to sleep for. It sleeps until the oldest message in the history will fall off the queue, possibly plus a second or two. Then the crucial line:

            shift @last while $last[0] < time() - $time;
which discards the expired items from the history. Finally, the call to load_policy() checks to see if the policy has changed, and the loop repeats if necessary.

The bug is in this crucial line. If @last becomes empty, $last[0] is undefined, which compares as 0, so the condition stays true and shift keeps running on an empty array: an infinite busy-loop. It should have been:

            shift @last while @last && $last[0] < time() - $time;
Whoops. I noticed this this morning when my system's load was around 12, and eight or nine qmail-remote processes were collectively eating 100% of the CPU. I would have noticed sooner, but outbound deliveries hadn't come to a complete halt yet.
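Here is the corrected line wrapped in a hypothetical `expire` helper so it can be exercised on its own, including the empty-history case. Passing the current time in as `$now`, instead of calling `time()` inside the loop, is a change made only so the sketch is easy to test.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the corrected line from the sleep loop.
# The "@last &&" guard stops the loop once the history is empty.
sub expire {
    my ($time, $now, @last) = @_;
    shift @last while @last && $last[0] < $now - $time;
    return @last;
}

my @kept = expire(60, 1000, 900, 950, 990);   # 900 is outside the 60s window
my @none = expire(60, 1000, 900);             # empties the list and stops
```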

Incidentally, there's another potential problem here arising from the concurrency. A process will complete the sleep loop in at most $time+3 seconds. But then it will go back and reread the history file, and it may have to repeat the loop. This could go on indefinitely if the system is busy. I can't think of a good way to fix this without getting rid of the concurrent qmail-remote processes.

Here's the code. I hereby place it in the public domain. It was written between 1 AM and 3 AM last night, so don't expect too much.


[Other articles in category /Unix] permanent link

Sat, 08 Mar 2008

On risk
Consider the following game: You bet one dollar on the throw of a die. If the die comes up 6, you get your dollar back plus 25 more dollars. Otherwise, you lose your dollar. You can play as much as you want to. This is a great moneymaking proposition, because your expected winnings are about $3.33 on each game. Play a hundred times, and you can expect to be about three hundred and thirty dollars ahead. Even if you're only allowed to play once, you would probably choose to play this game.

I pulled some sleight-of-hand in the previous paragraph. I said the game was a good deal "because" the expected winnings were positive. But that's not sufficient. If it were, the following game would also be a good deal: You bet one million dollars on the throw of a die. If the die comes up 6, you get your million back plus 25 million more. Otherwise, you lose your million.

For some people, the second game is a good deal. For most people, including me, it's obviously a very bad idea. To get a million dollars, I'd at least have to mortgage everything I owned. Then I'd be under a crushing debt for the rest of my life, with 83% likelihood. But the expected return of the two games is the same; this shows that a good expected return is not a sufficient condition for a good investment.
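Spelling out the expected values, with a win of 25 times the stake at probability 1/6 and a loss of the stake at probability 5/6:

```latex
E_1 = \tfrac{1}{6}(+\$25) + \tfrac{5}{6}(-\$1) = \tfrac{\$20}{6} \approx \$3.33
\qquad
E_2 = \tfrac{1}{6}(+\$25\text{M}) + \tfrac{5}{6}(-\$1\text{M}) \approx \$3.33\text{M}
```

Both games return about 3.33 times the stake per play; only the size of the stake differs.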

The difference, of course, is that the second game is much riskier than the first.

I think most people understand this, but nevertheless you still hear them say a lot of dumb stuff about risk. For example, many people like to say that the lottery is a stupidity tax on people who don't understand basic arithmetic, and that nobody would play the lottery unless they were very stupid, because it's trivial to see that the expected return is very poor.

I used to meet people at parties who said this. I would point out that by this reasoning, fire insurance is also a stupidity tax on people who don't understand basic arithmetic, because it's clear that the expected return on fire insurance is negative. I did get argument from folks from time to time, but it's really not arguable. If fire insurance didn't have an expected negative return for the customer, the insurance company would go out of business. In fact, the insurance company employs a whole department full of mathematicians whose job it is to make sure that the value of the premium you pay exceeds the expected cost of the benefits that the company will pay. So there are only three choices here:

  1. You're better at simple arithmetic than the insurance company's actuarial department, or
  2. You should avoid buying insurance, since it's just a sucker bet, or
  3. The issue of insurance and lotteries is a little more complex than that.
I believe that the answer, as usual, is #3. (Advice to people wishing to become smarter: Get in the habit of assuming that everything is more complex than you imagine.)

Once again the issue is not so much the expected return as it is risk. You pay the insurance premiums in order to mitigate the risk of a fire. One big fire could wipe you out completely. So you insure your house against fire so that you can't be completely wiped out. In return, you pay small, predictable sums of money regularly.

Another way to look at this is to consider the idea of a utility function. This is just a fancy term for the observation that the usefulness of money is not a linear function of the face value of the money. Once you have a million dollars, the utility of another hundred is much lower than it is to someone who only has ten thousand.

When you calculate expected returns, you need to calculate the expected increase of the utility, not the expected return of the nominal face value of the money. Consider this thought experiment: you may bet one cent on a game that will pay you ten thousand dollars if you win, which it will do one time in two million. Do you play? Well, maybe you do, because if you lose, so what? It's only one cent, and you will never miss it. The utility of one cent is essentially zero. The utility of ten thousand dollars, on the other hand, is very high, much higher than two million times zero. But if you like this game, you're open to the same charge of not understanding simple arithmetic as the lottery people are, because the expected return is very low, about the same as the lottery. The game is the same as the lottery, only the cost and the payoff are each a hundred times smaller.
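The arithmetic behind "very low" for the one-cent game:

```latex
E[\text{payoff}] = \frac{1}{2 \times 10^{6}} \times \$10{,}000 = \$0.005
```

So each one-cent bet pays back half a cent on average, and you lose half a cent per play in expectation.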

In the fire insurance scenario, I am betting a small amount of money, with comparatively low utility, against a very large amount with much higher utility. One can view the lottery as analogous. If I buy a lottery ticket for $1, it's not because I misunderstand arithmetic. It's because the utility of $1 is low for me. I could blow $.85 on a candy bar tomorrow at lunch without thinking about it much. But the utility of winning millions is very high. With ten million dollars, I could pay off my mortgage, quit my job, and spend the rest of my life travelling around and writing articles. The value of even a hundred-millionth chance of this happening might well be higher than the value of gobbling one more candy bar that my body didn't need anyway.

Here's an exercise I've been doing lately, trying to estimate the value I ascribe to my own life. I am afraid that this is a trite subject. If so, I apologize. But if not, try it yourself, and you might discover something interesting. Suppose you have the option to play Russian Roulette, in return for which you will receive a fee of x. The gun has one million chambers, one of which holds a bullet. If you get the bullet, you die. Otherwise you collect the fee. What is the minimum value for x that will induce you to play? Would you play if x were one million dollars? I would. It's an almost sure million, and a million is a huge amount of money to me. And I probably take bigger than million-to-one risks every time I cross the street, so why not? So one might say that this demonstrates that my own estimate of the value of my own life is less than 10¹² dollars.

Would I play for a thousand dollars? No, probably not. But where's the cutoff? Ten thousand is a maybe, a hundred thousand is a probably. (I rather suspect that the cutoff is on the same order of magnitude as the mortgage on my house. This thought threatens to open a whole can of disturbing philosophical worms.)

Now let's up the risk. I've already agreed to bet my life on a million-to-one chance in return for a million dollars. The expected-value theory says that I should also be willing to bet it on a thousand-to-one chance for a billion dollars. Am I? No way. The utility of a billion dollars is much less than a thousand times the utility of a million, for me. For Donald Trump, it might be different.
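Writing the value one places on one's own life as V dollars, a bet looks acceptable under expected-value reasoning when the expected loss is smaller than the fee. The two bets give the same bound:

```latex
\frac{V}{10^{6}} < \$10^{6} \;\Longrightarrow\; V < \$10^{12},
\qquad
\frac{V}{10^{3}} < \$10^{9} \;\Longrightarrow\; V < \$10^{12}
```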

As a final exercise in thinking about risk, consider this: Folks at NASA estimate that your chance of being killed by a meteorite is on the order of 1 in 25,000. It's not because you're likely to be hit in the head. Nobody in recorded history has been killed by a meteor. It's because really big meteors do come by every so often, and when (not if, but when) one hits the earth, it'll kill just about everyone.

[ Addendum 20060425: There is a followup article to this one. ]


[Other articles in category ] permanent link

Tue, 04 Mar 2008

"Boolean" or "boolean"?
In a recent article I wrote:

... a logical negation function ... takes a boolean argument and returns a boolean result.
I worried for some time about whether to capitalize "boolean" here. But writing "Boolean" felt strange enough that I didn't actually try it to see how it looked on the page.

I looked at the Big Dictionary, and all the citations were capitalized. But the most recent one was from 1964, so that was not much help.

Then I tried a Google search for "boolean capitalized". The first hit was a helpful article by Eric Lippert. M. Lippert starts by pointing out that "Boolean" means "pertaining to George Boole", and so should be capitalized. That much I knew already.

But then he pointed out a countervailing consideration:

English writers do not usually capitalize the eponyms "shrapnel" (Henry Shrapnel, 1761-1842), "diesel" (Rudolf Diesel, 1858-1913), "saxophone" (Adolphe Sax, 1814-1894), "baud" (Emile Baudot, 1845-1903), "ampere" (Andre Ampere, 1775-1836), "chauvinist" (Nicolas Chauvin, 1790-?), "nicotine" (Jean Nicot, 1530-1600) or "teddy bear" (Theodore Roosevelt, 1858-1916).
Isn't that a great paragraph? I just had to quote the whole thing.

Lippert concluded that the tendency is to capitalize an eponym when it is an adjective, but not when it is a noun. (Except when it isn't that way; consider "diesel engine". English is what it is.)

I went back to my example to see if that was why I resisted capitalizing "Boolean":

... takes a boolean argument and returns a boolean result.
Hmm, no, that wasn't it. I was using "boolean" as an adjective in both places. Wasn't I?

Something seemed wrong. I tried changing the example:

... takes an integer argument and returns an integer result.
Aha! Notice "integer", not "integral". "Integral" would have been acceptable also, but that isn't analogous to the expression I intended. I wasn't using "boolean" as an adjective to modify "argument" and "result". I was using it as a noun to denote a certain kind of data, as part of a noun phrase. So it is a noun, and that's why I didn't want to capitalize it.

I would have been happy to have written "takes a boolean and returns a boolean", and I think that's the controlling criterion.

Sorry, George.


[Other articles in category /lang] permanent link

Mon, 03 Mar 2008

Ralph Johnson on design patterns
Last month I wrote an article about design patterns which attracted a lot of favorable attention in blog world. I started by paraphrasing Peter Norvig's observation that:

"Patterns" that are used recurringly in one language may be invisible or trivial in a different language.

and ended by concluding:

Patterns are signs of weakness in programming languages.

When we identify and document one, that should not be the end of the story. Rather, we should have the long-term goal of trying to understand how to improve the language so that the pattern becomes invisible or unnecessary.

Ralph Johnson, one of the four authors of the famous book Design Patterns, took note of my article and responded. I found Johnson's response really interesting, and curious in a number of ways. I think everyone who was interested in my article should read his too.

[ Addendum 20070127: The link above to Ralph Johnson's response is correct, but your client will be rejected if you are referred from here. To see his blog page, visit the page without clicking on the link. ]

Johnson raises several points. First there is a meta-issue to deal with. Johnson says:

He clearly thinks that what he says is surprising. And other people think it is surprising, too. That is surprising to me.
I did think that what I had to say was interesting and worth saying, of course, or I would not have said it. And I was not surprised to find that other people agreed with me.

One thing that I did find surprising is the uniformity of other people's surprise and interest. There were dozens of blog posts and comments in the following two weeks, all pretty much saying what a great article I had written and how right I was. I tracked the responses as carefully as I could, and I did not see any articles that called me a dumbass; I did not see any except for Johnson's that suggested that what I was saying was unsurprising.

We can't conclude from this that I am right, of course; people agree with all sorts of stupid crap. But we can conclude that what I said was surprising and interesting, since people were surprised and interested by it, even people who already have some knowledge of this topic. Johnson is right to be surprised by this, because he thought this was obvious and well-known, and that it was clearly laid out in his book, and he was mistaken. Many or most of the readers of his book have completely missed this point. I didn't miss it, but I didn't get it from the book, either.

Johnson and his three co-authors wrote this book, Design Patterns, which has had a huge influence on the way that programming is practiced. I think a lot of that influence has been malign. Any practice can be corrupted, of course, by being reduced to its formal aspects and applied in a rote fashion. (There's a really superb discussion of this in A. Ya. Khinchin's essay On the Teaching of Mathematics, and a shorter discussion in Polya's How to Solve It, in the section on "Pedantry and Mastery".) That will happen to any successful movement, and the Gang of Four can't take all the blame for that.

But if they really intended that everyone should understand that each design pattern is a demonstration of a weakness in its target language, then they blew it, because it appears that hardly anyone understood that.

Let's pause for a moment to imagine an alternate universe in which the subtitle of the Design Patterns book was not "Elements of Reusable Object-Oriented Software" but "Solutions for Recurring Problems in Object-Oriented Languages". And let's imagine that in each section, after "Pattern name", "Intent", "Motivation", "Applicability", and so forth, there was another subsection titled "Prophylaxis" that went something like this: "The need for the Iterator pattern in C++ appears to be due partly to its inflexible type system and partly to its lack of abstract iteration structures. The iterator pattern is unnecessary in the Python language, which avoids these defects as follows: ... at the expense of ... . In Common Lisp, on the other hand, ... (etc.)".

I would have liked to have seen that universe, but I suppose it's too late now. Oh well.
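To make the Iterator case concrete in Java, the language of the code later on this page, here is a minimal sketch that is not taken from the book. First comes the pattern written out by hand, using a hypothetical ArrayIterator class. Then comes the enhanced for loop that Java added in version 1.5, which absorbs the client-side bookkeeping into the language itself.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

class IteratorDemo {
    // Hypothetical hand-written Iterator over an int array: the "pattern".
    static class ArrayIterator implements Iterator<Integer> {
        private final int[] items;
        private int next = 0;

        ArrayIterator(int[] items) {
            this.items = items;
        }

        @Override
        public boolean hasNext() {
            return next < items.length;
        }

        @Override
        public Integer next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            return items[next++];
        }
    }

    // Client code that drives the iterator by hand, as the pattern prescribes.
    static int sumByHand(int[] items) {
        int total = 0;
        for (Iterator<Integer> it = new ArrayIterator(items); it.hasNext(); ) {
            total += it.next();
        }
        return total;
    }

    // The same loop with the enhanced for statement: the language tracks
    // the position, so there is no pattern left for the client to apply.
    static int sumBuiltIn(int[] items) {
        int total = 0;
        for (int x : items) {
            total += x;
        }
        return total;
    }

    public static void main(String[] args) {
        int[] numbers = {1, 2, 3, 4};
        System.out.println(sumByHand(numbers));   // 10
        System.out.println(sumBuiltIn(numbers));  // 10
    }
}
```

Both methods compute the same sum. When the enhanced for loop runs over a collection rather than an array, it still calls an Iterator behind the scenes. So the pattern has been absorbed rather than abolished: people who write collections still write iterators, but the people who use those collections no longer have to.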

Anyway, moving on from meta-issues to the issues themselves, Johnson continues:

At the very end, he says that patterns are signs of weakness in programming languages. This is wrong.
This is interesting, and I was going to address it later, but I now think that it's the first evidence of a conceptual mistake that Johnson has made that underlies his entire response to my article, so I'll take it up now.

At the very end of his response, Johnson says:

No matter how complicated your language will be, there will always be things that are not in the language. These things will have to be patterns. So, we can eliminate one set of patterns by moving them into the language, but then we'll just have to focus on other patterns. We don't know what patterns will be important 50 years from now, but it is a safe bet that programmers will still be using patterns of some sort.
Here we are in complete agreement. So, to echo Johnson, I was surprised that he would think this was surprising. But how can we be in complete agreement if what I said was "wrong"? There must be a misunderstanding somewhere.

I think I know where it is. When I said "[Design] Patterns are signs of weakness in programming languages," what I meant was something like "Each design pattern is a sign of a weakness in the programming language to which it applies." But it seems that Johnson thinks that I meant that the very existence of design patterns, at all, is a sign of weakness in all programming languages everywhere.

If I thought that the existence of design patterns, at all, was a sign that current programming languages are defective, as a group, I would see an endpoint to programming language development: someday, we would have a perfect überlanguage in which it would be unnecessary to use patterns because all possible patterns would have been built in already.

I think Johnson thinks this was my point. In the passage quoted above, I think he is addressing the idea of the überlanguage that incorporates all patterns everywhere at all levels of abstraction. And similarly:

Some people like languages with a lot of features. . . . I prefer simple languages.

And again:

No matter how complicated your language will be, there will always be things that are not in the language.

But no, I don't imagine that someday we will have the ultimate language, into which every conceivable pattern has been absorbed. So a lot of what Johnson has to say is only knocking down a straw man.

What I imagine is that when pattern P applies to language L, then, to the extent that some programmer on some project finds themselves needing to use P in their project, the use of P indicates a deficiency in language L for that project.

The absence of a convenient and simple way to do P in language L is not always a problem. You might do a project in language L that does not require the use of pattern P. Then the problem does not manifest, and, whatever L's deficiencies might be for other projects, it is not deficient in that way for your project.

This should not be difficult for anyone to understand. Perl might be a very nice language for writing a program to compile a bioinformatic data file into a more reasonable form; it might be a terrible language for writing a real-time missile guidance system. Its deficiencies operate in the missile guidance project in a way that they may not in the data munging project.

But to the extent that some deficiency does come up in your project, it is a problem, because you are implementing the same design over and over, the same arrangement of objects and classes, to accomplish the same purpose. If the language provided more support for solving this recurring design problem, you wouldn't need to use a "pattern". Consider again the example of the "subroutine" pattern in assembly language: don't you have anything better to do than redesign and re-implement the process of saving the register values in a stack frame, over and over? Well, yes, you do. And that is why you use a language that has that built in. Consider again the example of the "object-oriented class" pattern in C: don't you have anything better to do than redesign and re-implement object-oriented method dispatch with inheritance, over and over? Yes, you do. And that is why you use a language that has that built in, if that is what you need.

By Gamma, Helm, Johnson, and Vlissides' own definition, the problems solved by patterns are recurring problems, and programmers must address them recurringly.

If these problems recurred in every language, we might conclude that they were endemic to programming itself. We might not, but it's hard to say, since if there are any such problems, they have not yet been brought to my attention. Every pattern discovered so far seems to be specific to only a small subset of the world's languages.

So it seems a small step to conclude that these recurring, language-specific problems are actually problems with the languages themselves. No problem is a problem in every language, but rather each problem is a red arrow, pointing at a design flaw in the language in which it appears.

Johnson continues:

Patterns might be a sign of weakness, but they might be a sign of simplicity. . . .
I think this argument fails, in light of the examples I brought up in my original article. The argument is loaded by the use of the word "simplicity". As Einstein said, things should be as simple as possible, but no simpler. In assembly language, "subroutine call" is a pattern. Does Johnson or anyone seriously think that C++ or Smalltalk or Common Lisp or Java would be improved by having the "subroutine call" pattern omitted? The languages might be "simpler", but would they be better?

The alternative, remember, is to require the programmer to use a "pattern": to make them consult a manual of "patterns" to implement a "general arrangement of objects and classes" to solve the subroutine-call problem every time it comes up.

I guess you could interpret that as a sign of "simplicity", but it's the wrong kind of simplicity. Language designers have a hard problem to solve. If they don't put enough stuff into the language, it'll be too hard to use. But if they put in too much stuff, it'll be confusing and hard to program, like C++. One reason it's hard to be a language designer is that it's hard to know what to put in and what to leave out. There is an extremely complex tradeoff between simplicity and functionality.

But in the case of "patterns", it's much easier to understand the tradeoff. A pattern, remember, is a general method for solving "a recurring design problem". Patterns might be a sign of "simplicity", but if so, they are a sign of simplicity in the wrong place, a place where the language needs to be less simple and more featureful. Because patterns are solutions to recurring design problems.

If you're a language designer, and a "pattern" comes to your attention, then you have a great opportunity. The programmers using your language have a recurring problem. They have to implement the same solution to it, over and over. Clearly, this is a good place to try to expend some design effort; perhaps you can trade off a little simplicity for some functionality and fix the language so that the problem is a problem no longer.

Getting rid of one recurring design problem might create new ones. But if the new problems are operating at a higher level of abstraction, you may have a win. Getting rid of the need for the "subroutine call" pattern in assembly language opened up all sorts of new problems: when and how do I do recursion? When and how do I do coroutines?

Getting rid of the "object-oriented class" pattern in C created a need for higher-level patterns, including the ones described in the Design Patterns book. When people didn't have to worry about implementing inheritance themselves, a lot of their attention was freed up, and they could notice patterns like Façade.

As Alfred North Whitehead says, civilization advances by extending the number of important operations which we can perform without thinking about them. The Design Patterns approach seems to be to identify the important operations and then to think about them over and over and over and over and over.

Or so it seems to me. Johnson's next paragraph makes me wonder if I've completely missed his point, because it seems completely senseless to me:

There is a trade-off between putting something in your programming language and making it be a convention, or perhaps putting it in the library. Smalltalk makes "constructor" be a convention. Arithmetic is in the library, not in the language. Control structures and exception handling are from the library, not in the language.
Huh? Why does "library" matter? Unless I have missed something essential, whether something is in the "language" or the "library" is entirely an implementation matter, to be left to the discretion of the compiler writer. Is printf part of the C language, or its library? The library, everyone knows that. Oh, well, except that its behavior is completely standardized by the language standard, and it is completely permissible for the compiler writer to implement printf by putting a special case into the compiler that is enabled when the compiler happens to see the directive #include <stdio.h>. There is absolutely no requirement that printf be loaded from a separate file or anything like that.

Or consider Perl's dbmopen function. Prior to version 5.000, it was part of the "language", in some sense; in 5.000 and later, it became part of the "library". But what's the difference, really? I can't find any.

Is Johnson talking about some syntactic or semantic difference here? Maybe if I knew more about Smalltalk, I would understand his point. As it is, it seems completely daft, which I interpret to mean that there's something that went completely over my head.

Well, the whole article leaves me wondering if maybe I missed his point, because Johnson is presumably a smart guy, but his argument about the built-in features vs. libraries makes no sense to me, his argument about simplicity seems so clearly and obviously dismantled by his own definition of patterns, and his apparent attack on a straw man seems so obviously erroneous.

But I can take some consolation in the thought that if I did miss his point, I'm not the only one, because the one thing I can be sure of in all of this is that a lot of other people have been missing his point for years.

Johnson says at the beginning that he "wasn't sure whether to be happy or unhappy". If I had written a book as successful and widely read as Design Patterns and then I found out that everyone had completely misunderstood it, I think I would be unhappy. But perhaps that's just my own grumpy personality.

[ Addendum 20080303: Miles Gould wrote a pleasant and insightful article on Johnson's point about libraries vs. language features. As I surmised, there was indeed a valuable point that went over my head. I said I couldn't find any difference between "language" and "library", but, as M. Gould explains, there is an important difference that I did not appreciate in this context. ]


[Other articles in category /prog] permanent link

Facts you should know
The Minneapolis Star Tribune offers an article on Why is the sky blue? Facts you should know, subtitled "Scientists offer 10 basic questions to test your knowledge". [ The original article has been removed; here is another copy. ] I had been planning to write for a while on why the sky is blue, and how the conventional answers are pretty crappy. (The short answer is "Rayleigh scattering", but that's another article for another day. Even crappier are the common explanations of why the sea is blue. You often hear the explanation that the sea is blue because it reflects the sky. This is obviously nonsense. The surface of the sea does reflect the sky, perhaps, but when the sea is blue, it is a deep, beautiful blue all the way down. The right answer is, again, Rayleigh scattering.)

The author, Andrea L. Gawrylewski, surveyed a number of scientists and educators and asked them "What is one science question every high school graduate should be able to answer?" The questions follow.

  1. What percentage of the earth is covered by water?

    This is the best question that the guy from Woods Hole Oceanographic Institution can come up with?

    It's a plain factual question, something you could learn in two seconds. You can know the answer to this question and still have no understanding whatever of biology, meteorology, geology, oceanography, or any other scientific matter of any importance. If I were going to make a list of the ten things that are most broken about science education, one of them would be that it emphasizes stupid trivia like this at the expense of substantive matters.

    For a replacement question, how about "Why is it important that three-fourths of the Earth's surface is covered with water?" It's easy to recognize a good question. A good question is one that is quick to ask and long to answer. My question requires a long answer. This one does not.

  2. What sorts of signals does the brain use to communicate sensations, thoughts and actions?

    This one is a little better. But the answer given, "The single cells in the brain communicate through electrical and chemical signals" is still disappointing. It is an answer at the physical level. A more interesting answer would discuss the protocol layers. How does the brain perform error correction? How is the information actually encoded? I may be mistaken, but I think this stuff is all still a Big Mystery.

    The question given asks about how the brain communicates thoughts. The answer given completely fails to answer this question. OK, the brain uses electrical and chemical signals. So how does the brain use electricity and chemicals to communicate thoughts, then?

  3. Did dinosaurs and humans ever exist at the same time?

    Here's another factual question, one with even less information content than the one about the water. This one at least has some profound philosophical implications: since the answer is "no", it implies that people haven't always been on the earth. Is this really the one question every high school graduate should be able to answer? Why dinosaurs? Why not, say, trilobites?

    I think the author (Andrew C. Revkin of the New York Times) is probably trying to strike a blow against creationism here. But I think a better question would be something like "what is the origin of humanity?"

  4. What is Darwin's theory of the origin of species?

    At last we have a really substantive question. I think it's fair to say that high school graduates should be able to give an account of Darwinian thinking. I would not have picked the theory of the origin of species, specifically, particularly because the origin of species is not yet fully understood. Instead, I would have wanted to ask "What is Darwin's theory of evolution by natural selection?" And in fact the answer given strongly suggests that this is the question that the author thought he was asking.

    But I can't complain about the subject matter. The theory of evolution is certainly one of the most important ideas in all of science.

  5. Why does a year consist of 365 days, and a day of 24 hours?

    I got to this question and sighed in relief. "Ah," I said, "at last, something subtle." It is subtle because the two parts of the question appear to be similar, but in fact are quite different. A year is 365 days long because the earth spins on its axis in about 1/365th the time it takes to revolve around the sun. This matter has important implications. For example, why do we need to have leap years and what would happen if we didn't?

    The second part of the question, however, is entirely different. It is not astronomical but historical. Days have 24 hours because some Babylonian thought it would be convenient to divide the day and the night into 12 hours each. It could just as easily have been 1000 hours. We are stuck with 365.2422 whether we like it or not.

    The answer given appears to be completely oblivious that there is anything interesting going on here. As far as it is concerned, the two things are exactly the same. "A year, 365 days" it says, "is the time it takes for the earth to travel around the sun. A day, 24 hours, is the time it takes for the earth to spin around once on its axis."

  6. Why is the sky blue?

    I have no complaint here with the question, and the answer is all right, I suppose. (Although I still have a fondness for "because it reflects the sea.") But really the issue is rather tricky. It is not enough to just invoke Rayleigh scattering and point out that the high-frequency photons are scattered a lot more than the low-frequency ones. You need to think about the paths taken by the photons: The ones coming from the sky have, of course, come originally from the sun in a totally different direction, hit the atmosphere obliquely, and been scattered downward into your eyes. The sun itself looks slightly redder because the blue photons that were heading directly toward your eyes are scattered away; this effect is quite pronounced when there is more scattering than usual, as when there are particles of soot in the air, or at sunset.

    The explanation doesn't end there. Since the violet photons are scattered even more than the blue ones, why isn't the sky violet? I asked several professors of physics this question and never got a good answer. I eventually decided that it was because there aren't very many of them; because of the blackbody radiation law, the intensity of the sun's light falls off quite rapidly as the frequency increases, past a certain point. And I was delighted to see that the Wikipedia article on Rayleigh scattering addresses this exact point and brings up another matter I hadn't considered: your eyes are much more sensitive to blue light than to violet.

    The full explanation goes on even further: to explain the blackbody radiation and the Rayleigh scattering itself, you need to use quantum physical theories. In fact, the failure of classical physics to explain blackbody radiation was the impetus that led Max Planck to invent the quantum theory in the first place.

    So this question gets an A+ from me: It's a short question with a really long answer.

  7. What causes a rainbow?

    I have no issue with this question. I don't know if I'd want to select it as the "one science question every high school graduate should be able to answer", but it certainly isn't a terrible choice like some of the others.

  8. What is it that makes diseases caused by viruses and bacteria hard to treat?

    The phrasing of this one puzzled me. Did the author mean to suggest that genetic disorders, geriatric disorders, and prion diseases are not hard to treat? No, I suppose not. But still, the question seems philosophically strange. Who says that diseases are hard to treat? A lot of formerly fatal bacterial diseases are easy to treat: treating cholera is just a matter of giving the patient IV fluids until it goes away by itself; a course of antibiotics and your case of the bubonic plague will clear right up. And even supposing that we agree that these diseases are hard to treat, how can you rule out "answers" like "because we aren't very clever"? I just don't understand what's being asked here.

    The answer gives a bit of a hint about what the question means. It begins "influenza viruses and others continually change over time, usually by mutation." If that's what you're looking for, why not just ask why there's no cure for the common cold?

  9. How old are the oldest fossils on earth?

    Oh boy, another stupid question about how much water there is on the surface of the earth. I guessed a billion years; the answer turns out to be about 3.8 billion years. I think this, like the one about the dinosaurs, is a question motivated by a desire to rule out creationism. But I think it's an inept way of doing so, and the question itself is a loser.

  10. Why do we put salt on sidewalks when it snows?

    Gee, why do we do that? Well, the salt depresses the freezing point of the water, so that it melts at a lower temperature, one, we hope, that is lower than the temperature outside, so that the snow melts. And if it doesn't melt, the salt is gritty and provides some traction when we walk on it.

    But why does the salt depress the freezing point? I don't know; I've never understood this. The answer given in the article is no damn good:

    Adding salt to snow or ice increases the number of molecules on the ground surface and makes it harder for the water to freeze. Salt can lower freezing temperatures on sidewalks to 15 degrees from 32 degrees.

    The second sentence really doesn't add anything at all, and the first one is so plainly nonsense I'm not even sure where to start ridiculing it. (If all that is required is an increase in the number of molecules, why won't it work to add more snow?)

    So let me think. The water molecules are joggling around, bumping into each other, and the snow is a low-energy crystalline state that they would like to fall into. At low temperatures, even when a molecule manages to joggle its way out of the crystal, it's likely to fall back in pretty quickly, and if not there's probably another molecule around that can fall in instead. At lower temperatures, the molecules joggle less, and there's an equilibrium in this in-and-out exchange that results in more ice and less water than at higher temperatures.

    When the salt is around, the salt molecules might fall into the holes in the ice crystal instead, get in the way of the water molecules, and prevent the crystal from re-forming, so that's going to shift the equilibrium in favor of water and against ice. So if you want to reach the same equilibrium that's normally reached at zero degrees, you need to subtract some of the joggling energy, to compensate for the interference of the salt, and that's why the freezing temperature goes down.

    I think that's right, or close to it, and it certainly sounds pretty good, but my usual physics disclaimer applies: While I know next to nothing about physics, I can spin a line of bullshit that sounds plausible enough to fool people, including myself, into believing it.

    (Is there such a thing as a salt molecule? Or does it really take the form of isolated sodium and chloride ions? I guess it doesn't matter much in this instance.)

    I think this question is a winner.

    [ Addendum 20060416: Allan Farrell's blog Bento Box has another explanation of this. It seems to me that M. Farrell knows a lot more about it than I do, but also that my own explanation was essentially correct. But there may be subtle errors in my explanation that I didn't notice, so you may want to read the other one and compare.]

    [ Addendum 20070204: A correspondent at MIT provided an alternative explanation. ]
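To put a rough number on the salt question, the usual textbook approximation for dilute solutions, which the article itself does not use, is ΔT = i · Kf · m. The freezing point drops in proportion to the molality m of the dissolved salt. Kf is about 1.86 K·kg/mol for water, and the factor i is about 2 for table salt because it dissolves into sodium and chloride ions. The sketch below assumes ideal behavior, which real brines stop obeying at high concentrations. The class and method names are made up for illustration.

```java
class FreezingPoint {
    // Cryoscopic constant of water, in kelvin * kg / mol.
    static final double KF_WATER = 1.86;
    // Molar mass of NaCl, in g/mol.
    static final double NACL_MOLAR_MASS = 58.44;
    // Particles per formula unit once dissolved (Na+ and Cl-); ideal assumption.
    static final int NACL_PARTICLES = 2;

    // Ideal dilute-solution estimate: deltaT = i * Kf * molality,
    // where molality is moles of salt per kilogram of water.
    static double depressionKelvin(double gramsSalt, double kgWater) {
        double molality = (gramsSalt / NACL_MOLAR_MASS) / kgWater;
        return NACL_PARTICLES * KF_WATER * molality;
    }

    public static void main(String[] args) {
        for (double grams : new double[] {10, 50, 100}) {
            System.out.printf("%.0f g of salt per kg of water: freezing point about %.1f K lower%n",
                    grams, depressionKelvin(grams, 1.0));
        }
    }
}
```

Under these assumptions, one mole of salt (about 58 g) per kilogram of water lowers the freezing point by roughly 3.7 K.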

So, totaling up, we have three trivial factual questions (amount of earth's surface covered with water, age of fossils, and how many angels can dance on the head of a pin), one giant mystery (information representation in the brain), and six good questions (theory of evolution, length of day and year, why the sky is blue, the rainbow, disease, and freezing point depression). Of these, I think the theory-of-evolution one is probably the best.

The questions overall were a lot better than the answers, which made me wonder if perhaps M. Gawrylewski had written the answers herself.
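The Rayleigh-scattering part of the sky question can be put into numbers. Rayleigh scattering strength goes as 1/λ⁴, where λ is the wavelength, so a short Java calculation can compare representative colors. The particular wavelengths used below (450 nm for blue, 700 nm for red, 400 nm for violet) are assumptions chosen for illustration.

```java
class Rayleigh {
    // Rayleigh scattering is proportional to 1 / wavelength^4, so the ratio
    // of scattering at a shorter wavelength to that at a longer one is
    // (longer / shorter)^4.
    static double relativeScattering(double shorterNm, double longerNm) {
        return Math.pow(longerNm / shorterNm, 4);
    }

    public static void main(String[] args) {
        // Representative wavelengths in nanometers (illustrative choices).
        double violet = 400, blue = 450, red = 700;
        System.out.printf("blue vs red:    %.2f times as much scattering%n",
                relativeScattering(blue, red));
        System.out.printf("violet vs blue: %.2f times as much scattering%n",
                relativeScattering(violet, blue));
    }
}
```

Blue comes out scattered nearly six times as strongly as red, and violet about 1.6 times as strongly as blue. That is why the violet puzzle needs the extra ingredients discussed under question 6: the shape of the sun's spectrum and the eye's sensitivity.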


[Other articles in category /physics] permanent link

The Spite House

The subject of really narrow buildings came up on Reddit last week, and my post about the "Spite House" was well-received. Since pictures of it seem to be hard to come by, I scanned the pictures from New York's Architectural Holdouts by Andrew Alpern and Seymour Durst.

The book is worth checking out, particularly if you are familiar with New York. The canonical architectural holdout occurs when a developer is trying to assemble a large parcel of land for a big building, and a little old lady refuses to sell her home. The book is full of astonishing pictures: skyscrapers built with holdout buildings embedded inside them and with holdout buildings wedged underneath them. Skyscrapers built in the shape of the letter E (with the holdouts between the prongs), the letter C (with the holdout in the cup), and the letter Y (with the holdout in the fork).


Photo credit: Jerry Callen
When Henry Siegel, a New York store owner, got news in 1898 that Macy's was going to build a gigantic new flagship store on Herald Square, he bought the corner lot for $375,000 to screw over his competitors. The Herald Square Macy's still has a notch cut out of its corner; see the picture at right. The Macy's store on Queens Boulevard is in the shape of a perfect circle, except for the little bit cut out of one side where the proverbial old lady (this time named Mary Sendek) refused to sell a 7×15-foot back corner of her lot for $200,000 because she wanted her dog to have a place to play. (Here's a satellite view of the building. The notch is clearly visible at the northwest corner, facing 55th Avenue.)

But anyway, the Spite House. The story, as told by Alpern and Durst, is that around 1882, Patrick McQuade wanted to build some houses on 82nd Street at Lexington Avenue. The adjoining parcel of land, around the corner on Lexington, was owned by Joseph Richardson, shown at left. If McQuade could acquire this parcel, he would be able to extend his building all the way to Lexington Avenue, and put windows on that side of the building. No problem: the parcel was a strip of land 102 feet long and five feet wide along Lexington, useless for any other purpose. Surely Richardson would sell.

McQuade offered $1,000, but Richardson demanded $5,000. Unwilling to pay, McQuade started building his houses anyway, complete with windows looking out on Richardson's five-foot-wide strip, which was unbuildable. Or so he thought.


Richardson built a building five feet wide and 102 feet long, blocking McQuade's Lexington Avenue windows.

The building soon became known as the "Spite House". The photograph above was taken around 1895. Lexington Avenue is torn up for maintenance in this picture.

Richardson took advantage of a clause in the building codes that allowed him to build bay window extensions in his building. This allowed him to extend its maximum width 2'3" beyond the boundary of the lot. (Alpern and Durst say "In those days, such encroachments on the public sidewalks were not prohibited.") The rooms of the Spite House were in these bay window extensions, connected by extremely narrow hallways:


As you can see, the Spite House was divided into two dwellings, each with a separate entrance, four floors, and two rooms on each floor. The rooms were 7'3" wide and were connected by hallways 3'4" wide.

After construction was completed, Richardson moved into the Spite House and lived there until he died in 1897. The pictures below and at left are from that time.

The edge-on photograph below, showing the Spite House's 3'4" frontage on 82nd Street, was taken in 1912.

The Spite House was demolished in 1915.


Picture credits

The photograph of the Macy's Herald Square store is copyright ©2004 Jerry Callen, and is used with permission.

All other pictures and photographs are in the public domain. I took them from pages 122–124 of the book New York's Architectural Holdouts, by Alpern and Durst. The original sources, as given by Alpern and Durst, are as follows:

Collection of Andrew Alpern.

January 1897 issue of Scientific American.



New York Journal, 5 June 1897
New York Public Service Commission


[Other articles in category /tech] permanent link

Uniquely-decodable codes revisited
[ This is a followup to an earlier article. ]

Alan Morgan wrote to ask if there was a difference between uniquely-decodable (UD) codes for strings and for streams. That is, is there a code for which every finite string is UD, but for which some infinite sequence of symbols has multiple decodings?

I pondered this a bit, and after a little guessing came up with an example: { "a", "ab", "bb" } is UD, because it is a suffix code. But the stream "abbbbbbbbbb..." can be decoded in two ways.

After I found the example, I realized that I shouldn't have needed to guess, because I already knew that you sometimes have to see the last symbol of a string before you can know how to decode it, and in such a code, if there is no such symbol, the decoding must be ambiguous. The code above is UD, but to decode "abbbbbbbbbbbbbbb" you have to count the "b"s to figure out whether the first code word is "a" or "ab".
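A brute-force check makes the point about counting "b"s concrete. The following Java sketch, with hypothetical class and method names, counts the decodings of a finite string under the code { "a", "ab", "bb" } using a small dynamic program, and finds the decoding itself by backtracking. Every string made of "a" followed by n "b"s has exactly one decoding, but its first code word flips between "a" and "ab" with the parity of n. So no finite prefix of the stream "abbbb..." settles what the first word is.

```java
import java.util.ArrayList;
import java.util.List;

class SuffixCodeDemo {
    // The suffix code from the text.
    static final String[] CODE = {"a", "ab", "bb"};

    // Dynamic program: ways[i] is the number of ways to split the first i
    // characters of s into code words.
    static long countDecodings(String s) {
        long[] ways = new long[s.length() + 1];
        ways[0] = 1;
        for (int i = 1; i <= s.length(); i++) {
            for (String w : CODE) {
                int start = i - w.length();
                if (start >= 0 && s.startsWith(w, start)) {
                    ways[i] += ways[start];
                }
            }
        }
        return ways[s.length()];
    }

    // Backtracking search for one decoding; returns null if there is none.
    // Fine for short strings like the ones below.
    static List<String> decode(String s) {
        if (s.isEmpty()) {
            return new ArrayList<>();
        }
        for (String w : CODE) {
            if (s.startsWith(w)) {
                List<String> rest = decode(s.substring(w.length()));
                if (rest != null) {
                    rest.add(0, w);
                    return rest;
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 6; n++) {
            String s = "a" + "b".repeat(n);
            List<String> words = decode(s);
            System.out.println(s + ": " + countDecodings(s)
                    + " decoding(s), first word " + words.get(0));
        }
    }
}
```

For odd n the first word is "ab" and for even n it is "a", so a decoder must see the end of the string before it can commit to the first word.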

Let's say that a code is UD+ if it has the property that no two infinite sequences of code words have the same concatenation. Can we characterize the UD+ codes? Clearly, UD+ implies UD, and the example above shows that the converse is not true. A simple argument shows that all prefix codes are UD+. So the question now is, are there UD+ codes that are not prefix codes? I don't know.

[ Addendum 20080303: Gareth McCaughan points out that { "a", "ab" } is UD+ but not prefix. ]


[Other articles in category /CS] permanent link

Sun, 02 Mar 2008

Subtypes and polymorphism

[ Note: You are cautioned that this article is in the oops section of my blog, and so you should understand it as a description of a mistake that I have made. ]

For more than a year now my day job has involved work on a large project written entirely in Java. I warned my employers that I didn't have any professional experience in Java, but they hired me anyway. So I had to learn Java. Really learn it, I mean. I hadn't looked at it closely since version 1.2 or so.

Java 1.5 has parametrized types, which they call "generics". These looked pretty good to me at first, and a big improvement on the cruddy 1975-style type system Java had had before. But then I made a shocking discovery: If General is a subtype of Soldier, then List<General> is not a subtype of List<Soldier>.

In particular, you cannot:

        List<General> listOfGenerals = ...
        List<Soldier> listOfSoldiers = listOfGenerals;
For a couple of weeks I went around muttering to myself about what idiots the Java people must be. "Geez fuckin' Louise," I said. "The Hindley-Milner languages have had this right for twenty years. How hard would it have been for those Java idiots to pick up a damn book or something?"

I was, of course, completely wrong in all respects. The assignment above leads to problems that are obvious if you think about it a bit, and that should have been obvious to me, and would have been, except that I was so busy exulting in my superiority to the entire Java community that I didn't bother to think about it a bit.

You would like to be able to do this:

        Soldier someSoldier = ...;
        listOfSoldiers.add(someSoldier);
But if you do, you are setting up a type failure:

        General someGeneral = listOfGenerals.get(listOfGenerals.size() - 1);
        someGeneral.orderAttack(...);
Here listOfSoldiers and listOfGenerals refer to the same underlying object. The add call puts a common soldier into that object, and the very next statement takes it out again. But that statement expects it to be a General, and it is not. So we either have a type failure right there, or else we have a General variable that holds a Soldier object. Then, in the orderAttack call, a mere private is allowed to order an attack, causing a run-time type failure. If we're lucky.

The language designers must forbid one of these operations. The best choice appears to be to forbid the very first one: the assignment of the List<General> object to the List<Soldier> variable (listOfSoldiers = listOfGenerals). The other choices seem clearly much worse.
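Python does no static checking, so the same aliasing can be written down directly, and the failure shows up at run time instead. This is a minimal sketch with the Java classes transliterated into hypothetical Python classes; order_attack stands in for orderAttack. Static checkers take the Java view here. Mypy, for example, rejects assigning a list[General] to a variable annotated list[Soldier], because it treats list as invariant.

```python
class Soldier:
    """A common private, with no authority to order attacks."""


class General(Soldier):
    def order_attack(self):
        return "Attack!"


def demo():
    list_of_generals = [General()]
    # The assignment Java forbids: both names now refer to the same list.
    list_of_soldiers = list_of_generals
    # Adding a Soldier to a "list of soldiers" looks perfectly reasonable...
    list_of_soldiers.append(Soldier())
    # ...but the last element of the "list of generals" is now a private.
    some_general = list_of_generals[-1]
    return some_general.order_attack()  # raises AttributeError
```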

The canonical Java generics tutorial has an example just like this one, to explain precisely this feature of Java generics. I would have known this, and I would have saved myself two weeks of grumbling, if I had picked up a damn book or something.

Furthermore, my premise was flawed. The H-M languages (SML, Haskell, Miranda, etc.) have not had this right for twenty years. It is easy to get right in the absence of references. But once you add references the problem becomes notoriously difficult, and SML, for example, has gone through several different strategies for dealing with it, as the years passed and more was gradually learned about the problem.

The naive approach for SML is simple. It says that if α is any type, then there is a type ref α, which is the type of a reference that refers to a storage cell that contains a value of type α. So for example ref int is the type of a reference to an int value. There are three functions for manipulating reference types:

        ref    : α → ref α
        !      : ref α → α
        :=     : (ref α * α) → unit
The ref function takes a value and produces a reference to it, like & in C; if the original value had type α then the result has type ref α. The ! function takes a reference of type ref α and dereferences it, returning the value of type α that it refers to, like * in C. And the := function, usually written infix, takes a reference and a value, stores the value into the place that the reference points to, replacing what was there before, and returns nothing. So for example:

        val a = "Kitty cat";    (*   a : string       *)
        val r = ref a;          (*   r : ref string   *)
        r := "Puppy dog";
        print (!r);
This prints Puppy dog. But this next example fails, as you would hope and expect:

        val a = "Kitty cat";
        val r = ref a;
        r := 37;               (* fails *)
because r has type ref string, but 37 has type int, and := requires that the type of the value on the right match the type referred to by the reference on the left.

That is the obvious, naive approach. What goes wrong, though? The canonical example is:

        fun id x = x             (*   id : α → α         *)
        val a = ref id;          (*   a : ref (α → α)    *)

        fun not true = false
          | not false = true ;   (*   not: bool → bool   *)
        a := not;

        (!a) 13                  
The key here is that a is a variable of type ref (α → α). That is, it is a reference to a cell that can hold a function whose argument is any type α and whose return value is the same type. Initially the cell holds id, the identity function.

Then we define a logical negation function, not, which has type bool → bool; that is, it takes a boolean argument and returns a boolean result. Since this is an instance of α → α (take α to be bool), the naive type checker lets us store this function in the cell referenced by a.

Then we dereference a, recovering the value it points to, which, since the assignment, is the not function. But since a has type ref (α → α), !a has type α → α, and so should be applicable to any value. So the application of !a to the int value 13 passes the type checker, and SML blithely applies the not function to 13.

I've already talked about SML way longer than I planned to, and I won't belabor you further with explanations of the various schemes that were hatched over the years to try to sort this out. Suffice it to say that the problem is still an open research area.

Java, of course, is all references from top to bottom, so this issue obtrudes. The Java people do not know the answer either.

The big error that I made here was to jump to the conclusion that the Java world must be populated with idiots who know nothing about type theory or Haskell or anything else that would have tipped them off to the error I thought they had committed. Probably most of them know nothing about that stuff, but there are a lot of them, and presumably some of them have a clue, and perhaps some of them even know a thing or two that I don't. I said a while back that people who want to become smarter should get in the habit of assuming that everything is more complex than they imagine. Here I assumed the opposite.

As P.J. Plauger once said in a similar circumstance, there is a name for people who are so stupid that they think everyone else is stupid instead.

Maybe I won't be that person next time.


[Other articles in category /oops] permanent link

The missing deltahedron
I recently wrote about the convex deltahedra, which are the eight polyhedra whose faces are all congruent equilateral triangles:

Name                            Faces  Edges  Vertices
Tetrahedron                         4      6         4
Triangular dipyramid                6      9         5
Octahedron                          8     12         6
Pentagonal dipyramid               10     15         7
Snub disphenoid                    12     18         8
Triaugmented triangular prism      14     21         9
Gyroelongated square dipyramid     16     24        10
Icosahedron                        20     30        12

The names are rather horrible, so I think that from now on I'll just refer to them as D4, D6, D8, D10, D12, D14, D16, and D20.

The number of edges that meet at a vertex is its valence. Vertices in convex deltahedra have valences of 3, 4, or 5. The valence can't be larger than 5, because at most six equilateral triangles fit around a vertex. If there are six, they lie flat and the polyhedron is not properly convex.

Let V3, V4, and V5 be the number of vertices of valences 3, 4, and 5, respectively. Then:

What  V3  V4  V5
D4     4   0   0
D6     2   3   0
D8     0   6   0
D10    0   5   2
D12    0   4   4
D14    0   3   6
D16    0   2   8
D20    0   0  12
There's a clear pattern here, with V3s turning into V4s two at a time until you reach the octahedron (D8) and then V4s turning into V5s one at a time until you reach the icosahedron (D20). But where is V4=1, V5=10? There's a missing deltahedron. I don't mean it's missing from the table; I mean it's missing from the universe.

Well, this is all oversubtle, I realized later, because you don't need to do the V3V4V5 analysis to see that something is missing. There are convex deltahedra with 4, 6, 8, 10, 12, 14, 16, and 20 faces; what happened to 18?

Still, I did a little work on a more careful analysis that might shed some light on the 18-hedron situation. I'm still in the middle of it, but I'm trying to continue my policy of posting more frequent, partial articles.

Let V be the number of vertices in a convex deltahedron, E be the number of edges, and F be the number of faces.

We then have V = V3 + V4 + V5. We also have E = ½(3V3 + 4V4 + 5V5), since each edge has two ends. And since each face has exactly 3 edges, and each edge borders exactly 2 faces, we have 3F = 2E.

By Euler's formula, F + V = E + 2. Plugging in the stuff from the previous paragraph, we get 3V3 + 2V4 + V5 = 12.
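Spelled out, the substitution uses F = 2E/3 from the face count and the expression for E from the valence count:

$$V - E + F = 2,\quad F = \tfrac{2}{3}E \;\Longrightarrow\; V - \tfrac{1}{3}E = 2 \;\Longrightarrow\; 6V - 2E = 12$$

$$6(V_3 + V_4 + V_5) - (3V_3 + 4V_4 + 5V_5) = 12 \;\Longrightarrow\; 3V_3 + 2V_4 + V_5 = 12$$

The middle step also gives E = 3V - 6 and hence F = 2V - 4. A convex deltahedron with 18 faces would therefore need exactly 11 vertices.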

It is very easy to enumerate all possible solutions of this equation. There are 19:

  V3  V4  V5  What
   4   0   0  D4
   3   1   1
   3   0   3
   2   3   0  D6
   2   2   2
   2   1   4
   2   0   6
   1   4   1
   1   3   3
   1   2   5
   1   1   7
   1   0   9
   0   6   0  D8
   0   5   2  D10
   0   4   4  D12
   0   3   6  D14
   0   2   8  D16
   0   1  10
   0   0  12  D20

Solutions with a name in the What column correspond to convex deltahedra. What goes wrong with the other 11 items?
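The enumeration is easy to reproduce mechanically. This small Python sketch uses function names of my own. It lists the 19 triples in the table's order and computes V, E, and F for each from the relations above. The unnamed triple (0, 1, 10) comes out as the 18-faced candidate.

```python
def valence_triples(total=12):
    """All nonnegative (V3, V4, V5) with 3*V3 + 2*V4 + V5 == total,
    in the same order as the table above."""
    triples = []
    for v3 in range(total // 3, -1, -1):
        for v4 in range((total - 3 * v3) // 2, -1, -1):
            triples.append((v3, v4, total - 3 * v3 - 2 * v4))
    return triples


def vef(v3, v4, v5):
    """Vertex, edge, and face counts implied by a valence triple."""
    v = v3 + v4 + v5
    e = (3 * v3 + 4 * v4 + 5 * v5) // 2   # each edge has two ends
    f = 2 * e // 3                        # 3 edges per face, 2 faces per edge
    return v, e, f


NAMED = {(4, 0, 0): "D4", (2, 3, 0): "D6", (0, 6, 0): "D8", (0, 5, 2): "D10",
         (0, 4, 4): "D12", (0, 3, 6): "D14", (0, 2, 8): "D16", (0, 0, 12): "D20"}

for t in valence_triples():
    print(t, vef(*t), NAMED.get(t, ""))
```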

(3,1,1) fails completely because to have V5 > 0 you need V ≥ 6. There isn't even a graph with (V3, V4, V5) = (3,1,1), much less a polyhedron.

There is a graph with (3,0,3), but it is decidedly nonplanar: it contains K3,3, plus an additional triangle. But the graph of any polyhedron must be planar, because you can make a little hole in one of the faces of the polyhedron and flatten it out without the edges crossing.

Another way to think about (3,0,3) is to consider it as a sort of triangular tripyramid. Each of the V5s shares an edge with each of the other five vertices, so the three V5s are all pairwise connected by edges and form a triangle. Each of the three V3s must be connected to each of the three vertices of this triangle. You can add two of the required V3s by erecting a triangular pyramid on the top and the bottom of the triangle. But then you have nowhere to put the third pyramid.

On Thursday I didn't know what went wrong with (2,2,2); it seemed fine. (I found it a little challenging to embed it in the plane, but I'm not sure if it would still be challenging if it hadn't been the middle of the night.) I decided that when I got into the office on Friday I would try making a model of it with my magnet toy and see what happened.

It turned out that nothing goes wrong with (2,2,2). It makes a perfectly good non-convex deltahedron. It's what you get when you glue together three tetrahedra, face-to-face-to-face. The concavity is on the underside in the picture.


(2,0,6) was a planar graph too, and so the problem had to be geometric, not topological. When I got to the office, I put it together. It also worked fine, but the result is not a polyhedron. The thing you get could be described as a gyroelongated triangular dipyramid. That is, you take an octahedron and glue tetrahedra to two of its opposite faces. But then the faces of the tetrahedra are coplanar with the faces of the octahedron to which they abut, and this is forbidden in polyhedra. When that happens you're supposed to eliminate the intervening edge and consider the two faces to be a single face, a rhombus in this case. The resulting thing is not a polyhedron with 12 triangular faces, but one with six rhombic faces (a rhombohedron), essentially a squashed cube. In fact, it's exactly what you get if you make a cube from the magnet toy and then try to insert another unit-length rod into the diagonal of each of the six faces. You have to squash the cube to do this, of course, since the diagonals had length √2 before and length 1 after.


So there are several ways in which the triples (V3,V4,V5) can fail to determine a convex deltahedron: There is an utter topological failure, as with (3,1,1).

There is a planarity failure, which is also topological, but less severe, as with (3,0,3). (3,0,3) also fails because you can't embed it into R3. (I mean that you cannot embed its 3-skeleton. Of course you can embed its 1-skeleton in R3, but that is not sufficient for the thing to be a polyhedron.) I'm not sure if this is really different from the previous failure; I need to consider more examples. And (3,0,3) fails in yet another way: you can't even embed its 1-skeleton in R3 without violating the constraint that says that the edges must all have unit length. The V5s must lie at the vertices of an equilateral triangle, and then the three unit spheres centered at the V5s intersect at exactly two points of R3. You can put two of the V3s at these points, but this leaves nowhere for the third V3. Again, I'm not sure that this is a fundamentally different failure mode than the other two.

Another failure mode is that the graph might be embeddable into R3, and might satisfy the unit-edge constraint, but in doing so it might determine a concave polyhedron, like (2,2,2) does, or a non-polyhedron, like (2,0,6) does.

I still have seven (V3,V4,V5) triples to look into. I wonder if there are any other failure modes?

I should probably think about (0,1,10) first, since the whole point of all this was to figure out what happened to D18. But I'm trying to work up from the simple cases to the harder ones.

I suppose the next step is to look up the proof that there are only eight convex deltahedra and see how it goes.

I suspect that (2,1,4) turns out to be nonplanar, but I haven't looked at it carefully enough to actually find a forbidden minor.

One thing that did occur to me today was that a triple (V3, V4, V5) doesn't necessarily determine a unique graph, and I need to look into that in more detail. I'll be taking a plane trip on Sunday and I plan to take the magnet toy with me and continue my investigations on the plane.

In other news, Iris and I went to my office this evening to drop off some books and pick up some stuff for the trip, including the magnet toy. Iris was very excited when she saw the collection of convex deltahedron models on my desk, each in a different color, and wanted to build models just like them. We got through all of them, except D10, because we ran out of ball bearings. By the end Iris was getting pretty good at building the models, although I think she probably wouldn't be able to do it without directions yet. I thought it was good work, especially for someone who always skips from 14 to 16 when she counts.

On the way home in the car, we were talking about how she was getting older and I rhapsodized about how she was learning to do more things, learning to do the old things better, learning to count higher, and so on. Iris then suggested that when she is older she might remember to include 15.


[Other articles in category /math] permanent link

Lazy square roots of power series
Lately for various reasons I have been investigating the differential equation:

$$(f(x))^2 + (f'(x))^2 = 1$$

where f' is the derivative of f. This equation has a couple of obvious solutions (f(x) = 1; f(x) = sin(x)) and a bunch of not-so-obvious ones. Since I couldn't solve the equation symbolically, I decided to fall back on power series. Representing f(x) as $a_0 + a_1x + a_2x^2 + \cdots$, one can manipulate the power series and solve for $a_0$, $a_1$, $a_2$, etc. In fact, this is exactly the application for which mathematicians first became interested in power series. The big question is "once you have found $a_0$, $a_1$, etc., do these values correspond to a real function? And for what x does the power series expression actually make sense?" This question, springing from a desire to solve intractable differential equations, motivates a lot of the theoretical mathematics of the last hundred and fifty years.

I decided to see if I could use the power series methods of chapter 6 of Higher-Order Perl to calculate a₀, etc. So far, not yet, although I am getting closer. The key is that if $series is the series you want, and if you can calculate at least one term at the front of the series, and then express the rest of $series in terms of $series, you win. For example:

        # Perl
        my $series;
        $series = node(1, promise { scale(2, $series) } );
This is perfectly well-defined code; it runs fine and sets $series to be the series [1,2,4,8,16...]. In Haskell this is standard operating procedure:

        -- Haskell
        series = 1 : scale 2 series
But in Perl it's still a bit outré.

Similarly, the book shows, on page 323, how to calculate the reciprocal of a series s. Any series can be expressed as the sum of the first term and the rest of the terms:

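$$s = \operatorname{head}(s) + x\cdot\operatorname{tail}(s)$$

Now suppose that r = 1/s, and write it the same way:

$$r = \operatorname{head}(r) + x\cdot\operatorname{tail}(r)$$

Then we have:

$$rs = 1$$

$$(\operatorname{head}(r) + x\cdot\operatorname{tail}(r))(\operatorname{head}(s) + x\cdot\operatorname{tail}(s)) = 1$$

$$\operatorname{head}(r)\operatorname{head}(s) + x\cdot\operatorname{head}(r)\operatorname{tail}(s) + x\cdot\operatorname{tail}(r)\operatorname{head}(s) + x^2\cdot\operatorname{tail}(r)\operatorname{tail}(s) = 1$$

Equating the constant terms on both sides shows that head(r) = 1/head(s). Equating the non-constant terms (substituting for head(r), then dividing by x) gives:

$$x\cdot\frac{\operatorname{tail}(s)}{\operatorname{head}(s)} + x\cdot\operatorname{tail}(r)\operatorname{head}(s) + x^2\cdot\operatorname{tail}(r)\operatorname{tail}(s) = 0$$

$$\frac{\operatorname{tail}(s)}{\operatorname{head}(s)} + \operatorname{tail}(r)\operatorname{head}(s) + x\cdot\operatorname{tail}(r)\operatorname{tail}(s) = 0$$

$$\operatorname{tail}(r) = -\frac{1}{\operatorname{head}(s)}\cdot\frac{\operatorname{tail}(s)}{\operatorname{head}(s) + x\cdot\operatorname{tail}(s)}$$

$$\operatorname{tail}(r) = -\frac{1}{\operatorname{head}(s)}\cdot\frac{\operatorname{tail}(s)}{s}$$

$$\operatorname{tail}(r) = -\frac{1}{\operatorname{head}(s)}\cdot\operatorname{tail}(s)\cdot r$$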

and we win. This same calculation appears on page 323, in a somewhat more confused form. (Also, in the book I considered only the special case where head(s) = 1.) The code works just fine.
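To see the self-referential definition run outside Perl, here is a minimal Python sketch. The Series class stands in for HOP's node/promise pair: a head plus a memoized thunk for the tail. The names Series, const, from_list, add, scale, mul, reciprocal, and take are my own illustrative choices, not the book's API. The key line is in reciprocal, which defines r in terms of r exactly as in the derivation above.

```python
from fractions import Fraction


class Series:
    """A lazy power series: a head coefficient plus a promise for the tail."""

    def __init__(self, head, tail_thunk):
        self.head = head
        self._thunk = tail_thunk
        self._tail = None

    @property
    def tail(self):
        # Force the promise once, then memoize it.
        if self._tail is None:
            self._tail = self._thunk()
        return self._tail


def const(c):
    """The series c + 0x + 0x^2 + ..."""
    zero = Series(0, lambda: zero)
    return Series(c, lambda: zero)


def from_list(coeffs):
    """A polynomial, given as a coefficient list, padded with zeros."""
    if not coeffs:
        return const(0)
    return Series(coeffs[0], lambda: from_list(coeffs[1:]))


def add(a, b):
    return Series(a.head + b.head, lambda: add(a.tail, b.tail))


def scale(c, a):
    return Series(c * a.head, lambda: scale(c, a.tail))


def mul(a, b):
    # (a0 + x*A)(b0 + x*B) = a0*b0 + x*(a0*B + A*b)
    return Series(a.head * b.head,
                  lambda: add(scale(a.head, b.tail), mul(a.tail, b)))


def reciprocal(s):
    # head(r) = 1/head(s);  tail(r) = (-1/head(s)) * tail(s) * r
    h = Fraction(1) / s.head
    r = Series(h, lambda: scale(-h, mul(s.tail, r)))
    return r


def take(a, n):
    """The first n coefficients of a."""
    out = []
    for i in range(n):
        out.append(a.head)
        if i < n - 1:
            a = a.tail
    return out
```

For example, take(reciprocal(from_list([1, -1])), 6) gives six 1s (as Fraction values), the coefficients of 1/(1 - x).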

To solve the differential equation $f^2 + (f')^2 = 1$, I want to do something like this:

$$f = \sqrt{1 - {(f')}^{2}}$$

so I need to be able to take the square root of a power series. This does not appear in the book, and I have not seen it elsewhere. Here it is.

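Say we want $r^2 = s$, where s is known. Then write, as usual:

$$s = \operatorname{head}(s) + x\cdot\operatorname{tail}(s)$$

$$r = \operatorname{head}(r) + x\cdot\operatorname{tail}(r)$$

as before, and, since $r^2 = s$, we have:

$$(\operatorname{head}(r))^2 + 2x\operatorname{head}(r)\operatorname{tail}(r) + x^2(\operatorname{tail}(r))^2 = \operatorname{head}(s) + x\cdot\operatorname{tail}(s)$$

So, equating constant terms on both sides, $(\operatorname{head}(r))^2 = \operatorname{head}(s)$, and $\operatorname{head}(r) = \sqrt{\operatorname{head}(s)}$.

Subtracting head(s) from both sides, and dividing by x:

$$2\operatorname{head}(r)\operatorname{tail}(r) + x(\operatorname{tail}(r))^2 = \operatorname{tail}(s)$$

$$\operatorname{tail}(r)\,(2\operatorname{head}(r) + x\operatorname{tail}(r)) = \operatorname{tail}(s)$$

$$\operatorname{tail}(r)\,(\operatorname{head}(r) + r) = \operatorname{tail}(s)$$

$$\operatorname{tail}(r) = \frac{\operatorname{tail}(s)}{\sqrt{\operatorname{head}(s)} + r}$$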

and we win. Or rather, we win once we write the code, which would be something like this:

        # Perl
        sub series_sqrt {
          my $s = shift;
          my ($s0, $st) = (head($s), tail($s));
          my $r0 = sqrt($s0);
          my $r;
          $r  = node($r0, 
                      promise {
                        divide($st,
                               add2(node($r0, undef),
                                    $r))
                      });
          return $r;
        }
I confess I haven't tried this in Perl yet, but I have high confidence that it will work. I actually did the implementation in Haskell:

        -- Haskell
        series_sqrt (s0:st) = r 
           where r  = r0 : (divide st (add [r0] r))
                 r0 = sqrt(s0)
And when I asked it for the square root of [1,1,0,0,0,...] (that is, of 1+x) it gave me back [1, 0.5, -0.125, 0.0625, ...], which is indeed correct.
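The same structure carries over to the square root. This self-contained Python sketch mirrors the Haskell definition line for line. The lazy-series helpers (Series, const, from_list, add, scale, mul, reciprocal, divide, take) are hypothetical stand-ins of my own for the list operations the Haskell code assumes. Coefficients become floats because of math.sqrt, and the sketch assumes head(s) > 0.

```python
import math


class Series:
    """A lazy power series: a head coefficient plus a memoized tail thunk."""

    def __init__(self, head, tail_thunk):
        self.head = head
        self._thunk = tail_thunk
        self._tail = None

    @property
    def tail(self):
        if self._tail is None:          # force the promise once
            self._tail = self._thunk()
        return self._tail


def const(c):
    zero = Series(0, lambda: zero)
    return Series(c, lambda: zero)


def from_list(coeffs):
    if not coeffs:
        return const(0)
    return Series(coeffs[0], lambda: from_list(coeffs[1:]))


def add(a, b):
    return Series(a.head + b.head, lambda: add(a.tail, b.tail))


def scale(c, a):
    return Series(c * a.head, lambda: scale(c, a.tail))


def mul(a, b):
    return Series(a.head * b.head,
                  lambda: add(scale(a.head, b.tail), mul(a.tail, b)))


def reciprocal(s):
    h = 1 / s.head
    r = Series(h, lambda: scale(-h, mul(s.tail, r)))
    return r


def divide(a, b):
    return mul(a, reciprocal(b))


def series_sqrt(s):
    # head(r) = sqrt(head(s));  tail(r) = tail(s) / (sqrt(head(s)) + r)
    r0 = math.sqrt(s.head)              # assumes head(s) > 0
    r = Series(r0, lambda: divide(s.tail, add(const(r0), r)))
    return r


def take(a, n):
    out = []
    for i in range(n):
        out.append(a.head)
        if i < n - 1:
            a = a.tail
    return out
```

Calling take(series_sqrt(from_list([1, 1])), 5) should give approximately [1.0, 0.5, -0.125, 0.0625, -0.0390625], the binomial series for √(1+x).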

The Perl code is skankier than I wish it were. A couple of years ago I said in an interview that "I wish Perl's syntax were less verbose." Some people were surprised by this at the time, since Perl programmers consider Perl's syntax to be quite terse. But comparison of the Perl and Haskell code above demonstrates the sort of thing I mean.

Part of the issue here, of course, is that the lazy list data structure is built into Haskell, but I have to do it synthetically in Perl. So every construction of a lazy list structure in Perl is accompanied by a syntactic marker (such as node(...) or promise { ... }) that is absent, or nearly absent, from the Haskell.

But when I complained about Perl's verbose syntax in 2005, one thing I had specifically in mind was Perl's argument acquisition syntax, here represented by my $s = shift;. Haskell is much terser, with no loss of expressiveness. Haskell gets another win in the automatic destructuring bind: instead of explicitly calling head() and tail() to acquire the values of s0 and st, as in the Perl code, they are implicitly called by the pattern match (s0:st) in the Haskell code, which never mentions s at all. It is quite fair to ascribe this to a failure of Perl's syntax, since there's no reason in principle why Perl couldn't support this, at least for built-in data structures. For example, consider the Perl code:

        sub blah {
          my $href = shift();
          my $a = $href->{this};
          my $tmp = $href->{that};
          my $b = $tmp->[0];
          my $c = $tmp->[2];

          # Now do something with $a, $b, $c
        }
It would be much more convenient to write this as:
        sub blah {
          my { this => $a, that => [$b, undef, $c] } = shift();

          # Now do something with $a, $b, $c
        }
This is a lot easier to understand.

There are a number of interesting user-interface issues to ask about here: What if the assigned value is not in the expected form? Are $a, $b, and $c copied from $href or are they aliases into it? And so on. One easy way to dispense with all of these interesting questions (perhaps not in the best way) is to assert that this notation is just syntactic sugar for the long version.

I talked to Chip Salzenberg about this at one time, and he said he thought it would not be very hard to implement. But even if he was right, what is not very hard for Chip Salzenberg to do can turn out to be nearly impossible for us mortals.

[ Addendum 20071209: There's a followup article that shows several different ways of solving the differential equation, including the power-series method. ]

[ Addendum 20071210: I did figure out how to get Haskell to solve the differential equation. ]


[Other articles in category /math] permanent link

Imaginary units
Yesterday I had a phenomenally annoying discussion with the pedants on the IRC #math channel. Someone was talking about square roots, and for some reason I needed to point out that when you are considering square roots of negative numbers, it is important not to forget that there are two square roots.

I should back up and discuss square roots in more detail. The square root of x, written √x, is defined to be the number y such that $y^2 = x$. Well, no, that actually contains a subtle error. The error is in the use of the word "the". When we say "the number y such that...", we imply that there is only one. But every number (except zero) has two square roots. For example, the square roots of 16 are 4 and -4. Both of these are numbers y with the property that $y^2 = 16$.

In many contexts, we can forget about one of the square roots. For example, in geometry problems, all quantities are positive. (I'm using "positive" here to mean "≥ 0".) When we consider a right triangle whose legs have lengths a and b, we say simply that the hypotenuse has length $\sqrt{a^2 + b^2}$. We don't have to think about the fact that there are actually two square roots, because one of them is negative, and is nonsensical when discussing hypotenuses. In such cases we can talk about the square root function, sqrt(x), which is defined to be the positive number y such that $y^2 = x$. There the use of "the" is justified, because there is only one such number. But pinning down which square root we mean has a price: the square root function applies only to positive arguments. We cannot ask for sqrt(-1), because there is no positive number y such that $y^2 = -1$. For negative arguments, this simplification is not available, and we must fall back to using √ in its full generality.

In high school algebra, we all learn about a number called i, which is defined to be the square root of -1. But again, the use of the word "the" here is misleading, because "the" square root is not unique: -1, like every other number except 0, has two square roots. We cannot avail ourselves of the trick of taking the positive one, because neither root is positive. And in fact there is no other trick we can use to distinguish the two roots; they are mathematically indistinguishable.

The annoying discussion was whether it was correct to say that the two roots are mathematically indistinguishable. It was annoying because it's so obviously true. The number i is, by definition, a number such that $i^2 = -1$. This is its one and only defining property. Since there is another number which shares this single defining property, it stands to reason that this other root is completely interchangeable with i; mathematically indistinguishable from it, in other words.

This other square root is usually written "-i", which suggests that it's somehow secondary to i. But this is not the case. Every numerical property possessed by i is possessed by -i as well. For example, $i^3 = -i$. But we can replace i with -i and get $(-i)^3 = -(-i)$, which is just as true. Euler's famous formula says that $e^{ix} = \cos x + i \sin x$. But replacing i with -i here we get $e^{-ix} = \cos x + (-i) \sin x$, which is also true.
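Here is a quick floating-point spot check in Python that each identity above holds whichever root plays the role of i. It uses the built-in 1j and -1j as the two roots. The function name identities_hold is my own, and sampling a few values of x is a sanity check, not a proof.

```python
import cmath
import math


def identities_hold(unit, xs=(0.0, 0.5, 1.0, math.pi / 3, 2.0)):
    """Check, for a candidate square root of -1, the identities from the text."""
    squares_to_minus_one = unit * unit == -1
    cube_rule = cmath.isclose(unit ** 3, -unit)          # i^3 = -i
    euler = all(cmath.isclose(cmath.exp(unit * x),       # e^{ix} = cos x + i sin x
                              math.cos(x) + unit * math.sin(x))
                for x in xs)
    return squares_to_minus_one and cube_rule and euler


print(identities_hold(1j), identities_hold(-1j))   # True True
```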

Well, one of them is i, and the other is -i, so can't you distinguish them that way? No; those are only expressions that denote the numbers, not the numbers themselves. There is no way to know which of the numbers is denoted by which expression, and, in fact, it does not even make much sense to ask which number is denoted by which expression, since the two numbers are entirely interchangeable. One is i, and one is -i, sure, but this is just saying that one is the negative of the other. But so too is the other the negative of the one.

One of the #math people pointed out that there is a well-known Im() function, the "imaginary part" function, such that Im(i) = 1, but Im(-i) = -1, and suggested, rather forcefully, that they could be distinguished that way. This, of course, is hopeless. Because in order to define the "imaginary part" function in the first place, you must start by making an entirely arbitrary choice of which square root of -1 you are using as the unit, and then define Im() in terms of this choice. For example, one often defines Im(z) as $z - \bar{z} \over 2i$. But in order to make this definition, you have to select one of the imaginary units and designate it as i and use it in the denominator, thus begging the question. Had you defined Im() with -i in place of i, then Im(i) would have been -1, and vice versa.

Similarly, one #math inhabitant suggested that if one were to define the complex numbers as pairs of reals (a, b), such that (a, b) + (c, d) = (a + c, b + d), (a, b) × (c, d) = (ac - bd, ad + bc), then i is defined as (0,1), not (0,-1). This is even more clearly begging the question, since the definition of i here is solely a traditional and conventional one; defining i as (0, -1) instead of (0,1) works exactly as well; we still have i2 = -1 and all the other important properties.
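The pair construction makes the symmetry concrete. In this small Python sketch (helper names are mine), the map (a, b) → (a, -b) sends (0, 1) to (0, -1) and leaves the reals (a, 0) alone. It also commutes with both operations defined above. So any identity built from + and × with real coefficients that holds for (0, 1) also holds for (0, -1). The loop only spot-checks a few integer pairs; the general fact follows from the one-line algebra in the comment.

```python
def add(p, q):
    (a, b), (c, d) = p, q
    return (a + c, b + d)


def mul(p, q):
    (a, b), (c, d) = p, q
    return (a * c - b * d, a * d + b * c)


def flip(p):
    """(a, b) -> (a, -b): swaps (0, 1) and (0, -1) and fixes every (a, 0)."""
    a, b = p
    return (a, -b)


MINUS_ONE = (-1, 0)

# Both candidates for i square to -1.
assert mul((0, 1), (0, 1)) == MINUS_ONE
assert mul((0, -1), (0, -1)) == MINUS_ONE

# flip commutes with + and x:
#   flip(mul(p, q)) = (ac - bd, -(ad + bc)) = mul(flip(p), flip(q))
samples = [(1, 2), (-3, 5), (0, 1), (4, -7), (2, 0)]
for p in samples:
    for q in samples:
        assert flip(add(p, q)) == add(flip(p), flip(q))
        assert flip(mul(p, q)) == mul(flip(p), flip(q))
```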

As IRC discussions do, this one then started to move downwards into straw man attacks. The #math folks then argued that i ≠ -i, and so the two numbers are indeed distinguishable. This would have been a fine counterargument to the assertion that i = -i, but since I was not suggesting anything so silly, it was just stupid. When I said that the numbers were indistinguishable, I did not mean to say that they were numerically equal. If they were, then -1 would have only one square root. Of course, it does not; it has two unequal, but entirely interchangeable, square roots.

The claim that the square roots of -1 are indistinguishable has real content. 1 has two square roots that are not interchangeable in this way. Suppose someone tells you that a and b are different square roots of 1, and you have to figure out which is which. You can do that, because of the two equations $a^2 = a$ and $b^2 = b$, only one will be true. If it's the former, then a = 1 and b = -1; if the latter, then it's the other way around. The point about the square roots of -1 is that there is no corresponding criterion for distinguishing the two roots. This is a theorem. But the result is completely obvious if you just recall that i is merely defined to be a square root of -1, no more and no less, and that -1 has two square roots.

Oh well, it's IRC. There's no solution other than to just leave. [ Addenda: Part 2 Part 3 Part 4 Part 5 ]


[Other articles in category /math] permanent link

Sat, 01 Mar 2008

More rational roots of polynomials
I have a big file of ideas for blog articles, and when I feel like writing but I can't think of a topic, I look over the file. An item from last April was relevant to yesterday's article about finding rational roots of polynomials. It's a trick I saw in the first edition (1768!) of the Encyclopædia Britannica.

Suppose you have a polynomial equation $P(x) = x^n + \cdots + p = 0$, where P is monic with integer coefficients. If it has a rational root r, this must be an integer that divides p = P(0). So far so good.

But consider P(x-1). This is a different polynomial, and if r is a root of P(x), then r+1 is a root of P(x-1). So, just as r must divide P(0), r+1 must divide P(-1), which is the constant term of P(x-1). And similarly, r-1 must divide P(1).

So we have an extension of the rational root theorem: instead of guessing that some factor r of P(0) is a root, and checking it to see, we first check to see if r+1 is a factor of P(-1), and if r-1 is a factor of P(1), and proceed with the full check only if these two auxiliary tests pass.
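Here is a minimal Python sketch of the filter. It assumes a monic polynomial with integer coefficients and P(0) ≠ 0, given as a coefficient list from the highest degree down. The names evaluate, divides, and britannica_candidates are mine; the Britannica itself just says to look for the consecutive divisors by eye. Survivors still need the full check, since the filter can let a non-root through.

```python
def evaluate(coeffs, x):
    """Evaluate a polynomial (coefficients from highest degree down) by Horner's rule."""
    total = 0
    for c in coeffs:
        total = total * x + c
    return total


def divides(d, n):
    """True if d divides n; 0 divides only 0."""
    return n == 0 if d == 0 else n % d == 0


def britannica_candidates(coeffs):
    """Integer divisors r of P(0) that also pass r-1 | P(1) and r+1 | P(-1).
    Assumes P is monic with integer coefficients and P(0) != 0."""
    p0, p1, pm1 = evaluate(coeffs, 0), evaluate(coeffs, 1), evaluate(coeffs, -1)
    divisors = [d for d in range(1, abs(p0) + 1) if p0 % d == 0]
    survivors = []
    for d in divisors:
        for r in (d, -d):
            if divides(r - 1, p1) and divides(r + 1, pm1):
                survivors.append(r)
    return survivors
```

For $x^2 + 2x - 15$, it keeps 3, -3, and -5, and only 3 and -5 are actual roots.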

My notes conclude with:

Is this really less work than just trying all the divisors of P(0) directly?
Let's find out.

As in the previous article, say $P(x) = 3x^2 + 6x - 45$. The method only works for monic polynomials, so divide everything by 3. (It can be extended to work for non-monic polynomials, but the result is just that you have to divide everything by 3, so it comes to the same thing.) So we consider $x^2 + 2x - 15$ instead. Say r is a rational root of P(x). Then:

r-1 divides P(1) = -12
r divides P(0) = -15
r+1 divides P(-1)= -16

So we need to find three consecutive integers that respectively divide 12, 15, and 16. The Britannica has no specific technique for this; it suggests doing it by eyeball. In this case, 2–3–4 jumps out pretty quickly, giving the root 3, and so does 6–5–4, which is the root -5. But the method also yields a false root: 4–3–2 suggests that -3 might be a root, and it is not.

Let's see how this goes for a harder example. I wrote a little Haskell program that generated the random polynomial $x^4 - 26x^3 + 240x^2 - 918x + 1215$.

r-1 divides P(1) = 512 = $2^9$
r divides P(0) = 1215 = $3^5 \cdot 5$
r+1 divides P(-1) = 2400 = $2^5 \cdot 3 \cdot 5^2$

That required a fair amount of mental arithmetic, and I screwed up and got 502 instead of 512, which I only noticed because 502 is not composite enough; but had I been doing a non-contrived example, I would not have noticed the error. (Then again, I would have done the addition on paper instead of in my head.) Clearly this example was not hard enough because 2–3–4 and 4–5–6 are obviously solutions, and it will not always be this easy. I increased the range on my random number generator and tried again.

The next time, it came up with the very delightful polynomial $x^4 - 2735x^3 + 2712101x^2 - 1144435245x + 170960860950$, and I decided not to go any farther with it. The table values are easy to calculate, but they will be on the order of 170960860950, and I did not really care to try to factor that.

I decided to try one more example, of intermediate difficulty. The program first gave me x⁴ - 25x³ + 107x² - 143x + 60, which is a lucky fluke, since it has a root at 1. The next example it produced had a root at 3. At that point I changed the program to generate polynomials that had integer roots between 10 and 20, and got x⁴ - 61x³ + 1364x² - 13220x + 46800.

r-1 divides P(1) = 34884 = 2²·3³·17·19
r divides P(0) = 46800 = 2⁴·3²·5²·13
r+1 divides P(-1) = 61446 = 2·3·7²·11·19

This is just past my mental arithmetic ability; I got 34864 instead of 34884 in the first row, and balked at factoring 61446 in my head. But going ahead (having used the computer to finish the arithmetic), the 17 and 19 in the first and last rows are suggestive, and there is indeed a 17–18–19 to be found. Following up on the 19 in the first row suggests that we look for 19–20–21, which there is, and following up on the 11 in the last row, hoping for a 9–10–11, finds one of those too. All of these are roots, and I do have to admit that I don't know any better way of discovering that. So perhaps the method does have some value in some cases. But I had to work hard to find examples for which it made sense. I think it may be more reasonable with 18th-century technology than it is with 21st-century technology.
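A sketch of that computer-assisted arithmetic, for anyone who wants to reproduce the table: it evaluates P(1), P(0), and P(-1) by Horner's rule and factors each value by trial division. This is not the program used for the article; the names `evalPoly`, `factorize`, and `auxTable` are hypothetical, and trial division is only sensible for numbers of roughly this size.

```haskell
-- Illustrative sketch; all function names here are hypothetical.

-- Horner evaluation; coefficients from the highest power down.
evalPoly :: [Integer] -> Integer -> Integer
evalPoly cs x = foldl (\acc c -> acc * x + c) 0 cs

-- Prime factorization of a nonzero integer by trial division,
-- as (prime, exponent) pairs; the sign is ignored.
factorize :: Integer -> [(Integer, Int)]
factorize n0 = go (abs n0) 2
  where
    go 1 _ = []
    go n p
      | p * p > n      = [(n, 1)]   -- whatever remains is prime
      | n `mod` p == 0 = let (k, m) = strip n p 0 in (p, k) : go m (p + 1)
      | otherwise      = go n (p + 1)
    -- Divide out p as many times as possible, counting how often.
    strip n p k
      | n `mod` p == 0 = strip (n `div` p) p (k + 1)
      | otherwise      = (k, n)

-- The auxiliary table: (x, P(x), factorization of P(x)) for x = 1, 0, -1.
auxTable :: [Integer] -> [(Integer, Integer, [(Integer, Int)])]
auxTable cs = [ (x, v, factorize v) | x <- [1, 0, -1], let v = evalPoly cs x ]
```

For x⁴ - 61x³ + 1364x² - 13220x + 46800, `auxTable [1, -61, 1364, -13220, 46800]` reports P(1) = 34884 with factors 2²·3³·17·19, P(0) = 46800, and P(-1) = 61446.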


[Other articles in category /math] permanent link

Algebra techniques that don't work, except when they do
In Problems I Can't Fix in the Lecture Hall, Rudbeckia Hirta describes the efforts of a student to solve the equation 3x² + 6x - 45 = 0. She describes "the usual incorrect strategy selected by students who can't do algebra":

3x² + 6x - 45 = 0
3x² + 6x = 45
x(3x + 6) = 45

She says, "I stopped him before he factored out the x."

I was a bit surprised by this, because the work so far seemed reasonable to me. I think the only mistake was not dividing the whole thing by 3 in the first step. But it is not too late to do that, and even without it, you can still make progress. x(3x + 6) = 45, so if there are any integer solutions, x must divide 45. So try x = ±1, ±3, ±5, ±9, ±15 in roughly that order. (The "look for the wallet under the lamppost" principle.) x = 3 solves the equation, and then you can get the other root, x=-5, by further application of the same method, or by dividing the original polynomial by x-3, or whatever.

If you get rid of the extra factor of 3 in the first place, the thing is even easier, because you have x(x + 2) = 15, so x = ±1, ±3, or ±5, and it is obviously solved by x=3 and x=-5.
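The lamppost search is easy to mechanize. A Haskell sketch (the names `signedDivisors`, `unreduced`, and `reduced` are hypothetical) that tries the signed divisors of the constant, smallest first, for both the unreduced and the reduced equation:

```haskell
-- Illustrative sketch; all names here are hypothetical.

-- Signed divisors of a nonzero integer, smallest magnitude first.
signedDivisors :: Integer -> [Integer]
signedDivisors n = concat [ [d, negate d] | d <- [1 .. abs n], n `mod` d == 0 ]

-- Integer solutions of x(3x + 6) = 45: any such x must divide 45.
unreduced :: [Integer]
unreduced = [ x | x <- signedDivisors 45, x * (3 * x + 6) == 45 ]

-- After dividing through by 3: x(x + 2) = 15, so x must divide 15.
reduced :: [Integer]
reduced = [ x | x <- signedDivisors 15, x * (x + 2) == 15 ]
```

Both lists come out as [3, -5].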

Now obviously, this is not always going to work, but it works often enough that it would have been the first thing I would have tried. It is a lot quicker than calculating b² - 4ac when c is as big as 45. If anyone hassles you about it, you can get them off your back by pointing out that it is an application of the so-called rational root theorem.

But probably the student did not have enough ingenuity or number sense to correctly carry off this technique (he didn't notice the 3), so that M. Hirta's advice to just use the damn quadratic formula already is probably good.

Still, I wonder if perhaps such students would benefit from exposure to this technique. I can guess M. Hirta's answer to this question: these students will not benefit from exposure to anything.

[ Addendum 20080228: Robert C. Helling points out that I could have factored the 45 in the first place, without any algebraic manipulations. Quite so; I completely botched my explanation of what I was doing. I meant to point out that once you have x(x+2) = 15 and the list [1, 3, 5, 15], the (3,5) pair jumps out at you instantly, since 3+2=5. I spent so much time talking about the unreduced polynomial x(3x+6) that I forgot to mention this effect, which is much less salient in the case of the unreduced polynomial. My apologies for any confusion caused by this omission. ]

[ Addendum 20080301: There is a followup to this article. ]


[Other articles in category /math] permanent link

Fri, 29 Feb 2008

Happy Leap Day! Persian edition
Roland Young has brought to my attention that the Persian calendar uses a hybrid 7/29 and 8/33 system. I was going to post this as an addendum to today's Leap Day article, but it got too long.

If I understand the rules correctly, to determine if a Persian year is a leap year, one applies the following algorithm to the Persian year number y. (Note that the current Persian year is not 2008, but 1386. Persian year 1387 will begin on the vernal equinox.) I will write a % b to denote the remainder when a is divided by b. Then:

  1. Let a = (y + 2345) % 2820.
  2. If a is 2819, y is a leap year. Otherwise,
  3. Let b = a % 128.
  4. If b < 29, let c = b. Otherwise, let c = (b - 29) % 33.
  5. If c = 0, y is not a leap year. Otherwise,
  6. If c is a multiple of 4, y is a leap year. Otherwise,
  7. y is not a leap year.
(Perl source code is available.)
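The seven steps translate directly into Haskell. This is a sketch of the rule as stated above, not a port of the Perl source, and `persianLeap` and `leapsPerCycle` are hypothetical names. Haskell's `mod` with a positive divisor always returns a nonnegative result, which is what the steps assume.

```haskell
-- Illustrative sketch; persianLeap and leapsPerCycle are hypothetical names.

-- Leap-year rule for Persian year y, following steps 1-7 above.
persianLeap :: Integer -> Bool
persianLeap y
  | a == 2819 = True                           -- step 2
  | c == 0    = False                          -- step 5
  | otherwise = c `mod` 4 == 0                 -- steps 6 and 7
  where
    a = (y + 2345) `mod` 2820                  -- step 1
    b = a `mod` 128                            -- step 3
    c = if b < 29 then b else (b - 29) `mod` 33  -- step 4

-- Number of leap years in 2820 consecutive years (one full cycle).
leapsPerCycle :: Int
leapsPerCycle = length (filter persianLeap [1 .. 2820])
```

`persianLeap 1387` is True and `persianLeap 1386` is False, and `leapsPerCycle` is 683, agreeing with the count below.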

This produces 683 leap years out of every 2820, which means that the average calendar year is 365.24219858 days.

How does this compare with the Dominus calendar? It is indeed more accurate, but I consider 683/2820 to be an unnecessarily precise representation of the vernal equinox year, especially inasmuch as the length of the year is changing. And the rule, as you see, is horrendous, requiring either a 2,820-entry lookup table or complicated logic.

Moreover, the Persian and Gregorian calendars are out of sync at present. Persian year 1387, which begins next month on the vernal equinox, is a leap year. But the intercalation will not take place until the last day of the year, around 21 March 2009. The two calendars will not sync up until the year 2092/1470, and then will be confounded only eight years later by the Gregorian 100-year exception. After that they will agree until 2124/1502. Clearly, even if it were advisable to switch to the Persian calendar, the time is not yet right.

I found this Frequently Asked Questions About Calendars page extremely helpful in preparing this article. The Wikipedia article was also useful. Thanks again to Roland Young for bringing this matter to my attention.


[Other articles in category /calendar] permanent link

Note on point-free programming style
This old comp.lang.functional article by Albert Y. C. Lai makes the point that Unix shell pipeline programming is done in an essentially "point-free" style, using the shell example:

    grep '^X-Spam-Level' | sort | uniq | wc -l
and the analogous Haskell code:

    length . nub . sort . filter (isPrefixOf "X-Spam-Level")
Neither one explicitly mentions its argument, which is why this is "point-free". In "point-free" programming, instead of defining a function in terms of its effect on its arguments, one defines it by composing the component functions themselves, directly, with higher-order operators. For example, instead of:

  foo x y = 2 * x + y
one has, in point-free style:

  foo = (+) . (2 *)
where (2 *) is the function that doubles its argument, and (+) is the (curried) addition function. The two definitions of foo are entirely equivalent.
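Both examples can be checked in Haskell. In this sketch the shell pipeline's input is assumed to be a list of lines, and `countSpamLevels` and `foo'` are hypothetical names; the point is only that the point-free and pointful definitions of foo agree.

```haskell
import Data.List (isPrefixOf, nub, sort)

-- Illustrative sketch; countSpamLevels and foo' are hypothetical names.

-- Analogue of  grep '^X-Spam-Level' | sort | uniq | wc -l  on a list of lines.
-- (nub after sort removes all duplicates, as uniq does on sorted input.)
countSpamLevels :: [String] -> Int
countSpamLevels = length . nub . sort . filter (isPrefixOf "X-Spam-Level")

-- The same function, pointful and point-free.
foo, foo' :: Integer -> Integer -> Integer
foo x y = 2 * x + y
foo' = (+) . (2 *)
```

For example, `foo' 3 4` and `foo 3 4` are both 10.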

As the two examples should make clear, point-free style is sometimes natural, and sometimes not, and the example chosen by M. Lai was carefully selected to bias the argument in favor of point-free style.

Often, after writing a function in pointful style, I get the computer to convert it automatically to point-free style, just to see what it looks like. This is usually educational, and sometimes I use the computed point-free definition instead. As I get better at understanding point-free programming style in Haskell, I am more and more likely to write certain functions point-free in the first place. For example, I recently wrote:

        soln = int 1 (srt (add one (neg (sqr soln))))
and then scratched my head, erased it, and replaced it with the equivalent:

        soln = int 1 ((srt . (add one) . neg . sqr) soln)
I could have factored out the int 1 too:
        soln = (int 1 . srt . add one . neg . sqr) soln
I could even have removed soln from the right-hand side:

        soln = fix (int 1 . srt . add one . neg . sqr)
but I am not yet a perfect sage.
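The `fix` in the last version is `Data.Function.fix`, which satisfies fix f = f (fix f), so writing soln = fix g says the same thing as soln = g soln. The series operators `int`, `srt`, `add`, `one`, `neg`, and `sqr` come from code not shown in this article, so the sketch below substitutes a hypothetical, simplified `int` (power series as lazy lists of coefficients) to show the same knot-tying on a smaller equation, y = 1 + ∫y, whose solution is the series for eˣ.

```haskell
import Data.Function (fix)

-- Illustrative sketch; Series, int, expPointful, expFix are hypothetical names.

-- A power series as an infinite list of coefficients a0, a1, a2, ...
type Series = [Rational]

-- Hypothetical integration operator: integrate fs, using c as the constant
-- term.  Coefficient a(n) of fs becomes coefficient a(n) / (n + 1) of x^(n+1).
int :: Rational -> Series -> Series
int c fs = c : zipWith (/) fs [1 ..]

-- y = 1 + integral of y, written with y on the right-hand side ...
expPointful :: Series
expPointful = int 1 expPointful

-- ... and with fix, so the name no longer appears on the right.
expFix :: Series
expFix = fix (int 1)
```

Laziness makes both work: `take 5 expFix` gives [1, 1, 1/2, 1/6, 1/24].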

Sometimes I opt for an intermediate form, one in which some of the arguments are explicit and some are implicit. For example, as an exercise I wrote a function numOccurrences which takes a value and a list and counts the number of times the value occurs in the list. A straightforward and conventional implementation is:

        numOccurrences x []     = 0
        numOccurrences x (y:ys) = 
                if (x == y) then 1 + rest
                else                 rest
            where rest = numOccurrences x ys
but the partially point-free version I wrote was much better:

        numOccurrences x = length . filter (== x)
Once you see this, it's easy to go back to a fully pointful version:

        numOccurrences x y = length (filter (== x) y)
Or you can go the other way, to a point-free version:

        numOccurrences = (length .) . filter . (==)
which I find confusing.

Anyway, the point of this note is not to argue that the point-free style is better or worse than the pointful style. Sometimes I use the one, and sometimes the other. I just want to point out that the argument made by M. Lai is deceptive, because of the choice of examples. As an equally biased counterexample, consider:

        bar x = x*x + 2*x + 1
which the automatic converter informs me can be written in point-free style as:

        bar = (1 +) . ap ((+) . join (*)) (2 *)
Perusal of this example will reveal much to the attentive reader, including the definitions of join and ap. But I don't think many people would argue that it is an improvement on the original. (Maybe I'm wrong, and people would argue that it was an improvement. I won't know for sure until I have more experience.)
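The converter's output can at least be checked mechanically. In this sketch `barPF` is a hypothetical name for the point-free version; `join` and `ap` are the standard ones from Control.Monad, used at the function (reader) monad, where join f x = f x x and ap f g x = f x (g x).

```haskell
import Control.Monad (ap, join)

-- Illustrative sketch; barPF is a hypothetical name.

-- The pointful definition.
bar :: Integer -> Integer
bar x = x*x + 2*x + 1

-- The converter's point-free version.  Here join (*) squares its argument,
-- and ap ((+) . join (*)) (2 *) maps x to x*x + 2*x.
barPF :: Integer -> Integer
barPF = (1 +) . ap ((+) . join (*)) (2 *)
```

Both give 16 at x = 3, and they agree everywhere they are tested.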

For some sort of balance, here is another example where I think the point-free version is at least as good as the pointful version: a recent comment on Reddit suggested a >>> operator that composes functions just like the . operator, but in the other order, so that:

        f >>> g = g . f
or, if you prefer:

        (>>>) f g x = g (f x)
The point-free definition of >>> is:

        (>>>) = flip (.)
where the flip operator takes a function of two arguments and makes a new function that does the same thing, but with the arguments in the opposite order. Whatever your feelings about point-free style, it is undeniable that the point-free definition makes perfectly clear that >>> is nothing but . with its arguments in reverse order.
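The claim about `flip` is easy to test. A small Haskell sketch; `composeLR` is a hypothetical name for the pointful version. Control.Arrow exports a more general operator also spelled `>>>`, which this sketch deliberately does not import.

```haskell
-- Illustrative sketch; composeLR is a hypothetical name.

-- Left-to-right composition, defined point-free.
(>>>) :: (a -> b) -> (b -> c) -> (a -> c)
(>>>) = flip (.)

-- The pointful definition, for comparison.
composeLR :: (a -> b) -> (b -> c) -> a -> c
composeLR f g x = g (f x)
```

With these, `((+ 1) >>> (* 2)) 3` adds first and then doubles, giving 8.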


[Other articles in category /prog/haskell] permanent link

Happy Leap Day!
I have an instructive followup to yesterday's article all ready to go, analyzing a technique for finding rational roots of polynomials that I found in the First Edition of the Encyclopædia Britannica. A typically Universe-of-Discourse kind of article. But I'm postponing it to next month so that I can bring you this timely update.

Everyone knows that our calendar periodically contains an extra day, known to calendar buffs as an "intercalary day", to help make it line up with the seasons, and that this intercalary day is inserted at the end of February. But, depending on how you interpret it, this isn't so. The extra day is actually inserted between February 23 and February 24, and the rest of February has to move down to make room.

I will explain. In Rome, 23 February was a holiday called Terminalia, sacred to Terminus, the god of boundary markers. Under the calendars of the Roman Republic, used up until 46 BCE, an intercalary month, Mercedonius, was inserted into the calendar from time to time. In these years, February was cut down to 23 days (and good riddance; nobody likes February anyway) and Mercedonius was inserted at the end.

When Julius Caesar reformed the calendar in 46, he specified that there would be a single intercalary day every four years much as we have today. As in the old calendar, the intercalary day was inserted after Terminalia, although February was no longer truncated.

So the extra day is actually 24 February, not 29 February. Or not. Depends on how you look at it.

Scheduling intercalary days is an interesting matter. The essential problem is that the tropical year, which is the length of time from one vernal equinox to the next, is not an exact multiple of one day. Rather, it is about 365¼ days. So the vernal equinox moves relative to the calendar date unless you do something to fix it. If the tropical year were exactly 365¼ days long, then four tropical years would be exactly 1461 days long, and it would suffice to make four calendar years 1461 days long, to match. This can be accomplished by extending the 365-day calendar year with one intercalary day every four years. This is the Julian system.

Unfortunately, the tropical year is not exactly 365¼ days long. It is closer to 365.24219 days long. So how many intercalary days are needed?

It suffices to make 100,000 calendar years total exactly 36,524,219 days, which can be accomplished by adding a day to 24,219 years out of every 100,000. But this requires a table with 100,000 entries, which is too complicated.

We would like to find a system that requires a simpler table, but which is still reasonably accurate. The Julian system requires a table with 4 entries, but gives a calendar year that averages 365.25 days long, which is 0.00781 too many. Since this is about 1/128 day, the Julian calendar "gains a day" every 128 years or so, which means that the vernal equinox slips a day earlier every 128 years, and eventually the daffodils and crocuses are blooming in January.

Not everyone considers this a problem. The Islamic calendar year is only 354 or 355 days long, and so "loses" about 11 days per year, which means that after about 17 years the Islamic new year has moved half a year relative to the seasons. The annual Islamic holy month of Ramadan coincided with July-August in 1980 and with January-February in 1997. The Muslims do intercalate, but they do it to keep the months in line with the phases of the moon.

Still, supposing that we do consider this a problem, we would like to find an intercalation scheme that is simple and accurate. This is exactly the problem of finding a simple rational approximation to 0.24219. If p/q is close to 0.24219, then one can introduce p intercalary days every q years, and q is the size of the table required. The Julian calendar takes p/q = 1/4 = 0.25, for an error around 1/128. The Gregorian calendar takes p/q = 97/400 = 0.2425, for an error of around 1/3226. Again, this means that the Gregorian calendar gains a day on the seasons every 3,226 years or so. Can we do better?

Any time the question is "find a simple rational approximation to a number" the answer is likely to involve continued fractions. 365.24219 is equal to:

 $$ 365 + {1\over \displaystyle 4 + {\strut 1\over\displaystyle 7 + {\strut 1\over\displaystyle 1 + {\strut 1\over\displaystyle 3 + {\strut 1\over\displaystyle 24 + {\strut 1\over\displaystyle 6 + \cdots }}}}}}$$

which, for obvious reasons, mathematicians abbreviate to [365; 4, 7, 1, 3, 24, 6, 2, 2]. This value is exact. (I had to truncate the display above because of a bug in my TeX formula tool: the full fraction goes off the edge of the A0-size page I use as a rendering area.)

As I have mentioned before, the reason this horrendous expression is interesting is that if you truncate it at various points, the values you get are the "convergents", which are exactly the best possible rational approximations to the original number. For example, if we truncate it to [365], we get 365, which is the best possible integer approximation to 365.24219. If we truncate it to [365; 4], we get 365¼, which is the Julian calendar's approximation.
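To make the truncation idea concrete, here is a Haskell sketch that expands a rational number into its continued-fraction terms and then computes the convergents with the usual numerator and denominator recurrence. The function names are hypothetical, and exact `Rational` arithmetic is used so that 365.24219 is represented as 36524219/100000 with no rounding.

```haskell
import Data.Ratio ((%))

-- Illustrative sketch; cfTerms and convergents are hypothetical names.

-- Continued-fraction terms [a0; a1, a2, ...] of a positive rational number.
-- The expansion of a rational always terminates.
cfTerms :: Rational -> [Integer]
cfTerms x
  | frac == 0 = [q]
  | otherwise = q : cfTerms (recip frac)
  where
    q    = floor x
    frac = x - fromInteger q

-- Values of the truncations [a0], [a0; a1], [a0; a1, a2], ... using
-- h(n) = a(n)h(n-1) + h(n-2) and likewise k(n), starting from
-- h(-2) = 0, h(-1) = 1, k(-2) = 1, k(-1) = 0.
convergents :: [Integer] -> [Rational]
convergents terms = zipWith (%) (drop 2 hs) (drop 2 ks)
  where
    hs = 0 : 1 : zipWith3 (\a h1 h2 -> a * h1 + h2) terms (tail hs) hs
    ks = 1 : 0 : zipWith3 (\a k1 k2 -> a * k1 + k2) terms (tail ks) ks
```

`cfTerms (36524219 % 100000)` gives [365, 4, 7, 1, 3, 24, 6, 2, 2], and the first four convergents come out as 365, 1461/4, 10592/29, and 12053/33, that is, 365, 365¼, 365 + 7/29, and 365 + 8/33.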

Truncating at the next place gives us [365; 4, 7], which is 365 + 1/(4 + 1/7) = 365 + 1/(29/7) = 365 + 7/29. In this calendar we would have 7 intercalary days out of 29, for a calendar year of 365.241379 days on average. This calendar loses one day every 1,234 years.

The next convergent is [365; 4, 7, 1] = 365 + 8/33, which requires 8 intercalary days every 33 years for an average calendar year of 365.242424 days. This schedule gains only one day in 4,269 years and so is actually more accurate than the Gregorian calendar currently in use, while requiring a table with only 33 entries instead of 400.
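The drift figures quoted for these schedules come from one line of arithmetic: a schedule of p leap days every q years misses the fractional part 0.24219 by |p/q - 0.24219| days per year, so it takes the reciprocal of that many years to drift a full day. A Haskell sketch, with a hypothetical function name and the 365.24219-day year used in this article:

```haskell
-- Illustrative sketch; yearsPerDayOfDrift is a hypothetical name.

-- Years for a schedule with fractional part pq = p/q (p leap days every
-- q years) to drift one full day from a 365.24219-day year.  The sign of
-- the error (gaining or losing) is discarded.
yearsPerDayOfDrift :: Rational -> Double
yearsPerDayOfDrift pq = fromRational (1 / abs (pq - 24219 / 100000))
```

Rounded, this gives 128, 1234, 4269, and 3226 years for 1/4, 7/29, 8/33, and 97/400 respectively, matching the figures quoted above.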

The real question, however, is not whether the table can be made smaller but whether the rule can be made simpler. The rule for the Gregorian calendar requires second-order corrections:

  1. If the year is a multiple of 400, it is a leap year; otherwise
  2. If the year is a multiple of 100, it is not a leap year; otherwise
  3. If the year is a multiple of 4, it is a leap year.

And one frequently sees computer programs that omit one or both of the exceptions in the rule.
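For reference, the full rule with both exceptions, checked in the order given, is only a few lines of Haskell (the function name is hypothetical):

```haskell
-- Illustrative sketch; gregorianLeap is a hypothetical name.
gregorianLeap :: Integer -> Bool
gregorianLeap y
  | y `mod` 400 == 0 = True             -- rule 1
  | y `mod` 100 == 0 = False            -- rule 2
  | otherwise        = y `mod` 4 == 0   -- rule 3
```

Over any 400 consecutive years this yields 97 leap years, the 97/400 mentioned above.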

The 8/33 calendar requires dividing by 33, which is its most serious problem. But it can be phrased like this:

  1. Divide the year by 33 and take the remainder. If the remainder is 0, it is not a leap year. Otherwise,
  2. If the remainder is divisible by 4, it is a leap year.
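Taking the remainder on division by 33, the two-step rule is short in Haskell. A sketch with a hypothetical name:

```haskell
-- Illustrative sketch; dominusLeap is a hypothetical name.
-- Leap year iff the remainder mod 33 is nonzero and divisible by 4.
dominusLeap :: Integer -> Bool
dominusLeap y = r /= 0 && r `mod` 4 == 0
  where r = y `mod` 33
```

`dominusLeap 2008` is True and `dominusLeap 2016` is False, and the rule agrees with the plain every-fourth-year rule for every year from 1981 through 2015. Each 33-year cycle has 8 leap years.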
The rule is simpler, and the weird exceptions come every 33 years instead of every 100. This means that people are more likely to remember them. If you are a computer programmer implementing calendar arithmetic, and you omit the 400-year exception, it may well happen that nobody else will catch the error, because most of the time there is nobody alive who remembers one. (Right now, many people remember one, because it happened, for the second time ever, only 8 years ago. We live at an unusual moment of history.) But if you are a computer programmer who omits the exception in the 8/33 calendar, someone reviewing your code is likely to speak up: "Hey, isn't there some exception when the result is 0? I think I remember something like that happening in third grade."

Furthermore, the rule as I gave it above has another benefit: it matches the Gregorian calendar this year and will continue to do so for several years. This was more compelling when I first proposed this calendar back in 1998, because it would have made the transition to the new calendar quite smooth. It doesn't matter which calendar you use until 2016, which is a leap year in the Gregorian calendar but not in the 8/33 calendar as described above. I may as well mention that I have modestly named this calendar the Dominus calendar.

But time is running out for the smooth transition. If we want to get the benefits of the Dominus calendar we have to do it soon. Help spread the word!

[ Pre-publication addendum: Wikipedia informs me that it is not correct to use the tropical year, since this is not in fact the time between vernal equinoxes, owing to the effects of precession and nutation. Rather, one should use the so-called vernal equinox year, which is around 365.2422 days long. The continued fraction for 365.2422 is slightly different from that of 365.24219, but its first few convergents are the same, and all the rest of the analysis in the article holds the same for both years. ]

[ Addendum 20080229: The Persian calendar uses a hybrid 7/29 and 8/33 system. Read all about it. ]


[Other articles in category /calendar] permanent link

Wed, 27 Feb 2008

Uniquely-decodable codes
Ricardo J.B. Signes asked me a few days ago if there was a way to decide whether a given set S of strings had the property that any two distinct sequences of strings from S have distinct concatenations.

For example, consider S1 = { "ab", "abba", "b" }. This set does not have the specified property, because you can take the two sequences [ "ab", "b", "ab" ] and [ "abba", "b" ], and both concatenate to "abbab". But S2 = { "a", "ab", "abb" } does have this property.

Coding theory

In coding theory, the property has the awful name "unique decodability". The idea is that you have some input symbols, and each input symbol is represented with one output symbol, which is one of the strings from S. (Say the input symbols X, Y, and Z are represented by the first, second, and third strings of S, in that order.) Then suppose you receive some message like "abbab". Can you figure out what the original input was? For S2, yes: it must have been ZY. But for S1, no: it could have been either YZ or XZX.

In coding theory, the strings are called "code words" and the set of strings is a "code". So the question is how to tell whether a code is uniquely-decodable. One obvious way to take a non-uniquely-decodable code and turn it into a uniquely-decodable code is to append delimiters to the code words. Consider S1 again. If we delimit the code words, it becomes { "(ab)", "(abba)", "(b)" }, and the two problem sequences are now distinguishable, since "(ab)(b)(ab)" looks nothing like "(abba)(b)". It should be clear that one doesn't need to delimit both ends; the important part is that the words are separated, so one could use { "ab-", "abba-", "b-" } instead, and the problem sequences translate to "ab-b-ab-" and "abba-b-". So every non-uniquely-decodable code corresponds to a uniquely-decodable code in at least this trivial way, and often the uniquely-decodable property is not that important in practice because you can guarantee uniquely-decodableness so easily just by sticking delimiters on the code words.

But if you don't want to transmit the extra delimiters, you can save bandwidth by making your code uniquely-decodable even without delimiters. The delimiters are a special case of a more general principle, which is that a prefix code is always uniquely-decodable. A prefix code is one where no code word is a prefix of another. Or, formally, there are no code words x and y such that x = ys for some nonempty s. Adding the delimiters to a code turns it into a prefix code. But not all prefix codes have delimiters. { "a", "ba", "bba", "bbba" } is an example, as are { "aa", "ab", "ba", "bb" } and { "a", "baa", "bab", "bb" }.

The proof of this is pretty simple: you have some concatenation of code words, say T. You can decode it as follows: Find the unique code word c such that c is a prefix of T; that is, such that T = cU. There must be such a c, because T is a concatenation of code words. And c must be unique, because if there were c' and U' with both cU = T and c'U' = T, then cU = c'U', and whichever of c or c' is shorter must be a prefix of the one that is longer, and that can't happen because this is a prefix code. So c is the first code word in T, and we can pull it off and repeat the process for U, getting a unique sequence of code words, unless U is empty, in which case we are done.
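The decoding procedure in this proof is short enough to sketch in Haskell. The names `isPrefixCode` and `decodePrefix` are hypothetical, code words are assumed distinct and nonempty, and a linear scan over the code words stands in for the tree walk described next.

```haskell
import Data.List (isPrefixOf, stripPrefix)

-- Illustrative sketch; isPrefixCode and decodePrefix are hypothetical names.

-- A prefix code: no code word is a prefix of a different code word.
isPrefixCode :: [String] -> Bool
isPrefixCode code =
  and [ not (c1 `isPrefixOf` c2) | c1 <- code, c2 <- code, c1 /= c2 ]

-- Decode a concatenation T of code words from a prefix code: find the code
-- word c with T = cU, emit c, and repeat on U.  Returns Nothing if T is not
-- a concatenation of code words.
decodePrefix :: [String] -> String -> Maybe [String]
decodePrefix _ "" = Just []
decodePrefix code t =
  case [ (c, u) | c <- code, Just u <- [stripPrefix c t] ] of
    [(c, u)] -> (c :) <$> decodePrefix code u
    _        -> Nothing  -- no match (two matches cannot happen in a prefix code)
```

For example, `decodePrefix ["a", "ba", "bba", "bbba"] "abbaba"` returns Just ["a", "bba", "ba"].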

There is a straightforward correspondence between prefix codes and trees; the code words can be arranged at the leaves of a tree, and then to decode some concatenation T you can scan its symbols one at a time, walking the tree, until you get to a leaf, which tells you which code word you just saw. This is the basis of Huffman coding.

Prefix codes include, as a special case, codes where all the words are the same length. For those codes, the tree is balanced, and has all branches the same length.

But uniquely-decodable codes need not be prefix codes. Most obviously, a suffix code (one where no code word is a suffix of another) is uniquely-decodable and may not be a prefix code. For example, { "a", "aab", "bab", "bb" } is uniquely-decodable but is not a prefix code, because "a" is a prefix of "aab". The proof of uniquely-decodableness is obvious: this is just the last prefix code example from before, with all the code words reversed. If there were two sequences of words with the same concatenation, then the reversed sequences of reversed words would also have the same concatenation, and this would show that the original code { "a", "baa", "bab", "bb" } was not uniquely-decodable. But that is a prefix code, and so must be uniquely-decodable.

But codes can be uniquely-decodable without being either prefix or suffix codes. For example, { "aabb", "abb", "bb", "bbba" } is uniquely-decodable but is neither a prefix nor a suffix code. Ric wanted a method for deciding.

I told Ric about the prefix code stuff, which at least provides a sufficient condition for uniquely-decodableness, and then started poking around to see what else I could learn. Ahem, I mean, researching. I suppose that a book on elementary coding theory would have a discussion of the problem, but I didn't have one at hand, and all I could find online concerned prefix codes, which are of more practical interest because of the handy tree method for speedy decoding.

But after tinkering with it for a couple of days (and also making an utterly wrong intermediate guess that it was undecidable, based on a surface resemblance to the Post correspondence problem) I did eventually figure out an algorithm, which I wrote up and released on CPAN, my first CPAN post in about a year and a half.

An example

The idea is pretty simple, and I think best illustrated by an example, as so many things are. We will consider { "ab", "abba", "b" } again. We want to find two sequences of code words whose concatenations are the same. So say we want pX1 = qY1, where p and q are code words and X1 and Y1 are some longer strings. This can only happen if p and q are different lengths and if one is a prefix of the other, since otherwise the two strings pX1 and qY1 don't begin with the same symbols. So we consider just the cases where p is a prefix of q, which means that in this example we want to find "ab"X1 = "abba"Y1, or, equivalently, X1 = "ba"Y1.

Now X1 must begin with "ba", so we need to either find a code word that begins with "ba", or we need to find a code word that is a prefix of "ba". The only choice is "b", so we have X1 = "b"X2, and so X1 = "b"X2 = "ba"Y1, or equivalently, X2 = "a"Y1.

Now X2 must begin with "a", so we need to either find a code word that begins with "a", or we need to find a code word that is a prefix of "a". This occurs for "abba" and "ab". So we now have two situations to investigate: "ab"X3 = "a"Y1, and "abba"X4 = "a"Y1. Or, equivalently, "b"X3 = Y1, and "bba"X4 = Y1.

The first of these, "b"X3 = Y1 wins immediately, because "b" is a code word: we can take X3 to be empty, and Y1 to be "b", and we have what we want:

"ab" X1 = "abba" Y1
"ab" "b" X2 = "abba" Y1
"ab" "b" "ab" X3 = "abba" Y1
"ab" "b" "ab" = "abba" "b"

where the last line of the table is exactly the solution we seek.

Following the other one, "bba"X4 = Y1, fails, and in a rather interesting way. Y1 must begin with two "b" words, so put "bb"Y2 = Y1, so "bba"X4 = "bb"Y2, then "a"X4 = Y2.

But this last equation is essentially the same as the X2 = "a"Y1 situation we were investigating earlier; we are just trying to make two strings that are the same except that one has an extra "a" on the front. So this investigation tells us that if we could find two strings with "a"X = Y, we could make longer strings "abba"X and "b" "b" Y with "abba"X = "a" "b" "b" Y, which again differ only by an extra "a" on the front. This may be interesting, but it does not help us find what we really want.

The algorithm

Having seen an example, here's the description of the algorithm. We will tabulate solutions to Xs = Y, where X and Y are sequences of code words, for various strings s. If s is empty, we win.

We start the tabulation by looking for pairs of code words c1 and c2 with c1 a prefix of c2, because then we have c1s = c2 for some s. We maintain a queue of s-values to investigate. At one point in our example, we had X1 = "ba"Y1; here s is "ba".

If s begins with a code word, then s = cs', so we can put s' on the queue. This is what happened when we went from X1 = "ba"Y1 to "b"X2 = "ba"Y1 to X2 = "a"Y1. Here s was "ba" and s' was "a".

If s is a prefix of some code word, say ss' = c, then we can also put s' on the queue. This is what happened when we went from X2 = "a"Y1 to "abba"X4 = "a"Y1 to "bba"X4 = Y1. Here s was "a" and s' was "bba".

If we encounter some queue item that we have seen before, we can discard it; this will prevent us from going in circles. If the next queue item is the empty string, we have proved that the code is not uniquely-decodable. (Alternatively, we can stop just before queueing the empty string.) If the queue is empty, we have investigated all possibilities and the code is uniquely-decodable.

Pseudocode

Here's the summary:

  1. Initialization: For each pair of code words c1 and c2 with c1s = c2, put s in the queue.

  2. Main loop: Repeat the following until termination
    • If the queue is empty, terminate. The code is uniquely-decodable.
    • Otherwise:
      1. Take an item s from the queue.
      2. For each code word c:
        • If c = s, terminate. The code is not uniquely-decodable.
        • If cs' = s, and s' has not been seen before, queue s'.
        • If c = ss', and s' has not been seen before, queue s'.
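The summary above can be sketched in Haskell, with plain lists standing in for the queue and for the set of strings already seen. This is an illustration rather than the CPAN module's code; `uniquelyDecodable` is a hypothetical name, and the code words are assumed distinct and nonempty. (The procedure is essentially the one known in coding theory as the Sardinas-Patterson algorithm.)

```haskell
import Data.List (nub, stripPrefix)

-- Illustrative sketch; uniquelyDecodable is a hypothetical name.
-- Code words are assumed to be distinct and nonempty.
uniquelyDecodable :: [String] -> Bool
uniquelyDecodable code = go start start
  where
    -- Initialization: every s with c1 ++ s == c2 for distinct code words.
    start = nub [ s | c1 <- code, c2 <- code, c1 /= c2
                    , Just s <- [stripPrefix c1 c2] ]

    go [] _ = True                     -- queue empty: uniquely decodable
    go (s : queue) seen
      | s `elem` code = False          -- some c == s: ambiguity found
      | otherwise     = go (queue ++ fresh) (seen ++ fresh)
      where
        next = [ s' | c <- code, Just s' <- [stripPrefix c s] ]   -- c ++ s' == s
            ++ [ s' | c <- code, Just s' <- [stripPrefix s c] ]   -- s ++ s' == c
        fresh = nub (filter (`notElem` seen) next)
```

On the examples in this article, `uniquelyDecodable ["ab", "abba", "b"]` is False, while `uniquelyDecodable ["a", "ab", "abb"]` and `uniquelyDecodable ["aabb", "abb", "bb", "bbba"]` are True.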

To this we can add a little bookkeeping so that the algorithm emits the two ambiguous sequences when the code is not uniquely-decodable. The implementation I wrote uses a hash to track which strings s have appeared in the queue already. Associated with each string s in the hash are two sequences of code words, P and Q, such that Ps = Q. When s begins with a code word, so that s = cs', the program adds s' to the hash with the two sequences [P, c] and Q. When s is a prefix of a code word, so that ss' = c, the program adds s' to the hash with the two sequences Q and [P, c]; the order of the sequences is reversed in order to maintain the Ps = Q property, which has become Qs' = Pss' = Pc in this case.
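The bookkeeping can be sketched the same way. In this hypothetical Haskell rendering (not the CPAN implementation), the two sequences travel with each queue entry instead of living in a hash, the names are invented, and code words are assumed distinct and nonempty.

```haskell
import Data.List (stripPrefix)

-- Illustrative sketch; ambiguity, suffix, and dedupe are hypothetical names.
-- Each queue entry (s, p, q) keeps the invariant  concat p ++ s == concat q.
ambiguity :: [String] -> Maybe ([String], [String])
ambiguity code = go start (map suffix start)
  where
    suffix (s, _, _) = s
    start = dedupe [] [ (s, [c1], [c2]) | c1 <- code, c2 <- code, c1 /= c2
                                        , Just s <- [stripPrefix c1 c2] ]

    go [] _ = Nothing                       -- uniquely decodable
    go ((s, p, q) : queue) seen
      | s `elem` code = Just (p ++ [s], q)  -- concat (p ++ [s]) == concat q
      | otherwise     = go (queue ++ fresh) (seen ++ map suffix fresh)
      where
        -- s == c ++ s': append c to P, keep Q.
        next =  [ (s', p ++ [c], q) | c <- code, Just s' <- [stripPrefix c s] ]
        -- c == s ++ s': swap roles, so Q ++ s' == P ++ c.
             ++ [ (s', q, p ++ [c]) | c <- code, Just s' <- [stripPrefix s c] ]
        fresh = dedupe seen next

    -- Keep the first entry for each suffix that has not been seen yet.
    dedupe _ [] = []
    dedupe seen (e@(s, _, _) : es)
      | s `elem` seen = dedupe seen es
      | otherwise     = e : dedupe (s : seen) es
```

`ambiguity ["ab", "abba", "b"]` returns Just (["abba", "b"], ["ab", "b", "ab"]), the same pair found by hand earlier; for a uniquely-decodable code it returns Nothing. Linear list lookups are fine for codes this small.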

Notes

As I said, I suspect this is covered in every elementary coding theory text, but I couldn't find it online, so perhaps this writeup will help someone in the future.

After solving this problem I meditated a little on my role in the programming community. The kind of job I did for Ric here is a familiar one to me. When I was in college, I was the math guy who hung out in the computer lab with the hackers. Periodically one of them would come to me with some math problem: "Crash, I am writing a ray tracer. If I have a ray and a triangle in three dimensions, how can I figure out if the ray intersects the triangle?" And then I would go off and figure out how to do that and come back with the algorithm, perhaps write some code, or perhaps provide some instruction in matrix computations or whatever was needed. In physics class, I partnered with Jim Kasprzak, a physics major, and we did all the homework together. We would read the problem, which would be some physics thing I had no idea how to solve. But Jim understood physics, and could turn the problem from physics into some mathematics thing that he had no idea how to solve. Then I would do the mathematics, and Jim would turn my solution back into physics. I wish I could make a living doing this.

Puzzle: Is { "ab", "baab", "babb", "bbb", "bbba" } uniquely-decodable? If not, find a pair of sequences that concatenate to the same string.

Research question: What's the worst-case running time of the algorithm? The queue items are all strings that are strictly shorter than the longest code word, so if this has length n, then the main loop of the algorithm runs at most (aⁿ - 1) / (a - 1) times, where a is the number of symbols in the alphabet. But can this worst case really occur, or is the real worst case much faster? In practice the algorithm always seems to complete very quickly.

Project to do: Reimplement in Haskell. Compare with Perl implementation. Meditate on how they can suck in such completely different ways.

[ There is a brief followup to this article. ]


[Other articles in category /CS] permanent link

Thu, 21 Feb 2008

Crappiest literary theory this month
Someone on Wikipedia has been pushing the theory that the four bad children in Charlie and the Chocolate Factory correspond to the seven deadly sins.


[Other articles in category /book] permanent link

Mon, 18 Feb 2008

Cornaptious
Once I was visiting my grandparents while home from college. We were in the dining room, and they were talking about a book they were reading, in which the author had used a word they did not know: cornaptious. I didn't know it either, and got up from the table to look it up in their Webster's Second International Dictionary. (My grandfather, who was for his whole life both cantankerous and a professional editor, loathed the permissive and descriptivist Third International. The out-of-print Second International Edition was a prized Christmas present that in those days was hard to find.)

Webster's came up with nothing. Nothing but "corniculate", anyway, which didn't appear to be related. At that point we had exhausted our meager resources. That's what things were like in those days.

The episode stuck with me, though, and a few years later when I became the possessor of the First Edition of the Oxford English Dictionary, I tried there. No luck. Some time afterwards, I upgraded to the Second Edition. Still no luck.

Years went by, and one day I was reading The Lyre of Orpheus, by Robertson Davies. The unnamed Dean of the music school describes the brilliant doctoral student Hulda Schnakenberg:

"Oh, she's a foul-mouthed, cornaptious slut, but underneath she is all untouched wonderment."
"Aha," I said. "So this is what they were reading that time."

More years went by, the oceans rose and receded, the continents shifted a bit, and the Internet crawled out of the sea. I returned to the problem of "cornaptious". I tried a Google book search. It found one use only, from The Lyre of Orpheus. The trail was still cold.

But wait! It also had a suggestion: "Did you mean: carnaptious", asked Google.

Ho! Fifty-six hits for "carnaptious", all from books about Scots and Irish. And the OED does list "carnaptious". "Sc. and Irish dial." it says. It means bad-tempered or quarrelsome. Had Davies spelled it correctly, we would have found it right away, because "carnaptious" does appear in Webster's Second.

So that's that then. A twenty-year-old spelling error cleared up by Google Books.

[ Addendum 20080228: The Dean's name is Wintersen. Geraint Powell, not the Dean, calls Hulda Schnakenberg a cornaptious slut. ]


[Other articles in category /lang] permanent link

Fri, 15 Feb 2008

Acta Quandalia
Several readers have emailed me to discuss my recent articles about mathematical screwups, and a few have let drop casual comments that suggest that they think that I invented Acta Quandalia as a joke. I can assure you that no journal is better than Acta Quandalia. Since it is difficult to obtain outside of university libraries, however, I have scanned the cover of one of last year's issues for you to see:


[Other articles in category /math] permanent link

Wed, 13 Feb 2008

The least interesting number
Berry's paradox goes like this: Some natural numbers, like 2, are interesting. Some natural numbers, like 255610679 (I think), are not interesting. Consider the set of uninteresting natural numbers. If this set were nonempty, it would contain a smallest element s. But then s would have the interesting property of being the smallest uninteresting number. This is a contradiction. So the set of uninteresting natural numbers must be empty.

This reads like a joke, and it is tempting to dismiss it as a trite bit of foolishness. But it has rather interesting and deep connections to other related matters, such as the Grelling-Nelson paradox and Gödel's incompleteness theorem. I plan to write about that someday.

But today my purpose is only to argue that there are demonstrably uninteresting real numbers. I even have an example. Liouville's number L is uninteresting. It is defined as:

$$\sum_{i=1}^\infty {10}^{-i!} = 0.110001000000000000000001000\ldots$$

Why is this number of any concern? In 1844 Joseph Liouville showed that there was a limit to how closely an irrational algebraic number could be approximated by rationals. L can be approximated much more closely than that, so it must be transcendental. This was the first proof that transcendental numbers exist.
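To make "much more closely" concrete: Liouville's theorem says that for an irrational algebraic number x of degree d there is a constant c > 0 with |x - p/q| > c/q^d for every fraction p/q. The partial sums of L are fractions p/q with q = 10^(n!), and they satisfy |L - p/q| < 1/q^n for every n, so no fixed degree d can work. The Python sketch below (the function names are my own) checks this with exact rational arithmetic for the first few n, using a long partial sum as a stand-in for L itself.

```python
from fractions import Fraction
from math import factorial


def liouville_partial(n, base=10):
    """The partial sum base**-(1!) + ... + base**-(n!), as an exact fraction p/q."""
    return sum((Fraction(1, base ** factorial(i)) for i in range(1, n + 1)), Fraction(0))


# Stand-in for L itself: the terms omitted after i = 6 add up to less
# than 2 * 10**-5040, far below every gap compared here.
L_APPROX = liouville_partial(6)


def gap_and_bound(n):
    """Return (|L - p/q|, 1/q**n) for the n-th partial sum p/q, where q = 10**(n!)."""
    q = 10 ** factorial(n)
    return abs(L_APPROX - liouville_partial(n)), Fraction(1, q ** n)


for n in range(1, 5):
    gap, bound = gap_and_bound(n)
    print(n, gap < bound)  # True for each n
```

Swapping base=2 into liouville_partial gives one of the "many similar numbers" mentioned below.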

The only noteworthy mathematical property possessed by L is its transcendentality. But this is certainly not enough to qualify it as interesting, since nearly all real numbers are transcendental.

Liouville's theorem shows how to construct many transcendental numbers, but the construction generates many similar numbers. For example, you can replace the 10 with a 2, or the i! with floor(e^i) or any other sufficiently fast-growing function. It appears that any potentially interesting property possessed by Liouville's number is also possessed by uncountably many other numbers. Its uninterestingness is identical to that of other transcendental numbers constructed by Liouville's method. L was neither the first nor the simplest number so constructed, so Liouville's number is not even of historical interest.

The argument in Berry's paradox fails for the real numbers: since the real numbers are not well-ordered, the set of uninteresting real numbers need have no smallest element, and in fact (by Berry's argument) does not. Liouville's number is not the smallest number of its type, nor the largest, nor anything else of interest.

If someone were to come along and prove that Liouville's number was the most uninteresting real number, that would be rather interesting, but it has not happened, nor is it likely.


[Other articles in category /math] permanent link

Fri, 08 Feb 2008

How big is a five-gallon jug?
Office water coolers in the United States commonly take five-gallon jugs of water. You are probably familiar with these jugs, but here is a picture of a jug, to refresh your memory. A random graduate student has been provided for scale:

Here's today's riddle: Can you estimate the volume of the jug in cubic feet? "Estimate" means by eyeballing it, not by calculating, measuring, consulting reference works, etc. But feel free to look at an actual jug if you have one handy.

Once you've settled on your estimate, compare it with the correct answer, below.


Answer:
It is about 2/3 of a cubic foot.

One US gallon is exactly 231 cubic inches, so five gallons are 1155 cubic inches.

One cubic foot contains 12×12×12 = 1728 cubic inches.
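The arithmetic is short enough to check by machine. This small Python sketch relies on the definition of the US gallon as exactly 231 cubic inches; the constant and function names are my own.

```python
CUBIC_INCHES_PER_US_GALLON = 231       # exact: the US gallon is defined as 231 in^3
CUBIC_INCHES_PER_CUBIC_FOOT = 12 ** 3  # 12 x 12 x 12 = 1728


def gallons_to_cubic_feet(gallons):
    """Convert US gallons to cubic feet."""
    return gallons * CUBIC_INCHES_PER_US_GALLON / CUBIC_INCHES_PER_CUBIC_FOOT


print(round(gallons_to_cubic_feet(5), 3))  # 0.668, about 2/3
```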

Hard to believe, isn't it? ("Strange but true.") I took one of these jugs around my office last year, asking everyone to guess how big it was; nobody came close. People typically guessed that it was about three times as big as it actually is.

This puzzle totally does not work anywhere except in the United States. The corresponding puzzle for the rest of the world is "Here is a twenty-liter jug. Can you guess the volume of the jug in liters?" I suppose this is an argument in favor of the metric system.


[Other articles in category /tech] permanent link

Thu, 07 Feb 2008

Trivial theorems
Mathematical folklore contains a story about how Acta Quandalia published a paper proving that all partially uniform k-quandles had the Cosell property, and then a few months later published another paper proving that no partially uniform k-quandles had the Cosell property. And in fact, goes the story, both theorems were quite true, which put a sudden end to the investigation of partially uniform k-quandles.

Except of course it wasn't Acta Quandalia (which would never commit such a silly error) and it didn't concern k-quandles; it was some unspecified journal, and it concerned some property of some sort of topological space, and that was the end of the investigation of those topological spaces.

This would not qualify as a major screwup under my definition in the original article, since the theorems are true, but it certainly would have been rather embarrassing. Journals are not supposed to publish papers about the properties of the empty set.

Hmm, there's a thought. How about a Journal of the Properties of the Empty Set? The editors would never be at a loss for material. And the cover almost designs itself.

Handsome, isn't it? I See A Great Need!

Ahem. Anyway, if the folklore in question is true, I suppose the mathematicians involved might have felt proud rather than ashamed, since they could now boast of having completely solved the problem of partially uniform k-quandles. But on the other hand, suppose you had been granted a doctorate on the strength of your thesis on the properties of objects from some class which was subsequently shown to be empty. Wouldn't you feel at least a bit like a fraud?

Is this story true? Are there any examples? Please help me, gentle readers.


[Other articles in category /math] permanent link

Wed, 06 Feb 2008

Major screwups in mathematics
I don't remember how I got thinking about this, but for the past week or so I've been trying to think of a major screwup in mathematics. Specifically, I want a statement S such that:

  1. A purported (but erroneous) proof of S was published in the mathematical literature, so that
  2. S was generally accepted as true for a significant period of time, say at least two years, but
  3. S is actually false
I cannot think of an example.

There are many examples of statements that were believed without proof that turned out to be false, such as any number of decidability and completeness (non-)theorems. If it turns out that P=NP, this will be one of that type, but as yet there is no generally accepted proof to the contrary, so it is not an example. Similarly, it would be quite surprising to learn that the Goldbach conjecture was false, but at present mathematicians do not generally believe that it has been proved to be true, so the Goldbach conjecture is not an example of this type, and is unlikely ever to be.

There are a lot of results that could have gone one way or another, such as the three-dimensional kissing number problem. In this case some people believed it would go one way and some the other, and then it was found to go one way, but no proof to the contrary was ever widely accepted.

Then we have results like the independence of the parallel postulate, where people thought for a long time that it should be implied by the rest of Euclidean geometry, and tried to prove it, but couldn't, and eventually it was determined to be independent. But again, there was no generally accepted proof that it was implied by the other postulates. So mathematics got the right answer in this case: the mathematicians tried to prove a false statement, and failed, and then eventually figured it out.

Alfred Kempe is famous for producing an erroneous proof of the four-color map theorem, which was accepted for eleven years before the error was detected. But the four-color map theorem is true. I want an example of a false statement that was believed for years because of an erroneous proof.

If there isn't one, that is an astonishing declaration of success for all of mathematics and for its deductive methods. 2300 years without one major screwup!

It seems too good to be true. Is it?

Glossary for non-mathematicians

  • The "decidability and completeness" results I allude to include the fact that the only systems of mathematical axioms strong enough to prove all true statements of arithmetic are those that are so strong that they also prove all the false statements of arithmetic. A number of results of this type were big surprises in the early part of the 20th century.

  • If "P=NP" were true, then it would be possible to efficiently find solutions to any problem whose solutions could be efficiently checked for correctness. For example, it is relatively easy to check to see if a proposed conference schedule puts two speakers in the same room at the same time, if it allots the right amount of time for each talk, if it uses no more than the available number of rooms, and so forth. But to generate such schedules seems to be a difficult matter in general. "P=NP" would imply that this problem, and many others that seem equally difficult, was actually easy.

  • The Goldbach conjecture says that every even number greater than 2 is the sum of two prime numbers.

  • The kissing number problem takes a red ping-pong ball and asks how many white ping-pong balls can simultaneously touch it. It is easy to see that there is room for 12 white balls. There is a lot of space left over, and for some time it was an open question whether there was a way to fit in a 13th. The answer turns out to be that there is not.

  • The four-color map theorem asks whether any geographical map (subject to certain restrictions) can be colored with only four colors such that no two adjacent regions are the same color. It is quite easy to see that at least four colors may be necessary (Belgium, France, Germany, and Luxembourg, for example), and not hard to show that five colors are sufficient.

  • Classical Greek geometry contained a number of "postulates", such as "any line can be extended to infinity" and "a circle can be drawn with any radius around any center", but the fifth one, the notorious "parallel postulate", was a complicated and obscure technical matter, which turns out to be equivalent to the statement that, for any line L and point P not on L, there is exactly one line L' through P parallel to L. This in turn is equivalent to the fact that classical geometry is done on a plane, and not on a curved surface.
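The scheduling example in the P=NP entry is easy to make concrete. Here is a minimal Python sketch of a checker, under assumptions of my own: the Talk record, the use of whole minutes, and the rule that each speaker gives exactly one talk are all hypothetical. Each test it makes takes time polynomial in the number of talks; what it does not do, and what seems to be hard in general, is find a valid schedule in the first place.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Talk:
    speaker: str
    room: str
    start: int  # minutes after the conference opens (hypothetical unit)
    end: int


def is_valid_schedule(talks, minutes_needed, rooms):
    """Check a proposed schedule; every test is polynomial in len(talks).

    talks:          list of Talk
    minutes_needed: dict mapping each speaker to the length of their talk
    rooms:          set of the rooms that are available
    """
    # Every speaker is scheduled exactly once, and nobody else is.
    if sorted(t.speaker for t in talks) != sorted(minutes_needed):
        return False
    # Only available rooms are used.
    if any(t.room not in rooms for t in talks):
        return False
    # Each talk gets exactly the time it needs.
    if any(t.end - t.start != minutes_needed[t.speaker] for t in talks):
        return False
    # No two talks share a room at overlapping times (pairwise, O(n**2)).
    for a, b in combinations(talks, 2):
        if a.room == b.room and a.start < b.end and b.start < a.end:
            return False
    return True
```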

[ Addendum 20080205: Readers suggested some examples, and I happened upon one myself. For a summary, see this month's addenda. I also wrote a detailed article about a mistake of Kurt Gödel's. ]

[ Addendum 20080206: Another article in this series, asking readers for examples of a different type of screwup. ]


[Other articles in category /math] permanent link

Major screwups in mathematics: example 1
Last month I asked for examples of major screwups in mathematics. Specifically, I was looking for cases in which some statement S was considered to be proved, and later turned out to be false. I could not think of any examples myself.

Readers suggested several examples, and I got lucky and turned up one on my own.

Some of the examples were rather obscure technical matters, where Professor Snorfus publishes in Acta Quandalia that all partially uniform k-quandles have the Cosell property, and this goes unchallenged for several years before one of the other three experts in partially uniform quandle theory notices that actually this is only true for Nemontovian k-quandles. I'm not going to report on matters that sounded like that to me, although I realize that I'm running the risk that all the examples that I do report will sound that way to most of the audience. But I'm going to give it a try.

General remarks

I would like to make some general remarks first, but I don't quite know yet what they are. Two readers independently suggested that I should read Proofs and Refutations by Imre Lakatos, and raised a number of interesting points that I'm sure I'd like to expand on, except that I haven't read the book. Both copies are checked out of the Penn library, which is a good sign, and the interlibrary loan copy I ordered won't be here for several days.

Still, I can relate a partial secondhand understanding of the ideas, which seem worth repeating.

Whether a result is "correct" may be largely a matter of definition. Consider Lakatos' principal example, Euler's theorem about polyhedra: Let F, E, and V be the number of faces, edges, and vertices in a polyhedron. Then F - E + V = 2. For example, the cube has (F, E, V) = (6, 12, 8), and 6 - 12 + 8 = 2.

Sometime later, someone observed that Euler's theorem was false for polyhedra with holes in them. For example, consider the object shown at right. It has (F, E, V) = (9, 18, 9), giving F - E + V = 9 - 18 + 9 = 0.
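The counts are easy to check mechanically if a polyhedron is described by its list of faces. In this Python sketch (the names are my own) the cube is the unit cube, and the holed object is modeled by an assumption: a 3-by-3 grid of quadrilaterals with opposite edges glued together. That combinatorial torus has exactly the counts (9, 18, 9) quoted above, though it may not be built the same way as the pictured object.

```python
from itertools import product


def euler_counts(faces):
    """Return (F, E, V, F - E + V) for a polyhedral surface given by its faces.

    Each face is a tuple of vertex labels in cyclic order; an edge is an
    unordered pair of cyclically adjacent vertices in some face.
    """
    vertices = {v for face in faces for v in face}
    edges = {
        frozenset((face[k], face[(k + 1) % len(face)]))
        for face in faces
        for k in range(len(face))
    }
    F, E, V = len(faces), len(edges), len(vertices)
    return F, E, V, F - E + V


# The unit cube: vertices are 0/1 coordinate triples; each face fixes one coordinate.
cube = [
    ((0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)),  # z = 0
    ((0, 0, 1), (1, 0, 1), (1, 1, 1), (0, 1, 1)),  # z = 1
    ((0, 0, 0), (1, 0, 0), (1, 0, 1), (0, 0, 1)),  # y = 0
    ((0, 1, 0), (1, 1, 0), (1, 1, 1), (0, 1, 1)),  # y = 1
    ((0, 0, 0), (0, 1, 0), (0, 1, 1), (0, 0, 1)),  # x = 0
    ((1, 0, 0), (1, 1, 0), (1, 1, 1), (1, 0, 1)),  # x = 1
]

# A 3-by-3 grid of quadrilaterals whose opposite sides are glued: a torus.
N = 3
torus = [
    ((i, j), ((i + 1) % N, j), ((i + 1) % N, (j + 1) % N), (i, (j + 1) % N))
    for i, j in product(range(N), repeat=2)
]

print(euler_counts(cube))   # (6, 12, 8, 2)
print(euler_counts(torus))  # (9, 18, 9, 0)
```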

Can we say that Euler was wrong? Not really. The question hinges on the definition of "polyhedron". Euler's theorem is proved for "polyhedra", but we can see from the example above that it only holds for "simply-connected polyhedra". If Euler proved his theorem at a time when "polyhedra" was implicitly meant "simply-connected", and the generally-understood definition changed out from under him, we can't hold that against Euler. In fact, the failure of Euler's theorem for the object above suggests that maybe we shouldn't consider it to be a polyhedron, that it is somehow rather different from a polyhedron in at least one important way. So the theorem drives the definition, instead of the other way around.

Okay, enough introductory remarks. My first example is unquestionably a genuine error, and from a first-class mathematician.

Mathematical background

Some terminology first. A "formula" is just that, for example something like this:

$$\displaylines{ ((\forall a.\lnot R(a,a)) \wedge\cr (\forall b\forall c.R(b,c)\to\lnot R(c,b))\wedge\cr (\forall d\forall e\forall f.(R(d,e)\wedge R(e,f)\to R(d,f))) \to\cr (\forall x\exists y.R(y,x)) }$$

It may contain a bunch of quantified variables (a, b, c, etc.), relations (like R), and logical connectives like ∧. A formula might also include functions and constants (which I didn't) or equality symbols (there are none here).

One can ask whether the formula is true (or, in the jargon, "valid"), which means that it must hold regardless of how one chooses the set S from which the values of the variables will be drawn, and regardless of the meanings assigned to the relation symbols (and to the functions and constants, if there are any). The following formula, although not very interesting, is valid:

$$ \forall a\exists b.(P(a)\wedge P(b))\to P(a) $$

This is true regardless of the meaning we ascribe to P, and regardless of the set from which a and b are required to be drawn.

The longer formula above, which requires that R be a strict partial order, and then that R have no minimal element, is not universally valid, but it is true under some interpretations of R and some sets S from which a...f, x, and y may be drawn. Specifically, it is true if one takes S to be the set of integers and R(x, y) to mean x < y. Formulas that are true under at least one interpretation, like this one, are called "satisfiable". Obviously, valid formulas are satisfiable, because satisfiable formulas are true under some interpretations, but valid formulas are true under all interpretations.
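For finite sets S, satisfiability and validity can be explored by brute force: evaluate the formula under every relation R on S. The Python sketch below (the names are my own) does that for a two-element set. It finds an interpretation where the formula is true and one where it is false, which is enough to show that the formula is satisfiable but not valid. It cannot check the integers interpretation, which is infinite; and since on a finite set every strict partial order has a minimal element, here the formula only comes out true when R fails to be a strict order.

```python
from itertools import product


def formula_holds(S, R):
    """Evaluate the long formula in the interpretation (S, R).

    S is a finite set and R is a set of ordered pairs of elements of S.
    """
    irreflexive = all((a, a) not in R for a in S)
    asymmetric = all(not ((b, c) in R and (c, b) in R) for b in S for c in S)
    transitive = all(
        (d, f) in R
        for d in S for e in S for f in S
        if (d, e) in R and (e, f) in R
    )
    no_minimal = all(any((y, x) in R for y in S) for x in S)
    # (antecedent) -> (consequent) is (not antecedent) or consequent.
    return (not (irreflexive and asymmetric and transitive)) or no_minimal


def all_relations(S):
    """Yield every binary relation on S; there are 2**(len(S)**2) of them."""
    pairs = [(a, b) for a in S for b in S]
    for keep in product((False, True), repeat=len(pairs)):
        yield {p for p, k in zip(pairs, keep) if k}


S = [0, 1]
results = [formula_holds(S, R) for R in all_relations(S)]
print(any(results), all(results))  # True False: satisfiable, not valid
```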

Church and Turing famously showed, in 1936, that it is an undecidable problem to determine whether a given formula of first-order logic is satisfiable. That is, there is no method which, given any formula, is guaranteed to tell you correctly whether or not there is some interpretation in which the formula is true. But one can limit the form of the allowable formulas to make the problem easier. To take an extreme example, just to illustrate the point, consider the set of formulas of the form:

∃a∃b... ((a=0)∨(a=1))∧((b=0)∨(b=1))∧...∧R(a,b,...)

for some number n of variables. Since the formula itself requires that a, b, etc. are each either 0 or 1, all one needs to do to decide whether the formula is satisfiable is to try every possible assignment of 0 and 1 to the n variables and see whether R(a,b,...) is true in any of the 2^n resulting cases. If so, the formula is satisfiable; if not, not.
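The decision procedure for this toy class is just an exhaustive search. In this Python sketch, satisfiable_01 is a hypothetical name, and R is treated as a fixed, computable predicate on n arguments standing in for R(a,b,...).

```python
from itertools import product


def satisfiable_01(n, R):
    """Is R true for some assignment of 0 or 1 to each of its n arguments?

    There are exactly 2**n assignments to try.
    """
    return any(R(*values) for values in product((0, 1), repeat=n))


print(satisfiable_01(2, lambda a, b: a != b))             # True
print(satisfiable_01(3, lambda a, b, c: a + b + c == 4))  # False
```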

Kurt Gödel, 1933

One would like to prove decidability for a larger and more general class of formulas than the rather silly one I just described. How big can the class of formulas be and yet be decidable?

It turns out that one need only consider formulas where all the quantifiers are at the front, because there is a simple method for moving quantifiers to the front of a formula from anywhere inside. So historically, attention has been focused on formulas in this form.

One fascinating result concerns the class of formulas called [∃*∀²∃*, all, (0)]. These are the formulas that begin with ∃a∃b...∃m∀n∀p∃q...∃z, with exactly two ∀ quantifiers, with no intervening ∃s. These formulas may contain arbitrary relations amongst the variables, but no functions or constants, and no equality symbol. [∃*∀²∃*, all, (0)] is decidable: there is a method which takes any formula in this form and decides whether it is satisfiable. But if you allow three ∀ quantifiers (or two with an ∃ in between) then the set of formulas is no longer decidable. Isn't that freaky?

The decidability of the class [∃*∀²∃*, all, (0)] was shown by none other than Gödel, in 1933. However, in the last sentence of his paper, Gödel added that the same was true even if the formulas were also permitted to include equality:

In conclusion, I would still like to remark that Theorem I can also be proved, by the same method, for formulas that contain the identity sign.

Oops

This was believed to be true for more than thirty years, and the result was used by other mathematicians to prove other results. But in the mid-1960s, Stål Aanderaa showed that Gödel's proof would not actually work if the formulas contained equality, and in 1983, Warren D. Goldfarb proved that Gödel had been mistaken, and the satisfiability of formulas in the larger class was not decidable.

Sources

Gödel's original 1933 paper is Zum Entscheidungsproblem des logischen Funktionenkalküls (On the decision problem for the functional calculus of logic) which can be found on pages 306–327 of volume I of his Collected Works. (Oxford University Press, 1986.) There is an introductory note by Goldfarb on pages 226–231, of which pages 229–231 address Gödel's error specifically.

I originally heard the story from Val Tannen, and then found it recounted on page 188 of The Classical Decision Problem, by Egon Boerger, Erich Grädel, and Yuri Gurevich. But then blog reader Jeffrey Kegler found the Goldfarb note, of which the Boerger-Grädel-Gurevich account appears to be a summary.

Thanks very much to everyone who contributed, and especially to M. Kegler.

(I remind readers who have temporarily forgotten that Acta Quandalia is the quarterly journal of the Royal Uzbek Academy of Semi-Integrable Quandle Theory. Professor Snorfus, you will no doubt recall, won that august institution's prestigious Utkur Prize in 1974.)

[ Addendum 20080206: Another article in this series. ]


[Other articles in category /math] permanent link

Tue, 05 Feb 2008

Steganography in 1665: correction
(A correction to this.)

Phil Rodgers has pointed out that a "physique" is not an emetic, as I thought, but a laxative.

Are there any among you who doubt that Bruce Schneier can shoot sluggbullets out of his ass? Let the unbelievers beware!


[Other articles in category /IT] permanent link

Steganography in 1665
Today's entry in Samuel Pepys' diary says:

He told us a very handsome passage of the King's sending him his message ... in a sluggbullet, being writ in cypher, and wrapped up in lead and swallowed. So the messenger come to my Lord and told him he had a message from the King, but it was yet in his belly; so they did give him some physique, and out it come.
Sure, Bruce Schneier can mount chosen-ciphertext attacks without even choosing a ciphertext. But dare he swallow a "sluggbullet" and bring it up again to be read?

Silly me. Bruce Schneier can probably cough up a sluggbullet without swallowing one beforehand.

[ Addendum 20080205: A correction. ]


[Other articles in category /IT] permanent link

Fri, 01 Feb 2008

Emacs and alists
[ This article is a few weeks old now. I wrote it and forgot to publish it at the time. ]

Yesterday I upgraded Emacs, and since it was an upgrade, something that had been working for me for fifteen years stopped working, because that's what "upgrade" means. My .emacs file contains:

        (aput 'auto-mode-alist "\\.pl\\'" (function cperl-mode))
        (aput 'auto-mode-alist "\\.t\\'" (function cperl-mode))
        (aput 'auto-mode-alist "\\.cgi\\'" (function cperl-mode))
        (aput 'auto-mode-alist "\\.pm\\'" (function cperl-mode))
        (aput 'auto-mode-alist "\\.blog\\'" (function text-mode))
        (aput 'auto-mode-alist "\\.sml\\'" (function sml-mode))
I should explain this, since I imagine that most readers of this blog are like me in that they touch Emacs Lisp only once a year on Saint Vibrissa's Day. An alist ("association list") is a common data structure in Lisp programs. It is a list of pairs; the first element of each pair is a key, and the second element is an associated value. The pairs in the special auto-mode-alist variable have regexes as their keys and functions as their values. Whenever Emacs opens a new file, it scans this alist until it finds a regex that matches the name of the file. It then calls the associated function. Thus the effect of the first line above is to have Emacs call the cperl-mode function on any file whose name ends in ".pl".

The aput function is for maintaining alists. It takes an alist, a key, and a value, and scans the alist looking for a matching key; if it finds one, it amends the corresponding value. Otherwise, it adds a new association to the front of the alist.
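To pin down the behavior just described, here is a small Python model of aput, treating an alist as a Python list of (key, value) tuples. It is a sketch of the semantics as described in this article, not a transcription of Emacs's assoc.el, whose details may differ.

```python
def aput(alist, key, value):
    """Model of aput as described above (hypothetical Python stand-in).

    If key is already present, amend its value in place; otherwise add a
    new (key, value) pair at the front. Mutates and returns alist.
    """
    for i, (k, _) in enumerate(alist):
        if k == key:
            alist[i] = (k, value)
            return alist
    alist.insert(0, (key, value))
    return alist


modes = [("\\.c\\'", "c-mode")]
aput(modes, "\\.pl\\'", "cperl-mode")  # new key: goes on the front
aput(modes, "\\.c\\'", "cc-mode")      # existing key: value amended
print(modes)
```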

When I upgraded Emacs, this broke. The aput function was moved into a separate package, which I now had to load with (require 'assoc).

I asked about this on IRC, and was told that the correct way to do this, if I did not want to (require 'assoc), was to use the following abomination:

        (mapc (lambda (x) (when (eq 'perl-mode (cdr x)) (setcdr x 'cperl-mode)))
                 (append auto-mode-alist interpreter-mode-alist))
The effect of this is to scan over auto-mode-alist (and also interpreter-mode-alist, a related variable) looking for any association whose value was the perl-mode function, and using setcdr to replace perl-mode with cperl-mode.

(This does not address the issue of what to do with .t files or .blog files, for which no association exists yet, presumably, but I did not ask about those specifically on IRC.)

I was totally boggled. Choosing the right editing mode for a file is a basic function of Emacs. I could not believe that the best and simplest way to add or change associations was to use mapc lambda gobhorn oleo potatopudding quote potrzebie. I was assured that this was indeed the only correct method. Struck almost speechless, I managed to come up with "Bullshit."

Apparently the issue was that if auto-mode-alist already contains an association for ".pl", there is no guarantee that my new association will be found and preferred to the old one, unless I somehow remove the old one, or edit it to be the way I want.

This seemed very unlikely to me. You see, an alist is a list. This means that it is searched from head to tail, because this is the only way a list can be searched. So in particular, if you cons a second association to the front of the list, which has the same key as a later (older) association, the search will find the new one first, and the older one becomes inoperative. I asked if there was not a guarantee that the alist would be searched from front to back. I was told that there is not.
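The front-to-back argument can be modeled in a few lines of Python, with an alist as a list of (regex, mode) tuples and mode names as strings in place of functions. The regexes are my translations of the Emacs ones (the Emacs end-of-string anchor \' is written \Z in Python), and the starting contents of the alist are hypothetical.

```python
import re


def mode_for(filename, alist):
    """First-match lookup: scan the alist from head to tail, as assoc does."""
    for regex, mode in alist:
        if re.search(regex, filename):
            return mode
    return None


# Hypothetical starting state, including an older association for .pl files.
auto_mode_alist = [(r"\.pl\Z", "perl-mode"), (r"\.c\Z", "c-mode")]

# Python analogue of (push '("\\.pl\\'" . cperl-mode) auto-mode-alist):
# build a new list with the association at the front and rebind the name.
auto_mode_alist = [(r"\.pl\Z", "cperl-mode")] + auto_mode_alist

print(mode_for("foo.pl", auto_mode_alist))  # cperl-mode: the new entry shadows the old
print(mode_for("foo.c", auto_mode_alist))   # c-mode
```

The older perl-mode association is still in the list; it has simply become unreachable by a front-to-back search.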

I looked in the manual, and reported that the assoc function, which is the getter that corresponds to aput, taking an alist and a key, and returning the corresponding value, is expressly guaranteed to return the first matching item. I was told that there was no guarantee that assoc would be used.

I pondered the manual some more and found this passage:

However, association lists have their own advantages. Depending on your application, it may be faster to add an association to the front of an association list than to update a property.
That is, the manual expressly endorses the technique of adding a new item to the front of an alist in order to override any later item that might have the same key.

After finding that the add-to-the-front technique really did work, I reasoned that if someday Emacs stopped searching alists sequentially, I would not be in any more trouble than I had been today when they removed the aput function.

So I did not take the advice I was given. Instead, I left it pretty much the way it was. I did take the opportunity to clean up the code a bit:

        (push '("\\.pl\\'" . cperl-mode) auto-mode-alist)
        (push '("\\.t\\'" .  cperl-mode) auto-mode-alist)
        (push '("\\.cgi\\'" . cperl-mode) auto-mode-alist)
        (push '("\\.pm\\'" . cperl-mode) auto-mode-alist)
        (push '("\\.blog\\'" . text-mode) auto-mode-alist)
        (push '("\\.sml\\'" . sml-mode) auto-mode-alist)
The push macro simply conses an element onto the front of a list and stores the result back in the variable.

But wow, the advice I got was phenomenally bad. It was bad in a really interesting way, too. It reminded me of the advice people get on the #math channel, where some guy comes in with some question about triangles and gets the category-theoretic viewpoint on triangles as natural transformations of something or other. The advice was bad because although it was correct, it was completely devoid of common sense.

[ Addendum 20080124: It has been brought to my attention that the Emacs FAQ endorses my solution, which makes the category-theoretic advice proposed by the #emacs blockheads even less defensible. ]

[ Addendum 20080201: Steve Vinoski suggests replacing the aput function. ]


[Other articles in category /prog] permanent link

Addenda to recent articles 200801
Here are some notes on posts from the last month that I couldn't find better places for.

  • As a result of my research into the Harriet Tubman mural that was demolished in 2002, I learned that it had been repainted last year at 2950 Germantown Avenue.

  • A number of readers, including some honest-to-God Italians, wrote in with explanations of Boccaccio's term milliantanove, which was variously translated as "squillions" and "a thousand hundreds".

    The "milli-" part suggests a thousand, as I guessed. And "-anta" is the suffix for multiples of ten, found in "quaranta" = "forty", akin to the "-nty" that survives in the word "twenty". And "nove" is "nine".

    So if we wanted to essay a literal translation, we might try "thousanty-nine". Cormac Ó Cuilleanáin's choice of "squillions" looks quite apt.

  • My article about clubbing someone to death with a loaded Uzi neglected an essential technical point. I repeatedly said that

        for my $k (keys %h) {
          if ($k eq $j) {
            f($h{$k})
          }
        }
    

    could be replaced with:

        f($h{$j})
    

    But this is only true if $j actually appears in %h. An accurate translation is:

        f($h{$j}) if exists $h{$j}
    

    I was, of course, aware of this. I left out discussion of this because I thought it would obscure my point to put it in, but I was wrong; the opposite was true.

    I think my original point stands regardless, and I think that even programmers who are unaware of the existence of exists should feel a sense of unease when presented with (or after having written) the long version of the code.

    An example of this error appeared on PerlMonks shortly after I wrote the article.

  • Robin Houston provides another example of a nonstandard adjective in mathematics: a quantum group is not a group.

    We then discussed the use of nonstandard adjectives in biology. I observed that there seemed to be a trend to eliminate them, as with "jellyfish" becoming "jelly" and "starfish" becoming "sea star". He pointed out that botanists use a hyphen to distinguish the standard from the nonstandard: a "white fir" is a fir, but a "Douglas-fir" is not a fir; an "Atlas cedar" is a cedar, but a "western redcedar" is not a cedar.

    Several people wrote to discuss the use of "partial" versus "total", particularly when one or the other is implicit. Note that a total order is a special case of a partial order, which is itself a special case of an "order", but this usage is contrary to the way "partial" and "total" are used for functions: just "function" means a total function, not a partial function. And there are clear cases where "partial" is a standard adjective: partial fractions are fractions, partial derivatives are derivatives, and partial differential equations are differential equations.

  • Steve Vinoski posted a very interesting solution to my question about how to set Emacs file modes: he suggested that I could define a replacement aput function.

  • In my utterly useless review of Robert Graves' novel King Jesus I said "But how many of you have read I, Claudius and Suetonius? Hands? Anyone? Yeah, I didn't think so." But then I got email from James Russell, who said he had indeed read both, and that he knew just what I meant, and, as a result, was going directly to the library to take out King Jesus. And he read the article on Planet Haskell. Wow! I am speechless with delight. Mr. Russell, I love you. From now on, if anyone asks (as they sometimes do) who my target audience is, I will say "It is James Russell."

  • A number of people wrote in with examples of "theorems" that were believed proved, and later turned out to be false. I am preparing a longer article about this for next month. Here are some teasers:

    • Cauchy apparently "proved" that if a sum of continuous functions converges pointwise, then the sum is also a continuous function, and this error was widely believed for several years.

    • I just learned of a major screwup by none other than Kurt Gödel concerning the decidability of a certain class of sentences of first-order arithmetic which went undetected for thirty years.

    • Robert Tarjan proved in the 1970s that the time complexity of a certain algorithm for the union-find problem was slightly worse than linear. And several people proved that this could not be improved upon. But Hantao Zhang has a paper submitted to STOC 2008 which, if it survives peer review, shows that the analysis is wrong, and the algorithm is actually O(n).

    • Finally, several people, including John von Neumann, "proved" that the axioms of arithmetic are consistent. But it was shown later that no such proof is possible.

  • A number of people wrote in with explanations of "more than twenty states"; I will try to follow up soon.
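For readers who haven't met the union-find problem from the Tarjan teaser, here is a minimal Python sketch of a disjoint-set structure. It assumes the algorithm in question is the usual textbook one that combines union by rank with path compression. The class and method names are my own illustration, not taken from Tarjan's or Zhang's papers. Tarjan's bound for this structure is O(m α(m, n)) for m operations on n elements, where α is an extremely slowly growing inverse of Ackermann's function. That is presumably the "slightly worse than linear" in question.

```python
class DisjointSet:
    """Union-find with union by rank and path compression (textbook version).

    Hypothetical illustration; names are not from any particular paper.
    """

    def __init__(self, n: int) -> None:
        self.parent = list(range(n))  # each element starts as its own root
        self.rank = [0] * n           # upper bound on each tree's height

    def find(self, x: int) -> int:
        # First pass: walk up to the root of x's tree.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Second pass (path compression): point every node on the path at the root.
        while x != root:
            next_node = self.parent[x]
            self.parent[x] = root
            x = next_node
        return root

    def union(self, a: int, b: int) -> bool:
        """Merge the sets containing a and b; return False if already merged."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        # Union by rank: hang the shallower tree under the deeper one.
        if self.rank[ra] < self.rank[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1
        return True


ds = DisjointSet(6)
ds.union(0, 1)
ds.union(2, 3)
ds.union(1, 3)
print(ds.find(0) == ds.find(2))  # True
print(ds.find(4) == ds.find(0))  # False
```

Either technique alone already helps. The near-linear bound Tarjan analyzed comes from using the two together.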


[Other articles in category /addenda] permanent link

Thu, 31 Jan 2008

Unnecessary imprecision
This article contains the following sentence:

McCain has won all of the state's 57 delegates, and the last primary before voters in more than 20 states head to the polls next Tuesday.
Why "more than 20 states"? Why not just say "23 states", which is shorter and conveys more information?

I'm not trying to pick on CTV here. A Google News search finds 42,000 instances of "more than 20", many of which could presumably be replaced with "26" or whatever. Well, I had originally written "most of which", but then I looked at some examples, and found that the situation is better than I thought it would be. Here are the first ten matches:

  1. Australian Stocks Complete Worst Month in More Than 20 Years
  2. It said the US air force committed more than 20 cases of aerial espionage by U-2 strategic espionage planes this month.
  3. Farmland prices have climbed more than 20% over the past year in many Midwestern states...
  4. "We have had record-breaking growth in our monthly shipments, as much as more than 20 percent improvements per month," said Christopher Larkins, President...
  5. More than 20 people, including a district officer, were injured when two bombs exploded outside a stadium in the town yesterday...
  6. By a vote of 14-7, the Senate Finance Committee last night voted to deliver $500 tax rebates to more than 20 million American senior citizens...
  7. 9 killed, more than 20 injured in bus accident
  8. While Tuesday's results may not lock up the nomination for either candidate, Democrats will have their say in more than 20 states...
  9. Facing the potential anointment of his rival, John McCain, Romney has less than a week to convince voters in more than 20 states that...
  10. More than 20 Aberdeen citizens qualified for elections as April ...
#1 may be legitimate, if the previous worst month was less than 21 years ago. Similarly #6 is legitimate if the number of senior citizens is close to 20 million, say around 20,400,000, particularly since the number may not be known with high precision.

#2 may be legitimate, if the number of cases of aerial espionage is not known with certitude, or if the anonymous source really did say "more than 20". Similarly #4 is entirely off the hook since it is a quotation.

#3 may be legitimate if the price of farmland is uncertain and close to 20%. #5 is probably a loser. #7 is definitely a loser: it was the headline of an article that began "Nine people were killed and at least 22 injured when...". The headline could certainly have been "9 killed, 22 injured in bus accident".

#8 and #9 are losers, but they are the same example with which I began the article, so they don't count. #10 is a loser.

So of eight examples (disregarding #8 and #9), I have three certain or near-certain failures (#5, #7, and #10), one certain non-failure (#4), and four cases (#1, #2, #3, and #6) to which I am willing to extend the benefit of the doubt. This is not as bad as I feared. I like it when things turn out better than I thought they would.

But I really wonder what is going on with all these instances of "more than 20 states". Is it just sloppy writing? Or is there some benefit that I am failing to appreciate?


[Other articles in category /lang] permanent link

Ramanujan's congruences
Let p(n) be the number of partitions of the integer n. For example, p(4) = 5 because there are 5 partitions of the integer 4, namely {4, 3+1, 2+2, 2+1+1, 1+1+1+1}.
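To make the definition concrete, here is a small brute-force Python sketch. The function names are hypothetical and not from any library. It lists the partitions of n in the same order as above. It is exponential in n, so it is only useful for small cases.

```python
from typing import Iterator, Optional


def partitions(n: int, largest: Optional[int] = None) -> Iterator[list[int]]:
    """Yield each partition of n as a list of parts in non-increasing order."""
    if largest is None:
        largest = n
    if n == 0:
        # The empty sum is the single partition of 0.
        yield []
        return
    # Pick the first (largest) part, then partition what's left using parts
    # no bigger than it, so that each partition is produced exactly once.
    for first in range(min(n, largest), 0, -1):
        for rest in partitions(n - first, first):
            yield [first] + rest


def count_partitions(n: int) -> int:
    """p(n), counted by brute-force enumeration."""
    return sum(1 for _ in partitions(n))


print(["+".join(map(str, parts)) for parts in partitions(4)])
# ['4', '3+1', '2+2', '2+1+1', '1+1+1+1']
print(count_partitions(4))  # 5
```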

Ramanujan's congruences state that:

$$ p(5k+4) \equiv 0 \pmod{5} $$
$$ p(7k+5) \equiv 0 \pmod{7} $$
$$ p(11k+6) \equiv 0 \pmod{11} $$

Looking at this, anyone could conjecture that p(13k+7) ≡ 0 (mod 13), but it isn't so: p(7) = 15 and p(20) = 627 = 48·13+3.
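The congruences and the failed mod-13 guess are easy to check numerically. Here is a Python sketch that computes p(0), …, p(N) using Euler's pentagonal number recurrence and then tests each congruence for every index up to N. This is a finite check, not a proof. The cutoff `LIMIT` and the helper names are my own choices. The mod-17 congruence below involves arguments above a million, which is beyond what this naive loop handles comfortably.

```python
def partition_numbers(limit: int) -> list[int]:
    """Return [p(0), ..., p(limit)] via Euler's pentagonal number recurrence:
    p(n) = sum over k >= 1 of (-1)^(k+1) * (p(n - k(3k-1)/2) + p(n - k(3k+1)/2))."""
    p = [1] + [0] * limit
    for n in range(1, limit + 1):
        total = 0
        k = 1
        while True:
            g1 = k * (3 * k - 1) // 2  # generalized pentagonal numbers
            if g1 > n:
                break
            sign = 1 if k % 2 == 1 else -1
            total += sign * p[n - g1]
            g2 = k * (3 * k + 1) // 2
            if g2 <= n:
                total += sign * p[n - g2]
            k += 1
        p[n] = total
    return p


LIMIT = 1000  # hypothetical cutoff; Python ints are unbounded, so no overflow
p = partition_numbers(LIMIT)


def holds(modulus: int, offset: int) -> bool:
    """True if modulus divides p(modulus*k + offset) for every such index <= LIMIT."""
    return all(p[n] % modulus == 0 for n in range(offset, LIMIT + 1, modulus))


print(holds(5, 4), holds(7, 5), holds(11, 6))  # True True True
print(holds(13, 7), p[7], p[20], p[20] % 13)   # False 15 627 3
```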

But there are other such congruences. For example, according to Partition Congruences and the Andrews-Garvan-Dyson Crank:

$$ p(17\cdot41^4k + 1122838) \equiv 0 \pmod{17} $$

Isn't mathematics awesome?


[Other articles in category /math] permanent link

Wed, 30 Jan 2008

More on risk
My article on risk was one of the more popular articles so far; it went to the top of Reddit, and was widely commented on. It wasn't the best article I've written. I confused a bunch of important things. But now I find I have more to say on the topic, so I'm back to screw up again.

One big problem with the Reddit posting is that the guy who posted it there titled the post "on risk, or why poor people might not be stupid to play the lottery". So a lot of the Reddit comments complained that I had failed to prove that poor people must not be stupid to play the lottery, or that I was wrong on that point. They argued that the dollar cost of a lottery ticket is more valuable to a poor person than to a rich one, and so on. But I didn't say anything about poor people. People read this into the article based on the title someone else had attached to it, and they couldn't get rid of this association even after I pointed out that the article had nothing to say about poor people.

Something I do a lot, in this blog and in life, is point out fallacious arguments. You get some argument that X is true, because of P and Q and therefore R, and then I'll come along and point out that P is false and Q is irrelevant, and anyway they don't imply R, and even if they did, you can't conclude X from R, because if you could, then you could also conclude Y and Z, which are obviously false. For example, in a recent article I addressed the argument that:

You can double your workforce participation from 27% to 51% of the population, as Singapore did; you can't double it again.

The argument being that you can't double a participation of 51% because you can't possibly have 102% workforce participation. (Peter Norvig pointed out that he made the same argument in a different context back in 1999.) But the argument here fails, for reasons I won't go into again. This doesn't mean that I