The Universe of Discourse
           
Mon, 31 Dec 2007

Harriet Tubman
Iris and I had a pretty heavy conversation in the car on the way home from school last week. I should begin by saying that Philadelphia has a lot of murals. More murals than any other city in the world, in fact. The mural arts people like to put up murals on large, otherwise ugly party walls. That is, when you have two buildings that share a wall, and one of them is torn down, leaving a vacant lot with a giant blank wall, the mural arts people see it as a prime location and put a mural there. On the way back from Iris's school we drive through Mantua, which is not one of the rosier Philadelphia neighborhoods, and has a lot of vacant lots, and so a lot of murals. We sometimes count the murals on the way home, and usually pass four or five.

Iris pointed out a mural she liked, and I observed that there was construction on the adjacent vacant lot, which is likely to mean that the mural will be covered up soon by the new building. I mentioned that my favorite Philadelphia mural of all had been on the side of a building that was torn down in 2002.

Iris asked me to tell her about it, so I did. It was the giant mural of Harriet Tubman that used to be on the side of the I. Goldberg building at 9th and Chestnut Streets. It was awesome. There was 40-foot-high painting of Harriet Tubman raising her lantern at night, leading a crowd of people through a dark tunnel (Underground Railroad, obviously) into a beautiful green land beyond, and giant chains that had once barred the tunnel, but which were now shattered.

It's hard to photograph a mural well. The scale and the space do not translate to photographs. It looked something like this:

Note that the small people at the bottom are actually larger than life-size.

Here's a detail:

One cool thing about it that you can't see in the picture is that the column of stones on Tubman's right is painted so as to disguise an large and ugly air conditioning vent that emerges from the wall and climbs up to the roof. The wall is otherwise flat.

Anyway, I said that my favorite mural had been the Harriet Tubman one, and that it had been torn down before she was born. (As you can see from the picture, the building was located next to a parking lot. The owners of the building ripped it down to expand the parking lot.)

But then Iris asked me to tell her about Harriet Tubman, and that was something of a puzzle, because Iris is only three and a half. But the subject is not intrinsically hard to understand; it's just unpleasant. And I don't believe that it's my job to shield her from the unpleasantness of the world, but it is my job to try to answer her questions, if I can. So I tried.

"Okay, you know how you own stuff, and you can do what you want, because it's yours?"

Sure, she understands that. We have always been very clear in distinguishing between her stuff and our stuff, and in defending her property rights against everyone, including ourselves.

"But you know that you can't own other people, right?"

This was confusing, so I tried an example. "Emily is your friend, and sometimes you ask her to do things, and maybe she does them. But you can't make her doing things she doesn't want to do, because she gets to decide for herself what she does."

Sure, of course. Now we're back on track. "Well, a long time ago, some people decided that they owned some other people, called slaves, and that the slaves would have to do whatever their owners said, even if they didn't want to."

Iris was very indignant. I believe she said "That's not nice!" I agreed; I said it was terrible, one of the most terrible things that had ever happened in this country. And then we were over the hump. I said that slaves sometimes tried to run away from the owners, and get away to a place where they could do whatever they wanted, and that Harriet Tubman helped slaves escape.

There you have Harriet Tubman in a nutshell for a three-and-a-half year-old. It was a lot easier than the time she asked me why ships in 1580 had no women aboard.

I did not touch the racial issue at all. When you are explaining something complicated, it is important to keep it in bite-sized chunks, and to deal with them one at a time, and I thought slavery was already a big enough chunk. Iris is going to meet this issue head-on anyway, probably sooner than I would like, because she is biracial.

I explained about the Underground Railroad, and we discussed what a terrible thing slavery must have been. Iris wanted to know what the owners made the slaves do, and there my nerve failed me. I told her that I didn't want to tell her about it because it was so awful and frightening. I had pictures in my head of beatings, and of slaves with their teeth knocked out so that they could be forced to eat, to break hunger strikes, and of rape, and families broken up, and I just couldn't go there. Well, I suppose it is my job to shield her from some the unpleasantness of the world, for a while.

I realize now I could have talked about slaves forced to do farm work, fed bad food, and so on, but I don't think that would really have gotten the point across. And I do think I got the point across: the terrible thing about being a slave is that you have to do what you are told, whether you want to or not. All preschoolers understand that very clearly, whereas for Iris, toil and neglect are rather vague abstractions. So I'm glad I left it where I did.

But then a little later Iris asked some questions about family relations among the slaves, and if slaves had families, and I said yes, that if a mother had a child, then her child belonged to the same owner, and sometimes the owner would take the child away from its mother and sell it to someone else and they would never see each other again. Iris, of course, was appalled by this.

I'm not sure I had a point here, except that Iris is a thoughtful kid, who can be trusted with grown-up issues even at three and a half years old, and I am very proud of her.

That seems like a good place to end the year. Thanks for reading.

[ Addendum 20080201: The mural was repainted in a new location, at 2950 Germantown Avenue! ]


[Other articles in category /kids] permanent link

Sun, 30 Dec 2007

Welcome to my ~/bin
In the previous article I mentioned "a conference tutorial about the contents of my ~/bin directory". Usually I have a web page about each tutorial, with a description, and some sample slides, and I wanted to link to the page about this tutorial. But I found to my surprise that I had forgotten to make the page about this one.

So I went to fix that, and then I couldn't decide which sample slides to show. And I haven't given the tutorial for a couple of years, and I have an upcoming project that will prevent me from giving it for another couple of years. Eh, figuring out what to put online is more trouble than it's worth. I decided it would be a lot less toil to just put the whole thing online.

The materials are copyright © 2004 Mark Jason Dominus, and are not under any sort of free license.

But please enjoy them anyway.

I think the title is an accidental ripoff of an earlier class by Damian Conway. I totally forgot that he had done a class on the same subject, and I think he used the same title. But that just makes us even, because for the past few years he has been making money going around giving talks on "Conference Presentation Aikido", which is a blatant (and deliberate) ripoff of my 2002 Perl conference talk on Conference Presentation Judo. So I don't feel as bad as I might have.

Welcome to my ~/bin complete slides and other materials.

I hereby wish you a happy new year, unless you don't want one, in which case I wish you a crappy new year instead.


[Other articles in category /prog/perl] permanent link

Thu, 20 Dec 2007

Another trivial utility: accumulate
As usual, whenever I write one of these things, I wonder why it took me so long to get off my butt and put in the five minutes of work that were actually required. I've wanted something like this for years. It's called accumulate. It reads an input of this form:

        k1 v1
        k1 v2
        k2 v3
        k1 v4
        k2 v5
        k3 v6
and writes it out in this format:

        k1 v1 v2 v4
        k2 v3 v5
        k3 v6
I wanted it this time because I had a bunch of files that included some duplicates, and wanted to get rid of the duplicates. So:

        md5sum * | accumulate | perl -lane 'unlink @F[2..$#F]'
(Incidentally, people sometimes argue that Perl's .. operator should count backwards when the left operand exceeds the right one. These people are wrong. There is only one argument that needs to be made to refute this idea; maybe it is the only argument that can be made. And examples of it abound. The code above is one such example.)

I'm afraid of insulting you by showing the source code for accumulate, because of course it is so very trivial, and you could write it in five minutes, as I did. But who knows; maybe seeing the source has some value:

        #!/usr/bin/perl

        use Getopt::Std;
        my %opt = (k => 1, v => 2);
        getopts('k:v:', \%opt) or usage();
        for (qw(k v)) {
          $opt{$_} -= 1 if $opt{$_} > 0;
        }

        while (<>) {
          chomp;
          my @F = split;
          push @{$K{$F[$opt{k}]}}, $F[$opt{v}];
        }

        for my $k (keys %K) {
          print "$k @{$K{$k}}\n";
        }
It's tempting to add a -F option to tell it that the input is not delimited by white space, or an option to change the output format, or blah blah blah, but I managed to restrain myself, mostly.

Several years ago I wrote a conference tutorial about the contents of my ~/bin directory. The clearest conclusion that transpired from my analysis was that the utilities I write have too many features that I don't use. The second-clearest was that I waste too much time writing custom argument-parsing code instead of using Getopt::Std. I've tried to learn from this. One thing I found later is that a good way to sublimate the urge to put in some feature is to put in the option to enable it, and to document it, but to leave the feature itself unimplemented. This might work for you too if you have the same problem.

I did put in -k and -v options to control which input columns are accumulated. These default to the first and second columns, naturally. Maybe this was a waste of time, since it occurs to me now that accumulate -k k -v v could be replaced by cut -fk,v | accumulate, if only cut didn't suck quite so badly. Of course one could use awk {print "$k $v" } | accumulate to escape cut's suckage. And some solution of this type obviates the need for accumulate's putative -F option also. Well, I digress.

The accumulate program itself reminds me of a much more ambitious project I worked on for a while between 1998 and 2001, as does the yucky line:

          push @{$K{$F[$opt{k}]}}, $F[$opt{v}];
The ambitious project was tentatively named "twingler".

Beginning Perl programmers often have trouble with compound data structures because Perl's syntax for the nested structures is so horrendous. Suppose, for example, that you have a reference to a two-dimensional array $aref, and you want to produce a hash, such that each value in the array appears as a key in the hash, associated with a list of strings in the form "m,n" indicating where in the array that value appeared. Well, of course it is obviously nothing more than:

        for my $a1 (0 .. $#$aref) {
          for my $a2 (0 .. $#{$aref->[$a1]}) {
            push @{$hash{$aref->[$a1][$a2]}}, "$a1,$a2";
          }
        }
Obviously. <sarcasm>Geez, a child could see that.</sarcasm>

The idea of twingler was that you would specify the transformation you wanted declaratively, and it would then write the appropriate Perl code to perform the transformation. The interesting part of this project is figuring out the language for specifying the transformation. It must be complex enough to be able to express most of the interesting transformations that people commonly want, but if it isn't at the same time much simpler than Perl itself, it isn't worth using. Nobody will see any point in learning a new declarative language for expressing Perl data transformations unless it is itself simpler to use than just writing the Perl would have been.

There are some hard problems here: What do people need? What subset of this can be expressed simply? How can we design a simple, limited language that people can use to express their needs? Can the language actually be compiled to Perl?

I had to face similar sorts of problems when I was writing linogram, but in the case of linogram I was more successful. I tinkered with twingler for some time and made several pages of (typed) notes but never came up with anything I was really happy with.

At one point I abandoned the idea of a declarative language, in favor of just having the program take a sample input and a corresponding sample output, and deduce the appropriate transformation from there. For example, you would put in:

        [ [ A, B ],
          [ C, B ],
          [ D, E ] ]
and
       { B => [A, C],
          E => [D],
        }
and it would generate:
       for my $a1 (@$input) {
          my ($e1, $e2) = @$a1;
          push @{$output{$e2}}, $e1;
        }
And then presumably you could eyeball this, and if what you really wanted was @{$a1}[0, -1] instead of @$a1 you could tinker it into the form you needed without too much extra trouble. This is much nicer from a user-experience point of view, but at the same time it seems more difficult to implement.

I had some ideas. One idea was to have it generate a bunch of expressions for mapping single elements from the input to the output, and then to try to unify those expressions. But as I said, I never did figure it out.

It's a shame, because it would have been pretty cool if I had gotten it to work.

The MIT CS grad students' handbook used to say something about how you always need to have several projects going on at once, because two-thirds of all research projects end in failure. The people you see who seem to have one success after another actually have three projects going on all the time, and you only see the successes. This is a nice example of that.


[Other articles in category /prog] permanent link

Tue, 18 Dec 2007

Happy birthday Perl!
In case you hadn't yet heard, today is the 20th anniversary of the first release of Perl.


[Other articles in category /anniversary] permanent link

Mon, 17 Dec 2007

Strangest Asian knockoff yet
A few years ago Lorrie and I had brunch at the very trendy Philadelphia restaurant "Striped Bass". I guess it wasn't too impressive, because usually if the food is really good I will remember what I ate, even years later, and I do not. But the plates were awesome. They were round, with an octagonal depression in the center, a rainbow-colored pattern around the edge, garnished with pictures of ivy leaves.

Good plates have the name of the maker on the back. These were made by Villeroy & Boch. Some time later, we visited the Villeroy & Boch outlet in Woodbury, New York, and I found the pattern I wanted, "Pasadena". The cool circular plates from Striped Bass were only for sale to restaurants, but the standard ones were octagonal, which is also pretty cool. So I bought a set. (57% off list price! Whee!)

(The picture gets much bigger when you click it.)

They no longer make these plates. If you broke one, and wanted a replacement, you could buy one online for $43.99. Ouch! But there is another option, if you are not too fussy.

Many years after I bought my dishes, I was shopping in one of the big Asian grocery stores on Washington Avenue. They have a kitchenware aisle. I found this plate:

The real VB plate is made of porcelain. The Washington Avenue knockoff is made of plastic.

Of course I bought it. It is hilarious! And it only cost two dollars.


[Other articles in category /food] permanent link

Wed, 12 Dec 2007

Rotten code in a ProFTPD plugin module
One of my work colleagues asked me to look at a piece of C source code today. He was tracking down a bug in the FTP server. He thought he had traced it to this spot, and wanted to know if I concurred and if I agreed with his suggested change.

Here's the (exceptionally putrid) (relevant portion of the) code:

static int gss_netio_write_cb(pr_netio_stream_t *nstrm, char *buf,size_t buflen) {

    int     count=0;
    int     total_count=0;        
    char    *p;

    OM_uint32   maj_stat, min_stat;
    OM_uint32   max_buf_size;

    ...
    /* max_buf_size = maximal input buffer size */
    p=buf;
    while ( buflen > total_count ) { 
        /* */ 
        if ( buflen - total_count > max_buf_size ) {
            if ((count = gss_write(nstrm,p,max_buf_size)) != max_buf_size )
                return -1;
        } else {
            if ((count = gss_write(nstrm,p,buflen-total_count)) != buflen-total_count )
                return -1;
        }       
        total_count = buflen - total_count > max_buf_size ? total_count + max_buf_size : buflen;
        p=p+total_count;
    }

    return buflen;  
}
(You know there's something wrong when the comment says "maximal input buffer size", but the buffer is for performing output. I have not looked at any of the other code in this module, which is 2,800 lines long, so I do not know if this chunk is typical.) Mr. Colleague suggested that p=p+total_count was wrong, and should be replaced with p=p+max_buf_size. I agreed that it was wrong, and that his change would fix the problem, although I suggested that p += count would be a better change. Mr. Colleague's change, although it would no longer manifest the bug, was still "wrong" in the sense that it would leave p pointing to a garbage location (and incidentally invokes behavior not defined by the C language standard) whereas my change would leave p pointing to the end of the buffer, as one would expect.

Since this is a maintenance programming task, I recommended that we not touch anything not directly related to fixing the bug at hand. But I couldn't stop myself from pointing out that the code here is remarkably badly written. Did I say "exceptionally putrid" yet? Oh, I did.

Good. It stinks like a week-old fish.

The first thing to notice is that the expression buflen - total_count appears four times in only nine lines of code—five if you count the buflen > total_count comparison. This strongly suggests that the algorithm would be more clearly expressed in terms of whatever buflen - total_count really is. Since buflen is the total number of characters to be written, and total_count is the number of characters that have been written, buflen - total_count is just the number of characters remaining. Rather than computing the same expression four times, we should rewrite the loop in terms of the number of characters remaining.

    size_t left_to_write = buflen;
    while ( left_to_write > 0 ) { 
        /* */ 
        if ( left_to_write > max_buf_size ) {
            if ((count = gss_write(nstrm,p,max_buf_size)) != max_buf_size )
                return -1;
        } else {
            if ((count = gss_write(nstrm,p,left_to_write)) != left_to_write )
                return -1;
        }       
        total_count = left_to_write > max_buf_size ? total_count + max_buf_size : buflen;
        p=p+total_count;
        left_to_write -= count;
    }
Now we should notice that the two calls to gss_write are almost exactly the same. Duplicated code like this can almost always be eliminated, and eliminating it almost always produces a favorable result. In this case, it's just a matter of introducing an auxiliary variable to record the amount that should be written:

    size_t left_to_write = buflen, write_size;
    while ( left_to_write > 0 ) { 
        write_size = left_to_write > max_buf_size ? max_buf_size : left_to_write;
        if ((count = gss_write(nstrm,p,write_size)) != write_size )
                return -1;
        total_count = left_to_write > max_buf_size ? total_count + max_buf_size : buflen;
        p=p+total_count;
        left_to_write -= count;
    }
At this point we can see that write_size is going to be max_buf_size for every write except possibly the last one, so we can simplify the logic the maintains it:

    size_t left_to_write = buflen, write_size = max_buf_size;
    while ( left_to_write > 0 ) { 
        if (left_to_write < max_buf_size) 
            write_size = left_to_write;
        if ((count = gss_write(nstrm,p,write_size)) != write_size )
                return -1;
        total_count = left_to_write > max_buf_size ? total_count + max_buf_size : buflen;
        p=p+total_count;
        left_to_write -= count;
    }
Even if we weren't here to fix a bug, we might notice something fishy: left_to_write is being decremented by count, but p, the buffer position, is being incremented by total_count instead. In fact, this is exactly the bug that was discovered by Mr. Colleague. Let's fix it:

    size_t left_to_write = buflen, write_size = max_buf_size;
    while ( left_to_write > 0 ) { 
        if (left_to_write < max_buf_size) 
            write_size = left_to_write;
        if ((count = gss_write(nstrm,p,write_size)) != write_size )
                return -1;
        total_count = left_to_write > max_buf_size ? total_count + max_buf_size : buflen;
        p += count;
        left_to_write -= count;
    }
We could fix up the line the maintains the total_count variable so that it would be correct, but since total_count isn't used anywhere else, let's just delete it.

    size_t left_to_write = buflen, write_size = max_buf_size;
    while ( left_to_write > 0 ) { 
        if (left_to_write < max_buf_size) 
            write_size = left_to_write;
        if ((count = gss_write(nstrm,p,write_size)) != write_size )
                return -1;
        p += count;
        left_to_write -= count;
    }
Finally, if we change the != write_size test to < 0, the function will correctly handle partial writes, should gss_write be modified in the future to perform them:
    size_t left_to_write = buflen, write_size = max_buf_size;
    while ( left_to_write > 0 ) { 
        if (left_to_write < max_buf_size) 
            write_size = left_to_write;
        if ((count = gss_write(nstrm,p,write_size)) < 0 )
                return -1;
        p += count;
        left_to_write -= count;
    }
We could trim one more line of code and one more state change by eliminating the modification of p:

    size_t left_to_write = buflen, write_size = max_buf_size;
    while ( left_to_write > 0 ) { 
        if (left_to_write < max_buf_size) 
            write_size = left_to_write;
        if ((count = gss_write(nstrm,p+buflen-left_to_write,write_size)) < 0 )
                return -1;
        left_to_write -= count;
    }
I'm not sure I think that is an improvement. (My idea is that if we do this, it would be better to create a p_end variable up front, set to p+buflen, and then use p_end - left_to_write in place of p+buflen-left_to_write. But that adds back another variable, although it's a constant one, and the backward logic in the calculation might be more confusing than the thing we were replacing. Like I said, I'm not sure. What do you think?)

Anyway, I am sure that the final code is a big improvement on the original in every way. It has fewer bugs, both active and latent. It has the same number of variables. It has six lines of logic instead of eight, and they are simpler lines. I suspect that it will be a bit more efficient, since it's doing the same thing in the same way but without the redundant computations, although you never know what the compiler will be able to optimize away.

Right now I'm engaged in writing a book about this sort of cleanup and renovation for Perl programs. I've long suspected that the same sort of processes could be applied to C programs, but this is the first time I've actually done it.

Order
Advanced Unix Programming
Advanced Unix Programming
with kickback
no kickback
The funny thing about this code is that it's performing a task that I thought every C programmer would already have known how to do: block-writing of a bufferfull of data. Examples of the right way to do this are all over the place. I first saw it done in Marc J. Rochkind's superb book Advanced Unix Programming around 1989. (I learned from the first edition, but the link to the right is for the much-expanded second edition that came out in 2004.) I'm sure it must pop up all over the Stevens books.

But the really exciting thing I've learned about code like this is that it doesn't matter if you don't already know how to do it right, because you can turn the wrong code into the right code, as we did here, by noticing a few common problems, like duplicate tests and repeated subexpressions, and applying a few simple refactorizations to get rid of them. That's what my book will be about.

(I am also very pleased that it has taken me 37 blog entries to work around to discussing any programming-related matters.)


[Other articles in category /prog] permanent link

Really real examples of HOP techniques in action
I recently stopped working for the University of Pennsylvannia's Informations Systems and Computing group, which is the organization that provides computer services to everyone on campus that doesn't provide it for themselves.

I used HOP stuff less than I might have if I hadn't written the HOP book myself. There's always a tradeoff with the use of any advanced techniques: it might provide some technical benefit, like making the source code smaller, but the drawback is that the other people you work with might not be able to maintain it. Since I'm the author of the book, I can be expected to be biased in favor of the techniques. So I tried to compensate the other way, and to use them only when I was absolutely sure it was the best thing to do.

There were two interesting uses of HOP techniques. One was in the username generator for new accounts. The other was in a generic server module I wrote.

Name generation

The name generator is used to offer account names to incoming students and faculty. It is given the user's full name, and optionally some additional information of the same sort. It then generates a bunch of usernames to offer the user. For example, if the user's name is "George Franklin Bauer, Jr.", it might generate usernames like:

        george    bauer     georgef   fgeorge   fbauer    bauerf
        gf        georgeb   fg        fb        bauerg    bf
        georgefb  georgebf  fgeorgeb  fbauerg   bauergf   bauerfg
        ge        ba        gef       gbauer    fge       fba
        bgeorge   baf       gfbauer   gbauerf   fgbauer   fbgeorge
        bgeorgef  bfgeorge  geo       bau       geof      georgeba
        fgeo      fbau      bauerge   bauf      fbauerge  bauergef
        bauerfge  geor      baue      georf     gb        fgeor
        fbaue     bg        bauef     gfb       gbf       fgb
        fbg       bgf       bfg       georg     georgf    gebauer
        fgeorg    bageorge  gefbauer  gebauerf  fgebauer
The code that did this, before I got to it, was extremely long and convoluted. It was also extremely slow. It would generate a zillion names (slowly) and then truncate the list to the required length.

It was convoluted because people kept asking that the generation algorithm be tweaked in various ways. Each tweak was accompanied by someone hacking on the code to get it to do things a little differently.

I threw it all away and replaced it with a lazy generator based on the lazy stream stuff of Chapter 6. The underlying stream library was basically the same as the one in Chapter 6. Atop this, I built some functions that generated streams of names. For example, one requirement was that if the name generator ran out of names like the examples above, it should proceed by generating names that ended with digits. So:

        sub suffix {
          my ($s, $suffix) = @_;
          smap { "$_$suffix" } $s;          
        }       

        # Given (a, b, c), produce a1, b1, c1, a2, b2, c2, a3...
        sub enumerate {
          my $s = shift;
          lazyappend(smap { suffix($s, $_) } iota());
        }

        # Given (a, b, c), produce a, b, c, a1, b1, c1, a2, b2, c2, a3...
        sub and_enumerate {
          my $s = shift;
          append($s, enumerate($s));
        }

        # Throw away names that are already used
        sub available_filter {
          my ($s, $pn) = @_;
          $pn ||= PennNames::Generate::InUse->new;
          sgrep { $pn->available($_) } $s;
        }
The use of the stream approach was strongly indicated here for two reasons. First, the number of names to generate wasn't known in advance. It was convenient for the generation module to pass back a data structure that encapsulated an unlimited number of names, and let the caller mine it for as many names as were necessary.

Second, the frequent changes and tinkerings to the name generation algorithm in the past suggested that an extremely modular approach would be a benefit. In fact, the requirements for the generation algorithm chanced several times as I was writing the code, and the stream approach made it really easy to tinker with the order in which names were generated, by plugging together the prefabricated stream modules.

Generic server

For a different project, I wrote a generic forking server module. The module would manage a listening socket. When a new connection was made to the socket, the module would fork. The parent would go back to listening; the child would execute a callback function, and exit when the callback returned.

The callback was responsible for communicating with the client. It was passed the client socket:

        sub child_callback {
          my $socket = shift;
          # ... read and write the socket ...
          return;   # child process exits
        }
But typically, you don't want to have to manage the socket manually. For example, the protocol might be conversational: read a request from the client, reply to it, and so forth:

        # typical client callback:
        sub child_callback {
          my $socket = shift; 
          while (my $request = <$socket>) {
            # generate response to request
            print $socket $response;
          }
        }
The code to handle the loop and the reading and writing was nontrivial, but was going to be the same for most client functions. So I provided a callback generator. The input to the callback generator is a function that takes requests and returns appropriate responses:

        sub child_behavior {
          my $request = shift;
          if ($request =~ /^LOOKUP (\w+)/) {
            my $input = $1;
            if (my $result = lookup($input)) {
              return "OK $input $result";
            } else {
              return "NOK $input";
            }
          } elsif ($request =~ /^QUIT/) {
            return;
          } elsif ($request =~ /^LIST/) {
            my $N = my @N = all_names();
            return join "\n", "OK $N", @N, ".";
          } else {
            return "HUH?";
          }
        }
This child_behavior function is not suitable as a callback, because the argument to the callback is the socket handle. But the child_behavior function can be turned into a callback:

        $server->run(CALLBACK => make_callback(\&child_behavior));
make_callback() takes a function like child_behavior() and wraps it up in an I/O loop to turn it into a callback function. make_callback() looks something like this:

        sub make_callback {
          my $behavior = shift;
          return sub {
            my $socket = shift;
            while (my $request = <$socket>) {
              chomp $request;
              my $response = $behavior->($request);
              return unless defined $response;
              print $socket $response;
            }
          };
        }
I think this was the right design; it kept the design modular and flexible, but also simple.


[Other articles in category /prog] permanent link

Tue, 11 Dec 2007

More notes on power series
It seems I wasn't done thinking about this. I pointed out in yesterday's article that, having defined the cosine function as:

     coss = zipWith (*) (cycle [1,0,-1,0]) (map ((1/) . fact) [0..])
one has the choice to define the sine function analogously:

     sins = zipWith (*) (cycle [0,1,0,-1]) (map ((1/) . fact) [0..])
or in a totally different way, by reference to cosine:

     sins = (srt . (add one) . neg . sqr) coss
Here is a third way. Sine and cosine are solutions of the differential equation f = -f''. Since I now have enough infrastructure to get Haskell to solve differential equations, I can use this to define sine and cosine:

    solution_of_equation f0 f1 = func
        where func = int f0 (int f1 (neg func))
 
    sins = solution_of_equation 0 1
    coss = solution_of_equation 1 0
The constants f0 and f1 specify the initial conditions of the differential equation, values for f(0) and f'(0), respectively.

Well, that was fun.

One problem with the power series approach is that the answer you get is not usually in a recognizable form. If what you get out is

     [1.0,0.0,-0.5,0.0,0.0416666666666667,0.0,-0.00138888888888889,0.0,2.48015873015873e-05,0.0,...]
then you might recognize it as the cosine function. But last night I couldn't sleep because I was wondering about the equation f·f' = 1, so I got up and put it in, and out came:

     [1.0,1.0,-0.5,0.5,-0.625,0.875,-1.3125,2.0625,-3.3515625,5.5859375,-9.49609375,16.40234375,...]
Okay, now what? Is this something familiar? I'm wasn't sure. One thing that might help a bit is to get the program to disgorge rational numbers rather than floating-point numbers. But even that won't completely solve the problem.

One thing I was thinking about in the shower is doing Fourier analysis; this should at least identify the functions that are sinusoidal. Suppose that we know (or believe, or hope) that some power series a1x + a3x3 + ... actually has the form c1 sin x + c2 sin 2x + c3 sin 3x + ... . Then we can guess the values of the ci by solving a system of n equations of the form:

$$\sum_{i=1}^n i^kc_i = k!a_k\qquad{\hbox{($k$ from 1 to $n$)}}$$

And one ought to be able to do something analogous, and more general, by including the cosine terms as well. I haven't tried it, but it seems like it might work.

But what about more general cases? I have no idea. f you have the happy inspiration to square the mystery power series above, you get [1, 2, 0, 0, 0, ...], so it is √(2x+1), but what if you're not so lucky? I wasn't; I solved it by a variation of Gareth McCaughan's method of a few days ago: f·f' is the derivative of f2/2, so integrate both sides of f·f' = 1, getting f2/2 = x + C, and so f = √(2x + C). Only after I had solved the equation this way did I try squaring the power series, and see that it was simple.

I'll keep thinking.


[Other articles in category /math] permanent link

Lazy square roots of power series return
In an earlier article I talked about wanting to use lazy streams to calculate the power series expansion of the solution of this differential equation:

$$(f(x))^2 + (f'(x))^2 = 1$$

To do that I decided I would need a function to calculate the square root of a power series, which I did figure out; it's in the earlier article. But then I got distracted with other issues, and then folks wrote to me with several ways to solve the differential equation, and I spent a lot of time writing that up, and I didn't get back to the original problem until today, when I had to attend the weekly staff meeting. I get a lot of math work done during that meeting.

At least one person wrote to ask me for the Haskell code for the power series calculations, so here's that first off.

A power series a0 + a1x + a2x2 + a3x3 + ... is represented as a (probably infinite) list of numbers [a0, a1, a2, ...]. If the list is finite, the missing terms are assumed to be all 0.

The following operators perform arithmetic on functions:

	-- add functions a and b
	add [] b = b
	add a [] = a
	add (a:a') (b:b') = (a+b) : add a' b'

	-- multiply functions a and b
	mul [] _ = []
	mul _ [] = []
	mul (a:a') (b:b') = (a*b) : add (add (scale a b')
					     (scale b a'))
					(0 : mul a' b')

	-- termwise multiplication of two series
        mul2 = zipWith (*)

        -- multiply constant a by function b
    	scale a b = mul2 (cycle [a]) b
	neg a = scale (-1) a
And there are a bunch of other useful utilities:

	-- 0, 1, 2, 3, ...
	iota = 0 : zipWith (+) (cycle [1]) iota
	-- 1, 1/2, 1/3, 1/4, ...
	iotaR = map (1/) (tail iota)

	-- derivative of function a
	deriv a = tail (mul2 iota a)

	-- integral of function a
	-- c is the constant of integration
	int c a = c : (mul2 iotaR a)

	-- square of function f
	sqr f = mul f f

	-- constant function
	con c = c : cycle [0]
	one = con 1
Order
Structure and Interpretation of Computer Programs
Structure and Interpretation of Computer Programs
with kickback
no kickback
The really interesting operators perform division and evolve square roots of functions. I discussed how these work in the earlier article. The reciprocal operation is well-known; it appears in Structure and Interpretation of Computer Programs, Higher-Order Perl, and I presume elsewhere. I haven't seen the square root extractor anywhere else, but I'm sure that's just because I haven't looked.

	-- reciprocal of argument function
	inv (s0:st) = r
	  where r = r0 : scale (negate r0) (mul r st)
		r0 = 1/s0

	-- divide function a by function b
	squot a b = mul a (inv b)

	-- square root of argument function
	srt (s0:s) = r 
	   where r = r0 : (squot s (add [r0] r))
		 r0 = sqrt(s0)

We can define the cosine function as follows:

	coss = zipWith (*) (cycle [1,0,-1,0]) (map ((1/) . fact) [0..])
We could define the sine function analogously, or we can say that sin(x) = √(1 - cos2(x)):

	sins = (srt . (add one) . neg . sqr) coss
This works fine.

Order
How to Solve It
How to Solve It
with kickback
no kickback
Okay, so as usual that is not what I wanted to talk about; I wanted to show how to solve the differential equation. I found I was getting myself confused, so I decided to try to solve a simpler differential equation first. (Pólya says: "Can you solve a simpler problem of the same type?" Pólya is a smart guy. When the voice talking in your head is Pólya's, you better pay attention.) The simplest relevant differential equation seemed to be f = f'. The first thing I tried was observing that for all f, f = f0 : mul2 iotaR f'. This yields the code:

     f = f0 : mul2 iotaR (deriv f)
This holds for any function, and so it's unsolvable. But if you combine it with the differential equation, which says that f = f', you get:

     f = f0 : mul2 iotaR f
       where f0 = 1   -- or whatever the initial conditions dictate
and in fact this works just fine. And then you can observe that this is just the definition of int; replacing the definition with the name, we have:

     f = int f0 f
       where f0 = 1   -- or whatever
This runs too, and calculates the power series for the exponential function, as it should. It's also transparently obvious, and makes me wonder why it took me so long to find. But I was looking for solutions of the form:

     f = deriv f
which Haskell can't figure out. It's funny that it only handles differential equations when they're expressed as integral equations. I need to meditate on that some more.

It occurs to me just now that the f = f0 : mul2 iotaR (deriv f) identity above just says that the integral and derivative operators are inverses. These things are always so simple in hindsight.

Anyway, moving along, back to the original problem, instead of f = f', I want f2 + (f')2 = 1, or equivalently f' = √(1 - f2). So I take the derivative-integral identity as before:

     f = int f0 (deriv f)
and put in √(1 - f2) for deriv f:

     f = int f0 ((srt . (add one) . neg . sqr) f)
       where f0 = sqrt 0.5   -- or whatever
And now I am done; Haskell cheerfully generates the power series expansion for f for any given initial condition. (The parameter f0 is precisely the desired value of f(0).) For example, when f(0) = √(1/2), as above, the calculated terms show the function to be exactly √(1/2)·(sin(x) + cos(x)); when f(0) = 0, the output terms are exactly those of sin(x). When f(0) = 1, the output blows up and comes out as [1, 0, NaN, NaN, ...]. I'm not quite sure why yet, but I suspect it has something to do with there being two different solutions that both have f(0) = 1.

Order
Higher-Order Perl
Higher-Order Perl
with kickback
no kickback
All of this also works just fine in Perl, if you build a suitable lazy-list library; see chapter 6 of HOP for complete details. Sample code is here. For a Scheme implementation, see SICP. For a Java, Common Lisp, Python, Ruby, or SML implementation, do the obvious thing.

But anyway, it does work, and I thought it might be nice to blog about something I actually pursued to completion for a change. Also I was afraid that after my week of posts about Perl syntax, differential equations, electromagnetism, Unix kernel internals, and paint chips in the shape of Austria, the readers of Planet Haskell, where my blog has recently been syndicated, were going to storm my house with torches and pitchforks. This article should mollify them for a time, I hope.

[ Addendum 20071211: Some additional notes about this. ]



[Other articles in category /math] permanent link

Sun, 09 Dec 2007

Four ways to solve a nonlinear differential equation
In a recent article I mentioned the differential equation:

$$(f(x))^2 + \left(df(x)\over dx\right)^2 = 1$$

which I was trying to solve by various methods. The article was actually about calculating square roots of power series; I got sidetracked on this. Before I got back to the original equation, twofour readers of this blog had written in with solutions, all different.

I got interested in this a few weeks ago when I was sitting in on a freshman physics lecture at Penn. I took pretty much the same class when I was a freshman, but I've never felt like I really understood physics. Sitting in freshman physics class again confirms this. Every time I go to a class, I come out with bigger questions than I went in.

The instructor was talking about LC circuits, which are simple circuits with a capacitor (that's the "C") and an inductor (that's the "L", although I don't know why). The physics people claim that in such a circuit the capacitor charges up, and then discharges again, repeatedly. When one plate of the capacitor is full of electrons, the electrons want to come out, and equalize the charge on the plates, and so they make a current flowing from the negative to the positive plate. Without the inductor, the current would fall off exponentially, as the charge on the plates equalized. Eventually the two plates would be equally charged and nothing more would happen.

But the inductor generates an electromotive force that tends to resist any change in the current through it, so the decreasing current in the inductor creates a force that tends to keep the electrons moving anyway, and this means that the (formerly) positive plate of the capacitor gets extra electrons stuffed into it. As the charge on this plate becomes increasingly negative, it tends to oppose the incoming current even more, and the current does eventually come to a halt. But by that time a whole lot of electrons have moved from the negative to the positive plate, so that the positive plate has become negative and the negative plate positive. Then the electrons come out of the newly-negative plate and the whole thing starts over again in reverse.

In practice, of course, all the components offer some resistance to the current, so some of the energy is dissipated as heat, and eventually the electrons stop moving back and forth.

Anyway, the current is nothing more nor less than the motion of the electrons, and so it is proportional to the derivative of the charge in the capacitor. Because to say that current is flowing is exactly the same as saying that the charge in the capacitor is changing. And the magnetic flux in the inductor is proportional to rate of change of the current flowing through it, by Maxwell's laws or something.

The amount of energy in the whole system is the sum of the energy stored in the capacitor and the energy stored in the magnetic field of the inductor. The former turns out to be proportional to the square of the charge in the capacitor, and the latter to the square of the current. The law of conservation of energy says that this sum must be constant. Letting f(t) be the charge at time t, then df/dt is the current, and (adopting suitable units) one has:

$$(f(x))^2 + \left(df(x)\over dx\right)^2 = 1$$

which is the equation I was considering.

Anyway, the reason for this article is mainly that I wanted to talk about the different methods of solution, which were all quite different from each other. Isabel Lugo went ahead with the power series approach I was using. Say that:

 $$ \halign{\hfil $\displaystyle #$&$\displaystyle= #$\hfil\cr f & \sum_{i=0}^\infty a_{i}x^{i} \cr f' & \sum_{i=0}^\infty (i+1)a_{i+1}x^{i} \cr } $$

Then:

 $$ \halign{\hfil $\displaystyle #$&$\displaystyle= #$\hfil\cr f^2 & \sum_{i=0}^\infty \sum_{j=0}^{i} a_{i-j} a_j x^{i} \cr (f')^2 & \sum_{i=0}^\infty \sum_{j=0}^{i} (i-j+1)a_{i-j+1}(j+1)a_{j+1} x^{i} \cr } $$

And we want the sum of these two to be equal to 1.

Equating coefficients on both sides of the equation gives us the following equations:

$a_0^2 + a_1^2$ = 1
$2a_0a_1 + 4a_1a_2$ = 0
$2a_0a_2 + a_1^2 + 6a_1a_3 + 4a_2^2$ = 0
$2a_0a_3 + 2a_1a_2 + 8a_1a_4 + 12a_2a_3$ = 0
$2a_0a_4 + 2a_1a_3 + a_2^2 + 10a_1a_5 + 16a_2a_4 + 9a_3^2$ = 0
...
Now here's the thing M. Lugo noticed that I didn't. You can separate the terms involving even subscripts from those involving odd subscripts. Suppose that a0 and a1 are both nonzero. The polynomial from the second line of the table, 2a0a1 + 4a1a2, factors as 2a1(a0 + 2a2), and one of these factors must be zero, so we immediately have a2 = -a0/2.

Now take the next line from the table, 2a0a2 + a12 + 6a1a3 + 4a22. This can be separated into the form 2a2(a0 + 2a2) + a1(a1 + 6a3). The left-hand term is zero, by the previous paragraph, and since the whole thing equals zero, we have a3 = -a1/6.

Continuing in this way, we can conclude that a0 = -2!a2 = 4!a4 = -6!a6 = ..., and that a1 = -3!a3 = 5!a5 = ... . These should look familiar from first-year calculus, and together they imply that f(x) = a0 cos(x) + a1 sin(x), where (according to the first line of the table) a02 + a12 = 1. And that is the complete solution of the equation, except for the case we omitted, when either a0 or a1 is zero; these give the trivial solutions f(x) = ±1.

Okay, that was a lot of algebra grinding, and if you're not as clever as M. Lugo, you might not notice that the even terms of the series depend only on a0 and the odd terms only on a1; I didn't. I thought they were all mixed together, which is why I alluded to "a bunch of not-so-obvious solutions" in the earlier article. Is there a simpler way to get the answer?

Gareth McCaughan wrote to me to point out a really clever trick that solves the equation right off. Take the derivative of both sides of the equation; you immediately get 2ff' + 2f'f'' = 0, or, factoring out f', f'(f + f'') = 0. So there are two solutions: either f'=0 and f is a constant function, or f + f'' = 0, which even the electrical engineers know how to solve.

David Speyer showed a third solution that seems midway between the two in the amount of clever trickery required. He rewrote the equation as:

$${df\over dx} = \sqrt{1 - f^2}$$

$${df\over\sqrt{1 - f^2} } = dx$$

The left side is an old standby of calculus I classes; it's the derivative of the arcsine function. On integrating both sides, we have:

$$\arcsin f = x + C$$

so f = sin(x + C). This is equivalent to the a0 cos(x) + a1 sin(x) form that we got before, by an application of the sum-of-angles formula for the sine function. I think M. McCaughan's solution is slicker, but M. Speyer's is the only one that I feel like I should have noticed myself.

Finally, Walt Mankowski wrote to tell me that he had put the question to Maple, which disgorged the following solution after a few seconds:

  f(x) = 1, f(x) = -1, f(x) = sin(x - _C1), f(x) = -sin(x - _C1).
This is correct, except that the appearance of both sin(x + C) and -sin(x + C) is a bit odd, since -sin(x + C) = sin(x + (C + π)). It seems that Maple wasn't clever enough to notice that. Walt says he will ask around and see if he can find someone who knows what Maple did to get the solution here.

I would like to add a pithy and insightful conclusion to this article, but I've been working on it for more than a week now, and also it's almost lunch time, so I think I'll have to settle for observing that sometimes there are a lot of ways to solve math problems.

Thanks again to everyone who wrote in about this.


[Other articles in category /math] permanent link

19th-century elementary arithmetic
In grade school I read a delightful story, by C. A. Stephens, called The Jonah. In the story, which takes place in 1867, Grandma and Grandpa are away for the weekend, leaving the kids alone on the farm. The girls make fried pies for lunch.

They have a tradition that one or two of the pies are "Jonahs": they look the same on the outside, but instead of being filled with fruit, they are filled with something you don't want to eat, in this case a mixture of bran and cayenne pepper. If you get the Jonah pie, you must either eat the whole thing, or crawl under the table to be a footstool for the rest of the meal.

Just as they are about to serve, a stranger knocks at the door. He is an old friend of Grandpa's. They invite him to lunch, of course removing the Jonahs from the platter. But he insists that they be put back, and he gets the Jonah, and crawls under the table, marching it around the dining room on his back. The ice is broken, and the rest of the afternoon is filled with laughter and stories.

Later on, when the grandparents return, the kids learn that the elderly visitor was none other than Hannibal Hamlin, formerly Vice-President of the United States.

A few years ago I tried to track this down, and thanks to the Wonders of the Internet, I was successful. Then this month I had the library get me some other C. A. Stephens stories, and they were equally delightful and amusing.

In one of these, the narrator leaves the pump full of water overnight, and the pipe freezes solid. He then has to carry water for forty head of cattle, in buckets from the kitchen, in sub-freezing weather. He does eventually manage to thaw the pipe. But why did he forget in the first place? Because of fractions:

I had been in a kind of haze all day over two hard examples in complex fractions at school. One of them I still remember distinctly:

 $${7\over8} \; {\rm of} \; {60 {5\over10} \over 10 {3\over8}} \; {\rm of} \; {8\over 5} \; \div \; 8{68\over 415} = {\rm What?}$$

At that point I had to stop reading and calculate the answer, and I recommend that you do the same.

I got the answer wrong, by the way. I got 25/64 or 64/25 or something of the sort, which suggests that I flipped over an 8/5 somewhere, because the correct answer is exactly 1. At first I hoped perhaps there was some 19th-century precedence convention I was getting wrong, but no, it was nothing like that. The precedence in this problem is unambiguous. I just screwed up.

Entirely coincidentally (I was investigating the spelling of the word "canceling") I also recently downloaded (from Google Books) an arithmetic text from the same period, The National Arithmetic, on the Inductive System, by Benjamin Greenleaf, 1866. Here are a few typical examples:

  1. If 7/8 of a bushel of corn cost 63 cents, what cost a bushel? What cost 15 bushels?

  2. When 14 7/8 tons of copperas are sold for $500, what is the value of 1 ton? what is the value of 9 11/12 tons?

  3. If a man by laboring 15 hours a day, in 6 days can perform a certain piece of work, how many days would it require to do the same work by laboring 10 hours a day?

  4. Bought 87 3/7 yards of broadcloth for $612; what was the value for 14 7/10 yards?

  5. If a horse eat 19 3/7 bushels of oats in 87 3/7 days, how many will 7 horses eat in 60 days?

Some of these are rather easy, but others are a long slog. For example, #1 and #3 here (actually #1 and #25 in the book) can be solved right off, without paper. But probably very few people have enough skill at mental arithmetic to carry off $612/(83 3/7) * (14 7/10) in their heads.

The "complex fractions" section, which the original problem would have fallen under, had it been from the same book, includes problems like this: "Add 1/9, 2 5/8, 45/(94 7/11), and (47 5/9)/(314 3/5) together." Such exercises have gone out of style, I think.

In addition to the complicated mechanical examples, there is some good theory in the book. For example, pages 227–229 concern continued fraction expansions of rational numbers, as a tool for calculating simple rational approximations of rationals. Pages 417–423 concern radix-n numerals, with special attention given to the duodecimal system. A typical problem is "How many square feet in a floor 48 feet 6 inches long, and 24 feet 3 inches broad?" The remarkable thing here is that the answer is given in the form 1176 sq. feet. 1' 6'', where the 1' 6'' actually means 1/12 + 6/144 square feet— that is, it is a base-12 "decimal".

I often hear people bemoaning the dumbing-down of the primary and secondary school mathematics curricula, and usually I laugh at those people, because (for example) I have read a whole stack of "College Algebra" books from the early 20th century, which deal in material that is usually taken care of in 10th and 11th grades now. But I think these 19th-century arithmetics must form some part of an argument in the other direction.

On the other hand, those same people often complain that students' time is wasted by a lot of "new math" nonsense like base-12 arithmetic, and that we should go back to the tried and true methods of the good old days. I did not have an example in mind when I wrote this paragraph, but two minutes of Google searching turned up the following excellent example:

Most forms of life develop random growths which are best pruned off. In plants they are boles and suckerwood. In humans they are warts and tumors. In the educational system they are fashionable and transient theories of education created by a variety of human called, for example, "Professor Of The Teaching Of Mathematics."

When the Russians launched Sputnik these people came to the rescue of our nation; they leapfrogged the Russians by creating and imposing on our children the "New Math."

They had heard something about digital computers using base 2 arithmetic. They didn't know why, but clearly base 10 was old fashioned and base 2 was in. So they converted a large fraction of children's arithmetic education to learning how to calculate with any base number and to switch from base to base. But why, teacher? Because that is the modern way. No one knows how many potential engineers and scientists were permanently turned away by this inanity.

Fortunately this lunacy has now petered out.

(Smart Machines, by Lawrence J. Kamm; chapter 11, "Smart Machines in Education".)

Pages 417–423 of The National Arithmetic, with their problems on the conversion from base-6 to base-11 numerals, suggest that those people may not know what they are talking about.


[Other articles in category /math] permanent link

Sat, 08 Dec 2007

Corrections about sync(2)
I made some errors in today's post about sync and fsync.

Most important, I said that "the sync() system call marks all the kernel buffers as dirty". This is totally wrong, and doesn't even make sense. Dirty buffers are those with data that needs to be written out. Marking a non-dirty buffer as dirty is a waste of time, since nothing has changed in the buffer, but it will now be rewritten anyway. What sync() does is schedule all the dirty buffers to be written as soon as possible.

On some recent systems, sync() actually waits for all the dirty buffers to be written, and a bunch of people tried to correct me about this. But my original article was right: historically, it was not so, and even today it's not universally true. In former times, sync() would schedule the buffers for writing, and then return before the data was actually written.

I said that one of the duties of init was to call sync() every thirty seconds, but this was mistaken. That duty actually fell to a separate program, known as update. While discussing this with one of the readers who wrote to correct me, I looked up the source for Version 7 Unix, to make sure I was right, and it's so short I thought I might as well show it here:

        /*
         * Update the file system every 30 seconds.
         * For cache benefit, open certain system directories.
         */

        #include <signal.h>

        char *fillst[] = {
                "/bin",
                "/usr",
                "/usr/bin",
                0,
        };

        main()
        {
                char **f;

                if(fork())
                        exit(0);
                close(0);
                close(1);
                close(2);
                for(f = fillst; *f; f++)
                        open(*f, 0);
                dosync();
                for(;;)
                        pause();
        }

        dosync()
        {
                sync();
                signal(SIGALRM, dosync);
                alarm(30);
        }
The program is so simple I don't have much more to say about it. It initially invokes dosync(), which calls sync() and then schedules another call to dosync() in 30 seconds. Note that the 0 in the second argument to open had not yet been changed to O_RDONLY. The pause() call is equivalent to sleep(0): it causes the process to relinquish its time slice whenever it is active.

In various systems more recent than V7, the program was known by various names, but it was update for a very long time.

Several people wrote to correct me about the:

        # sync
        # sync
        # sync
        # halt
thing, some saying that I had the reason wrong, or that it did not make sense, or that only two syncs were used, rather than three. But I had it right. People did use three, and they did it for the reason I said, whether that makes sense or not. (Some of the people who miscorrected me were unaware that sync() would finish and exit before the data was actually written.) But for example, see this old Usenet thread for a discussion of the topic that confirms what I said.

Nobody disputed my contention that Linus was suffering from the promptings of the Evil One when he tried to change the semantics of fsync(), and nobody seems to know the proper name of the false god of false efficiency. I'll give this some thought and see what I can come up with.

Thanks to Tony Finch, Dmitry Kim, and Stefan O'Rear for discussion of these points.


[Other articles in category /Unix] permanent link

Dirty, dirty buffers!
One side issue that arose during my talk on Monday about inodes was the write-buffering normally done by Unix kernels. I wrote a pretty long note to the PLUG mailing list about it, and I thought I'd repost it here.

When your process asks the kernel to write data:

        int bytes_written = write(file_descriptor,
                                  buffer,
                                  n_bytes);
the kernel normally copies the data from your buffer into a kernel buffer, and then, instead of writing out the data to disk, it marks its buffer as "dirty" (that is, as needing to be written eventually), and reports success back to the process immediately, even though the dirty buffer has not yet been written, and the data is not yet on the disk.

Normally, the kernel writes out the dirty buffer in due time, and the data makes it to the disk, and you are happy because your process got to go ahead and do some more work without having to wait for the disk, which could take milliseconds. ("A long time", as I so quaintly called it in the talk.) If some other process reads the data before it is written, that is okay, because the kernel can give it the updated data out of the buffer.

But if there is a catastrophe, say a power failure, then you see the bad side of this asynchronous writing technique, because the data, which your process thought had been written, and which the kernel reported as having been written, has actually been lost.

There are a number of mechanisms in place to deal with this. The oldest is the sync() system call, which marks all the kernel buffers as dirty. All Unix systems run a program called init, and one of init's principal duties is to call sync() every thirty seconds or so, to make sure that the kernel buffers get flushed to disk at least every thirty seconds, and so that no crash will lose more than about thirty seconds' worth of data.

(There is also a command-line program sync which just does a sync() call and then exits, and old-time Unix sysadmins are in the habit of halting the system with:

        # sync
        # sync
        # sync
        # halt
because the second and third syncs give the kernel time to actually write out the buffers that were marked dirty by the first sync. Although I suspect that few of them know why they do this. I swear I am not making this up.)

But for really crucial data, sync() is not enough, because, although it marks the kernel buffers as dirty, it still does not actually write the data to the disk.

So there is also an fsync() call; I forget when this was introduced. The process gives fsync() a file descriptor, and the call demands that the kernel actually write the associated dirty buffers to disk, and does not return until they have been. And since, unlike write(), it actually waits for the data to go to the disk, a successful return from fsync() indicates that the data is truly safe.

The mail delivery agent will use this when it is writing your email to your mailbox, to make sure that no mail is lost.

Some systems have an O_SYNC flag than the process can supply when it opens the file for writing:

        int fd = open("blookus", O_WRONLY | O_SYNC);
This sets the O_SYNC flag in the kernel file pointer structure, which means that whenever data is written to this file pointer, the kernel, contrary to its usual practice, will implicitly fsync() the descriptor.

Well, that's not what I wanted to write about here. What I meant to discuss was...

No, wait. That is what I wanted to write about. How about that?

Anyway, there's an interesting question that arises in connection with fsync(): suppose you fsync() a file. That guarantees that the data will be written. But does it also guarantee that the mtime and the file extent of the file will be updated? That is, does it guarantee that the file's inode will be written?

On most systems, yes. But on some versions of Linux's ext2 filesystem, no. Linus himself broke this as a sacrifice to the false god of efficiency, a very bad decision in my opinion, for reasons that should be obvious to everyone but those in the thrall of Mammon. (Mammon's not right here. What is the proper name of the false god of efficiency?)

Sanity eventually prevailed. Recent versions of Linux have an fsync() call, which updates both the data and the inode, and a fdatasync() call, which only guarantees to update the data.

[ Addendum 20071208: Some of this is wrong. I posted corrections. ]


[Other articles in category /Unix] permanent link

Fri, 07 Dec 2007

Freshman electromagnetism questions
As I haven't quite managed to mention here before, I have occasionally been sitting in on one of Penn's first-year physics classes, about electricity and magnetism. I took pretty much the same class myself during my freshman year of college, so all the material is quite familiar to me.

But, as I keep saying here, I do not understand physics very well, and I don't know much about it. And every time I go to a freshman physics lecture I come out feeling like I understand it less than I went in.

I've started writing down my questions in class, even though I don't really have anyone to ask them to. (I don't want to take up the professor's time, since she presumably has her hands full taking care of the paying customers.) When I ask people I know who claim to understand physics, they usually can't give me plausible answers.

Maybe I should mutter something here under my breath about how mathematicians and mathematics students are expected to have a better grasp on fundamental matters.

The last time this came up for me I was trying to understand the phenomenon of dissolving. Specifically, why does it usually happen that substances usually dissolve faster and more thoroughly in warmer solutions than in cooler solutions? I asked a whole bunch of people about this, up to and including a full professor of physical chemistry, and never got a decent answer.

The most common answer, in fact, was incredibly crappy: "the warm solution has higher entropy". This is a virtus dormitiva if ever there was one. There's a scene in a play by Molière in which a candidate for a medical degree is asked by the examiners why opium puts people to sleep. His answer, which is applauded by the examiners, is that it puts people to sleep because it has a virtus dormitiva. That is, a sleep-producing power. Saying that warm solutions dissolve things better than cold ones because they have more entropy is not much better than saying that it is because they have a virtus dormitiva.

The entropy is not a real thing; it is a reification of the power that warmer substances have to (among other things) dissolve solutes more effectively than cooler ones. Whether you ascribe a higher entropy to the the warm solution, or a virtus dissolva to it, comes to the same thing, and explains nothing. I was somewhat disgusted that I kept getting this non-answer. (See my explanation of why we put salt on sidewalks when it snows to see what sort of answer I would have preferred. Probably there is some equally useless answer one could have given to that question in terms of entropy.)

(I have similar concerns about the notion of energy itself, which is central to physics, and yet seems to me to be another example of a false reification. There are dozens of apparently unrelated physical phenomena, which we throw into the same bin and call "energy". There are positions in gravitational and electric fields, linear motion, mass, rotation, heat, amplitude of waves, and so on, and all of these things seem to be interconvertible, more or less, and certain quantities of each can be converted into certain quantities of the others. But is there really any such thing as just plain energy, apart from its imagined association with these real phenomena? I think perhaps not. So energy is a very useful convenience in calculation, and I have no objection to it on that ground, but that does not mean that it is a real thing. Getting rid of it might lead to a clearer understanding of the phenomena it was intended to describe.

(Perhaps my position will seem less crackpottish if I a make an analogy with the concept of "center of gravity". In mechanics, many physical properties can be most easily understood in terms of the center of gravity of some object. For example, the gravitational effect of small objects far apart from one another can be conveniently approximated by supposing that all the mass of each object is concentrated at its center of gravity. A force on an object can be conveniently treated mathematically as a component acting toward the center of gravity, which tends to change the object's linear velocity, and a component acting perpendicular to that, which tends to change its angular velocity. But nobody ever makes the mistake of supposing that the center of gravity has any objective reality in the physical universe. Everyone understands that it is merely a mathematical fiction. I am considering the possibility that energy should be understood to be a mathematical fiction in the same sort of way. From the little I know about physics and physicists, it seems to me that physicists do not think of energy in this way. But I am really not sure.)

Anyway, none of this philosophizing is what I was hoping to discuss in this article. Today I wrote up some of the questions I jotted down in freshman physics class.

  1. What are the physical interpretations of μ0 and ε0, the magnetic permeability and electric permittivity of vacuum? Can these be directly measured? How?

  2. Consider a simple circuit with a battery, a switch, and a capacitor. When the switch is closed, the battery will suck electrons out of one plate of the capacitor and pump them into the other plate, so the capacitor will charge up.

    When we open the switch, the current will stop flowing, and the capacitor will stop charging up.

    But why? Suppose the switch is between the capacitor and the positive terminal of the battery. Then the negative terminal is still connected to the capacitor even when the switch is open. Why doesn't the negative terminal of the battery continue to pump electrons into the capacitor, continuing to charge it up, although perhaps less than it would be if the switch were closed?

  3. Any beam of light has a time-varying electric field, perpendicular to the direction that the light is travelling. If I shine a light on an electron, why doesn't the electron vibrate up and down in the varying electric field? Or does it?

    [ Addendum 20080629: I figured out the answer to this one. ]

  4. Suppose I take a beam of polarized light whose electric field is in the x direction. I split it in two, delay one of the beams by exactly half a wavelength, and merge it with the other beam. The electric fields are exactly out of phase and exactly cancel out. What happens? Where did the light go? What about conservation of energy?

  5. Suppose I have two beams of light whose wavelengths are close but not exactly the same, say λ and (λ+). I superimpose these. The electric fields will interfere, and sometimes will be in phase and sometimes out of phase. There will be regions where the electric field varies rapidly from the maximum to almost zero, of length on the order of . If I look at the beam of light only over one of these brief intervals, it should look just like very high frequency light of wavelength . But it doesn't. Or does it?

    [ Addendum 20091020: This is a dumb error, and I should have posted the correction long ago. ]

  6. An electron in a varying magnetic field experiences an electromotive force. In particular, an electron near a wire that carries a varying current will move around as the current in the wire varies.

    Now suppose we have one electron A in space near a wire. We will put a very small current into the wire for a moment; this causes electron A to move a little bit.

    Let's suppose that the current in the wire is as small as it can be. In fact, let's imagine that the wire is carrying precisely one electron, which we'll call B. We can calculate the amount of current we can attribute to the wire just from B. (Current in amperes is just coulombs per second, and the charge on electron B is some number of coulombs.) Then we can calculate the force on A as a result of this minimal current, and the motion of A that results.

    But we could also do the calculation another way ,by forgetting about the wire, and just saying that electron B is travelling through space, and exerts an electrostatic force on A, according to Coulomb's law. We could calculate the motion of A that results from this electrostatic force.

    We ought to get the same answer both ways. But do we?

  7. Suppose we have a beam of light that is travelling along the x axis, and the electric field is perpendicular to the x axis, say in the y direction. We learned in freshman physics how to calculate the vector quantity that represents the intensity of the electric field at every point on the x axis; that is, at every point of the form (x, 0, 0). But what is the electric field at (x, 1, 0)? How does the electric field vary throughout space? Presumably a beam of light of wavelength λ has a minimum diameter on the order of λ, but how how does the electric field vary as you move away from the core? Can you take two such minimum-diameter beams and overlap them partially?

I did ask #6 to the physics instructor, who is a full professor with a specialization in high energy theory; she did not know the answer.

[ Addendum 20090204: I eventually remembered that Noether's theorem has something to say about the necessity of the energy concept. ]


[Other articles in category /physics] permanent link

Thu, 06 Dec 2007

What's a File?
Almost every December since 2001 I have given a talk to the local Linux users' group on some aspect of Unix internals. My first talk was on the internals of the ext2 filesystem. This year I was under a lot of deadline pressure at work, so I decided I would give the 2001 talk again, maybe with a few revisions.

Actually I was under so much deadline pressure that I did not have time to revise the talk. I arrived at the user group meeting without a certain idea of what talk I was going to give.

Fortunately, the meeting structure is to have a Q&A and discussion period before the invited speaker gives his talk. The Q&A period always lasts about an hour. In that hour before I had to speak, I wrote a new talk called What's a File?. It mostly concerns the Unix "inode" structure, and what the kernel uses it for. It uses the output of the well-known ls -l command as a jumping-off point, since most of the ls -l information comes from the inode.

Then I talk about how files are opened and permissions are checked, how the filesystem is organized, how the kernel reads and writes data, how directories are structured, how it's possible to have one file with two names, how symbolic links work, and what that mysterious field is in the ls -l output between the permissions and the owner.

The talk was quite successful, much more so than I would have expected, given how quickly I wrote it and my complete inability to edit or revise it. Of course, it does help that I know this material backwards and forwards and standing on my head, and also that I could reuse all the diagrams and illustrations from the 2001 version of the talk.

I would not, however, recommend this technique.

As my talks have gotten better over the years, I find that less and less of the talk material is captured in the slides, and so the slides become less and less representative of the talk itself. But I put them online anyway, and here they are.

Here's a .tgz file in case you want to download it all at once.


[Other articles in category /Unix] permanent link

Wed, 05 Dec 2007

An Austrian coincidence
Only one of these depicts a location in my obstetrician's waiting room where the paint is chipped.

(Andy Lester recently referred to my blog as "the single most intelligent blog out there". I'll make you eat those words, Andy!)


[Other articles in category /misc] permanent link

Sat, 17 Nov 2007

Creeping featurism and the ratchet effect
"Creeping featurism" is a well-known phenomenon in the software world. It refers to the tendency of software to acquire more and more features, to the ultimate detriment of its usability. Software with more and more features is harder to learn to use; it's harder to document effectively. Perhaps most important, it is harder to maintain; the more complicated software is, the more likely it is to have bugs. Partly this is because the different features interact with one another in unanticipated ways; partly it is just that there is more stuff to spend the maintenance budget on.

But the concept of "creeping featurism" his wider applicability than just to program features. We can recognize it in other contexts.

For example, someone is reading the Perl manual. They read the section on the unpack function and they find it confusing. So they propose a documentation patch to add a couple of sentences, explicating the confusing point in more detail.

It seems like a good idea at the time. But if you do it over and over—and we have—you end up with a 2,000 page manual—and we did.

The real problem is that it's easy to see the benefit of any proposed addition. But it is much harder to see the cost of the proposed addition, that the manual is now 0.002% larger.

The benefit has a poster child, an obvious beneficiary. You can imagine a confused person in your head, someone who happens to be confused in exactly the right way, and who is miraculously helped out by the presence of the right two sentences in the exact right place.

The cost has no poster child. Or rather, the poster child is much harder to imagine. This is the person who is looking for something unrelated to the two-sentence addition. They are going to spend a certain amount of time looking for it. If the two-sentence addition hadn't been in there, they would have found what they were looking for. But the addition slowed them down just enough that they gave up without finding what they needed. Although you can grant that such a person might exist, they really aren't as compelling as the confused person who is magically assisted by timely advice.

Even harder to imagine is the person who's kinda confused, and for whom the extra two sentences, clarifying some obscure point about some feature he wasn't planning to use in the first place, are just more confusion. It's really hard to understand the cost of that.

But the benefit, such as it is, comes in one big lump, whereas the cost is distributed in tiny increments over a very large population. The benefit is clear, and the cost is obscure. It's easy to make a specific argument in favor of any particular addition ("people might be confused by X, so I'm going to explain it in more detail") and it's hard to make such an argument against the addition. And conversely: it's easy to make the argument that any particular bit of text should stay in, hard to argue that it should be removed.

As a result, there's what I call a "ratchet effect": you can make the manual bigger, one tiny notch at a time, and people do. But having done so, you can't make it smaller again; someone will object to almost any proposed deletion. The manual gets bigger and bigger, worse and worse organized, more and more unusable, until finally it collapses under its own weight and all you can do is start over again.

You see the same thing happen in software, of course. I maintain the Text::Template Perl module, and I frequently get messages from people saying that it should have some feature or other. And these people sometimes get quite angry when I tell them I'm not going to put in the feature they want. They're angry because it's easy to see the benefit of adding another feature, but hard to see the cost. "If other people don't like it," goes the argument, "they don't have to use it." True, but even if they don't use it, they still pay the costs of slightly longer download times, slightly longer compile times, a slightly longer and more confusing manual, slightly less frequent maintenance updates, slightly less prompt bug fix deliveries, and so on. It is so hard to make this argument, because the cost to any one person is so very small! But we all know where the software will end up if I don't make this argument every step of the way: on the slag heap.

This has been on my mind on and off for years. But I just ran into it in a new context.

Lately I've been working on a book about code style and refactoring in Perl. One thing you see a lot in Perl programs written by beginners is superfluous parentheses. For example:

                next if ($file =~ /^\./);
                next if !($file =~ (/[0-9]/));
                next if !($file =~ (/txt/));
Or:

        die $usage if ($#ARGV < 0);
There are a number of points I want to make about this. First, I'd like to express my sympathy for Perl programmers, because Perl has something like 95 different operators at something like 17 different levels of precedence, and so nobody knows what all the precedences are and whether parentheses are required in all circumstances. Does the ** operator have higher or lower precedence than the <<= operator? I really have no idea.

So the situation is impossible, at least in principle, and yet people have to deal with it somehow. But the advice you often hear is "if you're not sure of the precedence, just put in the parentheses." I think that's really bad advice. I think better advice would be "if you're not sure of the precedence, look it up."

Because Perl's Byzantine operator table is not responsible for all the problems. Notice in the examples above, which are real examples, taken from real code written by other people: Many of the parentheses there are entirely superfluous, and are not disambiguating the precedence of any operators. In particular, notice the inner parentheses in:

                next if !($file =~ (/txt/));
Inside the inner parentheses, there are no operators! So they cannot be disambiguating any precedence, and they are completely unnecessary:

                next if !($file =~ /txt/);
People sometimes say "well, I like to put them in anyway, just to be sure." This is pure superstition, and we should not tolerate it in people who purport to be engineers. Engineers should be capable of making informed choices, based on technical realities, not on some creepy feeling in their guts that perhaps a failure to sprinkle enough parentheses over their program will invite the wrath the Moon God.

By saying "if you're not sure, just avoid the problem" we are encouraging this kind of fearful, superstitious approach to the issue. That approach would be appropriate if it were the only way to deal with the issue, but fortunately it is not. There is a more rational approach: you can look it up, or even try an experiment, and then you will know whether the parentheses are required in a particular case. Then you can make an informed decision about whether to put them in.

But when I teach classes on this topic, people sometimes want to take the argument even further: they want to argue that even if you know the precedence, and even if you know that the parentheses are not required, you should put them in anyway, because the next person to see the code might not know that.

And there we see the creeping featurism argument again. It's easy to see the potential benefit of the superfluous parentheses: some hapless novice maintenance programmer might misunderstand the expression if I don't put them in. It's much harder to see the cost: The code is fractionally harder for everyone to read and understand, novice or not. And again, the cost of the extra parentheses to any particular person is so small, so very small, that it is really hard to make the argument against it convincingly. But I think the argument must be made, or else the code will end up on the slag heap much faster than it would have otherwise.

Programming cannot be run on the convoy system, with the program code written to address the most ignorant, uneducated programmer. I think you have to assume that the next maintenance programmer will be competent, and that if they do not know what the expression means, they will look up the operator precedence in the manual. That assumption may be false, of course; the world is full of incompetent programmers. But no amount of parentheses are really going to help this person anyway. And even if they were, you do not have to give in, you do not have to cater to incompetence. If an incompetent programmer has trouble understanding your code, that is not your fault; it is their fault for being incompetent. You do not have to take special steps to make your code understandable even by incompetents, and you certainly should not do so at the expense of making it harder for competent programmers to read and understand, no, not to the tiniest degree.

The advice that one should always put in the parentheses seems to me to be going in the wrong direction. We should be struggling for higher standards, both for ourselves and for our associates. The conventional advice, it seems to me, is to give up.


[Other articles in category /prog] permanent link

Wed, 07 Nov 2007

Lightweight Database Strategies for Perl
Several years ago I got what I thought was a great idea for a three-hour conference tutorial: lightweight data storage techniques. When you don't have enough data to be bothered using a high-performance database, or when your data is simple enough that you don't want to bother with a relational database, you stick it in a flat file and hack up some file code to read it. This is the sort of thing that people do all the time in Perl, and I thought it would be a big seller. I was wrong.

I don't know why. I tried giving the class a snappier title, but that didn't help. I'm really bad at titles. Maybe people are embarrassed to think about all the lightweight data storage hackery they do in Perl, and feel that they "should" be using a relational database, and don't want to commit more resources to lightweight database techniques. Or maybe they just don't think there is very much to know about it.

But there is a lot to know; with a little bit of technique you can postpone the day when you need to go to an RDB, often for quite a long time, and often forever. Many of the techniques fall into the why-didn't-I-think-of-that category, stuff that isn't too weird to write or maintain, but that you might not have thought to try.

I think it's a good class, but since it never sold well, I've decided it would do more good (for me and for everyone else) if I just gave away the materials for free.

Table of Contents

The class is in three sections. The first section is about using plain text files and talks about a bunch of useful techniques, such as how to do binary search on sorted text files (this is nontrivial) and how to replace records in-place, when they might not fit.

The second section is about the Tie::File module, which associates a flat text file with a Perl array.

The third section is about DBM files, with a comparison of the five major implementations. It finishes up with a discussion of some of Berkeley DB's lesser-known useful features, such as its DB_BTREE file type, which offers fast access like a hash but keeps the records in sorted order

  • Text Files
    • Rotating log file; deleting a user
    • Copy the File
      • -i.bak
      • Using -i inside a program
      • Problems with -i
      • Atomicity issues
    • Essential problem with files; fundamental operations; seeking
    • Sorted files
    • In-place modification of records
      • Overwriting records
      • Bytes vs. positions
      • Gappy Files
      • Fixed-length records
      • Numeric indices
      • Case study: lastlog
    • Indexing
      • Void fields
      • Generic text indices
      • Packed offsets
  • Tie::File
    • Tie::File Examples
    • delete_user revisited
    • uppercase_username revisited
    • Rotating log file revisited
    • Most important thing to know about Tie::File
    • Indexing with Tie::File
    • Tie::File Internals
      • Caching
      • Record modification
      • Immediate vs. Deferred Writing
      • Autodeferring
    • Miscellaneous Features
  • DBM
    • Common DBM Implementations
    • What DBM Does
    • Small DBMs: ODBM, NDBM, and SDBM
    • GDBM
    • DB_File
      • Indexing revisited
      • Ordered hashes
      • Partial matching
      • Sequential access
      • Multiple values
      • Filters
      • BerkeleyDB

Online materials


[Other articles in category /prog/perl] permanent link

Wed, 31 Oct 2007

Undefined behavior in Perl and other languages
Miles Gould wrote what I thought was an interesting article on implementation-defined languages, and cited Perl as an example. One of his points was that a language that is defined by its implementation, as Perl is, rather than by a standards document, cannot have any "undefined behavior".

Undefined behavior

For people unfamiliar with this concept, I should explain briefly. The C standard is full of places that say "if the program contains x, the behavior is undefined", which really means "C programs do not contain x, so If the program contains x, it is not written in C, and, as this standard only defines the meaning of programs in C, it has nothing to say about the meaning of your program." There are around a couple of hundred of these phrases, and a larger number of places where it is implied.

For example, everyone knows that it means when you write x = 4;, but what does it mean if you write 4 = x;? According to clause 6.3.2.1[#1], it means nothing, and this is not a C program. The non-guarantee in this case is extremely strong. The C compiler, upon encountering this locution, is allowed to abort and spontaneously erase all your files, and in doing so it is not violating the requirements of the standard, because the standard does not require any particular behavior in this case.

The memorable phrase that the comp.lang.c folks use is that using that construction might cause demons to fly out of your nose.

[ Addendum 20071030: I am informed that I misread the standard here, and that the behavior of this particular line is not undefined, but requires a compiler diagnostic. Perhaps a better example would have been x = *(char *)0. ]

I mentioned this in passing in one of my recent articles about a C program I wrote:

        unsigned strinc(char *s) 
        {
          char *p = strchr(s, '\0') - 1;
          while (p >= s && *p == 'A' + colors - 1) *p-- = 'A';
          if (p < s) return 0;
          (*p)++;
          return 1;
        }
Here the pointer p starts at the end of the string s, and the loop might stop when p points to the position just before s. Except no, that is forbidden, and the program might at that moment cause demons to fly out of your nose. You are allowed to have a pointer that points to the position just after an object, but not one that points just before.

Well anyway, I seem to have digressed. My point was that M. Gould says that one advantage of languages like Perl that are defined wholly by their (one) implementation is that you never have "undefined behavior". If you want to know what some locution does, you type it in and see what it does. Poof, instant definition.

Although I think this is a sound point, it occurred to me that that is not entirely correct. The manual is a specification of sorts, and even if the implementation does X in situation Y, the manual might say "The implementation does X in situation Y, but this is unsupported and may change without warning in the future." Then what you have is not so different from Y being undefined behavior. Because the manual is (presumably) a statement of official policy from the maintainers, and, as a communiqué from the people with the ultimate authority to define the future meaning of the language, it has some of the same status that a formal specification would.

Perl: the static variable hack

Such disclaimers do appear in the Perl documentation. Probably the most significant example of this is the static variable hack. For various implementation reasons, the locution my $static if 0 has a strange and interesting effect:

  sub foo {
    my $static = 42 if 0;
    print "static is now $static\n";
    $static++;
  }

  foo() for 1..5;
This makes $static behave as a "static" variable, and persist from call to call of foo(). Without the ... if 0, the code would print "static is now 42" five times. But with ... if 0, it prints:

        static is now 
        static is now 1
        static is now 2
        static is now 3
        static is now 4
This was never an intentional feature. It arose accidentally, and then people discovered it and started using it. Since the behavior was the result of a strange quirk of the implementation, caused by the surprising interaction of several internal details, it was officially decided by the support group that this behavior would not be supported in future versions. The manual was amended to say that this behavior was explicitly undefined, and might change in the future. It can be used in one-off programs, but not in any important program, one that might have a long life and need to be run under several different versions of Perl. Programs that use pointers that point outside the bounds of allocated storage in C are in a similar position. It might work on today's system, with today's compiler, today, but you can't do that in any larger context.

Having the "undefined behavior" be determined by the manual, instead of by a language standard, has its drawbacks. The language standard is fretted over by experts for months. When the C standard says that behavior is undefined, it is because someone like Clive Feather or Doug Gwyn or P.J. Plauger, someone who knows more about C than you ever will, knows that there is some machine somewhere on which the behavior is unsupported and unsupportable. When the Perl manual says that some behavior is undefined, you might be hearing from the Perl equivalent of Doug Gwyn, someone like Nick Clark or Chip Salzenberg or Gurusamy Sarathy. Or you might be hearing from a mere nervous-nellie who got their patch into the manual on a night when the release manager had stayed up too late.

Perl: modifying a hash in a loop

Here is an example of this that has bothered me for a long time. One can use the each() operator to loop lazily over the contents of a hash:

  while (my $key = each %hash) {
    # do something with $key and $hash{$key}
  }
What happens if you modify the hash in the middle of the loop? For various implementation reasons, the manual forbids this.

For example, suppose the loop code adds a new key to the hash. The hash might overflow as a result, and this would trigger a reorganization that would move everything around, destroying the ordering information. The subsequent calls to each() would continue from the same element of the hash, but in the new order, making it likely that the loop would visit some keys more than once, or some not at all. So the prohibition in that case makes sense: The each() operator normally guarantees to produce each key exactly once, and adding elements to a hash in the middle of the loop might cause that guarantee to be broken in an unpredictable way. Moreover, there is no obvious way to fix this without potentially wrecking the performance of hashes.

But the manual also forbids deleting keys inside the loop, and there the issue does not come up, because in Perl, hashes are never reorganized as the result of a deletion. The behavior is easily described: Deleting a key that has already been visited will not affect the each() loop, and deleting one that has not yet been visited will just cause it to be skipped when the time comes.

Some people might find this general case confusing, I suppose. But the following code also runs afoul of the "do not modify a hash inside of an each loop" prohibition, and I don't think anyone would find it confusing:

  while (my $key = each %hash) {
    delete $hash{$key} if is_bad($hash{$key});
  }
Here we want to delete all the bad items from the hash. We do this by scanning the hash and deleting the current item whenever it is bad. Since each key is deleted only after it is scanned by each, we should expect this to visit every key in the hash, as indeed it does. And this appears to be a useful thing to write. The only alternative is to make two passes, constructing a list of bad keys on the first pass, and deleting them on the second pass. The code would be more complicated and the time and memory performance would be much worse.

There is a potential implementation problem, though. The way that each() works is to take the current item and follow a "next" pointer from it to find the next item. (I am omitting some unimportant details here.) But if we have deleted the current item, the implementation cannot follow the "next" pointer. So what happens?

In fact, the implementation has always contained a bunch of code, written by Larry Wall, to ensure that deleting the current key will work properly, and that it will not spoil the each(). This is nontrivial. When you delete an item, the delete() operator looks to see if it is the current item of an each() loop, and if so, it marks the item with a special flag instead of deleting it. Later on, the next time each() is invoked, it sees the flag and deletes the item after following the "next" pointer.

So the implementation takes some pains to make this work. But someone came along later and forbade all modifications of a hash inside an each loop, throwing the baby out with the bathwater. Larry and perl paid a price for this feature, in performance and memory and code size, and I think it was a feature well bought. But then someone patched the manual and spoiled the value of the feature. (Some years later, I patched the manual again to add an exception for this case. Score!)

Perl: modifying an array in a loop

Another example is the question of what happens when you modify an array inside a loop over the array, as with:

  @a = (1..3);
  for (@a) {
    print;
    push @a, $_ + 3 if $_ % 2 == 1;
  }
(This prints 12346.) The internals are simple, and the semantics are well-defined by the implementation, and straightforward, but the manual has the heebie-jeebies about it, and most of the Perl community is extremely superstitious about this, claiming that it is "entirely unpredictable". I would like to support this with a quotation from the manual, but I can't find it in the enormous and disorganized mass that is the Perl documentation.

[ Addendum: Tom Boutell found it. The perlsyn page says "If any part of LIST is an array, foreach will get very confused if you add or remove elements within the loop body, for example with splice. So don't do that." ]

The behavior, for the record, is quite straightforward: On the first iteration, the loop processes the first element in the array. On the second iteration, the loop processes the second element in the array, whatever that element is at the time the second iteration starts, whether or not that was the second element before. On the third iteration, the loop processes the third element in the array, whatever it is at that moment. And so the loop continues, terminating the first time it is called upon to process an element that is past the end of the array. We might imagine the following pseudocode:

        index = 0;     
        while (index < array.length()) {
          process element array[index];
          index += 1;
        }
There is nothing subtle or difficult about this, and claims that the behavior is "entirely unpredictable" are probably superstitious confessions of ignorance and fear.

Let's try to predict the "entirely unpredictable" behavior of the example above:

  @a = (1..3);
  for (@a) {
    print;
    push @a, $_ + 3 if $_ % 2 == 1;
  }
Initially the array contains (1, 2, 3), and so the first iteration processes the first element, which is 1. This prints 1, and, since 1 is odd, pushes 4 onto the end of the array.

The array now contains (1, 2, 3, 4), and the loop processes the second element, which is 2. 2 is printed. The loop then processes the third element, printing 3 and pushing 6 onto the end. The array now contains (1, 2, 3, 4, 6).

On the fourth iteration, the fourth element (4) is printed, and on the fifth iteration, the fifth element (6) is printed. That is the last element, so the loop is finished. What was so hard about that?

Haskell: n+k patterns

My blog was recently inserted into the feed for planet.haskell.org, and of course I immediately started my first streak of posting code-heavy articles about C and Perl. This is distressing not just because the articles were off-topic for Planet Haskell—I wouldn't give the matter two thoughts if I were posting my usual mix of abstract math and stuff—but it's so off-topic that it feels weird to see it sitting there on the front page of Planet Haskell. So I thought I'd make an effort to talk about Haskell, as a friendly attempt to promote good relations between tribes. I'm not sure what tribe I'm in, actually, but what the heck. I thought about Haskell a bit, and a Haskell example came to mind.

Here is a definition of the factorial function in Haskell:

        fact 0 = 1
        fact n = n * fact (n-1)
I don't need to explain this to anyone, right?

Okay, now here is another definition:

        fact 0     = 1
        fact (n+1) = (n+1) * fact n
Also fine, and indeed this is legal Haskell. The pattern n+1 is allowed to match an integer that is at least 1, say 7, and doing so binds n to the value 6. This is by a rather peculiar special case in the specification of Haskell's pattern-matcher. (It is section 3.17.2#8 of Haskell 98 Language and Libraries: The Revised Report, should you want to look it up.) This peculiar special case is known sometimes as a "successor pattern" but more often as an "n+k pattern".

The spec explicitly deprecates this feature:

Many people feel that n+k patterns should not be used. These patterns may be removed or changed in future versions of Haskell.

(Page 33.) One wonders why they put it in at all, if they were going to go ahead and tell you not to use it. The Haskell committee is usually smarter than this.

I have a vague recollection that there was an argument between people who wanted to use Haskell as a language for teaching undergraduate programming, and those who didn't care about that, and that this was the compromise result. Like many compromises, it is inferior to both of the alternatives that it interpolates between. Putting the feature in complicates the syntax and the semantics of the language, disrupts its conceptual purity, and bloats the spec—see the Perlesque yikkity-yak on pages 57–58 about how x + 1 = ... binds a meaning to +, but (x + 1) = ... binds a meaning to x. Such complication is worth while only if there is a corresponding payoff in terms of increased functionality and usability in the language. In this case, the payoff is a feature that can only be used in one-off programs. Serious programs must avoid it, since the patterns "may be removed or changed in future versions of Haskell". The Haskell committee purchased this feature at a certain cost, and it is debatable whether they got their money's worth. I'm not sure which side of that issue I fall on. But having purchased the feature, the committee then threw it in the garbage, squandering their sunk costs. Oh well. Not even the Haskell committee is perfect.

I think it might be worth pointing out that the version of the program with the n+k pattern is technically superior to the other version. Given a negative integer argument, the first version recurses forever, possibly taking a long time to fail and perhaps taking out the rest of the system on which it is running. But the n+k version fails immediately, because the n+1 pattern will only match an integer that is at least 1.

XML screws up

The "nasal demons" of the C standard are a joke, but a serious one. The C standard defines what C compilers must do when presented with C programs; it does not define what they do when presented with other inputs, nor what other software does when presented with C programs. The authors of C standard clearly understood the standard's role in the world.

Earlier versions of the XML standard were less clear. There was a particularly laughable clause in the first edition of the XML 1,0 standard:

XML documents may, and should, begin with an XML declaration which specifies the version of XML being used. For example, the following is a complete XML document, well-formed but not valid:

<?xml version="1.0"?>
<greeting>Hello, world!</greeting>

...

The version number "1.0" should be used to indicate conformance to this version of this specification; it is an error for a document to use the value "1.0" if it does not conform to this version of this specification.

(Emphasis is mine.) The XML 1.0 spec is just a document. It has no power, except to declare that certain files are XML 1.0 and certain files are not. A file that complies with the requirements of the spec is XML 1.0; all other files are not XML 1.0. But in the emphasized clause, the spec says that certain behavior "is an error" if it is exhibited by documents that do not conform to the spec. That is, it is declaring certain non-XML-1.0 documents "erroneous". But within the meaning of the spec, "erroneous" simply means that the documents are not XML 1.0. So the clause is completely redundant. Documents that do not conform to the spec are erroneous by definition, whether or not they use the value "1.0".

It's as if the Catholic Church issued an edict forbidding all rabbis from wearing cassocks, on pain of excommunication.

I am happy to discover that this dumb error has been removed from the most recent edition of the XML 1.0 spec.


[Other articles in category /prog/perl] permanent link

Mon, 29 Oct 2007

Where's that blog?
I haven't posted in a couple of weeks, and I was wondering why. So I took a look at the test version of the blog, which displays all the unpublished articles as well as the published ones, and the reason was obvious: In the past ten days I've written seven articles that are unfinished or that didn't work. Usually only about a third of my articles flop; this month a whole bunch flopped in a row. What can I say? Sometimes the muse delivers, and sometimes she doesn't.

I said a while back that I would try to publish more regularly, and not wait until every article was perfect. But I don't want to publish the unfinished articles yet. So I thought instead I'd publish a short summary of what I've been thinking about lately.

I hope to get at least one or two of these done by the end of the month.

Simplified Poker

I recently played a computer poker game that uses a 24-card deck, with only the nine through ace of each suit. This changes the game drastically. For example, a flush is less likely than a four of a kind. (The game uses the standard hand rankings anyway.) It is very easy to compute optimal strategies for this game, because there are so few possible hands (42,504) that you can brute-force all the calculations with a computer.

This got me thinking again of something I started writing up last year and never finished: The game of "Simplified Poker", which was an attempt to do for Poker what the λ-calculus does for computation: the simplest possible model that nevertheless captures all the essential features of the original. Simplified Poker is played with an infinite deck in which half the cards are kings and half are jacks. Each hand contains only two cards. Nevertheless, bluffing is still possible.

Order
What is the Name of this Book?
What is the Name of this Book?
with kickback
no kickback

The Annoying Boxes Puzzle

This is a logic puzzle in which you deduce which box contains the treasure, but with a twist. I thought it up many years ago, and then in the course of trying to write up an explanation about five years ago, I consulted Raymond Smullyan's book What is the Name of This Book? in order to get a citation to prove a certain fact about the form that such puzzles usually take. In doing so, I discovered that Smullyan actually presented the annoying boxes puzzle (in slightly different form) in that book!

It's primarily waiting for me to take a photograph to accompany the puzzle.

Undefined behavior

I have a pretty interesting article on the concept of "undefined behavior", which is a big deal in the C world, but which means something rather different, and is much less important, in Perl.

[ Addendum 20071029: This is ready now. ]

Order
Tootle
Tootle
with kickback
no kickback

Tootle

My daughter Iris has become interested in the book Tootle, by Gertrude Crampton, which is the third-best-selling hardback children's book of all time. A few years back I wrote some brief literary criticism of Tootle, which I included when I wrote the Wikipedia article about the book. This criticism was quite rightly deleted later on, as uncited original research. It needs a new home, and that home is obviously here.

Periodicity without Fourier Series

Suppose I have tabulated the number of blog posts I made every day for two years. I want to find if there is any discernible periodicity to this data. Do I tend to post in 26-day cycles, for example?

One way to do this is to take the Fourier transform of the data. For various reasons, I don't like this technique, and I'm trying to invent something new. I think I have what I want, although it took several tries to find it. Unfortunately, the blog posting data shows no periodicity whatsoever.

Emacs and auto-mode-alist

The elisp code I've been using for the past fifteen years to set the default mode for Perl editing in Emacs broke last week. My search for a replacement turned up some very bizarre advice on IRC.

Van der Waerden's problem

Also still pending is the rest of my van der Waerden problem series. I have written about four programs so far, and I have two to go.


[Other articles in category /meta] permanent link

Fri, 26 Oct 2007

Imaginary units, revisited
Last night, shortly after posting my article about the fact that i and -i are mathematically indistinguishable, I thought of what I should have said about it—true to form, forty-eight hours too late. Here's what I should have said.

The two square roots of -1 are indistinguishable in the same way that the top and bottom faces of a cube are. Sure, one is the top, and one is the bottom, but it doesn't matter, and it could just as easily be the other way around.

Sure, you could say something like this: "If you embed the cube in R3, then the top face is the set of points that have z-coordinate +1, and the bottom face is the set of points that have z-coordinate -1." And indeed, once you arbitrarily designate that one face is on the top and the other is on the bottom, then one is on the top, and one is on the bottom—but that doesn't mean that the two faces had any a priori difference, that one of them was intrinsically the top, or that the designation wasn't completely arbitrary; trying to argue that the faces are distinguishable, after having made an arbitrary designation to distinguish them, is begging the question.

Now can you imagine anyone seriously arguing that the top and bottom faces of a cube are mathematically distinguishable?

[ Previous article in this series: Part 1 Followup articles: Part 3 Part 4 Part 5 ]


[Other articles in category /math] permanent link

More about automorphisms
In a recent article, I asserted that "there aren't even any reasonable [automorphisms of R] that preserve addition.". This is patently untrue.

My proof started by referring to a previous result that any such automorphism f must have f(1) = 1. But actually, I had only proved this for automorphisms that must preserve multiplication. For automorphisms that preserve addition only, f(1) need not be 1; it can be anything. In fact, xkx is an automorphism of R for all k except zero. It is not hard to show, following the technique in the earlier article, that every continuous automorphism has this form.

In hopes of salvaging something from my embarrassing error, I thought I'd spend a little time talking about the other automorphisms of R, the ones that aren't "reasonable". They are unreasonable in at least two ways: they are everywhere discontinuous, and they cannot be exhibited explicitly.

To manufacture the function, we first need a mathematical horror called a Hamel basis. A Hamel basis is a set of real numbers Hα such that every real number r has a unique representation in the form:

$$r = \sum_{i=1}^n q_i H_{\alpha_i}$$

where all the qi are rational. (It is called a Hamel basis because it makes the real numbers into a vector space over the rationals. If this explanation makes no sense to you, please ignore it.)

The sum here is finite, so only a finite number of the uncountably many Hα are involved for any particular r; this is what characterizes it as a Hamel basis.

Leaving aside the proof that the Hamel basis exists at all, if we suppose we have one, we can easily construct an automorphism of R. Just pick some rational numbers mα, one for each Hα. Then if as above, we have:

$$r = \sum_{i=1}^n q_i H_{\alpha_i}$$

The automorphism is:

$$f(r) = \sum_{i=1}^n q_i H_{\alpha_i}m_{\alpha_i}$$

At this point I should probably prove that this is an automorphism. But it seems unwise, because I think that in the unlikely case that you have understood everything so far, you will find the statement that this is an automorphism both clear and obvious, and will be able to imagine the proof yourself, and for me to spell it out will only confuse the issue. And I think that if you have not understood everything so far, the proof will not help. So I should probably just say "clearly, this is an automorphism" and move on.


But against my better judgement, I'll give it a try. Let r and s be real numbers. We want to show that f(s) + f(r) = f(s + r). Represent r and s using the Hamel basis. For each element H of the Hamel basis, let's say that cH(r) is the (rational) coefficient of H in the representation of r. That is, it's the qi in the definition above.

By a simple argument involving commutativity and associativity of addition, cH(r+s) = cH(r) + cH(s) for all r, s, and H.

Also, cH(f(r)) = m·cH(r), for all r and H, where m is the multiplier we chose for H back when we were making up the automorphism, because that's how we defined f.

Then cH(f(r+s)) = m·cH(r+s) = m·(cH(r) + cH(s)) = m·cH(r) + m·cH(s) = cH(f(r)) + cH(f(s)) = cH(f(r) + f(s)), for all H.

This means that f(r+s) and f(r) + f(s) have the same Hamel basis representation. They are therefore the same number. This is what we wanted to show.

If anyone actually found that in the least enlightening, I would be really interested to hear about it.


One property of a Hamel basis is that exactly one of its uncountably many elements is rational. Say it's H0. Then every rational number q is represented as q = (q/H0H0. Then f(q) = (q/H0H0m0 = m0q for all rational numbers q. But in general, an irrational number x will not have f(x) = m0x, so the automorphism is discontinuous everywhere, unless all the mα are equal, in which case it's just xmx again.

The problem with this construction is that it is completely abstract. Nobody can exhibit an example of a Hamel basis, being, as it is, an uncountably infinite set of mainly irrational numbers. So the discontinuous automorphisms constructed here are among the most utterly useless of all mathematical examples.

I think that is the full story about additive automorphisms of R. I hope I got everything right this time.

I should add, by the way, that there seems to be some disagreement about what is called a Hamel basis. Some people say it is what I said: a basis for the reals over the rationals, with the properties I outlined above. However, some people, when you say this, will sniff, adjust their pocket protectors, and and correct you, saying that that a Hamel basis is any basis for any vector space, as long as it has the analogous property that each vector is representable as a combination of a finite subset of the basis elements. Some say one, some the other. I have taken the definition that was convenient for the exposition of this article.

[ Thanks to James Wetterau for pointing out the error in the earlier article. ]

[ Previous articles in this series: Part 1 Part 2 Part 3 Part 4 ]


[Other articles in category /math] permanent link

Automorphisms of the complex numbers
In an earlier article, I wrote a proof that the only automorphisms of the complex numbers are the identity function and the function a + bia - bi.

Robert C. Helling points out that there is a much simpler proof that this is the case. Suppose that f is an automorphism, and that x2 = y. Then f(x2) = (f(x))2 = f(y), so that if x is a square root of y, then f(x) is a square root of f(y).

As I pointed out, f(1) = 1. Since -1 is a square root of 1, f(-1) must be a square root of 1, and so it must be -1. (It can't be 1, since automorphisms may not map two different arguments to the same value.) Since i is a square root of -1, f(i) must also be a square root of -1. So f(i) must be either ±i, and the theorem is proved.

This is a nice example of why I am not a mathematician. When I want to find the automorphisms of C, my first idea is to explicitly write down the general automorphism and then start bashing away on the algebra. This sort of mathematical pig-slaughtering gets the pig cut up all right, but mathematicians are not interested in slaughtering pigs. By which I mean that the approach gets the result I want, usually, but not new or mathematically interesting results.

In computer programming, the pig-slaughtering approach often works really well. Most programs are oversubtle, and can be easily improved by doing the necessary tasks in the simplest and most straightforward possible way, rather than in whatever baroque way the original programmer dreamed up.

[ Previous articles in this series: Part 1 Part 2 Part 3 Followup article: Part 5 ]


[Other articles in category /math] permanent link

Imaginary units, again
In my earlier discussion of i and -i I said " The point about the square roots of -1 is that there is no corresponding criterion for distinguishing the two roots. This is a theorem."

The proof of the theorem is not too hard. What we're looking for is what's called an automorphism of the complex numbers. This is a function, f, which "relabels" the complex numbers, so that arithmetic on the new labels is the same as the arithmetic on the old labels. For example, if 3×4 = 12, then f(3) × f(4) should be f(12).

Let's look at a simpler example, and consider just the integers, and just addition. The set of even integers, under addition, behaves just like the set of all integers: it has a zero; there's a smallest positive number (2, whereas it's usually 1) and every number is a multiple of this smallest positive number, and so on. The function f in this case is simply f(n) = 2n, and it does indeed have the property that if a + b = c, then f(a) + f(b) = f(c) for all integers a, b, and c.

Another automorphism on the set of integers has g(n) = -n. This just exchanges negative and positive. As far as addition is concerned, these are interchangeable. And again, for all a, b, and c, g(a) + g(b) = g(c).

What we don't get with either of these examples is multiplication. 1 × 1 = 1, but f(1) × f(1) = 2 × 2 = 4 ≠ f(1) = 2. And similarly g(1) × g(1) = -1 × -1 = 1 ≠ g(1) = -1.

In fact, there are no interesting automorphisms on the integers that preserve both addition and multiplication. To see this, consider an automorphism f. Since f is an automorphism that preserves multiplication, f(n) = f(1 × n) = f(1) × f(n) for all integers n. The only way this can happen is if f(1) = 1 or if f(n) = 0 for all n.

The latter is clearly uninteresting, and anyway, I neglected to mention that the definition of automorphism rules out functions that throw away information, as this one does. Automorphisms must be reversible. So that leaves only the first possibility, which is that f(1) = 1. But now consider some positive integer n. f(n) = f(1 + 1 + ... + 1) = f(1) + f(1) + ... + f(1) = 1 + 1 + ... + 1 = n. And similarly for 0 and negative integers. So f is the identity function.

One can go a little further: there are no interesting automorphisms of the real numbers that preserve both addition and multiplication. In fact, there aren't even any reasonable ones that preserve addition. The proof is similar. First, one shows that f(1) = 1, as before. Then this extends to a proof that f(n) = n for all integers n, as before. Then suppose that a and b are integers. b·f(a/b) = f(b)f(a/b) = f(b·a/b) = f(a) = a, so f(a/b) = a/b for all rational numbers a/b. Then if you assume that f is continuous, you can fill in f(x) = x for the irrational numbers also.

(Actually this is enough to show that the only continuous addition-preserving automorphism of the reals is the identity function. There are discontinuous addition-preserving functions, but they are very weird. I shouldn't need to drag in the continuity issue to show that the only addition-and-multiplication-preserving automorphism is the identity, but it's been a long day and I'm really fried.)

[ Addendum 20060913: This previous paragraph is entirely wrong; any function xkx is an addition-preserving automorphism, except of course when k=0. For more complete details, see this later article. ]

But there is an interesting automorphism of the complex numbers; it has f(a + bi) = a - bi for all real a and b. (Note that it leaves the real numbers fixed, as we just showed that it must.) That this function f is an automorphism is precisely the content of the statement that i and -i are numerically indistinguishable.

The proof that f is an automorphism is very simple. We need to show that if f(a + bi) + f(c + di) = f((a + bi) + (c + di)) for all complex numbers a+bi and c+di, and similarly f(a + bi) × f(c + di) = f((a + bi) × (c + di)). This is really easy; you can grind out the algebra in about two steps.

What's more interesting is that this is the only nontrivial automorphism of the complex numbers. The proof of this is also straightforward, but a little more involved. The purpose of this article is to present the proof.

Let's suppose that f is an automorphism of the complex numbers that preserves both addition and multiplication. Let's say that f(i) = p + qi. Then f(a + bi) = f(a) + f(b)f(i) = a + bf(i) (because f must leave the real numbers fixed) = a + b(p + qi) = (a + bp) + bqi.

Now we want f(a + bi) + f(c + di) = f((a + bi) + (c + di)) for all real numbers a, b, c, and d. That is, we want (a + bp + bqi) + (c + dp + dqi) = (a + c) + (b + d)(p + qi). It is, so that part is just fine.

We also want f(a + bi) × f(c + di) = f((a + bi) × (c + di)) for all real numbers a, b, c, and d. That means we need:

(a + b(p + qi)) × (c + d(p + qi)) = f((ac-bd) + (ad+bc)i)
(a + bp + bqi) × (c + dp + dqi) = ac - bd + (ad + bc)(p + qi)
ac + adp + adqi + bcp + bdp2 + 2bdpqi + bcqi - bdq2 = ac - bd + adp + bcp + adqi + bcqi
bdp2 + 2bdpqi - bdq2 = - bd
p2 + 2pqi - q2 = -1

Equating the real and imaginary parts gives us two equations:

  1. p2 - q2 = -1
  2. 2pq = 0

Equation 2 implies that either p or q is 0. If they're both zero, then f(a + bi) = a, which is not reversible and so not an automorphism.

Trying q=0 renders equation 1 insoluble because there is no real number p with p2 = -1. But p=0 gives two solutions. One has p=0 and q=1, so f(a+bi) = a+bi, which is the identity function, and not interesting. The other has p=0 and q=-1, so f(a+bi) = a-bi, which is the one we already knew about. But we now know that there are no others, which is what I wanted to show.

[ Previous articles in this series: Part 1 Part 2 Followup articles: Part 4 Part 5 ]


[Other articles in category /math] permanent link

Sun, 14 Oct 2007

Van der Waerden's problem: programs 3 and 4
In this series of articles I'm analyzing five versions of a program that I wrote around 1988, and then another program that does the same thing that I wrote last month without referring to the 1988 code. (I said before that it was four versions, but apparently I'm not so good at counting to five.)

If you don't remember what the program does, here's an explanation.

Here is program 1, which was an earlier attempt to do the same thing. Here's program 2.

Program 3

Complete source code for this version.

I said of the previous program:

The problem is all in the implementation. You see, this program actually constructs the entire tree in memory.

Somewhere along the line it dawned on me that constructing the tree was unnecessary, so I took that machinery out, and the result was version 3.

Consequently, this program is easy to explain once you have seen the previous version: almost all I have to do is list the stuff that I took out.

Since this program does not construct a tree of node structures, it omits the definition of the node structure and the macro for manufacturing nodes. Since it gets rid of the node allocation, it also gets rid of the memory leak of the previous version, and so omits the customized memory allocation functions Malloc and Free that performed memory tracking.

The previous program had a compiled-in limit on the number of colors it would handle, because at the time I didn't know how to do a dynamic array. In this program, I got rid of the node structures, so there was no array of node structures, so no need for a limit on the number of node structures in the array. And all the code that enforced the limit is gone.

The apchk function, which checks to see if a string is good, remains unchanged from the previous version.

The makenodes function, which was the principal function in the previous program, remains, but has lost a lot of code. It is simpler to call, too; the node argument is gone:

        makenodes(maxlen,"");
I got rid of the silly !howfar test in favor of a more easily-understood howfar == 0 test. There are lots of times when ! is appropriate, but testing whether a non-negative integer has reached zero is not one of them. I was going to comment earlier about what a novice error this is, and I'm glad to see that I fixed it.

The main use of apchk in the previous program had if (!apchk(...)) { ... }. That was okay, because apchk returns a Boolean result. But the negation is annoying. It suggests that apchk's return value is backward. (Instead of returning true for a bad string, it should return true for a good string.) This is not very much a big deal, and I only brought it up so that I could diffidently confess that these days I would probably have done:

        #define unless(c)       if(!(c))
        ...
        unless (is_bad(...)) {
        }
There are a lot of stories of doofus Pascal programmers who do:

        #define begin {
        #define end }
and Fortran programmers who do:

        #define GT >
        #define GE >=
        #define LT <
        #define LE <=
and I find, to my shame, that I have become one of them. Anyone seeing #define unless(c) if(!(c)) would snort and say "Oh, this was obviously written by a Perl programmer."

But at least I was a C programmer first.

Actually I was a Fortran programmer first. But I was never a big enough doofus to #define GE >=.

The big flaw in the current program is the string argument to makenodes. Each call to makenodes copies this string so that it can append a character to the end. I discussed this at some length in the previous article, so I don't want to make too much of it now; I'll just say that a better technique would have reused the string buffer from call to call. This obviously saves a little memory, and since most of the contents of the string doesn't change, it also saves a lot of time.

This might be worth seeing, since it seems to me now to be a marvel of wasted code:

    ls = strlen(s);
    newarg = STRING(ls + 1);
    if (!newarg) 
      {
      fprintf(stderr,"Couldn't get %d bytes for newarg in makenodes\n",ls+2);
      fprintf(stderr,"Total get was %d.\n",gotten);
      fprintf(stderr,"P\n L\n  O\n   P\n    !\n");
      abort();
      }
    strcpy(newarg,s);
    newarg[ls+1] = '\0';
    newarg[ls] = 'A' + i;
    makenodes(howfar-1,newarg);
    free(newarg);
The repeated strlen, for example, when ls could be calculated as maxlen - howfar. The excessively verbose failure message, which should be inside the STRING macro anyway. (The code that maintains gotten has gone away with the debugging allocation routines, so the second fprintf is superfluous.) And why did I think abort was the right thing to call on an out-of-memory condition?

Oh well, you live and learn.

Program 4

Complete source code for this version.

The fourth version of the program is even more trimmed-down. In this version of the program I did get the idea to reuse the string buffer instead of copying the string on every recursive call. But I also got an even better idea, and eliminated the recursive call. The makenodes function is now down to one argument, which tells it how deep a tree to search.

        void
        makenodes(maxdepth)
        int maxdepth;
        {
        int apchk(), depth = 0;
        char curlet, *curstring = STRING(maxdepth);

        curstring[0] = '\0';
        curlet = 'A';

        while (depth >= 0)
          {
          while (curlet <= 'A' - 1 + colors)
            {
        #ifdef DIAG
            printf("%s makenoding with string %s%c, depth %d.\n",
                TABS+12-depth,curstring,curlet,depth);
        #endif
            if (apchk(curstring,curlet))
              curlet++;
            else
              if (depth < maxdepth)
                {
                curstring[depth] = curlet;
                curstring[depth+1] = '\0';
                depth += 1;
                curlet = 'A';
                }
              else
                {
                printf("%s%c\n",curstring,curlet);
                curlet++;
                }
            }
          depth -= 1;
          curlet = curstring[depth] + 1;
          curstring[depth] = '\0';
          }
        }
This is a better job all around, and not very different from what I wrote last month to do the same thing. I was going to title this series of articles "I have become a better programmer!", and now that I see this version, I'm glad I didn't, because there's no evidence here that I am much better. This version of the program gets a solid A from my older self.

The value depth scans forward in the string when the search is going well, and is decremented again when the search needs to backtrack. If depth == maxdepth, a witness of the desired length has been found, and is printed out.

The curlet ("current letter") variable tracks which branch of the current tree node we are "recursing" down. After the function recurses down, by incrementing depth, curlet is set to 'A' to visit the first sub-node of the new current node. The curstring buffer tracks the path through the tree to the current node. When the function needs to backtrack, it restores the state of curlet from the last character in the buffer and then trims that character off the end of the path.

I'd only want to make two changes to this code. One would be to make depth a pointer into the curstring buffer instead of an index into it. Then again, the compiler may well have optimized it into one anyway. But it would also allow me to eliminate curlet in favor of just using *depth everywhere.

The other change would address a more serious defect: the contents of curstring are kept properly zero-terminated at all times, whenever depth is advanced or retracted. This zero-termination is unnecessary, since curstring is never used as a string except when depth == maxdepth. When printfing curstring, I could have used something like:

        printf("%.*s%c\n",curstring,maxlen,curlet);
which prints exactly maxlen characters from the buffer, regardless of whether it is zero-terminated.

It would, however, have required that I know about %.*s, which I'm sure I did not. Was %.*s even available in 1988? I forget, and my copy of K&R First Edition is in a box somewhere since my recent move. Anyway, if %.*s was unavailable for whatever reason, the code could have had a single curstring[maxdepth] = 0 up front, which would have been quite sufficient for the one printf it needed to do.

Coming next: one very different program to solve the same problem, and a comparison with last month's effort.


[Other articles in category /prog] permanent link

Van der Waerden's problem: program 1
In this series of articles I'm going to analyze four versions of a program that I wrote around 1988, and then another program that does the same thing that I wrote last month without referring to the 1988 code.

If you don't remember what the program does, here's an explanation.

Program 1

I'm going to discuss the program a bit at a time. The complete program is here.

This program does an unpruned exhaustive search of the string space. Since for V(3, 3) the string space contains 327 = 7,625,597,484,987 strings, it takes a pretty long time to finish. I quickly realized that I was wasting my time with this program.

The program is invoked with a length argument and an optional colors argument, which defaults to 2. It then looks for good strings of the specified length, printing those it finds. If there are none, one then knows that V(3, colors) > length. Otherwise, one knows that V(3, colors) ≤ length, and has witness strings to prove it.

I don't want to spend a lot of time on it because there are plenty of C programming style guides you can read if you care for that. But already on lines 4–5 we have something I wouldn't write today:

        #define NO	0
        #define YES	!NO
Oh well.

The program wants to iterate through all Cn strings. How does it know when it's done? It's not easy to make a program as slow as this one even slower, but I found a way to do it.

        last = STRING(length);
        stuff(last,'A' - 1 + colors);

        for (i=0; i<colors; i++)
          last[i] = 'A' + i;

        for (; strcmp(seq,last); strinc(seq))
          ...
It manufactures the string ABCDDDDDDDDD....D and compares the current string to that one every time through the loop. A much simpler method is to detect completion while incrementing the target string. The function that does the increment looks like this:

        void
        strinc(s)
        char *s;
        {
        int i;

        for (i= length - 1; i>=0; i--)
          {
          if (s[i] != 'A' - 1 + colors)
            {
            s[i]++;
            return;
            }
          s[i] = 'A';
          }
        return;
        }
Had I been writing it today, it would have looked more like this:

        unsigned strinc(char *s) 
        {
          char *p = strchr(s, '\0') - 1;
          while (p >= s && *p == 'A' + colors - 1) *p-- = 'A';
          if (p < s) return 0;
          (*p)++;
          return 1;
        }
(This code is in a pink box to show that it is not actually part of the program I am discussing in this article.)

The function returns true on success and false on failure. A false return can be taken by the caller as the signal to terminate the program.

This replacement function invokes undefined behavior, because there is no guarantee that p is allowed to run off the beginning of the string in the way that it does. But there is no need to check the strings in lexicographic order. Instead of scanning the strings in the order AAA, AAB, ABA, ABB, BAA, etc., one can scan them in reverse lexicographic order: AAA, BAA, ABA, BBA, AAB, etc. Then instead of running off the beginning of the string, p runs off the end, which is allowed. This fixes the undefined behavior problem and also eliminates the call to strchr that finds the end of the string. This is likely to produce a significant speedup:

        unsigned strinc(char *s) 
        {
          while (*s == 'A' + colors - 1) *s++ = 'A';
          if (!*s) return 0;
          (*s)++;
          return 1;
        }
Here we're depending on the optimizer to avoid recomputing the value of 'A' + colors - 1 every time through the loop.

The heart of the program is the apchk() function, which checks whether a string q contains an arithmetic progression of length 3:

        int
        apchk(q)
        char *q;
        {
        int f, s, t;

        for (f=0; f <= length - 3; f++)
          for (s=f+1; s <= length - 2; s++)
            {
            t = s+s-f;
            if (t >= length) break;
            if ((q[f] == q[s]) && (q[s] == q[t])) return YES;
            }
        return NO;
        }
I hesitate to say that this is the biggest waste of time in the whole program, since after all it is a program whose job is to examine 7,625,597,484,987 strings. But look. 2/3 of the calls to this function are asking it to check a string that differs from the previous string in the final character only. Nevertheless, it still checks all 49 possible arithmetic progressions, even the ones that didn't change.

The t ≥ length test is superfluous, or if it isn't, it should be.

Also notice that I wasn't sure of the precendence in the final test.

It didn't take me long to figure out that this program was not going to finish in time. I wrote a series of others, which I hope to post here in coming days. The next one sucks too, but in a completely different way.

[ Addendum 20071005: Here is part 2. ]

[ Addendum 20071014: Here is part 3. ]


[Other articles in category /prog] permanent link

Van der Waerden's problem
In this series of articles I'm going to analyze four versions of a program that I wrote around 1988, and then another program that does the same thing that I wrote last month without referring to the 1988 code.

First I'll explain what the programs are about.

Van der Waerden's problem

Color each of a row of dots red or blue, so that no three evenly-spaced dots are the same color. (That is, if dots n and n+i are the same color, dot n+2i must be a different color.) How many dots can you do?

Well, clearly you can do four: R R B B. And then you can add another red one on the end: R R B B R. And then another that could be either red or blue: R R B B R B. And then the next can be either color, say blue: R R B B R B B.

But now you are at the end, because if you make the next dot red, then dots 2, 5, and 8 will all be red (R R B B R B B R), and if you make the next dot blue then dots 6, 7, and 8 will be blue (R R B B R B B B).

But maybe we made a mistake somewhere earlier, and if the first seven dots were colored differently, we could have made a row of more than 7 that obeyed the no-three-evenly-spaced-dots requirement. In fact, this is so: R R B B R R B B is an example.

But this is the end of the line. Any coloring of a row of 9 dots contains three evenly-spaced dots of the same color. (I don't know a good way to prove this, short of an enumeration of all 512 possible arrangements of dots. Well, of course it is sufficient to enumerate the 256 that begin with R, but that is pretty much the same thing.)

Van der Waerden's theorem says that for any number of colors, say C, a sufficiently-long row of colored dots will contain n evenly-spaced same-color dots for any n. Or, put another way, if you partition the integers into C disjoint classes, at least one class will contain arbitrarily long arithmetic progressions.

The proof of van der Waerden's theorem works by taking C and n and producing a number V such that a row of V dots, colored with C colors, is guaranteed to contain n evenly-spaced dots of a single color. The smallest such V is denoted V(n, C). For example V(3, 2) is 9, because any row of 9 dots of 2 colors is guaranteed to contain 3 evenly-spaced dots of the same color, but this is not true of such row of only 8 dots.

Van der Waerden's theorem does not tell you what V(n, C) actually is; it provides only an upper bound. And here's the funny thing about van der Waerden's theorem: the upper bound is incredibly bad.

For V(3, 2), the theorem tells you only that V(3, 2) &le 325. That is, it tells you that any row of 325 red and blue dots must contain three evenly spaced dots of the same color. This is true, but oh, so sloppy, since the same is true of any row of 9 dots.

For V(3, 3), the question is how many red, yellow, and blue dots do you need to guarantee three evenly-spaced same-colored dots. The theorem helpfully suggests that:

$$V(3,3) \leq 7(2\cdot3^7+1)(2\cdot3^{7(2\cdot3^7+1)}+1)$$

This is approximately 5.79·1014613. But what is the actual value of V(3, 3)? It's 27. Urgggh.

In fact, there is a rather large cash prize available to be won by the first person who comes up with a general upper bound for V(n, C) that is smaller than a tower of 2's of height n. (That's 222... with n 2's.)

In the rest of this series, a string which does not contain three evenly-spaced equal symbols will be called good, and one which does contain three such symbols will be called bad. Then a special case of Van der Waerden's theorem, with n=3, says that, for any fixed number of symbols, all sufficiently long strings are bad.

In college I wanted to investigate this a little more. In particular, I wanted to calculate V(3, 3). These days you can just look it up on Wikipedia, but in those benighted times such information was hard to come by. I also wanted to construct the longest possible good strings, witnesses of length V(3, 3)-1. Although I did not know it at the time, V(3, 3) = 27, so a witness should have length 26. It turns out that there are exactly 48 witnesses of length 26. Here are the 1/6 of them that begin with RB or RRB:

RRBBRRBYBYYRYRRBRBBYRYYBYB
RRBBYRRYRYBBYYBBYRYRRYBBRR
RRBYBRRYRYBBYYBBYRYRRBYBRR
RBRRBRBYYBBYYBRBRRBYYRRYRY
RBRBBRRYBBYBYRRYYRRYBYBBYR
RBRBBRRYBBYBYRRYYRRYBYBBYB
RBRBBYBRRYRYYBYBBRBRYYRRYY
RBYYBYBRRBBRRBYBYYBRRYYRYR

The rest of the witnesses may be obtained by permuting the colors in these eight.

I wrote a series of C programs around 1988 to exhaustively search for good strings. Last month I was in a meeting and I decided to write the program again for some reason. I wrote a much better program. This series of articles will compare the five programs. I will post the first one tomorrow.

[ Addendum 20071003: Here is part 1. ]

[ Addendum 20071005: Here is part 2. ]

[ Addendum 20071005: I made a mistake in the expression I gave for the upper bound on V(3,3) and left out a factor of 7 in the exponent on the last 3. I had said that the upper bound was around 102092, but actually it is more like the seventh power of this. ]

[ Addendum 20071014: Here is part 3. ]


[Other articles in category /prog] permanent link

Van der Waerden's problem: program 2
In this series of articles I'm going to analyze four versions of a program that I wrote around 1988, and then another program that does the same thing that I wrote last month without referring to the 1988 code.

If you don't remember what the program does, here's an explanation.

Here is program 1, which was an earlier attempt to do the same thing.

Program 2

In yesterday's article I wrote about a crappy program to search for "good" strings in van der Waerden's problem. It was crappy because it searched the entire space of all 327 strings, with no pruning.

I can't remember whether I expected this to be practical at the time. Did I really think it would work? Well, there was some sense to it. It does work just fine for the 29 case. I think probably my idea was to do the simplest thing that could possibly work, and get as much information out of it as I could. On my current machine, this method proves that V(3,3) > 19 by finding a witness (RRBRRBBYYRRBRRBBYYB) in under 10 seconds. If we estimate that the computer I had then was 10,000 times slower, then I could have produced the same result in about 28 hours. I was at college, and there was plenty of free computing power available, so running a program for 28 hours was easily done. While I was waiting for it to finish, I could work on a better program.

Excerpts of the better program follow. The complete source code is here.

The idea behind this program is that the strings of length less than V form a tree, with the empty string as the root, and the children of string s are obtained from s by appending a single character to the end of s. If the string at a node is bad, so will be all the strings under it, and we can prune the entire branch at that node. This leaves us with a tree of all the good strings. The ones farthest from the root will be the witnesses we seek for the values of V(n, C), and we can find these by doing depth-first search on the tree,

There is nothing wrong with this idea in principle; that's the way my current program works too. The problem is all in the implementation. You see, this program actually constructs the entire tree in memory:

    #define NEWN		((struct tree *) Malloc(sizeof(struct tree)));\
                            printf("*")
    struct tree {
      char bad;
      struct tree *away[MAXCOLORS];
      } *root;
struct tree is a tree node structure. It represents a string s, and has a flag to record whether s is bad. It also has pointers to its subnodes, which will represents strings sA, sB, and so on.

MAXCOLORS is a compiled-in limit on the number of different symbols the strings can contain, an upper bound on C. Apparently I didn't know the standard technique for avoiding this inflexibility. You declare the array as having length 1, but then when you allocate the structure, you allocate enough space for the array you are actually planning to use. Even though the declared size of the array is 1, you are allowed to refer to node->away[37] as long as there is actually enough space in the allocated chunk. The implementation would look like this:

        struct tree {
          char bad;
          struct tree *away[1];
        } ;

        struct tree *make_tree_node(char bad, unsigned n_subnodes)
        {
          struct tree *t;
          unsigned i;

          t =  malloc(sizeof(struct tree) 
                   + (n_subnodes-1) * sizeof(struct tree *));

          if (t == NULL) return NULL;

          t->bad = bad;
          for (i=0; i < n_subnodes; i++) t->away[i] = NULL;

          return t;
        }
(Note for those who are not advanced C programmers: I give you my solemn word of honor that I am not doing anything dodgy or bizarre here; it is a standard, widely-used, supported technique, guaranteed to work everywhere.)

(As before, this code is in a pink box to indicate that it is not actually part of the program I am discussing.)

Another thing I notice is that the NEWN macro is very weird. Note that it may not work as expected in a context like this:

        for(i=0; i<10; i++)
          s[i] = NEWN;
This allocates ten nodes but prints only one star, because it expands to:

        for(i=0; i<10; i++)
          s[i] = ((struct tree *) Malloc(sizeof(struct tree)));
        printf("*");
and the for loop does not control the printf. The usual fix for multiline macros like this is to wrap them in do...while(0), but that is not appropriate here. Had I been writing this today, I would have made NEWN a function, not a macro. Clevermacroitis is a common disorder of beginning C programmers, and I was no exception.

The main business of the program is in the makenodes function; the main routine does some argument processing and then calls makenodes. The arguments to the makenodes function are the current tree node, the current string that that node represents, and an integer howfar that says how deep a tree to construct under the current node.

There's a base case, for when nothing needs to be constructed:

    if (!howfar)
      {
      for (i=0; i<colors; i++)
        n->away[i] = NULL;
      return;
      }
But in general the function calls itself recursively:

    for (i=0; i<colors; i++)
      {
      n->away[i] = NEWN;
      n->away[i]->bad = 0;
      if (apchk(s,'A'+i))
        {
        n->away[i]->bad = 1;
        }
      else
      ...
Recall that apchk checks a string for an arithmetic progression of equal characters. That is, it checks to see if a string is good or bad. If the string is bad, the function prunes the tree at the current node, and doesn't recurse further.

Unlike the one in the previous program, this apchk doesn't bother checking all the possible arithmetic progressions. It only checks the new ones: that is, the ones involving the last character. That's why it has two arguments. One is the old string s and the other is the new symbol that we want to append to s.

If s would still be good with symbol 'A'+i appended to the end, the function recurses:

        ...
        else
        {
        ls = strlen(s);
        newarg = STRING(ls + 1);
        strcpy(newarg,s);
        newarg[ls+1] = '\0';
        newarg[ls] = 'A' + i;
        makenodes(n->away[i],howfar-1,newarg);
        Free(newarg,ls+2);
        Free(n->away[i],sizeof(struct tree));
        }
      }
    }
The entire string is copied here into a new buffer. A better technique sould have been to allocate a single buffer back up in main, and to reuse that buffer over again on each call to makenodes. It would have looked something like this:

        char *s = String(maxlen);
        memset(s, 0, maxlen+1);
        makenodes(s, s, maxlen);

        void        
        makenodes(char *start, char *end, unsigned howfar)
        {
           ...
           for (i=0; i<colors; i++) {
             *end = 'A' + i;
             makenodes(start, end+1, howfar-1);
           }
           *end = '\0';
           ...
        }
This would have saved a lot of consing, ahem, I mean a lot of mallocing. Also a lot of string copying. We could avoid the end pointer by using start+maxlen-howfar instead, but this way is easier to understand.

I was thinking this afternoon how it's intersting the way I wrote this. It's written the way it would have been done, had I been using a functional programming language. In a functional language, you would never mutate the same string for each function call; you always copy the old structure and construct a new one, just as I did in this program. This is why C programmers abominate functional languages.

Had I been writing makenodes today, I would probably have eliminated the other argument. Instead of passing it a node and having it fill in the children, I would have had it construct and return a complete node. The recursive call would then have looked like this:

  struct tree *new = NEWN;
  ...
  for (i=0; i<colors; i++) {
     new->away[i] = makenodes(...);
     ...
  }
  return new;
One thing I left out of all this was the diagnostic printfs; you can see them in the complete code if you want. But there's one I thought was worth mentioning anyway:

    #define TABS	"                                        "
    ....

    #ifdef DIAG
    printf("%s makenoding with string %s, depth %d.\n",
            TABS+12-maxlen+howfar,s,maxlen-howfar);
    #endif
The interesting thing here is the TABS+12-maxlen+howfar argument, which indents the display depending on how far the recursion has progressed. In Perl, which has nonaddressable strings, I usually do something like this:

        my $TABS = " " x (maxlen - howfar);
        print $TABS, "....";
The TABS trick here is pretty clever, and I'm a bit surprised that I thought of it in 1988, when I had been programming in C for only about a year. It makes an interesting contrast to my failure to reuse the string buffer in makenodes earlier.

(Peeking ahead, I see that in the next version of the program, I did reuse the string buffer in this way.)

TABS is actually forty spaces, not tabs. I suspect I used tabs when I tested it with V(2, 3), where maxlen was only 9, and then changed it to spaces for calculating V(3, 3), where maxlen was 27.

The apchk function checks to see if a string is good. Actually it gets a string, qq, and a character, q, and checks to see if the concatenation of qq and q would be good. This reduces its running time to O(|qq|) rather than O(|qq|2).

  int
  apchk(qq,q)
  char *qq ,q;
  {
  int lqq, f, s, t;

  t = lqq = strlen(qq);
  if (lqq < 2) return NO;

  for (f=lqq % 2; f <= lqq - 2; f += 2)
    {
    s = (f + t) / 2;
    if ((qq[f] == qq[s]) && (qq[s] == q))
      return YES;
    }
  return NO;
  }
It's funny that it didn't occur to me to include an extra parameter to avoid the strlen, or to use q instead of qq[s] in the first == test. Also, as in the previous program, I seem unaware of the relative precedences of && and ==. This is probably a hangover from my experience with Pascal, where the parentheses are required.

It seems I hadn't learned yet that predicate functions like apchk should be named something like is_bad, so that you can understand code like if (is_bad(s)) { ... } without having to study the code of is_bad to figure out what it returns.

I was going to write that I hated this function, and that I could do it a lot better now. But then I tried to replace it, and wasn't as successful as I expected I would be. My replacement was:

        unsigned
        is_bad(char *qq, int q) 
        {
          size_t qql = strlen(qq);
          char *f = qq + qql%2;
          char *s = f + qql/2;
          while (f < s) {
            if (*f == q && *s == q) return 1;
            f += 2; s += 1;
          }
          return 0;
        }
I could simplify the initializations of f and s, which are the parts I dislike most here, by making the pointers move backward instead of forward, but then the termination test becomes more complicated:
        unsigned
        is_bad(char *qq, int q) 
        {
          char *s = strchr(qq, '\0')-1;
          char *f = s-1;
          while (1) {
            if (*f == q && *s == q) return 1;
            if (f - qq < 2) break;
            f -= 2; s -= 1;
          }
          return 0;
        }
Anyway, I thought I could improve it, but I'm not sure I did. On the one hand, I like the f -= 2; s -= 1;, which I think is pretty clear. On the other hand, s = (f + t) / 2 is pretty clear too; s is midway between f and t. I'm willing to give teenage Dominus a passing grade on this one.

Someone probably wants to replace the while loop here with a for loop. That person is not me.

The Malloc and Free functions track memory usage and were presumably introduced when I discovered that my program used up way too much memory and crashed—I think I remember that the original version omitted the calls to free. They aren't particularly noteworthy, except perhaps for this bit, in Malloc:

        if (p == NULL)
          {
          fprintf(stderr,"Couldn't get %d bytes.\n",c);
          fprintf(stderr,"Total get was %d.\n",gotten);
          fprintf(stderr,"P\n L\n  O\n   P\n    !\n");
          abort();
          }
Plop!

It strikes me as odd that I was using void in 1988 (this is before the C90 standard) but still K&R-style function declarations. I don't know what to make of that.

Behavior

This program works, almost. On my current machine, it can find the length-26 witnesses for V(3, 3) in no time. (In 1998, it took several days to run on a Sequent Balance 21000.) The major problem is that it gobbles memory: the if (!howfar) base case in makenodes forgets to release the memory that was allocated for the new node. I wonder if the Malloc and Free functions were written in an unsuccessful attempt to track this down.

Sometime after I wrote this program, while I was waiting for it to complete, it occurred to me that it never actually used the tree for anything, and I could take it out.

I have this idea that one of the principal symptoms of novice programmers is that they take the data structures too literally, and always want to represent data the way it will appear when it's printed out. I haven't developed the idea well enough to write an article about it, but I hope it will show up here sometime in the next three years. This program, which constructs an entirely unnecessary tree structure, may be one of the examples of this idea.

I'll show the third version sometime in the next few days, I hope.

[ Addendum 20071014: Here is part 3. ]


[Other articles in category /prog] permanent link

Relatively prime polynomials over Z2
Last week Wikipedia was having a discussion on whether the subject of "mathematical quilting" was notable enough to deserve an article. I remembered that there had been a mathematical quilt on the cover of some journal I read last year, and I went to the Penn math library to try to find it again. While I was there, I discovered that the June 2007 issue of Mathematics Magazine had a cover story about the probability that two randomly-selected polynomials over Z2 are relatively prime. ("The Probability of Relatively Prime Polynomials", Arthur T. Benjamin and Curtis D Bennett, page 196).

Polynomials over Z2 are one of my favorite subjects, and the answer to the question turned out to be beautiful. So I thought I'd write about it here.

First, what does it mean for two polynomials to be relatively prime? It's analogous to the corresponding definition for integers. For any numbers a and b, there is always some number d such that both a and b are multiples of d. (d = 1 is always a solution.) The greatest such number is called the greatest common divisor or GCD of a and b. The GCD of two numbers might be 1, or it might be some larger number. If it's 1, we say that the two numbers are relatively prime (to each other). For example, the GCD of 100 and 28 is 4, so 100 and 28 are not relatively prime. But the GCD of 100 and 27 is 1, so 100 and 27 are relatively prime. One can prove theorems like these: If p is prime, then either a is a multiple of p, or a is relatively prime to p, but not both. And the equation ap + bq = 1 has a solution (in integers) if and only if p and q are relatively prime.

The definition for polynomials is just the same. Take two polynomials over some variable x, say p and q. There is some polynomial d such that both p and q are multiples of d; d(x) = 1 is one such. When the only solutions are trivial polynomials like 1, we say that the polynomials are relatively prime. For example, consider x2 + 2x + 1 and x2 - 1. Both are multiples of x+1, so they are not relatively prime. But x2 + 2x + 1 is relatively prime to x2 - 2x + 1. And one can prove theorems that are analogous to the ones that work in the integers. The analog of "prime integer" is "irreducible polynomial". If p is irreducible, then either a is a multiple of p, or a is relatively prime to p, but not both. And the equation a(x)p(x) + b(x)q(x) = 1 has a solution for polynomials a and b if and only if p and q are relatively prime.

One uses Euclid's algorithm to calculate the GCD of two integers. Euclid's algorithm is simple: To calculate the GCD of a and b, just subtract the smaller from the larger, repeatedly, until one of the numbers becomes 0. Then the other is the GCD. One can use an entirely analogous algorithm to calculate the GCD of two polynomials. Two polynomials are relatively prime just when their GCD, as calculated by Euclid's algorithm, has degree 0.

Anyway, that was more introduction than I wanted to give. The article in Mathematics Magazine concerned polynomials over Z2, which means that the coefficients are in the field Z2, which is just like the regular integers, except that 1+1=0. As I explained in the earlier article, this implies that a=-a for all a, so there are no negatives and subtraction is the same as addition. I like this field a lot, because subtraction blows. Do you have trouble because you're always dropping minus signs here and there? You'll like Z2; there are no minus signs.

Here is a table that shows which pairs of polynomials over Z2 are relatively prime. If you read this blog through some crappy aggregator, you are really missing out, because the table is awesome, and you can't see it properly. Check out the real thing.

 a0a1a2a3a4a5a6a7a8a9b0b1b2b3b4b5b6b7b8b9c0c1c2c3c4c5c6c7c8c9d0d1
0   [a0]                                                                
1   [a1]                                                                
x   [a2]                                                                
x + 1   [a3]                                                                
x2   [a4]                                                                
x2 + 1   [a5]                                                                
x2 + x   [a6]                                                                
x2 + x + 1   [a7]                                                                
x3   [a8]                                                                
x3 + 1   [a9]                                                                
x3 + x   [b0]                                                                
x3 + x + 1   [b1]                                                                
x3 + x2   [b2]                                                                
x3 + x2 + 1   [b3]                                                                
x3 + x2 + x   [b4]                                                                
x3 + x2 + x + 1   [b5]                                                                
x4   [b6]                                                                
x4 + 1   [b7]                                                                
x4 + x   [b8]                                                                
x4 + x + 1   [b9]                                                                
x4 + x2   [c0]                                                                
x4 + x2 + 1   [c1]                                                                
x4 + x2 + x   [c2]                                                                
x4 + x2 + x + 1   [c3]                                                                
x4 + x3   [c4]                                                                
x4 + x3 + 1   [c5]                                                                
x4 + x3 + x   [c6]                                                                
x4 + x3 + x + 1   [c7]                                                                
x4 + x3 + x2   [c8]                                                                
x4 + x3 + x2 + 1   [c9]                                                                
x4 + x3 + x2 + x   [d0]                                                                
x4 + x3 + x2 + x + 1   [d1]                                                                

A pink square means that the polynomials are relatively prime; a white square means that they are not. Another version of this table appeared on the cover of Mathematics Magazine. It's shown at right.

The thin black lines in the diagram above divide the polynomials of different degrees. Suppose you pick two degrees, say 2 and 2, and look at the corresponding black box in the diagram:

 a4a5a6a7
x2   [a4]        
x2 + 1   [a5]        
x2 + x   [a6]        
x2 + x + 1   [a7]        
You will see that each box contains exactly half pink and half white squares. (8 pink and 8 white in that case.) That is, exactly half the possible pairs of degree-2 polynomials are relatively prime. And in general, if you pick a random degree-a polynomial and a random degree-b polynomial, where a and b are not both zero, the polynomials will be relatively prime exactly half the time.

The proof of this is delightful. If you run Euclid's algorithm on two relatively prime polynomials over Z2, you get a series of intermediate results, terminating in the constant 1. Given the intermediate results and the number of steps, you can run the algorithm backward and find the original polynomials. If you run the algorithm backward starting from 0 instead of from 1, for the same number of steps, you get two non-relatively-prime polynomials of the same degrees instead. This establishes a one-to-one correspondence between pairs of relatively prime polynomials and pairs of non-relatively-prime polynomials of the same degrees. End of proof. (See the paper for complete details.)

You can use basically the same proof to show that the probability that two randomly-selected polynomials over Zp is 1-1/p. The argument is the same: Euclid's algorithm could produce a series of intermediate results terminating in 0, in which case the polynomials are not relatively prime, or it could produce the same series of intermediate results terminating in something else, in which case they are relatively prime. The paper comes to an analogous conclusion about monic polynomials over Z.


Some folks I showed the diagram to observed that it looks like a quilt pattern. My wife did actually make a quilt that tabulates the GCD function for integers, which I mentioned in the Wikipedia discussion of the notability of the Mathematical Quilting article. That seems to have brought us back to where the article started, so I'll end here.

[ Puzzle: The (11,12) white squares in the picture are connected to the others via row and column 13, which doesn't appear. Suppose the quilt were extended to cover the entire quarter-infinite plane. Would the white area be connected? ]


[Other articles in category /math] permanent link

Fri, 12 Oct 2007

The square of the Catalan sequence
Yesterday I went to a talk by Val Tannen about his work on "provenance semirings".

The idea is that when you calculate derived data in a database, such as a view or a selection, you can simultaneously calculate exactly which input tuples contributed to each output tuple's presence in the output. Each input tuple is annotated with an identifier that says who was responsible for putting it there, and the output annotations are polynomials in these identifiers. (The complete paper is here.)

A simple example may make this a bit clearer. Suppose we have the following table R:
R
a a
a b
a c
b c
c e
d e
We'll write R(p, q) when the tuple (p, q) appears in this table. Now consider the join of R with itself. That is, consider the relation S where S(x, z) is true whenever both R(x, y) and R(y, z) are true:

S
a a
a b
a c
a e
b e
Now suppose you discover that the R(a, b) information is untrustworthy. What tuples of S are untrustworthy?

If you annotate the tuples of R with identifiers like this:

R 
a a u
a b v
a c w
b c x
c e y
d e z
then the algorithm in the paper calculates polynomials for the tuples of S like this:
S 
a a u2
a b uv
a c uw + xv
a e wy
b e xy
If you decide that R(a, b) is no good, you assign the value 0 to v, which reduces the S table to:

S 
a a u2
a b 0
a c uw
a e wy
b e xy
So we see that tuple S(a, b) is no good any more, but S(a, c) is still okay, because it can be derived from u and w, which we still trust.

This assignment of polynomials generalizes a lot of earlier work on tuple annotation. For example, suppose each tuple in R is annotated with a probability of being correct. You can propagate the probabilities to S just by substituting the appropriate numbers for the variables in the polynomials. Or suppose each tuple in R might appear multiple times and is annotated with the number of times it appears. Then ditto.

If your queries are recursive, then the polynomials might be infinite. For example, suppose you are calculating the transitive closure T of relation R. This is like the previous example, except that instead of having S(x, z) = R(x, y) and R(y, z), we have T(x, z) = R(x, z) or (T(x, y) and R(y, z)). This is a recursive equation, so we need to do a fixpoint solution for it, using certain well-known techniques. The result in this example is:

T 
a a u+
a b u*v
a c u*(vx+w)
a e u*(vx+w)y
b c x
b e xy
d e z
In such a case there might be an infinite number of paths through R to derive the provenance of a certain tuple of T. In this example, R contains a loop, namely R(a, a), so there are an infinite number of derivations of some of the tuples in T, because you can go around the loop as many times as you like. u+ here is an abbreviation for the infinite polynomial u + u2 + u3 + ...; u* here is an abbreviation for 1 + u+.

1 a
2 (a + b)
3 ((a + b) + c)
(a + (b + c))
4 (((a + b) + c) + d)
((a + (b + c)) + d)
((a + b) + (c + d))
(a + ((b + c) + d))
(a + (b + (c + d)))
5 ((((a + b) + c) + d) + e)
(((a + (b + c)) + d) + e)
(((a + b) + (c + d)) + e)
(((a + b) + c) + (d + e))
((a + ((b + c) + d)) + e)
((a + (b + (c + d))) + e)
((a + (b + c)) + (d + e))
((a + b) + ((c + d) + e))
((a + b) + (c + (d + e)))
(a + (((b + c) + d) + e))
(a + ((b + (c + d)) + e))
(a + ((b + c) + (d + e)))
(a + (b + ((c + d) + e)))
(a + (b + (c + (d + e))))
In one example in the paper, the method produces a recursive relation of the form V = s + V2, which can be solved by the same well-known techniques to come up with an (infinite) polynomial for V, namely V = 1 + s + 2s2 + 5s3 + 14s4 + ... . Mathematicians will recognize the sequence 1, 1, 2, 5, 14, ... as the Catalan numbers, which come up almost as often as the better-known Fibonacci numbers. For example, the Catalan numbers count the number of binary trees with n nodes; they also count the number of ways of parenthesizing an expression with n terms, as shown in the table at right.

Anyway, in his talk, Val referred to the sequence as "bizarre", and I had to jump in to point out that it was not at all bizarre, it was the Catalan numbers, which are just what you would expect from a relation like V = s + V2, blah blah, and he cut me off, because of course he knows all about the Catalan numbers. He only called them bizarre as a rhetorical flourish, meant to echo the presumed puzzlement of the undergraduates in the room.

(I never know how much of what kind of math to expect from computer science professors. Sometimes they know things I don't expect at all, and sometimes they don't know things that I expect everyone to know.

(Once I was discussing the algorithm used by ENIAC for computing square roots with a professor, and the professor told me that at the beginning of the program there was a loop, which accumulated a total, each time accumulating the contents of a register that was incremented by 2 each time through the loop, and he did not know what was going on there. I instantly guessed that what was happening was that the register contained the numbers 1, 3, 5, 7, ..., incremented by 2 each time, and so the accumulator contained the totals 1, 4, 9, 16, 25, ..., and so the loop was calculating an initial estimate of the size of the square root. If you count the number of increment-by-2's, then when the accumulator exceeds the radicand, the count contains the integer part of the root.

(This was indeed what was going on, and the professor seemed to think it was a surprising insight. I am not relating this boastfully, because I truly don't think it was a particularly inspired guess.

(Now that I think about it, maybe the answer here is that computer science professors know more about math than I expect, and less about computation.)

Anyway, I digress, and the whole article up to now was not really what I wanted to discuss anyway. What I wanted to discuss was that when I started blathering about Catalan numbers, Val said that if I knew so much about Catalan numbers, I should calculate the coefficient of the x59 term in V2, which also appeared as one of the annotations in his example.

So that's the puzzle, what is the coefficient of the x59 term in V2, where V = 1 + s + 2s2 + 5s3 + 14s4 + ... ?

After I had thought about this for a couple of minutes, I realized that it was going to be much simpler than it first appeared, for two reasons.

The first thing that occurred to me was that the definition of multiplication of polynomials is that the coefficient of the xn term in the product of A and B is Σaibn-i. When A=B, this reduces to Σaian-i. Now, it just so happens that the Catalan numbers obey the relation cn+1 = &Sigma cicn-i, which is exactly the same form. Since the coefficients of V are the ci, the coefficients of V2 are going to have the form Σcicn-i, which is just the Catalan numbers again, but shifted up by one place.

The next thing I thought was that the Catalan numbers have a pretty simple generating function f(x). This just means that you pretend that the sequence V is a Taylor series, and figure out what function it is the Taylor series of, and use that as a shorthand for the whole series, ignoring all questions of convergence and other such analytic fusspottery. If V is the Taylor series for f(x), then V2 is the Taylor series for f(x)2. And if f has a compact representation, say as sin(x) or something, it might be much easier to square than the original V was. Since I knew in this case that the generating function is simple, this seemed likely to win. In fact the generating function of V is not sin(x) but (1-√(1-4x))/2x. When you square this, you get almost the same thing back, which matches my prediction from the previous paragraph. This would have given me the right answer, but before I actually finished that calculation, I had an "oho" moment.

The generating function is known to satisfy the relation f(x) = 1 + xf(x)2. This relation is where the (1-√(1-4x))/2x thing comes from in the first place; it is the function that satisfies that relation. (You can see this relation prefigured in the equation that Val had, with V = s + V2. There the notation is a bit different, though.) We can just rearrange the terms here, putting the f(x)2 by itself, and get f(x)2 = (f(x)-1)/x.

Now we are pretty much done, because f(x) = V = 1 + x + 2x2 + 5x3 + 14x4 + ... , so f(x)-1 = x + 2x2 + 5x3 + 14x4 + ..., and (f(x)-1)/x = 1 + 2x + 5x2 + 14x3 + ... . Lo and behold, the terms are the Catalan numbers again.

So the answer is that the coefficient of the x59 term is just c(60), calculation of which is left as an exercise for the reader.

I don't know what the point of all that was, but I thought it was fun how the hairy-looking problem seemed likely to be simple when I looked at it a little more carefully, and then how it did turn out to be quite simple.

This blog has had a recurring dialogue between subtle technique and the sawed-off shotgun method, and I often favor the sawed-off shotgun method. Often programmers' big problem is that they are very clever and learned, and so they want to be clever and learned all the time, even when being a knucklehead would work better. But I think this example provides some balance, because it shows a big win for the clever, learned method, which does produce a lot more understanding.

Order
Higher-Order Perl
Higher-Order Perl
with kickback
no kickback
Then again, it really doesn't take long to whip up a program to multiply infinite polynomials. I did it in chapter 6 of Higher-Order Perl, and here it is again in Haskell:

        data Poly a = P [a] deriving Show

        instance (Eq a) => Eq (Poly a) 
                where (P x) == (P y) = (x == y)

        polySum x [] = x
        polySum [] y = y
        polySum (x:xs) (y:ys) = (x+y) : (polySum xs ys)

        polyTimes  [] _ = []
        polyTimes  _ [] = []
        polyTimes  (x:xs) (y:ys) = (x*y) : more
                          where
                            more = (polySum (polySum (map (x *) ys) (map (* y) xs))
                                    (0 : (polyTimes xs ys)))

        instance (Num a) => Num (Poly a) 
          where (P x) + (P y) = P (polySum x y)
                (P x) * (P y) = P (polyTimes x y)


[Other articles in category /math] permanent link

Mon, 08 Oct 2007

State of the Blog 2006
This is the end of the first year of my blog. The dates on the early articles say that I posted a few in 2005, but they are deceptive. I didn't want a blog with only one post in it, so I posted a bunch of stuff that I had already written, and backdated it to the dates on which I had written it. The blog first appeared on 8 January, 2006, and this was the date on which I wrote its first articles.

Output

Not counting this article, I posted 161 articles this year, totalling about 172,000 words, which I think is not a bad output. About 1/4 of this output was about mathematics.

The longest article was the one about finite extension fields of Z2; the shortest was the MadHatterDay commemorative. Other long articles included some math ones (metric spaces, the Pólya-Burnside theorem) and some non-math ones (how Forbes magazine made a list of the top 20 tools and omitted the hammer, do aliens feel disgust?).

I drew, generated, or appropriated about 300 pictures, diagrams, and other illustrations, plus 66 mathematical formulas. This does not count the 50 pictures of books that I included, but it does include 108 little colored squares for the article on the Pólya-Burnside counting lemma.

Financial

I incurred the costs of Dreamhosting (see below). But these costs are offset because I am also using the DreamHost as a remote backup for files. So it has some non-blog value, and will also result in a tax deduction.

None of the book links earned me any money from kickbacks. However, the blog did generate some income. When Aaron Swartz struck oil, he offered to give away money to web sites that needed it. Mine didn't need it, but a little later he published a list of web sites he'd given money to, and I decided that I was at least as deserving as some of them. So I stuck a "donate" button on my blog and invited Aaron to use it. He did. Thanks, Aaron!

I now invite you to use that button yourself. Here are two versions that both do the same thing:

I could not decide whether to go with the cute and pathetic begging approach (shown right) or the brusque and crass demanding approach (left).

My MacArthur Fellowship check has apparently been held up in the mail.

Popularity

The most popular article was certainly the one on Design Patterns of 1972. I had been thinking this one over for years, and I was glad it attracted as much attention as it did. Ralph Johnson (author of the Design Patterns book) responded to it, and I learned that Design Patterns is not the book that Johnson thought it was. Gosh, I'm glad I didn't write that book.

The followup to the Design Patterns article was very popular also. Other popular articles were on risk, the envelope paradox, the invention of the = sign, and ten science questions every high school graduate should be able to answer.

My own personal favorites are the articles about alien disgust, the manufacture of round objects, and what makes π so peculiar. In that last one, I think I was skating on the thin edge of crackpotism.

The article about Forbes' tool list was mentioned in The New York Times.

The blog attracted around five hundred email messages, most of which were intelligent and thoughtful, and most of which I answered. But my favorite email message was from the guy who tried to convince me that rock salt melts snow because it contains radioactive potassium.

System administration

I moved the blog twice. It originally resided on www.plover.com, which is in my house. I had serious network problems in July and August, Verizon's little annual gift to me. When I realized that the blog was be much more popular than I expected, and that I wanted it to be reliably available, I moved it to newbabe.pobox.com, which I'd had an account on for years but had never really used. This account was withdrawn a few months later, so I rented space at Dreamhost, called blog.plover.com, and moved it there. I expect it will stay at Dreamhost for quite a while.

Moving the blog has probably cost me a lot of readers. I know from the logs that many of them have not moved from newbabe to Dreamhost. Traffic on the new site just after the move was about 25% lower than on the old site just before the move. Oh well.

If the blog hadn't moved so many times, it would be listed by Technorati as one of the top ten blogs on math and science, and one of the top few thousand overall. As it is, the incoming links (which are what Technorati uses to judge blog importance) are scattered across three different sites, so it appears to be three semi-popular blogs rather than one very popular blog. This would bother me, if Technorati rankings weren't so utterly meaningless.

Policy

I made a couple of vows when I started the blog. A number of years ago on my use.perl.org journal I complained extensively about some people I worked with. They deserved everything I said, but the remarks caused me a lot of trouble and soured me on blogs for many years. When I started this blog, I vowed that I wouldn't insult anyone personally, unless perhaps they were already dead and couldn't object. Some people have no trouble with this, but for someone like me, who is a seething cauldron of bile, it required a conscious effort.

I think I've upheld this vow pretty well, and although there have been occasions on which I've called people knuckleheaded assholes, it has always been either a large group (like Biblical literalists) or people who were dead (like this pinhead) or both.

Another vow I made was that I wasn't going to include any tedious personal crap, like what music I was listening to, or whether the grocery store was out of Count Chocula this week. I think I did okay on that score. There are plenty of bloggers who will tell you about the fight they had with their girlfriend last night, but very few that will analyze abbreviations in Medieval Latin. So I have the Medieval Latin abbreviations audience pretty much to myself. I am a bit surprised at how thoroughly I seem to have communicated my inner life, in spite of having left out any mention of Count Chocula. This is a blog of what I've thought, not what I've done.

What I didn't post this year

My blog directory contains 55 unpublished articles, totalling 39,500 words, in various states of incompleteness; compare this with the 161 articles I did complete.

The longest of these unpublished articles was written some time after my article on the envelope paradox hit the front page of Reddit. Most of the Reddit comments were astoundingly obtuse. There were about nine responses of the type "That's cute, but the fallacy is...", each one proposing a different fallacy. All of the proposed fallacies were completely wrong; most of them were obviously wrong. (There is no fallacy; the argument is correct.) I decided against posting this rebuttal article for several reasons:

  • It wouldn't have convinced anyone who wasn't already convinced, and might have unconvinced someone who was. I can't lay out the envelope paradox argument any more briefly or clearly than I did; all I can do is make the explanation longer.

  • It came perilously close to violating the rule about insulting people who are still alive. I'm not sure how the rule applies to anonymous losers on Reddit, but it's probably better to err on the side of caution. And I wasn't going to be able to write the article without insulting them, because some of them were phenomenally stupid.

  • I wasn't sure anyone but me would be interested in the details of what a bunch of knuckleheaded lowlifes infest the Reddit comment board. Many of you, for example, read Slashdot regularly, and see dozens of much more ignorant and ill-considered comments every day.

  • How much of a cretin would I have to be to get in an argument with a bunch of anonymous knuckleheads on a computer bulletin board? It's like trying to teach a pig to sing. Well, okay, I did get in an argument with them over on Reddit; that was pretty stupid. And then I did write the rebuttal article, which was at least as stupid, but which I can at least ascribe to my seething cauldron of bile. But it's never too late to stop acting stupid, and at least I stopped before I cluttered up my beautiful blog with a four-thousand-word rebuttal.

So that was one long article that never made it; had it been published, it would have been longer than any other except the Z2 article.

The article about metric spaces was supposed to be one of a three-part series, which I still hope to finish eventually. I made several attempts to write another part in this series, about the real numbers and why we have them at all. This requires explanation, because the reals are mathematically and philosophically quite artificial and problematic. (It took me a lot of thought to convince myself that they were mathematically inevitable, and that the aliens would have them too, but that is another article.) The three or four drafts I wrote on this topic total about 2,100 words, but I still haven't quite got it where I want it, so it will have to wait.

I wrote 2,000 words about oddities in my brain, what it's good at and what not, and put it on the shelf because I decided it was too self-absorbed. I wrote a complete "frequently-asked questions" post which answered the (single) question "Why don't you allow comments?" and then suppressed it because I was afraid it was too self-absorbed. Then I reread it a few months later and thought it was really funny, and almost relented. Then I read it again the next month and decided it was better to keep it suppressed. I'm not indecisive; I'm just very deliberate.

I finished a 2,000-word article about how to derive the formulas for least-squares linear regression and put it on the shelf because I decided that it was boring. I finished a 1,300-word article about quasiquotation in Lisp and put it on the shelf because I decided it was boring. (Here's the payoff from the quasiquotation article: John McCarthy, the inventor of Lisp, took both the concept and the name directly from W.V.O. Quine, who invented it in 1940.)

Had I been writing this blog in 2005, there would have been a bunch of articles about Sir Thomas Browne, but I was pretty much done with him by the time I started the blog. (I'm sure I will return someday.) There would have been a bunch of articles about John Wilkins's book on the Philosophical Language, and some on his book about cryptography. (The Philosophical Language crept in a bit anyway.) There would have been an article about Charles Dickens's book Great Expectations, which I finished reading about a year and a half ago.

An article about A Christmas Carol is in the works, but I seem to have missed the seasonal window on that one, so perhaps I'll save it for next December. I wrote an article about how to calculate the length of the day, and writing a computer progam to tell time by the old Greek system, which divides the daytime into twelve equal hours and the nighttime into twelve equal hours, so that the night hours are longer than the day ones in the winter, and shorter in the summer. But I missed the target date (the solstice) for that one, so it'll have to wait until at least the next solstice. I wrote part of an article about Hangeul (the Korean alphabet), planning to publish it on Hangeul Day (the Koreans have a national holiday celebrating Hangeul) but I couldn't find the quotations I wanted from 1445, so I put it on the shelf. This week I'm reading Gari Ledyard's doctoral thesis, The Korean Language Reform of 1446, so I may acquire more information about that and be able to finish the article. (I highly recommend the Ledyard thing; it's really well-written.) I recently wrote about 1,000 words about Vernor Vinge's new novel Rainbows End, but that's not finished yet.

A followup to the article about why you don't have one ear in the middle of your face is in progress. It's delayed by two things: first, I made a giant mistake in the original article, and I need to correct it, but that means I have to figure out what the mistake is and how to correct it. And second, I have to follow up on a number of fascinating references about directional olfaction.

Sometimes these followups eventually arrive,as the one about ssh-agent did, and sometimes they stall. A followup to my early article about the nature of transparency, about the behavior of light, and the misconception of "the speed of light in glass", ran out of steam after a page when I realized that my understanding of light was so poor that I would inevitably make several gross errors of fact if I finished it.

I spent a lot of the summer reading books about inconsistent mathematics, including Graham Priest's book In Contradiction, but for some reason no blog articles came of it. Well, not exactly. What has come out is an unfinished 1,230-word article against the idea that mathematics is properly understood as being about formal systems, an unfinished 1,320-word article about the ubiquity of the Grelling-Nelson paradox, an unfinished 1,110-word article about the "recursion theorem" of computer science, and an unfinished 1,460-word article about paraconsistent logic and the liar paradox.

I have an idea that I might inaugurate a new section of the blog, called "junkheap", where unfinished articles would appear after aging in the cellar for three years, regardless what sort of crappy condition they are in. Now that the blog is a year old, planning something two years out doesn't seem too weird.

I also have an ideas file with a couple hundred notes for future articles, in case I find myself with time to write but can't think of a topic. Har.

Surprises

I got a number of unpredictable surprises when I started the blog. One was that I wasn't really aware of LiveJournal, and its "friends" pages. I found it really weird to see my equation-filled articles on subvocal reading and Baroque scientific literature appearing on these pages, sandwiched between posts about Count Chocula from people named "Taldin the Blue Unicorn". Okay, whatever.

I was not expecting that so many of my articles would take the form "ABCDEFG. But none of this is really germane to the real point of this article, which is ... HIJKLMN." But the more articles I write in this style, the more comfortable I am with it. Perhaps in a hundred years graduate students will refer to an essay of this type, with two loosely-coupled sections of approximately the same length, linked by an apologetic phrase, as "Dominus-style".

Wrong wrong wrong!

I do not have a count of the number of mistakes and errors I made that I corrected in later articles, although I wish I did. Nor do I have a count of the number of mistakes that I did not correct.

However, I do know that the phrase "I don't know" (and variations, like "did not know") appears 67 times, in 44 of the 161 articles. I would like to think that this is one of the things that will set my blog apart from others, and I hope to improve these numbers in the coming years.

Thanks

Thanks to all my readers for their interest and close attention, and for making my blog a speedy success.


[Other articles in category /meta] permanent link

The Club

Reduces your risk of auto theft by 400%.


[Other articles in category /math] permanent link

Fri, 05 Oct 2007

Excellent numbers
Another programmer presented the Philadelphia Perl Mongers with this problem, apropos of something else; he had gotten the problem from a friend of his. The problem is to find "excellent numbers". A number n is excellent if it has an even number of digits, and if when you chop it into a front half a and a back half b, you have b2 - a2 = n. For example, 48 is excellent, because 82 - 42 = 48, and 3468 is excellent, because 682 - 342 = 4624 - 1156 = 3468.

This other programmer had written a program to do brute-force search, which wasn't very successful, because there aren't very many excellent numbers. He was presenting it to the Perl Mongers because in the course of trying to speed up the program, he had learned something interesting: using the Memoize module to memoize the squaring function makes a program slower, not faster. But that is another matter for another article.

A slightly less brutal brute-force search locates excellent numbers quite easily. Suppose the number n has 2k digits, and that it is the concatenation of a and b, each with k digits. We want b2 - a2 = n. Since n is the concatenation of a and b, it is equal to a·10k + b. So we want:

a·10k + b = b2 - a2

Or equivalently:

a·(10k + a) = b2 - b

Let's say that a number of the form b2 - b is "rectangular". So our problem is simply to find k-digit numbers a for which a·(10k + a) is rectangular.

That is, instead of brute-forcing n, which has 2k digits, we need only do a brute-force search on a, which has only k digits. This is a lot faster. To find all 10-digit excellent numbers, we are now searching only 90,000 values of a instead of 9,000,000,000 values of n. This is feasible, whereas the larger search isn't.

All we need is some way to determine whether a given number q is rectangular, and that is not very hard to do. One way is simply to precompute a table of all rectangular numbers up to a certain size, and look up q in the table.

But another way is to notice that it is sufficient to be able to extract the "rectangular root" of q. That's the number b such that b2 - b = q. If we can make a good guess about what b might be, then we can calculate b2-b and see if we get back q. This is analogous to checking whether a number s is a perfect square by guessing its square root r and then checking to see if r2 = s. (Of course, it only works if you can guess right!)

But how can we guess the rectangular root of q? The computer has a built-in square root function, but no rectangular root function. But that's okay, because they are very nearly the same thing. Rectangular numbers are very nearly squares: when b is large, b2 - b is relatively very close to b2. So if we are given some number q, and we want to know if it has the form b2 - b, we can get a very good estimate of the size of b just by taking √q.

For example, is 29750 a rectangular number? √29750 = 172.48, so if 29750 is rectangular, it will be 1732 - 173 = 29756. So, no.

Is 1045506 rectangular? √1045506 = 1022.4998, so if 1045506 is rectangular, it will be 10232-1023 = 1045506—check!

The only possible problem is that our assumption that b2-b and b2 will be close may not hold when b is too small. So we might need to put a special case in our program to use a different test for rectangularity when q is too small. But it turns out that the test works fine even for q as small as 2: Is 2 rectangular? √2 = 1.4, so if 2 is rectangular, it will be 22-2—check!

So we code up an is_rectangular function:

        sub is_rectangular {
          my $q = shift;
          my $b = 1 + int(sqrt($q));  # Guess
          if ($b * $b - $b == $q) {   # Check guess
            return $b;                # Aha!
          } else {
            return;                   # Not rectangular
          }
        }
This function returns false if its argument is not rectangular; and it returns the rectangular root otherwise. We will need the rectangular root, because that's exactly the lower half of an excellent number.

Now we need a function that tells us whether we have some number a that might be the upper half of an excellent number:

        my $k = shift || 3;
        my $p = 10**$k;

        sub is_upper_half_of_excellent_number {
          my $a = shift;
          return is_rectangular($a * ($p + $a));
        }
$k here is the number of digits in a. We'll let it be a command-line argument, and default to 3.

We could infer k from a itself, but we'll hardwire it for two reasons. One reason is speed. The other is that we might want to accept a number like 0003514284 as being excellent, where here a has some leading zeroes; fixing k is one way to do this. We also precalculate the constant p = 10k, which we need for the calculation.

Now we just write the brute-force loop:

        for my $a (0 .. $p-1) {
          if(my $b = is_upper_half_of_excellent_number($a)) {
            print "$a$b\n";
          }
        }
The program instantaneously coughs up:

        01
        10101
        16128
        34188
        140400
        190476
        216513
        300625
        334668
        416768
        484848
        530901
        6401025
        8701276
Most of these are correct. For example, 9012 - 5302 = 811801 - 280900 = 530901. 5132 - 2162 = 263169 - 46656 = 216513.

The ones at the top of the list are correct, if you remember about the leading zeroes: 1882 - 0342 = 35344 - 1156 = 034188, and 0012 - 0002 = 1 - 0 = 000001.

The seven-digit numbers at the bottom don't work, because the program has a bug: we forgot to enforce the restriction that b must also have exactly k digits; the last two numbers in the display have four-digit b's. So we need to add another test to the main loop:

        for my $a (0 .. $p-1) {
          if(my $b = is_upper_half_of_excellent_number($a)) {
            print "$a$b\n" if length($b) == $k;
          }
        }
This eliminates the wrong answers from the tail of the list. It also skips over cases where b is too short, and needs leading zeroes, such as a=000, b=001, but fixing that is both easy and unimportant.

The ten-digit solutions are:

        0101010101
        3333466668
        4848484848
        4989086476
I think this is interesting because it shows how a very little bit of mathematical analysis, and some very sloppy numerical work, can turn an intractable problem into a tractable one. We could have worked real hard and maybe come up with some way to generate excellent numbers with no search. But instead, we did just a little work, and that was enough to narrow down the search enormously, to the point where it's practical.

The other thing that I think is interesting is that the other programmer was trying to solve his performance problem with optimization tricks. But when you are searching over 9,000,000,000 cases, optimization tricks don't work. At 1,000 cases per second, 9,000,000,000 cases takes about 104 days to finish. If you can optimize your program to speed it up by 50%, it will still take 52 days to finish. But cutting the 9,000,000,000 cases to 90,000 cuts the run time from 104 days to 90 seconds.

Not to say that optimizing programs never works, of course. But you have to know when optimization might work and when it can't. In this case, micro-optimization wasn't going to help; the only way to fix an algorithm that does a brute-force search of 9,000,000,000 cases is to find a better algorithm. The point of this article is to show that sometimes finding a better algorithm can be pretty easy.

[ Addendum 20060620: I wrote a followup article about how it's not a coincidence that 4848484848 is excellent. ]


[Other articles in category /math] permanent link

Your age as a fraction, again
In a recent article, I discussed methods for calculating your age as a fractional year, in the style of (a sophisticated) three-and-a-half-year-old. For example, as of today, Richard M. Stallman is (a sophisticated) 54-and-four-thirty-thirds-year-old; tomorrow he'll be a 54-and-one-eighth-year-old.

I discussed several methods of finding the answer, including a clever but difficult method that involved fiddling with continued fractions, and some dead-simple brute force methods that take nominally longer but are much easier to do.

But a few days ago on IRC, a gentleman named Mauro Persano said he thought I could use the Stern-Brocot tree to solve the problem, and he was absolutely right. Application of a bit of clever theory sweeps away all the difficulties of the continued-fraction approach, leaving behind a solution that is clever and simple and fast.

Here's the essence of it: We consider a list of intervals that covers all the positive rational numbers; initially, the list contains only the interval (0/1, 1/0). At each stage we divide each interval in the list in two, by chopping it at the simplest fraction it contains.

To chop the interval (a/b, c/d), we split it into the two intervals (a/b, (a+c)/(b+d)), ((a+c)/(b+d)), c/d). The fraction (a+c)/(b+d) is called the mediant of a/b and c/d. It's not obvious that the mediant is always the simplest possible fraction in the interval, but it is true.

So we start with the interval (0/1, 1/0), and in the first step we split it at (0+1)/(1+0) = 1/1. It is now two intervals, (0/1, 1/1) and (1/1, 1/0). At the next step, we split these two intervals at 1/2 and 2/1, respectively; the resulting four intervals are (0/1, 1/2), (1/2, 1/1), (1/1, 2/1), and (2/1, 1/0). We split these at 1/3, 2/3, 3/2, and 3/1. The process goes on from there:

0/1                 1/0                
0/1         1/1         1/0        
0/1     1/2     1/1     2/1     1/0    
0/1   1/3   1/2   2/3   1/1   3/2   2/1   3/1   1/0  
0/1 1/4 1/3 2/5 1/2 3/5 2/3 3/4 1/1 4/3 3/2 5/3 2/1 5/2 3/1 4/1 1/0

Or, omitting the repeated items at each step:

0/1                 1/0                
          1/1                  
      1/2           2/1          
    1/3       2/3       3/2       3/1      
  1/4   2/5   3/5   3/4   4/3   5/3   5/2   4/1  

If we disregard the two corners, 0/1 and 1/0, we can see from this diagram that the fractions naturally organize themselves into a tree. If a fraction is introduced at step N, then the interval it splits has exactly one endpoint that was introduced at step N-1, and this is its parent in the tree; conversely, a fraction introduced at step N is the parent of the two step-N+1 fractions that are introduced to split the two intervals of which it is an endpoint.

This process has many important and interesting properties. The splitting process eventually lists every positive rational number exactly once, as a fraction in lowest terms. Every fraction is simpler than all of its descendants in the tree. And, perhaps most important, each time an interval is split, it is divided at the simplest fraction that the interval contains. ("Simplest" just means "has the smallest denominator".)

This means that we can find the simplest fraction in some interval simply by doing binary tree search until we find a fraction in that interval.

For example, Placido Polanco had a .368 batting average last season. What is the smallest number of at-bats he could have had? We are asking here for the denominator of the simplest fraction that lies in the interval [.3675, .3685).

  • We start at the root, which is 1/1. 1 is too big, to we move left down the tree to 1/2.
  • 1/2 = .5000 and is also too big, so we move left down the tree to 1/3.
  • 1/3 = .3333 and is too small, so we move right down the tree to 2/5.
  • 2/5 = .4000 and is too big, so go left to 3/8, which is the mediant of 1/3 and 2/5.
  • 3/8 = .3750, so go left to 4/11, the mediant of 1/3 and 3/8.
  • 4/11 = .3636, so go right to 7/19, the mediant of 3/8 and 4/11.
  • 7/19 = .3684, which is in the interval, so we are done.
If we knew nothing else about Polanco's batting record, we could still conclude that he must have had at least 19 at-bats. (In fact, he had 35 hits in 95 at-bats.)

Calculation of mediants is incredibly simple, even easier than adding fractions. Tree search is simple, just compare and then go left or right. Calculating whether a fraction is in an interval is simple too. Everything is simple simple simple.

Our program wants to find the simplest fraction in some interval, say (L, R). To do this, it keeps track of l and r, initially 0/1 and 1/0, and repeatedly calculates the mediant m of l and r. If the mediant is in the target interval, the function is done. If the mediant is too small, set l = m and continue; if it is too large set r = m and continue:

        # Find and return numerator and denominator of simplest fraction
        # in the range [$Ln/$Ld, $Rn/$Rd)
        #
        sub find_simplest_in {
            my ($Ln, $Ld, $Rn, $Rd) = @_;
            my ($ln, $ld) = (0, 1);
            my ($rn, $rd) = (1, 0);
            while (1) {
                my ($mn, $md) = ($ln + $rn, $ld + $rd);
        #	print "  $ln/$ld  $mn/$md  $rn/$rd\n";
                if (isin($Ln, $Ld, $mn, $md, $Rn, $Rd)) {
                    return ($mn, $md);
                } elsif (isless($mn, $md, $Ln, $Ld)) {
                    ($ln, $ld) = ($mn, $md);
                } elsif (islessequal($Rn, $Rd, $mn, $md)) {
                    ($rn, $rd) = ($mn, $md);
                } else {
                    die;
                }
            }
        }
(In this program, rn and rd are the numerator and the denominator of r.)

The isin, isless, and islessequal functions are simple utilities for comparing fractions.

        # Return true iff $an/$ad < $bn/$bd
        sub isless {
            my ($an, $ad, $bn, $bd) = @_;
            $an * $bd < $bn * $ad;
        }

        # Return true iff $an/$ad <= $bn/$bd
        sub islessequal {
            my ($an, $ad, $bn, $bd) = @_;
            $an * $bd <= $bn * $ad;
        }

        # Return true iff $bn/$bd is in [$an/$ad, $cn/$cd).
        sub isin {
            my ($an, $ad, $bn, $bd, $cn, $cd) = @_;
            islessequal($an, $ad, $bn, $bd) and isless($bn, $bd, $cn, $cd);
        }
The asymmetry between isless and islessequal is because I want to deal with half-open intervals.

Just add a trivial scaffold to run the main function and we are done:

        #!/usr/bin/perl

        my $D = shift || 10;
        for my $N (0 .. $D-1) {
            my $Np1 = $N+1;
            my ($mn, $md) = find_simplest_in($N, $D, $Np1, $D);
            print "$N/$D - $Np1/$D : $mn/$md\n";
        }
Given the argument 10, the program produces this output:

        0/10 - 1/10 : 1/11
        1/10 - 2/10 : 1/6
        2/10 - 3/10 : 1/4
        3/10 - 4/10 : 1/3
        4/10 - 5/10 : 2/5
        5/10 - 6/10 : 1/2
        6/10 - 7/10 : 2/3
        7/10 - 8/10 : 3/4
        8/10 - 9/10 : 4/5
        9/10 - 10/10 : 9/10
This says that the simplest fraction in the range [0/10, 1/10) is 1/11; the simplest fraction in the range [3/10, 4/10) is 1/3, and so forth. The simplest fractions that do not appear are 1/5, which is beaten out by the simpler 1/4 in the [2/10, 3/10) range, and 3/5, which is beaten out by 2/3 in the [6/10, 7/10) range.

Unlike the programs from the previous article, this program is really fast, even in principle, even for very large arguments. The code is brief and simple. But we had to deploy some rather sophisticated number theory to get it. It's a nice reminder that the sawed-off shotgun doesn't always win.

This is article #200 on my blog. Thanks for reading.


[Other articles in category /math] permanent link

Thu, 04 Oct 2007

The world's worst macro preprocessor
Last week I added another plugin to my Blosxom installation. As I wrote before, the sole benefit of Blosxom is that it's incredibly simple and lightweight. So when I write plugins for it, I try to keep them incredibly simple and lightweight, lest I spoil the single major benefit of Blosxom. Sometimes I'm more successful, sometimes less so. This time I think I did a good job.

The goal last time was a macro processor. I write a lot of math articles. I get tired of writing <sup>2</sup> every time I want a superscript 2. Even if I bind a function key to that sequence of characters, it's hard to read. But now, with my new Blosxom macro processor, I just insert a line into my article that says:

  #define ^2 <sup>2</sup>

and for the rest of the article, ^2 is expanded to <sup>2</sup>.

This has turned out really well, and I'm using it for all sorts of stuff. I use it for math notations, such as for making -> an abbreviation for &rarr; (→), and for making ~ an abbreviation for &not; (¬).

But I've also used it to #define Godel G&ouml;del. I've used it to #define KK <b>K</b> and #define SS <b>S</b>, which makes an article I'm writing about combinatory logic readable, where it wasn't readable before. In my recent article about job hunting, I used it to #define CV r&eacute;sum&eacute;, which saved me from having to interrupt my train of thought several times in the article.

There are some important points about the design that I think I got right on the first try. Whenever you write a macro system, you have to ask about escape sequences: what do you do if you don't want a macro expanded? For example, in the combinatory logic article I defined a macro SS. This meant that if I had written MOUSSE in the article somewhere, it would have turned into MOUSE. How should I prevent that kind of error?

Answer: I don't. I'm unlikely to do that. But if I do, I'll pick it up during the article proofreading phase. If I can't avoid writing MOUSSE, I have two choices: I can change the name of the SS macro to something easier to avoid—like S*, say, or I can define a second macro: #define !MOUSSE MOUSSE. But so far, it hasn't come up.

One alternative solution is to say that macros are expanded only in certain contexts. For example, SS might only be expanded when it is a complete word, not when it is in the middle of a word, as MOUSSE. I resisted this solution. It is much simpler to remember that every macro is expanded everywhere. And it it is much easier to fix the problem of a macro being expanded when I don't want it than it is to fix the problem of a macro not being expanded when I do want it. So every macro is expanded no matter where it appears.

Related to the unintentional-expansion issue is that each article has its own private macro set. I don't have to worry that by defining a macro named -> in one article that I might be sabotaging my opportunity to actually write -> in some unknown future article. Each set of macros can be totally ad hoc. I don't have to worry about global tradeoffs. Do I #define --- &mdash;, knowing that that will foreclose my opportunity to use --- in any other way? I can make the decision based on simple, local information.

It would have been tempting to over-engineer the system and add all sorts of complex escape facilities. I think I made the right choice here by not doing any of that.

Another escaping issue: What if I want to write something that looks like a definition but isn't? Here I avoided the problem by choosing a definition syntax that I was unlikely to write in any other context: #define in the leftmost column indicates a definition. In this article, I had to write some similar text. It was no trouble to indent it a couple of spaces, disabling the special meaning. But HTML is already full of escape mechanisms, and it would have been no trouble to write &#35;define instead of #define if for some reason I had really needed it to appear in the leftmost column. (Unlikely anyway, since HTML has no column semantics.)

Another right choice I think I made was not to parametrize the macros. An article on algebra might well have:

  #define ^2 <sup>2</sup>
  #define ^3 <sup>3</sup>
and it might be oh-so-tempting to try to eliminate the duplication à la C:

  #define ^(\w+) <sup>$1</sup>
I did not do this. It would have complicated the processing substantially. It would also have complicated the use of the package substantially: I would have to worry a lot more than I do about invoking macros unintentionally. And it is not needed. Not so far, anyway. Because macro definitions only last for the duration of the article, there is no pressure to make a complete or consistent set of definitions. If an article happens to use the notations 2, i, and N, I can define macros for those and only those notations.

Also tempting is to extend the macro system to support something like this:

  #define BF(.*) <b>$1</b>
I have so far resisted this. My feeling is that if I want to do anything like this, I should take it as a sign that I should be writing the articles in some markup system other than HTML. Choice of that markup system should be made carefully, and not organically as an ad-hoc overburdening of the macro system.

I did run into one trouble with the macro system. Originally, it was invoked before some of my other plugins and after others. The earlier plugins automatically inserted certain text into the article that sometimes accidentally triggered my macros. I have not had any trouble with this since I changed the plugin order to invoke the macro processor before any of the other plugins.

The macro-processing code is about 19 lines long, of which three are diagnostic. It is the world's worst macro system. It has exactly one feature. It is, I think the simplest thing that could possibly work, and so a good companion to Blosxom. For this application, the world's worst macro system is the world's best.

[ Addendum 20071004: There's now a one-year retrospective analysis. ]


[Other articles in category /prog] permanent link

The world's worst macro preprocessor: postmortem
I see that the world's worst macro processor, subject of a previous article, is a little over a year old. A year ago I said that it was a huge success. I think it's time for a postmortem analysis.

My overall assessment is that it has been a huge success, and that if I were doing it over I would do it the same way.

A recent article contained a bunch of red and blue dots:

Well, clearly you can do four: . And then you can add another red one on the end: . And then another that could be either red or blue: . And then the next can be either color, say blue: .

I typed this using these macros:

        #define R* <span style="color: red">&bull;</span>
        #define B* <span style="color: blue">&bull;</span>
        #define Y* <span style="color: yellow">&bull;</span>
Without the macro processor, I would have had to suffer a lot. Then, a little while later, I needed to prepare this display:









No problem; the lines just look like R*R*B*B*R*R*B*Y*B*Y*Y*R*Y*R*R*B*R*B*B*Y*R*Y*Y*B*Y*B*.

Some time later I realized that this display would be totally illegible to the blind, the color-blind, and people using text-only browsers. So I just changed the macros:

        #define R* <span style="color: red">R</span>
        #define B* <span style="color: blue">B</span>
        #define Y* <span style="color: yellow">Y</span>
Problem solved. instantly becomes R R B B R B B. And a good thing, too, because I discovered afterward that a lot of aggregators, like bloglines and feedburner, discard the color information.

I find that I've used the macro feature 114 times so far. The most common use has been:

   #define ^2 <sup>2</sup>
But I also have files with:

      #define r2 &radic;2
      #define R2 &radic;2
      #define s2 &radic;2
      #define S2 &radic;2
That last one appears in three files. Clearly, making the macros local to files was a good decision.

Those uses are pretty typical. A less typical one is:

      #define <OVL> <span style="text-decoration: overline">
      #define </OVL> </span>
This is the sort of thing that you can get away with on a one-time basis, but which you wouldn't want to make a convention of. Since the purpose of the macro processor is to enable such hacks for the duration of a single article, it's all good.

I did run into at least one problem: I was writing an article in which I had defined ^i to abbreviate <sup><i>i</i></sup>. And then several paragraphs later I had a TeX formula that contained the ^i sequence in its TeX meaning. This was being replaced with a bunch of HTML, which was then passed to TeX, which then produced the wrong output.

One can solve this by reordering the plugins. If I had put the TeX plugin before the macro plugin, the problem would have gone away, because the TeX plugin would have replaced the TeX formula with an image element before the macro plugin ever saw the ^i.

This approach has many drawbacks. One is that it would no longer have been possible to use Blosxom macros in a TeX formula. I wasn't willing to foreclose this possibility, and I also wasn't sure that I hadn't done it somewhere. If I had, the TeX formula that depended on the macro expansion would have broken. And this is a risk whenever you move the macro plugin: if you move it from before plugin X to after plugin X, you have to worry that maybe something in some article depended on the text passed to X having been macro-processed.

When I installed the macro processor, I placed it first in plugin order for precisely this reason. Moving the macro substitution later would have required me to remember which plugins would be affected by the macro substitutions and which not. With the macro processing first, the question has a simple answer: all of them are affected.

Also, I didn't ever want to have to worry that some macro definition might mangle the output of some plugin. What if you are hacking on some plugin, and you change it to return <span style="Foo"> instead of <span style="foo">, and then discover that three articles you wrote back in 1997 are now totally garbled because they contained #define Foo >WUGGA<? It's just too unpredictable. Having the macro processing occur first means that you can always see in the original article file just what might be macro-replaced.

So I didn't reorder the plugins.

Another way to solve the TeX ^i problem would have been to do something like this:

        #define ^i <sup><i>i</i></sup>
        #define ^*i ^i
with the idea that I could write ^*i in the TeX formula, and the macro processor would replace it with ^i after it was done replacing all the ^i's.

At present the macro processor does not define any order to macro replacements, but it does guarantee to replace each string only once. That is, the results of macro replacement are not themselves searched for macro replacement. This limits the power of the macro system, but I think that is a good thing. One of the powers that is thus proscribed is the power to get stuck in an infinite loop.

It occurs to me now that although I call it the world's worst macro system, perhaps that doesn't give me enough credit for doing good design that might not have been obvious. I had forgotten about my choice of single-substituion behavior, but looking back on it a year later, I feel pleased with myself for it, and imagine that a lot of people would have made the wrong choice instead.

(A brief digression: unlimited, repeated substitution is a bad move here because it is complex—much more complex than it appears. A macro system with single substitution is nothing much, but a macro system with repeated substitution is a programming language. The semantics of the λ-calculus is nothing more than simple substitution, repeated as necessary, and the λ-calculus is a maximally complex computational engine. Term-rewriting systems are a more obvious theoretical example, and TeX is a better-known practical example of this phenomenon. I was sure I did not want my macro system to be a programming language, so I avoided repeated substitution.)

Because each input text is substituted at most once, the processor's refusal to define the order of the replacements is not something you have to think about, as long as your macros are prefix-unique. (That is, as long as none is a prefix of another.) So you shouldn't define:

  #define foo   bar
  #define fool  idiot
because then you don't know if foolish turns into barlish or idiotish. This is not a big deal in practice.

Well, anyway, I did not solve the problem with #define ^*i ^i. I took a much worse solution, which was to hack a #undefall directive into the macro processor. In my original article, I boasted that the macro processor "has exactly one feature". Now it has two, and it's not an improvement. I disliked the new feature at the time, and now that I'm reviewing the decision, I think I'm going to take it out.

I see that I did use the double-macro solution elsewhere. In the article about Gödel and the U.S. Constitution, I macroed an abbreviation for the umlaut:

        #define Godel G&ouml;del
But this sequence also ocurred in the URLs in the link elements, and the substitution broke the links. I should probably have changed this to:

        #define Go:del G&ouml;del
But instead I added:

        #define GODEL Godel
and then used GODEL in the URLs. Oh well, whatever works, I guess.

Perhaps my favorite use so far is in an (unfinished) article about prosopagnosia. I got tired of writing about prosopagnosia and prosopagnosiacs, so

      #define PAa prosopagnosia
      #define PAic prosopagnosiac
Note that with these definitions, I get PAa's, and PAics for free. I could use PAac instead of defining PAic, but that would prevent me from deciding later that prosopagnosiac should be spelled "prosopagnosic".


[Other articles in category /prog] permanent link

Mon, 01 Oct 2007

1+1=2

Order
Principia Mathematica (through section 56)
Principia Mathematica (through section 56)
with kickback
no kickback
Whitehead and Russell's Principia Mathematica is famous for taking a thousand pages to prove that 1+1=2. Of course, it proves a lot of other stuff, too. If they had wanted to prove only that 1+1=2, it would probably have taken only half as much space.

Principia Mathematica is an odd book, worth looking into from a historical point of view as well as a mathematical one. It was written around 1910, and mathematical logic was still then in its infancy, fresh from the transformation worked on it by Peano and Frege. The notation is somewhat obscure, because mathematical notation has evolved substantially since then. And many of the simple techniques that we now take for granted are absent. Like a poorly-written computer program, a lot of Principia Mathematica's bulk is repeated code, separate sections that say essentially the same things, because the authors haven't yet learned the techniques that would allow the sections to be combined into one.

For example, section ∗22, "Calculus of Classes", begins by defining the subset relation (∗22.01), and the operations of set union and set intersection (∗22.02 and .03), the complement of a set (∗22.04), and the difference of two sets (∗22.05). It then proves the commutativity and associativity of set union and set intersection (∗22.51, .52, .57, and .7), various properties like $\alpha\cap\alpha = \alpha$ (∗22.5) and the like, working up to theorems like ∗22.92: $\alpha\subset\beta \rightarrow \alpha\cup(\beta - \alpha)$.

Section ∗23 is "Calculus of Relations" and begins in almost exactly the same way, defining the subrelation relation (∗23.01), and the operations of relational union and intersection (∗23.02 and .03), the complement of a relation (∗23.04), and the difference of two relations (∗23.05). It later proves the commutativity and associativity of relational union and intersection (∗23.51, .52, .57, and .7), various properties like $\alpha\dot\cap\alpha = \alpha$ (∗22.5) and the like, working up to theorems like ∗23.92: $\alpha\dot\subset\beta \rightarrow \alpha\dot\cup(\beta \dot- \alpha)$.

The section ∗24 is about the existence of sets, the null set $\Lambda$, the universal set ${\rm V}$, their properties, and so on, and then section ∗24 is duplicated in ∗25 in a series of theorems about the existence of relations, the null relation $\dot\Lambda$, the universal relation $\dot {\rm V}$, their properties, and so on.

That is how Whitehead and Russell did it in 1910. How would we do it today? A relation from S to T is defined as a subset of S × T and is therefore a set. Union, intersection, difference, and the other operations are precisely the same for relations as they are for sets, because relations are sets. All the theorems about unions and intersections of relations, like $\alpha\dot\cap\alpha = \alpha$, just go away, because we already proved them for sets and relations are sets. The null relation is is the null set. The universal relation is the universal set.

A huge amount of other machinery goes away in 2006, because of the unification of relations and sets. Principia Mathematica needs a special notation and a special definition for the result of restricting a relation to those pairs whose first element is a member of a particular set S, or whose second element is a member of S, or both of whose elements are members of S; in 2006 we would just use the ordinary set intersection operation and talk about R ∩ (S×B) or whatever.

Whitehead and Russell couldn't do this in 1910 because a crucial piece of machinery was missing: the ordered pair. In 1910 nobody knew how to build an ordered pair out of just logic and sets. In 2006 (or even 1956), we would define the ordered pair <a, b> as the set {{a}, {a, b}}. Then we would show as a theorem that <a, b> = <c, d> if and only if a=c and b=d, using properties of sets. Then we would define A×B as the set of all p such that p = <a, b> ∧ aAbB. Then we would define a relation on the sets A and B as a subset of A×B. Then we would get all of ∗23 and ∗25 and a lot of ∗33 and ∗35 and ∗36 for free, and probably a lot of other stuff too.

(By the way, the {{a}, {a, b}} thing was invented by Kuratowski. It is usually attributed to Norbert Wiener, but Wiener's idea, although similar, was actually more complicated.)

There are no ordered pairs in Principia Mathematica, except implicitly. There are barely even any sets. Whitehead and Russell want to base everything on logic. For Whitehead and Russell, the fundamental notion is the "propositional function", which is a function φ whose output is a truth value. For each such function, there is a corresponding set, which they denote by $\hat x\phi(x)$, the set of all x such that &phi(x) is true. For Whitehead and Russell, a relation is implied by a propositional function of two variables, analogous to the way that a set is implied by a propositional function of one variable. In 2006, we dispense with "functions of two variables", and just talk about functions whose (single) argument is an ordered pair; a relation then becomes the set of all ordered pairs for which a function is true.

Russell is supposed to have said that the discovery of the Sheffer stroke (a single logical operator from which all the other logical operators can be built) was a tremendous advance, and would change everything. This seems strange to us now, because the discovery of the Sheffer stroke seems so simple, and it really doesn't change anything important. You just need to append a note to the beginning of chapter 1 that says that ∼p and pq are abbreviations for p|p and p|p.|.q|q, respectively, prove the five fundamental axioms, and leave everything else the same. But Russell might with some justice have said the same thing about the discovery that ordered pairs can be interpreted as sets, a simple discovery that truly would have transformed the Principia Mathematica into quite a different work.

Anyway, with that background in place, we can discuss the Principia Mathematica proof of 1+1=2. This occurs quite late in Principia Mathematica, in section ∗102. My abridged version only goes to ∗56, but that is far enough to get to the important precursor theorem, ∗54.43, scanned below:

The notation can be overwhelming, so let's focus just on the statement of the theorem, ignore everything else, even the helpful remark at the bottom:

This is the theorem that is being proved; what follows is the proof.

Now I should explain the notation, which has changed somewhat since 1910. First off, Principia Mathematica uses Peano's "dots" notation to disambiguate precedence, where we now use parentheses instead. The dot notation takes some getting used to, but has some distinct advantages over the parentheses. The idea is just that you indicate grouping by putting in dots, so that (1+2)×(3+4)×(5+6) is written as 1+2.×.3+4.×.5+6. The middle sub-formula is between a pair of dots. The (1+2) sub-formula is between a pair of dots also, but the dot on the left end is superfluous, and we omit it; similarly, the sub-formula (5+6) is delimited by a dot on the left and by the end of the formula on the right.

What if you need to nest parentheses? Then you use more dots. A double dot (:) is like a single dot, but stronger. For example, we write ((1+2)×3)+4 as 1+2 . × 3 : + 4, and the double dot isolates the entire 1+2 . × 3 expression into a single sub-formula to which the +4 applies.

Sometimes you need more levels of precedence, and then you use triple dots (.: and :.) and quadruple (::). This formula, as you see, has double and triple dots. Translating the dots into standard parenthesis notation, we have $$\ast54.43. \vdash ((\alpha, \beta \in 1 ) \supset (( \alpha\cap\beta = \Lambda) \equiv (\alpha\cup\beta \in 2)))$$. This is rather more cluttered-looking than the version with the dots, and in complicated formulas you can have trouble figuring out which parentheses match with which. With the dots, it's always easy. So I think it's a bit unfortunate that this convention has fallen out of use.

The $\vdash$ symbol has not changed; it means that the formula to which it applies is asserted to be true. $\supset$ is logical implication, and $\equiv$ is logical equivalence. Λ is the empty set, which we write nowadays as ∅. ∩ &cup, and ∈ have their modern meanings: ∩ and ∪ are the set intersection and the union operators, and xy means that x is an element of set y.

The remaining points are semantic. α and β are sets. 1 denotes the set of all sets that have exactly one element. That is, it's the set { c : there exists a such that c = { a } }. Theorems about 1 include, for example:

  • that Λ∉1 (∗52.21),

  • that if &alpha∈1 then there is some x such that α = {x} (∗52.1), and

  • that {x}∈1 (∗52.22).

2 is similarly the set of all sets that have exactly two elements. An important theorem about 2 is ∗54.3, which says $$\ast54.3. \vdash 2 = \hat\alpha\{ (\exists x) \gt .\gt x\in\alpha \gt . \gt \alpha - \iota`x\in 1 \}$$. In Principia Mathematica notation, {x}, the set that contains x and nothing else, is written &iota‘x, so this theorem says that 2 is identical with the set of all α such that α has some element x , which, when removed from α, leaves a 1-element set.

So here is theorem ∗54.43 again:

It asserts that if sets α and β each have exactly one element, then they are disjoint (that is, have no elements in common) if and only if their union has exactly two elements.

The proof, which appears in the scan above following the word "Dem." (short for "demonstration") goes like this:

"Theorem ∗54.26 implies that if α = {x} and β = {y}, then α∪β has 2 elements if and only if x is different from y."

"By theorem ∗51.231, this last bit (x is different from y) is true if and only if {x} and {y} are disjoint."

"By ∗13.12, this last bit ({x} and {y} are disjoint) is true if and only if α and β themselves are disjoint." The partial conclusion at this point, which is labeled (1), is that if α = {x} and β = {y}, then α∪β ∈ 2 if and only if α∩β = Λ.

The proof continues: "Conclusion (1), with theorems ∗11.11 and ∗11.35, implies that if there exists x and y so that α is {x} and β is {y}, then α∪β ∈ 2 if and only if α and β are disjoint." This conclusion is labeled (2).

Finally, conclusion (2), together with theorems ∗11.54 and ∗52.1, implies the theorem we were trying to prove.

Maybe the thing to notice here is how very small the steps are. ∗54.26, on which this theorem heavily depends, is almost the same; it asserts that {x}∪{y} &isin 2 if and only if xy. ∗54.26, in turn, depends on ∗54.101, which says that α has 2 elements if and only if there exist x and y, not the same, such that α = {x} ∪ {y}. ∗54.101 is just a tiny bit different from the definition of 2. Theorem ∗51.231 says that {x} and {y} are disjoint if and only if x and y are different. ∗52.1 is a basic property of 1; we saw it before.

The other theorems cited in the demonstration are very tiny technical matters. ∗11.54 says that you can take an assertion that two things exist and separate it into two assertions, each one asserting that one of the things exists. ∗11.11 is even slimmer: it says that if &phi(x, y) is always true, then you can attach a universal quantifier, and assert that &phi(x, y) is true for all x and y. ∗13.12 concerns the substitution of equals for equals: if x and y are the same, then x possesses a property ψ if and only if y does too.

I haven't seen the later parts of Principia Mathematica, because my copy stops after section ∗56, and the arithmetic stuff is much later. But this theorem clearly has the sense of 1+1=2 in it, and the later theorem (∗110.643) that actually asserts 1+1=2 depends strongly on this one.

Although I am not completely sure what is going to happen later on (I've wasted far too much time on this already to put in more time to get the full version from the library) I can make an educated guess. Principia Mathematica is going to define the number 17 as being the set of all 17-element sets, and similarly for every other number; the use of the symbol 2 to represent the set of all 2-element sets prefigures this. These sets-of-all-sets-of-a-certain-size will then be identified as the "cardinal numbers".

The Principia Mathematica will define the sum of cardinal numbers p and q something like this: take a representative set a from p; a has p elements. Take a representative set b from q; b has q elements. Let c = ab. If c is a member of some cardinal number r, and if a and b are disjoint, then the sum of p and q is r.

With this definition, you can prove the usual desirable properties of addition, such as x + 0 = x, x + y = y + x, and 1 + 1 = 2.

In particular, 1+1=2 follows directly from theorem ∗54.43; it's just what we want, because to calculate 1+1, we must find two disjoint representatives of 1, and take their union; ∗54.43 asserts that the union must be an element of 2, regardless of which representatives we choose, so that 1+1=2.

Post scriptum: Peter Norvig says that the circumflex in the Principia Mathematica notation $\hat x\phi(x)$ is the ultimate source of the use of the word lambda to denote an anonymous function in the Lisp and Python programming languages. I am sure you know that these languages get "lambda" from the use of the Greek letter λ by Alonzo Church to represent function abstraction in his "λ-calculus": In Lisp, (lambda (u) B) is a function that takes an argument u and returns the value of B; in the λ-calculus, λu.B is a function that takes an argument u and returns the value of B. Norvig says that Church was originally planning to write the function λu.B as û.B, but his printer could not do circumflex accents. So he considered moving the circumflex to the left and using a capital lambda instead: Λu.B. The capital Λ looked too much like logical and ∧, which was confusing, so he used lowercase lambda λ instead.

Post post scriptum: Everyone always says "Russell and Whitehead". Google results for "Russell and Whitehead" outnumber those for "Whitehead and Russell" by two to one, for example. Why? The cover and the title page say "Alfred North Whitehead and Bertrand Russell, F.R.S.". How and when did Whitehead lose out on top billing?

[ Addendum 20060913: I figured out how and when Whitehead lost out on top billing: 10 December 1950. ]


[Other articles in category /math] permanent link

Thu, 20 Sep 2007

Approximations to pi
In an earlier post I mentioned G.H. Hardy's astonishment when he first encountered Ramanujan's approximation to π:

[ Addendum 20060402: I inexplicably put in the wrong formula here. The one I meant to put in is in this followup article. ]

Order
Pure Mathematics
Pure Mathematics
with kickback
no kickback
I'm planning to write a blog article about Gaussian integers, and in the course of my research I picked up my old, battered copy of G.H. Hardy's Pure Mathematics. I haven't spent as much time reading this book as I should have; it's full of good stuff. There didn't seem to be anything in there about the Gaussian integers (digression: What's next in the sequence 1, 2, 4, 6, 10, 14, 16, 24, 26?) but while scanning the index I noticed there was an entry for Ramanujan, so I checked it out.

The entry concerns approximations to π, and in particular π ≅ (13/25)√146. Hardy says "If R is the earth's radius, the error in supposing AM to be its circumference is less than 11 yards.

Hardy continues, mentioning the well-known approximations 22/7 and 355/113, about which I am sure I will have something to say in the future, in connection with continued fractions. He then says:

A large number of curious approximations will be found in Ramanujan's Collected papers, pp. 23-39. Among the simplest are

;

these are correct to 3, 3, 8, and 9 places respectively.

All of which, in my usual digressive style, is only an introduction to the main point of this note, which is that Hardy finishes the section by saying:

It is stated in the Bible (1 Kings vii. 23, 2 Chron. iv. 2) that π = 3.
Let's look at what the Bible actually says:

1 Kings 7:23 And he made a molten sea, ten cubits from the one brim to the other: it was round all about, and his height was five cubits: and a line of thirty cubits did compass it round about.

2 Chronicles 4:2 Also he made a molten sea of ten cubits from brim to brim, round in compass, and five cubits the height thereof; and a line of thirty cubits did compass it round about.

I think there are two arguments that must be made here in defense of the Bible. First, one can infer the supposed value of π only if one assumes that the molten sea was a geometrically exact circle. But the sea is not described as circular; it is described only as "round". It could have been an approximate circle; or it could have been a mathematically exact ellipse; or it could have had many other shapes. Veterans' Stadium in Philadelphia was often described as "round" also, and it was not a circle, but an octorad. Watermelons are round, but are not circular, or spherical.

The other argument I would make is that it is not at all clear that any attempt was being made to state the sizes with mathematical exactitude. The use of round numbers throughout (no pun intended) supports this. If the Bible had said that the molten sea was thirteen cubits across and thirty-nine cubits around, I might agree that Hardy was right to complain. But if we suppose that the measurements are only being reported to one significant figure, we cannot conclude whether the value of π that was used was 3 or 3.1416—or 2.7 for that matter. If you say that your house is forty feet tall, you would be rightly annoyed to have G.H. Hardy to come and ridicule you for being unable to distinguish between the numbers 40 and 41.37.

Order
A Mathematician's Apology
A Mathematician's Apology
with kickback
no kickback
Hardy was an atheist, and was very strongly anti-religious. (C.P. Snow says, in the preface to A Mathematician's Apology, that "On a quiet and lovely May evening at Fenner's, round about the same period, the chimes of six o'clock fell across the ground. 'It's rather unfortunate', said Hardy simply, 'that some of the happiest hours of my life should have been spent within sound of a Roman Catholic church.'") He was only too glad to take little potshots at the Bible at any opportunity, even in his pure mathematics textbook—or especially so, since he could get in an additional dig through the implied comparison with Ramanujan. It's certainly true that the ancient Hebrews were not mathematically sophisticated. But this particular potshot, which Hardy is far from the only person to take, seems to me to be unearned.


[Other articles in category /math] permanent link

Recent Linogram development update
Lately most of my spare time (and some not-spare time) has been going to linogram. I've been posting updates pretty regularly at the main linogram page. But I don't know if anyone ever looks at that page. That got me thinking that it was not convenient to use, even for people who are interested in linogram, and that maybe I should have an RSS/Atom feed for that page so that people who are interested do not have to keep checking back.

Then I said "duh", because I already have a syndication feed for this page, so why not just post the stuff here?

So that is what I will do. I am about to copy a bunch of stuff from that page to this one, backdating it to match when I posted it.

The new items are: [1] [2] [3] [4] [5].

People who only want to hear about linogram, and not about anything else, can subscribe to this RSS feed or this Atom feed.


[Other articles in category /linogram] permanent link

G.H. Hardy on analytic number theory and other matters

Order
Ramanujan: Twelve Lectures on Subjects Suggested by His Life and Work
Ramanujan: Twelve Lectures on Subjects Suggested by His Life and Work
with kickback
no kickback
A while back I was in the Penn math and physics library browsing in the old books, and I ran across Ramanujan: Twelve Lectures on Subjects Suggested by His Life and Work by G.H. Hardy. Srinivasa Ramanujan was an unknown amateur mathematician in India; one day he sent Hardy some of the theorems he had been proving. Hardy was boggled; many of Ramanujan's theorems were unlike anything he had ever seen before. Hardy said that the formulas in the letter must be true, because if they were not true, no one would have had the imagination to invent them. Here's a typical example:

Hardy says that it was clear that Ramanujan was either a genius or a confidence trickster, and that confidence tricksters of that caliber were much rarer than geniuses, so he was prepared to give him the benefit of the doubt.

But anyway, the main point of this note is to present the following quotation from Hardy. He is discussing analytic number theory:

The fact remains that hardly any of Ramanujan's work in this field had any permanent value. The analytic theory of numbers is one of those exceptional branches of mathematics in which proof really is everything and nothing short of absolute rigour counts. The achievement of the mathematicians who found the Prime Number Theorem was quite a small thing compared with that of those who found the proof. It is not merely that in this theory (as Littlewood's theorem shows) you can never be quite sure of the facts without the proof, though this is important enough. The whole history of the Prime Number Theorem, and the other big theorems of the subject, shows that you cannot reach any real understanding of the structure and meaning of the theory, or have any sound instincts to guide you in further research, until you have mastered the proofs. It is comparatively easy to make clever guesses; indeed there are theorems like "Goldbach's Theorem", which have never been proved and which any fool could have guessed.

(G.H. Hardy, Ramanujan.)

Some notes about this:

  1. Notice that this implies that in most branches of mathematics, you can get away with less than absolute rigor. I think that Hardy is quite correct here. (This is a rather arrogant remark, since Hardy is much more qualified than I am to be telling you what counts as worthwhile mathematics and what it is like. But this is my blog.) In most branches of mathematics, the difficult part is understanding the objects you are studying. If you understand them well enough to come up with a plausible conjecture, you are doing well. And in some mathematical pursuits, the proof may even be secondary. Consider, for example, linear programming problems. The point of the theory is to come up with good numerical solutions to the problems. If you can do that, your understanding of the mathematics is in some sense unimportant. If you invent a good algorithm that reliably produces good answers reasonably efficiently, proving that the algorithm is always efficient is of rather less value. In fact, there is such an algorithm—the "simplex algorithm"—and it is known to have exponential time in the worst case, a fact which is of decidedly limited practical interest.

    In analytic number theory, however, two facts weigh in favor of rigor. First, the objects you are studying are the positive integers. You already have as much intuitive understanding of them as you are ever going to have; you are not, through years of study and analysis, going to come to a clearer intuition of the number 3. And second, analytic number theory is much more inward-looking than most mathematics. The applications to the rest of mathematics are somewhat limited, and to the wider world even more limited. So a guessed or conjectured theorem is unlikely to have much value; the value is in understanding the theorem itself, and if you don't have a rigorous proof, you don't really understand the theorem.

    Hardy's example of the Goldbach conjecture is a good one. In the 18th Century, Christian Goldbach, who was nobody in particular, conjectured that every even number is the sum of two primes. Nobody doubts that this is true. It's certainly true for all small even numbers, and for large ones, you have lots and lots of primes to choose from. No proof, however, is in view. (The primes are all about multiplication. Proving things about their additive properties is swimming upstream.) And nobody particularly cares whether the conjecture is true or not. So what if every even number is the sum of two primes? But a proof would involve startling mathematics, deep understanding of something not even guessed at now, powerful techniques not currently devised. The proof itself would have value, but the result doesn't.

    Fermat's theorem (the one about an + bn = cn) is another example of this type. Not that Fermat was in any sense a fool to have conjectured it. But the result itself is of almost no interest. Again, all the value is in the proof, and the techniques that were required to carry it through.

  2. The Prime Number Theorem that Hardy mentions is the theorem about the average density of the prime numbers. The Greeks knew that there were an infinite number of primes. So the next question to ask is what fraction of integers are prime. Are the primes sparse, like the squares? Or are they common, like multiples of 7? The answer turns out to be somewhere in between.

    Of the integers 1–10, four (2, 3, 5, 7) are prime, or 40%. Of the integers 1–100, 25% are prime. Of the integers 1–1000, 16.8% are prime. What's the relationship?

    The relationship turns out to be amazing: Of the integers 1–n, about 1/log(n) are prime. Here's a graph: the red line is the fraction of the numbers 1–n that are prime; the green line is 1/log(n):

    It's not hard to conjecture this, and I think it's not hard to come up with offhand arguments why it should be so. But, as Hardy says, proving it is another matter, and that's where the real value is, because to prove it requires powerful understanding and sophisticated technique, and the understanding and technique will be applicable to other problems.

    The theorem of Littlewood that Hardy refers to is a related matter.

Order
A Mathematician's Apology
A Mathematician's Apology
with kickback
no kickback
Hardy was an unusual fellow. Toward the end of his life, he wrote an essay called A Mathematician's Apology in which he tried to explain why he had devoted his life for pure mathematics. I found it an extraordinarily compelling piece of writing. I first read it in my teens, at a time when I thought I might become a professional mathematician, and it's had a strong influence on my life. The passage that resonates most for me is this one:

A man who sets out to justify his existence and his activities has to distinguish two different questions. The first is whether the work which he does is worth doing; and the second is why he does it, whatever its value may be, The first question is often very difficult, and the answer very discouraging, but most people will find the second easy enough even then. Their answers, if they are honest, will usually take one or another of two forms . . . the first . . . is the only answer which we need consider seriously.

(1) 'I do what I do because it is the one and only thing I can do at all well. . . . I agree that it might be better to be a poet or a mathematician, but unfortunately I have no talents for such pursuits.'

I am not suggesting that this is a defence which can be made by most people, since most people can do nothing at all well. But it is impregnable when it can be made without absurdity. . . It is a tiny minority who can do anything really well, and the number of men who can do two things well is negligible. If a man has any genuine talent, he should be ready to make almost any sacrifice in order to cultivate it to the full.

And that, ultimately, is why I didn't become a mathematician. I don't have the talent for it. I have no doubt that I could have become a quite competent second-rate mathematician, with a secure appointment at some second-rate college, and a series of second-rate published papers. But as I entered my mid-twenties, it became clear that although I wouldn't ever be a first-rate mathematician, I could be a first-rate computer programmer and teacher of computer programming. I don't think the world is any worse off for the lack of my mediocre mathematical contributions. But by teaching I've been able to give entertainment and skill to a lot of people.

When I teach classes, I sometimes come back from the mid-class break and ask if there are any questions about anything at all. Not infrequently, some wag in the audience asks why the sky is blue, or what the meaning of life is. If you're going to do something as risky as asking for unconstrained questions, you need to be ready with answers. When people ask why the sky is blue, I reply "because it reflects the sea." And the first time I got the question about the meaning of life, I was glad that I had thought about this beforehand and so had an answer ready. "Find out what your work is," I said, "and then do it as well as you can." I am sure that this idea owes a lot to Hardy. I wouldn't want to say that's the meaning of life for everyone, but it seems to me to be a good answer, so if you are looking for a meaning of life, you might try that one and see how you like it.

(Incidentally, I'm not sure it makes sense to buy a copy of this book, since it's really just a long essay. My copy, which is the same as the one I've linked above, ekes it out to book length by setting it in a very large font with very large margins, and by prepending a fifty-page(!) introduction by C.P. Snow.)


[Other articles in category /math] permanent link

Mixed fractions in Liber Abaci

Order
Liber Abaci
Liber Abaci
with kickback
no kickback
Earlier this winter I was reading Liber Abaci, which is the book responsible for the popularity and widespread adoption of Hindu-Arabic numerals in Europe. It was written in 1202 by Leonardo of Pisa, who afterwards was called "Fibonacci".

Leonardo Pisano has an interesting notation for fractions. He often uses mixed-base systems. In general, he writes:

where a, b, c, p, q, r, x are integers. This represents the number:

which may seem odd at first. But the notation is a good one, and here's why. Suppose your currency is pounds, and there are 20 soldi in a pound and 12 denari in a soldo. This is the ancient Roman system, used throughout Europe in the middle ages, and in England up through the 1970s. You have 6 pounds, 7 soldi, and 4 denari. To convert this to pounds requires no arithmetic whatsooever; the answer is simply

And in general, L pounds, S soldi and D denari can be written

Now similarly you have a distance of 3 miles, 6 furlongs, 42 yards, 2 feet, and 7 inches. You want to calculate something having to do with miles, so you need to know how many miles that is. No problem; it's just

We tend to do this sort of thing either in decimals (which is inconvenient unless all the units are powers of 10) or else we reduce everything to a single denominator, in which case the numbers get quite large. If you have many mixed units, as medieval merchants did, Leonardo's notation is very convenient.

One operation that comes up all the time is as follows. Suppose you have

and you wish that the denominator were s instead of b. It is easy to convert. You calculate the quotient q and remainder r of dividing a·s by b. Then the equivalent fraction with denominator s is just:

(Here we have replaced c + a/b with c + q/s + r/bs, which we can do since q and r were chosen so that qb + r = as.)

Why would you want to convert a denominator in this way? Here is a typical example. Suppose you have 24 pounds and you want to split it 11 ways. Well, clearly each share is worth

pounds; we can get that far without medieval arithmetic tricks. But how much is 2/11 pounds? Now you want to convert the 2/11 to soldi; there are 20 soldi in a pound. So you multiply 2/11 by 20 and get 40/11; the quotient is 3 and the remainder 7, so that each share is really worth

pounds. That is, each share is worth 2 pounds, plus 3 and 7/11 soldi.

But maybe you want to convert the 7/11 soldi to denari; there are 12 denari in a soldo. So you multiply 7/11 by 12 and get 84/11; the quotient is 7 and the remainder 7 also, so that each share is

so each share is worth precisely 2 pounds, 3 soldi, and 7 7/11 denari.

Note that this system subsumes the usual decimal system, since you can always write something like

when you mean the number 7.639. And in fact Leanardo does do exactly this when it makes sense in problems concerning decimal currencies and other decimal matters. For example, in Chapter 12 of Liber Abaci, Leonardo says that there is a man with 100 bezants, who travels through 12 cities, giving away 1/10 of his money in each city. (A bezant was a medieval coin minted in Constantinople. Unlike the European money, it was a decimal currency, divided into 10 parts.) The problem is clearly to calculate (9/10)12 × 100, and Leonardo indeed gives the answer as

He then asks how much money was given away, subtracting the previous compound fraction from 100, getting

(Leonardo's extensive discussion of this problem appears on pp. 439–443 of L. E. Sigler Fibonacci's Liber Abaci: A Translation into Modern English of Leonardo Pisano's Book of Calculation.)

The flexibility of the notation appears in many places. In a problem "On a Soldier Receiving Three Hundred Bezants for His Fief" (p. 392) Leonardo must consider the fraction

(That is, 1/534) which he multiplies by 2050601250 to get the final answer in beautiful base-53 decimals:

Post scriptum: The Roman names for pound, soldo, and denaro are librum ("pound"), solidus, and denarius. The names survived into Renaissance French (livre, sou, denier) and almost survived into modern English. Until the currency was decimalized, British money amounts were written in the form "£3 4s. 2d." even though the English names of the units are "pound", "shilling", "penny", because "£" stands for "libra", "s" for "solidi", and "d" for "denarii". In small amounts one would write simply "10/6" instead of "10s. 6d.", and thus the "/" symbol came to be known as a "solidus", and still is.

The Spanish word for money, dinero, means denarii. The Arabic word dinar, which is the name of the currency in Algeria, Bahrain, Iraq, Jordan, Kuwait, Libya, Sudan, and Tunisia, means denarii.

Soldiers are so called after the solidi with which they were paid.


[Other articles in category /math] permanent link

What is topology?
Popular descriptions of topology tell you that it's like geometry, but bending and stretching are allowed, so that a sphere is considered the same as a cube, or a doughnut is the same as a coffee cup. There's some truth in that, but it really doesn't get across the idea of what topology is really like or really about.

Over the years I've spent a lot of time thinking about how to briefly explain topology to someone with only an ordinary math background, say one year of college calculus. Usually when I try hard to find good short explanations of things like this, I'm successful. So far, I haven't found any such explanation of topology. I haven't given up, though.

The difficulty is that topology was invented to give mathematicians a better understanding of analysis and the structure of the real numbers. Analysis was invented to give mathematicians a better understanding of calculus and limit processes. Calculus was invented to solve physics problems. So topology is three degrees removed from anything real. Contrast this with, say, graph theory, where the central object of study, the graph, is only one degree removed from something real. If you understand the vertices of a graph as computers and the edges as network connections, you can immediately see the point of graph theory. To understand the point of topology, you need to understand the point of analysis, and to understand the point of analysis, you need to understand the point of calculus. That's a lot of stuff to pack into a short explanation.

On top of that, you have the difficulty that topology has become a field of study in itself, with sub-branches that have nothing to do with the original goal of better understanding of the reals. The structure of the Tychonoff corkscrew (a particularly bizarre and counterintuitive topological object) may illuminate certain facts in set theory, but it has nothing to do with better understanding of the real numbers.

I think I can explain topology clearly, just not briefly. This is the first in what I hope will be a series of articles in which I'll try to do that.

The first thing to know is that mathematicians think of the real numbers as being points in an infinite line, with zero in the middle, and the positive numbers stretching away to the right and negative numbers to the left. Real numbers are points on this line, with a number n to the right of m if n > m . So mathematicians have a visual and spatial conception of numbers as well as a quantitative conception.

This visualization is extremely important if you want to understand topology, or indeed most of analysis. To a mathematician, the numeric and the spatial objects are the same thing, just viewed in different ways. The number 3.78 doesn't merely correspond with a point on the line; it is a point on the line, one which lies physically between the points 3.75 and 4.08.

One consequence of this view is that a set of numbers is not merely an arithmetic object. It also has geometric properties, such as a length (find the smallest and largest numbers in the set, and subtract the smaller from the larger) and whether the set is a single connected piece or the union of smaller, disconnected components. Another consequence of this view of numbers as points on a line is that mathematicians refer to the numbers as "points" and to the set of numbers as a "space".

One idea that appears over and over again in analysis, and which is the source of the single driving idea of topology, is the notion of an open interval.

An open interval is very simple: it's just the set of all numbers in between two points. The notation (a, b) represents the set of all numbers greater than a and less than b. In the mathematician's mental picture of the real numbers, the interval (1, 5/2) looks like this:

The definition of an open interval implies that the interval (a, b) omits the points a and b themselves; the open interval does not include the two endpoints. The curvy lines in the picture above are just notation intended to symbolize that the endpoints of the interval are not part of the interval.

There is a different notation for an interval that does include the endpoints, a so-called closed interval; [a, b] represents the set of all points greater than or equal to a and less than or equal to b. So [a, b] includes all the points of (a, b), and, additionally, a and b themselves:

Again, the square brackets in the picture are fictitious; they're just a notation to tell you that in this picture, the interval does includes its endpoints.

This matter of the endpoints is crucial. Open and closed intervals have very different behaviors, stemming from the fact that a closed interval contains its boundary points and an open interval does not. A point inside an open interval can be close to the edge, but not at the edge, because the open interval omits the edge. But a closed interval does include the edge.

In a closed interval, the points a and b are clearly special, and quite different from the other points in the interval, in a way that I will make more precise in a moment. But in an open interval, there is a certain sense in which all the points behave the same.

Here's the essential property of an open interval, as identified by topologists. If you choose any point p in the open interval, you can draw a little circle around p so that all the points inside the circle are also in the interval. For example, consider the open interval (1, 5/2) and the point 2, which is inside the interval:

If we draw a circle with radius 1/4 around the point, as shown, everything inside the circle is also inside the interval.

We can do this for any point at all that is in the interval, as long as we make the circle sufficiently small:

Because no matter how close a point in the interval is to the end of the interval, there's still a little extra space before the end.

This is not true of closed intervals. A circle around the point 1 will include some points outside of the closed interval [1, 5/2], no matter how small we make the circle:

Other differences flow from this essential property. For example, a point outside a closed interval must be separated from it by some positive distance, and you can always draw a circle around the point that is completely outside the closed interval:

But that is not true of open intervals. For the open interval (1, 5/2), any circle drawn around the point 1 will also enclose some points in the interval:

Open intervals can abut without intersecting. For example, the intervals (1, 2) and (2, 3) have no points in common. But any circle around 2 itself encloses points of both intervals:

In contrast, the closed intervals [1, 2] and [2,3] also share the property that any circle around 2 encloses points of both intervals, but that's not surprising, since any such circle encloses the point 2, and 2 is in both intervals. With the open intervals, you can't say ahead of time what points of the two intervals will be inside the circle, until you find out how big the circle is.

Mathematicians sum up all these properties by saying that a closed interval contains all the points of its boundary, whereas an open interval contains none of them.

This business of the small circle drawn around a particular point p is clearly important, so it's good to have a name for it. The idea is that we're trying to look at what happens "close to" p. But to pin that down, we need a notion of what "close to" means, so we need a way of measuring distances.

In the real numbers, measuring distances is easy. The distance between a and b is just |a - b|. |x| denotes the absolute value of x, which means that if x is negative, you make it positive instead. For |a - b|, it simply means that you should subtract the smaller one from the larger, and not the other way around. This is because you don't want to have negative distances, and you want the distance between 4 and 3 to be the same as the distance between 3 and 4.

Then mathematicians formalize those little circles this way: they say that the "ball" of radius ε around some point p, symbolized as Bε(p), is just the set of all points whose distance from p is less than ε. The use of the odd word "ball" here should tip you off that we're soon going to generalize this from one to three dimensions. In one dimension, balls are actually open intervals, and Bε(p) is precisely the interval (p - ε, p + ε). In two dimensions, the balls are discs, and in three dimensions they are actually ball-shaped.

The balls capture a flexible notion of "closeness": two points are "close" if they are inside the same ball. By making the balls small enough, we can make the notion of "close" as restrictive as we want. In this sense, we can see that no matter how restrictive we make the notion of closeness, there are always some points in an open interval that are close to the end of the interval—but no point is "close" for all possible notions of closeness; there is always some sufficiently restrictive definition under which a particular point is far from the end. Closed sets are different, and contain points that are close to the end for any definition of "close".

Similarly, the sets (1, 2) and (2, 3) are close together, for any definition of "close", although they don't actually intersect, or even touch, since you can't get from one to the other without leaving the sets entirely.

The essential property of open intervals that I mentioned before can be phrased in terms of the balls in this way: a set G is said to be open if, for any point p of G, there is some positive number ε such that Bε(p) is entirely contained in G.

Open intervals are open, but they are are not the only open sets. Let S be set of all points in either (1, 2) or (3, 4). S is open, but it's not an interval. But open sets all look pretty much the same; all open sets are unions of non-overlapping intervals, for example. There are some unbounded open sets, such as the set of all positive numbers, but we can think of that as the interval (0, ∞). Similarly (-∞, 3), the set of all numbers less than 3, is open. And the set of all real numbers is open, since it contains every ball whatsoever, but we can think of that set as (-∞, ∞).

Other concepts can be defined in terms of the balls. For example, a "limit point" of a set S is a point p for which any ball around p must contain some point of S other than p. 5/2 is a limit point of both the open interval (1, 5/2) and the closed interval [1, 5/2] because every ball Bε(5/2) contains points of both intervals other than 5/2 itself. The open interval omits 5/2, but the closed set includes it. One can define a closed set as a set that includes all of its limit points.

With these definitions, plenty of useful stuff follows, such as: Every open set is a union of non-overlapping open intervals. For every closed set C, there is some open set G such that every real number is in either C or in G but not both. The union or intersection of any two open sets is another open set. The union or intersection of any two closed sets is another closed set. Every ball is an open set. Every finite set is closed.

Other geometric notions come out of this too. For example, an "interior point" of a set S is a point p for which there's some Bε(p) is completely contained in S. Then an open set is precisely a set whose points are all interior points. Every point in [1, 5/2] is an interior point except the boundary points 1 and 5/2. We can define the "interior" of a set as the set of all its interior points, a "boundary point" as a point in a set that isn't an interior point, and the "boundary" of a set as the set of all its boundary points.

To generalize these notions to more complex spaces, we can generalize the idea of distance. Consider points in the plane, for example. These points have a well-known distance function, the so-called Pythagorean distance, which says that the distance between (a, b) and (c, d) is, as usual, √((c-a)2 + (d-b)2). If we use this as our definition of distance, and defined balls as before, we find that Bε(p) is just a disc of radius ε centered at p.

The definitions still make sense. Consider the set of points in the plane that are on or inside the circle of radius 1 centered at a particular point p. The boundary of this set, even under the very abstract definitions above, is just what you would expect: the boundary is the circle itself. The interior is the disc that lies inside the circle, including the center but not the edge. Once again, open sets are sets in the plane that omit their boundaries, and closed sets are those sets that include their boundaries. Most theorems still work; for example, the intersection of two open sets is still an open set, and balls are still examples of open sets.

We've started with a very thin, weak-looking base, which was simply the idea of measuring distances. From the simple idea of measurement, we get the balls, and from the balls we get ways to understand geometric notions like boundaries and interiors, and analytic notions like limits.

A metric space is a generalization of the idea of measuring distance. The "space" is now a set of things, which could be anything at all: points, or numbers, or train stations, or whatever. The things are customarily called "points". The "metric" describes how you are planning to measure the distance between two points.

To be a sensible distance function, the metric needs to have a few simple properties. It must never be negative, must be zero when measuring the distance from a point to itself, and must be positive when measuring the distance between two different points. It must be symmetric, which means that the distance from a to b must be the same as the distance from b to a in all cases. And the metric must satisfy the triangle inequality, which just means that the length of a route from a to b that stops off at c in between must be at least as long as the one that goes directly from a to b.

In mathematical notation, we write d(a, b) to represent the distance from a to b. We then want d to have the following properties:

  • d(a, b) ≥ 0
  • d(a, b) = 0 if and only if a = b
  • d(a, b) = d(b, a)
  • d(a, b) ≤ d(a, c) + d(c, b)
The "usual metric" for the real numbers is to say that d(a, b) = |a - b|, as above. This does indeed satisfy the four required properties. In two dimensions, the usual distance formula, based on the Pythagorean theorem, says that d((a, b), (c, d)) = √((c-a)2 + (d-b)2), and this function is also a metric and satisfies the four conditions.

Once you have the metric, you can define the balls: the ball Bε(p) is still the set of all points q for which d(p, q) < ε. (This ball is sometimes written as Bd(p), because it depends on the metric.) And once you have defined the balls, you can define the open and closed sets.

One thing this gets you is that you can talk about notions of closeness and nearness and limits in spaces that are much less tractable than lines and planes. For example, if you want to do analysis on the surface of a torus (a doughnut shape) this gives you a theoretical basis for treating it as a subset of ordinary three-dimensional space, and helps you understand which theorems of analysis will work on the torus and which ones won't.

Another thing it gets you is the opportunity to consider analogous notions in spaces that are nothing at all like lines or planes. Sometimes this might lead to insights that are useful in analysis. Sometimes it might not. But it does keep mathematicians employed.

Here's one example—I'm not sure whether it's genuinely useful or whether it serves primarily to keep mathematicians employed. Let's consider a plane, but instead of using the usual Pythagorean distance formula, let's say that the distance between (a, b) and (c, d) is |a-c| + |b-d|. This corresponds to a world in which you can only travel due north, south, east, and west; for this reason it is sometimes called the "Manhattan distance". In Manhattan, when you want to go from (a, b) to (c, d), you first walk east or west on b street, for a distance of |a-c|, until you get to c avenue. You then turn ninety degrees and walk north (or south) on c avenue, for a distance of |b-d|, until you reach the intersection with d street. The path is shown below:

This peculiar distance function is indeed a metric. The balls are no longer circular; they are diamond-shaped. The illustration also shows a point P and the diamond-shaped ball B1(P), along with three of the paths (each with length exactly 1) from P to the boundary of the ball.

But one interesting thing about the Manhattan metric is that it doesn't affect which sets are open or closed. So in a certain way, it doesn't matter whether you measure distance according to the usual function or according to the Manhattan metric.

A metric that is different is the "discrete" metric. In this metric, the distance between points p and q is 1, unless they are the same point, in which case it is 0. You may want to check to make sure that this metric has the requisite properties.

Balls in the discrete metric are even weirder than the diamond-shaped balls of the Manhattan metric. A ball around p either includes p and nothing else (if its radius is less than or equal to 1) or else it includes the entire universe (if its radius is bigger than 1). In a space measured with the discrete metric, nothing is close to anything else. Our typical example of closeness was the interval (1, 5/2) and the point 1, which was not inside the interval, but was close to it, because any Bε(1) overlapped the interval, no matter how small ε was. But in the discrete metric, the point is not close to the interval, because the ball B1/2(1) does not overlap the interval—it contains only the point 1, and nothing else!

In this article, I've tried to give a motivated and historical account of some of the basic notions of topology, and how they are generalizations of ideas of analysis, for the purpose of better understanding analysis. But I should break the news now that topology, when studied on its own, starts from a somewhat more abstract place. Instead of starting with a metric, and getting the balls from the metric, and the open sets from the balls, it starts with the open sets, and formulates the properties directly in terms of the open sets. This allows you to clean away a lot of unnecessary complication involving arithmetic and questions about Pythagorean vs. Manhattan distances and so on. The usual properties, like "interior" and "boundary" and "connected" can be formulated entirely in terms of the open sets.

Now suppose you have two sets, A and B, with two different definitions of open sets. And suppose you know some way to transform A into B and back again, say by rotating it or something, so that open sets in A are transformed into open sets in B, and vice versa on the way back. Then in a certain sense the open sets of A and B are the same, and anything that will be true of A's open sets will be true of B's as well. Since all the properties of interest are defined solely in terms of the open sets, any of these properties possessed by A will also be possessed by B and vice versa, so in terms of topological properties, A and B are the same.

The transformations that preserve the open sets are easy to understand intuitively: You can bend, stretch, or twist the sets any way you want. But you can't add or subtract material, or poke holes in them, or close up holes that were there before, and you can't tear the sets apart unless you glue them back together afterwards. You can crumple the sets, but you can't crush or explode them; points that were different before the transformation must remain different, and vice versa. A circle can be transformed into a square, by straightening out the sides; I hinted at this before when I mentioned that the Pythagorean and the Manhattan metrics yield the same open sets. But the circle can't be transformed into a line (you'd have to rip it apart) or a figure-eight (two formerly different points would have to fuse together at the waist of the 8). Spheres and cubes are topologically the same, but spheres are not the same as discs, or planes, or balls, or coffee cups.

I hope to develop this explanation further in future articles. My plans are to go backwards a little, and write an article about the structure of the real numbers, explaining why the open intervals are so important for calculus and analysis. And I hope to go forward and write an article about point-set topology, which abandons the metrics entirely in favor of dealing directly with the open sets.


[Other articles in category /math] permanent link

Counting squares
Let's take a bunch of squares, and put a big "X" in each one, dividing each square into four triangular wedges. Then let's take three colors of ink, say red, blue, and black, and ink in the wedges. How many different ways are there of inking up the squares? Well, that's easy. There are four wedges, and each one can be one of three colors, so the answer is 34 = 81. No, wait, that's not right, because the two squares below are really the same:

So we have to decide what counts and what doesn't. If one square can be turned into the other by a one-quarter clockwise or counterclockwise rotation, or by a half-turn, then we'll say that the two squares are the same. So all the squares below are "the same":

What about turning the squares over? Are the two squares to the right "the same"? Let's say that the squares are inked on only one side, so that those two would not considered the same, even if we decided to allow squares with green wedges. Later on we will make the decision the other way and see how things change.


OK, so let's see. All four wedges might be the same color, and there are 3 colors, so there are 3 ways to do that, shown at right.


Or there might only be two colors. In that case, there might be three wedges of one color and one of another, there are 6 ways to do that, depending on how we pick the colors; these six are shown at right.
Or it might be two-and-two. There are three ways to choose the colors (red-blue, red-black, and blue-black) and two ways to arrange them: same-colored wedges opposite each other:

or abutting:
so that's another 2·3 = 6 ways.


If we use all three colors, then two wedges are in one of the colors, and one wedge in each of the other two colors. The two wedges of the same color might be adjacent to each other or opposite. In either case, we have three choices for the color of the two wedges that are the same, after which the colors of the other two wedges are forced. So that's 3 colorings with two adjacent wedges the same color:
And 3 colorings with two opposite wedges the same color:


So that's a total of 21. Unless I left some out. Actually I did leave some out, just to see if you were paying attention. There are really 24, not 21. (You can see the full set, including the three I left out.) What a pain in the ass. Now let's do the same count for four colors. Whee, fun!

But there is a better way. It's called the Pólya-Burnside counting lemma. (It's named after George Pólya and William Burnside. The full Pólya counting theorem is more complex and more powerful. The limited version in this article is more often known just as the Burnside lemma. But Burnside doesn't deserve the credit; it was known much earlier to other mathematicians.)

Let's take a slightly simpler example, and count the squares that have two colors, say blue and black only. We can easily pick them out from the list above:

So the counting lemma, whatever it is, should get us a count of six. Here's how it works.

Remember way back at the beginning where we decided that and and were the same because differences of a simple rotation didn't count? Well, the first thing you do is you make a list of all the kinds of motions that "don't count". In this case, there are four motions:

  1. Rotation clockwise by 90°
  2. Rotation by 180°
  3. Rotation counterclockwise by 90°
  4. Rotation by 0°
That last one is a little odd, perhaps, but we have to include it if we want the right answer. (The mathematics jargon word for these motions is "symmetries", but I will continue to call them motions.)

Now we temporarily forget about the complication that says that some squares are essentially the same as other squares. All squares are now different. and are now different because they are colored differently. This is a much simpler point of view. There are clearly 24 such squares, shown below:

For each of the four motions, we count up the number of squares that would be unchanged by that kind of motion. For example, every one of the 16 squares is left unchanged by motion #4, because motion #4 doesn't actually change anything.

Which of these 16 squares is left unchanged by motion #3, a counterclockwise quarter-turn? All four wedges would have to be the same color. Of the 16 possible colorings, only the all-black and all-blue ones are left entirely unchanged by motion #3. Motion #1, the clockwise quarter-turn, works the same way; only the 2 solid-colored squares are left unchanged.

4 colorings are left unchanged by a 180° rotation. The top wedge and the bottom wedges switch places, so they must be the same color, and the left and right wedges change places, so they must be the same color. But the top-and-bottom wedges need not be the same color as the left-and-right wedges. We have two independent choices of how to color a square so that it will remain unchanged by a 180° rotation, and there are 22 = 4 colorings that are left unchanged by a 180° rotation. These are shown at right.

So we have counted the number of squares left unchanged by each motion:

  Motion # squares
unchanged
typical example
1 Clockwise quarter turn 2
2 Half turn 4
3 Counterclockwise quarter turn 2
4 No motion 16

Next we take the counts for each motion, add them up, and average them. That's 2 + 4 + 2 + 16 = 24, and divide by 4 motions, the average is 6.

So now what? Oh, now we're done. The average is the answer. 6, remember? There are 6 distinguishable squares. And our peculiar calculation gave us 6. Waaa! Surely that is a coincidence? No, it's not a coincidence; that is why we have the theorem.

Let's try that again with three colors, which gave us so much trouble before. We hope it will say 24. There are now 34 basic squares to consider.

For motions #1 and #3, only completely solid colorings are left unchanged, and there are 3 solid colorings, one in each color. For motion 2, there are 32 colorings that are left unchanged, because we can color the top-and-bottom wedges in any color and then the left-and-right wedges in any color, so that's 3·3 = 9. And of course all 34 colorings are left unchanged by motion #4, because it does nothing.

  Motion # squares
unchanged
typical example
1 Clockwise quarter turn 3
2 Half turn 9
3 Counterclockwise quarter turn 3
4 No motion 81

The average is (3 + 9 + 3 + 81) / 4 = 96 / 4 = 24. Which is right. Hey, how about that?

That was so easy, let's skip doing four colors and jump right to the general case of N colors:

  Motion # squares
unchanged
typical example
1 Clockwise quarter turn N
2 Half turn N2
3 Counterclockwise quarter turn N
4 No motion N4

Add them up and divide by 4, and you get (N4 + N2 + 2N)/4. So if we allow four colors, we should expect to have 70 different squares. I'm glad we didn't try to count them by hand!

(Digression: Since the number of different colorings must be an integer, this furnishes a proof that N4 + N2 + 2N is always a multiple of 4. It's a pretty heavy proof if it were what we were really after, but as a freebie it's not too bad.)

One important thing to notice is that each motion of the square divides the wedges into groups called orbits, which are groups of wedges that change places only with other wedges in the same orbit. For example, the 180° rotation divided the wedges into two orbits of two wedges each: the top and bottom wedges changed places with each other, so they were in one orbit; the left and right wedges changed places, so they were in another orbit. The "do nothing" motion induces four orbits; each wedge is in its own private orbit. Motions 1 and 3 put all the wedges into a single orbit; there are no smaller private cliques.

For a motion to leave a square unchanged, all the wedges in each orbit must be the same color. For example, the 180° rotation leaves a square unchanged only when the two wedges in the top-bottom orbit are colored the same and the two wedges in the left-right orbit are colored the same. Wedges in different orbits can be different colors, but wedges in the same orbit must be the same color.

Suppose a motion divides the wedges into k orbits. Since there are Nk ways to color the orbits (N colors for each of the k orbits), there are Nk colorings that are left unchanged by the motion.

Let's try a slightly trickier problem. Let's go back to using 3 colors, and see what happens if we are allowed to flip over the squares, so that and are now considered the same.

In addition to the four rotary motions we had before, there are now four new kinds of motions that don't count:

  Motion # squares
unchanged
typical example
5 Northwest-southeast diagonal reflection 9
6 Northeast-southwest diagonal reflection 9
7 Horizontal reflection 27
8 Vertical reflection 27

The diagonal reflections each have two orbits, and so leave 9 of the 81 squares unchanged. The horizontal and vertical reflections each have three orbits, and so leave 27 of the 81 squares unchanged. So the eight magic numbers are 3, 3, 9, and 81, from before, and now the numbers for the reflections, 9, 9, 27, and 27. The average of these eight numbers is 168/8 = 21. This is correct. It's almost the same as the 24 we got earlier, but instead of allowing both representatives of each pair like , we allow only one, since they are now considered "the same". There are three such pairs, so this reduces our count by exactly 3.

Okay, enough squares. Lets do, um, cubes! How many different ways are there to color the faces of a cube with N colors? Well, this is a pain in the ass even with the Pólya-Burnside lemma, because there are 24 motions of the cube. (48 if you allow reflections, but we won't.) But it's less of a pain in the ass than if one tried to do it by hand.

This is a pain for two reasons. First, you have to figure out what the 24 motions of the cube are. Once you know that, you then have to calculate the number of orbits of each one. If you are a combinatorics expert, you have already solved the first part and committed the solution to memory. The rest of the world might have to track down someone who has already done this—but that is not as hard as it sounds, since here I am, ready to assist.

Fortunately the 24 motions of the cube are not all entirely different from each other. They are of only four or five types:

  • Rotations around an axis that goes through one corner to the opposite corner.

    There are 4 such pairs of vertices, and for each pair, you can turn the cube either 120° clockwise or 120° counterclockwise. That makes 8 rotations of this type in total. Each of these motions has 2 orbits. For the example axis above, one orbit contains the top, front, and left faces and the other contains the back, bottom, and right faces. So each of these 8 rotations leaves N2 colorings of the cube unchanged.

  • Rotations by 180° around an axis that goes through the middle of one edge of the cube and out the middle of the opposite edge.

    There are 6 such pairs of edges, so 6 such rotations. Each rotation divides the six faces into three orbits of two faces each. The one above exchanges the front and bottom, top and back, and left and right faces; these three pairs are the three orbits. To be left unchanged by this rotation, the two faces in each orbit must be the same color. So N3 colorings of the cube are left fixed by each of these 6 rotations.

  • Rotations around an axis that goes through the center of a face and comes out the center of the opposite face.

    There are three such axes. The rotation can be 90° clockwise, 90 ° counterclockwise, or 180°. The 90° rotations have three orbits. The one shown above puts the top face into an orbit by itself, the bottom face into another orbit, and the four faces around the middle into a third orbit. So six of these nine rotations leave N3 colorings unchanged.

    The 180° rotations have four orbits. A 180° rotation around the axis shown above puts the top and bottom faces into private orbits, as the 90° rotation did, but instead of putting the four middle faces into a single orbit, the front and back faces go in one orbit and the left and right into another. Since there are three axes, there are three motions of the cube that each leave N4 colorings unchanged.

  • Finally, there's the "motion" that moves nothing. This motion leaves every face in a separate orbit, and leaves all N6 colorings unchanged.

Adding up the contributions of the 24 motions, we get:
  • 8N2 from the vertex rotations
  • 6N3 from the edge rotations
  • 6N3 from the 90° face rotations
  • 3N4 from the 180° face rotations
  • N6 from the "do nothing" motion
The average of these is (N6 + 3N4 + 12N3 + 8N2) / 24, and this is the number of ways to color the faces of a cube with N colors. We'd better check it. If we put in N=1, we get out 1 coloring, which is obviously correct: if you can paint each face of the cube any color you want so long as it's black, you are guaranteed to get out an all-black cube. If we put in N=2, we get out (64 + 3·16 + 12·8 + 8·4) / 24 = 240/24 = 10 colorings. And this is in fact the correct answer.

Unfortunately, the Pólya-Burnside technique does not tell you what the ten colorings actually are; for that you have to do some more work. But at least the P-B lemma tells you when you have finished doing the work! If you set about to enumerate ways of painting the faces of the cube, and you end up with 9, you know you must have missed one. And it tells you how much toil to expect if you do try to work out the colorings. 10 is not so many, so let's give it a shot:

  • 6 black and 0 white faces: one cube
  • 5 black and 1 white face: one cube
  • 4 black and 2 white faces: the white faces could be on opposite sides of the cube, or touching at an edge
  • 3 black and 3 white faces: the white faces could include a pair of opposite faces and one face in between, or they could be the three faces that surround a single vertex.
  • 2 black and 4 white faces: the black faces could be on opposite sides of the cube, or touching at an edge
  • 1 black and 5 white faces: one cube
  • 0 black and 6 white faces: one cube
And that's 1+1+2+2+2+1+1 = 10, as we hoped. With N=3 colors, there are 57 colorings total, so it's hard to imagine counting them without making a mistake somewhere. Knowing ahead of time that the answer is 57 is very helpful.

Care to try it out? There are 4 ways to color the sides of a triangle with two colors, 10 ways if you use three colors, and N(N+1)(N+2)/6 if you use N colors.

There are 140 different ways to color a the squares of a 3×3 square array, counting reflections as different. If reflected colorings are not counted separately, there are only 102 colorings. (This means that 38 of the colorings have some reflective symmetry.) If the two colors are considered interchangeable (so for example and are considered the same) there are 51 colorings.

You might think it is obvious that allowing an exchange of the two colors cuts the number of colorings in half from 102 to 51, but it is not so for 2×2 squares. There are 6 ways to color a 2×2 array, whether or not you count reflections as different; if you consider the two colors interchangeable then there are 4 colorings, not 3. Why the difference?


[Other articles in category /math] permanent link

More on the frequency of vibrations in 1666 and other matters
In a series of earlier posts (1 2 3) I discussed Robert Hooke's measurement of the frequency of G above middle C: he determined that it was 272 beats per second. There are two questions here: first, how did he do that, and second, why is his answer so badly wrong? (Modern definitions make it about 384 hertz; 17th-century definitions differ, and vary among themselves, but not by so much as that.)

The article Mr. Birchensha's Ear, by Benjamin Wardhaugh, provides the answers. In 1664, Hooke and the Royal Society set up a vibrating brass wire 136 feet long. It could be seen to vibrate once per second. Hooke divided it in half, and it vibrated twice per second. He truncated it to one foot, and the assembled musicians pronounced the sound to be a G below middle C. Since a wire one foot long vibrates 136 times as fast as one 136 feet long, we have the answer.

And the error? Partly just slop in the dividing and the estimation of the original frequency. Wardhaugh says:

In fact, the frequency is badly wrong (G has a frequency of 196 Hz), probably because Hooke was careless about the length of the pendulum that measured the time. In private he is meticulously precise, but maybe the details don't matter so much for Society showpieces.
Pepys wasn't there, which is why he had to learn about it from Hooke in 1666.

The rest of the article is well worth reading:

Passers-by jeer at the useless toy: the pointlessness of the Royal Society's activities is already threatening to become proverbial. (Shadwell later wrote a comedy in which 'the Virtuoso' weighed air, swam on dry land and read by the light of a decaying fish. Poor Hooke went to see it and 'people almost pointed'.)
The Royal Society is well-known to be the inspiration for the Grand Academy of Laputa in Gulliver's Travels, one of whose members "had been eight years upon a project for extracting sunbeams out of cucumbers...".

Reading the contents of Derham's 1726 collection of Hooke's papers on Wednesday, I was struck by the title "Hooke's Experiments on Floating of Lead". I was gearing up to write a blog post about how no real scientific paper has ever reminded me quite so much of the Grand Academy. But I abandoned the idea when I remembered that there is nothing unusual or surprising about being reminded of the grand Academy by a Royal Society paper. Also, the title only sounds silly until you learn that the paper itself is about the manner in which solid lead floats on molten lead.

Final note: while searching for the title of the floating-lead paper (which I left at home today) I learned something else interesting. In South Philadelphia there is a playground attached to an antique "shot tower". I had had no idea what a shot tower was, and I had supposed it was a place for storing shot and gunpowder. But why build a tower? It turns out it is the place where lead shot is manufactured. You melt up a cauldron of lead at the top, then dump it through a copper sieve and let it fall into a tub of water at the bottom. On the way down, the molten lead turns into round shot. You sort it for size and roundness, re-melt the ones that aren't round, and you have your shot.


[Other articles in category /physics] permanent link

The 20 most important tools
Forbes magazine recently ran an article on The 20 Most Important Tools. I always groan when I hear that some big magazine has done something like that, because I know what kind of dumbass mistake they are going to make: they are going to put Post-It notes at #14. The Forbes folks did not make this mistake. None of their 20 items were complete losers.

In fact, I think they did a pretty good job. They assembled a panel of experts, including Don Norman and Henry Petroski; they also polled their readers and their senior editors. The final list isn't the one I would have written, but I don't claim that it's worse than one I would have written.

Criticizing such a list is easy---too easy. To make the rules fair, it's not enough to identify items that I think should have been included. I must identify items that I think nearly everyone would agree should have been included.

Unfortunately, I think there are several of these.

First, to the good points of the list. It doesn't contain any major clinkers. And it does cover many vitally important tools. It provokes thought, which is never a bad thing. It was assembled thoughtfully, so one is not tempted to dismiss any item without first carefully considering why it is in there.

Here's the Forbes list:

  1. The Knife
  2. The Abacus
  3. The Compass
  4. The Pencil
  5. The Harness
  6. The Scythe
  7. The Rifle
  8. The Sword
  9. Eyeglasses
  10. The Saw
  11. The Watch
  12. The Lathe
  13. The Needle
  14. The Candle
  15. The Scale
  16. The Pot
  17. The Telescope
  18. The Level
  19. The Fish Hook
  20. The Chisel
The Forbes list has some restrictions. "Tools" must be simple, portable physical implements. Fundamental machines are omitted; most notably, this excludes "the lever" and "the wheel". (The invention of real importance there is not the wheel, but the axle. But that's another article for another time.) Inventions like fire, glassblowing, the computer, gunpowder, the windmill, and written language are ruled out, not because they are unimportant, but because they are not "tools" in the sense of being fairly simple, portable physical implements. They belong on some list, but not this one. (That didn't stop Don Norman from writing a ponderous and obvious essay about how the Forbes list was the wrong list to make. I know Don Norman has his fans, but I've never understood why.)

Categories

Order
The Pencil
The Pencil
with kickback
no kickback
The Forbes items are also allowed to stand for categories. For example, "the Rifle" really stands for portable firearms, including muskets and such. "The pencil" includes pens and writing brushes. (Why put "the pencil" and not "the pen"? I imagine Henry Petroski arguing about it until everyone else got tired and gave up.) The spoon, had they included it, would have stood for eating utensils in general.

But here is my first quibble: it's not really clear why some items stood for whole groups, and others didn't. The explanatory material points out that five other items on the list are special cases of the knife: the scythe, lathe, saw, chisel, and sword. The inclusion of the knife as #1 on the list is, I think, completely inarguable. The power and the antiquity of the knife would put it in the top twenty already.

Consider its unmatched versatility as well and you just push it up into first place, and beyond. Make a big knife, and you have a machete; bigger still, and you have a sword. Put a knife on the end of a stick and you have an axe; put it on a longer stick and you have a spear. Bend a knife into a circle and you have a sickle; make a bigger sickle and you have a scythe. Put two knives on a hinge or a spring and you have shears. Any of these could be argued to be in the top twenty. When you consider that all these tools are minor variations on the same device, you inevitably come to the conclusion that the knife is a tool that, like Babe Ruth among baseball players, is ridiculously overqualified even for inclusion with the greatest.

But Forbes people gave the sword a separate listing (#8), and a sword is just a big knife. It serves the same function as a knife and it serves it in the same mechanical way. So it's hard to understand why the Forbes people listed them separately. If you're going to list the sword separately, how can you omit the axe or the spear? Grouping the items is a good idea, because otherwise the list starts to look like the twenty most important ways to use a knife. But I would have argued for listing the sword, axe, chisel, and scythe under the heading of "knife".

I find the other knifelike devices less objectionable. The saw is fundamentally different from a knife, because it is made and used differently, and operates in a different way: it is many tiny knives all working in the same direction. And the lathe is not a special case of the knife, because the essential feature of the lathe is not the sharp part but the spinning part. (I wouldn't consider the lathe a small, portable implement, but more about that below.)

Pounding

I said that I was required to identify items that everyone would agree are major omissions. I have two such criticisms. One is that the list has room for six cutting tools, but no pounding tools. Where is the club? Where is the hammer? I could write a whole article about the absurdity of omitting the hammer. It's like leaving Abraham Lincoln off of a list of the twenty greatest U.S. presidents. It's like leaving Albert Einstein off of a list of the twenty greatest scientists. It's like leaving Honus Wagner off of a list of the twenty greatest baseball players.

No, I take it back. It's not like any of those things. Those things should all be described as analogous to leaving the hammer of the list of the twenty most important tools, not the other way around.

Was the hammer omitted because it's not a simple, portable physical implement? Clearly not.

Was the hammer omitted because it's an abstract fundamental machine, like the lever? Is a hammer really just a lever? Not unless a knife is just a wedge.

Is the hammer subsumed in one of the other items? I can't see any candidates. None of the other items is for pounding.

Did the Forbes panel just forget about it? That would have been weird enough. Two thousand Forbes readers, ten editors, and Henry Petroski all forgot about the hammer? Impossible. If you stop someone on the street and ask them to name a tool, odds are that they will say "hammer". And how can you make a list of the twenty most important tools, include the chisel as #20, and omit the hammer, without which the chisel is completely useless?

The article says:

We eventually came up with a list of more than 100 candidate tools. There was a great deal of overlap, so we collapsed similar items into a single category, and chose one tool to represent them. That left us with a final list of 33 items, each one a part of a particular class or style of tool; for instance, the spoon is representative of all eating utensils.

Perhaps the hammer was one of the 13 classes of tools that didn't make the cut? The writer of the article, David M. Ewalt, kindly provided me with a complete list of the 33 classes, including the also-rans. The hammer was not with the also-rans; I'm not sure if I find that more or less disturbing.

Also-rans

Well, enough about hammers. The 13 classes that did not make the cut were:

  • spoon
  • longbow
  • broom
  • paper clip
  • computer mouse
  • floppy disk
  • syringe
  • toothbrush
  • barometer
  • corkscrew
  • gas chromatograph
  • condom
  • remote control
Presumably some of these would have been cleaned up for publication, had they been selected for the top 20. For example, "longbow" should obviously be "bow". So I don't want to criticize these too much. The omissions seem more striking to me than the inclusions. But some of the inclusions are just too strange to let pass without comment, and some of those comments will help us understand what should be on the list and what shouldn't be.

"Gas chromatograph" seems to be someone's attempt to steer the list away from ancient inventions and to include some modern tools on the list. This is a worthy purpose. But I wish that they had thought of a better representative than the gas chromatograph. It seems to me that most tools of modern invention serve only very specialized purposes. The gas chromatograph is not an exception. I've never used a gas chromatograph. I don't think I know anyone who has. I've never seen a gas chromatograph. I might well go to the grave without using one. How is it possible that the gas chromatograph is one of the 33 most important tools of all time, beating out the hammer?

With "syringe", I imagine the authors were thinking of the hypodermic needle, but maybe they really were thinking of the syringe in general, which would include the meat syringe, the vacuum pipette, and other similar devices. If the latter, I have no serious complaint; I just wanted to point out the possible misunderstanding.

"Paper clip" is just the kind of thing I was afraid would appear. The paper clip isn't one of the top hundred most important tools, perhaps not even one of the top thousand. If the hammer were annihilated, civilization would collapse within twenty-four hours. If the paper clip were annihilated, we would shrug, we would go back to using pins, staples, and ribbons to bind our papers, and life would go on. If the pin isn't qualified for the list, the paper clip isn't even close.

I was speechless at the inclusion of the corkscrew in a list of essential tools that omits both bottles and corks, reduced to incoherent spluttering. The best I could do was mutter "insane".

I don't know exactly what was intended by "remote control", but it doesn't satisfy the criteria. The idea of remote control is certainly important, but this is not a list of important ideas or important functions but important tools. If there were a truly universal remote control that I could carry around with me everywhere and use to open doors, extinguish lights, summon vehicles, and so on, I might agree. But each particular remote control is too specialized to be of any major value.

Putting the computer mouse on the list of the twenty (or even 33) most important tools is like putting the pastrami on rye on the list of the twenty most important foods. Tasty, yes. Important? Surely not. In the same class as the soybean? Absurd.

The floppy disk is already obsolete.

Other comparisons

The telescope

Returning to the main list, eyeglasses and telescopes are both special cases of the lens, but their fundamentally different uses seem to me to clearly qualify them for separate listing; fair enough. I'm not sure I would have included the telescope, though. Is the telescope the most useful and important object of its type? Maybe I'm missing something, but it seems to me that most of the uses of the telescope are either scientific or military. The military value of the telescope is not in the same class as the value of the sword or the rifle. The scientific value of the telescope, however, is enormous. So it's on it scientific credentials that the telescope goes into the list, if at all.

But the telescope has a cousin, the microscope. Is the telescope's scientific value comparable to that of the microscope? I would argue that it is not. Certainly the microscope is much more widely used, in almost any branch of science you could name except astronomy. The telescope enabled the discovery that the earth is not the center of the universe, a discovery of vast philosophical importance. Did the microscope lead to fundamental discoveries of equal importance? I would argue that the discovery of microorganisms was at least as important in every way.

Arguing that "X is in the list, so Y should be too" is a slippery slope that leads to a really fat list in which each mistaken inclusion justifies a dozen more. I won't make that argument in this article. But the reverse argument, that "Y isn't in the list so X shouldn't be either", is much safer. If the microscope isn't important enough to make the list, then neither is the telescope.

The level

This is the only tool on the list that I thought was a serious mistake, not quite on the order of the Post-It note, but silly in the same way, if to a much lesser degree. It is another item of the type exemplified by the telescope, an item that is on the list, but whose more useful and important cousin is omitted. Why the level and not the plumb line? The plumb line does everything a level does, and more. The level tells you when things are horizontal; the plumb line tells you when they are horizontal or vertical, depending on what you need. The plumb line is simpler and older. The plumb line finds the point or surface B that is directly below point A; the level does nothing of the kind.

I'm boggled; I don't know what the level is doing there. But the fact that my most serious complaint about any particular item is with item #18 shows how well-done I think the list is overall.

Sewing

The needle made the list at #13, but thread did not. A lot of sewing things missed out. Most of these, I think, are not serious omissions. The spinning wheel, for example: hand-spinning works adequately, although more slowly. The thimble? Definitely not in the top twenty. The button, with frogs and other clasps included? Maybe, maybe not. But one omission is serious, and must be considered seriously: the loom. I suppose it was eliminated for being too big; there can be no other excuse. But the lathe is #12, and the lathe is not normally small or portable.

There are small, portable lathes. But there are also small, portable looms, hand looms, and so on. I think the loom has a better claim to being a tool in this sense than a lathe does. Cloth is surely one of the ten most important technological inventions of all time, up there with the knife, the gun, and the pot. Cloth does not belong on the Forbes list, because it is not a tool. But omission of the loom surprises me.

Grinding

Similarly, the omission of the windmill is quite understandable. But what about the quern? Flour is surely a technology of the first importance., Grain can be ground into flour without a windmill, and in many places was or still is. This morning I planned to write that it must have been omitted because it is hardly used any more, but then I thought a little harder and realized that I own not one but two devices that are essentially querns. (One for grinding coffee beans, the other for peppercorns.) I wouldn't want to argue that the quern is on the top twenty, but I think it's worth considering.

Male bias?

In fact, the list seems to omit a lot of important handicraft and home items that have fallen into disfavor. Male bias, perhaps? I briefly considered writing this article with the male-bias angle as the main point, but it's not my style. The authors might learn something from consideration of this question anyway.

The pot made the list, but not the potter's wheel. An important omission, perhaps? I think not, that a good argument could be made that the potter's wheel was only an incremental improvement, not suitable for the top twenty.

I do wonder what happened to rope; here I could only imagine that they decided it wasn't a "tool". (M. Ewalt says that he is at a loss to explain the omission of rope.) And where's the basket? Here I can't imagine what the argument was.

Carrying

With the mention of baskets, I can't put off any longer my biggest grievance about the list: Where is the bag?

The bag! Where is the bag?

I will say it again: Where is the bag?

Is the bag a small, portable implement? Yes, almost by definition. "Stop for a minute and think about what you've done today--every job you've accomplished, every task you've completed." begins the Forbes article. Did I have my bag with me? I did indeed. I started the day by opening up a bag of grapes to eat for breakfast. Then I made my lunch and put it in a bag, which I put into another, larger bag with my pens and work papers. Then I carried it all to work on my bicycle. Without the bag, I couldn't have carried these things to work. Could I have gotten that stuff to work without a bag? No, I would not have had my hands free to steer the bicycle. What if I had walked, instead of riding? Still probably not; I would have dropped all the stuff.

The bag, guys!. Which of you comes to work in the morning without a bag? I just polled the folks in my office; thirteen of fourteen brought bags to work today. Which of you carries your groceries home from the store without a bag? Paleolithic people carried their food in bags too. Did you use a lathe today? No? A telescope? No? A level? A fish hook? A candle? Did you use a bag today? I bet you did. Where is the bag?

The only container on the Forbes list is the pot. Could the bag be considered to be included under the pot? M. Ewalt says that it was, and it was omitted for that reason. I believe this is a serious error. The bag is fundamentally different from the pot. I can sum up the difference in one sentence: the pot is for storage; the bag is for transportation.

Each one has several advantages not possessed by the other. Unlike the pot, the bag is lightweight and easy to carry; pots are bulky. You can sling the bag over your shoulder. The bag is much more accommodating of funny-shaped objects: It's much easier to put a hacked-up animal or a heterogeneous bunch of random stuff into a bag than into a pot. My bag today contains some pads of paper, a package of crackers, another bag of pens, a toy octopus, and a bag of potato chips. None of this stuff would fit well into a pot. The bag collapses when it's empty; the pot doesn't.

The pot has several big advantages over the bag:

  1. The pot is rigid. It tends to protect its contents more than a bag would, both from thumping and banging, and from rodents, which can gnaw through bags but not through pots.

  2. The pot is impermeable. This means that it is easy to clean, which is an important health and safety issue. Solids, such as grain or beans, are protected from damp when stored in pots, but not in bags. And the pot, being impermeable, can be used to store liquids such as food and lighting oils; making a bag for storing liquids is possible but nontrivial. (Sometimes permeability is an advantage; we store dirty laundry in bags and baskets, never pots.)

  3. The pot is fireproof, and so can be used for cooking. Being both fireproof and impermeable, the pot enables the preparation of soup, which increases the supply of available food and the energy that can be extracted from the food.

The bag probably predates the pot. To make pots, you must locate a suitable source of clay, shape it, and sun-dry or bake it. To make a bag requires nothing more than to grab a large animal skin by the corners. The bag doesn't get as much notice by anthropologists---not because it's less important, but because it's not as durable. We have potsherds that are thirteen thousand years old. All the bags that old have long since turned to dust.

I have no objection to Forbes' inclusion of the pot on their list, none at all. In fact, I think that it should be put higher than #16. But the bag needs to be listed too.

Other possible omissions

After the hammer, the bag, and rope, I have no more items that I think are so inarguable that they are sure substitutes for items in Forbes' list. There are items I think are probably better choices, but I think it is arguable, and, as I explained at the beginning of the article, I don't want to take cheap shots. Any list of the 20 most important tools will leave out a lot of important tools; switching around which tools are omitted is no guarantee of an objectively better list. For discussion purposes only, I'll mention tongs (including pliers), baskets, and shovels. Of the items on Forbes' near-miss list that I would want to consider are the bow, the broom and the spoon.

Revised list

Here, then, is my revised list. It's still not the list I would have made up from scratch, but I wanted to try to retain as much of the Forbes list as I could, because I think the items at the bottom are judgement calls, and there is plenty of room for reasonable disagreement about any of them.

Linguists found a while ago that if you ask subjects to judge whether certain utterances are grammatically correct or not, they have some difficulty doing it, and their answers do not show a lot of agreement with other subjects'. But if you allow them an "I'm not sure" category, they have a lot less difficulty, and you do see a lot of agreement about which utterances people are unsure about. I think a similar method may be warranted here. Instead of the tools that are in or out of the list, I'm going to make two lists: tools that I'm sure are in the list, and tools that I'm not sure are out of the list.

The Big Eight, tools that I think you'd have to be crazy to omit, are:

  1. Knife (includes sword, axe, scythe, chisel, spear, shears, scissors)
  2. Hammer (includes club, mace, sledgehammer, mallet)
  3. Bag (includes wineskin, water skin, leather bottle, purse)
  4. Pot (includes plate, bowl, pitcher, rigid bottle, mortar)
  5. Rope (includes string and thread)
  6. Harness (includes collar and yoke)
  7. Pen (includes pencil, writing brush, etc.)
  8. Gun (includes rifle and musket, but not cannon)
The lesser twelve, the tools that I'm not sure are off the list, are:

  1. Compass
  2. Plumb line (includes level)
  3. Sewing needle
  4. Candle (includes lamp, lantern, torch)
  5. Ladder
  6. Eyeglasses (includes contact lenses)
  7. Saw
  8. Balance
  9. Fishhook
  10. Lathe
  11. Abacus (includes counting board)
  12. Microscope
My lists merge the sword, scythe, and chisel under the knife. This frees up space for the hammer, the bag, and rope, which I think were Forbes' most serious omissions. The only other omission I felt that I had to correct was the ladder; I removed the watch to make room, although I had misgivings about that.

The other adjustments are minor: The pot got a big promotion, from #16 to #4. The pencil is represented by the pen, instead of the other way around. The rifle is teamed with the musket as "the gun". The telescope is replaced with the microscope. The level is replaced with the plumb line. The scale is replaced by the balance, which is more a terminological difference than anything else.

The omission of mine that worries me the most is the basket. I left it out because although it didn't seem very much like either the pot or the bag, it did seem too much like both of them. I worry about omitting the pin, but I'm not sure it qualifies as a "tool".

If I were to get another 13 slots, I might include:

  1. Basket
  2. Broom
  3. Horn
  4. Pry bar
  5. Quern
  6. Radio (Walkie-talkies)
  7. Scraper
  8. Shovel
  9. Spoon
  10. Tape
  11. Tongs
  12. Touchstone
  13. Welding torch


[Other articles in category /tech] permanent link

Accidental Features
Last week I was away in Chicago as part of my Red Flags world tour, about which more in an upcoming entry. I mentioned before that I always learn something surprising from giving these classes, and this time was no exception. I was reviewing a medium-sized piece of code for a client and I encountered the following astonishing construction:

        for $i qw(foo bar baz) {
          # do something with $i
        }
If you're not familiar with Perl, you aren't going to see what is wrong with this. You may not see it for a while even if you are familiar with Perl.

Normally, the syntax for this sort of construction is:

        for $VAR (LIST) {
          ...
        }
where the parentheses around the LIST are not optional. And the qw(...) construction generates a list. So it should be:
        for $i (qw(foo bar baz)) {
          # do something with $i
        }
Leave off the outer parentheses, and you should get a syntax error. So when I saw this code, I said, "Oh, that's a syntax error."

Except, of course, it wasn't. The code I was reviewing was in production on the client's web site, and was processing millions of dollars of orders. If it had bee a syntax error, they would have known right away. And indeed, it worked when I tried it.

But it's not documented anywhere, and I have never seen it before. It is certainly little-known. I believe it is an accidental feature.

This is not the first time that someone has discovered an unintended but useful feature in Perl. Another example that comes to mind is the s/.../.../ee trick. Normally, you give s/.../.../ a pattern and a replacement string, and it performs a substitution, locating some string that matches the pattern and replacing it with the replacement. For example, s/\d+/carrots/ locates a a numeral (a sequence of digits) and replaces it with carrots. But you might want to do something tricky; say you want to replace each numeral with its double. Then you can use s/(\d+)/$1 * 2/e. The $1 is replaced with the digits, so that if the digits are 123, you get 123 * 2, and the the trailing e means that this is a Perl expression, rather than a literal string, and should be evaluated. Without the e, 123 becomes 123 * 2; with e, it becomes 246. (Other modifiers were permitted: for example, i made the pattern match case-insensitive.)

The implemenation was to scan the modifiers left-to-right, and to call Perl's expression evaluator if an e was seen. The accidental feature was that the implementation therefore allowed one to put two es to force two evaluations. The typical use of this was to write something like s/(\$\w+)/$1/ee, which finds $foo and replaces it with the value of the variable $foo. The first e evaluates $1 to $foo, and the second evaluates $foo to its value. I believe this was discovered by Tom Christiansen. It's rarely used now, since there are more straightforward ways to accomplish the same thing, but it still appears in three different places in the documentation.

I don't think it says anything good about Perl that this sort of accidental feature shows up in the language. I can't think of anything comparable in any other language. (Except APL.) C has the 4[ptr] syntax, but that's not a feature; it's just a joke. The for $i qw(...) { ... } syntax in Perl seems genuinely valuable.

Accordingly, I dropped a note to the Perl development mailing list, asking if anyone had ever noticed this before, and whether people thought we should commit to supporting it in the future. (It works in all versions of Perl that support qw().) I was surprised that there was no reply. So I suppose that the next step is to prepare a documentation patch describing it and some tests in the test suite for it. If everyone is still asleep when I post those, they may go in without comment, and once a feature is put in, it is very difficult to remove it.


[Other articles in category /prs] permanent link

More on multiple SMTP connections
[ The previous article is here. ]

Two people so far have written in to ask why I didn't just have the SMTP server delete the lock files when it was done with them. Well, more accurately, one person wrote in to ask that, one other person just said that it would solve my problem to do it.

But I thought of that, and I didn't do it; here's why. The lock directory contains up to 16,843,009 directories that cannot, typically, be deleted. Thus deleting the lock files does not solve the problem; at best it postpones it.

John Macdonald had a better suggestion: To lock address a.b.c.d, open file a/b and lock byte 256·c+d. This might work.

But at this point my thinking on the project is that if a thing's not worth doing at all, it's not worth doing well.


[Other articles in category /oops] permanent link

Blog Post Escapes from Lab
My blog pages get regenerated when I run Blosxom. Also, I have a cron job that runs it every night at 12:10.

Files named something.blog turn into blog posts. While posts are under construction, I name them something.notyet. Then when they're done, I rename them to something.blog and either rerun Blosxom or wait until 12:10.

Sometimes I get a little cocky, and I figure that I'm going to finish the post quickly. Then I name the file something.blog right from the start.

Sometimes I'm mistaken about finishing the post quickly. Then I have to fix the name before 12:10 so that the unfinished post doesn't show up on the blog.

Because if it does show up, I can't really retract it. I can remove it from the main blog pages at plover.com. But bloglines.com won't remove it until another twelve posts have come in, no matter what I do with the syndication files.

And that's why there's a semi-finished post about Simplified Poker on bloglines.com today.


[Other articles in category /oops] permanent link

An unusually badly designed bit of software
I am inaugurating this new section of my blog, which will contain articles detailing things I have screwed up.

Back on 19 January, I decided that readers might find it convenient if, when I mentioned a book, there was a link to buy the book. I was planning to write a lot about what books I was reading, and perhaps if I was convincing enough about how interesting they were, people would want their own copies.

The obvious way to do this is just to embed the HTML for the book link directly into each entry in the appropriate place. But that is a pain in the butt, and if you want to change the format of the book link, there is no good way to do it. So I decided to write a Blosxom plugin module that would translate some sort of escape code into the appropriate HTML. The escape code would only need to contain one bit of information about the book, say its ISBN, and then the plugin could fetch the other information, such as the title and price, from a database.

The initial implementation allowed me to put <book>1558607013</book> tags into an entry, and the plugin would translate this to the appropriate HTML. (There's an example on the right.

Order
Higher-Order Perl
Higher-Order Perl
with kickback
no kickback
) The 1558607013 was the ISBN. The plugin would look up this key in a Berkeley DB database, where it would find the book title and Barnes and Noble image URL. Then it would replace the <book> element with the appropriate HTML. I did a really bad job with this plugin and had to rewrite it.

Since Berkeley DB only maps string keys to single string values, I had stored the title and image URL as a single string, with a colon character in between. That was my first dumb mistake, since book titles frequently include colons. I ran into this right away, with Voyages and Discoveries: Selections from Hakluyt's Principal Navigations.

This, however, was a minor error. I had made two major errors. One was that the <book>1558607013</book> tags were unintelligible. There was no way to look at one and know what book was being linked without consulting the database.

But even this wouldn't have been a disaster without the other big mistake, which was to use Berkeley DB. Berkeley DB is a great package. It provides fast keyed lookup even if you have millions of records. I don't have millions of records. I will never have millions of records. Right now, I have 15 records. In a year, I might have 200.

The price I paid for fast access to the millions of records I don't have is that the database is not a text file. If it were a text file, I could look up <book>1558607013</book> by using grep. Instead, I need a special tool to dump out the database in text form, and pipe the output through grep. I can't use my text editor to add a record to the database; I had to write a special tool to do that. If I use the wrong ISBN by mistake, I can't just correct it; I have to write a special tool to delete an item from the database and then I have to insert the new record.

When I decided to change the field separator from colon to \x22, I couldn't just M-x replace-string; I had to write a special tool. If I later decided to add another field to the database, I wouldn't be able to enter the new data by hand; I'd have to write a special tool.

On top of all that, for my database, Berkeley DB was probably slower than the flat text file would have been. The Berkeley DB file was 12,288 bytes long. It has an index, which Berkeley DB must consult first, before it can fetch the data. Loading the Berkeley DB module takes time too. The text file is 845 bytes long and can be read entirely into memory. Doing so requires only builtin functions and only a single trip to the disk.

I redid the plugin module to use a flat text file with tab-separated columns:

        HOP	1558607013	Higher-Order Perl	9072008
        DDI	068482471X	Darwin's Dangerous Idea	1363778
        Autobiog	0760768617	Franklin's Autobiography	9101737
        VoyDD	0486434915	The Voyages of Doctor Dolittle	7969205
        Brainstorms	0262540371	Brainstorms	1163594
        Liber Abaci	0387954198	Liber Abaci	6934973
        Perl Medic	0201795264	Perl Medic	7254439
        Perl Debugged	0201700549	Perl Debugged	3942025
        CLTL2	1555580416	Common Lisp: The Language	3851403
        Frege	0631194452	The Frege Reader	8619273
        Ingenious Franklin	0812210670	Ingenious Dr. Franklin	977000
The columns are a nickname ("HOP" for Higher-Order Perl, for example), the ISBN, the full title, and the image URL. The plugin will accept either <book>1558607013</book> or <book>HOP</book> to designate Higher-Order Perl. I only use the nicknames now, but I let it accept ISBNs for backward compatibility so I wouldn't have to go around changing all the <book> elements I had already done.

Now I'm going to go off and write "just use a text file, fool!" a hundred times.


[Other articles in category /oops] permanent link

More approximations to pi
In an earlier post I discussed the purported Biblical approximation to π, and the verses that supposedly equate it to 3.

Eli Bar-Yahalom wrote in to tell me of a really fascinating related matter. He says that the word for "perimeter" is normally written "QW", but in the original, canonical text of the book of Kings, it is written "QWH", which is a peculiar (mis-)spelling. (M. Bar-Yahalom sent me the Hebrew text itself, in addition to the Romanizations I have shown, but I don't have either a Hebrew terminal or web browser handy, and in any event I don't know how to type these characters. Q here is qoph, W is vav, and H is hay.) M. Bar-Yahalom says that the canonical text also contains a footnote, which explains the peculiar "QWH" by saying that it represents "QW".

The reason this is worth mentioning is that the Hebrews, like the Greeks, made their alphabet do double duty for both words and numerals. The two systems were quite similar. The Greek one went something like this:

Α1Κ10Τ100
Β2Λ20Υ200
Γ3Μ30Φ300
Δ4Ν40Χ400
Ε5Ξ50Ψ500
Ζ6Ο60Ω600
Η7Π70
Θ8Ρ80
Ι9Σ90
This isn't quite right, because the Greek alphabet had more letters then, enough to take them up to 900. I think there was a "digamma" between Ε and Ζ, for example. (This is why we have F after E. The F is a descendant of the digamma. The G was put in in place of Ζ, which was later added back at the end, and the H is a descendent of Η.) But it should give the idea. If you wanted to write the number 172, you would use ΒΠΤ. Or perhaps ΤΒΠ. It didn't matter.

Anyway, the Hebrew system was similar, only using the Hebrew alphabet. So here's the point: "QW" means "circumference", but it also represents the number 106. (Qoph is 100; vav is 6.) And the odd spelling, "QWH", also represents the number 111. (Hay is 5.) So the footnote could be interpreted as saying that the 106 is represented by 111, or something of the sort.

Now it so happens that 111/106 is a highly accurate approximation of π/3. π/3 is 1.04719755 and 111/106 is 1.04716981. And the value cited for the perimeter, 30, is in fact accurate, if you put 111 in place of 106, by multiplying it by 111/106.

It's really hard to know for sure. But if true, I wonder where the Hebrews got hold of such an accurate approximation? Archimedes pushed it as far as he could, by calculating the perimeters of 96-sided polygons that were respectively inscribed within and circumscribed around a unit circle, and so calculated that 223/71 < π < 22/7. Neither of these fractions is as good an approximation as 333/106.

Thanks very much, M. Bar-Yaholom.


[Other articles in category /math] permanent link

Preventing multiple SMTP connections
As I believe I said somewhere else, I have a lot of ideas, and only about one in ten turns out really well in the end. About seven-tenths get abandoned before the development stage, and then of the three that survive to development, two don't work.

The MIT CS graduate student handbook used to have a piece of advice in it that I think is very wise. It said that two-thirds of all research projects are failures. The people that you (as a grad student at MIT) see at MIT who seem to have one success after another are not actually successful all the time. It's just that they always have at least three projects going on at once, and you don't hear about the ones that are flops. So the secret of success is to always be working on several projects at once. (I wish I could find this document again. It used to be online.)

Anyway, this note is about one of those projects that I carried through to completion, but that was a failure.

The project was based in the observation that spam computers often make multiple simultaneous connections to my SMTP server and use all the connections to send spam at once. It's very rare to get multiple legitimate simultaneous connections from the same place. So rejecting or delaying simultaneous connections may reduce incoming spam and load on the SMTP server.

My present SMTP architecture spawns a separate process for each incoming SMTP connection. So an IPC mechanism is required for the separate processes to communicate with each other to determine that several of them are servicing the same remote IP address. Then all but one of the processes can commit suicide. If the incoming email is legitimate, the remote server will try the other messages later.

The IPC mechanism I chose was this: Whenever an SMTP server starts up, servicing a connection from address a.b.c.d, it creates an empty file named smtplock/a/b/c/d and gets an exclusive file lock on it. It's OK if the file already exists. But if the file is already locked, the server should conclude that some other server is already servicing that address, and commit suicide.

This worked really well for a few weeks. Then one day I came home from work and found my system all tied into knots. The symptoms were similar to those that occurred the time I used up all the space on the root partition, so I checked the disk usage right away. But none of the disks were full. A minute later, I realized what the problem was: creating a file for each unique IP address that sent mail had used up all the inodes, and no new files could be created.

The IPC strategy I had chosen gobbled up about twenty thousand inodes a day. At the time I had implemented it, I had had a few hundred thousand inodes free on the root partition. Whoops.

Compounding this mistake, I had made another. I didn't provide any easy way to shut off the feature! It would have been trivial to do so. All I had to do was predicate its use on the existence of the smtplock directory. Then I could have moved the directory aside and the servers would have instantly stopped consuming inodes. I would then have been able to remove the directory at my convenience. Instead, the servers "gracefully" handle the non-existence of smtplock by reporting a temporary failure back to the remote system and aborting the transmission. To shut off the feature, I had to modify, recompile, and re-install the server software.

Duh.

[ Addendum 20060202: There is a brief followup to this article. ]


[Other articles in category /oops] permanent link

More low-tech sound recording
In an earlier post, I wondered how Robert Hooke could have discovered the frequency of a vibrating string in 1664. I suggested a device involving a bristle that traced a sinusoid on a smoked glass plate. I was sure that I head heard about this somewhere before, and wasn't making it up.

With the Wonders of the Internet, such things are easy to look up. Google for "bristle" and "smoked glass" and there you have it. The device is called a "phonautograph", and was invented in 1857 by Leon Scott. So the idea does work, but was probably not what Hooke used. My search continues.

The phonautograph was one of Alexander Graham Bell's inspirations for the telephone. Bell, whose principal research area was to improve education of deaf persons, built a phonautograph of his own, using—get this—an actual human ear. The bristle was attached to the ossicles. He cut away the stapes bone, and attached a wheat straw to the incus. I wonder whose ear he used?

I've read that a phonautograph recording survives from around 1860 and is the oldest known sound recording, although there was no way to play it back until recently, when the tracing was digitized and given to a computer to synthesize.

There was some talk a few years ago about the possibility that clay pots might retain recordings of the ambient sounds from the time they were created: they are thrown on potters' wheels that revolve at a uniform speed; they have high recording resolution, and are frequently worked with sharp tools that would have left grooves in them much like the grooves of phonograph records. I saw an exhibit of this in a science museum somewhere, but I could not make anything of the sound that they recovered from the pot, even though it had been specially constructed for the purpose. There are obvious, major problems with the idea, but it's nice to think that we might someday be able to listen to sound recordings of Sumerians chatting in the potter's workshop.

Come to think of it, conventional phonography was pretty low-tech itself, until the introduction of magnetic and later digital media.


[Other articles in category /physics] permanent link

Approximations and the big hammer
In today's article about rational approximations to √3, I said that "basic algebra tells us that &radic(1-&epsilon) &asymp 1 - &epsilon/2 when &epsilon is small".

A lot of people I know would be tempted to invoke calculus for this, or might even think that calculus was required. They see the phrase "when &epsilon is small" or that the statement is one about limits, and that immediately says calculus.

Calculus is a powerful tool for producing all sorts of results like that one, but for that one in particular, it is a much bigger, heavier hammer than one needs. I think it's important to remember how much can be accomplished with more elementary methods.

The thing about &radic(1-&epsilon) is simple. First-year algebra tells us that (1 - &epsilon/2)2 = 1 - ε + ε2/4. If &epsilon is small, then &epsilon2/4 is really small, so we won't lose much accuracy by disregarding it.

This gives us (1 - &epsilon/2)2 ≈ 1 - ε, or, equivalently, 1 - &epsilon/2 ≈ √(1 - ε). Wasn't that simple?


[Other articles in category /math] permanent link

Blog Posts Escape from Lab
Yesterday I decided to put my blog posts into CVS. As part of doing that, I copied the blog articles tree. I must have made a mistake, because not all the files were copied in the way I wanted.

To prevent unfinished articles from getting out before they are ready, I have a Blosxom plugin that suppresses any article whose name is title.blog if it is accompanied by a title.notyet file. I can put the draft of an article in the .notyet file and then move it to .blog when it's ready to appear, or I can put the draft in .blog with an empty or symbolic-link .notyet alongside it, and then remove the .notyet file when the article is ready.

I also use this for articles that will never be ready. For example, a while back I got the idea that it might be funny if I were to poſt ſome articles with long medial letter 'ſ' as one ſees in Baroque printing and writing. To try out what this would look like I copied one of my articles---the one on Baroque writing ſtyle ſeemed appropriate---and changed the filename from baroque-style.blog to baroque-ftyle.blog. Then I created baroque-ftyle.notyet ſo that nobody would ever ſee my ſtrange experiment.

But somehow in the big shakeup last night, some of the notyet files got lost, and so this post and one other escaped from my laboratory into the outside world. As I explained in an earlier post, I can remove these from my web site, but aggregators like Bloglines.com will continue to display my error. it's tempting to blame this on Blosxom, or on CVS, or on the sysadmin, or something like that, but I think the most likely explanation for this one is that I screwed up all by myself.

The other escapee was an article about the optional second argument in Perl's built-in bless function, which in practice is never omitted. This article was so dull that I abandoned it in the middle, and instead addressed the issue in passing in an article on a different subject.

So if your aggregator is displaying an unfinished and boring article about one-argument bless, or a puzzling article that seems like a repeat of an earlier one, but with all the s's replaced with ſ's, that is why.

These errors might be interesting as meta-information about how I work. I had hoped not to discuss any such meta-issues here, but circumstances seem to have forced me to do it at least a little. One thing I think might be interesting is what my draft articles look like. Some people rough out an article first, then go back and fix it it. I don't do that. I write one paragraph, and then when it's ready, I write another paragraph. My rough drafts look almost the same as my finished product, right up to the point at which they stop abruptly, sometimes in the middle of a sentence.

Another thing you might infer from these errors is that I have a lot of junk sitting around that is probably never going to be used for anything. Since I started the blog, I've written about 85,000 words of articles that were released, and about 20,000 words of .notyet. Some of the .notyet stuff will eventually see the light of day, of course. For example, I have about 2500 words of addenda to this month's posts that are scheduled for release tomorrow, 1000 words about this month's Google queries and the nature of "authority" on the Web, 1000 words about the structure of the real numbers, 1300 words about the Grelling-Nelson paradox, and so on.

But I alſo have that 1100-word experiment about what happens to an article when you use long medial s's everywhere. (Can you believe I actually conſidered doing this in every one of my poſts? It's tempting, but just a little too idioſyncratic, even for me.) I have 500 words about why to attend a colloquium, how to convince everyone there that you're a genius, and what's wrong with education in general. I have 350 words that were at the front of my article about the 20 most important tools that explained in detail why most criticism of Forbes' list would be unfair; it wasted a whole page at the beginning of that article, so I chopped it out, but I couldn't bear to throw it away. I have two thirds of a 3000-word article written about why my brain is so unusual and how I've coped with its peculiar limitations. That one won't come out unless I can convince myself that anyone else will find it more than about ten percent as interesting as I find it.


[Other articles in category /oops] permanent link

Blosxom sucks
Several people were upset at my recent discussion of Blosxom. Specifically, I said that it sucked. Strange to say, I meant this primarily as a compliment. I am going to explain myself here.

Before I start, I want to set your expectations appropriately. Bill James tells a story about how he answered the question "What is Sparky Anderson's strongest point as a [baseball team] manager?" with "His won-lost record". James was surprised to discover that when people read this with the preconceived idea that he disliked Anderson's management, they took "his record" as a disparagement, when he meant it as a compliment.

I hope that doesn't happen here. This article is a long compliment to Blosxom. It contains no disparagement of Blosxom whatsoever. I have technical criticisms of Blosxom; I left them out of this article to avoid confusion. So if I seem at any point to be saying something negative about Blosxom, please read it again, because that's not the way I meant it.

As I said in the original article, I made some effort to seek out the smallest, simplest, lightest-weight blogging software I could. I found Blosxom. It far surpassed my expectations. The manual is about three pages long. It was completely trivial to set up. Before I started looking, I hypothesized that someone must have written blog software where all you do is drop a text file in a directory somewhere and it magically shows up on the blog. That was exactly what Blosxom turned out to be. When I said it was a smashing success, I was not being sarcastic. It was a smashing success.

The success doesn't end there. As I anticipated, I wasn't satisfied for long with Blosxom's basic feature set. Its plugin interface let me add several features without tinkering with the base code. Most of these have to do with generating a staging area where I could view posts before they were generally available to the rest of the world. The first of these simply suppressed all articles whose filenames contained test. This required about six lines of code, and took about fifteen minutes to implement, most of which was time spent reading the plugin manual. My instance of Blosxom is now running nine plugins, of which I wrote eight; the ninth generates the atom-format syndication file.

The success of Blosxom continued. As I anticipated, I eventually reached a point at which the plugin interface was insufficient for what I wanted, and I had to start hacking on the base code. The base code is under 300 lines. Hacking on something that small is easy and rewarding.

I fully expect that the success of Blosxom will continue. Back in January, when I set it up, I foresaw that I would start with the basic features, that later I would need to write some plugins, and still later I would need to start hacking on the core. I also foresaw that the next stage would be that I would need to throw the whole thing away and rewrite it from scratch to work the way I wanted to. I'm almost at that final stage. And when I get there, I won't have to throw away my plugins. Blosxom is so small and simple that I'll be able to write a Blosxom-compatible replacement for Blosxom. Even in death, the glory of Blosxom will live on.

So what did I mean by saying that Blosxom sucks? I will explain.

Following Richard Gabriel, I think there are essentially two ways to do a software design:

  1. You can try to do the Right Thing, where the Right Thing is very complex, subtle, and feature-complete.
  2. You can take the "Worse-is-Better" approach, and try to cover most of the main features, more or less, while keeping the implementation as simple as possible.
Doing the Right Thing is very difficult. If you get it right, the resulting system is still complex and subtle. It takes a long time to learn to use it, and when users do learn to use it, it continues to surprise and baffle them because it is so complex and subtle. The Right Thing may contain implementation errors, and if it does the user is unlikely to be able to repair them without great effort.

Worse, because the Right Thing is very difficult to achieve, and very complex and subtle, it is rarely achieved. Instead, you get a complex and subtle system that is complexly and subtly screwed up. Often, the designer shoots for something complex, subtle, and correct, and instead ends up with a big pile of crap. I could cite examples here, but I think I've offended enough people for one day.

In contrast, with the "Worse-is-Better" approach, you do not try to do the Right Thing. You try to do the Good Enough Thing. You bang out something short and simple that seems like it might do the job, run it up the flagpole, and see if it flies. If not, you bang on it some more.

Whereas the Right Thing approach is hard to get correct, the Worse-is-Better approach is impossible to get correct. But it is very often a win, because it is much easier to achieve "good enough" than it is to achieve the Right Thing. You expend less effort in doing so, and the resulting system is often simple and easy to manage.

If you screw up with the Worse-is-Better approach, the end user might be able to fix the problem, because the system you have built is small and simple. It is hard to screw it up so badly that you end with nothing but a pile of crap. But even if you do, it will be a much smaller pile of crap than if you had tried to construct the Right Thing. I would very much prefer to clean up a small pile of crap than a big one.

Richard Gabriel isn't sure whether Worse-is-Better is better or worse than The Right Thing. As a philosophical question, I'm not sure either. But when I write software, I nearly always go for Worse is Better, for the usual reasons, which you might infer from the preceding discussion. And when I go looking for other people's software, I look for Worse is Better, because so very few people can carry out the Right Thing, and when they try, they usually end up with a big pile of crap.

When I went looking for blog software back in January, I was conscious that I was looking for the Worse-is-Better software to an even greater degree than usual. In addition to all the reasons I have given above, I was acutely conscious of the fact that I didn't really know what I wanted the software to do. And if you don't know what you want, the Right Thing is the Wrong Thing, because you are not going to understand why it is the Right Thing. You need some experience to see the point of all the complexity and subtlety of the Right Thing, and that was experience I knew I did not have. If you are as ignorant as I was, your best bet is to get some experience with the simplest possible thing, and re-evaluate your requirements later on.

When I found Blosxom, I was delighted, because it seemed clear to me that it was Worse-is-Better through and through. And my experience has confirmed that. Blosxom is a triumph of Worse-is-Better. I think it could serve as a textbook example.

Here is an example of a way in which Blosxom subscribes to the Worse-is-Better philosophy. A program that handles plugins seems to need some way to let the plugins negotiate amongst themselves about which ones will run and in what order. Consider Blosxom's filter callback. What if one plugin wants to force Blosxom to run its filter callback before the other plugins'? What if one plugin wants to stop all the subsequent plugins from applying their filters? What if one plugin filters out an article, but a later plugin wants to rescue it from the trash heap? All these are interesting and complex questions. The Apache module interface is an interesting and complex answer to some of these questions, an attempt to do The Right Thing. (A largely successful attempt, I should add.)

Faced with these interesting and complex questions, Blosxom sticks its head in the sand, and I say that without any intent to disparage Blosxom. The Blosxom answer is:

  • Plugins run in alphabetic order.
  • Each one gets its crack at a data structure that contains the articles.
  • When all the plugins have run, Blosxom displays whatever's left in the data structure.
End of explanation; total time elapsed, three seconds. So OK, a plugin named "AARDVARK" can filter out all the articles and nobody else gets a chance to look at them. So what? That is a feature, not a bug. It's simple. It's flexible enough: If you don't like that "AARDVARK" runs first, rename it to "zyzzyva"; Problem solved. An alternative design would be to have a plugin registry file that lists the order in which the plugins will be run. In the right system, that could be the Right Thing. In a system as simple as Blosxom, it would have been the wrong thing, not Worse-is-Better, but just Worse. The author knew just exactly how much design was required to solve the problem, and then he stopped.

So Blosxom is a masterpiece of Worse-is-Better. I am a Worse-is-Better kind of guy anyway, and Worse-is-Better was exactly what I was looking for in this case, so I was very pleased with Blosxom, and I still am. I got more and better Worse-is-Better for my software dollar than usual.

When I stuck a "BLOSXOM SUX" icon on my blog pages, I was trying to express (in 80×15 pixels) my evaluation of Blosxom as a tremendously successful Worse-is-Better design. Because it might be the Wrong Thing instead of the Right Thing, but it was the Right Wrong Thing, and I'd sure rather have the Right Wrong Thing than the Wrong Right Thing.

And when I said that "Blosxom really sucks... in the best possible way", that's what I meant. In a world full of bloated, grossly over-featurized software that still doesn't do quite what you want, Blosxom is a spectacular counterexample. It's a slim, compact piece of software that doesn't do quite what you want---but because it is slim and compact, you can scratch your head over it for couple of minutes, take out the hammer and tongs, and get it adjusted the way you want.

Thanks, Rael. I think you did a hell of a job. I am truly sorry that was not clear from my earlier article.


[Other articles in category /oops] permanent link

The 3n+1 domain
In an earlier post, I said:

The second proof depends on the (unproved) fact that lowest-term fractions are unique. This is actually a very strong theorem. It is true in the integers, but not in general domains.
To fully address this, I need to discuss one of the most fundamental ideas of number theory, the prime factorization. Every positive integer is either a product of two smaller integers, or else is a prime. Primes include 2, 3, 5, and 7. 4 is not prime because it is the product of 2 and 2; 6 is not prime because it is the product of 2 and 3. 221 is the product of 13 and 17; 222 is the product of 6 and 37; 223 is prime. (1 is a special case, and is not considered prime.) This much you probably already know.

The primes were recognized by mathematicians thousands of years ago. Euclid's Elements, written around 2300 years ago, includes a proof that there is no largest prime number. The most important theorem about the positive integers, also known to the Greeks, is that every positive integer has a unique representation as a product of primes. This theorem, fittingly, is known as the "fundamental theorem of arithmetic".

It's quite easy to show that every positive integer can be represented as a product of primes. Suppose that this were not so. Then there would have to be a smallest positive integer n that was not so representable. The number n is either 1, or is prime, or is a product of smaller numbers. If it is 1, it is representable as the "empty product", another special case. If n itself is prime, then it is trivially represented as the product of the single prime n. (Mathematicians customarily allow such things.) Otherwise, n is the product of two smaller numbers, say p and q. But these two numbers are representable as the products of primes, since they are smaller than n, and n is the smallest number not so representable. And then n's representation would be the concatenation of the representations of p and q.

But showing that the representation is unique is trickier. If it seems obvious, you probably haven't thought about it enough. For example, consider the number 24. We can decompose 24 into the product of smaller numbers as 4×6. Then 4 decomposes to 2×2 and 6 to 2×3, so 24 = 2×2×2×3. But what if we had decomposed 24 differently, say as 3×8? Then we decompose 8 as 2×4, and 4 as 2×2, yielding 24 = 3×2×2×2. The two decompositions end up in the same place. But was that guaranteed? What about for a really big number, like 3,628,800? There are a lot of ways to decompose this. Can we be sure that all paths will end at the same place?

For the integers, the answer is yes, they do. But this is a special property of the integers. One can find simple structures, analogous to the integers, where two paths might not end at the same factorization.

The usual example of this is the so-called 3n+1 domain. In this world, we consider only the integers of the form 3n+1. That is, the only numbers of interest are 1, 4, 7, 10, 13, 16, and so on. It's quite easy to show that the product of any collection of these numbers is another one of the form 3n+1. (Briefly, because (3n+1) × (3m+1) = 9mn+3m+3n+1 = 3·(3mn+m+n) + 1. But if you're not convinced by this, just try some examples.)

The 3n+1 domain has its own idea of what is prime and what isn't. 1, as usual, is a special case. 4 is prime, because it is not a product of smaller numbers. (2×2 doesn't work, because the 3n+1 domain has no 2.) 7 is prime, as usual. 10 is prime. (Again, because the 3n+1 domain has neither 2 nor 5.) 13 is prime, as usual. 16 is not prime; it is 4×4, as usual.

The proof that every number has a prime factorization goes through in the 3n+1 domain just as in the regular integers. But the proof that the factorization is unique does not go through, because in fact the 3n+1 domain does not have this property. The smallest example for which it fails is 100, which has not one but two factorizations into primes: 100 = 4 × 25 = 10 × 10.

The failure of the fundamental theorem of arithmetic in the 3n+1 domain triggers the failure of a long series of related results, like the first domino in a series knocking over the others. For example, in the regular integers, if p is prime, and ab is a multiple of p, then either a or b (or both) is a multiple of p. In particular, if the product of two integers is even, then at least one of the two integers must have been even. This does not hold true in the 3n+1 domain. A counterexample is that neither 4 nor 25 is a multiple of the prime 10, but their product is.

Yet another failure is the one with which I opened the article. We say that two integers have a common factor d if they are both multiples of d. (d = 1 is conventionally excluded, since all integers are multiples of 1.) For example, 28 and 18 have a common factor of 2 since they are both multiples of 2; 36 and 63 have a common factor of 9. 14 and 15 have no common factor. When we write fractions, we customarily forbid the numerator and denominator to have a common factor, since the fraction can then be written more simply by canceling the common factor. For example, we never write 36/90; we always cancel the common factor of 18 and write 2/5 instead. When the numerator and denominator have no common factor, we say that the fraction is "in lowest terms".

Each rational number has a unique representation as a lowest-terms fraction, as the quotient of two integers that have no common factor. There is no way to write 2/5 as the quotient of two numbers other than 2 and 5, except by multiplying them by some constant factor, to get something uninteresting like 200/500.

This property also fails in the 3n+1 domain. Some rational numbers have more than one representation. 4/10 = 10/25, but neither 4 and 10 nor 10 and 25 have common factors to cancel, since the 3n+1 domain omits both 2 and 5.

Exercise: Does the analogous 4n+1 domain have unique prime factorizations?

Coming eventually: Gaussian integers and Eisenstein integers.


[Other articles in category /math] permanent link

Biblical infallibility and pi
Two recent minor themes of the blog have been the value purportedly given for π in the Bible, and the value of serendipity and random browsing in the library. Here's a story about the Library Gods and how they sometimes shower you with blessings, if you let them.

I went to the Penn library today to see what I could turn up about Robert Hooke's measurement of the vibrational rate of strings. (Incidentally, Clinton Pierce had an excellent suggestion about this, which I will relate sooner or later.) I checked the electronic catalog for collections of essays by Hooke, and found several; I picked a likely one and wrote down its call number. Or so I thought; I actually wrote down the call number of the previous entry by mistake. Upon arriving at the appropriate place in the stacks, or so I thought, I realized my mistake when I found the wrong book instead of the right one.

Order
Mathematical and Philosophical Works of John Wilkins
Mathematical and Philosophical Works of John Wilkins
with kickback
no kickback
But two shelves up, another title caught my eye: Mathematical and Philosophical Works, by "Wilkins". On a hunch, I took it down. The hunch was this: I knew from an essay by Jorge Luis Borges that at once time a certain bishop John Wilkins had invented a language in which the meaning of each word would be immediately apparent from its spelling.

(I don't have an example handy, so I will make one up. All words that begin with "p" are animals. Words beginning with "pa" are birds, those with "pe" are fish, and so forth. Words beginning with "pel" are fish with fins and scales. Words for fin-fish that live in rivers and streams all begin with "pela". "pelam" is a salmon.)

Order
An Essay Towards a Real Character and a Philosophical Language
An Essay Towards a Real Character and a Philosophical Language
with kickback
no kickback
So I took down the Wilkins book to see if it was the same Wilkins, and indeed it was. Wilkins was one of the founders of the Royal Society, and was its first secretary. I had wanted to read the book about the philosophical language anyway, and here, I thought, was my chance. (It's titled An Essay Towards a Real Character and a Philosophical Language, if you want to look it up.)


Order
Mercury, or, The Swift and Secret Messenger
Mercury, or, The Swift and Secret Messenger
with kickback
no kickback
The volume I took down turned out to be a collection of some of Wilkins' work. It did not have the philosophical language essay in it, but only a short summary, written much later. But it did contain Wilkins' book Mercury about the state of cryptography and steganography in 1641, which will be invaluable for the day when I attain my dream of writing a book on the history of digital information processing before 1945. And when I opened it at random, the first thing that I saw was an extensive discussion of the Biblical passages from the books of Kings and Chronicles concerning the value of π.

This discussion appears in Wilkins' book The Discovery of a New World, or, a discourse tending to prove that (it is probable) there may be another habitable world in the moon. This book is divided into two parts, the second of which is on the question of whether the Earth is a planet. There is, as you would expect from a book on the subject written in 1638, an extensive discussion of the Biblical evidence on this question.

Wilkins comes down very strongly against the Bible on this matter. The entire third chapter is devoted to arguing:

That the Holy Ghost, in many places of scripture, does plainly conform his expressions unto the errors of our conceits; and does not speak of divers things as they are in themselves, but as they appear to us.
The chapter begins:

There is not any particular by which philosophy hath been more endamaged, than the ignorant superstition of such men: who in stating the controversies of it, do so closely adhere unto the mere words of scripture.
(By "philosophy", here and elsewhere, Wilkins means what we would now call "natural science".) Wilkins then quotes an earlier author on the same topic:

"I for my part am persuaded, that these divine treatises were not written by the holy and inspired penmen, for the interpretation of philosophy, because God left such things to be found out by men's labour and industry."
And he quotes John Calvin similarly:

It was not the purpose of the Holy Ghost to teach us astronomy: but being to propound a doctrine that concerns the most rude and simple people, he does (both by Moses and the prophets) conform himself unto their phrases and conceits. . .
For example, Wilkins cites Psalms 19:6 ("the ends of heaven") and 22:27 ("the ends of the world"), Job 38:4 ("the foundations of the earth"), and says "all of which phrases do plainly allude unto the error of vulgar capacities, which hereby is better instructed, than it would be by more proper expressions."

After much discussion and many examples, Wilkins finally concludes:

From all these scriptures it is clearly manifest that it is a frequent custom of the Holy Ghost to speak of natural things, rather according to their appearance and common opinion, than the truth itself.
So much for Biblical inerrancy. The following chapter is devoted to the demonstration "That divers learned men have fallen into great absurdities, whiles they have looked for the sects of philosophy from the words of scripture."

In the course of my (limited and cursory) research into the analysis of the Biblical value of π given in Kings and Chronicles, I learned something that I found shocking: to wit, that this is still considered a serious argument, at least to the extent that the world is filled with knuckleheaded assholes arguing about it. I don't think I can remember encountering any other topic that seemed to attract as many knuckleheaded assholes. The knuckleheaded assholes are on both sides of the argument, too. On the one hand you have the biblical inerrancy idiots, for whom words fail to express my contempt, and on the other hand you have the smug and intellectually lazy atheists, who ought to know better, but who are only too happy to grab any opportunity to ridicule the Bible.

If John Wilkins, who became a bishop, was able to understand in 1638 that it is a stupid argument, and that all the arguing on both sides is pointless and ill-conceived, why haven't we moved on by now? We have the Wonders of the Internet; why are we filling it up with the same old crap? Can't we do any better? Will we ever lay this one to rest? Isn't there something else to argue about yet?

But no, we still have knuckleheaded assholes trying to get special creation and 4,004 B.C. into school curricula. It's almost as though the Enlightenment never happened. It's obvious, of course, that these losers have missed out on the mainstream of twentieth-century thought. But next time you bump into them, remember that they've also missed out on the mainstream of seventeenth-century thought.

And atheists, please try to remember that just because stupid people flock to stupid religions, that it doesn't have to be that way, and remember the example of Bishop John Wilkins, who wrote a whole book to argue that when the Bible conflicts with common sense and physical evidence, you have to keep the evidence and ignore the scripture.


[Other articles in category /math] permanent link

On the manufacture of spherical objects
In an earlier post, I mentioned shot towers, which were used up into the 19th century for the manufacture of lead shot. The shot must be made spherical, or else it won't work properly when it's fired from a musket barrel; it will get stuck, or the expanding gases will blow past it and leave it in the barrel, or something of the sort. (Rifle bullets, in contrast, are very much not spherical. I would like to write an article about this someday, but at present that one statement has nearly exhausted my store of information on this fascinating subject.)

The manufacture of spherical objects is a nontrivial matter; hence the invention of shot towers. Molten lead is poured through a copper sieve at the top of the tower, and the resulting droplets are allowed to plummet to the bottom of the tower. (Incidentally, "plummet" is the most correct possible word here. It is derived from the Latin plumbum = "lead", and means "to fall like lead".) At the bottom, the droplets land in a tub of water. The congealed droplets are recovered from the tub and sorted for size; they are also sorted for roundness by being rolled down a large table. Insufficiently round shot are melted and dropped again.

This method is obviously only suited to the mass production of bullets; it makes lots of bullets of different sizes at the same time, and requires a giant tower. In one of the Laura Ingalls Wilder books, the author describes her father, Charles Ingalls, making bullets one at a time by the hearth. He would melt pieces of lead broken off a big chunk, and pour the melted lead into a spherical bullet mold. When the lead had cooled the bullet was turned out of the mold, and the sprue and flash trimmed off with a knife. I don't know for sure, but I imagine from the description that the mold looked something like the picture at the right. (All pictures in this article get bigger if you click them.)

Round bullets could also be manufactured by stamping. The picture to the left shows a Civil-war era bullet mold, shaped like a scissors. A cold lead wire was put into the end of the mold and the handles were closed. This cut the end off the wire and pressed it into a spherical shape.

[ Addendum 20070307: the bullet mold at right is probably used for casting bullets, not for stamping them. See this addendum for more details. ]

Simon Cozens inspired this article by writing to inform me that shot towers were also used to manufacture ball bearings. (But what were the bearings made of? I don't know. I imagine that lead ball bearings would deform too easily to be useful. ) Modern ball bearings are manufactured by a casting process, followed by machine polishing to remove the flash and any imperfections.

When I was in college, my professor brought to my attention the famous stone spheres of Costa Rica. These spheres range in size up to seven feet in diameter. He asked me to think about how the spheres might have been made. It is not clear, by the way, how close to spherical they actually are. Supposing that they really are nearly spherical, I have an educated guess, but I'm going to work around to it. In the meantime, you might like to puzzle about this yourself. The spheres were probably made sometime between 800 and 2200 years ago, so you are not allowed to use a CAD system.

When you make a submarine, it's very important that the hull be as exactly circular (in cross-section) as possible, so as to distribute the water pressure evenly. How do you make sure that your submarine hull is not deviating from the ideal of circularity? The first thing that was tried was to measure the distance across the hull in all directions, to make sure that all the diameters had the same length. Shapes with this property are said to have constant width. The first submarines the Germans made using this technique buckled and collapsed when they were subjected to high pressures. Why? Because they weren't circular! Circles do have constant width, but there are many other shapes that also have constant width; the shape to the left, known as a Reuleaux triangle, is an example. Constant width is no guarantee of circularity.

The Germans eventually solved the problem by making giant wooden forms and comparing the submarine's hull to the wooden form as construction progressed. It's easy to make a two-dimensional wooden form: take a big piece of wood, drive in a nail, tie a string to the nail, and draw a circle around the nail. Then cut out the circle and throw it away, and cut the remaining piece in half; you now have two semicircular forms, as large as you want. If you try to fit the form around the outside of the submarine, you will immediately see whether the hull deviates from circularity. I believe this technique is still used for submarine hulls.

I suspect that the Costa Rica spheres were made similarly. You start with a big lump of rock, a chisel, and a semicircular form of the appropriate size. Then you start chiseling at the rock, checking it for circularity by fitting the form to it, and cutting away the parts that don't match. The only difference from the submarine is that you need to check for circularity in multiple planes. You can do this by rotating the form around the point you are working on. The Costa Ricans can't make their forms out of sheets of plywood, of course, but they should be able to make them out of something.

Baseballs are made spherical by a completely different process. The outside of a baseball is a leather cover, most of a baseball is made of yarn, which is wound around a cork until the baseball is the correct size. The tension in the yarn makes the yarn want to move inward, toward the center of the ball. If part of the ball of yarn is too narrow, then that part is closer to the center and the subsequent yarn will tend to to move there, evening it out. Rubber-band-balls are spherical for similar reasons.

Billiard balls were originally made either from wood or were cut whole from elephants' tusks. I don't know how they were made spherical, although the wooden-form technique seems likely to work. Modern billiard balls are cast in molds and then polished.

[ I discussed some other round objects, including gumballs, marbles, and pellets of taconite ore, in a followup article. ]


[Other articles in category ] permanent link

Mon, 17 Sep 2007

The rate of a vibrating string in 1666
In an earlier post, I discussed a report by Samuel Pepys, from 1666, that Robert Hooke claimed to be able to tell how often a fly beat its wings, by comparing the pitch of the sound with that of a known source such as a vibrating string. But I didn't know how he had determined the rate at which the string vibrated. I guessed a couple of ways of measuring this with 17th-century technology.

Clinton Pierce wrote to me with a clever and highly plausible suggestion: build a "buzzer". Take a spoked wheel, and spin it at a known rate, and hold some flexible object against the spokes. If the wheel spins, say, four times per second, and has 64 spokes, the buzzing will be a middle C. It should be possible to space the spokes evenly, and to spin the wheel uniformly at the correct speed.

My own idea involved attaching a bristle to the string, and letting it draw a sinusoid on a smoked glass plate that was drawn past it. The more I thought about the bristle, the more I was sure I had really read about it somewhere. This morning as I made breakfast, I asked myself "What is this 'bristle', anyway? Where do you get one?" and the answer came immediately "You pull it out of a hog." I'm sure I am recalling this and not just making it up. But I liked M. Pierce's suggestion much better than the bristle and smoked glass idea, because it seemed to me much more likely to work. What Hooke actually did, I still have no idea.

I did a little research today, and I have not yet been able to find out how Hooke measured the frequency of a vibrating string. But I have been able to get some information.

Hooke died in 1703. Many of his papers were given to Richard Waller, and in 1705 Waller published Posthumous Works of Dr. Robert Hooke. As an introduction, Waller wrote "The Life of Dr. Robert Hooke", an itemization of every important research or contribution made by Hooke. Two of the entries are as follows:

In July 1664. he produc'd an Experement to shew the number of Vibrations of an extended String, made in a determinate time, requisite to make a certain Tone or Note, by which it was found that a Wire making two hundred seventy two vibrations in one Second of Time, founded G Sol Re Ut in the Scale of all Musick.

(We would make it 384 vibrations; 272 is only C#. Whether this is due to a change in the definition of G, or experimental error, or both, I do not know. But it does resolve my question of whether Hooke did have a specific number for the rate at which a fly beat its wings, or whether he just identified it as low D and left it at that.)

[ Addendum 20060216: I realized this morning that it might also be caused by a change in the length of the second. ]

[ Addendum 20070917: I realized sometime later that this was a dumb suggestion. One second is 1/86400 of a day, then as it is now, and the length of the day has not changed significantly since 1664. ]

In July [1680] he shew'd a way of making Musical and other Sounds, by the striking of the Teeth of several Brass Wheels, proportionaly cut as to their numbers, and turned very fast round, in which it was observable, that the equal or proprtional stroaks of the Teeth, that is, 2 to 1, 4 to 3, &c. made the Musical Notes, but the unequal stroaks of the Teeth more answer'd the sound of the Voice in speaking.

There we have it: He didn't think of the buzzing rotating wheel until 1680, so he must have been doing something else in 1664. What, I do not yet know.


[Other articles in category /physics] permanent link

Faucets

Order
Surely You're Joking, Mr. Feynman
Surely You're Joking, Mr. Feynman
with kickback
no kickback
Richard P. Feynman says:

When I was in high school, I'd see water running out of a faucet growing narrower, and wonder if I could figure out what determines that curve. I found it was rather easy to do.

I puzzled over that one for years; I didn't know how to start. I kept supposing that it had something to do with surface tension and the tendency of the water surface to seek a minimal configuration, and I couldn't understand how it could be "rather easy to do". That stuff all sounded really hard!

But one day I realized in a flash that it really is easy. The water accelerates as it falls. It's moving faster farther down, so the stream must be narrower, because the rate at which the water is passing a given point must be constant over the entire stream. (If water is passing a higher-up point faster than it passes a low point, then the water is piling up in between—which we know it doesn't do. And vice versa.) It's easy to calculate the speed of the water at each point in the stream. Conservation of mass gets us the rest.

So here's the calculation. Let's adopt a coordinate system that puts position 0 at the faucet, with increasing position as we move downward. Let R(p) be the radius of the stream at distance p meters below the faucet. We assume that the water is falling smoothly, so that its horizontal cross-section is a circle.

Let's suppose that the initial velocity of the water leaving the faucet is 0. Anything that falls accelerates at a rate g, which happens to be 9.8 m/s2, but we'll just call it g and leave it at that. The velocity of the water, after it has fallen for time t, is v = gt. Its position p is gt2/2. Thus v = (2gp)1/2.

Here's the key step: imagine a very thin horizontal disk of water, at distance p below the faucet. Say the disk has height h. The water in this disk is falling at a velocity of (2gp)1/2, and the disk itself contains volume π(R(p))2h of water. The rate at which water is passing position p is therefore π(R(p))2h · (2gp)1/2 gallons per minute, or liters per fortnight, or whatever you prefer. Because of the law of conservation of water, this quantity must be independent of p, so we have:

π(R(p))2h · (2gp)1/2 = A

Where A is the rate of flow from the faucet. Solving for R(p), which is what we really want:

R(p) = [ A / πh(2gp)1/2 ]1/2

Or, collecting all the constants (A, π, h, and g) into one big constant k:

R(p) = k p-1/4

There's a picture of that over there on the left side of the blog. Looks just about right, doesn't it? Amazing.

So here's the weird thing about the flash of insight. I am not a brilliant-flash-of-insight kind of guy. I'm more of a slow-gradual-dawning-of-comprehension kind of guy. This was one of maybe half a dozen brilliant flashes of insight in my entire life. I got this one at a funny time. It was fairly late at night, and I was in a bar on Ninth Avenue in New York, and I was really, really drunk. I had four straight bourbons that night, which may not sound like much to you, but is a lot for me. I was drunker than I have been at any other time in the past ten years. I was so drunk that night that on the way back to where I was staying, I stopped in the middle of Broadway and puked on my shoes, and then later that night I wet the bed. But on the way to puking on my shoes and pissing in the bed, I got this inspiration about what shape a stream of water is, and I grabbed a bunch of bar napkins and figured out that the width is proportional to p-1/4 as you see there to the left.

This isn't only time this has happened. I can remember at least one other occasion. When I was in college, I was freelancing some piece of software for someone. I alternated between writing a bit of code and drinking a bit of whisky. (At that time, I hadn't yet switched from Irish whisky to bourbon.) Write write, drink, write write, drink... then I encountered some rather tricky design problem, and, after another timely pull at the bottle, a brilliant flash of inspiration for how to solve it. "Oho!" I said to myself, taking another swig out of the bottle. "This is a really clever idea! I am so clever! Ho ho ho! Oh, boy, is this clever!" And then I implemented the clever idea, took one last drink, and crawled off to bed.

The next morning I remembered nothing but that I had had a "clever" inspiration while guzzling whisky from the bottle. "Oh, no," I muttered, "What did I do?" And I went to the computer to see what damage I had wrought. I called up the problematic part of the program, and regarded my alcohol-inspired solution. There was a clear and detailed comment explaining the solution, and as I read the code, my surprise grew. "Hey," I said, astonished, "it really was clever." And then I saw the comment at the very end of the clever section: "Told you so."

I don't know what to conclude from this, except perhaps that I should have spent more of my life drinking whiskey. I did try bringing a flask with me to work every day for a while, about fifteen years ago, but I don't remember any noteworthy outcome. But it certainly wasn't a disaster. Still, a lot of people report major problems with this strategy, so it's hard to know what to make of my experience.


[Other articles in category /physics] permanent link

Sun, 16 Sep 2007

Thank you very much for that bulletin
I'm about to move house, and so I'm going through a lot of old stuff and throwing it away. I just unearthed the decorations from my office door circa 1994. I want to record one of these here before I throw it away and forget about it. It's a clipping from the front page of the New York Times from 11 April, 1992. It is noteworthy for its headline, which only one column wide, but at the very top of page A1, above the fold. It says:

FIGHTING IMPERILS
EFFORTS TO HALT WAR
IN YUGOSLAVIA

Sometimes good articles get bad headlines. Often the headlines are tacked on just before press time by careless editors. Was this a good article afflicted with a banal headline? Perhaps they meant there was internecine squabbling among the diplomats charged with the negotiations?

No. If you read the article it turned out that it was about how darn hard it was to end the war when folks kept shooting at each other, dad gum it.

I hear that the headline the following week was DOG BITES MAN, but I don't have a clipping of that.


[Other articles in category /lang] permanent link

Sat, 15 Sep 2007

Families of scalars
I'm supposedly in the midst of writing a book about fixing common errors in Perl programs, and my canonical example is the family of scalar variables. For instance, code like this:

     if ($FORM{'h01'}) {$checked01 = " CHECKED "}
     if ($FORM{'h02'}) {$checked02 = " CHECKED "}
     if ($FORM{'h03'}) {$checked03 = " CHECKED "}
     if ($FORM{'h04'}) {$checked04 = " CHECKED "}
     if ($FORM{'h05'}) {$checked05 = " CHECKED "}
     if ($FORM{'h06'}) {$checked06 = " CHECKED "}
(I did not make this up; I got it from here.) The flag here is the family $checked01, $checked02, etc. Such code is almost always improved by replacing the family with an array, and the repeated code with a loop:

        $checked[$_] = $FORM{"h$_"} for "01" .. "06";
Actually in this particular case a better solution was to eliminated the checked variables entirely, but that is not what I was planning to discuss. Rather, I planned to discuss a recent instance in which I wrote some code with a family of variables myself, and the fix was somewhat different.

The program I was working on was a digester for the qmail logs, translating them into a semblance of human-readable format. (This is not a criticism; log files need not be human-readable; they need to be easy to translate, scan, and digest.) The program scans the log, gathering information about each message and all the attempts to deliver it to each of its recipient addresses. Each delivery can be local or remote.

Normally the program prints information about each message and all its deliveries. I was adding options to the program to allow the user to specify that only local deliveries or only remote deliveries were of interest.

The first thing I did was to add the option-processing code:

    ...
  } elsif ($arg eq "--local-only" || $arg eq '-L') {
    $local_only = 1;
  } elsif ($arg eq "--remote-only" || $arg eq '-R') {
    $remote_only = 1;
As you see, this is where I made my mistake, and introduced a (two-member) family of variables. The conventional fix says that this should have been something like $do_only{local} and $do_only{remote}. But I didn't notice my mistake right away.

Later on, when processing a message, I wanted to the program to scan its deliveries, and skip all processing and display of the message unless some of its deliveries were of the interesting type:

  if ($local_only || $remote_only) {
        ...
  }
I had vague misgivings at this point about the test, which seemed redundant, but I pressed on anyway, and found myself in minor trouble. Counting the number of local or remote deliveries was complicated:
  if ($local_only || $remote_only) {
    my $n_local_deliveries = 
      grep $msg->{del}{$_}{lr} eq "local", keys %{$msg->{del}};
    my $n_remote_deliveries = 
      grep $msg->{del}{$_}{lr} eq "remote", keys %{$msg->{del}};
    ...
  }
There is a duplication of code here. Also, there is a waste of CPU time, since the program never needs to have both numbers available. This latter waste could be avoided at the expense of complicating the code, by using something like $n_remote_deliveries = keys(%{$msg->{del}}) - $n_local_deliveries, but that is not a good solution.

Also, the complete logic for skipping the report was excessively complicated:

  if ($local_only || $remote_only) {
    my $n_local_deliveries = 
      grep $msg->{del}{$_}{lr} eq "local", keys %{$msg->{del}};
    my $n_remote_deliveries = 
      grep $msg->{del}{$_}{lr} eq "remote", keys %{$msg->{del}};

    return if $local_only  && $local_deliveries == 0
           || $remote_only && $remote_deliveries == 0;

  }
I could have saved the wasted CPU time (and the repeated tests of the flags) by rewriting the code like this:

  if ($local_only) {
    return unless
      grep $msg->{del}{$_}{lr} eq "local", keys %{$msg->{del}};
  } elsif ($remote_only) {
    return unless
      grep $msg->{del}{$_}{lr} eq "remote", keys %{$msg->{del}};
  }
but that is not addressing the real problem, which was the family of variables, $local_only and $remote_only, which inevitably lead to duplicated code, as they did here.

Such variables are related by a convention in the programmer's mind, and nowhere else. The language itself is as unaware of the relationship as if the variables had been named $number_of_nosehairs_on_typical_goat and $fusion_point_of_platinum. A cardinal rule of programming is to make such conventional relationships explicit, because then the programming system can give you some assistance in dealing with them. (Also because then they are apparent to the maintenance programmer, who does not have to understand the convention.) Here, the program was unable to associate $local_only with the string "local" and $remote_only with "remote", and I had to make up the lack by writing additional code.

For families of variables, the remedy is often to make the relationship explicit by using an aggregate variable, such as an array or a hash, something like this:

  if (%use_only) {
    my ($only_these) = keys %use_only;
    return unless
      grep $msg->{del}{$_}{lr} eq $only_these, keys %{$msg->{del}};
  }
Here the relationship is explicit because $use_only{"remote"} indicates an interest in remote deliveries and $use_only{"local"} indicates an interest in local deliveries, and the program can examine the key in the hash to determine what to look for in the {lr} data.

But in this case the alternatives are disjoint, so the %use_only hash will never contain more than one element. The tipoff is the bizarre ($only_these) = keys ... line. Since the hash is really storing a single scalar, it can be replaced with a scalar variable:

  } elsif ($arg eq "--local-only" || $arg eq '-L') {
    $only_these = "local";
  } elsif ($arg eq "--remote-only" || $arg eq '-R') {
    $only_these = "remote";
Then the logic for skipping uninteresting messages becomes:

  if ($only_these) {
    return unless
      grep $msg->{del}{$_}{lr} eq $only_these, keys %{$msg->{del}};
  }
Ahh, better.

A long time ago I started to suspect that flag variables themselves are a generally bad practice, and are best avoided, and I think this example is evidence in favor of that theory. I had a conversation about this yesterday with Aristotle Pagaltzis, who is very thoughtful about this sort of thing. One of our conclusions was that although the flag variable can be useful to avoid computing the same boolean value more than once, if it is worth having, it is because your program uses it repeatedly, and so it is probably testing the same boolean value more than once, and so it is likely that the program logic would be simplified if one could merge the blocks that would have been controlled by those multiple tests into one place, thus keeping related code together, and eliminating the repeated tests.


[Other articles in category /prs] permanent link

John Wilkins invents the meter

Order
An Essay Towards a Real Character and a Philosophical Language
An Essay Towards a Real Character and a Philosophical Language
with kickback
no kickback
I'm continuing to read An Essay Towards a Real Character and a Philosophical Language, the Right Reverend John Wilkins' 1668 book that attempted to lay out a rational universal language.

In skimming over it, I noticed that Wilkins' language contained words for units of measure: "line", "inch", "foot", "standard", "pearch", "furlong", "mile", "league", and "degree". I thought oh, this was another example of a foolish Englishman mistaking his own provincial notions for universals. Wilkins' language has words for Judaism, Christianity, Islam; everything else is under the category of paganism and false gods, and I thought that the introduction of words for inches and feet was another case like that one. But when I read the details, I realized that Wilkins had been smarter than that.

Wilkins recognizes that what is needed is a truly universal measurement standard. He discusses a number of ways of doing this and rejects them. One of these is the idea of basing the standard on the circumference of the earth, but he thinks this is too difficult and inconvenient to be practical.

But he settles on a method that he says was suggested by Christopher Wren, which is to base the length standard on the time standard (as is done today) and let the standard length be the length of a pendulum with a known period. Pendulums are extremely reliable time standards, and their period depends only their length and on the local effect of gravity. Gravity varies only a very little bit over the surface of the earth. So it was a reasonable thing to try.

Wilkins directed that a pendulum be set up with the heaviest, densest possible spherical bob at the end of lightest, most flexible possible cord, and the the length of the cord be adjusted until the period of the pendulum was as close to one second as possible. So far so good. But here is where I am stumped. Wilkins did not simply take the standard length as the length from the fulcrum to the center of the bob. Instead:

...which being done, there are given these two Lengths, viz. of the String, and of the Radius of the Ball, to which a third Proportional must be found out; which must be as the length of the String from the point of Suspension to the Centre of the Ball is to the Radius of the Ball, so must the said Radius be to this third which being so found, let two fifths of this third Proportional be set off from the Centre downwards, and that will give the Measure desired.

Wilkins is saying, effectively: let d be the distance from the point of suspension to the center of the bob, and r be the radius of the bob, and let x be such that d/r = r/x. Then d+(0.4)x is the standard unit of measurement.

Huh? Why 0.4? Why does r come into it? Why not just use d? Huh?

These guys weren't stupid, and there must be something going on here that I don't understand. Can any of the physics experts out there help me figure out what is going on here?

Anyway, the main point of this note is to point out an extraordinary coincidence. Wilkins says that if you follow his instructions above, the standard unit of measurement "will prove to be . . . 39 Inches and a quarter". In other words, almost exactly one meter.

I bet someone out there is thinking that this explains the oddity of the 0.4 and the other stuff I don't understand: Wilkins was adjusting his definition to make his standard unit come out to exactly one meter, just as we do today. (The modern meter is defined as the distance traveled by light in 1/299,792,458 of a second. Why 299,792,458? Because that's how long it happens to take light to travel one meter.) But no, that isn't it. Remember, Wilkins is writing this in 1668. The meter wasn't invented for another 110 years.

[ Addendum 20070915: There is a followup article, which explains the mysterious (0.4)x in the formula for the standard length. ]

Having defined the meter, which he called the "Standard", Wilkins then went on to define smaller and larger units, each differing from the standard by a factor that was a power of 10. So when Wilkins puts words for "inch" and "foot" into his universal language, he isn't putting in words for the common inch and foot, but rather the units that are respectively 1/100 and 1/10 the size of the Standard. His "inch" is actually a centimeter, and his "mile" is a kilometer, to within a fraction of a percent.

Wilkins also defined units of volume and weight measure. A cubic Standard was called a "bushel", and he had a "quart" (1/100 bushel, approximately 10 liters) and a "pint" (approximately one liter). For weight he defined the "hundred" as the weight of a bushel of distilled rainwater; this almost precisely the same as the original definition of the gram. A "pound" is then 1/100 hundred, or about ten kilograms. I don't understand why Wilkins' names are all off by a factor of ten; you'd think he would have wanted to make the quart be a millibushel, which would have been very close to a common quart, and the pound be the weight of a cubic foot of water (about a kilogram) instead of ten cubic feet of water (ten kilograms). But I've read this section over several times, and I'm pretty sure I didn't misunderstand.

Wilkins also based a decimal currency on his units of volume: a "talent" of gold or silver was a cubic standard. Talents were then divided by tens into hundreds, pounds, angels, shillings, pennies, and farthings. A silver penny was therefore 10-5 cubic Standard of silver. Once again, his scale seems off. A cubic Standard of silver weighs about 10.4 metric tonnes. Wilkins' silver penny is about is nearly ten cubic centimeters of metal, weighing 104 grams (about 3.5 troy ounces), and his farthing is 10.4 grams. A gold penny is about 191 grams, or more than six ounces of gold. For all its flaws, however, this is the earliest proposal I am aware of for a fully decimal system of weights and measures, predating the metric system, as I said, by about 110 years.


[Other articles in category /physics] permanent link

Girls of the SEC
I'm in the Raleigh-Durham airport, and I just got back from the newsstand, where I learned that the pictorial in this month's Playboy magazine this month is "Girls of the SEC". On seeing this, I found myself shaking my head in sad puzzlement.

This isn't the first time I've had this reaction on learning about a Playboy pictorial; last time was probably in August 2002 when I saw the "Women of Enron" cover. (I am not making this up.) I wasn't aware of The December 2002 feature, "Women of Worldcom" (I swear I'm not making this up), but I would have had the same reaction if I had been.

I know that in recent years the Playboy franchise has fallen from its former heights of glory: circulation is way down, the Playboy Clubs have all closed, few people still carry Playboy keychains. But I didn't remember that they had fallen quite so far. They seem to have exhausted all the plausible topics for pictorial features, and are now well into the scraping-the-bottom-of-the-barrel stage. The June 1968 feature was "Girls of Scandinavia". July 1999, "Girls of Hawaiian Tropic". Then "Women of Enron" and now "Girls of the SEC".

How many men have ever had a fantasy about sexy SEC employees, anyway? How can you even tell? Sexy flight attendants, sure; they wear recognizable uniforms. But what characterizes an SEC employee? A rumpled flannel suit? An interest in cost accounting? A tendency to talk about the new Basel II banking regulations? I tried to think of a category that would be less sexually inspiring than "SEC employees". It's difficult. My first thought was "Girls of Wal-Mart." But no, Wal-Mart employees wear uniforms.

If you go too far in that direction you end up in the realm of fetish. For example, Playboy is unlikely to do a feature on "girls of the infectious disease wards". But if they did, there is someone (probably on /b/), who would be extremely interested. It is hard to imagine anyone with a similarly intense interest in SEC employees.

So what's next for Playboy? Girls of the hospital gift shops? Girls of State Farm Insurance telephone customer service division? Girls of the beet canneries? Girls of Acadia University Grounds and Facilities Services? Girls of the DMV?

[ Pre-publication addendum: After a little more research, I figured out that SEC refers here to "Southeastern Conference" and that Playboy has done at least two other features with the same title, most recently in October 2001. I decided to run the article anyway, since I think I wouldn't have made the mistake if I hadn't been prepared ahead of time by "Women of Enron". ]

[ Addendum 20070913: This article is now on the first page of Google searches for "girls of the sec playboy". ]

[ Addendum 20070915: The article has moved up from tenth to third place. Truly, Google works in mysterious ways. ]


[Other articles in category /misc] permanent link

The Wilkins pendulum mystery resolved
Last March, I pointed out that:

  • John Wilkins had defined a natural, decimal system of measurements,
  • that he had done this in 1668, about 110 years before the Metric System, and
  • that the basic unit of length, which he called the "standard", was almost exactly the same length as the length that was eventually adopted as the meter
("John Wilkins invents the meter", 3 March 2006.)

This article got some attention back in July, when a lot of people were Google-searching for "john wilkins metric system", because the UK Metric Association had put out a press release making the same points, this time discovered by an Australian, Pat Naughtin.

For example, the BBC Video News says:

According to Pat Naughtin, the Metric System was invented in England in 1668, one hundred and twenty years before the French adopted the system. He discovered this in an ancient and rare book...
Actually, though, he did not discover it in Wilkins' ancient and rare book. He discovered it by reading The Universe of Discourse, and then went to the ancient and rare book I cited, to confirm that it said what I had said it said. Remember, folks, you heard it here first.

Anyway, that is not what I planned to write about. In the earlier article, I discussed Wilkins' original definition of the Standard, which was based on the length of a pendulum with a period of exactly one second. Then:

Let d be the distance from the point of suspension to the center of the bob, and r be the radius of the bob, and let x be such that d/r = r/x. Then d+(0.4)x is the standard unit of measurement.
(This is my translation of Wilkins' Baroque language.)

But this was a big puzzle to me:

Huh? Why 0.4? Why does r come into it? Why not just use d? Huh?

Soon after the press release came out, I got email from a gentleman named Bill Hooper, a retired professor of physics of the University of Virginia's College at Wise, in which he explained this puzzle completely, and in some detail.

According to Professor Hooper, you cannot just use d here, because if you do, the length will depend on the size, shape, and orientation of the bob. I did not know this; I would have supposed that you can assume that the mass of the bob is concentrated at its center of mass, but apparently you cannot.

The usual Physics I calculation that derives the period of a pendulum in terms of the distance from the fulcrum to the center of the bob assumes that the bob is infinitesimal. But in real life the bob is not infinitesimal, and this makes a difference. (And Wilkins specified that one should use the most massive possible bob, for reasons that should be clear.)

No, instead you have to adjust the distance d in the formula by adding I/md, where m is the mass of the bob and I is the moment of intertia of the bob, a property which depends on the shape, size, and mass of the bob. Wilkins specified a spherical bob, so we need only calculate (or look up) the formula for the moment of inertia of a sphere. It turns out that for a solid sphere, I = 2mr2/5. That is, the distance needed is not d, but d + 2r2/5d. Or, as I put it above, d + (0.4)x, where d/r = r/x.

Well, that answers that question. My very grateful thanks to Professor Hooper for the explanantion. I think I might have figured it out myself eventually, but I am not willing to put a bound of less than two hundred years on how long it would have taken me to do so.

One lesson to learn from all this is that those early Royal Society guys were very smart, and when they say something has a mysterious (0.4)x in it, you should assume they know what they are doing. Another lesson is that mechanics was pretty well-understood by 1668.


[Other articles in category /physics] permanent link

Fri, 14 Sep 2007

Why spiders hang with their heads down
Iris asked me last week why spiders hang in their webs with their heads downwards, and I said I would try to find out. After a cursory Google search, I was none the wiser, so I tried asking the Wikipedia "reference desk" page. I did not learn anything useful about the spiders, but I did learn that the reference desk page is full of people who know even less about spiders than I do who are nevertheless willing to post idle speculations.

Fortunately, I was at a meeting this week in Durham that was also attended by three of the world's foremost spider experts. I put the question to Jonathan A. Coddington, curator of arachnids for the Smithsonian Institution.

Professor Coddington told me that it was because the spider prefers (for obvious mechanical and dynamic reasons) to attack its prey from above, and so it waits the upper part of the web and constructs the web so that the principal prey-catching portion is below. When prey is caught in the web, the spider charges down and attacks it.

I had mistakenly thought that spiders in orb webs (which are the circular webs you imagine when you try to think of the canonical spiderweb) perched in the center. But it is only the topological center, and geometrically it is above the midline, as the adjacent picture should make clear. Note that more of the radial threads are below the center than are above it.


[Other articles in category /bio] permanent link

Wed, 12 Sep 2007

The loophole in the U.S. Constitution: the answer
In the previous article, I wondered what "inconsistency in the Constitution" Gödel might have found that would permit the United States to become a dictatorship.

Several people wrote in to tell me that Peter Suber addresses this in his book The Paradox of Self-Amendment, which is available online. (Suber also provides a provenance for the Gödel story.)

Apparently, the "inconsistency" noted by Gödel is simply that the Constitution provides for its own amendment. Suber says: "He noticed that the AC had procedural limitations but no substantive limitations; hence it could be used to overturn the democratic institutions described in the rest of the constitution." I am gravely disappointed. I had been hoping for something brilliant and subtle that only Gödel would have noticed.

Thanks to Greg Padgett, Julian Orbach, Simon Cozens, and Neil Kandalgaonkar for bringing this to my attention.

M. Padgett also pointed out that the scheme I proposed for amending the constitution, which I claimed would require only the cooperation of a majority of both houses of Congress, 218 + 51 = 269 people in all, would actually require a filibuster-proof majority in the Senate. He says that to be safe you would want all 100 senators to conspire; I'm not sure why 60 would not be sufficient. (Under current Senate rules, 60 senators can halt a filibuster.) This would bring the total required to 218 + 60 = 278 conspirators.

He also pointed out that the complaisance of five Supreme Court justices would give the President essentially dictatorial powers, since any legal challenge to Presidential authority could be rejected by the court. But this train of thought seems to have led both of us down the same path, ending in the idea that this situation is not really within the scope of the original question.

As a final note, I will point out what I think is a much more serious loophole in the Constitution: if the Vice President is impeached and tried by the Senate, then, as President of the Senate, he presides over his own trial. Article I, section 3 contains an exception for the trial of the President, where the Chief Justice presides instead. But the framers inexplicably forgot to extend this exception to the trial of the Vice President.

[ Addendum 20090121: Jeffrey Kegler has discovered Oskar Morgenstern's lost eyewitness account of Gödel's citizenship hearing. Read about it here. ]


[Other articles in category /law] permanent link

Sun, 09 Sep 2007

The snub disphenoid
The snub disphenoid is pictured at left. I do not know why it is called that, and I ought to know, because I am the principal author (so far) of the Wikipedia article on the disphenoid. Also, I never quite figured out what "snub" means in this context, despite perusing that section of H.S.M. Coxeter's book on polytopes at some length. It has something to do with being halfway between what you get when you cut all the corners off, and what you get when you cut all the corners off again.

Anyway, earlier this week I was visiting John Batzel, who works upstairs from me, and discovered that he had obtained a really cool toy. It was a collection of large steel ball bearings and colored magnetic rods, which could be assembled into various polyhedra and trusses. This is irresistible to me. The pictures at right, taken around 2002, show me modeling a dodecahedron with less suitable materials.

The first thing I tried to make out of John's magnetic sticks and balls was a regular dodecahedron, because it is my favorite polyhedron. (Isn't it everyone's?) This was unsuccessful, because it wasn't rigid enough, and kept collapsing. It's possible that if I had gotten the whole thing together it would have been stable, but holding the 50 separate magnetic parts in the right place long enough to get it together was too taxing, so I tried putting together some other things.

A pentagonal dipyramid worked out well, however. To understand this solid, imagine a regular pyramid, such as the kind that entombs the pharaohs or collects mystical energy. This sort of pyramid is known as a square pyramid, because it has a square base, and thus four triangular sides. Imagine that the base was instead a pentagon, so that there were five triangular sides sides instead of only four. Then it would be a pentagonal pyramid. Now take two such pentagonal pyramids and glue the pentagonal bases together. You now have a pentagonal dipyramid.

The success of the pentagonal dipyramid gave me the idea that rigid triangular lattices were the way to go with this toy, so I built an octahedron (square dipyramid) and an icosahedron to be sure. Even the icosahedron (thirty sticks and twelve balls) held together and supported its own weight. So I had John bring up the Wikipedia article about deltahedra. A deltahedron is just a polyhedron whose faces are all equilateral triangles.


Order
Geometric Playthings
Geometric Playthings
with kickback
no kickback
When I was around eight, I was given a wonderful book called Geometric Playthings, by Jean J. Pedersen and Kent Pedersen. The book was in three sections. One section was about Möbius strips, with which I was already familiar; I ignored this section. The second section was about hexaflexagons, with examples to cut out and put together. The third section was about deltahedra, again with cutout models of all eight deltahedra. As an eight-year-old I had cut out and proudly displayed the eight deltahedra, so I knew that there were some reasonably surprising models one would make with John's toy that would be likely to hold together well. Once again, the deltahedra did not disappoint me.

Four of the deltahedra are the tetrahedron (triangular pyramid, with 4 faces), triangular dipyramid (6 faces), octahedron (square dipyramid, 8 faces), and pentagonal dipyramid (10 faces).


Another is the icosahedron. Imagine making a belt of 10 triangles, alternating up and down, and then connect the ends of the belt. The result is a shape called a pentagonal antiprism, shown at left. The edges of the down-pointing triangles form a pentagon on the top of the antiprism, and the edges of the up-pointing triangles form one on the bottom. Attach a pentagonal pyramid to each of these pentagons, and you have an icosahedron, with a total of 20 faces.


The other three deltahedra are less frequently seen. One is the result of taking a triangular prism and appending a square pyramid to each of its three square faces. (Wikipedia calls this a "triaugmented triangular prism"; I don't know how standard that name is.) Since the prism had two triangular faces to begin with, and we have added four more to each of the three square faces of the original prism, the total is 14 faces.

Another deltahedron is the "gyroelongated square dipyramid". You get this by taking two square pyramids, as with the octahedron. But instead of gluing their square bases together directly, you splice a square antiprism in between. The two square faces of the antiprism are not aligned; they are turned at an angle of 45° to each other, so that when you are looking at the top pyramid face-on, you are looking at the bottom pyramid edge-on, and this is the "gyro" in "gyroelongated". (The icosahedron is a gyroelongated pentagonal dipyramid.) I made one of these in John's office, but found it rather straightforward.

The last deltahedron, however, was quite a puzzle. Wikipedia calls it a "snub disphenoid", and as I mentioned before, the name did not help me out at all. It took me several tries to build it correctly. It contains 12 faces and 8 vertices. When I finally had the model I still couldn't figure it out, and spent quite a long time rotating it and examining it. It has a rather strange symmetry. It is front-back and left-right symmetric. And it is almost top-bottom symmetric: If you give it a vertical reflection, you get the same thing back, but rotated 90° around the vertical axis.

When I planned this article I thought I understood it better. Imaging sticking together two equilateral triangles. Call the common edge the "rib". Fold the resulting rhombus along the rib so that the edges go up, down, up, down in a zigzag. Let's call the resulting shape a "wing"; it has a concave side and a convex side. Take two wings. Orient them with the concave sides facing each other, and with the ribs not parallel, but at right angles. So far, so good.

But this is where I started to get it wrong. The two wings have between them eight edges, and I had imagined that you could glue a rhombic antiprism in between them. I'm not convinced that there is such a thing as a rhombic antiprism, but I'll have to do some arithmetic to be sure. Anyway, supposing that there were such a thing, you could glue it in as I said, but if you did the wings would flatten out and what you would get would not be a proper polyhedron because the two triangles in each wing would be coplanar, and polyhedra are not allowed to have abutting coplanar faces. (The putative gyroelongated triangular dipyramid fails for this reason, I believe.)

To make the snub disphenoid, you do stick eight triangles in between the two wings, but the eight triangles do not form a rhombic antiprism. Even supposing that such a thing exists.

I hope to have some nice renderings for you later. I have been doing some fun work in rendering semiregular polyhedra, and I am looking forward to discussing it here. Advance peek: suppose you know how the vertices are connected by edges. How do you figure out where the vertices are located in 3-space?

If you would like to investigate this, the snub disphenoid has 8 vertices, which we can call A, B, ... H. Then:
This vertex:is connected to these:
AB C E F H
BA C D E
CA B D H
DB C E G H
EF G A B D
FE G H A
GE F H D
HF G A C D
The two wings here are ABCD and EFGH. We can distinguish three sorts of edges: five inside the top wing, five inside the bottom wing, and eight that go between the two wings.

Here is a list of the eight deltahedra, with links to the corresponding Wikipedia articles:

NameFacesEdgesVertices
Tetrahedron 464
Triangular dipyramid 695
Octahedron 8126
Pentagonal dipyramid 10157
Snub disphenoid 12188
Triaugmented triangular prism 14219
Gyroelongated square dipyramid 162410
Icosahedron 203012
[ Addendum 20070905: There are some followup notes. ]

[ Addendum 20070908: More about deltahedra. ]


[Other articles in category /math] permanent link

The loophole in the U.S. Constitution

Gödel took the matter of citizenship with great solemnity, preparing for the exam by making a close study of the United States Constitution. On the eve of the hearing, he called [Oskar] Morgenstern in an agitated state, saying he had found an "inconsistency" in the Constitution, one that could allow a dictatorship to arise.
(Holt, Jim. Time Bandits, The New Yorker, 29 February 2005.)

I've wondered for years what "inconsistency" was.

I suppose the Attorney General could bring some sort of suit in the Supreme Court that resulted in the Court "interpreting" the Constitution to find that the President had the power to, say, arbitrarily replace congresspersons with his own stooges. This would require only six conspirators: five justices and the President. (The A.G. is a mere appendage of the President and is not required for the scheme anyway.)

But this seems outside the rules. I'm not sure what the rules are, but having the Supreme Court radically and arbitrarily "re-interpret" the Constitution isn't an "inconsistency in the Constitution". The solution above is more like a coup d'etat. The Joint Chiefs of Staff could stage a military takeover and institute a dictatorship, but that isn't an "inconsistency in the Constitution" either. To qualify, the Supreme Court would have to find a plausible interpretation of the Constitution that resulted in a dictatorship.

The best solution I have found so far is this: Under Article IV, Congress has the power to admit new states. A congressional majority could agree to admit 150 trivial new states, and then propose arbitrary constitutional amendments, to be ratified by the trivial legislatures of the new states.

This would require a congressional majority in both houses. So Gödel's constant, the smallest number of conspirators required to legally transform the United States into a dictatorship, is at most 269. (This upper bound would have been 267 in 1948 when Gödel became a citizen.) I would like to reduce this number, because I can't see Gödel getting excited over a "loophole" that required so many conspirators.

[ Addendum 20070912: The answer. ]

[ Addendum 20090121: Jeffrey Kegler has discovered Oskar Morgenstern's lost eyewitness account of Gödel's citizenship hearing. Read about it here. ]


[Other articles in category /law] permanent link

Sat, 08 Sep 2007

Followup notes about dice and polyhedra
I got a lot of commentary about these geometric articles, and started writing up some followup notes. But halfway through I got stuck in the middle of making certain illustrations, and then I got sick, and then I went to a conference in Vienna. So I decided I'd better publish what I have, and maybe I'll get to the other fascinating points later.

  • Regarding a die whose sides appear with probabilities 1/21 ... 6/21

    • Several people wrote in to cast doubt on my assertion that the probability of an irregular die showing a certain face is proportional to the solid angle subtended by that face from the die's center of gravity. But nobody made the point more clearly than Robert Young, who pointed out that if I were right, a coin would have a 7% chance of landing on its edge. I hereby recant this claim.

    • John Berthels suggested that my analysis might be correct if the die was dropped into an inelastic medium like mud that would prevent it from bouncing.

    • Jack Vickeridge referred me to this web site, which has a fairly extensive discussion of seven-sided dice. The conclusion: if you want a fair die, you have no choice but to use something barrel-shaped.

    • Isabel Lugo wrote a detailed followup in which she discusses this and related problems. She says "What makes Mark's problem difficult is the lack of symmetry; each face has to be different." Quite so.

  • Regarding alternate labelings for standard dice

    • Aaron Crane says that these dice (with faces {1,2,2,3,3,4} and {1,3,4,5,6,8}) are sometimes known as "Sicherman dice", after the person who first brought them to the attention of Martin Gardner. Can anyone confirm that this was Col. G.L. Sicherman? I have no reason to believe that it was, except that it would be so very unsurprising if it were true.

    • Addendum 20070905: I now see that the Wikipedia article attributes the dice to "Colonel George Sicherman," which is sufficiently clear that I would feel embarrassed to write to the Colonel to ask if it is indeed he. I also discovered the the Colonel has a Perl program on his web site that will calculate "all pairs of n-sided dice that give the same sums as standard n-sided dice".

    • M. Crane also says that it is an interesting question which set of dice is better for backgammon. Both sets have advantages: the standard set rolls doubles 1/6 of the time, whereas the Sicherman dice only roll doubles 1/9 of the time. (In backgammon, doubles count double, so that whereas a player who rolls ab can move the pieces a total of a+b points, a player who rolls aa can move pieces a total of 4a points.) The standard dice permit movement of 296/36 points per roll, and the Sicherman dice only 274/36 points per roll.

      Ofsetting this disadvantage is the advantage that the Sicherman dice can roll an 8. In backgammon, one's own pieces may not land on a point occupied by more than one opposing piece. If your opponent occupies six conscutive points with two pieces each, they form an impassable barrier. Such a barrier is passable to a player using the Sicherman dice, because of the 8.

    • Doug Orleans points out that in some contexts one might prefer to use a Sicherman variant dice {2,3,3,4,4,5} and {0,2,3,4,5,7}, which retain the property that opposite faces sum to 7, and so that each die shows 3.5 pips on average. Such dice roll doubles as frequently as do standard dice.

    • The Wikipedia article on dice asserts that the {2, 3, 3, 4, 4, 5} die is used in some wargames to express the strength of "regular" troops, and the standard {1, 2, 3, 4, 5, 6} die to express the strength of "irregular" troops. This makes the outcome of battles involving regular forces more predictable than those involving irregular forces.

  • Regarding deltahedra and the snub disphenoid

    • Several people proposed alternative constructions for the snub disphenoid.

      1. Brooks Moses suggested the following construction: Take a square antiprism, squash the top square into a rhombus, and insert a strut along the short diagonal of the rhombus. Then squash and strut the bottom square similarly.

        It seems, when you think about this, that there are two ways to do the squashing. Suppose you squash the bottom square horizontally in all cases. The top square is turned 45° relative to the bottom (because it's an antiprism) and so you can squash it along the -45° diagonal or along the +45° diagonal, obtaining a left- and a right-handed version of the final solid. But if you do this, you find that the two solids are the same, under a 90° rotation.

        This construction, incidentally, is equivalent to the one I described in the previous article: I said you should take two rhombuses and connect corresponding vertices. I had a paragraph that read:

        But this is where I started to get it wrong. The two wings have between them eight edges, and I had imagined that you could glue a rhombic antiprism in between them. . . .

        But no, I was right; you can do exactly this, and you get a snub disphenoid. What fooled me was that when you are looking at the snub disphenoid, it is very difficult to see where the belt of eight triangles from the antiprism got to. It winds around the polyhedron in a strange way. There is a much more obvious belt of triangles around the middle, which is not suitable for an antiprism, being shaped not like a straight line but more like the letter W, if the letter W were written on a cylinder and had its two ends identified. I was focusing on this belt, but the other one is there, if you know how to see it.

        The snub disphenoid has four vertices with valence 4 and four with valence 5. Of its 12 triangular faces, four have two valence-4 vertices and one valence-5 vertex, and eight have one valence-4 vertex and two valence-5 vertices. These latter eight form the belt of the antiprism.

      2. M. Moses also suggested taking a triaugmented triangular prism, which you will recall is a triangular prism with a square pyramid erected on each of its three square faces, removing one of the three pyramids, and then squashing the exposed square face into a rhombus shape, adding a new strut on the diagonal. This one gives me even less intuition about what is going on, and it seems even more strongly that it shou,ld matter whether you put in the extra strut from upper-left to lower-right, or from upper-right to lower-left. But it doesn't matter; you get the same thing either way.

      3. Jacob Fugal pointed out that you can make a snub disphenoid as follows: take a pentagonal dipyramid, and replace one of the equatorial *----*----* figures with a rhombus. This is simple, but unfortunately gives very little intuition for what the disphenoid is like. It is obvious from the construction that there must be pentagons on the front and back, left over from the dipyramid. But it is not at all clear that there are now two new upside-down pentagons on the left and right sides, or that the disphenoid has a vertical symmetry.

    • A few people asked me where John Batzel got they magnet toy that I was using to construct the models. It costs only $5! John gave me his set, and I bought three more, and I now have a beautiful set of convex deltahedra and a stellated dodecahedron on my desk. (Actually, it is not precisely a stellated dodecahedron, since the star faces are not quite planar, but it is very close. If anyone knows the name of this thing, which has 32 vertices, 90 edges, and 60 equilateral triangular faces, I would be pleased to hear about it.) Also I brought my daughter Iris into my office a few weekends ago to show her the stella octangula ("I wanna see the stella octangula, Daddy! Show me the stella octangula!") which she enjoyed; she then stomped on it, and then we built another one together.

    • [ Addendum 20070908: More about deltahedra. ]


[Other articles in category /math] permanent link

Thu, 06 Sep 2007

Different arrangements for standard dice
Gaal Yahas wrote to refer me to an article about a pair of dice that never roll seven. It sounded cool, but but it was too late at night for me to read it, so I put it on the to-do list. But it reminded me of a really nice puzzle, which is to find a nontrivial relabeling of a pair of standard dice that gives the same probability of throwing any sum from 2 to 12. It's a happy (and hardly inevitable) fact that there is a solution.

To understand just what is being asked for here, first observe that a standard pair of dice throws a 2 exactly 1/36 of the time, a 3 exactly 2/36 of the time, and so forth:

21/36
32/36
43/36
54/36
65/36
76/36
85/36
94/36
103/36
112/36
121/36

The standard dice have faces numbered 1, 2, 3, 4, 5, and 6. It should be clear that if one die had {0,1,2,3,4,5} instead, and the other had {2,3,4,5,6,7}, then the probabilities would be exactly the same. Similarly you could subtract 3.7 from every face of one die, giving it labels {-2.7, -1.7, -0.7, 0.3, 1.3, 2.3}, and if you added the 3.7 to every face of the other die, giving labels {4.7, 5.7, 6.7, 7.7, 8.7, 9.7}, you'd still have the same chance of getting any particular total. For example, there are still exactly 2 ways out of 36 possible rolls to get the total 3: you can roll -2.7 + 5.7, or you can roll -1.7 + 4.7. But the question is to find a nontrivial relabeling.

Like many combinatorial problems, this one is best solved with generating functions. Suppose we represent a die as a polynomial. If the polynomial is Σaixi, it represents a die that has ai chances to produce the value i. A standard die is x6 + x5 + x4 + x3 + x2 + x, with one chance to produce each integer from 1 to 6. (We can deal with probabilities instead of "chances" by requiring that Σai = 1, but it comes to pretty much the same thing.)

The reason it's useful to adopt this representation is that rolling the dice together corresponds to multiplication of the polynomials. Rolling two dice together, we multiply (x6 + x5 + x4 + x3 + x2 + x) by itself and get P(x) = x12 + 2x11 + 3x10 + 4x9 + 5x8 + 6x7 + 5x6 + 4x5 + 3x4 + 2x3 + x2, which gives the chances of getting any particular sum; the coefficient of the x9 term is 4, so there are 4 ways to roll a 9 on two dice.

What we want is a factorization of this 12th-degree polynomial into two polynomials Q(x) and R(x) with non-negative coefficients. We also want Q(1) = R(1) = 6, which forces the corresponding dice to have 6 faces each. Since we already know that P(x) = (x6 + x5 + x4 + x3 + x2 + x)2, it's not hard; we really only have to factor x6 + x5 + x4 + x3 + x2 + x and then see if there's any suitable way of rearranging the factors.

x6 + x5 + x4 + x3 + x2 + x = x(x4 + x2 + 1)(x + 1) = x(x2 + x + 1)(x2 - x + 1)(x + 1). So P(x) has eight factors:

x x2 + x + 1 x2 - x + 1 x + 1
x x2 + x + 1 x2 - x + 1 x + 1

We want to combine these into two products Q(x) and R(x) such that Q(1) = R(1) = 6. If we calculate f(1) for each of these, we get 1, 3 (pink), 1, and 2 (blue). So each of Q and R will require one of the factors that has f(1) = 3 and one that has f(1) = 2; we can distribute the f(1) = 1 factors as needed. For normal dice the way we do this is to assign all the factors in each row to one die. If we want alternative dice, our only real choice is what to do with the x2 - x + 1 and x factors.

Redistributing the lone x factors just corresponds to subtracting 1 from all the faces of one die and adding it back to all the faces of the other, so we can ignore them. The only interesting question is what to do with the x2 - x + 1 factors. The normal distribution assigns one to each die, and the only alternative is to assign both of them to a single die. This gives us the two polynomials:

x(x2 + x + 1)(x + 1)
= x4 + 2x3 + 2x2 + x
x(x2 + x + 1)(x + 1)(x2 - x + 1)2
= x8 + x6 + x5 + x4 + x3 + x

And so the solution is that one die has faces {1,2,2,3,3,4} and the other has faces {1,3,4,5,6,8}:

 122334
1233445
3455667
4566778
5677889
67889910
891010111112

Counting up entries in the table, we see that there are indeed 6 ways to throw a 7, 4 ways to throw a 9, and so forth.

One could apply similar methods to the problem of making a pair of dice that can't roll 7. Since there are six chances in 36 of rolling 7, we need to say what will happen instead in these 6 cases. We might distribute them equally among some of the other possibilities, say 2, 4, 6, 8, 10, and 12, so that we want the final distribution of results to correspond to the polynomial 2x12 + 2x11 + 4x10 + 4x9 + 6x8 + 6x6 + 4x5 + 4x4 + 2x3 + 2x2. The important thing to notice here is that the coefficient of the x7 term is 0.

Now we want to factor this polynomial and proceed as before. Unfortunately, it is irreducible. (Except for the trivial factor of x2.) Several other possibilities are similarly irreducible. It's tempting to reason from the dice to the algebra, and conjecture that any reducible polynomial that has a zero x7 term must be rather exceptional in other ways, such as by having only even exponents. But I'm not sure it will work, because the polynomials are more general than the dice: the polynomials can have negative coefficients, which are meaningless for the dice. Still, I can fantasize that there might be some result of this type available, and I can even imagine a couple of ways of getting to this result, one combinatorial, another based on Fourier transforms. But I've noticed that I have a tendency to want to leave articles unpublished until I finish exploring all possible aspects of them, and I'd like to change that habit, so I'll stop here, for now.

[ Addendum 20070905: There are some followup notes. ]


[Other articles in category /math] permanent link

The 123456 die
As a result of my recent article on the snub disphenoid, Paul Keir wrote to me to ask about non-equiprobable dice. Specifically, he wanted a die that, because it was irregular, was twice as likely to land on one face as on any of the others.

That got me thinking about the problem in general. For some reason I've been trying to construct a die whose faces come up with probabilities 1/21, 2/21, 3/21, 4/21, 5/21, and 6/21 respectively.

Unless there is a clever insight I haven't had, I think this will be rather difficult to do explicitly. (Approximation methods will probably work fairly easily though, I think.) I started by trying to make a hexahedron with faces that had areas 1, 2, 3, 4, 5, 6, and even this has so far evaded me. This will not be sufficient to solve the problem, because the probability that the hexahedron will land on face F is not proportional to the area of F, but rather to the solid angle subtended by F from the hexahedron's center of gravity.

Anyway, I got interested in the idea of making a hexahedron whose faces had areas 1..6. First I tried just taking a bunch of simple shapes (right triangles and the like) of the appropriate sizes and fitting them together geometrically; so far that hasn't worked. So then I thought maybe I could get what I wanted by taking a tetrahedron or a disphenoid or some such and truncating a couple of the corners.

As Polya says, if you can't solve the problem, you should try solving a simpler problem of the same sort, so I decided to see if it was possible to take a regular tetrahedron and chop off one vertex so that the resulting pentahedron had faces with areas 1, 2, 3, 4, 5. The regular tetrahedron is quite tractable, geometrically, because you can put its vertices at (0,0,0), (0,1,1), (1,0,1), and (1,1,0), and then a plane that chops off the (0,0,0) vertex cuts the three apical edges at points (0,a,a), (b,0,b), and (c,c,0), for some 0 ≤ a, b, c ≤ 1. The chopped-off areas of the three faces are simply ab√3/4, bc√3/4, ca√3/4, and the un-chopped base has area √3/4, so if we want the three chopped faces to have areas of 2/5, 3/5 and 4/5 times √3/4, respectively, we must have ab = 3/5, bc = 2/5, and ca = 1/5, and we can solve for a, b, c. (We want the new top face to have area 1/5 · √3/4, but that will have to take care of itself, since it is also determined by a, b, and c.) Unfortunately, solving these equations gives b = √6/√5, which is geometrically impossible. We might fantasize that there might be some alternate solution, say with the three chopped faces having areas of 1/5, 2/5 and 4/5 times √3/4, and the top face being 3/5 · √3/4 instead of 1/5 · √3/4, but none of those will work either.

Oh well, it was worth a shot. I do think it's interesting that if you know the areas of the bottom four faces of a truncated regular tetrahedron, that completely determines the apical face. Because you can solve for the lengths of the truncated apical edges, as above, and then that gives you the coordinates of the three apical vertices.

I had a brief idea about truncating a square pyramid to get the hexahedron I wanted in the first place, but that's more difficult, because you can't just pick the lengths of the four apical edges any way you want; their upper endpoints must be coplanar.

The (0,a,a), (b,0,b), (c,c,0) thing has been on my mind anyway, and I hope to write tomorrow's blog article about it. But I've decided that my articles are too long and too intermittent, and I'm going to try to post some shorter, more casual ones more frequently. I recently remembered that in the early days of the blog I made an effort to post every day, and I think I'd like to try to resume that.

[ Addendum 20070905: There are some followup notes. ]


[Other articles in category /math] permanent link

Tue, 07 Aug 2007

Standard analytic polyhedra
If you want to consider a cube analytically, you have an easy job. The vertices lie at the points:

(0,0,0)
(0,0,1)
(0,1,0)
(0,1,1)
(1,0,0)
(1,0,1)
(1,1,0)
(1,1,1)
And you can see at a glance whether two vertices share an edge (they are the same in two of their three components) or are opposite (they differ in all three components).

Last week I was reading the Wikipedia article about the computer game "Hunt the Wumpus", which I played as a small child. For the Guitar Hero / WoW generation I should explain Wumpus briefly.

The object of "Wumpus" is to kill the Wumpus, which hides in a network of twenty caves arranged in a dodecahedron. Each cave is thus connected to three others. On your turn, you may move to an adjacent cave or shoot a crooked arrow. The arrow can pass through up to five connected caves, and if it enters the room where the Wumpus is, it kills him and you win. Two of the caves contain bottomless pits; to enter these is death. Two of the caves contain giant bats, which will drop you into another cave at random; if it contains a pit, too bad. If you are in a cave adjacent to a pit, you can feel a draft; if you are adjacent to bats, you can hear them. If you are adjacent to the Wumpus, you can smell him. If you enter the Wumpus's cave, he eats you. If you shoot an arrow that fails to kill him, he wakes up and moves to an adjacent cave; if he enters you cave, he eats you. You have five arrows.

I did not learn until much later that the caves are connected in a dodecahedron; indeed, at the time I probably didn't know what a dodecahedron was. The twenty caves were numbered, so that cave 1 was connected to 2, 5, and 8. This necessitated a map, because otherwise it was too hard to remember which room was connected to which.

Or did it? If the map had been a cube, the eight rooms could have been named 000, 001, 010, etc., and then it would have been trivial to remember: 011 is connected to 111, 001, and 010, obviously, and you can see it at a glance. It's even easy to compute all the paths between two vertices: the paths from 011 to 000 are 011–010–000 and 011–001–000; if you want to allow longer paths you can easily come up with 011–111–110–100–000 for example.

And similarly, the Wumpus source code contains a table that records which caves are connected to which, and consults this table in many places. If the caves had been arranged in a cube, no table would have been required. Or if one was wanted, it could have been generated algorithmically.

So I got to wondering last week if there was an analogous nomenclature for the vertices of a dodecahedron that would have obviated the Wumpus map and the table in the source code.

I came up with a very clever proof that there was none, which would have been great, except that the proof also worked for the tetrahedron, and the tetrahedron does have such a convenient notation: you can name the vertices (0,0,0), (0,1,1), (1,0,1), and (1,1,0), where there must be an even number of 1 components. (I mentioned this yesterday in connection with something else and promised to come back to it. Here it is.) So the proof was wrong, which was good, and I kept thinking about it.

The next-simplest case is the octahedron, and I racked my brains trying to come up with a convenient notation for the vertices that would allow one to see at a glance which were connected. When I finally found it, I felt like a complete dunce. The octahedron has six vertices, which are above, below, to the left of, to the right of, in front of, and behind the center. Their coordinates are therefore (1,0,0), (-1,0,0), (0,1,0), (0,-1,0), (0,0,1) and (0,0,-1). Two vertices are opposite when they have two components the same (necessarily both 0) and one different (necessarily negatives). Otherwise, they are connected by an edge. This is really simple stuff.

Still no luck with the dodecahedron. There are nice canonical representations of the coordinates of the vertices—see the Wikipedia article, for example—but I still haven't looked at it closely enough to decide if there is a simple procedure for taking two vertices and determining their geometric relation at a glance. Obviously, you can check for adjacent vertices by calculating the distance between them and seeing if it's the correct value, but that's not "at a glance"; arithmetic is forbidden.

It's easy to number the vertices in layers, say by calling the top five vertices A1 ... A5, then the five below that B1 ... B5, and so on. Then it's easy to see that A3 will be adjacent to A2, A4, and B3, for example.

But this nomenclature, unlike the good ones above, is not isometric: it has a preferred orientation of the dodecahedron. It's obvious that A1, A2, A3, A4, and A5 form a pentagonal face, but rather harder to see that A2, A3, B2, B3, and C5 do. With the cube, it's easy to see what a rotation or a reflection looks like. For example, rotation of 120° around an axis through a pair of vertices of the cube takes vertex (a, b, c) to (c, a, b); rotation of 90° around an axis through a face takes it to (1-b, a, c). Similarly, rotations and reflections of the tetrahedron correspond to simple permutations of the components of the vertices. Nothing like this exists for the A-B-C-D nomenclature for the dodecahedron.

I'll post if I come up with anything nice.


[Other articles in category /math] permanent link

Sun, 05 Aug 2007

Sub-blogs and the math sub-blog
I notice that a number of people have my blog included in lists of "math blogs", which is fine with me, but I got a bit worried when I saw someone's web site that actually includes a lot of "math blog" articles, including mine, which is only ever about one-fourth math, the rest being given over to random other stuff. So the "math blog" section of this guy's web site is carrying my ill-informed articles about evolutionary biology and notes about the Frances the Badger books.

If you really do want just the math articles for some reason, you can subscribe to the feeds at http://blog.plover.com/math/index.rss or http://blog.plover.com/math/index.atom. I've been generating these sub-feed files since the blog began, and I know nobody uses them. But perhaps someone would like to.

Similarly, there are sub-feeds for other subsections of the blog, for example "physics". Most of these topic areas receive many fewer updates than does the "math" section:

64 math
16 lang
14 physics
14 oops
14 linogram
14 book
14 prog
10 bio
7 lang/etym
6 meta

Personally I feel that the eclecticism of the blog is one of its attractions, and I gather that a lot of other people do too, but perhaps not everyone agrees.


[Other articles in category /meta] permanent link

Sat, 28 Jul 2007

Conference talk brochure descriptions
I just got back from doing some tutorials at OSCON, which were generally well-received. Sometimes it goes better than other times; this time it went pretty well, I thought, except that I was seven minutes late to the Tuesday morning one, through a tremendous series of fuckups beginning with the conference hotel not being able to find my reservation on Saturday night, continuing with my barely missing two unrelated streetcars on Tuesday morning, and, let's not leave out the most important part, my forgetting that the class started at 8:30 and not at 9:00 until about 8:00.

I've written before about the general worthlessness of the attendee evaluations, so maybe I won't go into detail about them again. What I want to complain about here is the descriptions of the classes that appear in the conference brochure and on the web site.

One of the things that Nat (the program committee chair) and I have commiserated about in the past is that no matter how hard you try to make a clear, concise, accurate description of the class, you are doomed, because people do not use the descriptions in a rational way. For example, suppose I happen to be giving the same class two years in a row. The class title is the same both years. The 250-word description in the brochure and on the web site is word-for-word identical both years. Nevertheless, you can be sure that someone will hand in an evaluation the second year that complains bitterly that the class was a waste of time, because they took the class the year before and there was no new material. I vented about this to Nat once, and the look of exhausted disgust on his face was something to see. Because I only have to read my own stupid evaluations, but Nat has to read all the stupid evaluations, and he probably sees that same idiotic complaint ten times a year.

Here's one I was afraid I'd get this year, and, who knows. It may yet happen. I sent the program committee seven proposals. They accepted three. One was for the Advanced techniques for Parsing class; one for for Higher-Order Perl. There was significant overlap between these two classes; the last third of the Higher-Order Perl class is about higher-order parser combinators, which are the principal subject of the advanced parsing class. This puts me in a difficult position. The program committee has accepted two classes that overlap. I have to deliver the material that I promised in the brochure, which people paid money to hear. I cannot unilaterally eliminate the overlap, say by substituting a different topic into Higher-Order Perl, because then someone in that class might quite rightly complain that they had been promised a section on parsing techniques, had paid for a section on parsing techniques, but had not been delivered a section on parsing techniques. But some people will sign up for both classes, and then will inevitably complain about the overlap, even though it should have been clear from the brochure that the classes would overlap.

The only way out for me is to try to get the program committee to agree beforehand to let me change around one of the classes to remove the overlap, write one-third of a new class, and document the change in the brochure description before it is published. That is a lot of work to do in a short time. Some people write their class slides the night before they give the class. I don't; I take weeks over it, revising extensively, and then I give a practice session, and then I revise again. So the classes overlapped, and I'm sure there were complaints about it that I haven't seen yet.

My favorite complaint of all time was from the guy who took Tricks of the Wizards and then complained that the material was too advanced.

This year I had the opposite problem. I gave a class on Advanced techniques for Parsing, and the following day I read a blog article from someone who had been disappointed that it was insufficiently advanced. This is a fair and legitimate criticism, and deserves a reasonable response. The response is not, however, to change the class content, because I think I have a pretty good idea of how sophisticated the conference attendees are, and of what is useful, and if I made the class a lot more advanced than it is, hardly anyone would understand it. But I did feel bad that this blogger had mistakenly wasted hours in my class and gotten nothing out of it. That should have been avoidable.

The first thing I did was to check the brochure description, to see if perhaps it was misleading, or if it promised extra-advanced material that I then didn't deliver. This sometimes happens. The deadline for proposals is far in advance of the deadline for the class materials themselves. So what happens is that you write up a proposal for a class you think you can do, that people will like, and that will appeal to the program committee, and you send it in. A few months later, it is accepted, and you start work on the class. Then sometimes you discover that even though you proposed a class about A, B, and C, there is only enough time to do A and B properly, and to cover all three in a three-hour class would just be a mess. So you write a class that covers A and B properly, and has an abbreviated discussion of C. But then there will be some people who came to the class specifically for the discussion of C, and who are disappointed. It is a tough problem.

Anyway, I thought this time I had done a reasonably good job of writing a class that actually matched the brochure description. So I wrote to the blogger to ask how the description could have been better: what would I have needed to say in it that would have tipped him off that the class would not have had whatever it was he was looking for?

The answer: nothing. He had not read the description. He attended the class solely because of the title, Advanced techniques for Parsing, and then after two hours figured out that it was not as advanced as he wanted it to be.

Not my fault! Not my fault!


[Other articles in category /talk] permanent link

Fri, 27 Jul 2007

More about fixed points and attractors
A while back I talked about a technique for calculating √2 where you pick a function that has √2 as a fixed point (that is, f(√2) = √2) and then see what happens when you consider the sequence x, f(x), f(f(x)), ..., for various initial values of x. For some such functions the sequence diverges, but often it converges to √2.

I picked a few example functions, some of which worked and some of which didn't.

One glaring omission from the article was that I forgot to mention the so-called "Babylonian method" for calculating square roots. The Babylonian method for calculating √n is simply to iterate the function x → ½(x + n/x). (This is a special case of the Newton-Raphson method for finding the zeroes of a function. In this case the function whose zeroes are being found is is xx2 - n.) The Babylonian method converges quickly for almost all initial values of x. As I was writing the article, at 3 AM, I had the nagging feeling that I was leaving out an important example function, and then later on realized what it was. Oops.

But there's a happy outcome, which is that the Babylonian method points the way to a nice general extension of this general technique. Suppose you've found a function f that has your target value, say √2, as a fixed point, but you find that iterating f doesn't work for some reason. For example, one of the functions I considered in the article was x → 2/x. No matter what initial value you start with (other than √2 and -√2) iterating the function gets you nowhere; the values just hop back and forth between x and 2/x forever.

But as I said in the original article, functions that have √2 as a fixed point are easy to find. Suppose we have such a function, f, which is badly-behaved because the fixed point repels, or because of the hopping-back-and-forth problem. Then we can perturb the function by trying instead x → ½(x + f(x)), which has the same fixed points, but which might be better-behaved. (More generally, x → (ax + bf(x)) / (a + b) has the same fixed points as f for any nonzero a and b, but in this article we'll leave a = b = 1.) Applying this transformation to the function x → 2/x gives us the the Babylonian method.

I tried applying this transform to the other example I used in the original article, which was xx2 + x - 2. This has √2 as a fixed point, but the √2 is a repelling fixed point. √2 ± &epsilon → √2 ± (1 + 2√2)ε, so the error gets bigger instead of smaller. I hoped that perturbing this function might improve its behavior, and at first it seemed that it didn't. The transformed version is x → ½(x + x2 + x - 2) = x2/2 + x - 1. That comes to pretty much the same thing. It takes √2 ± &epsilon → √2 + (1 + √2)ε, which has the same problem. So that didn't work; oh well.

But actually things had improved a bit. The original function also has -√2 as a fixed point, and again it's one that repels from both sides, because -√2 ± ε → -√2 ± (1 - 2√2)ε, and |1 - 2√2| > 1. But the transformed function, unlike the original, has -√2 as an attractor, since it takes -√2 ± ε → -√2 ± (1 - √2)ε and |1 - √2| < 1.

So the perturbed function works for calculating √2, in a slightly backwards way; you pick a value close to -√2 and iterate the function, and the iterated values get increasingly close to -√2. Or you can get rid of the minus signs entirely by transforming the function again, and considering -f(-x) instead of f(x). This turns x2/2 + x - 1 into -x2/2 + x + 1. The fixed points change places, so now √2 is the attractor, and -√2 is the repeller, since √2 ± ε → √2 ± (1 - √2)ε. Starting with x = 1, we get:

1.5
1.375
1.4296875
1.40768433
1.41689675
1.41309855
1.41467479
1.41402241
1.41429272
1.41418077
1.41422714
1.41420794
1.41421589
1.41421260
1.41421396
1.41421340
1.41421363
So that worked out pretty well. One might even make the argument that the method is simpler than the Babylonian method, since the division is a simple x/2 instead of a complex 2/x. I have not yet looked into the convergence properties; I expect it will turn out that the iterated polynomial converges more slowly than the Babylonian method.

I had meant to write about Möbius transformations, but that will have to wait until next week, I think.


[Other articles in category /math] permanent link

Sun, 22 Jul 2007

"More intuitive" programming language syntax
Chromatic wrote an article today about The Broken Metric of "Intuitive to the Uneducated" Language Syntax in which he addresses the very common argument that some language syntax is better than some other because it is "more intuitive" or "easier for beginners to understand".

Chromatic says that these arguments are bunk because programming language syntax is much less important than programming language semantics. But I think that is straining at a gnat and swallowing a camel.

To argue that a certain programming language feature is bad because it is confusing to beginners, you have to do two things. You have to successfully argue that being confusing to beginners is an important metric. Chromatic's article tries to refute this, saying that it is not an important metric.

But before you even get to that stage, you first have to show that the programming language feature actually is confusing to beginners.

But these arguments are never presented with any evidence at all, because no such evidence exists. They are complete fabrications, pulled out of the asses of their propounders, and made of equal parts wishful thinking and bullshit.

Addendum 20070720:
To support my assertion that nobody knows what makes programming hard for beginners, I wanted to cite this paper, The camel has two humps, by Dehnadi and Bornat, which I was rereading recently, but I couldn't find my copy and couldn't remember the title or authors. Happily, I eventually remembered.

The abstract begins:

Learning to program is notoriously difficult. A substantial minority of students fails in every introductory programming course in every UK university. Despite heroic academic effort, the proportion has increased rather than decreased over the years. Despite a great deal of research into teaching methods and student responses, we have no idea of the cause.
But the situation isn't completely hopeless; the abstract also says:

We have found a test for programming aptitude, of which we give details. We can predict success or failure even before students have had any contact with any programming language with very high accuracy, and by testing with the same instrument after a few weeks of exposure, with extreme accuracy. We present experimental evidence to support our claim. certain to succeed.
What's the secret? Read and learn.

[ Addendum 20070722: I screwed up the links to the paper when I first posted them; they are fixed now. The paper is here. Thanks to Anton Berezin for pointing this out. ]


[Other articles in category /prog] permanent link

Sat, 21 Jul 2007

Homosexuality is not hereditary
A just read a big pile of blog comments that all said that homosexuality couldn't be hereditary, because if it were, natural selection would have gotten rid of it by now.

But natural selection is more interesting than that. This article will ignore the obvious notion of homosexuals who breed anyway. Here is one way in which homosexuality could be entirely hereditary and still be favored by natural selection.

Suppose that human sexuality is extremely complicated, which should not be controversial. Suppose, just for concreteness, that there are 137 different genes that can affect whether an individual turns out heterosexual or homosexual. Say that each of these can either be either in state Q or state S, and that and that any individual will turn out homosexual if any 93 of the 137 genes are in state Q, heterosexual otherwise.

The over-simplistic argument from natural selection says that the Q states will be bred out of the population, and that S will be increasingly predominant over time.

Now let's consider an individual, X, whose family members tend to carry a lot of Q genes.

Suppose X's parents have a lot of Q genes, around 87 or 90. X's parents' siblings, who resemble them, will also have a lot of Q genes, and have a high probability of being homosexual. Having no children of their own, they may contribute to X's welfare, maybe by caring for X or by finding food for X.

In short, for every gay uncle X has, that is one additional set of cousins with whom X does not have to compete for scarce resources.

This could well turn out to be a survival advantage for X over someone from a family of people without a lot of Q genes, someone who is competing for food with a passel of cousins, none of whom ever really get enough to eat, someone whose aunt might even try to kill them in order to benefit her own children.

Perhaps X turns out to be homosexual and never breeds, but X probably has some siblings, in which case X might be an advantageous gay uncle or lesbian aunt to one of his or her own nieces or nephews, who, remember, are carrying a lot of the same genes, including the Q genes.

It might not actually work this way, of course, and in most ways it probably doesn't. The only point here is to show that natural selection does not necessarily rule out the idea of inherited homosexuality; people who think it must, have not exercised enough imagination.

(Now that I have finished writing this article, it occurs to me that the same argument applies to bees and ants; most individuals in a bee or ant colony are sterile. Who would be foolish enough to argue that this trait will soon be bred out of the colony?)

Order
Darwin's Dangerous Idea
Darwin's Dangerous Idea
with kickback
no kickback
The moral of this story:

Time and time again, biologists baffled by some apparently futile or maladroit bit of bad design in nature have eventually come to see that they have underestimated the ingenuity, the sheer brilliance, the depth of insight to be disovered in one of Mother Nature's creations. Francis Crick has mischievously baptized this trend in the name of his colleague Leslie Orgel, speaking of what he calls "Orgels Second Rule: Evolution is cleverer than you are."
Daniel Dennett, Darwin's Dangerous Idea, p. 74.


[Other articles in category /bio] permanent link

Fri, 20 Jul 2007

Tough questions
It's easy to recognize a good question: a good question is one that takes a lot longer to answer than it does to ask. Chip Buchholtz's example is "what is a byte?" To answer that you have to get into the nitty gritty of computer architecture and how, although the information in the computer is stored by the bit, the memory bus can only address it by the byte.

One of the biology interns asked a me a good one a couple of weeks ago: he asked how, if Perl runs Perl scripts, and the OS is running Perl, what is running the OS? Now that is a tough question to answer. I explained about logic gates, and how the logic gates are built into trivial arithmetic and memory circuits, how these are then built up into ALUs and memories, and how these in turn are controlled by microcode, and finally how the logical parts are assembled into a computer. I don't know how understandable it was, but it was the best I could do in five minutes, and I think I got some of the idea across. But I started and finished by saying that it was basically miraculous.

My daughter Iris asks a ton of questions, some better than others. On any given evening she is likely to ask "Daddy, what are you doing?" about fifteen times, and "why?" about fifteen million times. "Why" can be a great question, but sometimes it's not so great; Iris asks both kinds. Sometimes it's in response to "I'm eating a sandwich." Then the inevitable "why?" is rather annoying.

Some of the "why" questions are nearly impossible to answer. For example, we see a lady coming up the street toward us. "Is that Susanna?" "No." "Why is it not Susanna?"

I think what's happening here is that having discovered this magic word that often produces interesting information, Iris is employing it whenever possible, even when it doesn't make sense, because she hasn't yet learned when it works and when not. Why is that not Susanna? Hey, you never know when you might get an interesting answer. But there might be something else going on that I don't appreciate.

But the nice thing about Iris's incessant questions is that she listens to and remembers the answers, ponders them deeply, and then is likely to come back with an insightful followup when you least expect it.

Order
Make Way for Ducklings
Make Way for Ducklings
with kickback
no kickback
This weekend we went to visit my parents in New York, and as we drove down the Henry Hudson Parkway, we passed the North River wastewater treatment plant. Three-year-olds are fascinated with poop, so I took the opportunity to point out the plant to Iris. I said that although it had a park with trees on the roof, the inside was a giant machine for turning poop into garden soil; they cleaned it and mixed with with wood chips and it composted like the stuff in our composter. (I later found that some of these details were not quite accurate, but the general idea is correct. See the official site for the official story. My wife provided the helpful analogy with the composter.) As I expected, Iris was interested, and thought this over; she confirmed that they turned poop into soil, and then asked what they made pee into. I was not prepared for that one, and I had to promise her I would find out later. It took me some Internet research time to find out about denitrogenation.

Speaking of poop, last month Iris asked a puzzler: why don't birds use toilets? I think this was motivated by our earlier discussion of bird poop on our car.

In Make Way for Ducklings there's a picture of the friendly policeman Michael, running back to his police box to order a police escort to help the ducklings across Beacon Street. He's holding his billy club. Iris asked what that was for. I thought a moment, and then said "It's for hitting people with." Later I wondered if I had given an inaccurate or incomplete answer, so I asked around, and did some reading. It appears I got that one right. Some folks I know suggested that I should have said it was for hitting bad people, but I'd rather stick to the plain facts, and leave out the editorializing.


Order
The Defeat of the Spanish Armada
The Defeat of the Spanish Armada
with kickback
no kickback
Anyway, lately I've been rereading The Defeat of the Spanish Armada, by Garrett Mattingly, which is a really good book; it won a special Pulitzer Prize when it was published. It's about the attempt by Spain to invade England in 1588. The invasion was a failure, and the Spanish got clobbered. Most interesting minor detail: Francis Drake went to St. Vincent the year before the Armada sailed and captured a bunch of merchant ships that were carrying seasoned barrel-staves, which he burnt. As a result, when the mighty Armada sailed, many of the ships had to carry casks made of green wood, and they leaked; whenever the Spanish opened a cask that should have contained food or water, they were as likely as not to find it full of green slime instead.

So I was reading the Mattingly book this evening, and Iris was eating and playing with Play-Doh on the kitchen floor. After the eleventh repetition of "Daddy, what are you doing?" "Reading." I decided to tell Iris what I was reading about. I said that I was reading about ships, that ships are big boats; they carry lots of men and guns. Iris asked why they carried guns, and I explained that often the ships carried treasure, like spices or gold or jewels or cloth, and that pirates tried to steal it. Iris asked if the cloth was like a wash cloth, and I said no, it was more like the kind of cloth that Mommy makes quilts from, or like the silk that her silk dress is made of. I explained about the pirates, which she seemed to understand, because toddlers know all about people who try to take stuff that isn't theirs. And then she asked the question I couldn't answer: Why were there men on the ships, but no women?

I was totally stumped; I don't even know where to begin explaining to a three-year-old why there are no women on ships in 1588. The only answers I could think of had to do with women's traditional roles, with European mores, social constructions of gender, and so on, all stuff that wouldn't help. Sometimes women were smuggled aboard ship, but I wasn't going to say that either.

I don't usually give up, but this time I gave up. This is a tough question of the first order, easy to ask, hard to answer. It's a lot easier to explain wastewater treatment.


[Other articles in category /misc] permanent link

Thu, 19 Jul 2007

How to calculate the square root of 2
A few weeks ago I mentioned the following recurrence:

p0 = 1 q0 = 1
pi+1 = pi + 2qi qi+1 = pi + qi
If you carry this out, you get pairs p and q that have p2 - 2q2 = ±1, which means that p/q ≈ √2. The farther you carry the recurrence, the better the approximation is.

I said that this formula comes from consideration of continued fractions. But I was thinking about it a little more, and I realized that there is a way to get such a recurrence for pretty much any algebraic constant you want.

Consider for a while the squaring function s : xx2. This function has two obvious fixed points, namely 0 and 1, by which I mean that s(0) = 0 and s(1) = 1. Actually it has a third fixed point, ∞.

If you consider the behavior on some x in the interval (0, 1), you see that s(x) is also in the same interval. But also, s(x) < x on this interval. Now consider what happens when you iterate s on this interval, calculating the sequence s(x), s(s(x)), and so on. The values must stay in (0, 1), but must always decrease, so that no matter what x you start with, the sequence converges to 0. We say that 0 is an "attracting" fixed point of s, because any starting value x, no matter how far from 0 it is (as long as it's still in (0, 1)), will eventually be attracted to 0. Similarly, 1 is a "repelling" fixed point, because any starting value of x, no matter how close to 1, will be repelled to 0.

Consideration of the interval (1, ∞) is similar. 1 is a repeller and ∞ is an attractor.

Fixed points are not always attractors or repellers. The function x → 1/x has fixed points at ±1, but these points are neither attractors nor repellers.

Also, a fixed point might attract from one side and repel from the other. Consider xx/(x+1). This has a fixed point at 0. It maps the interval (0, ∞) onto (0, 1), which is a contraction, so that 0 attracts values on the right. On the other hand, 0 repels values on the left, because 1/-n goes to 1/(-n+1). -1/4 goes to -1/3 goes to -1/2 goes to -1, at which point the whole thing blows up and goes to -∞.

The idea about the fixed point attractors is suggestive. Suppose we were to pick a function f that had √2 as a fixed point. Then √2 might be an attractor, in which case iterating f will get us increasingly accurate approximations to √2.

So we want to find some function f such that f(√2) = √2. Such functions are very easy to find! For example, take √2. square it, and divide by 2, and add 1, and take the square root, and you have √2 again. So x → √(1+x2/2) is such a function. Or take √2. Take the reciprocal, double it, and you have √2 again. So x → 2/x is another such function. Or take √2. Add 1 and take the reciprocal. Then add 1 again, and you are back to √2. So x → 1 + 1/(x+1) is a function with √2 as a fixed point.

Or we could look for functions of the form ax2 + bx + c. Suppose √2 were a fixed point of this function. Then we would have 2a + b√2 + c = √2. We would like a, b, and c to be simple, since the whole point of this exercise is to calculate √2 easily. So let's take a=b=1, c=-2. The function is now xx2 + x - 2.

Which one to pick? It's an embarrasment of riches.

Let's start with the polynomial, xx2 + x - 2. Well, unfortunately this is the wrong choice. √2 is a fixed point of this function, but repels on both sides: √2 ± ε → √2 ± ε(1 + 2√2), which is getting farther away.

The inverse function of xx2 + x - 2 will have √2 as an attractor on both sides, but it is not so convenient to deal with because it involves taking square roots. Still, it does work; if you iterate ½(-1 + √(9 + 4x)) you do get √2.

Of the example functions I came up with, x → 2/x is pretty simple too, but again the fixed points are not attractors. Iterating the function for any initial value other than the fixed points just gets you in a cycle of length 2, bouncing from one side of √2 to the other forever, and not getting any closer.

But the next function, x → 1 + 1/(x+1), is a winner. (0, ∞) is crushed into (1, 2), with √2 as the fixed point, so √2 attracts from both sides.

Writing x as a/b, the function becomes a/b → 1 + 1/(a/b+1), or, simplifying, a/b → (a + 2b) / (a + b). This is exactly the recurrence I gave at the beginning of the article.

We did get a little lucky, since the fixed point of interest, √2, was the attractor, and the other one, -√2, was the repeller. ((-∞, -1) is mapped onto (-∞, 1), with -√2 as the fixed point; -√2 repels on both sides.) But had it been the other way around we could have exchanged the behaviors of the two fixed points by considering -f(-x) instead. Another way to fix it is to change the attractive behavior into repelling behavior and vice versa by running the function backwards. When we tried this for xx2 + x - 2 it was a pain because of the square roots. But the inverse of x → 1 + 1/(x+1) is simply x → (-x + 2) / (x - 1), which is no harder to deal with.

The continued fraction stuff can come out of the recurrence, instead of the other way around. Let's iterate the function x → 1 + 1/(1+x) formally, repeatedly replacing x with 1 + 1/(1+x). We get:

1 + 1/(1+x)
1 + 1/(1+1 + 1/(1+x))
1 + 1/(1+1 + 1/(1+1 + 1/(1+x)))
...
So we might expect the fixed point, if there is one, to be 1 + 1/(2 + 1/(2 + 1/(2 + ...))), if this makes sense. Not all such expressions do make sense, but this one is a continued fraction, and continued fractions always make sense. This one is eventually periodic, and a theorem says that such continued fractions always have values that are quadratic surds. The value of this one happens to be √2. I hope you are not too surprised.

In the course of figuring all this out over the last two weeks or so, I investigated many fascinating sidetracks. The x → 1 + 1/(x+1) function is an example of a "Möbius transformation", which has a number of interesing properties that I will probably write about next month. Here's a foretaste: a Möbius transformation is simply a function x → (ax + b) / (cx + d) for some constants a, b, c, and d. If we agree to abbreviate this function as ${ a\, b \choose c\,d}$, then the inverse function is also a Möbius transformation, and is in fact ${a\, b\choose c\,d}^{-1}$.

[ Addendum 20070719: There is a followup article to this one. ]


[Other articles in category /math] permanent link

Wed, 18 Jul 2007

God Plays Dice
Lately my favorite blog is God Plays Dice. If you like my blog, I think you will probably like that one too.


[Other articles in category ] permanent link

Sat, 14 Jul 2007

Evaporation
I work for the Penn Genomics Institute, mostly doing software work, but the Institute is run by biologists and also does biology projects. Last month I taught some perl classes for the four summer interns; this month they are doing some lab work. Since part of my job involves dealing with biologists, I thought this would be a good opportunity to get into the lab, and I got permission from Adam, the research scientist who was supervising the interns, to let me come along.

Since my knowledge of biology is practically nil, Adam was not entirely sure what to do with me while the interns prepared to grow yeasts or whatever it is that they are doing. He set me up with a scale, a set of pipettes, and a beaker of water, with instructions to practice pipetting the water from the beaker onto the scale.

The pipettes came in three sizes. Shown at right is the largest of the ones I used; it can dispense liquid in quantities between 10 and 100 μl, with a precision of 0.1 μl. I used each of the three pipettes in three settings, pipetting water in quantities ranging from 1 ml down to 5 μl. I think the idea here is that I would be able to see if I was doing it right by watching the weight change on the scale, which had a display precision of 1 mg. If I pipette 20 μl of water onto the scale, the measured weight should go up by just about 20 mg.

Sometimes it didn't. For a while my technique was bad, and I didn't always pick up the exact right amount of water. With the small pipette, which had a capacity range of 2–20 μl, you have to suck up the water slowly and carefully, or the pipette tip gets air bubbles in it, and does not pick up the full amount.

With a scale that measures in milligrams, you have a wait around for a while for the scale to settle down after you drop a few μl of water onto it, because the water bounces up and down and the last digit of the scale readout oscillates a bit. Milligrams are much smaller than I had realized.

It turned out that it was pretty much impossible to see if I was picking up the full amount with the smallest pipette. After measuring out some water, I would wait a few seconds for the scale display to stabilize. But if I waited a little longer, it would tick down by a milligram. After another twenty or thirty seconds it would tick down by another milligram. This would continue indefinitely.

I thought about this quietly for a while, and realized that what I was seeing was the water evaporating from the scale pan. The water I had in the scale pan had a very small surface area, only a few square centimeters. But it was evaporating at a measurable rate, around 2 or 3 milligrams per minute.

So it was essentially impossible to measure out five pipette-fuls of 10 μl of water each and end up with 50 mg of water on the scale. By the time I got it done, around 15% of it would have evaporated.

The temperature here was around 27°C, with about 35% relative humidity. So nothing out of the ordinary.

I am used to the idea that if I leave a glass of water on the kitchen counter overnight, it will all be gone in the morning; this was amply demonstrated to me in nursery school when I was about three years old. But to actually see it happening as I watched was a new experience.

I had no idea evaporation was so speedy.


[Other articles in category /physics] permanent link

Fri, 13 Jul 2007

Foundations of Mathematics in the early 20th Century

Last fall I read a bunch of books on logic and foundations of mathematics that had been written around the end of the 19th and beginning of the 20th centuries. I also read some later commentary on this work, by people like W. V. O. Quine. What follows are some notes I wrote up afterwards.

The following are only my vague and uninformed impressions. They should not be construed as statements of fact. They are also poorly edited.

1. Frege and Peano were the pioneers of modern mathematical logic. All the work before Peano has a distinctly medieval flavor. Even transitionary figures like Boole seem to belong more to the old traditions than to the new. The notation we use today was all invented by Frege and Peano. Frege and Peano were the first to recognize that one must distinguish between x and {x}.

(Added later: I finally realized today what this reminds me of. In physics, there is a fairly sharp demarcation between classical physics (pre-1900, approximately) and modern physics (post-1900). There was a series of major advances in physics around this time, in which the old ideas, old outlooks, and old approaches were swept away and replaced with the new quantum theories of Planck and Einstein, leaving the field completely different than it was before. Peano and Frege are the Planck and Einstein of mathematical logic.)

Order
The Frege Reader
The Frege Reader
with kickback
no kickback

2. Russell's paradox has become trite, but I think we may have forgotten how shocked and horrified everyone was when it first appeared. Some of the stories about it are hair-raising. For example, Frege had published volume I of his Grundgesetze der Arithmetik ("Basic Laws of Arithmetic"). Russell sent him a letter as volume II was in press, pointing out that Frege's axioms were inconsistent. Frege was able to add an appendix to volume II, including a heartbreaking note:

"Hardly anything more unwelcome can befall a scientific writer than that one of the foundations of his edifice be shaken after the work is finished. I have been placed in this position by a letter of Mr Bertrand Russell just as the printing of the second volume was nearing completion..."

I hope nothing like this ever happens to any of my dear readers.

The struggle to figure out Russell's paradox took years. It's so tempting to think that the paradox is just a fluke or a wart. Frege, for example, first tried to fix his axioms by simply forbidding (xx). This, of course, is insufficient, and the Russell paradox runs extremely deep, infecting not just set theory, but any system that attempts to deal with properties and descriptions of things. (Expect a future blog post about this.)

3. Straightening out Russell's paradox went in several different directions. Russell, famously, invented the so-called "Theory of Types", presented as an appendix to Principia Mathematica. The theory of types is noted for being complicated and obscure, and there were several later simplifications. Another direction was Zermelo's, which suffers from different defects: all of Zermelo's classes are small, there aren't very many of them, and they aren't very interesting. A third direction is von Neumann's: any sets that would cause paradoxes are blackballed and forbidden from being elements of other sets.

To someone like me, who grew up on Zermelo-Fraenkel, a term like "(z = complement({w}))" is weird and slightly uncanny.

(Addendum 20060110: Quine's "New Foundations" program is yet another technique, sort of a simplified and streamlined version of the theory of types. Yet another technique, quite different from the others, is to forbid the use of the ∼ ("not") operator in set comprehensions. This last is very unusual.)

Order
Principia Mathematica (through section 56)
Principia Mathematica (through section 56)
with kickback
no kickback

4. Notation seems to have undergone several revisions since the first half of the 20th Century. Principia Mathematica and other works use a "dots" notation instead of or in additional to using parentheses for grouping. For example, instead of writing "((a + b) × c) + ((e + f) × g)", one would write "a + bc :+: e + fg". (This notation was invented by—guess who?—Peano.) This takes some getting used to when you have not seen it before. The dot notation seems to have fallen completely out of use. Last week, I thought it had technical advantages over parentheses; now I am not sure.

The upside-down-A (∀) symbol meaning "for each" is of more recent invention than is the upside-down-E (∃) symbol meaning "there exists". Early C20 would write "∃z:P(z)" as "(∃z)P(z)" but would write "∀z: P(z)" as simply "(z)P(z)".

The turnstile symbol $$\vdash$$ is Russell and Whitehead's abbreviation of the elaborate notation of Frege's Begriffschrift. The Begriffschrift notation was essentially annotated abstract syntax trees. The root of the tree was decorated with a vertical bar to indicate that the statement was asserted to be true. When you throw away the tree, leaving only the root with its bar, you get a turnstile symbol.

The ∨ symbol is used for disjunction, but its conjunctive counterpart, the ∧, is not used. Early C20 logicians use a dot for conjunction. I have been told that the ∨ was chosen by Russell and Whitehead as an abbreviation for the Latin vel = "or". Quine says that the $$\sim$$ denotes logical negation because of its resemblance to the letter "N" (for "not"). Incidentally, Quine also says that the ↓ that is sometimes used to mean logical nor is simply the ∨ with a vertical slash through it, analogous to ≠.

An ι is prepended to an expression x to denote the set that we would write today as {x}. The set { u : P(u) } of all u such that P(u) is true is written as ûP. Peter Norvig says (in Paradigms of Artificial Intelligence Programming) that this circumflex is the ultimate source of the use of "lambda" for function abstraction in Lisp and elsewhere.

5. (Addendum 20060110: Everyone always talks about Russell and Whitehead's Principia Mathematica, but it isn't; it's Whitehead and Russell's. Addendum 20070913: In a later article, I asked how and when Whitehead lost top billing in casual citation; my conclusion was that it occurred on 10 December, 1950.)

6. (Addendum 20060116: The ¬ symbol is probably an abbreviated version of Frege's notation for logical negation, which is to attach a little stem to the underside of the branch of the abstract syntax tree that is to be negated. The universal quantifier notation current in Principia Mathematica, to write (x)P(x) to mean that P(x) is true for all x, may also be an adaptation of Frege's notation, which is to put a little cup in the branch of the tree to the left of P(x) and write x in the cup.


[Other articles in category /math] permanent link

Thu, 12 Jul 2007

New York tourism
Anil Dash recently blogged about touristy stuff in New York that you should skip. I grew up in New York, so I know something about this.

Top of Anil's list: the Statue of Liberty. He advises taking the Staten Island Ferry instead. I couldn't agree more. The Statue is great, but it's just as great seen from a distance, and you get a superb view of it from the Ferry. The Ferry is cheap (Anil says it's free; it was fifty cents last time I took it) and the view of lower Manhattan is unbeatable.

Similarly, you should avoid the Circle Line, which is a boat trip all the way around Manhattan Island. That sounds good, but it takes all day and you spend a lot of it cruising the not-so-scenic Harlem River. The high point of the trip is the view of lower Manhattan and the harbor. You can get the best parts of the Circle Line trip by taking the Staten Island Ferry, which is much cheaper and omits the dull bits.

Ten years ago I would have said to skip the World Trade Center in favor of the Empire State Building. Well, so much for that suggestion.

Anil says to skip Katz's and the Carnegie Deli, that they're tourist traps. I've never been to Katz's. I would not have advised skipping the Carnegie. I have not been there since 1995, so my view may be out of date, and the place may have changed. But in 1995 I would have said that although it is indeed a tourist trap, the pastrami sandwich is superb nevertheless. At no time, however, would I have advised anyone to eat anything else from there. Get the sandwich and eat it in the comfort of your hotel room, perhaps. But quickly, before it gets cold.

Also in the "go there but only eat one thing" department is Junior's Restaurant, at (I think) Atlantic and De Kalb avenues in Brooklyn. Now here's the thing about Junior's: their cheesecake is justly famous. They guarantee it. It is not your usual guarantee. A typical guarantee would be that if you are not happy with the cheesecake, they will refund your money. That is not Junior's guarantee. No. Junior's guarantees your money back unless their cheesecake is the best you have ever eaten.

Lorrie and I once ordered a cheesecake from Junior's. They ship it overnight, packed in dry ice. Our order was delayed in transit; we called the next day to ask where it was. They apologized and immediately overnighted us a second cheesecake, free, with no further discussion. The next day the two cheesecakes arrived in the mail. Both of them were the best cheesecake I have ever eaten.

But I once went to have dinner at Junior's. This was a mistake. Their cheesecake is so stupendous, I thought, how could their other food possibly fail? As usual, the cheesecake was the best I have ever eaten. But dinner? Not so hot. Do go to Junior's. You don't even have to schlep out to Atlantic Avenue, since they have opened restaurants in Times Square and at Grand Central Station. Get the cheesecake. But eat dinner somewhere else.

Anil says not to eat in the goddamn Olive Garden, and of course he is right. What on earth is the point of going to New York, food capital of this half of the Earth, and eating in the goddamn Olive Garden? You could have done that in Dubuque or Tallahassee or whatever crappy Olive-Garden-loving burg you came from.

If you don't know where to eat in New York, here's my advice: Take the subway to 42nd street, get out, and walk to 9th Avenue. Choose a side of the street by coin flip. Walk north on 9th avenue. Make a note of every interesting-seeming restaurant you pass. After three blocks, you will have passed at least ten interesting-seeming restaurants. Walk back to the most interesting-seeming one and go in, or select one at random. I promise you will have a win, probably a big win. That stretch of 9th Avenue is a paradise of inexpensive but superb restaurants.

I have played the 9th Avenue game many times and it has never failed.

Speaking of "things to skip", I suggest skipping the giant Times Square New Year's Eve celebration, unless you are a pickpocket, in which case you should get there early. Instead, have dinner on 9th Avenue. As you pass each cross-street walking down 9th Avenue, you will be able to see the Times Square crowd two blocks east, and you can pause a moment to think how clever you are to not to be part of it; feeling smugly superior to the writhing mass of humanity is an authentically New York experience. Then have an awesome dinner on 9th Avenue, and take the subway home.

Anil's whole series is pretty good, and as a native New Yorker I found little to disagree with. But I think he may be a little misleading when he says "the natives are friendly and helpful." I would say not. Neither are they unfriendly or unhelpful. What they mostly are, in my experience, is brusque and in a hurry. They will not go out of their way to abuse, harass, or ridicule you; nor will they go out of their way to advise or assist you. The New Yorkers' outlook on the world is that they have important business to attend to, and so, presumably, do you, and everything will run smoothly as long as everyone just stays out of each others' way and attends to their own important business.

In Boston, people will take you personally. I was once thrown out of a liquor store in Boston for daring to ask for a bottle of rye in a manner that the proprietor found offensive. This would never happen in New York. New Yorkers don't have time to be offended by your stupid demands, and they will not throw you out, because they want your money, and if dealing with your stupid demands is what they have to do to get it, well, they will just deal with your stupid demands as quickly as possible. A New York liquor store owner is not in the business of getting offended, and he has more important things to do than to throw you out. He is in the business of taking your money, and if he throws you out, it is because you are getting in the way of his next customer and preventing him from taking his money. Most likely, if you ask for rye, the New York liquor store owner will take your money and give you the rye.

There is a story about Hitler and Goebbels having an argument, with Hitler arguing that the Jews were too inferior to pose any sort of threat, and Goebbels disputing with him, saying that Jews are devious and cunning. To prove his point, Goebbels takes Hitler to a Jewish-run hardware and sundries store and asks the proprietor for a left-handed teapot. The proprietor hesitates a moment, says "let me check in the back room," and returns carrying a teapot in his left hand. "Yes," he says, "I had just one left." As Goebbels and Hitler leave the shop with their left-handed teapot, Goebbels says "I told you the Jews were cunning." Hitler replies "What's so cunning about having one left?"

A Bostonian would have told those two assholes where they could stick their left-handed teapot. That Jew emigrated from Germany, and he did not go to Boston. He went to New York, as did his fifty devious cousins.

But I digress.

In some cities I have visited, there is no convention about which side of the subway stairs are for going up and which are for going down. People just go up whichever side they feel like. In New York, you always travel on the right-hand side of the stairs. Everyone does this, because everyone knows that if they don't they will just get in the way and hold everyone up, including themselves. They have no time for this disorganized nonsense in which people go up whatever side of the stairs suits them.

New Yorkers do not stop and stand in doorways. When New Yorkers need to open their umbrellas, they step aside, and do it out of the way.

New Yorkers are orderly queuers. Disorganized queuing just wastes everyone's time. You don't want to waste everyone's time, do you? So get in line and shut the hell up!

Here in Philadelphia, we waste a lot of time trying to flag down cabs that turn out to be full. New Yorkers would never tolerate such slack management. In New York, taxicabs have a lamp on top that is wired to the taximeter; it lights up when the taxi is empty. That is good business for drivers, for riders, for everyone. I like Philadelphia well enough to have lived here for seventeen years, but it's no New York, let me tell you.

Hong Kong, on the other hand, is a very satisfactory New York. A few years back I visited Hong Kong, food capital of the other half of the Earth, on business, and loved it there. Not least because of the food. The Cantonese are the best cooks in the world, cooks so gifted and brilliant that people all over the world line up on the weekends to eat Cantonese-style garbage, and then come back next weekend to eat it again, because Cantonese garbage, which they call dim sum, but if you think about it for a minute you will realize that dim sum is the week's leftovers, served up in a not-too-subtle disguise, dim sum is more delicious than other cuisines' delicacies. And Hong Kong has the best Cantonese food in the world.

People had warned me beforehand that the Hongkongians were known for being brusque and rude. And that is what I found. Several times in Hong Kong I called up someone or other to try to get something done, and the conversation went roughly like this: I would start my detailed explanation of what I wanted, and why, and the person on the other end of the phone would cut me off mid-sentence, saying something like "You need x; I do y. OK? OK! <click>" and that was the end of it.

As a New Yorker, I recognized immediately what was going on. Brusque, yes, but not rude. I knew that the person on the other end of the phone was thinking that their time was valuable, that I presumably considered my own time valuable, and that we would both be best served if each of us wasted as little of our valuable time as possible in idle chitchat. New Yorkers are just like that too. I gather some people are offended by this behavior, and want the person on the phone to be polite and friendly. I just want them to shut up and do the thing I want done, and in Hong Kong that is what I got.

So if you are a tourist in New York, please try to remember: New Yorkers may appear to be trying to get rid of you as quickly as they can, and if it seems that way, it is probably because they are trying to get rid of you as quickly as they can. But they are doing it because they are trying to help, because they have your best interests at heart. And also because they want to get rid of you as quickly as they can.


[Other articles in category /food] permanent link

Sun, 24 Jun 2007

Do you dream in color?
People have occasionally asked me whether I dreamt in color or on black-and-white, by which I suppose they meant grayscale. This question was strange to me the first time I heard it, because up to then it had not occurred to me that anyone did not dream in color. I still find it strange, and I had to do a Google search to verify that there really are people who claim not to dream in color.

One time, when I replied that I did dream in color, my interlocutor asked me if I was sure: perhaps I dreamt in black and white, but only remembered it as being in color later.

I am sure I dream in color, because on more than one occasion I have had discussions in dreams about colors of objects. I can't remember any examples right now, but it was something like this: "Give me the red apple." "Okay, here." "That is not the red apple, that is the green apple!" And then I looked and saw that the apple I had thought was red was really green.

One could still argue that I wasn't really dreaming in color, that it only seemed like that, or something. It's a delicate philosophical point. One could also argue that I didn't have any dream at all, I only thought I did after I woke up. I suppose the only refutations of such an argument either appeal to neurology or involve a swift kick in the pants.

And then suppose I have a dream in which I take LSD and have marvelous hallucinations. Did I really have hallucinations? Or did I only dream them? If I dream that I kill someone, we agree that it wasn't real, that a dream murder is not a real murder; it is only in your head. But hallucinations, by definition, are only in your head even when they are real, so don't dream hallucinations have as much claim to reality as waking hallucinations?

One might argue that dreamt LSD hallucinations are likely to be qualitatively very different from real LSD hallucinations—less like real LSD hallucinations, say, and more like, well, dreams. But this only refutes the claim that the dream hallucinations were LSD hallucinations. And nobody was going to claim that they were LSD hallucinations anyway, since no actual LSD was involved. So this doesn't address the right question.

Stickier versions of the same problem are possible. For example, suppose I give Bill a little piece of paper and tell him it is impregnated with LSD. It is not, but because of the placebo effect, Bill believes himself to be having an LSD trip and reports hallucinations. There was no LSD involved, so the hallucinations were only imaginary. But even real hallucinations are only imaginary. Are we really justified in saying that Bill is mistaken, that he did not actually hallucinate, but only imagined that he did? That seems like a very difficult position to defend.

I seem to have wandered from the main point, which is that I had another dream last night that supports my contention that I dream in color. I was showing my friend Peter some little homunculi that had been made long ago from colored pipe cleaners, shiny paper, and sequins by my grandmother's friend Kay Seiler. Originally there had been ten of these, but in the dream I had only five. When my grandmother had died, my sister and I had split the set, taking five each. In place of the five originals I was missing, I had five copies, which were identifiable as such because they were in grayscale. Presumably my sister had grayscale copies of the originals I retained. I explained this to Peter, drawing his attention to the five full-color homunculi and the five grayscale ones.

So yes, barring philosophical arguments that I think deserve a kick in the pants, I am sure that I dream in color.


[Other articles in category /brain] permanent link

Thu, 21 Jun 2007

Square triangular numbers
A while back I made the erroneous assertion that no numbers are both square and triangular. As I noted in a followup, this is a rather stupid thing to say, since both 0 and 1 are obvious counterexamples. (36 is a nontrivial counterexample.) Also, a few years before I had actually investigated this very question and had determined that the set of such numbers is infinite. Whoops.

I no longer remember how I solved the problem the first time around, but I was tinkering around with it today and came up with an approach that I think is instructive, or at least interesting.

We want to find non-negative integers a and b such that ½(a2 + a) = b2. Or, equivalently, we want a and b such that √(a2 + a) = b√2.

Now, √(a2 + a) is pretty nearly a + ½. So suppose we could find p and q with a + ½ = b·p/q, and p/q a bit larger than √2. a + ½ is a bit too large to be what we want on the left, but p/q is a bit larger than what we want on the right too. Perhaps the fudging on both sides would match up, and we would get √(a2 + a) = b√2 anyway.

If this magic were somehow to occur, then a and b would be the numbers we wanted.

Finding p/q that is a shade over √2 is a well-studied problem, and one of the things I have in my toolbox, because it seems to come up over and over in the solution of other problems, such as this one. It has interesting connections to several other parts of mathematics, and I have written about it here before.

The theoretical part of finding p/q close to √2 is some thing about continued fractions that I don't want to get into today. But the practical part is very simple. The following recurrence generates all the best rational approximations to √2; the farther you carry it, the better the approximation:

p0 = 1 q0 = 1
pi+1 = pi + 2qi qi+1 = pi + qi
This gives us the following examples:

p q p/q
1 1 1.0
3 2 1.5
7 5 1.4
17 12 1.416666666666667
41 29 1.413793103448276
99 70 1.414285714285714
239 169 1.414201183431953
577 408 1.41421568627451
1393 985 1.414213197969543
3363 2378 1.41421362489487
And in all cases p2 - 2q2 = ±1.

Now, we want a + ½ = b·p/q, or equivalently (2a + 1)/2b = p/q. This means we can restrict our attention to the rows of the table that have q even. This is a good thing, because we need p/q a bit larger than √2, and those are precisely the rows with even q. The rows that have q odd have p/q a bit smaller than √2, which is not what we need. So everything is falling into place.

Let's throw away the rows with q odd, put a = (p - 1)/2 and b = q/2, and see what we get:

pqab½(a2+a) = b2
3 2 1 11
17 12 8 636
99 70 49 351225
577 408 288 20441616
3363 2378 1681 11891413721
Lo and behold, our wishful thinking about the fudging on both sides canceling out has come true, and an infinite set of solutions just pops right out.

I have two points to make about this. One is that I have complained in the past about mathematical pedagogy, how the convention is to come up with some magic-seeming guess ahead of time, as when pulling a rabbit from a hat, and then at the end it is revealed to be the right choice, but what really happened was that the author worked out the whole thing, then saw at the end what he would need at the beginning to make it all work, and went back and filled in the details.

That is not what happened here. My apparent luck was real luck. I really didn't know how it was going to come out. I was really just exploring, trying to see if I could get some insight into the answer without necessarily getting all the way there; I thought I might need to go back and do a more careful analysis of the fudge factors, or something. But sometimes when you go exploring you stumble on the destination by accident, and that is what happened this time.

The other point I want to make is that I've written before about how a mixture of equal parts of numerical sloppiness and algebraic tinkering, with a dash of canned theory, can produce useful results, in a sort of alchemical transmutation that turns base metals into gold, or at least silver. Here we see it happen again.


[Other articles in category /math] permanent link

Sun, 17 Jun 2007

More sawed-off shotguns

[ Note: because this article is in the oops section of my blog, I intend that you understand it as a description of a mistake that I have made. ]

Abhijit Menon-Sen wrote to me to ask for advice in finding the smallest triangular number that has at least 500 divisors. (That is, he wants the smallest n such that both n = (k2 + k)/2 for some integer k and also ν(n) ≥ 500, where ν(n) is the number of integers that divide n.) He said in his note that he believed that brute-force search would take too long, and asked how I might trim down the search.

The first thing that occurred to me was that ν is a multiplicative function, which means that ν(ab) = ν(a)ν(b) whenever a and b are relatively prime. Since n and n-1 are relatively prime, we have that ν(n(n-1)) = ν(n)·ν(n-1), and so if T is triangular, it should be easy to calculate ν(T). In particular, either n is even, and ν(T) = ν(n/2)·ν(n-1), or n is odd, and ν(T) = ν(n)·ν((n-1)/2).

So I wrote a program to run through all possible values of n, calculating a table of ν(n), and then the corresponding ν(n(n-1)/2), and then stopping when it found one with sufficiently large ν.

        my $N = 1;
        my $max = 0;
        while (1) {
            my $n   = $N % 2 ? divisors($N) : divisors($N/2);
            my $np1 = $N % 2 ? divisors(($N+1)/2) : divisors($N+1);
            if ($n * $np1 > $max) {
                $max = $n * $np1;
                print "N=$N; T=", $N*($N+1)/2, "; nd()=$max\n";
            }
            last if $max >= 500;
            $N++;
        }
There may be some clever way to quickly calculate ν(n) in general, but I don't know it. But if you have the prime factorization of n, it's easy: if n = p1a1p2a2... then ν(n) = (a1 + 1)(a2 + 1)... . This is a consequence of the multiplicativity of ν and the fact that ν(pn) is clearly n+1. Since I expected that n wouldn't get too big, I opted to factor n and to calculate ν from the prime factorization:

        my @nd;
        sub divisors {
            my $n = shift;
            return $nd[$n] if $nd[$n];
            my @f = factor($n);
            my $ND = 1;
            my $cur = 0;
            my $curct = 0;
            while (@f) {
                my $next = shift @f;
                if ($next != $cur) {
                    $cur = $next;
                    $ND *= $curct+1;
                    $curct = 1;
                } else {
                    $curct++;
                }
            }
            $ND *= $curct+1;
            return $ND;
        }
Unix comes with a factor program that factors numbers pretty quickly, so I used that:
        sub factor {
            my $r = qx{factor $_[0]};
            my @f = split /\s+/, $r;
            shift @f;
            return @f;
        }
This found the answer, 76,576,500, in about a minute and a half. (76,576,500 = 1 + 2 + ... + 12,375, and has 576 factors.) I sent this off to Abhijit.

I was rather pleased with myself, so I went onto IRC to boast about my cleverness. I posed the problem, and rather than torment everyone there with a detailed description of the mathematics, I just said that I had come up with some advice about how to approach the problem that turned out to be good advice.

A few minutes later one of the gentlemen on IRC, who goes by "jeek", (real name T.J. Eckman) asked me if 76,576,500 was the right answer. I said that I thought it was and asked how he'd found it. I was really interested, because I was sure that jeek had no idea that ν was multiplicative or any of that other stuff. Indeed, his answer was that he used the simplest possible brute force search. Here's jeek's program:

        $x=1; $y=0; 
        while(1) { 
          $y += $x++; $r=0; 
          for ($z=1; $z<=($y ** .5); $z++) { 
            if (($y/$z) == int($y/$z)) { 
              $r++; 
              if (($y/$z) != ($z)) { $r++; } 
           }
          } 
          if ($r>499) {print "$y\n";die}
        }
(I added whitespace, but changed nothing else.)

In this program, the variable $y holds the current triangular number. To calculate ν(y), this program just counts $z from 1 up to √y, incrementing a counter every time it discovers that z is a divisor of y. If the counter exceeds 499, the program prints y and stops. This takes about four and a half minutes.

It takes three times as long, but uses only one-third the code. Beginners may not see this as a win, but it is a huge win. It is a lot easier to reduce run time than it is to reduce code size. A program one-third the size of another is almost always better—a lot better.

In this case, we can trim up some obvious inefficiencies and make the program even smaller. For example, the tests here can be omitted:

              if (($y/$z) != ($z)) { $r++; }
It can yield false only if y is the square of z. But y is triangular, and triangular numbers are never square. And we can optimize away the repeated square root in the loop test, and use a cheaper and simpler $y % $z == 0 divisibility test in place of the complicated one.

        while(1) { 
          $y += $x++; $r=0;
          for $z (1 .. sqrt($y)) {
            $y % $z == 0 and $r+=2; 
          }
          if ($r>499) {print "$x $y\n";die}
        }
The program is now one-fifth the size of mine and runs in 75 seconds. That is, it is now smaller and faster than mine.

This shows that jeek's approach was the right one and mine was wrong, wrong, wrong. Simple programs are a lot easier to speed up than complicated ones. And I haven't even consider the cost of the time I wasted writing my complicated program when I could have written Jeek's six-liner that does the same thing.

So! I had to write back to my friend to tell him that my good advice was actually bad advice.

The sawed-off shotgun wins again!

[ Addendum 20070405: Robert Munro pointed out an error in the final version of the program, which I have since corrected. ]

[ Addendum 20070405: I said that triangular numbers are never square, which is manifestly false, since 1 is both, as is 36. I should have remembered this, since some time in the past three years I investigated this exact question and found the answer. But it hardly affects the program. The only way it could cause a failure is if there were a perfect square triangular number with exactly 499 factors; such a number would be erroneously reported as having 500 factors instead. But the program reports a number with 576 factors instead. Conceivably this could actually be a misreported perfect square with only 575 factors, but regardless, the reported number is correct. ]

[ Addendum 20060617: I have written an article about triangular numbers that are also square. ]


[Other articles in category /oops] permanent link

Sat, 16 Jun 2007

Fanny Trollope arrives in America
More than a year ago I mentioned Fanny Trollope's book Domestic Manners of the Americans, which I have at long last checked out of the library and begun to read. I am only at page 40 or so, but it is easy going, and entertaining.

Trollope's book begins with her arrival from Europe in New Orleans. I was drawn in early on by the following passage, which appears on page 5:

The land is defended from the encroachments of the river by a high embankment which is called the Levée; without which the dwellings would speedily disappear, as the river is evidently higher than the banks would be without it. . . . She was looking so mighty, and so unsubdued all the time, that I could not help fancying she would some day take the matter into her own hands again, and if so, farewell to New Orleans.

The book was published in 1832.

It is available online.


[Other articles in category /book] permanent link

Thu, 14 Jun 2007

Harry Potter and the Goblet of Fire
I have not been too impressed with the Harry Potter books. I read them all, one at a time, on airplanes. They are good for this because they are fat, undemanding, and readily available in airport bookshops for reasonable prices. In a lot of ways they are badly constructed, but there is really no point in dwelling on their flaws. The Potter books have been widely criticized already from all directions, and so what? People keep buying them.

But The Goblet of Fire has been bothering me for years now, because its plot is so very stupid. I am complaining about it here in my blog because it continues to annoy me, and I hope to forget about it after I write this. The rest of this article will contain extensive spoilers, and I will assume that you either know it all already or that you don't care.

The bad guys want to kill Harry Potter, the protagonist. The Triwizard Tournament is being held at Harry's school. In the tournament, the school champions must overcome several trials, the last of which is to race through a maze and grab the enchanted goblet at the center of the maze. The bad guys' plan is this: they will enter Harry in the tournament. They will interfere subtly in the tournament, to ensure that Harry is first to lay hands on the goblet. They will enchant the goblet so that it is a "portkey", and whoever first touches it will be transported into their evil clutches.

They need an evil-doer on the spot, to interfere in the competition in Harry's favor; if he is eliminated early, or fails to touch the goblet first, all their plotting will be for naught. So they abduct and imprison Mad-eye Moody, a temporary faculty member and a famous capturer of evil-doers, and enchant one of their own to impersonate him for the entire school year.

The badness of this plan is just mind-boggling. Moody is a tough customer. If they fail to abduct him, or if he escapes his year-long captivity, their plans are in the toilet. If the substitution is detected, their plans are in the toilet. Their fake Moody will be teaching a class in "Defense Against the Dark Arts", a subject in which the real Moody has real expertise that the substitute lacks; the substitute somehow escapes detection on this front. For several months the fake Moody will be eating three meals a day with a passel of witches and wizards who are old friends with the real Moody, and among whom is Albus Dumbledore, who supposedly is not a complete idiot; the substitute somehow escapes detection on this front as well.

Even with the substitution accomplished, the bad guys' task is far from easy. Harry procrastinates everything he can and it's all they can do to arrange that he is not eliminated from the tournament. None of the other champions are either, and the villains have a tough problem to make sure that he is first through the maze.

Here is an alternative plan, which apparently did not occur to the fearsome Lord Voldemort: instead of making the Goblet of Fire into a portkey, he should enchant a common object, say a pencil. We know this is possible, since it has been explicitly established that absolutely any object can be a portkey, and the first instance of one that we see appears to be an abandoned boot. Then, since fake Moody is teaching Harry's class, sometime during the first week of the term he should ask Harry to stay behind on some pretext, and then say "Oh, Harry, would you please pass me that pencil over there?" After Harry is dead, fake Moody can disappear. A little thought will no doubt reveal similar plans that involve no substitutions or imprisonments: send Harry a booby-trapped package in the mail, or enchant his socks, or something of the sort.

In fact, they do something like this in one of the later books; they sell another character, I think Ginny Weasley, some charm that puts her under their control. This is a flub already, because they should have sold it to Harry instead—duh—and then had him kill himself. Or they could have sold him a portkey. Or an exploding candy. But I don't want to belabor the point.

Normally I have no trouble suspending my disbelief in matters like this. I can forgive a little ineptness on the part of the master schemers, because I am such an inept schemer that I usually don't notice. When evil plots seem over-elaborate and excessively risky to me, I just imagine that it seems that way because evil plots are so far outside my area of expertise, and read on. But in The Goblet of Fire I couldn't do this. My enjoyment of the book was disrupted by the extreme ineptness of the evil scheme.

One of Rowling's recurring themes is the corruption and ineptness of the ostensibly benevolent government. But perhaps this incompetence is a good thing. If the good guys had been less incompetent in the past, the bad guys might have had to rise to the occasion, and would have stomped Harry flat in no time. Lulled into complacency by years of ineffective opposition, they become so weak and soft that they are defeated by a gang of teenagers.

Okay, that's off my chest now. Thanks for your forbearance.


[Other articles in category /book] permanent link

Wed, 13 Jun 2007

How to calculate binomial coefficients
The binomial coefficient $n\choose k$ is usually defined as:

$${n\choose k} = {n!\over k!(n-k)!}$$

This is a fine definition, brief, closed-form, easy to prove theorems about. But these good qualities seduce people into using it for numerical calculations:

        fact 0 = 1
        fact (n+1) = (n+1) * fact n

        choose n k = (fact n) `div` ((fact k)*(fact (n-k)))
(Is it considered bad form among Haskellites to use the n+k patterns? The Haskell Report is decidedly ambivalent about them.)

Anyway, this is a quite terrible way to calculate binomial coefficients. Consider calculating $100\choose 2$, for example. The result is only 4950, but to get there the computer has to calculate 100! and 98! and then divide these two 150-digit numbers. This requires the use of bignums in languages that have bignums, and causes an arithmetic overflow in languages that don't. A straightforward implementation in C, for example, drops dead with an arithmetic exception; using doubles instead, it claims that the value of $100\choose 2$ is -2147483648. This is all quite sad, since the correct answer is small enough to fit in a two-byte integer.

Even in the best case, $2n\choose n$, the result is only on the order of 4n, but the algorithm has to divide a numerator of about 4nn2n by a denominator of about n2n to get it.

A much better way to calculate values of $n\choose k$ is to use the following recurrence:

$${n+1\choose k+1} = {n+1\over k+1}{n\choose k}$$

This translates to code as follows:

        choose n 0 = 1
        choose 0 k = 0
        choose (n+1) (k+1) = (choose n k) * (n+1) `div` (k+1)
This calculates $8\choose 4$ as ${5\over1}{6\over2}{7\over3}{8\over4} $. None of the intermediate results are larger than the final answer.

An iterative version is also straightforward:

        unsigned choose(unsigned n, unsigned k) {
          unsigned r = 1;
          unsigned d;
          if (k > n) return 0;
          for (d=1; d <= k; d++) {
            r *= n--;
            r /= d;
          }
          return r;
        }
This is speedy, and it cannot cause an arithmetic overflow unless the final result is too large to be represented.

It's important to multiply by the numerator before dividing by the denominator, since if you do this, all the partial results are integers and you don't have to deal with fractions or floating-point numbers or anything like that. I think I may have mentioned before how much I despise floating-point numbers. They are best avoided.

I ran across this algorithm last year while I was reading the Lilavati, a treatise on arithmetic written about 850 years ago in India. The algorithm also appears in the article on "Algebra" from the first edition of the Encyclopaedia Britannica, published in 1768.

So this algorithm is simple, ancient, efficient, and convenient. And the problems with the other algorithm are obvious, or should be. Why isn't this better known?

[ Addendum 20070613: There is a followup article to this one. ]


[Other articles in category /math] permanent link

How to calculate binomial coefficients, again
Yesterday's article about how to calculate binomial coefficients was well-received. It was posted on Reddit, and to my surprise and gratification, the comments were reasonably intelligent. Usually when a math article of mine shows up on Reddit, all the megacretins come out of the woodwork to say what an idiot I am, and why don't I go back to school and learn basic logic.

A couple of people pointed out that, contrary to what I asserted, the algorithm I described can in fact overflow even when the final result is small enough to fit in a machine word. Consider $8\choose 4$ for example. The algorithm, as I wrote it, calculates intermediate values 8, 8, 56, 28, 168, 56, 280, 70, and 70 is the final answer. If your computer has 7-bit machine integers, the answer (70) will fit, but the calculation will overflow along the way at the 168 and 280 steps.

Perhaps more concretely, $35\choose11$ is 417,225,900, which is small enough to fit in a 32-bit unsigned integer, but the algorithm I wrote wants to calculate this as $35{34\choose10}\over11$, and the numerator here is 4,589,484,900, which does not fit.

One Reddit user suggested that you can get around this as follows: To multiply r by a/b, first check if b divides r. If so, calculate (r/ba; otherwise calculate r·(a/b). This should avoid both overflow and fractions.

Unfortunately, it does not. A simple example is ${14\choose4} = {11\over1}{12\over2}{13\over3}{14\over4}$. After the first three multiplications one has 286. One then wants to multiply by 14/4. 4 does not divide 286, so the suggestion calls for multiplying 286 by 14/4. But 14/4 is 3.5, a non-integer, and the goal was to use integer arithmetic throughout.

Order
The Art of Computer Programming: Volume 2, Seminumerical Algorithms
The Art of Computer Programming: Volume 2, Seminumerical Algorithms
with kickback
no kickback
Fortunately, this is not hard to fix. Say we want to multiply r by a/b without overflow or fractions. First let g be the greatest common divisor of r and b. Then calculate (r/g) · (a/(b/g)). In the example above, g is 2, and we calculate (286/2) · (14/2) = 143 · 7; this is the best we can do.

I haven't looked, but it is hard to imagine that Volume II of Knuth doesn't discuss this in exhaustive detail, including all the stuff I just said, plus a bunch of considerations that hadn't occurred to any of us.

A few people also pointed out that you can save time when n > m/2 by calculating $m\choose m-n$ instead of $m \choose n$. For example, instead of calculating $100\choose98$, calculate $100\choose 2$. I didn't mention this in the original article because it was irrelevant to the main point, and because I thought it was obvious.


[Other articles in category /math] permanent link

Software archaeology
For appropriate values of "everyone", everyone knows that Unix files do not record any sort of "creation time". A fairly frequently asked question in Unix programming forums, and other related forums, such as Perl programming forums, is how to get the creation date of a file; the answer is that you cannot do that because it is not there.

This lack is exacerbated by several unfortunate facts: creation times are available on Windows systems; the Unix inode contains three timestamps, one of which is called the "ctime", and the "c" is suggestive of the wrong thing; Perl's built-in stat function overloads the return value to return the Windows creation time in the same position (on Windows) as it returns the ctime (on Unix).

So we see questions like this one, which appeared this week on the Philadelphia Linux Users' Group mailing list:

How does one check and change ctime?
And when questioned as to why he or she wanted to do this, this person replied:

We are looking to change the creation time. From what I understand, ctime is the closest thing to creation time.
There is something about this reply that irritates me, but I'm not quite sure what it is. Several responses come to mind: "Close" is not sufficient in system programming; the ctime is not "close" to a creation time, in any sense; before you go trying to change the thing, you ought to do a minimal amount of research to find out what it is. It is a perfect example of the Wrong Question, on the same order as that poor slob all those years ago who wanted to know how to tell if a file was a hard link or a soft link.

But anyway, that got me thinking about ctimes in general, and I did some research into the history and semantics of the thing, and made some rather surprising discoveries.

One good reference for the broad outlines of early Unix is the paper that Dennis Ritchie and Ken Thompson published in Communications of the ACM in 1974. This was updated in 1978, but the part I'm quoting wasn't revised and is current to 1974. Here is what it has to say about the relevant parts of the inode structure:

IV. IMPLEMENTATION OF THE FILE SYSTEM

... The entry found thereby (the file's i-node) contains the description of the file:
... time of creation, last use, and last modification
An error? I don't think so. Here is corroborating evidence, the stat man page from the first edition of Unix, from 1971:

NAME        stat -- get file status
SYNOPSIS    sys      stat; name; buf / stat = 18.
DESCRIPTION name points to a null-terminated string naming a file; buf is the
            address of a 34(10) byte buffer into which information is placed
            concerning the file. It is unnecessary to have any permissions at all
            with respect to the file, but all directories leading to the file
            must be readable.
            After stat, buf has the following format:
            buf, +1             i-number
            +2, +3              flags (see below)
            +4                  number of links
            +5                     user ID of owner size in bytes
            +6,+7            size in bytes
            +8,+9            first indirect block or contents block
            ...
            +22,+23             eighth indirect block or contents block
            +24,+25,+26,+27 creation time
            +28,+29, +30,+31 modification time
                +32,+33         unused
(Dennis Ritchie provides the Unix first edition manual; the stat page is in section 2.1.)

Now how about that?

When did the ctime change from being called a "creation time" to a "change time"? Did the semantics change too, or was the "creation time" description a misnomer? If I can't find out, I might write to Ritchie to ask. But this is, of course, a last resort.

In the meantime, I do have the source code for the fifth edition kernel, but it appears that, around that time (1975 or so), there was no creation time. At least, I can't find one.

The inode operations inside the kernel are defined to operate on struct inodes:

	struct inode {
		char    i_flag;
		char    i_count;
		int     i_dev;
		int     i_number;
		int     i_mode;
		char    i_nlink;
		char    i_uid;
		char    i_gid;
		char    i_size0;
		char    *i_size1;
		int     i_addr[8];
		int     i_lastr;
	} inode[NINODE];
The i_lastr field is what we would now call the atime. (I suppose it stands for "last read".) The mtime and ctime are not there, because they are not stored in the in-memory copy of the inode. They are fetched directly from the disk when needed.

We can see an example of this in the stat1 function, which is the backend for the stat and fstat system calls:

     1	stat1(ip, ub)
     2	int *ip;
     3	{
     4	        register i, *bp, *cp;
     5	
     6	        iupdat(ip, time);
     7	        bp = bread(ip->i_dev, ldiv(ip->i_number+31, 16));
     8	        cp = bp->b_addr + 32*lrem(ip->i_number+31, 16) + 24;
     9	        ip = &(ip->i_dev);
    10	        for(i=0; i<14; i++) {
    11	                suword(ub, *ip++);
    12	                ub =+ 2;
    13	        }
    14	        for(i=0; i<4; i++) {
    15	                suword(ub, *cp++);
    16	                ub =+ 2;
    17	        }
    18	        brelse(bp);
    19	}
ub is the user buffer into which the stat data will be deposited. ip is the inode structure from which most of this data will be copied. The suword utility copies a two-byte unsigned integer ("short unsigned word") from source to destination. This is done starting at the i_dev field (line 9), which effectively skips the two earlier fields, i_flag and i_count, which are internal kernel matters that are none of the user's business.

14 words are copied from the inode structure starting from this position, including the device and i-number fields, the mode, the link count, and so on, up through the addresses of the data or indirect blocks. (In modern Unixes, the stat call omits these addresses.) Then four words are copied out of the cp buffer, which has been read from the inode actually on the disk; these eight bytes are at position 24 in the inode, and ought to contain the mtime and the ctime. The question is, which is which? This simple question turns out to have a surprisingly complicated answer.

When an inode is modified, the IUPD flag is set in the i_flag member. For example, here is chmod, which modifies the inode but not the underlying data. On a modern unix system, we would expect this to update the ctime, but not the mtime. Let's see what it does in version 5:

     1	chmod()
     2	{
     3	        register *ip;
     4	
     5	        if ((ip = owner()) == NULL)
     6	                return;
     7	        ip->i_mode =& ~07777;
     8	        if (u.u_uid)
     9	                u.u_arg[1] =& ~ISVTX;
    10	        ip->i_mode =| u.u_arg[1]&07777;
    11	        ip->i_flag =| IUPD;
    12	        iput(ip);
    13	}
Line 10 is the important one; it sets the mode on the in-memory copy of the inode to the argument supplied by the user. Then line 11 sets the IUPD flag to indicate that the inode has been modified. Line 12 calls iput, whose principal job is to maintain the kernel's internal reference count of the number of file descriptors that are attached to this inode. When this number reaches zero, the inode is written back to disk, and discarded from the kernel's open file table. The iupdat function, called from iput, is the one that actually writes the modified inode back to the disk:

     1	iupdat(p, tm)
     2	int *p;
     3	int *tm;
     4	{
     5		register *ip1, *ip2, *rp;
     6		int *bp, i;
     7	
     8		rp = p;
     9		if((rp->i_flag&(IUPD|IACC)) != 0) {
    10			if(getfs(rp->i_dev)->s_ronly)
    11				return;
    12			i = rp->i_number+31;
    13			bp = bread(rp->i_dev, ldiv(i,16));
    14			ip1 = bp->b_addr + 32*lrem(i, 16);
    15			ip2 = &rp->i_mode;
    16			while(ip2 < &rp->i_addr[8])
    17				*ip1++ = *ip2++;
    18			if(rp->i_flag&IACC) {
    19				*ip1++ = time[0];
    20				*ip1++ = time[1];
    21			} else
    22				ip1 =+ 2;
    23			if(rp->i_flag&IUPD) {
    24				*ip1++ = *tm++;
    25				*ip1++ = *tm;
    26			}
    27			bwrite(bp);
    28		}
    29	}
What is going on here? p is the in-memory copy of the inode we want to update. It is immediately copied into a register, and called by the alias rp thereafter. tm is the time that the kernel should write into the mtime field of the inode. Usually this is the current time, but the smdate system call ("set modified date") supplies it from the user instead.

Lines 16–17 copy the mode, link count, uid, gid, "size", and "addr" fields from the in-memory copy of the inode into the block buffer that will be written back to the disk. Lines 18–22 update the atime if the IACC flag is set, or skip it if not. Then, if the IUPD flag is set, lines 24–25 write the tm value into the next slot in the buffer, where the mtime is stored. The bwrite call on line 27 commits the data to the disk; this results in a call into the appropriate device driver code.

There is no sign of updating the ctime field, but recall that we started this search by looking at what the chmod call does; it sets IUPD, which eventually results in the updating of the mtime field. So the mtime field is not really an mtime field as we now know it; it is doing the job that is now done by the ctime field. And in fact, the dump command predicates its decision about whether to dump a file on the contents of the mtime field. Which is really the ctime field. So functionally, dump is doing the same thing it does now.

It's possible that I missed it, but I cannot find the advertised creation time anywhere. The logical place to look is in the maknode function, which allocates new inodes. The maknode function calls ialloc to get an unused inode from the device, and this initializes its mode (as specified by the user), its link count (to 1), and its uid and gid (to the current process's uid and gid). It does not set a creation time. The ialloc function is fairly complicated, but as far as I can tell it is not setting any creation time either.

Working it from the other end, asking who might look at the ctime field, we have the find command, which has a -mtime option, but no -ctime option. The dump command, as noted before, uses the mtime. Several commands perform stat calls and declare structs to hold the result. For example, pr, which prints files with nice pagination, declares a struct inode, which is the inode as returned by stat, as opposed to the inode as used internally by the kernel—what we would call a struct stat now. There was no /usr/include in the fifth edition, so the pr command contains its own declaration of the struct inode. It looks like this:

struct inode {
        int dev;
	...
        int atime[2];
        int mtime[2];
};
No sign of the ctime, which would have been after the mtime field. (Of course, it could be there anyway, unmentioned in the declaration, since it is last.) And similarly, the ls command has:

struct ibuf {
	int	idev;
	int	inum;
        ...
	char	*iatime[2];
	char	*imtime[2];
};
A couple of commands have extremely misleading declarations. Here's the struct inode from the prof command, which prints profiling reports:

struct inode {
        int     idev;
        ...
        int ctime[2];
        int mtime[2];
        int fill;
};
The atime field has erroneously been called ctime here, but it seems that since prof does not use the atime, nobody noticed the bug. And there's a mystery fill field at the end, as if prof is expecting one more field, but doesn't know what it will be for. The declaration of ibuf in the ln command has similar oddities.

So the creation time advertised by the CACM paper (1974) and the version 1 manual (1971) seems to have disappeared by the time of version 5 (1975), if indeed it ever existed.

But there was some schizophrenia in the version 5 system about whether there was a third date in addition to the atime and the mtime. The stat call copied it into the stat buffer, and some commands assumed that it would be there, although they weren't sure what it would be for, and none of them seem look at it. It's quite possible that there was at one time a creation date, which had been eliminated by the time of the fifth edition, leaving behind the vestigial remains we saw in commands like ln and prof and in the code of the stat1 function.

Order
Lions' Commentary on Unix 6th Edition
Lions' Commentary on Unix 6th Edition
with kickback
no kickback
Functionally, the version 5 mtime is actually what we would now call the ctime: it is updated by operations like chmod that in modern Unix will update the ctime but not the mtime. A quick scan of the Lions Book suggests that it was the same way in version 6 as well. I imagine that the ctime-mtime distinction arose in version 7, because that was the last version before the BSD/AT&T fork, and nearly everything common to those two great branches of the Unix tree was in version 7.

Oh, what the hell; I have the version 7 source code; I may as well look at it. Yes, by this time the /usr/include/sys/stat.h file had been invented, and does indeed include all three times in the struct stat. So the mtime (as we now know it) appears to have been introduced in v7.

One sometimes hears that early Unix had atime and mtime, and that ctime was introduced later. But actually, it appears that early Unix had atime and ctime, and it was the mtime that was introduced later. The confusion arises because in those days the ctime was called "mtime".

Addendum: It occurs to me now that the version 5 mtime is not precisely like the modern ctime, because it can be set via the smdate call, which is analogous to the modern utime call. The modern ctime cannot be set at all.


Order
The C Programming Language
The C Programming Language
with kickback
no kickback
(Minor trivium: line 22 of iupdat is ip1 =+ 2. In modern C, we would write ip1 += 2. The =+ and =- operators had turned out to be a mistake, because people would write i=-1, intending i = -1, but the compiler would understand it as i =- 1, producing subtle bugs. The spellings of the operators were changed to avoid these bugs. The change from =+ to += was complete by the time K&R first edition was published in 1978: K&R mentions the old-style operators and says that the are obsolete. In spite of this, the Sun compiler I used in 1987 would still produce a warning for i=-1, despite interpreting it as i = -1. I believe this was because it was PCC-derived, and all PCC compilers emitted this warning. In the fifth edition code, we can see the obsolete form still in use.)

(Totally peripheral addendum: Google search for dmr puts Dennis M. Ritchie in fourth position, not the first. Is this grave insult to our community to be tolerated? I think not! It must be avenged! With fire and steel!)

[ Addendum 20060127: Unix source code prior to the fifth edition is lost. The manuals for the third and fourth editions are available from the Unix Heritage Society. The manual for the third edition (February 1973) mentions the creation time, but by the fourth edition (November 1973) the stat(2) man page no longer mentions a creation time. In v4, the two dates in the stat structure are called actime (modern atime) and modtime (modern mtime/ctime). ]



[Other articles in category /Unix] permanent link

Sun, 10 Jun 2007

Backsies

Order
A Bargain for Frances
A Bargain for Frances
with kickback
no kickback
One of the books in the bedtime-reading rotation for my daughter Iris is A Bargain for Frances, by Russell and Lillian Hoban. (Russell Hoban is also the author of a number of acclaimed novels for adults, most notably Riddley Walker.) The plot and character relationships in A Bargain for Frances are quite complex, probably about at the limit of what a two-year-old can handle. I will try to summarize.

Frances the badger is having a tea party with her friend Thelma, who has previously behaved abusively to her. Thelma's tea set is plastic, with red flowers. Frances is saving up her money for a real china tea set with blue pictures. Thelma asserts that those tea sets are no longer made, and that they are prohibitively expensive. She offers to sell Frances her own tea set, in return for Frances's savings of $2.17. Frances agrees. End of act I.

When Frances returns home with the plastic tea set, her little sister Gloria criticizes it, saying repeatedly that it is "ugly". She reports that the china kind with blue pictures is available in the local candy store for $2.07, and that Thelma knows this. Frances rushes to the candy store, where she witnesses Thelma buying a china tea set with her money. End of act II.

There is an act III, but I do not want to spoil the ending.

There is quite a lot here to engage the mind of a two-year-old: what does it mean to make a trade, for example? And Thelma is quite devious in the way she talks up the benefits of her plastic tea set ("It does not break, unless you step on it") while dissembling her own desire for a china one. Iris has not yet learned to deceive others for her own benefit, and I think this is her first literary exposure to the idea.

I mentioned at one point that Thelma had told a lie: she had said "I don't think they make that kind [of tea set] anymore" when she knew that the very tea set was available at the candy store. Iris was very interested by this observation. She asked me repeatedly, over a period of a several weeks, to explain to her what a lie was. I had some trouble, because I did not have any good examples to draw on. Iris does not do it yet, and Lorrie and I do not lie to Iris either.

One time I tried to explain lies by telling Iris about how people sometimes tell children that if they do not behave, goblins will come and take them away. Of course, this didn't work. First I had to explain what goblins were. Iris was very disturbed at the thought of goblins that might take her away. I had to reassure Iris that there were no goblins. We got completely sidetracked on a discussion of goblins. I should have foreseen this, but it was the best example I was able to come up with on the spur of the moment.

Later I thought of a better example, with no distracting goblins: suppose Iris asks for raspberries, and I know there are some in the refrigerator, but I tell her that we have none, because I want to eat them myself. I think this was just a little bit too complicated for Iris. It has four parts, and I try to keep explanations to three parts, which seems to be about the maximum that she can follow at once. (Two parts is even better.) I think Iris attached too much significance to the raspberries; for a while she seemed to think that lying had something to do with raspberries.

Oh well, at least I tried. She will catch on soon enough, I am sure.

Perhaps the most complex idea in the book is this: when Frances and Thelma agree to trade money for tea set, they agree on "no backsies". This is an important plot point. After the second or third reading, Iris asked me what "no backsies" meant.

I had to think about this carefully before I answered, because it is quite involved, and until I thought it through, I was not sure I understood it myself. You might want to think about this before reading on. Remember that it's not enough to understand it; you have to be able to explain it.

My understanding of "no backsies" was that normally, when friends trade, there is an assumption that the exchange may be unilaterally voided by either party, as long as this is done timely. You can come back the next day and say you have changed your mind, and your friend, being your friend, is expected to consent. Specifying "no backsies" establishes an advance agreement that this is not the case. If you come back the next day, your friend can protest "but we said there were no backsies on this" and refuse to undo the trade. (The trade can, of course, be voided later if both parties agree.)

So to understand this, you must first understand what it means to trade, and why. Iris took this in early on, and fairly easily. You also have to understand the idea that one or both parties might want to change their minds later; this is also something Iris can get her head around. Toddlers know all about what it means to change one's mind.

But then you have to understand that one party might want to annul the agreement and the other party might not. Tracking two people's independent and conflicting desires is probably a little too hard for Iris at this stage. She can sometimes understand another person's point of view, by identification. ("You sometimes feel like x; here this other person feels the same way.") And similarly she can immerse herself in the world-view of the protagonist of a book, and understand that the protagonist's desires might be frustrated by another character. But to immerse herself in both world-views simultaneously is beyond her.

"No backsies" goes beyond this: you have to understand the idea that an agreement might have default, unspoken conventions, and that the participants will adhere to these conventions even if they don't want to; this is not something that two-year-olds are good at doing yet. You have to understand the idea of an explicit modification to the default conditions; that part is not too hard, and everyday examples abound. But then you have to understand what the unspoken convention actually is, and how it is being modified, and the difference between a unilateral annulment of an agreement and a bilateral one. Again, I think it's the bilaterality that's hard for Iris to understand. She is still genuinely puzzled when I tell her we should leave the public restroom clean for the next person.

Really, though, the main difficulty is just that the idea is very complicated. Maybe I'm wrong about which parts are harder and which parts are easier, and perhaps Iris can understand any of the pieces separately. But at two years old she can't yet sustain a train of thought as complicated as the one required to put all the pieces of "no backsies" together. This sort of understanding is one of the essential components of being an adult, and she will get it sooner or later; probably sooner.

This is not the only part of the book that repays careful thought. At one point, during Thelma's monologue about the unavailability of china tea sets, she says:

I know another girl who saved up for that tea set. Her mother went to every store and could not find one. Then that girl lost some of her money and spent the rest on candy. She never got the tea set. A lot of girls never do get tea sets. So maybe you won't get one.
One evening my wife Lorrie asked me who I thought Thelma was speaking about in that passage. I replied that I had always understood it as a pure fabrication, and that there was no "other girl".

Lorrie said that she thought that Thelma had been speaking about herself, that Thelma had saved up her money, and her mother had gone looking for a china tea set, been unable to find one, and had brought home the plastic set as a consolation prize.

The crucial clue was the detail about how the "other girl" spent the rest of her money on candy, which is just a bit too specific for a mere fabrication.

Once you try out the hypothesis that Thelma is speaking personally, a lot of other details fall into place. For example, her assertion that "A lot of girls never do get tea sets" is no longer a clever invention on her part: she is repeating something her mother told her to shut her up when she expressed her disappointment over receiving a plastic instead of a china tea set. Her sales pitch to Frances about why a plastic tea set is better than a china one can be understood as an echo of her mother's own attempts to console her.

My wife is very clever, and was an English major to boot. She is skilled at noticing such things both by native talent and by long training of that talent.

Good children's literature does reward a close reading, and like good adult literature, reveals additional depths on multiple readings. It seems to me that books for small children are more insipid than they used to be, but that could just be fuddy-duddyism, or it could be selection bias: I no longer remember the ones I loved as a child that would now seem insipid precisely because they would now seem insipid.

But the ability to produce good literature at any level is rare, so it is probably just that there only a few great writers in every generation can do it. Russell Hoban was one of the best here.


[Other articles in category /book] permanent link

Fri, 08 Jun 2007

Counting transitive relations
A relation on a set S is merely a subset of S×S. For example, the relation < on the set {1,2,3} can be identified as {(1,2), (1,3), (2,3)}, the set of all (a, b) with a < b.

A relation is transitive if, whenever it has both (a, b) and (b, c), it also has (a, c).

For the last week I've been trying to find a good way to calculate the number of transitive relations on a set with three elements.

There are 13 transitive relations on a set with 2 elements. This is easy to see. There are 16 relations in all. The only way a relation can fail to be transitive is to contain both (1, 2) and (2, 1). There are clearly four such relations. Of these four, the only one that is transitive has (1, 1) and (2, 2) also. Similarly it's quite easy to see that there are only 2 relations on a 1-element set, and both are transitive.

There are 512 relations on a set with 3 elements. How many are transitive?

It would be very easy to write a computer program to check them all and count the transitive ones. That is not what I am after here. In fact, it would also be easy to enumerate the transitive relations by hand; 512 is not too many. That is not what I am after either. I am trying to find some method or technique that scales reasonably well, well enough that I could apply it for larger n.

No luck so far. Relations on 3-sets can fail to be transitive in all sorts of interesting ways. Say that a relation has the Fabc property if it contains (a,b) and (b,c) but not (a,c). Such a relation is intransitive.

Now clearly there are 64 Fabc relations for each distinct choice of a, b, and c. But some of these properties overlap. For example, {(a,b), (b,c), (c,a)} has not only the Fabc property but also the Fbca and Fcab properties.

Of the 64 relations with the Fabc property, 16 have the Fbca property also. 16 have the Faba property. None have the Facb property. There are 12 of these properties, and they overlap in a really complicated way.

After a week I gave in and looked in the literature. I have a couple of papers in my bag I haven't read yet. But it seems that there is no simple solution, which is reassuring.

One problem is that the number of relations on n elements grows very rapidly (it's 2n2) and the number of transitive relations is a good-sized fraction of these.


[Other articles in category /math] permanent link

Wed, 16 May 2007

Moziz Addums
Last July at a porch sale I obtained a facsimile copy of Housekeeping in Old Virginia, by M.C. Tyree, originally published in 1879. I had been trying to understand the purpose of ironing. Ironing makes the clothes look nice, but it must have also served some important purpose, essential for life, that I don't now understand. In the Laura Ingalls Wilder Little House books, Laura recounts a common saying that scheduled the week's work:

Wash on Monday
Iron on Tuesday
Mend on Wednesday
Churn on Thursday
Clean on Friday
Bake on Saturday
Rest on Sunday

You bake on Saturday so that you have fresh bread for Sunday dinner. You wash on Monday because washing is backbreaking labor and you want to do it right after your day of rest. You iron the following day before the washed clothes are dirty again. But why iron at all? If you don't wash the clothes or clean the house, you'll get sick and die. If you don't bake, you won't have any bread, and you'll starve. But ironing? In my mind it was categorized with dusting, as something people with nice houses in the city might do, but not something that Ma Ingalls, three miles from the nearest neighbor, would concern herself with.

But no. Ironing, and starching with the water from boiled potatoes, was so important that it got a whole day to itself, putting it on par with essential activities like cleaning and baking. But why?

A few months later, I figured it out. In this era of tumble-drying and permanent press, I had forgotten what happens to fabrics that are air dried, and did not understand until I was on a trip and tried to air-dry a cotton bath towel. Air-dried fabrics come out not merely wrinkled but corrugated, like an accordion, or a washboard, and are unusable. Ironing was truly a necessity.

Anyway, I was at this porch sale, and I hoped that this 1879 housekeeping book might provide the answer to the ironing riddle. It turned out to be a cookbook. There is plenty to say about this cookbook anyway. It comes recommended by many notable ladies, including Mrs. R.B. Hayes. (Her husband was President of the United States.) She is quoted on the flyleaf as being "very much pleased" with the cookbook.

Some of the recipes are profoundly unhelpful. For example, p.106 has:

Boiled salmon. After the fish has been cleaned and washed, dry it and sew it up in a cloth; lay in a fish-kettle, cover with warm water, and simmer until done and tender.

Just how long do I simmer it? Oh, until it is "done" and "tender". All right, I will just open up the fish kettle and poke it to see. . . except that it is sewed up in a cloth. Hmmm.

You'd think that if I'm supposed to simmer this fish that has been sewn up in a cloth, the author of the recipe might advise me on how long until it is "done". "Until tender" is a bit of a puzzle too. In my experience, fish become firmer and less tender the longer you simmer them. Well, I have a theory about this. The recipe is attributed to "Mrs. S.T.", and consulting the index of contributors, I see that it is short for "Mrs. Samuel Tyree", presumably the editor's mother-in-law. Having a little joke at her expense, perhaps?

There are a lot of other interesting points, which may appear here later. For example, did you know that the most convenient size hog for household use is one of 150 to 200 pounds? And the cookbook contains recipes not only for tomato catsup, but also pepper catsup, mushroom catsup, and walnut catsup.

But the real reason I brought all this up is that page 253–254 has the following item, attributed to "Moziz Addums":

Resipee for cukin kon-feel Pees. Gether your pees 'bout sun-down. The folrin day, 'bout leven o'clock, gowge out your pees with your thum nale, like gowgin out a man's ey-ball at a kote house. Rense your pees, parbile them, then fry 'erm with some several slices uv streekd middlin, incouragin uv the gravy to seep out and intermarry with your pees. When modritly brown, but not scorcht, empty intoo a dish. Mash 'em gently with a spune, mix with raw tomarters sprinkled with a little brown shugar and the immortal dish ar quite ready. Eat a hepe. Eat mo and mo. It is good for your genral helth uv mind and body. It fattens you up, makes you sassy, goes throo and throo your very soul. But why don't you eat? Eat on. By Jings. Eat. Stop! Never, while thar is a pee in the dish.

This was apparently inserted for humorous effect. Around the time the cookbook was written, there was quite a vogue for dialectal humor of this type, most of which has been justly forgotten. Probably the best-remembered practitioner of this brand of humor was Josh Billings, who I bet you haven't heard of anyway. Tremendously popular at the time, almost as much so as Mark Twain, his work is little-read today; the joke is no longer funny. The exceptionally racist example above is in many ways typical of the genre.

One aspect of this that is puzzling to us today (other than the obvious "why was this considered funny?") is that it's not clear exactly what was supposed to be going on. Is the idea that Moziz Addums wrote this down herself, or is this a transcript by a literate person of a recipe dictated by Moziz Addums? Neither theory makes sense. Where do the misspellings come from? In the former theory, they are Moziz Addums' own misspellings. But then we must imagine someone literate enough to spell "intermarry" and "immortal" correctly, but who does not know how to spell "of".

In the other theory, the recipe is a transcript, and the misspellings have been used by the anonymous, literate transcriber to indicate Moziz Addums' unusual or dialectal pronunciations, as with "tomarters", perhaps. But "uv" is the standard (indeed, the only) pronunciation of "of", which wrecks this interpretation. (Spelling "of" as "uv" was the signature of Petroleum V. Nasby, another one of those forgotten dialectal humorists.) And why did the transcriber misspell "peas" as "pees"?

So what we have here is something that nobody could possibly have written or said, except as an inept parody of someone else's speech. I like my parody to be rather less artificial.

All of this analysis would be spoilsportish if the joke were actually funny. E.B. White famously said that "Analyzing humor is like dissecting a frog. Few people are interested and the frog dies of it." Here, at least, the frog had already been dead for a hundred years dead before I got to it.


[Other articles in category /lang] permanent link

Tue, 15 May 2007

Ambiguous words and dictionary hacks
A Mexican gentleman of my acquaintance, Marco Antonio Manzo, was complaining to me (on IRC) that what makes English hard was the large number of ambiguous words. For example, English has the word "free" where Spanish distinguishes "gratis" (free like free beer) from "libre" (free like free speech).

I said I was surprised that he thought that was unique to English, and said that probably Spanish had just as many "ambiguous" words, but that he just hadn't noticed them. I couldn't think of any Spanish examples offhand, but I knew some German ones: in English, "suit" can mean a lawsuit, a suit of clothes, or a suit of playing cards. German has different words for all of these. In German, the suit of a playing card is its "farbe", its color. So German distinguishes between suit of clothes and suit of playing cards, which English does not, but fails to distinguish between colors of paint and suit of playing cards, which English does.

Every language has these mismatches. Korean has two words for "thin", one meaning thin like paper and the other meaning thin like string. Korean distinguishes father's sister ("komo") from mother's sister ("imo") where English has only "aunt".

Anyway, Sr. Manzo then went to lunch, and I wanted to find some examples of concepts distinguished by English but not by Spanish. I did this with a dictionary hack.

A dictionary hack is when you take a plain text dictionary and do some sort of rough-and-ready processing on it to get an 80% solution to some problem. The oldest dictionary hack I know of is the old Unix rhyming dictionary hack:

        rev /usr/dict/words | sort | rev > rhyming.txt
This takes the Unix word list and turns it into a semblance of a rhyming dictionary. It's not an especially accurate semblance, but you can't beat the price.

...
     ugh	      Marlborough   choreograph	            Guelph        Wabash   
     Hugh	      Scarborough   lithograph	            Adolph        cash     
     McHugh	      thorough	    electrocardiograph      Randolph      dash     
     Pugh	      trough	    electroencephalograph   Rudolph       leash    
     laugh	      sough	    nomograph	            triumph       gash     
     bough	      tough	    tomograph	            lymph         hash     
     cough	      tanh	    seismograph	            nymph         lash     
     dough	      Penh	    phonograph	            philosoph     clash    
     sourdough        sinh	    chronograph	            Christoph     eyelash  
     hough	      oh	    polarograph	            homeomorph    flash    
     though	      pharaoh	    spectrograph            isomorph      backlash 
     although         Shiloh	    Addressograph           polymorph     whiplash 
     McCullough       pooh	    chromatograph           glyph         splash   
     furlough         graph	    autograph	            anaglyph      slash    
     slough	      paragraph	    epitaph	            petroglyph    mash     
     enough	      telegraph	    staph	            myrrh         smash    
     rough	      radiotelegrap aleph	            ash           gnash    
     through	      calligraph    Joseph	            Nash          Monash   
     breakthrough     epigraph	    caliph	            bash          rash     
     borough	      mimeograph    Ralph	            abash         brash    
...
It figures out that "clash" rhymes with "lash" and "backlash", but not that "myrrh" rhymes with "purr" or "her" or "sir". You can of course, do better, by using a text file that has two columns, one for orthography and one for pronunciation, and sorting it by reverse pronunciation. But like I said, you won't beat the price.

But I digress. Last week I pulled an excellent dictionary hack. I found the Internet Dictionary Project's English-Spanish lexicon file on the web with a quick Google search; it looks like this:

        a	un, uno, una[Article]
        aardvark	cerdo hormiguero
        aardvark	oso hormiguero[Noun]
        aardvarks	cerdos hormigueros
        aardvarks	osos hormigueros 
        ab	prefijo que indica separacio/n
        aback	hacia atras
        aback	hacia atr´s,take aback, desconcertar. En facha.
        aback	por sopresa, desprevenidamente, de improviso
        aback	atra/s[Adverb]
        abacterial	abacteriano, sin bacterias
        abacus	a/baco
        abacuses	a/bacos
        abaft	A popa (towards stern)/En popa (in stern)
        abaft	detra/s de[Adverb]
        abalone	abulo/n
        abalone	oreja de mar (molusco)[Noun]
        abalone	oreja de mar[Noun]
        abalones	abulones
        abalones	orejas de mar (moluscos)[Noun]
        abalones	orejas de mar[Noun]
        abandon	abandonar
        abandon	darse por vencido[Verb]
        abandon	dejar
        abandon	desamparar, desertar, renunciar, evacuar, repudiar
        abandon	renunciar a[Verb]
        abandon	abandono[Noun]
        abandoned	abandonado
        abandoned	dejado
...
Then I did:

        sort +1 idengspa.txt  | 
        perl -nle '($ecur, $scur) = split /\s+/, $_, 2; 
                print "$eprev $ecur $scur" 
                        if $sprev eq $scur && 
                           substr($eprev, 0, 1) ne substr($ecur, 0, 1); 
                        ($eprev, $sprev) = ($ecur, $scur)'

The sort sorts the lexicon into Spanish order instead of English order. The Perl thing comes out looking a lot more complicated than it ought. It just says to look and print consecutive items that have the same Spanish, but whose English begins with different letters. The condition on the English is to filter out items where the Spanish is the same and the English is almost the same, such as:

blond blonde rubio
cake cakes tarta
oceanographic oceanographical oceanografico[Adjective]
palaces palazzi palacios[Noun]
talc talcum talco
taxi taxicab taxi

It does filter out possible items of interest, such as:

carefree careless sin cuidado

But since the goal is just to produce some examples, and this cheap hack was never going to generate an exhaustive list anyway, that is all right.

The output is:

        at letter a
        actions stock acciones[Noun]
        accredit certify acreditar
        around thereabout alrededor
        high tall alto
        comrade pal amigo[Noun]
        antecedents backgrounds antecedentes
        (...complete output...)
A lot of these are useless, genuine synonyms. It would be silly to suggest that Spanish fails to preserve the English distinction between "marry" and "wed", between "ale" and "beer", between "desire" and "yearn", or between "vest" and "waistcoat". But some good possibilities remain.

Of these, some probably fail for reasons that only a Spanish-speaker would be able to supply. For instance, is "el pastel" really the best translation of both "cake" and "pie"? If so, it is an example of the type I want. But perhaps it's just a poor translation; perhaps Spanish does have this distinction; say maybe "torta" for "cake" and "empanada" for "pie". (That's what Google suggests, anyway.)

Another kind of failure arises because of idioms. The output:

        exactly o'clock en punto
is of this type. It's not that Spanish fails to distinguish between the concepts of "exactly" and "o'clock"; it's that "en punto" (which means "on the point of") is used idiomatically to mean both of those things: some phrase like "en punto tres" ("on the point of three") means "exactly three" and so, by analogy, "three o'clock". I don't know just what the correct Spanish phrases are, but I can guess that they'll be something like this.

Still, some of the outputs are suggestive:

high tall alto
low small bajo[Adjective]
babble fumble balbucear[Verb]
jealous zealous celoso
contest debate debate[Noun]
forlorn stranded desamparado[Adjective]
docile meek do/cil[Adjective]
picture square el cuadro
fourth room el cuarto
collar neck el cuello
idiom language el idioma[Noun]
clock watch el reloj
floor ground el suelo
ceiling roof el techo
knife razor la navaja
feather pen la pluma
cloudy foggy nublado

I put some of these to Sr. Manzo, and he agreed that some were indeed ambiguous in Spanish. I wouldn't have known what to suggest without the dictionary hack.


[Other articles in category /lang] permanent link

Mon, 14 May 2007

Bryan and his posse
Today upon the arrival of a coworker and his associates, I said "Oh, here comes Bryan and his posse". My use of "posse" here drew some comment. I realized I was not completely sure what "posse" meant. I mostly knew it from old West contexts: the Big Dictionary has quotes like this one, from 1901:

A pitched battle was fought..at Rockhill, Missouri, between the Sheriff's posse and the miners on strike.
Order
Me and My Little Brain
Me and My Little Brain
with kickback
no kickback
I first ran across the word in J.D. Fitzgerald's Great Brain books. At least in old West contexts, the word refers to a gang of men assembled by some authority such as a sheriff or a marshal, to perform some task, such as searching for a lost person, apprehending an outlaw, or blasting some striking miners. This much was clear to me before.

From the context and orthography, I guessed that it was from Spanish. But no, it's not. It's Latin! "Posse" is the Latin verb "to be able", akin to English "possible" and ultimately to "potent" and related words. I'd guessed something like this, supposing English "posse" was akin to some Spanish derivative of the Latin. But it isn't; it's direct from Latin: "posse" in English is short for posse comitatus, "force of the county".

The Big Dictionary has citations for "posse comitatus" back to 1576:

Mr. Sheryve meaneth in person to repayre thither & with force to bryng hym from Aylesham, Whomsoever he fyndeth to denye the samet & suerly will with Posse Comitatus fetch hym from this new erected pryson to morrow.

"Sheryve" is "Sheriff". (If you have trouble understanding this, try reading it aloud. English spelling changed more than its pronunciation since 1576.)

I had heard the phrase before in connection with the Posse Comitatus Act of U.S. law. This law, passed in 1878, is intended to prohibit the use of the U.S. armed forces as Posse Comitatus—that is, as civilian law enforcement. Here the use is obviously Latin, and I hadn't connected it before with the sheriff's posse. But they are one and the same.


[Other articles in category /lang/etym] permanent link

Tue, 08 May 2007

Degrees of algebraic numbers
An algebraic number x is said to have degree n if it is the zero of some irreducible nth-degree polynomial P with integer coefficients.

For example, all rational numbers have degree 1, since the rational number a/b is a zero of the first-degree polynomial bx - a. √2 has degree 2, since it is a zero of x2 - 2, but (as the Greeks showed) not of any first-degree polynomial.

It's often pretty easy to guess what degree some number has, just by looking at it. For example, the nth root of a prime number p has degree n. $\sqrt{1 + \sqrt 2}$ has a square root of a square root, so it's fourth-degree number. If you write $x = \sqrt{1 + \sqrt 2}$ then eliminate the square roots, you get x4 - 2x2 - 1, which is the 4th-degree polynomial satisfied by this 4th-degree number.

But it's not always quite so simple. One day when I was in high school, I bumped into the fact that $\sqrt{7 + 4 \sqrt 3}$, which looks just like a 4th-degree number, is actually a 2nd-degree number. It's numerically equal to $2 + \sqrt 3$. At the time, I was totally boggled. I couldn't believe it at first, and I had to get out my calculator and calculate both values numerically to be sure I wasn't hallucinating. I was so sure that the nested square roots in $\sqrt{7 + 4 \sqrt 3}$ would force it to be 4th-degree.

If you eliminate the square roots, as in the other example, you get the 4th-degree polynomial x4 - 14x2 + 1, which is satisfied by $\sqrt{7 + 4 \sqrt 3}$. But unlike the previous 4th-degree polynomial, this one is reducible. It factors into (x2 + 4x + 1)(x2 - 4x + 1). Since $\sqrt{7 + 4 \sqrt 3}$ is a zero of the polynomial, it must be a zero of one of the two factors, and so it is second-degree. (It is a zero of the second factor.)

I don't know exactly why I was so stunned to discover this. Clearly, the square of any number of the form a + bc is another number of the same form (namely (a2 + b2c) + 2abc), so it must be the case that lots of a + bc numbers must be squares of other such, and so that lots of $\sqrt{a + b \sqrt c}$ numbers must be second-degree. I must have known this, or at least been capable of knowing it. Socrates says that the truth is within us, and we just don't know it yet; in this case that was certainly true. I think I was so attached to the idea that the nested square roots signified fourth-degreeness that I couldn't stop to realize that they don't always.

In the years since, I came to realize that recognizing the degree of an algebraic number could be quite difficult. One method, of course, is the one I used above: eliminate the radical signs, and you have a polynomial; then factor the polynomial and find the irreducible factor of which the original number is a root. But in practice this can be very tricky, even before you get to the "factor the polynomial" stage. For example, let x = 21/2 + 21/3. Now let's try to eliminate the radicals.

Proceeding as before, we do x - 21/3 = 21/2 and then square both sides, getting x2 - 2·21/3x + 22/3 = 2, and then it's not clear what to do next.

So we try the other way, starting with x - 21/2 = 21/3 and then cube both sides, getting x3 - 3·21/2x2 + 6x - 2·21/2 = 2. Then we move all the 21/2 terms to the other side: x3 + 6x - 2 = (3x2 + 2)·21/2. Now squaring both sides eliminates the last radical, giving us x6 + 12x4 - 4x3 + 36x2 - 24x + 4 = 18x4 + 12x2 + 8. Collecting the terms, we see that 21/2 + 21/3 is a root of x6 - 6x4 - 4x3 + 12x2 - 24x - 4. Now we need to make sure that this polynomial is irreducible. Ouch.

In the course of writing this article, though, I found a much better method. I'll work a simpler example first, √2 + √3. The radical-eliminating method would have us put x - √2 = √3, then x2 - 2√2x + 2 = 3, then x2 - 1 = 2√2x, then x4 - 2x2 + 1 = 8x2, so √2 + √3 is a root of x4 - 10x2 + 1.

The new improved method goes like this. Let x = √2 + √3. Now calculate powers of x:

x0 =       1
x1 =   √2 + √3  
x2 = 2√6 +     5
x3 =   11√2 + 9√3  
x4 = 20√6 +     49

That's a lot of calculating, but it's totally mechanical.

All of the powers of x have the form a6√6 + a2√2 + a3√3 + a1. This is easy to see if you write p for √2 and q for √3. Then x = p + q and powers of x are polynomials in p and q. But any time you have p2 you replace it with 2, and any time you have q2 you replace it with 3, so your polynomials never have any terms in them other than 1, p, q, and pq.

This means that you can think of the powers of x as being vectors in a 4-dimensional vector space whose canonical basis is {1, √2, √3, √6}. Any four vectors in this space, such as {1, x, x2, x3}, are either linearly independent, and so can be combined to total up to any other vector, such as x4, or else they are linearly dependent and three of them can be combined to make the fourth. In the former case, we have found a fourth-degree polynomial of which x is a root, and proved that there is no simpler such polynomial; in the latter case, we've found a simpler polynomial of which x is a root.

To complete the example above, it is evident that {1, x, x2, x3} are linearly independent, but if you don't believe it you can use any of the usual mechanical tests. This proves that √2 + √3 has degree 4, and not less. Because if √2 + √3 were of degree 2 (say) then we would be able to find a, b, c such that ax2 + bx + c = 0, and then the x2, x1, and x0 vectors would be dependent. But they aren't, so we can't, so it isn't.

Instead, there must be a, b, c, and d such that x4 = ax3 + bx2 + cx + d. To find these we need merely solve a system of four simultaneous equations, one for each column in the table:

2 b = 20
11 a + c = 0
9 a + c = 0
5 b + d= 49

And we immediately get a=0, b=10, c=0, d=-1, so x4 = 10x2 - 1, and our polynomial is x4 - 10x2 + 1, as before.

Yesterday's draft of this article said:

I think [21/2 + 21/3] turns out to be degree 6, but if you try to work it out in the straightforward way, by equating it to x and then trying to get rid of the roots, you get a big mess. I think it turns out that if two numbers have degrees a and b, then their sum has degree at most ab, but I wouldn't even want to swear to that without thinking it over real carefully.

Happily, I'm now sure about all of this. I can work through the mechanical method on it. Putting x = 21/2 + 21/3, we get:

x0 = [0 0 0 0 0 1]
x1 = [0 0 0 1 1 0]
x2 = [0 1 2 0 0 2]
x3 = [3 0 0 6 2 2]
x4 = [0 12 8 2 8 4]
x5 = [20 2 10 20 4 40]
x6 = [12 60 24 60 80 12]

Where the vector [a, b, c, d, e, f] is really shorthand for a21/2·22/3 + b22/3 + c21/2·21/3 + d21/3 + e21/2 + f.

x0...x5 turn out to be linearly independent, almost by inspection, so 21/2 + 21/3 has degree 6. To express x6 as a linear combination of x0...x5, we set up the following equations:

20a + 3c = 12
2a + 12b + d = 60
10a+ 8b + 2d = 24
20a+ 2b + 6c + e = 60
4a + 8b + 2c + e = 80
40a+ 4b + 2c+ 2d + f= 12

Solving these gives [a, b, c, d, e, f]= [0, 6, 4, -12, 24, 4], so x6 = 6x4 + 4x3 - 12x2 + 24x + 4, and 21/2 + 21/3 is a root of x6 - 6x4 - 4x3 + 12x2 - 24x - 4, which is irreducible.

And similarly, using this method, one can calculate in a few minutes that 21/2 + 21/4 has degree 4 and is a root of x4 - 4x2 - 8x + 2.

I wish I had figured this out in high school; it would have delighted me.


[Other articles in category /math] permanent link

Woodrow Wilson on bloggers
Last weekend my family and I drove up to New York. On the way we stopped in the Woodrow Wilson Service Area on the New Jersey Turnpike, which has a little plaque on the wall commemorating Woodrow Wilson and providing some quotations, such as:

Uncompromising thought is the luxury of the closeted recluse.

(Part of a speech at the University of Tennessee in 17 June, 1890).

Bloggers beware; Woodrow Wilson has your number.


[Other articles in category /meta] permanent link

Wed, 02 May 2007

Symmetric functions
I used to teach math at the John Hopkins CTY program, which is a well-regarded summer math camp. Kids would show up and finish a year (or more) of high-school math in three weeks. We'd certify them by giving them standardized tests, which might carry some weight with their school. But before they were allowed to take the standardized test, they had to pass a much more difficult and comprehensive exam that we'd made up ourselves.

The most difficult question on the Algebra III exam presented the examinee with some intractable third degree polynomial—say x3 + 4x2 - 2x + 6—and asked for the sum of the cubes of its roots.

You might like to match your wits against the Algebra III students before reading the solution below.

In the three summers I taught, only about two students were able to solve this problem, which is rather tricky. Usually they would start by trying to find the roots. This is doomed, because the Algebra III course only covers how to find the roots when they are rational, and the roots here are totally bizarre.

Even clever students didn't solve the problem, which required several inspired tactics. First you must decide to let the roots be p, q, and r, and, using Descartes' theorem, say that

x3 + bx2 + cx + d = (x - p)(x - q)(x - r)

This isn't a hard thing to do, and a lot of the kids probably did try it, but it's not immediately clear what the point is, or that it will get you anywhere useful, so I think a lot of them never took it any farther.

But expanding the right-hand side of the equation above yields:

x3 + bx2 + cx + d = x3 - (p + q + r)x2 + (pq + pr + qr)x - pqr

And so, equating coefficients, you have:
b = -(p + q + r)
c = pq + pr + qr
d = -pqr
Quite a few people did get to this point, but didn't know what to do next. Getting the solution requires either a bunch of patient tinkering or a happy inspiration, and either way it involves a large amount of accurate algebraic manipulation. You need to realize that you can get the p3 terms by cubing b. But even if you have that happy idea, the result is:

-b3 = p3 + q3 + r3 + 3p2q + 3p2r + 3q2r + 3pq2 + 3pr2 + 3qr2 + 6pqr
And you now need to figure out how to get rid of the unwanted terms. The 6pqr term is not hard to eliminate, since it is just -6d, and if you notice this, it will probably inspire you to try combinations of the others. In fact, the answer is:

p3 + q3 + r3 = -b3 + 3bc - 3d
So for the original polynomial, x3 + 4x2 - 2x + 6, we know that the sum of the cubes of the roots is -43 + 3·4·(-2) - 3·6 = -64 - 24 - 18 = -106, and we calculated it without any idea what the roots actually were.

Or, to take an example that we can actually check, consider x3 - 6x2 + 11x - 6, whose roots are 1, 2, and 3. The sum of the cubes is 1 + 8 + 27 = 36, and indeed -b3 + 3bc - 3d = 63 + 3·(-6)·11 + 18 = 216 - 198 + 18 = 36.

This was a lot of algebra III, but once you have seen this example, it's not hard to solve a lot of similar problems. For instance, what is the sum of the squares of the roots of x2 + bx + c? Well, proceeding as before, we let the roots be p and q, so x2 + bx + c = (x - p)(x - q) = x2 - (p + q)x + pq, so that b = -(p + q) and c = pq. Then b2 = p2 + 2pq+ q2, and b2 - 2c = p2 + q2.

In general, if F is any symmetric function of the roots of a polynomial, then F can be calculated from the coefficients of the polynomial without too much difficulty.

Anyway, I was tinkering around with this at breakfast a couple of days ago, and I got to thinking about b2 - 2c = p2 + q2. If roots p and q are both integers, then b2 - 2c is the sum of two squares. (The sum-of-two-squares theorem is one of my favorites.) And the roots are integers only when the discriminant of the original polynomial is itself a square. But the discriminant in this case is b2 - 4c. So we have the somewhat odd-seeming statement that when b2 - 4c is a square, then b2 - 2c is a sum of two squares.

I found this surprising because it seemed so underconstrained: it says that you can add some random even number to a fairly large class of squares and the result must be a sum of two squares, even if the even number you added wasn't a square itself. But after I tried a few examples to convince myself I hadn't made a mistake, I was sure there had to be a very simple, direct way to get to the same place.

It took some fiddling, but eventually I did find it. Say that b2 - 4c = a2. Then b and a must have the same parity, so p = (b + a)/2 is an integer, and we can write b = p + q and a = p - q where p and q are both integers.

Then c = (b2 - a2)/4 is just pq, and b2 - 2c = p2 + q2.

So that's where that comes from.

It seems like there ought to be an interesting relationship between the symmetric functions of roots of a polynomial and their expression in terms of the coefficients of the polynomial. The symmetric functions of degree N are all linear combinations of a finite set of symmetric functions. For example, any second-degree symmetric function of two variables has the form a(p2 + q2) + 2bpq. We can denote these basic symmetric functions of two variables as Fi,j(p, q) = Σpiqj. Then we have identities like (F1,0)2 = F2,0 + F1,1 and (F1,0)3 = F3,0 + 3F2,1.

Maybe I'll do an article about this in a week or two.


[Other articles in category /math] permanent link

Mon, 30 Apr 2007

Excessive precision
You sometimes read news articles that say that some object is 98.42 feet tall, and it is clear what happened was that the object was originally reported to be 30 meters tall, and some knucklehead translated 30 meters to 98.42 feet, instead of to 100 feet as they should have.

Finding a real example for you was easy: I just did Google search for "62.14 miles", and got this little jewel:

Tsunami waves can be up to 62.14 miles long! They can also be about three feet high in the middle of the ocean. Because of its strong underwater energetic force, the tsunami can rise up to 90 feet, in extreme cases, when they hit the shore! Tsunami waves act like shallow water waves because they are so long. Because it is so long, it can last an hour. In the Pacific Ocean, a tsunami moves 60.96 feet a second, passing through water that is around 1219.2 feet deep.

(library.thinkquest.org.)

The 60.96 feet per second is actually 100 km/hr, but I'm not sure what's going on with the 1219.2 feet deep. Is it 1/5 nautical mile? But that would be strange. [ Addendum 20070428: the explanation.]

Here's another delightful example:

The MiniC.A.T. is very cost-efficient to operate. According to MDI, it costs less than one dollar per 62.14 miles... Given the absence of combustion and the fact that the MiniC.A.T. runs on vegetable oil, oil changes are only necessary every 31,068 miles.

(eesi.org.)

(I should add that many of the hits for "62.14 miles" were perfectly legitimate. Many concerned 100-km bicycle races, or the conditions for winning the X-prize. In both cases the distance is in fact 62.14 miles, not 62.13 or 62.15, and the precision is warranted. But I digress.)

(Long ago there was a parody of the New York Times which included a parody sports section that announced "FOOTBALL TO GO METRIC". The article revealed that after the change, the end zones would be placed 91.44 meters apart...)

Anyway, similar knuckleheadedness occurs in the well-known value of 98.6 degrees Fahrenheit for normal human body temperature. Human body temperature varies from individual to individual, and by a couple of degrees over the course of the day, so citing the "normal" temperature to a tenth of a degree is ridiculous. The same thing happened here as with the 62.14-mile tsunami. Normal human body temperature was determined to be around 37 degrees Celsius, and then some knucklehead translated 37°C to 98.6°F instead of to 98°F.

When our daughter was on the way, Lorrie and I took a bunch of classes on baby care. Several of these emphasized that the maximum safe spacing for the bars of a crib, rails of a banister, etc., was two and three-eighths inches. I was skeptical, and at one of these classes I was foolish enough to ask if that precision were really required: was two and one-half inches significantly less safe? How about two and seven-sixteenths inches? The answer was immediate and unequivocal: two and one-half inches was too far apart for safety; two and three-eighths inches is the maximum safe distance.

All the baby care books say the same thing. (For example...)

But two and three-eighths inches is 6.0325 cm, so draw your own conclusion about what happened here.

[ Addendum 20070430: 60.96 feet per second is nothing like 100 km/hr, and I have no idea why I said it was. The 60.96 feet per second appears to be a backwards conversion of 200 m/s to ft/s, multiplying by 3.048 instead of dividing. As Scott turner noted a few days ago, a similar error occurs in the conversion of meters to feet in the "1219.2 feet deep" clause. ]


[Other articles in category /physics] permanent link

1219.2 feet
In Thursday's article, I quoted an article about tsunamis that asserted that:

In the Pacific Ocean, a tsunami moves 60.96 feet a second, passing through water that is around 1219.2 feet deep.
"60.96 feet a second" is an inept conversion of 100 km/h to imperial units, but I wasn't able to similarly identify 1219.2 feet.

Scott Turner has solved the puzzle. 1219.2 feet is an inept conversion of 4000 meters to imperial units, obtained by multiplying 4000 by 0.3048, because there are 0.3048 meters in a foot.

Thank you, M. Turner.

[ Addendum 20070430: 60.96 feet per second is nothing like 100 km/hr, and I have no idea why I said it was. The 60.96 feet per second is a backwards conversion of 200 m/s. ]


[Other articles in category /physics] permanent link

Tue, 24 Apr 2007

favicon.ico
I'm getting hundreds of failed requests for favicon.ico. I need to come up with a good-looking icon for the blog. Since the blog isn't about Higher-Order Perl, I don't want to use the HOP cover or the HOP quilt block icon .

I have not had any good ideas. So I am asking LazyNet.

The best suggestion will win a free one-year subscription to The Universe of Discourse blog. If the winner supplies an icon that I can actually use, the prize will be increased to a free two-year subscription.

Any suggestions?

[ Addendum 20061120: David Eppstein responded immediately with , which I will explain in a future article that summarizes the suggestions. M. Eppstein's suggestion is good enough that I am comfortable installing it right away. But don't let that stop you from coming up with your own suggestion. ]

[ Addendum 20061120: Neil Kandalgaonkar has contributed . Thank you, Neil! ]


[Other articles in category /meta] permanent link

Tue, 17 Apr 2007

Who farted?
The software that generates these web pages from my blog entries is Blosxom. When I decided I wanted to try blogging, I did a web search for something like "simplest possible blogging software", and found a page that discussed Bryar. So I located the Bryar manual. The first sentence in the Bryar manual says "Bryar is a piece of blog production software, similar in style to (but considerably more complex than) Rael Dornfest's blosxom." So I dropped Bryar and got Blosxom. This was a smashing success: It was running within ten minutes. Now, two months later, I'm thinking about moving everything to Bryar, because Blosxom really sucks. . But for me it was a huge success, and I probably wouldn't have started a blog if I hadn't found it. It sucks in the best possible way: because it's drastically under-designed and excessively lightweight. That is much better than sucking because it is drastically over-designed and excessively heavyweight. It also sucks in some less ambivalent ways, but I'm not here (today) to criticize Blosxom, so let's move on.

Until the big move to Bryar, I've been writing plugins for Blosxom. When I was shopping for blog software, one of my requirements that that the thing be small and simple enough to be hackable, since I was sure I would inevitably want to do something that couldn't be accomplished any other way. With Blosxom, that happened a lot sooner than it should have (the plugin interface is inadequate), but the fallback position of hacking the base code worked quite well, because it is fairly small and fairly simple.

Most recently, I had to hack the Blosxom core to get the menu you can see on the left side of the page, with the titles of the recent posts. This should have been possible with a plugin. You need a callback (story) in the plugin that is invoked once per article, to accumulate a list of title-URL pairs, and then you need another callback (stories_done) that is invoked once after all the articles have been scanned, to generate the menu and set up a template variable for insertion into the template. Then, when Blosxom fills the template, it can insert the menu that was set up by the plugin.

With stock Blosxom, however, this is impossible. The first problem you encounter is that there is no stories_done callback. This is only a minor problem, because you can just have a global variable that holds the complete menu so far at all times; each time story is invoked, it can throw away the incomplete menu that it generated last time and replace it with a revised version. After the final call to story, the complete menu really is complete:

        sub story {
          my ($pkg, $path, $filename, $story_ref, $title_ref, $body_ref,
                 $dir, $date) = @_;
          return unless $dir eq "" && $date eq "";
          return unless $blosxom::flavour eq "html";
          my $link = qq{<a href="$blosxom::url$path/$filename.$blosxom::flavour">} .
                     qq{<span class=menuitem>$$title_ref</span></a>};
          push @menu, $link;
          $menu = join "&nbsp;/ ", @menu;
        }
That strategy is wasteful of CPU time, but not enough to notice. You could also fix it by hacking the base code to add a stories_done callback, but that strategy is wasteful of programmer time.

But it turns out that this doesn't work, and I don't think there is a reasonable way to get what I wanted without hacking the base code. (Blosxom being what it is, hacking the base code is a reasonable solution.) This is because of a really bad architecture decision inside of Blosxom. The page is made up of three independent components, called the "head", the "body", and the "foot". There are separate templates of each of these. And when Blosxom generates a page, it does so in this sequence:

  1. Fill "head" template and append result to output
  2. Process articles
  3. Fill "body" template and append result to output
  4. Fill "foot" template and append result to output
This means that at the time the "head" template is filled, Blosxom has not yet seen the articles. So if you want the article's title to appear in the HTML <title> element, you are out of luck, because the article has not yet been read at the time that the <title> element is generated. And if you want a menu of article titles on the left-hand side of the page, you are out of luck, because the article has not yet been read at the time that the left-hand side of the page was generated.

So I had to go in and hack the Blosxom core to make it fractionally less spiky:

  1. Process articles
  2. Fill "head" template and append result to output
  3. Fill "body" template and append result to output
  4. Fill "foot" template and append result to output
It's tempting to finish this article right now with a long explanation of the philosophical mistakes of this component of Blosxom's original design, how it should have been this:

  1. Process articles
  2. Fill and output templates:
    1. Fill "head" template and append result to output
    2. Fill "body" template and append result to output
    3. Fill "foot" template and append result to output
Or an analysis of Blosxom's confusion between structure and presentation, and so forth. Why three templates? Why not one? The one template, as distributed with the package, could simply have been:

        #include head
        #include body
        #include foot
But if I were to finish the article with that discussion, then the first half of the article would have been relevant and to the point, and as regular readers know, We Don't Do That Here. No, there must come a point about a third of the way through each article at which I say "But anyway, the real point of this note is...".

Anyway, the real point of this note is to discuss was the debugging technique I used to fix Blosxom after I made this core change, which broke the output. The menu was showing up where I wanted, but now all the date headers (like the "Sat, 18 Mar 2006" one just above) appeared at the very top of the page body, before all the articles.

The way Blosxom generates output is by appending it to a variable $output, which eventually accumulates all the output, and is printed once, right at the very end. I had found the code that generated the "head" part of the output; this code ended with $output .= $head to append the head part to the complete output. I moved this section down, past the (much more complicated) section that scanned the articles themselves, so that the "head" template, when filled, would have access to information (such as menus) accumulated while scanning the articles.

But apparently some other part of the program was inserting the date headers into the output while scanning the articles. I had to find this. The right thing to do would have been just to search the code for $output. This was not sure to work: there might have been dozens of appearances of $output, making it a difficult task to determine which one was the responsible appearance. Also, any of the plugins, not all of which were written by me, could have been modifying $output, so it would not have been enough just to search the base code.

Still, I should have started by searching for $output, under the look-under-the-lamppost principle. (If you lose your wallet in a dark street start by looking under the lamppost. The wallet might not be there, but you will not waste much time on the search.) If I had looked under the lamppost, I would have found the lost wallet:

      $curdate ne $date and $curdate = $date and $output .= $date;
That's not what I did. Daunted by the theoretical difficulties, I got out a big hammer. The hammer is of some technical interest, and for many problems it is not too big, so there is some value in presenting it.

Perl has a feature called tie that allows a variable to be enchanted so that accesses to it are handled by programmer-defined methods instead of by Perl's usual internal processes. When a tied variable $foo is stored into, a STORE method is called, and passed the value to be stored; it is the responsibility of STORE to put this value somewhere that it can be accessed again later by its counterpart, the FETCH method.

There are a lot of problems with this feature. It has an unusually strong risk of creating incomprehensible code by being misused, since when you see something like $this = $that you have no way to know whether $this and $that are tied, and that what you think is an assignment expression is actually STORE($this, FETCH($that)), and so is invoking two hook functions that could have completely arbitrary effects. Another, related problem is that the compiler can't know that either, which tends to disable the possibility of performing just about any compile-time optimization you could possibly think of. Perhaps you would like to turn this:

        for (1 .. 1000000) { $this = $that }
into just this:

        $this = $that;
Too bad; they are not the same, because $that might be tied to a FETCH function that will open the pod bay doors the 142,857th time it is called. The unoptimized code opens the pod bay doors; the "optimized" code does not.

But tie is really useful for certain kinds of debugging. In this case the question is "who was responsible for inserting those date headers into $output?" We can answer the question by tieing $output and supplying a STORE callback that issues a report whenever $output is modified. The code looks like this:

        package who_farted;

        sub TIESCALAR {
          my ($package, $fh) = @_;
          my $value = "";
          bless { value => \$value, fh => $fh } ;
        }
This is the constructor; it manufactures an object to which the FETCH and STORE methods can be directed. It is the responsibility of this object to track the value of $output, because Perl won't be tracking the value in the usual way. So the object contains a "value" member, holding the value that is stored in $object. It also contains a filehandle for printing diagnostics. The FETCH method simply returns the current stored value:

        sub FETCH {
          my $self = shift;
          ${$self->{value}};
        }
The basic STORE method is simple:

        sub STORE {
          my ($self, $val) = @_;
          my $fh = $self->{fh};
          print $fh "Someone stored '$val' into \$output\n";
          ${$self->{value}} = $val;
        }
It issues a diagnostic message about the new value, and stores the new value into the object so that FETCH can get it later. But the diagnostic here is not useful; all it says is that "someone" stored the value; we need to know who, and where their code is. The function st() generates a stack trace:

        sub st {
          my @stack;
          my $spack = __PACKAGE__;
          my $N = 1;
          while (my @c = caller($N)) {
            my (undef, $file, $line, $sub) = @c;
            next if $sub =~ /^\Q$spack\E::/o;
            push @stack, "$sub ($file:$line)";
          } continue { $N++ }
          @stack;
        }
Perl's built-in caller() function returns information about the stack frame N levels up. For clarity, st() omits information about frames in the who_farted class itself. (That's the next if... line.)

The real STORE dumps the stack trace, and takes some pains to announce whether the value of $output was entirely overwritten or appended to:

        sub STORE {
          my ($self, $val) = @_;
          my $old = $ {$self->{value}};
          my $olen = length($old);
          my ($act, $what) = ("set to", $val);
          if (substr($val, 0, $olen) eq $old) {
            ($act, $what) = ("appended", substr($val, $olen));
          }
          $what =~ tr/\n/ /;
          $what =~ s/\s+$//;
          my $fh = $self->{fh};
          print $fh "var $act '$what'\n";
          print $fh "  $_\n" for st();
          print $fh "\n";
          ${$self->{value}} = $val;
        }
To use this, you just tie $output at the top of the base code:

        open my($DIAGNOSTIC), ">", "/tmp/who-farted" or die $!;
        tie $output, 'who_farted', $DIAGNOSTIC;
This worked well enough. The stack trace informed me that the modification of interest was occurring in the blosxom::generate function at line 290 of blosxom.cgi. That was the precise location of the lost wallet. It was, as I said, a big hammer to use to squash a mosquito of a problem---but the mosquito is dead.

A somewhat more useful version of this technique comes in handy in situations where you have some program, say a CGI program, that is generating the wrong output; maybe there is a "1" somewhere in the middle of the output and you don't know what part of the program printed "1" to STDOUT. You can adapt the technique to watch STDOUT instead of a variable. It's simpler in some ways, because STDOUT is written to but never read from, so you can eliminate the FETCH method and the data store:

        package who_farted;

        sub rig_fh {
          my ($handle, $diagnostic) = shift;
          my $mode = shift || "<";
          open my($aux_handle), "$mode&=", $handle or die $!;
          tie *$handle, __PACKAGE__, $aux_handle, $diagnostic;
        }

        sub TIEHANDLE {
          my ($package, $aux_handle, $diagnostic) = @_;
          bless [$aux_handle, $diagnostic] => $package;
        }

        sub PRINT {
          my ($aux_handle, $diagnostic) = @$self;
          print $aux_handle @_;
          my $str = join("", @_);
          print $diagnostic "$str:\n";
          print $diagnostic "  $_\n" for st();
          print $diagnostic "\n";
        }
To use this, you put something like rig_fh(\*STDOUT, $DIAGNOSTIC, ">") in the main code. The only tricky part is that some part of the code (rig_fh here) must manufacture a new, untied handle that points to the same place that STDOUT did before it was tied, so that the output can actually be printed.

Something it might be worth pointing out about the code I showed here is that it uses the very rare one-argument form of bless:

        package who_farted;

        sub TIESCALAR {
          my ($package, $fh) = @_;
          my $value = "";
          bless { value => \$value, fh => $fh } ;
        }
Normally the second argument should be $package, so that the object is created in the appropriate package; the default is to create it in package who_farted. Aren't these the same? Yes, unless someone has tried to subclass who_farted and inherit the TIESCALAR method. So the only thing I have really gained here is to render the TIESCALAR method uninheritable. <sarcasm>Gosh, what a tremendous benefit</sarcasm>. Why did I do it this way? In regular OO-style code, writing a method that cannot be inherited is completely idiotic. You very rarely have a chance to write a constructor method in a class that you are sure will never be inherited. This seemed like such a case. I decided to take advantage of this non-feature of Perl since I didn't know when the opportunity would come by again.

"Take advantage of" is the wrong phrase here, because, as I said, there is not actually any advantage to doing it this way. And although I was sure that who_farted would never be inherited, I might have been wrong; I have been wrong about such things before. A smart programmer requires only ten years to learn that you should not do things the wrong way, even if you are sure it will not matter. So I violated this rule and potentially sabotaged my future maintenance efforts (on the day when the class is subclassed) and I got nothing, absolutely nothing in return.

Yes. It is completely pointless. A little Dada to brighten your day. Anyone who cannot imagine a lobster-handed train conductor sleeping in a pile of celestial globes is an idiot.


[Other articles in category /oops] permanent link

Sun, 15 Apr 2007

Happy birthday Leonhard Euler
Leonhard Euler, one of the greatest and most prolific mathematicians ever to walk the earth, was born 300 years ago today in Basel, Switzerland.

Euler named the constant e (not for himself; he used vowels for constants and had already used a for something else), and discovered the astonishing formula $e^{ix} = \cos x + i \sin x$, which is known as Euler's formula. A special case of this formula is the Euler identity: $e^{i\pi} + 1 = 0$.

Order
Visual Complex Analysis
Visual Complex Analysis
with kickback
no kickback
I never really understood what was going on there until last year, when I read the utterly brilliant book Visual Complex Analysis, by Tristan Needham. This was certainly the best math book I read in all of 2006, and probably the best one I've read in the past five years. (Many thanks to Dan Schmidt for rcommending it.)

The brief explanantion is something like this: the exponential function ect is exactly the function that satisfies the differential equation df/dt = cf(t). That is, it is the function that describes the motion of a particle whose velocity is proportional to its position at all times.

Imagine a particle moving on the real line. If its velocity is proportional to its position, it will speed away from the origin at an exponentially increasing rate. Or, if the proportionality constant is negative, it will rapidly approach the origin, getting closer (but never quite reaching it) at an exponentially increasing rate.

Now, suppose we consider a particle moving on the complex plane instead of on the real line, again with velocity proportional to position. If the proportionality constant is real, the particle will speed away from the origin (or towards it, if the constant is negative), as before. But what if the proportionality constant is imaginary?

A proportionality constant of i means that the velocity of the particle is at right angles to the position, because multiplication by i in the complex plane corresponds to a counterclockwise rotation by 90°, as always. In this case, the path of the particle is a circle, and so its position as a function of t is described by something like cos t + i sin t. But this function must satisfy the differential equation also, with c = i, and we have Euler's formula.

Another famous and important formula named after Euler is also called Euler's formula, and states that for any simply-connected polyhedron with F faces, E edges, and V vertices, F - E + V = 2. For example, the cube has 6 faces, 12 edges, and 8 vertices, and indeed 6 - 12 + 8 = 2. The formula also holds for all planar graphs and is the fundamental result of planar graph theory.

Spheres in this case behave like planes, and graphs that cover spheres also satisfy F - E + V = 2. One then wonders whether the theorem holds for more complex surfaces, such as tori; this is equivalent to asking about polyhedra that have a single hole. In this case, the the theorem is a little different, and the identity becomes F - E + V = 0.

It turns out that every surface S has a value χ(S), called the Euler characteristic, such that graphs on the surface all satisfy F - E + V = χ(S).

Euler also discovered that the sum of the first n terms of the harmonic series, 1 + 1/2 + 1/3 + ... + 1/n, is approximately log n. We might like to say that it becomes arbitrarily close to log n, as so many things do, but it does not. It is always a bit larger than log n, and you cannot make it as close as you want. The more terms you take, the closer the sum gets to log n + γ, where γ is approximately 0.577216. This γ is Euler's constant:

$$\gamma = \lim_{n\rightarrow\infty}\left({\sum_{i=1}^n {1\over i} - \ln n}\right)$$

This is one of those numbers that shows up all over the place, and is easy to calculate, but is a big fat mystery. Is it rational? Everyone would be shocked if it were, but nobody knows for sure.

The Euler totient function φ(x) counts the number of integers less than x that have no divisors in common with x. It is of tremendous importance in combinmatorics and number theory. One of the most fundamental and astonishing facts about the totient function is Euler's theorem: aφ(n) - 1 is a multiple of n whenever a and n have no divisors in common. For example, since &phi(9) = 6, a6 - 1 is a multiple of 9, except when a is divisible by 3:

16 - 1= 9.
26 - 1= 9.
46 - 1= 455·9.
56 - 1= 1736·9.
76 - 1= 13072·9.

Euler's solution in 1736 of the "bridges of Königsberg" problem is often said to have begun the study of topology. It is also the source of the term "Eulerian path".

Wikipedia lists forty more items that are merely named for Euler. The list of topics that he discovered, invented, or contributed to would be far too large to actually construct.

Happy birthday, Leonhard Euler.


[Other articles in category /anniversary] permanent link

Thu, 12 Apr 2007

A security problem in a CGI program: addenda

Shell-less piping in Perl

In my previous article, I said:

Unfortunately, there is no easy way to avoid the shell when running a command that is attached to the parent process via a pipe. Perl provides open "| command arg arg arg...", which is what I used, and which is analogous to [system STRING], involving the shell. But it provides nothing analogous to [system ARGLIST], which avoids the shell. If it did, then I probably would have used it, writing something like this:

        open M, "|", $MAILER, "-fnobody\@plover.com", $addre;
and the whole problem would have been avoided.

Several people wrote to point out that, as of Perl 5.8.0, Perl does provide this, with a syntax almost identical to what I proposed:

        open M, "|-", $MAILER, "-fnobody\@plover.com", $addre;
Why didn't I use this? The program was written in late 2002, and Perl 5.8.0 was released in July 2002, so I expect it's just that I wasn't familiar with the new feature yet. Why didn't I mention it in the original article? As I said, I just got back from Asia, and I am still terribly jetlagged.

(Jet lag when travelling with a toddler is worse than normal jet lag, because nobody else can get over the jet lag until the toddler does.)

Jeff Weisberg also pointed out that even prior to 5.8.0, you can write:

        open(F, "|-") || exec("program", "arg", "arg", "arg");
Why didn't I use this construction? I have run out of excuses. Perhaps I was jetlagged in 2002 also.

RFC 822

John Berthels wrote to point out that my proposed fix, which rejects all inputs containing spaces, also rejects some RFC822-valid addresses. Someone whose address was actually something like "Mark Dominus"@example.com would be unable to use the web form to subscribe to the mailing list.

Quite so. Such addresses are extremely rare, and people who use them are expected to figure out how to subscribe by email, rather than using the web form.

qmail

Nobody has expressed confusion on this point, but I want to expliticly state that, in my opinion, the security problem I described was entirely my fault, and was not due to any deficiency in the qmail mail system, or in its qmail-inject or qmail-queue components.

Moreover, since I have previously been paid to give classes at large conferences on how to avoid exactly this sort of problem, I deserve whatever scorn and ridicule comes my way because of this.

Thanks to everyone who wrote in.


[Other articles in category /oops] permanent link

A security problem in a CGI program
<sarcasm>No! Who could possibly have predicted such a thing?</sarcasm>

I was away in Asia, and when I got back I noticed some oddities in my mail logs. Specifically, Yahoo! was rejecting mail.plover.com's outgoing email. In the course of investigating the mail logs, I discovered the reason why: mail.plover.com had been relaying a ton of outgoing spam.

It took me a little while to track down the problem. It was a mailing list subscription form on perl.plover.com:

your address
perl-qotw perl-qotw-discuss perl-qotw-discuss-digest

The form took the input email address and used it to manufacture an email message, requesting that that address be subscribed to the indicated lists:

        my $MAILER = '/var/qmail/bin/qmail-inject';
        ...

        for (@lists) {
          next unless m|^perl-qotw[\-a-z]*|;
          open M, "|$MAILER" or next;
          print M "From: nobody\@plover.com\n";
          print M "To: $_-subscribe-$addre\@plover.com\n";
          print M "\nRequested by $ENV{REMOTE_ADDR} at ", scalar(localtime), "\n";
          close M or next;
          push @DONE, $_;
        }
The message was delivered to the list management software, which interpreted it as a request to subscribe, and generated an appropriate confirmation reply. In theory, this doesn't open any new security holes, because a malicious remote user could also forge an identical message to the list management software without using the form.

The problem is the interpolated $addre variable. The value of this variable is essentially the address from the form. Interpolating user input into a string like this is always fraught with peril. Daniel J. Bernstein has one of the most succinct explanantions of this that I have ever seen:

The essence of user interfaces is parsing: converting an unstructured sequence of commands, in a format usually determined more by psychology than by solid engineering, into structured data.

When another programmer wants to talk to a user interface, he has to quote: convert his structured data into an unstructured sequence of commands that the parser will, he hopes, convert back into the original structured data.

This situation is a recipe for disaster. The parser often has bugs: it fails to handle some inputs according to the documented interface. The quoter often has bugs: it produces outputs that do not have the right meaning. Only on rare joyous occasions does it happen that the parser and the quoter both misinterpret the interface in the same way.

When the original data is controlled by a malicious user, many of these bugs translate into security holes.

In this case, I interpolated user data without quoting, and suffered the consequences.

The malicious remote user supplied an address of the following form:

        into9507@plover.com
        Content-Transfer-Encoding: quoted-printable
        Content-Type: text/plain
        From: Alwin Bestor <Morgan@plover.com>
        Subject: Enhance your love life with this medical marvel
        bcc: hapless@example.com,recipients@example.com,of@example.net,
             spam@example.com,...

        Attention fellow males..

        If you'd like to have stronger, harder and larger erections,
        more power and intense orgasms, increased stamina and
        ejaculatory control, and MUCH more,...

        .
(Yes, my system was used to send out penis enlargement spam. Oh, the embarrassment.)

The address contained many lines of data, separated by CRNL, and a complete message header. Interpolated into the subscription message, the bcc: line caused the qmail-inject user ineterface program to add all the "bcc" addresses to the outbound recipient list.

Several thoughts occur to me about this.

User interfaces and programmatic interfaces

The problem would probably not have occurred had I used the qmail-queue progam, which provides a programmatic interface, rather than qmail-inject, which provides a user interface. I originally selected qmail-inject for convenience: it automatically generates Date and Message-ID fields, for example. The qmail-queue program does not try to parse the recipient information from the message header; it takes recipient information in an out-of-band channel.

Perl piped open is deficient

Perl's system and exec functions have two modes. One mode looks like this:

        system "command arg arg arg...";
If the argument string contains shell metacharacters or certain other constructions, Perl uses the shell to execute the command; otherwise it forks and execs the command directly. The shell is the cause of all sorts of parsing-quoting problems, and is best avoided in programs like this one. But Perl provides an alternative:

        system "command", "arg", "arg", "arg"...;
Here Perl never uses the shell; it always forks and execs the command directly. Thus, system "cat *" prints the contents of all the files in the current working directory, but system "cat", "*" prints only the contents of the file named "*", if there is one.

qmail-inject has an option to take the envelope information from an out-of-band channel: you can supply it in the command-line arguments. I did not use this option in the original program, because I did not want to pass the user input through the Unix shell, which is what Perl's open FH, "| command args..." construction would have required.

Unfortunately, there is no easy way to avoid the shell when running a command that is attached to the parent process via a pipe. Perl provides open "| command arg arg arg...", which is what I used, and which is analogous to the first construction, involving the shell. But it provides nothing analogous to the second construction, which avoids the shell. If it did, then I probably would have used it, writing something like this:

        open M, "|", $MAILER, "-fnobody\@plover.com", $addre;
and the whole problem would have been avoided.

A better choice would have been to set up the pipe myself and use Perl's exec function to execute qmail-inject, which bypasses the shell. The qmail-inject program would always have received exactly one receipient address argument. In the event of an attack like the one above, it would have tried to send mail to into9507@plover.com^M^JContent-Transfer-Encoding:..., which would simply have bounced.

Why didn't I do this? Pure laziness.

qmail-queue more vulnerable than qmail-inject in this instance

Rather strangely, an partial attack is feasible with qmail-queue, even though it provides a (mostly) non-parsing interface. The addresses to qmail-queue are supplied to it on file descriptor 1 in the form:

        Taddress1@example.com^@Taddress2@example.com^@Taddress3@example.com^@...^@
If my program were to talk to qmail-queue instead of to qmail-inject, the program would have contained code that looked like this:

        print QMAIL_QUEUE_ENVELOPE "T$addre\0";
qmail-queue parses only to the extent of dividing up its input at the ^@ characters. But even this little bit of parsing is a problem. By supplying an appropriately-formed address string, a malicious user could still have forced my program to send mail to many addresses.

But still the recipient addresses would have been out of the content of the message. If the malicious user is unable to affect the content of the message body, the program is not useful for spamming.

But using qmail-queue, my program would have had to generate the To field itself, and so it would have had to insert the user-supplied address into the content of the message. This would have opened the whole can of worms again.

My program attacked specifically

I think some human put real time into attacking this particular program. There are bots that scour the web for email submission forms, and then try to send spam through them. Those bots don't successfully attack this program, because the recipient address is hard-wired. Also, the program refuses to send email unless at least one of the checkboxes is checked, and form-spam bots don't typically check boxes. Someone had to try some experiments to get the input format just so. I have logs of the experiments.

A couple of days after the exploit was discovered, a botnet started using it to send spam; 42 different IP addresses sent requests. I fixed the problem last night around 22:30; there were about 320 more requests, and by 09:00 this morning the attempts to send spam stopped.

Perl's "taint" feature would not have prevented this

Perl offers a feature that is designed specifically for detecting and preventing exactly this sort of problem. It tracks which data are possibly under control of a malicious user, and whether they are used in unsafe operations. Unsafe operations include most file and process operations.

One of my first thoughts was that I should have enabled the tainting feature to begin with. However, it would not have helped in this case. Although the user-supplied address would have been flagged as "tainted" and so untrustworthy; by extension, the email message string into which it was interpolated would have been tainted. But Perl does not consider writing a tainted string to a pipe to be an "unsafe" operation and so would not have signalled a failure.

The fix

The short-range fix to the problem was simple:

        my $addr = param('addr');
        $addr =~ s/^\s+//;
        $addr =~ s/\s+$//;

        ...

        if ($addr =~ /\s/) {
          sleep 45 + rand(45);
          print p("Sorry, addresses are not allowed to contain spaces");
          fin();
        }
Addresses are not allowed to contain whitespace, except leading or trailing whitespace, which is ignored. Since whitespace inside an address is unlikely to be an innocent mistake, the program waits before responding, to slow down the attacker.

Summary

Dammit.

[ Addendum 20070412: There is a followup article to this one. ]


[Other articles in category /oops] permanent link

Pick's theorem
In a recent article, I discussed Hero's formula for the area of a triangle in terms of its sides, and I said it was an oddity that didn't seem like any other formula in geometry. Pick's theorem is another such oddity, although not at all like Hero's.

Pick's theorem concerns the area of so-called "lattice polygons". These are simply polygons whose vertices all lie at points whose coordinates are integers. Such points are called "lattice points". Here it is:

Let P be a lattice polygon. Let b be the number of lattice points that lie on the edges of the polygon, and i be the number of lattice points inside the polygon. Then the area of the polygon is exactly b/2 + i - 1.

I think this should be at least a bit surprising. It implies that every lattice polygon has an area that is an integer multiple of ½, which I would not have thought was obvious.

Some examples now. Pick's theorem is obviously true for rectangles:

In the example above, b = 14 and i = 6, so Pick's theorem says that the area is 14/2 + 6 - 1 = 12, which is correct. In general, an m×n rectangle has (m-1)(n-1) lattice points inside, and 2m + 2n on the edges, so by Pick's theorem its area is (2m + 2n)/2 + (mn - m - n + 1) - 1 = mn.

We can cut the rectangle in half, and it still works:

b is now 8 and i is 3, so Pick's theorem predicts an area of 8/2 + 3 - 1 = 6, which is still correct. This works even when the diagonal cuts through some lattice points:

Here b is still 8, but i is only 1, so Pick predicts an area of 8/2 + 1 - 1 = 4.

It works for figures without right angles:

Here b=5 and i=0, for a total area of 3/2. We can check the area manually as follows:

The entire square has area 9. Regions A, B, and C each have area 1, and D has area 4½. That leaves X = 1½ as Pick's theorem says.

It works for more complicated figures too. Here we have b=7 and i=5 for a total area of 7½:


It works for non-convex polygons:

It does not, however, work for polygons with holes. The figure below is the same as the first triangle in this article (which had an area of of 6) except that it has a hole of size ½ chopped out of it. It also has three fewer interior points and three more boundary points than the original triangle, for a total net loss of 3 - 3/2 = 3/2.

To fix Pick's theorem for non-simply-connected polygons, you need to say that each hole adds an extra -1 to the total area.

The proof of Pick's theorem isn't hard. You start by proving it that it holds for all triangles that have two sides parallel to the x and y axes. Then you prove it for all triangles, using a subtraction argument like the one I used above for triangle X. Finally, you use induction to prove it for more complicated regions, which can be built up from triangles.

But the funny thing about Pick's theorem is that you can guess it even without a proof. Suppose someone told you that there was a formula for the size of a polygon in terms of the number of lattice points on the boundary and in the interior. Well, each interior point is surrounded by a square of area 1, which is typically inside the polygon:

So each such point should contribute about 1 to the area. Of course, not all do; some will contribute a little less:

But there will also be parts of the polygon that are not near any interior points:

The points outside the polygon whose squares are partly inside (which count less than they should) will tend to balance out the contributions of the points inside whose squares are partly outside (which count more than they should.)

The squares around a point on the edge (but not the vertex) of the polygon will always be half inside, half outside, so that such a point will contribute exactly half of a square to the total:

The vertices are a little funny. They also contribute about ½, for the same reason that the edge points do. But convex vertices contribute rather less than ½:

While concave vertices contribute a bit more:

So we need to adjust the contribution of the vertices away from the ideal of ½ each. The adjustment is positive when angle is more than 180°, in which case the path at the vertex turns clockwise, and negative when the angle is less than 180°, when the path turns counterclockwise. But any polygon makes exactly one complete counterclockwise turn, so the total adjustment is exactly one square's-worth, or -1, and this is where the fudge factor of -1 comes from. (I've known about Pick's theorem for twenty years, but I never knew where the -1 came from until just now.) Viewed in this light, it makes perfect sense that a hole in the polygon should subtract an extra unit of space, since it's adding an extra complete turn.

I made the diagrams for this article with a picture-drawing tool, originally designed at Bell Labs, called pic. The source code files for the illustrations were all named things like pick.pic, and the command to compile them to PostScript was pic pick.pic. This was really confusing.


[Other articles in category /math] permanent link

Hero's formula
On of my favorite search engine queries of last month was:

        this mathematician's best known work is the formula for the
        area of a triangle in terms of the lengths of it's sides. who
        is this and when did they live?
This is the kind of question that always trips me up in Jeopardy. (And doesn't that first sentence sounds just like a Jeopardy question? "I'll take triangles for $600, Alex. . .") The ears hear "mathematician", "triangle", "formula", and the mouth says "Who was Pythagoras!" without any conscious intervention. But the answer here is almost certainly Hero of Alexandria.

Hero's formula, or Heron's formula, has always seemed to me like something of an oddity. It doesn't look like any other formula in geometry. Usually to get anything useful from a triangle you need to involve the angles, and express the results in terms of trigonometric functions. This will be obvious if you pause to remember what the phrase "trigonometric function" means. The few rare cases in which you can avoid trigonometry (there's that word again) usually involve special cases, such as right triangles.

But Hero's formula expresses the area of a triangle in terms of the lengths of its sides, with no angles and no trigonometric functions. If we were trying to discover such a formula, we might proceed trigonometrically, and then hope we could somehow eliminate the trigonometry by the end. So we might proceed as follows:

The magnitude of the cross product a×b is the area of the parallelogram with sides a and b, so the area of the triangle is half that. And |a × b| = |a||b| sin θ, where θ is the angle between the two vectors. So if a and b are sides of a triangle and the included angle is θ, the area A is ab/2 · sin θ.

Now we want θ is in terms of the third side c. The law of cosines generalizes the Pythagorean theorem to non-right triangles: c2 = a2 + b2 - 2ab cos θ, where θ is the included angle between sides a and b. (In a right triangle, θ = 90°, and the cosine term drops out.) So we have cos θ = (a2 + b2 - c2) / 2ab. But we want sine instead of cosine, so convert cosine to sine using sin θ = √(1-cos2&theta);, which yields:

$$\sin\theta = {1\over 2ab}\sqrt{4a^2b^2 - {(a^2 + b^2 - c^2)}^2}$$

Expanding out the right-hand side, and remembering that what we really want is A = ab/2 sin θ. we get:

$$A = {1\over4}\sqrt{2a^2b^2 + 2a^2c^2 + 2b^2c^2 - a^4 - b^4 - c^4}$$.

Now, that might satisfy anyone, but Hero's answer is better. Hero says: Let p be the "semiperimeter" of the triangle; that is, p = (a+b+c)/2. Then the area of the triangle is just √(p(p-a)(p-b)(p-c)). This is "Hero's formula".

Hero's formula is simple and easy to remember, which (2a2b2 + 2a2c2 + 2b2c2 - a4 - b4- c4)/16 is not. If you expand out Hero's formula, you find that that the two formulas are the same, as of course they must be. But if you have only the complicated fourth-degree polynomial, how would you get the idea that putting it in terms of p will simplify it? There is some technique or insight here that I am missing.

Even though such technique is the indispensable tool of the working mathematician, mathematical writing customarily scorns such technique,and has numerous phrases for disparaging it or kicking it under the carpet. For example, having gotten the fourth-degree polynomial, we might say "obviously, this is equivalent to Hero's formula", or "it is left to the student to prove that the two formulations are equivalent." Or we could just skip direct from the polynomial to Hero's formula, and say what Wikipedia says in the same situation: "Here the simple algebra in the last step was omitted."

Indeed, if you are trying to prove that Hero's formula is true, it's tedious but straightforward to grovel over the algebra and grind out that the two formulations are the same. But all this ignores the real problem, which is that nobody looking at the trigonometric formulation would know to guess Hero's formula if they had not seen it before. A proof serves two purposes. it is supposed to persuade you that the theorem is true. But more importantly, it is supposed to help you understand why it is true, and to give you some insight that may help you solve similar problems. The proof-by-handwaving-away-complex-and-unexpected-algebra technique serves the first purpose, but not the second. And Hero did not discover the formula using this approach anyway, since he did not take a trig class when he was in high school.

Another way to proceed is to drop a perpendicular from the vertex opposite side c. If the length of the perpendicular is h, the area of the triangle is hc/2. But if the foot of the perpendicular divides c into x + y, we also have x2 + h2 = a2 and y2 + h2 = b2. Grovelling over the algebra again yields something equivalent to Hero's formula.

I don't know a good proof of Hero's formula. I haven't seen Hero's, about which MathWorld says:

Heron's proof is ingenious but extremely convoluted, bringing together a sequence of apparently unrelated geometric identities and relying on the properties of cyclic quadrilaterals and right triangles.

A "cyclic quadrilateral" is a quadrilateral whose vertices all lie on a circle. Any triangle can be considered a degenerate cyclic quadrilateral, which happens to have two of its vertices in the same place. Indeed, there's a formula, called "Bretschneider's formula", just like Hero's, but for the area of a cyclic quadrilateral: A = √((p-a)(p-b)(p-c)(p-d)), where a, b, c, d are the lengths of the sides and p is the semiperimeter. If you consider that a triangle is just a cyclic quadrilateral one of whose sides has length 0, you put d = 0 in Bretschneider's formula and you get out Hero's formula.


[Other articles in category /math] permanent link

Thu, 22 Mar 2007

My favorite math problem

Introduction

According to legend, the imaginary unit i = √-1 was invented by mathematicians because the equation x2 + 1 = 0 had no solution. So they introduced a new number i, defined so that i2 + 1 = 0, and investigated the consequences. Like most legends, this one has very little to do with the real history of the thing it purports to describe. But let's pretend that it's true for a while.

Descartes' theorem

If P(x) is some polynomial, and z is some number such that P(z) = 0, then we say that z is a zero of P or a root of P. Descartes' theorem (yes, that Descartes) says that if z is a zero of P, then P can be factored into P = (x-z)P', where P' is a polynomial of lower degree, and has all the same roots as P, except without z. For example, consider P = x3 - 6x2 + 11x - 6. One of the zeroes of P is 1, so Descartes theorem says that we can write P in the form (x-1)P' for some second-degree polynomial P', and if there are any other zeroes of P, they will also be zeroes of P'. And in fact we can; x3 - 6x2 + 11x - 6 = (x-1)(x2 - 5x + 6). The other zeroes of the original polynomial P are 2 and 3, and both of these (but not 1) are also zeroes of x2 - 5x + 6.

We can repeat this process. If a1, a2, ... an are the zeroes of some nth-degree polynomial P, we can factor P as (x-a1)(x-a2)...(x-an).

The only possibly tricky thing here is that some of the ai might be the same. x2 - 2x + 1 has a zero of 1, but twice, so it factors as (x-1)(x-1). x3 - 4x2 + 5x - 2 has a zero of 1, but twice, so it factors as (x-1)(x2 - 3x + 2), where (x2 - 3x + 2) also has a zero of 1, but this time only once.

Descartes' theorem has a number of important corollaries: an nth-degree polynomial has no more than n zeroes. A polynomial that has a zero can be factored. When coupled with the so-called fundamental theorem of algebra, which says that every polynomial has a zero over the complex numbers, it implies that every nth-degree polynomial can be factored into a product of n factors of the form (x-z) where z is complex, or into a product of first- and second-degree polynomials with real coefficients.

The square roots of -1

Suppose we want to know what the square roots of 1 are. We want numbers x such that x2 = 1, or, equivalently, such that x2 - 1 = 0. x2 - 1 factors into (x-1)(x+1), so the zeroes are ±1, and these are the square roots of 1. On the other hand, if we want the square roots of 0, we need the solutions of x2 = 0, and the corresponding calculation calls for us to factor x2, which is just (x-0)(x-0), So 0 has the same square root twice; it is the only number without two distinct square roots.

If we want to find the square roots of -1, we need to factor x2 + 1. There are no real numbers that do this, so we call the polynomial irreducible. Instead, we make up some square roots for it, which we will call i and j. We want (x - i)(x - j) = x2 + 1. Multiplying through gives x2 - (i + j)x + ij = x2 + 1. So we have some additional relationships between i and j: ij = 1, and i + j = 0.

Because i + j = 0, we always dispense with j and just call it -i. But it's essential to realize that neither one is more important than the other. The two square roots are identical twins, different individuals that are completely indistinguishable from each other. Our choice about which one was primary was completely arbitrary. That doesn't even do justice to the arbitrariness of the choice: i and j are so indistinguishable that we still don't know which one got the primary notation. Every property enjoyed by i is also enjoyed by j. i3 = j, but j3 = i. eiπ = -1, but so too ejπ = -1. And so on.

In fact, I once saw the branch of mathematics known as Galois theory described as being essentially the observation that i and -i were indistinguishable, plus the observation that there are no other indistinguishable pairs of complex numbers.

Anyway, all this is to set up the main point of the article, which is to describe my favorite math problem of all time. I call it the "train problem" because I work on it every time I'm stuck on a long train trip.

We are going to investigate irreducible polynomials; that is, polynomials that do not factor into products of simpler polynomials. Descartes' theorem implies that irreducible polynomials are zeroless, so we are going to start by looking for polynomials without zeroes. But we are not going to do it in the usual world of the real numbers. The answer in that world is simple: x2 + 1 is irreducible, and once you make up a zero for this polynomial and insert it into the universe, you are done; the fundamental theorem of algebra guarantees that there are no more irreducible polynomials. So instead of the reals, we're going to do this in a system known as Z2.

Z2

Z2 is much simpler than the real numbers. Instead of lots and lots of numbers, Z2 has only two. (Hence the name: The 2 means 2 and the Z means "numbers". Z is short for Zahlen, which is German.) Z2 has only 0 and 1. Addition and multiplication are as you expect, except that instead of 1 + 1 = 2, we have 1 + 1 = 0.

+01
001
110
×01
0 00
1 01

The 1+1=0 oddity is the one and only difference between Z2 and ordinary algebra. All the other laws remain the same. To be explicit, I will list them below:

  • a + 0 = a
  • a + b = b + a
  • (a + b) + c = a + (b + c)
  • a × 0 = 0
  • a × 1 = a
  • a × b = b × a
  • (a × b) × c = a × (b × c)
  • a × (b + c) = a × c + b × c

Algebra in Z2

But the 1+1=0 oddity does have some consequences. For example, a + a = a·(1 + 1) = a·0 = 0 for all a. This will continue to be true for all a, even if we extend Z2 to include more elements than just 0 and 1, just as x + x = 2x is true not only in the real numbers but also in their extension, the complex numbers.

Because a + a = 0, we have a = -a for all a, and this means that in Z2, addition and subtraction are the same thing. If you don't like subtraction—and really, who does?—then Z2 is the place for you.

As a consequence of this, there are some convenient algebraic manipulations. For example, suppose we have a + b = c and we want to solve for a. Normally, we would subtract b from both sides. But in Z2, we can add b to both sides, getting a + b + b = c + b. Then the b + b on the left cancels (because b + b = 0) leaving a = c + b. From now on, I'll call this the addition law.

If you suffered in high school algebra because you wanted (a + b)2 to be equal to a2 + b2, then suffer no more, because in Z2 they are equal. Here's the proof:

(a + b)2= 
(a + b)(a + b)= 
a(a + b) + b(a + b)= 
(a2 + ab) + (ba + b2)= 
a2 + ab + ab + b2= 
a2 + b2  

Because the ab + ab cancels in the last step.

  This happy fact, which I'll call the square law, also holds true for sums of more than two terms: (a + b + ... + z)2 = a2 + b2 + ... + z2. But it does not hold true for cubes; (a + b)3 is a3 + a2b + ab2 + b3, not a3 + b3.

Irreducible polynomials in Z2

Now let's try to find a polynomial that is irreducible in Z2. The first degree polynomials are x and x+1, and both of these have zeroes: 0 and 1, respectively.

There are four second-degree polynomials:

  1. x2
  2. x2 + 1
  3. x2 + x
  4. x2 + x + 1
If a second-degree polynomial is reducible, it must factor into a product of two first-degree polynomials, and there are three such products: x·x, (x+1)(x+1), and x(x+1). These are, respectively, polynomials 1, 2, and 3 in the previous display. The polynomials have zeroes as indicated here:

PolynomialZeroesFactorization
x2 0 (twice) x2
x2 + 11 (twice) (x+1)2
x2 + x 0, 1 x(x+1)
x2 + x + 1 none  

Note in particular that, unlike in the real numbers, x2 + 1 is not irreducible in Z2. 1 is a zero of this polynomial, which, because of the square law, factors as (x+1)(x+1). But we have found an irreducible polynomial, x2 + x + 1. The other three second-degree polynomials all can be factored, but x2 + x + 1 cannot.

Like the fictitious mathematicians who invented i, we will rectify this problem, by introducing into Z2 a new number, called b, which has the property that b2 + b + 1 = 0. What are the consequences of this?

The first thing to notice is that we haven't added only one new number to Z2. When we add i to the real numbers, we also get a lot of other stuff like i+1 and 3i-17 coming along for the ride. Here we can't add just b. We must also add b+1.

The next thing we need to do is figure out how to add and multiply with b. The addition and multiplication tables are extended as follows:

+01bb+1
001bb+1
110b+1b
bbb+101
b+1b+1b10
×01bb+1
00000
101bb+1
b0bb+11
b+10b+11b

The only items here that might require explanation are the entries in the lower-right-hand corner of the multiplication table. Why is b×b = b+1? Well, b2 + b + 1 = 0, by definition, so b2 = b + 1 by the addition law. Why is b × (b+1) = 1? Well, b × (b+1) = b2 + b = (b + 1) + b = 1 because the two b's cancel. Perhaps you can work out (b+1) × (b+1) for yourself.

From here there are a number of different things we could investigate. But what I think is most crucial is to discover the value of b3. Let's calculate that. b3 = b·b2. b2, we know, is b+1. So b3 = b(b+1) = 1. Thus b is a cube root of 1.

People with some experience with this sort of study will not be surprised by this. The cube roots of 1 are precisely the zeroes of the polynomial x3 + 1 = 0. 1 is obviously a cube root of 1, and so Descartes' theorem says that x3 + 1 must factor into (x+1)P, where P is some second-degree polynomial whose zeroes are the other two cube roots. P can't be divisible by x, because 0 is not a cube root of 1, so it must be either (x+1)(x+1), or something else. If it were the former, then we would have x3 + 1 = (x+1)3, but as I pointed out before, we don't. So it must be something else. The only candidate is x2 + x + 1. Thus the other two cube roots of 1 must be zeroes of x2 + x + 1.

The next question we must answer without delay is: b is one of the zeroes of x2 + x + 1; what is the other? There is really only one possibility: it must be b+1, because that is the only other number we have. But another way to see this is to observe that if p is a cube root of 1, then p2 must also be a cube root of 1:

b3=1
(b3)2=12
b6=1
(b2)3=1

So we now know the three zeroes of x3 + 1: they are 1, b, and b2. Since x3 + 1 factors into (x+1)(x2+x+1), the two zeroes of x2+x+1 are b and b2. As we noted before, b2 = b+1.

This means that x2 + x + 1 should factor into (x+b)(x+b+1). Multiplying through we get x2 + bx + (b+1)x + b(b+1) = x2 + x + 1, as hoped.

Actually, the fact that both b and b2 are zeroes of x2+x+1 is not a coincidence. Z2 has the delightful property that if z is a zero of any polynomial whose coefficients are all 0's and 1's, then z2 is also a zero of that polynomial. I'll call this Theorem 1. Theorem 1 is not hard to prove; it follows from the square law. Feel free to skip the demonstration which follows in the rest of the paragraph. Suppose the polynomial is P(x) = anxn + an-1xn-1 + ... + a1x + a0, and z is a zero of P, so that P(z) = 0. Then (P(z))2 = 0 also, so (anzn + an-1zn-1 + ... + a1z + a0)2 = 0. By the square law, we have an2(zn)2 + an-12(zn-1)2 + ... + a12z2 + a02 = 0. Since all the ai are either 0 or 1, we have ai2 = ai, so an(zn)2 + an-1(zn-1)2 + ... + a1z2 + a0 = 0. And finally, (zk)2 = (z2)k, so we get an(z2)n + an-1(z2)n-1 + ... + a1z2 + a0 = P(z2) = 0, which is what we wanted to prove.

Now you might be worried about something: the fact that b is a zero of x2 + x + 1 implies that b2 is also a zero; doesn't the fact that b2 is a zero imply that b4 is also a zero? And then also b8 and b16 and so forth? Then we would be in trouble, because x2 + x + 1 is a second-degree polynomial, and, by Descartes' theorem, cannot have more than two zeroes. But no, there is no problem, because b3 = 1, so b4 = b·b3 = b·1 = b, and similarly b8 = b2, so we are getting the same two zeroes over and over.

Symmetry again

Just as i and -i are completely indistinguishable from the point of view of the real numbers, so too are b and b+1 indistinguishable from the point of view of Z2. One of them has been given a simpler notation than the other, but there is no way to tell which one we gave the simpler notation to! The one and only defining property of b is that b2 + b + 1 = 0; since b+1 also shares this property, it also shares every other property of b. b and b+1 are both cube roots of 1. Each, added to the other gives 1; each, added to 1 gives the other. Each, when multiplied by itself gives the other; each when multiplied by the other gives 1.

Third-degree polynomials

We have pretty much exhausted the interesting properties of b and its identical twin b2, so let's move on to irreducible third-degree polynomials. If a third-degree polynomial is reducible, then it is either a product of three linear factors, or of a linear factor and an irreducible second-degree polynomial. If the former, it must be one of the four products x3, x2(x+1), x(x+1)2, or (x+1)3. If the latter, it must be one of x(x2+x+1) or (x+1)(x2+x+1). This makes 6 reducible polynomials. Since there are 8 third-degree polynomials in all, the other two must be irreducible:

PolynomialZeroesFactorization
x3    0, 0, 0 x3
x3   + 1 1, b, b+1 (x+1)(x2+x+1)
x3 + x  0, 1, 1 x(x+1)2
x3 + x + 1 none  
x3+ x2   0, 0, 1 x2(x+1)
x3+ x2  + 1 none  
x3+ x2+ x  0, b, b+1 x(x2+x+1)
x3+ x2+ x + 1 1, 1, 1 (x+1)3

There are two irreducible third-degree polynomials. Let's focus on x3 + x + 1, since it seems a little simpler. As before, we will introduce a new number, c, which has the property c3 + c + 1 = 0. By the addition law, this implies that c3 = c + 1. Using this, we can make a table of powers of c, which will be useful later:

c1 =  c 
c2 = c2  
c3 =  c +1
c4 = c2 +c 
c5 = c2 +c +1
c6 = c2 + 1
c7 =   1

To calculate the entries in this table, just multiply each line by c to get the line below. For example, once we know that c5 = c2 + c + 1, we multiply by c to get c6 = c3 + c2 + c. But c3 = c + 1 by definition, so c6 = (c + 1) + c2 + c = c2 + 1. Once we get as far as c7, we find that c is a 7th root of 1. Analogous to the way b and b2 were both cube roots of 1, the other 7th roots of 1 are c2, c3, c4, c5, c6, and 1. For example, c5 is a 7th root of 1 because (c5)7 = c35 = (c7)5 = 15 = 1.

We've decided that c is a zero of x3 + x + 1. What are the other two zeroes? By theorem 1, they must be c2 and (c2)2 = c4. Wait, what about c8? Isn't that a zero also, by the same theorem? Oh, but c8 = c·c7 = c, so it's really the same zero again.

What about c3, c5, and c6? Those are the zeroes of the other irreducible third-degree polynomial, x3 + x2 + 1. For example, (c3)3 + (c3)2 + 1 = c9 + c6 + 1 = c2 + (c2+1) + 1 = 0.

So when we take Z2 and adjoin a new number that is a solution of the previously unsolvable equation x3 + x + 1 = 0, we're actually forced to add six new numbers; the resulting system has 8, and includes three different solutions to both irreducible third-degree polynomials, and a full complement of 7th roots of 1. Since the zeroes of x3 + x + 1 and x3 + x2 + 1 are all 7th roots of 1, this tells us that x7 + 1 factors as (x+1)(x3 + x + 1)(x3 + x2 + 1), which might not have been obvious.

Onward

I said this was my favorite math problem, but I didn't say what the question was. But by this time we have dozens of questions. For example:

  • We've seen cube roots and seventh roots of 1. Where did the fifth roots go?

    (The square theorem ensures that even-order roots are uninteresting. The fourth roots of 1, for example, are just 1, 1, 1, and 1. The sixth roots are the same as the cube roots, but repeated: 1, 1, b, b, b2, and b2.)

    The fifth roots, it turns out, appear later, as the four zeroes of the polynomial x4 + x3 + x2 + x + 1. These four, and 1, are five of the 15 fifteenth roots of 1. Eight others appear as the zeroes of the other two irreducible fourth-degree polynomials, and the other two fifteenth roots are b and b2. If we let d be a zero of one of the other irreducible fourth-degree polynomials, say x4 + x + 1, then the zeroes of this polynomial are d, d2, d4, and d8, and the zeroes of the other are d7, d11, d13, and d14. So we have an interesting situation. We can extend Z2 by adjoining the zeroes of x3 + 1, to get a larger system with 4 elements instead of only 2. And we can extend Z2 by adjoining the zeroes of x7 + 1, to get a larger system with 8 elements instead of only 2. But we can't do the corresponding thing with x5 + 1, because the fifth roots of 1 don't form a closed system. The sum of any two of the seventh roots of 1 was another seventh root of 1; for example, c3 + c4 = c6 and c + c2 = c4. But the corresponding fact about fifth roots is false. Suppose, for example, that y5 = 1. We'd like for y + y2 to be another fifth root of 1, but it isn't; it actually turns out that (y + y2)5 = b. For which n do the nth roots of 1 form a closed system?

  • What do we get if we extend Z2 with both b and c? What's bc? What's b + c? (Spoiler: b + c turns out to be zero of x6 + x5 + x3 + x2 + 1, and is a 63rd root of 1, so we may as well call it f. bc = f48.)

    The degree of a number is the degree of the simplest polynomial of which it is a zero. What's the degree of b + c? In general, if p and q have degrees d(p) and d(q), respectively, what can we say about the degrees of pq and p + q?

  • Since there is an nth root of 1 for every odd n, that implies that for all odd n, there is some k such that n divides 2k - 1. How are n and k related?

  • For n = 2, 3, and 4, the polynomial xn + x + 1 was irreducible. Is this true in general? (No: x5 + x + 1 = (x3 + x2 + 1)(x2 + x + 1).) When is it true? Is there a good way to find irreducible polynomials? How many irreducible polynomials are there of degree n? (The sequence starts out 0, 0, 1, 2, 3, 6, 9, 18, 30, 56, 99,...)

  • The Galois theorem says that any finite field that contains Z2 must have 2n elements, and its addition component must be isomorphic to a direct product of n copies of Z2. Extending Z2 with c, for example, gives us the field GF(8), obtained as the direct product of {0, 1}, {0, c}, and {0, c2}. The addition is very simple, but the multiplication, viewed in this light, is rather difficult, since it depends on knowing the secret identity c · c2 = c + 1.

    Identify the the element a0 + a1c + a2c2 with the number a0 + 2a1 + 4a2, and let the addition operation be exclusive-or. What happens to the multiplication operation?

  • Consider a linear-feedback shift register:

            # Arguments: N, a positive integer
            #            0 <= K <= 2**N-1
            MAXBIT = 2**N
            c = 2
            until c == 1:
              c *= 2
              if (c => MAXBIT):
                c &= ~MAXBIT
                c ^= K
              print c
    
    (Except that this function runs backward relative to the way that LFSRs usually work.)

    Does this terminate for all choices of N and K? (Yes.) For any given N, the maximum number of iterations of the until loop is 2N - 1. For which choices of K is this maximum achieved?

  • Consider irreducible sixth-degree polynomials. We should expect that adjoining a zero of one of these, say f, will expand Z2 to contain 64 elements: 0, 1, f, f2, ..., f62, and f63 = 1. In fact, this is the case.

    But of these 62 new elements, some are familiar. Specifically, f9 = c, and f21 = b. Of the 62 powers of f, 8 are numbers we have seen before, leaving only 54 new numbers. Each irreducible 6th-degree polynomial has 6 of these 54 as its zeroes, so there are 9 such polynomials. What does this calculation look like in general?

  • c1, c2, and c4 are zeroes of x3 + x + 1, one of the two irreducible third-degree polynomials. This means that x3 + x + 1 = (x + c1)(x + c2)(x + c4). Multiplying out the right-hand side, we get: x3 + (c + c2 + c4)x2 + (c3 + c5 + c6)x + 1. Equating coefficients, we have c + c2 + c4 = 0 and c3 + c5 + c6 = 1. c3, c5, and c6 are the zeroes of the other irreducible third-degree polynomial. Coincidence? (No.)

  • The irreducible polynomials appear to come in symmetric pairs: if anxn + an-1xn-1 + ... + a1x + a0 is irreducible, then so is a0xn + a1xn-1 + ... + an-1x + an. Why?

  • c1, c2, and c4 are the zeroes of one of the irreducible third-degree polynomials. 1, 2, and 4 are all the three-bit binary numbers that have exactly one 1 bit. c3, c5, and c6 are the zeroes the other irreducible third-degree polynomial. 3, 5, and 6 are all the three-bit binary numbers that have exactly one 0 bit. Coincidence? (No; theorem 1 is closely related here.)

    Now consider the analogous case for fourth-degree polynomials. The 4-bit numbers that have one 1 bit are 0001, 0010, 0100, and 1000 (1, 2, 4, and 8) which are all zeroes of one of the irreducible 4th-degree polynomials. The 4-bit numbers that have one 0 bit are 1110, 1101, 1011, and 0111 (7, 11, 13, and 14) which are all zeroes of another of the irreducible 4th-degree polynomials.

    The 4-bit numbers with two 1 bits and two 0 bits fall into two groups. In one group we have the numbers where similar bits are adjacent: 0011, 0110, 1100, and 1001, or 3, 6, 9, and 12. These are the zeroes of the third irreducible 4th-degree polynomial. The remaining 4-bit numbers are 0101 and 1010, which correspond to b and b2.

    There is, therefore, a close relationship between the structure an extension of Z2 and the so-called "necklace patterns" of bits, which are patterns that imagine that the bits are joined into a circle with no distinguishable starting point.

    The exceptional values (b and b2) corresponded to necklace patterns that did not possess the full n-fold symmetry. This occurs for composite values of n. But we might also expect to find exceptions at n = 11, since then 2n-1 = 2047 is composite. However, 2047 is exceptional in another way: 2047 = 23 · 89, and neither 23 nor 89 is a divisor of 2n-1 for any smaller value of n. Can we prove from this that when p is prime and 2p-1 is not, the divisors of 2p-1 do not divide 2q-1 for any q < p?

  • Since ci are the seventh roots of 1, we have x7 + 1 = (x + 1)(x + c)(x + c2)...(x + c6). After dividing out the trivial factor of x+1, we are left with:

    x6 + x5 + ... + 1 = (x + c)(x + c2)...(x + c6)

    Multiplying through on the right side and equating coefficients gives:

    c + c2 + ... + c6=1
    cc2 + cc3 + ... + c5c6=1
    cc2c3 + cc2c4 + ... + c4c5c6=1
    ...=...
    cc2c3c4c5c6=1

    What, if anything, can one make of this?

The reason this problem works so well as a long-train-trip problem is that it can be approached from many different directions and at many different levels of difficulty. If I am tired and woozy, I can still enumerate fifth-degree polynomials looking for the irreducible ones, calculate tables of values of en, and ponder the interrelationships of the exponents until I fall asleep. When I wake up again, rested, I can consider the deep relationships with the Galois theorem, with algebra, with random number generation, and with cryptography.

And then, when I get tired of Z2, I can start all over with Z3.


[Other articles in category /math] permanent link

Sun, 11 Mar 2007

Bernoulli processes
A family has four children. Assume that the sexes of the four children are independent, and that boys and girls are equiprobable. What's the most likely distribution of boys and girls?

Well,it depends how you count. Are there three possibilities or five?

All four the same
Three the same, one different
Two-and-two
Four boys, no girls
Three boys, one girl
Two boys, two girls
One boy, three girls
No boys, four girls
If we group outcomes into five categories, as in the pink division on the right, the most likely distribution is two-and-two, as you would probably guess:

BoysGirlsProbability
040.0625
130.25
220.375
310.25
400.0625

This distribution is depicted in the graph at right. Individually, (3, 1) and (1, 3) are less likely than (2, 2). But "three-and-one" includes both (1, 3) and (3, 1), whereas "two-and-two" includes only (2, 2). So if you group outcomes into three categories, as in the green division above left, "three-and-one" comes out more frequent overall than "two-and-two":


One sexThe otherTotal
probability
400.125
310.5
220.375


It makes a difference whether you specify the sexes in the distribution. If a "distribution" is a thing like "b of the children are boys and g are girls", then the most frequent distribution is (2, 2). But if a distribution is "x of one sex and y of the other", then the most frequent distribution [3, 1], where I've used square brackets to show that the order is not important. [3, 1] is the same as [1, 3].

This is true in general. Suppose someone has 1,000 kids. What's the most likely distribution of sexes? It's 500 boys and 500 girls, which I've been writing (500, 500). This is more likely than either (499, 501) or (501, 499). But if you consider "Equal numbers" versus "501-to-499", which I've been writing as [500, 500] and [501, 499], then [501, 499] wins:

BoysGirlsProbability
5014990.02517
5005000.02522
4995010.02517


One sexThe otherTotal
probability
5014990.05035
5005000.02522
For odd numbers of kids, this anomaly doesn't occur, because there's no symmetric value like [500, 500] to get shorted.

DistributionNumber
of hands
Frequency
[4, 4, 3, 2] 10810800 0.16109347
[5, 4, 3, 1] 8648640 0.12887478
[5, 3, 3, 2] 8648640 0.12887478
[5, 4, 2, 2] 6486480 0.09665608
[4, 3, 3, 3] 4804800 0.07159710
[6, 4, 2, 1] 4324320 0.06443739
[6, 3, 2, 2] 4324320 0.06443739
[6, 3, 3, 1] 2882880 0.04295826
[5, 5, 2, 1] 2594592 0.03866243
[7, 3, 2, 1] 2471040 0.03682137
[4, 4, 4, 1] 1801800 0.02684891
[6, 4, 3, 0] 1441440 0.02147913
[5, 4, 4, 0] 1081080 0.01610935
[6, 5, 2, 0] 864864 0.01288748
[6, 5, 1, 1] 864864 0.01288748
[5, 5, 3, 0] 864864 0.01288748
[7, 4, 2, 0] 617760 0.00920534
[7, 4, 1, 1] 617760 0.00920534
[7, 2, 2, 2] 617760 0.00920534
[8, 2, 2, 1] 463320 0.00690401
[7, 3, 3, 0] 411840 0.00613689
[8, 3, 2, 0] 308880 0.00460267
[8, 3, 1, 1] 308880 0.00460267
[7, 5, 1, 0] 247104 0.00368214
[8, 4, 1, 0] 154440 0.00230134
[6, 6, 1, 0] 144144 0.00214791
[9, 2, 1, 1] 102960 0.00153422
[9, 3, 1, 0] 68640 0.00102282
[9, 2, 2, 0] 51480 0.00076711
[10, 2, 1, 0] 20592 0.00030684
[7, 6, 0, 0] 20592 0.00030684
[8, 5, 0, 0] 15444 0.00023013
[9, 4, 0, 0] 8580 0.00012785
[10, 1, 1, 1] 6864 0.00010228
[10, 3, 0, 0] 3432 0.00005114
[11, 1, 1, 0] 1872 0.00002789
[11, 2, 0, 0] 936 0.00001395
[12, 1, 0, 0] 156 0.00000232
[13, 0, 0, 0] 4 0.00000006

Similar behavior appears in related problems. What's the most likely distribution of suits in a bridge hand? People often guess (4, 3, 3, 3), and this is indeed the most likely distribution of particular suits. That is, if you consider distributions of the form "a hearts, b spades, c diamonds, and d clubs", then (4, 3, 3, 3) gives the most likely distribution. (The distributions (3, 4, 3, 3), (3, 3, 4, 3), and (3, 3, 3, 4) are of course equally frequent.) But if distributions have the form "a cards of one suit, b of another, c of another, and d of the fourth"—which is what is usually meant by a suit distribution in a bridge hand—then [4, 4, 3, 2] is the most likely distribution, and [4, 3, 3, 3] is in fifth place.

Why is this? [4, 3, 3, 3] covers the four most frequent distributions: (4, 3, 3, 3), (3, 4, 3, 3), (3, 3, 4, 3), and (3, 3, 3, 4). But [4, 4, 3, 2] covers twelve quite frequent distributions: (4, 4, 3, 2), (4, 3, 2, 4), and so on. Even though the individual distributions aren't as common as (4, 4, 4, 3), there are twelve of them instead of 4. This gives [4, 4, 3, 2] the edge.

[5, 4, 3, 1] includes 24 distributions, and ends up tied for second place. A complete table is in the sidebar at left.

(For 5-card poker hands, the situation is much simpler. [2, 2, 1, 0] is most common, followed by [2, 1, 1, 1] and [3, 1, 1, 0] (tied), then [3, 2, 0, 0], [4, 1, 0, 0], and [5, 0, 0, 0].)

This same issue arose in my recent article on Yahtzee roll probabilities. There we had six "suits", which represented the six possible rolls of a die, and I asked how frequent each distribution of "suits" was when five dice were rolled. For distribution [p1, p2, ...], we let ni be the number of p's that are equal to i. Then the expression for probability of the distribution has a factor of $\prod {n_i}!$ in the denominator, with the result that distributions with a lot of equal-sized parts tend to appear less frequently than you might otherwise expect.

I'm not sure how I got so deep into this end of the subject, since I didn't really want to compare complex distributions to each other so much as to compare simple distributions under different conditions. I had originally planned to discuss the World Series, which is a best-four-of-seven series of baseball games that we play here in the U.S. and sometimes in that other country to the north. Sometimes one team wins four games in a row ("sweeps"); other times the Series runs the full seven games.

You might expect that even splits would tend to occur when the two teams playing were evenly matched, but that when one team was much better than the other, the outcome would be more likely to be a sweep. Indeed, this is generally so. The chart below graphs the possible outcomes. The x-axis represents the probability of the Philadelphia Phillies winning any individual game. The y-axis is the probability that the Phillies win the entire series (red line), which in turn is the sum of four possible events: the Phillies win in 4 games (green), in 5 games (dark blue), in 6 games (light blue), or in 7 games (magenta). The probabilities of the Nameless Opponents winning are not shown, because they are exactly the opposite. (That is, you just flip the whole chart horizontally.)

(The Opponents are a semi-professional team that hails from Nameless, Tennessee.)

Clearly, the Phillies have a greater-than-even chance of winning the Series if and only if they have a greater-than-even chance of winning each game. If they are playing a better team, they are likely to lose, but if they do win they are most likely to do so in 6 or 7 games. A sweep is the most likely outcome only if the Opponents are seriously overmatched, and have a less than 25% chance of winning each game. (The lines for the 4-a outcome and the 4-b outcome cross at 1-(pa / pb)1/(b-a), where pi is 1, 4, 10, 20 for i = 0, 1, 2, 3.)

If we consider just the first four games of the World Series, there are five possible outcomes, ranging from a Phillies sweep, through a two-and-two split, to an Opponents sweep. Let p be the probability of the Phillies winning any single game. As p increases, so does the likelihood of a Phillies sweep. The chart below plots the likelihood of each of the five possible outcomes, for various values of p, charted here on the horizontal axis:

The leftmost red curve is the probability of an Opponents sweep; the red curve on the right is the probability of a Phillies sweep. The green curves are the probabilities of 3-1 outcomes favoring the Opponents and the Phillies, respectively, with the Phillies on the right as before. The middle curve, in dark blue, is the probability of a 2-2 split.

When is the 2-2 split the most likely outcome? Only when the Phillies and the Opponents are approximately evenly matched, with neither team no more than 60% likely to win any game.

But just as with the sexes of the four kids, we get a different result if we consider the outcomes that don't distinguish the teams. For the first four games of the World Series, there are only three outcomes: a sweep (which we've been writing [4, 0]), a [3, 1] split, and a [2, 2] split:

Here the green lines in the earlier chart have merged into a single outcome; similarly the red lines have merged. As you can see from the new chart, there is no pair of teams for which a [2, 2] split predominates; the even split is buried. When one team is grossly overmatched, winning less than about 19% of its games, a sweep is the most likely outcome; otherwise, a [3, 1] split is most likely.

Here are the corresponding charts for series of various lengths.

Series length
(games)
Distinguish teams Don't
distinguish teams
2
3
4
5
6
7
8
9
10

I have no particular conclusion to announce about this; I just thought that the charts looked cool.

Coming later, maybe: reasoning backwards: if the Phillies sweep the World Series, what can we conclude about the likelihood that they are a much better team than the Opponents? (My suspicion is that you can conclude a lot more by looking at the runs scored and runs allowed totals.)

(Incidentally, baseball players get a share of the ticket money for World Series games, but only for the first four games. Otherwise, they could have an an incentive to prolong the series by playing less well than they could, which is counter to the ideals of sport. I find this sort of rule, which is designed to prevent conflicts of interest, deeply satisfying.)


[Other articles in category /math] permanent link

Mon, 05 Mar 2007

An integer partition puzzle
Last month I wrote an article about calculating Yahtzee probabilities and another one about counting permutations in which integer partitions came up. An integer partition of some integer N is an unordered sequence of positive integers that sums to N. For example, there are 5 different integer partitions of 4:

1 1 1 1
2 1 1
2 2
3 1
4
I've spent a lot of time tinkering with partitions since then.

Here's one interesting fact: it's quite easy to calculate the number of partitions of N. Let P(n, k) be the number of partitions of n into parts that are at least k. Then it's easy to see that:

$$P(n, k) = \sum_{i=k}^{n-1} P(n-i, k)$$

And there are simple boundary conditions: P(n, n) = 1; P(n, k) = 0 when k > n, and so forth. And P(n), the number of partitions of n into parts of any size, is just P(n, 1). So a program to calculate P(n) is very simple:

        my @P;
        sub P {
          my ($n, $k) = @_;
          return 0 if $n < 0;
          return 1 if $n == 0;
          return 0 if $k > $n;
          my $r = $P[$n] ||= [];
          return $r->[$k] if defined $r->[$k];
          return $r->[$k] = P($n-$k, $k) + P($n, $k+1);
        }

        sub part {
          P($_[0], 1);
        }

        for (1..100) {
          printf "%3d %10d\n", $_, part($_);
        }
I had a funny conversation once with someone who ought to have known better: I remarked that it was easy to calculate P(n), and disagreed with me, asking why Rademacher's closed-form expression for P(n) had been such a breakthrough. But the two properties are independent; the same is true for lots of stuff. Just because you can calculate something doesn't mean you understand it. Calculating ζ(2) is quick and easy, but it was a major breakthrough when Euler discovered that it was equal to π2/6. Calculating ζ(3) is even quicker and easier, but nobody has any idea what the value represents.

Similarly, P(n) is easy to calculate, but harder to understand. Ramanujan observed, and proved, that P(5k+4) is always a multiple of 5, which had somehow escaped everyone's notice until then. And there are a couple of other similar identities which were proved later: P(7k+5) is always a multiple of 7; P(11k+6) is always a multiple of 11. Based on that information, any idiot could conjecture that P(13k+7) would always be a multiple of 13; this conjecture is wrong. (P(7) = 15.)

Anyway, all that is just leading up the the real point of this note, which is that I was tabulating the number of partitions of n into exactly k parts, which is also quite easy. Let's call this Q(n, k). And I discovered that Q(13, 4) = Q(13, 5). There are 18 ways to divide a pile of 13 beans into 4 piles, and also 18 ways to divide the beans into 5 piles.

1 1 1 10
1 1 2 9
1 1 3 8
1 1 4 7
1 1 5 6
1 2 2 8
1 2 3 7
1 2 4 6
1 2 5 5
1 3 3 6
1 3 4 5
1 4 4 4
2 2 2 7
2 2 3 6
2 2 4 5
2 3 3 5
2 3 4 4
3 3 3 4
1 1 1 1 9
1 1 1 2 8
1 1 1 3 7
1 1 1 4 6
1 1 1 5 5
1 1 2 2 7
1 1 2 3 6
1 1 2 4 5
1 1 3 3 5
1 1 3 4 4
1 2 2 2 6
1 2 2 3 5
1 2 2 4 4
1 2 3 3 4
1 3 3 3 3
2 2 2 2 5
2 2 2 3 4
2 2 3 3 3

The question I'm trying to resolve: is this just a coincidence? Or is there something in the structure of the partitions that would lead us to suspect that Q(13, 4) = Q(13, 5) even if we didn't know the value of either one?

So far, I haven't turned anything up; it seems to be a coincidence. A simpler problem of the same type is that Q(8, 3) = Q(8, 4); that seems to be a coincidence too:

1 1 6
1 2 5
1 3 4
2 2 4
2 3 3
1 1 1 5
1 1 2 4
1 1 3 3
1 2 2 3
2 2 2 2

Looking at this, one can see all sorts of fun correspondences. But on closer inspection, they turn out to be illusory. For example, any partition into 4 parts can be turned into a partition into 3 parts by taking the smallest of the 4 parts, dividing it up into 1's, and distributing the extra 1's to the largest parts. But there's no reason why that should always yield different outputs for different inputs, and, indeed, it doesn't.

Oh well, sometimes these things don't work out the way you'd like.


[Other articles in category /math] permanent link

"Go ahead, throw your vote away!"
I noticed this back in November right afer the election, when I was reading the election returns in the newspaper. There were four candidates for the office of U.S. Senator in Nevada. One of these was Brendan Trainor, running for the Libertarian party.

Trainor received a total of 5,269 votes, or 0.90% of votes cast.

A fifth choice, "None of these candidates", was available. This choice received 8,232 votes, or 1.41%.

Another candidate, David Schumann, representing the Independent American Party, was also defeated by "None of these candidates".

(Complete official results.)

I'm not sure what conclusion to draw from this. I am normally sympathetic to the attempts of independent candidates and small parties to run for office, and I frequently vote for them. But when your candidate fails to beat out "None of the above", all I can think is that you must be doing something terribly wrong.


[Other articles in category /politics] permanent link

Wed, 21 Feb 2007

A bug in HTML generation
A few days ago I hacked on the TeX plugin I wrote for Blosxom so that it would put the TeX source code into the ALT attributes of the image elements it generated.

But then I started to see requests in the HTTP error log for URLs like this:

    /pictures/blog/tex/total-die-rolls.gif$${6/choose%20k}k!{N!/over%20/prod%20{i!}^{n_i}{n_i}!}/qquad%20/hbox{/rm%20where%20$k%20=%20/sum%20n_i$}$$.gif
Someone must be referring people to these incorrect URLs, and it is presumably me. The HTML version of the blog looked okay, so I checked the RSS and Atom files, and found that, indeed, they were malformed. Instead of <img src="foo.gif" alt="$TeX$">, they contained codes for <img src="foo.gif$TeX$">.

I tracked down and fixed the problem. Usually when I get a bug like this, I ask myself what I could learn from it. This one is unusual. I can't think of much. Here's the bug.

The <img> element is generated by a function called imglink. The arguments to imglink are the filename that contains the image (for use in the SRC attribute) and the text for the ALT attribute. The ALT text is optional. If it is omitted, the function tries to locate the TeX source code and fetch it. If this attempt fails, it continues anyway, and omits the ALT attribute. Then it generates and returns the HTML:

        sub imglink {
          my $file = shift;
          ...

          my $alt = shift || fetch_tex($file);

          ...
          $alt = qq{alt="$alt"} if $alt;

          qq{<img $alt border=0 src="$url">};
        }
This function is called from several places in the plugin. Sometimes the TeX source code is available at the place from which the call comes, and the code has return imglink($file, $tex); sometimes it isn't and the code has return imglink($file) and hopes that the imglink function can retrieve the TeX.

One such place is the branch that handles generation of tags for every type of output except HTML. When generating the HTML output, the plugin actually tries to run TeX and generate the resulting image file. For other types of output, it assumes that the image file is already prepared, and just calls imglink to refer to an image that it presumes already exists:

  return imglink($file, $tex) unless $blosxom::flavour eq "html";
The bug was that I had written this instead:

  return imglink($file. $tex) unless $blosxom::flavour eq "html";
The . here is a string concatenation operator.

It's a bit surprising that I don't make more errors like this than I do. I am a very inaccurate typist.

Stronger type checking would not have saved me here. Both arguments are strings, concatenation of strings is perfectly well-defined, and the imglink function was designed and implemented to accept either one or two arguments.

The function did note the omission of the $tex argument, attempted to locate the TeX source code for the bizarrely-named file, and failed, but I had opted to have it recover and continue silently. I still think that was the right design. But I need to think about that some more.

The only lesson I have been able to extract from this so far is that I need a way of previewing the RSS and Atom outputs before publishing them. I do preview the HTML output, but in this case it was perfectly correct.


[Other articles in category /prog/bug] permanent link

Tue, 20 Feb 2007

Subtlety or sawed-off shotgun?

1
  1 1
2
  1 1 1
  2 1
3
  1 1 1 1
  1 2 3
  3 2
4
  1 1 1 1 1
  1 1 2 6
  2 2 3
  3 1 8
  4 6
5
  1 1 1 1 1 1
  2 1 1 1 10
  2 2 1 15
  3 1 1 20
  3 2 20
  4 1 30
  5 24
6
 1 1 1 1 1 1 1
  2 1 1 1 1 15
  2 2 1 1 45
  2 2 2 15
  3 1 1 1 40
  3 2 1 120
  3 3 40
  4 1 1 90
  4 2 90
  5 1 144
  6 120

There's a line in one of William Gibson's short stories about how some situations call for a subtle and high-tech approach, and others call for a sawed-off shotgun. I think my success as a programmer, insofar as I have any, comes from knowing when to deploy each kind of approach.

In a recent article I needed to produce the table that appears at left.

This was generated by a small computer program. I learned a long time ago that although it it tempting to hack up something like this by hand, you should usually write a computer program to do it instead. It takes a little extra time up front, and that time is almost always amply paid back when you inevitably decide that that table should have three columns instead of two, or the lines should alternate light and dark gray, or that you forgot to align the right-hand column on the decimal points, or whatever, and then all you have to do is change two lines of code and rerun the program, instead of hand-editing all 34 lines of the output and screwing up two of them and hand-editing them again. And again. And again.

When I was making up the seating chart for my wedding, I used this approach. I wrote a raw data file, and then a Perl program to read the data file and generate LaTeX output. The whole thing was driven by make. I felt like a bit of an ass as I wrote the program, wondering if I wasn't indulging in an excessive use of technology, and whether I was really going to run the program more than once or twice. How often does the seating chart need to change, anyway?

Gentle readers, that seating chart changed approximately one million and six times.

Order
Higher-Order Perl
Higher-Order Perl
with kickback
no kickback
The Nth main division of the table at left contains one line for every partition of the integer N. The right-hand entry in each line (say 144) is calculated by a function permcount, which takes the left-hand entry (say [5, 1]) as input. The permcount function in turn calls upon fact to calculate factorials and choose to calculate binomial coefficients.

But how is the left-hand column generated? In my book, I spent quite a lot of time discussing generation of partitions of an integer, as an example of iterator techniques. Some of these techniques are very clever and highly scalable. Which of these clever partition-generating techniques did I use to generate the left-hand column of the table?


Why, none of them, of course! The left-hand column is hard-wired into the program:

        while (<DATA>) {
          chomp;
          my @p = split //;
          ...
        }

        ...
        __DATA__
        1
        11
        2
        111
        12
        3
        ...
        51
        6
I guessed that it would take a lot longer to write code to generate partitions, or even to find it already written and use it, than it would just to generate the partitions out of my head and type them in. This guess was correct. The only thing wrong with my approach is that it doesn't scale. But it doesn't need to scale.


The sawed-off shotgun wins!


[Other articles in category /prog] permanent link

Structured BASIC
Aristotle Pagaltzis reminisces about programming microcomputers in BASIC in the 1980s:

That's what I started with, on the Acorn Electron. And I remember being excited about finding and understanding DEF FN. I also remember my disappointment about how limited it was. I remember my frustration whenever BASIC forced me into writing messy code.

I remember my frustration with this too. I realized fairly early on that it was important to organize one's code in a modular fashion. My clearest memory of this was in developing an Adventure-style program. Each of the locations in the world was assigned a sequence number. Location #23 was handled by lines 2300--2399 of the program. Lines 2300--2319 would print the description of the location. Line 2320 would set the variables that recorded the player's location, and called the subroutine to print the descriptions of the other objects at that location. Line 2380 would call the subroutine that prompted the user for their next command. Other lines in between would provide the implementation of whatever special effects were required for that location.

All the important utility subroutines were at mnemonic line numbers; the main loop was at line 50000, and the command processing was at 51000. Special handling for objects was in the 40000 range, with one hundred statement numbers reserved for each object.

After each user command was processed, control was dispatched back to the appropriate part of the program, depending on where the player was now. Microsoft BASIC didn't have a computed GOTO, so the dispatch was performed by a jump table. I was unhappy with the jump table, recognizing that it didn't scale well.

Object sizes and descriptions were stored in a table. I don't know why I didn't store the location descriptions in the table in the same way, but I suspect that I tried and found that my microcomputer didn't have enough string memory. I also discovered that the algorithm that mapped statement numbers to code did not scale well to programs with a lot of numbered statements; editing the program grew intolerably slow once the world contained more than about fifty locations.

Still, I was pleased with the outcome. My goal (at the tender age of sixteen, or whatever) had been to adopt conventions that made it easy to extend or modify the world and to add new locations or objects, and I felt at the time that I had achieved that.

M. Pagaltzis says:

I guess I have a natural penchant for structured code. Penchant? Instinct.

I think anyone who is really interested in writing programs in BASIC and who reflects on the results of his projects is going to come to the conclusion that BASIC is a very poor tool for the job. These problems force themselves on everyone, and if you are thoughtful you will see the problems and try to come up with some techniques to solve them.

I really wish I could see those old programs again. I'm sure I would learn a lot from them.

I do have some code I wrote in C as long ago as 1987. I remember that shortly after that I got sick of programming and took a vacation from it for a year.

One day the following year I was reading netnews, and I overheard a colleague complaining about his CS homework. He had to write a program in C to count the number of occurrences of each word in its input, using a binary tree to store the words. I said he was complaining about nothing and that I, a math major, could turn out such a program in two hours. I don't know why I said this, since I hadn't done any C programming in a year, and I didn't have any significant experience with C, but I was inspired, and I did finish it quickly, and it worked. I have been programming regularly ever since. I still have the source code for that program.

Here's the funny thing about the programs from that time: when I look at the pre-vacation programs, they look to me as though they were written by someone else. When I look at the tree-sort program or any other program I have written since then, I recognize it as my own code.

I don't know what happened in my brain during my one-year vacation, but my current programming style first emerged in that tree-sort program, and the code from after the break has all been a lot better than the code I wrote before.

I'd like to take another vacation, but I can't now, because I have to earn a living.


[Other articles in category /prog] permanent link

Design patterns of 1972
(No português)

"Patterns" that are used recurringly in one language may be invisible or trivial in a different language.

Extended Example: "object-oriented class"

C programmers have a pattern that might be called "Object-oriented class". In this pattern, an object is an instance of a C struct.

        struct st_employee_object *emp;
Or, given a suitable typedef:
        EMPLOYEE emp;
Some of the struct members are function pointers. If "emp" is an object, then one calls a method on the object by looking up the appropriate function pointer and calling the pointed-to function:

        emp->method(emp, args...);
Each struct definition defines a class; objects in the same class have the same member data and support the same methods. If the structure definition is defined by a header file, the layout of the structure can change; methods and fields can be added, and none of the code that uses the objects needs to know.

There are a bunch of variations on this. For example, you can get opaque implementation by defining two header files for each class. One defines the implementation:

        struct st_employee_object {
           unsigned salary;
           struct st_manager_object *boss;
           METHOD fire, transfer, competence;
        };
The other defines only the interface:

        struct st_employee_object {
           char __SECRET_MEMBER_DATA_DO_NOT_TOUCH[4];
           struct st_manager_object *boss;
           METHOD fire, transfer, competence;
        };
And then files include one or the other as appropriate. Here "boss" is public data but "salary" is private.

You get abstract classes by defining a constructor function that sets all the methods to NULL or to:

        void _abstract() { abort(); }
If you want inheritance, you let one of the structs be a prefix of another:

        struct st_manager_object;   /* forward declaration */

        #define EMPLOYEE_FIELDS \
           unsigned salary; \
           struct st_manager_object *boss; \
           METHOD fire, transfer, competence;
                
        struct st_employee_object {
           EMPLOYEE_FIELDS
        };

        struct st_manager_object {
           EMPLOYEE_FIELDS
           unsigned num_subordinates;
           struct st_employee_object **subordinate;
           METHOD delegate_task, send_to_conference;
        };
And if obj is a manager object, you can still treat it like an employee and call employee methods on it.

This may seem weird or contrived, but the technique is widely used. The C standard contains guarantees that the common fields of struct st_manager_object and struct st_employee_object will be laid out identically in memory, specifically so that this object-oriented class technique can work. The code of the X window system has this structure. The code of the Athena widget toolkit has this structure. The code of the Linux kernel filesystem has this structure.

Rob Pike, one of the primary architects of the Plan 9 operating system (the Bell Labs successor to Unix) and co-author (with Brian Kernighan) of The Unix Programming Environment, recommends this technique in his article "Notes on Programming in C".

This is a pattern

There's only one way in which this technique doesn't qualify as a pattern according to the definition of Gamma, Helm, Johnson, and Vlissides. They say:

A design pattern systematically names, motivates, and explains a general design that addresses a recurring design problem in object-oriented systems. It describes the problem, the solution, when to apply the solution, and its consequences. It also gives implementation hints and examples. The solution is a general arrangement of objects and classes that solve the problem. The solution is customized and implemented to solve the problem in a particular context.

Their definition arbitrarily restricts "design patterns" to addressing recurring design problems "in object-oriented systems", and to being general arrangements of "objects and classes". If we ignore this arbitrary restriction, the "object-oriented class" pattern fits the description exactly.

The definition in Wikipedia is:

In software engineering, a design pattern is a general solution to a common problem in software design. A design pattern isn't a finished design that can be transformed directly into code; it is a description or template for how to solve a problem that can be used in many different situations.

And the "object-oriented class" solution certainly qualifies.

Codification of patterns

Peter Norvig's presentation on "Design Patterns in Dynamic Languages" describes three "levels of implementation of a pattern":

Invisible
So much a part of language that you don't notice

Formal
Implement pattern itself within the language
Instantiate/call it for each use
Usually implemented with macros

Informal
Design pattern in prose; refer to by name, but Must be reimplemented from scratch for each use

In C, the "object-oriented class" pattern is informal. It must be reimplemented from scratch for each use. If you want inheritance, you have to set it up manually. If you want abstraction, you have to set it up manually.

The single major driver for the invention of C++ was to codify this pattern into the language so that it was "invisible". In C++, you don't have to think about the structs and you don't have to worry about keeping data and methods private. You just declare a "class" (using syntax that looks almost exactly like a struct declaration) and annotate the items with "public" and "private" as appropriate.

But underneath, it's doing the same thing. The earliest C++ compilers simply translated the C++ code into the equivalent C code and invoked the C compiler on it. There's a reason why the C++ method call syntax is object->method(args...): it's almost exactly the same as the equivalent code when the pattern is implemented in plain C. The only difference is that the object is passed implicitly, rather than explicitly.

In C, you have to make a conscious decision to use OO style and to implement each feature of your OOP system as you go. If a program has fifty modules, you need to decide, fifty times, whether you will make the next module an OO-style module. In C++, you don't have to make a decision about whether or not you want OO programming and you don't have to implement it; it's built into the language.

Sherman, set the wayback machine for 1957

If we dig back into history, we can find all sorts of patterns. For example:

Recurring problem: Two or more parts of a machine language program need to perform the same complex operation. Duplicating the code to perform the operation wherever it is needed creates maintenance problems when one copy is updated and another is not.

Solution: Put the code for the operation at the end of the program. Reserve some extra memory (a "frame") for its exclusive use. When other code (the "caller") wants to perform the operation, it should store the current values of the machine registers, including the program counter, into the frame, and transfer control to the operation. The last thing the operation does is to restore the register values from the values saved in the frame and jump back to the instruction just after the saved PC value.

This is a "pattern"-style description of the pattern we now know as "subroutine". It addresses a recurring design problem. It is a general arrangement of machine instructions that solve the problem. And the solution is customized and implemented to solve the problem in a particular context. Variations abound: "subroutine with passed parameters". "subroutine call with returned value". "Re-entrant subroutine".

For machine language programmers of the 1950s and early 1960's, this was a pattern, reimplemented from scratch for each use. As assemblers improved, the pattern became formal, implemented by assembly-language macros. Shortly thereafter, the pattern was absorbed into Fortran and Lisp and their successors, and is now invisible. You don't have to think about the implementation any more; you just call the functions.

Iterators and model-view-controller

The last time I wrote about design patterns, it was to point out that although the movement was inspired by the "pattern language" work of Christopher Alexander, it isn't very much like anything that Alexander suggested, and that in fact what Alexander did suggest is more interesting and would probably be more useful for programmers than what the design patterns movement chose to take.

One of the things I pointed out was essentially what Norvig does: that many patterns aren't really addressing recurring design problems in object-oriented programs; they are actually addressing deficiencies in object-oriented programming languages, and that in better languages, these problems simply don't come up, or are solved so trivially and so easily that the solution doesn't require a "pattern". In assembly language, "subroutine call" may be a pattern; in C, the solution is to write result = function(args...), which is too simple to qualify as a pattern. In a language like Lisp or Haskell or even Perl, with a good list type and powerful primitives for operating on list values, the Iterator pattern is to a great degree obviated or rendered invisible. Henry G. Baker took up this same point in his paper "Iterators: Signs of Weakness in Object-Oriented Languages".

I received many messages about this, and curiously, some made the same point in the same way: they said that although I was right about Iterator, it was a poor example because it was a very simple pattern, but that it was impossible to imagine a more complex pattern like Model-View-Controller being absorbed and made invisible in this way.

This remark is striking for several reasons. It is an example of what is perhaps the most common philosophical fallacy: the writer cannot imagine something, so it must therefore be impossible. Well, perhaps it is impossible—or perhaps the writer just doesn't have enough imagination. It is worth remembering that when Edgar Allan Poe was motivated to investigate and expose Johann Maelzel's fraudulent chess-playing automaton, it was because he "knew" it had to be fraudulent because it was inconceivable that a machine could actually exist that could play chess. Not merely impossible, but inconceivable! Poe was mistaken, and the people who asserted that MVC could not be absorbed into a programming language were mistaken too. Since I gave my talk in 2002, several programming systems, such as Ruby on Rails and Subway have come forward that attempt to codify and integrate MVC in exactly the way that I suggested.

Progress in programming languages

Had the "Design Patterns" movement been popular in 1960, its goal would have been to train programmers to recognize situations in which the "subroutine" pattern was applicable, and to implement it habitually when necessary. While this would have been a great improvement over not using subroutines at all, it would have been vastly inferior to what really happened, which was that the "subroutine" pattern was codified and embedded into subsequent languages.

Identification of patterns is an important driver of progress in programming languages. As in all programming, the idea is to notice when the same solution is appearing repeatedly in different contexts and to understand the commonalities. This is admirable and valuable. The problem with the "Design Patterns" movement is the use to which the patterns are put afterward: programmers are trained to identify and apply the patterns when possible. Instead, the patterns should be used as signposts to the failures of the programming language. As in all programming, the identification of commonalities should be followed by an abstraction step in which the common parts are merged into a single solution.

Multiple implementations of the same idea are almost always a mistake in programming. The correct place to implement a common solution to a recurring design problem is in the programming language, if that is possible.

The stance of the "Design Patterns" movement seems to be that it is somehow inevitable that programmers will need to implement Visitors, Abstract Factories, Decorators, and Façades. But these are no more inevitable than the need to implement Subroutine Calls or Object-Oriented Classes in the source language. These patterns should be seen as defects or missing features in Java and C++. The best response to identification of these patterns is to ask what defects in those languages cause the patterns to be necessary, and how the languages might provide better support for solving these kinds of problems.

With Design Patterns as usually understood, you never stop thinking about the patterns after you find them. Every time you write a Subroutine Call, you must think about the way the registers are saved and the return value is communicated. Every time you build an Object-Oriented Class, you must think about the implementation of inheritance.

People say that it's all right that Design Patterns teaches people to do this, because the world is full of programmers who are forced to use C++ and Java, and they need all the help they can get to work around the defects of those languages. If those people need help, that's fine. The problem is with the philosophical stance of the movement. Helping hapless C++ and Java programmers is admirable, but it shouldn't be the end goal. Instead of seeing the use of design patterns as valuable in itself, it should be widely recognized that each design pattern is an expression of the failure of the source language.

If the Design Patterns movement had been popular in the 1980's, we wouldn't even have C++ or Java; we would still be implementing Object-Oriented Classes in C with structs, and the argument would go that since programmers were forced to use C anyway, we should at least help them as much as possible. But the way to provide as much help as possible was not to train people to habitually implement Object-Oriented Classes when necessary; it was to develop languages like C++ and Java that had this pattern built in, so that programmers could concentrate on using OOP style instead of on implementing it.

Summary

Patterns are signs of weakness in programming languages.

When we identify and document one, that should not be the end of the story. Rather, we should have the long-term goal of trying to understand how to improve the language so that the pattern becomes invisible or unnecessary.

[ Thanks to Garrett Rooney for pointing out some minor errors that I have since corrected. - MJD ]

[ Addendum 20061003: There is a followup article to this one, replying to a response by Ralph Johnson, one of the authors of the "Design Patterns" book. This link URL is correct, but Johnson's website will refuse it if you come from here. ]


[Other articles in category /prog] permanent link

The envelope paradox
This is on my mind because someone asked about it in IRC yesterday and I was surprised at how coherently I was able to explain it on the spur of the moment. There are several versions of this paradox. My favorite version goes like this: you're going to play a game with an adversary. The adversary writes two different numbers on slips of paper and puts them in an envelope. The numbers are completely arbitrary; they could be absolutely any numbers whatsoever: zero, or π, or -1428573901823.00013, or anything else.

You pick one slip at random from the envelope and examine the number written on it. You then make a prediction about whether the other number is larger or smaller. If your prediction is correct, you win a dollar; if it is incorrect, you lose a dollar.

Clearly, you can break even in the long run simply by making your prediction at random. And it seems just just as clear that there is no strategy you can use that does better than breaking even. But this is the paradox: there is a strategy you can use that does better than breaking even. (This is what W.V.O. Quine calls a "veridical paradox": it's something that seems impossible, but is nevertheless true.)

Spoilers follow, so you might want to stop reading here for twenty-four hours and try to figure out a winning strategy yourself.


Let's call the number you get from the envelope A and the number still in the envelope B. You can see A, and you are trying to predict whether B is larger or smaller than A.

Here's your winning strategy. Before you see A, choose a random number R. If A < R, then conclude that A is "small", and predict that B is larger. If A > R, make the opposite prediction.

There are three possibilities. Either (1) A and B are both less than R, or (2) they are both greater than R, or else (3) one is less than R and one is greater.

In case 1, you predict that B > A, and you have a 50% probability of being correct. In case 2, you predict that B < A, and you have a 50% probability of being correct.

But in case 3, you win every time! If A < R < B then you see A, conclude that A is "small", and predict that B > A, which is correct; if B < R < A then you see A, conclude that A is "large", and predict that B < A, which is correct.

Since you're breaking even in cases 1 and 2, and you have a guaranteed win in case 3, you have a better-than-even chance of winning overall. There's some positive probability p (which depends on the method you use to choose R) that you have case 3, and if so, then your expected positive return on the game is p dollars per game.

The paradoxical part is that it initially seems as though you can't get any idea, just from looking at A, of whether it's larger or smaller than the unknown number B. But you can get such an idea, because you can tell from looking at A how big it is, and big numbers are more likely to be larger than B than small numbers are.

What you've done with R is to invent a definition of "large" and "small" numbers: numbers larger than R are "large" and those smaller than R are "small". It's an arbitrary definition, and it doesn't always succeed in distinguishing large from small numbers—it thinks that R+1 and 1000000R+1000000 are both "large"—but it can distinguish some large numbers from some small numbers, and it never gets confused and concludes that x is large and y is small when x is actually smaller than y. So it may be arbitrary, and extremely coarse, but it is never actually wrong.

In the cases where this very coarse method of deciding "large" from "small" fails to distinguish A from B, you get no new information, but that's okay, because you can still break even. But if you get lucky and the adversary has chosen numbers that you can distinguish, then you win.

Another way to look at the paradox is like this: suppose the adversary is required to choose his two numbers at random. Then you have a simple winning strategy: if A is positive, predict that B is smaller, and if A is negative, predict that B is larger. Even when both numbers are positive or both are negative, you win half the time; if one is positive and one is negative, you are guaranteed a win.

If the adversary knows that this is what you are doing, he can cut you back to merely breaking even, by limiting himself to always choosing positive numbers. But you can foil this strategy of his by choosing your "positive" and "negative" classes to be divided somewhere other than at 0: instead of "positive" being "> 0" and "negative" being "< 0", you make them mean "> R" and "< R". The adversary still wants to choose two numbers that are always positive, but since he doesn't know how big R is, hw doesn't know how large he has to make his own numbers to get them both to be "positive".

Still, this suggests the best strategy for the adversary: choose two very very large numbers that are close together. By doing this, he can make your expected win close to zero.

The envelope paradox is often presented in a different form: you are given two envelopes. One contains a bunch of money, say x dollars. The other contains twice as much. You open one envelope at random and examine its contents. Then you choose one envelope to keep.

A naïve analysis goes like this: I open the first envelope and see x. I can keep this envelope and collect amount x. If I switch, I have a 50% chance of ending up with 2x and a 50% chance of ending up with x/2, for an expected outcome of 5x/4. Since 5x/4 > x, I should always switch.

This is what Quine calls a "falsidical paradox": the reasoning seems good, but leads to an impossible conclusion. The strategy of always switching can't possibly be correct, because you could apply it with without even seeing what is in the envelope. You could keep switching back and forth all day, never opening either envelope, and increasing your expected winnings to infinity.

The tricky part, again, is that having seen x in the envelope, you cannot conclude that there is exactly a 50% chance of x being the larger of the two amounts. You get some information from the size of x, and if x is a large amount of money, then the probability that x is the larger of the two amounts is thereby greater than 0.5.

To do a full analysis, one has to ask the question of how the original amounts were selected. Say that the two amounts are b and 2b; let's call b the "base amount". How did the adversary select b? Let's say that the probability of the base amount being any particular amount x is P(x). It is impossible that b has an equal probability of being every number, because $$\int_{-\infty}^\infty P(x) dx$$ is required to be 1, and if P(b) is the same for every possible base amount b, then it is a constant function, and constant functions do not have the required property.

When you see x in the envelope, you know that one of two situations occurred. Either x is the base amount, and so is smaller, which occurs with probability P(x), or x/2 is the base amount, and x itself is the larger, which occurs with probability P(x/2).

Since these are the only possibilities, the a posteriori likelihood that x is the smaller number is P(x)/(P(x/2) + P(x)). This is equal to 1/2 only if P(x) = P(x/2). Although this can occur for particular values of x, it can't be true for every x. As x increases, P(x) approaches zero, so for sufficiently large x, we must have P(x/2) > P(x), so P(x/2) + P(x) > 2P(x), and P(x)/(P(x/2) + P(x)) < P(x)/2P(x) = 1/2.


[Other articles in category /math] permanent link

Daylight chart
My wife is always sad this time of the year because the days are so short and dark. I made a chart like this one a couple of years ago so that she could see the daily progress from dark to light as the winter wore on to spring and then summer.

The x axis is the date; the y axis is the length of the lit portion of the day, in minutes. Hang it on your wall and watch the length of the day increase!

This chart only works for Philadelphia and other places at the same latitude, such as Boulder, Colorado. For higher latitutdes, the y axis must be stretched. For lower latitudes, it must be squished. In the southern hemisphere, the whole chart must be flipped upside-down.

(The chart will work for places inside the Arctic or Antarctic circles, if you use the correct interpretations for y values greater than 1440 minutes and less than 0 minutes.)

This time of year, the length of the day is only increasing by about thirty seconds per day. (Decreasing, if you are in the southern hemishphere.) On the equinox, the change is fastest, about 165 seconds per day. (At latitudes 40°N and 40°S.) After that, the rate of increase slows, until it reaches zero on the summer solstice; then the days start getting shorter again.

As with all articles in the physics section of my blog, readers are cautioned that I do not know what I am talking about, although I can spin a plausible-sounding line of bullshit. For this article, readers are additionally cautioned that I failed observational astronomy twice in college.


[Other articles in category /physics] permanent link

Addenda to Apostol's proof that sqrt(2) is irrational
Yesterday I posted Tom Apostol's wonderful proof that √2 is irrational. Here are some additional notes about it.

  1. Gareth McCaughan observed that:
    It's equivalent to the following simple algebraic proof: if a/b is the "simplest" integer ratio equal to √2 then consider (2b-a)/(a-b), which a little manipulation shows is also equal to √2 but has smaller numerator and denominator, contradiction.
  2. According to Cut-the-knot, the proof was anticipated in 1892 by A. P. Kiselev and appeared on page 121 of his book Geometry.


[Other articles in category /math] permanent link

More irrational numbers
Gaal Yahas has written in with a delightfully simple proof that a particular number is irrational. Let x = log2 3; that is, such that 2x = 3. If x is rational, then we have 2a/b = 3 and 2a = 3b, where a and b are integers. But the left side is even and the right side is odd, so there are no such integers, and x must be irrational.

As long as I am on the subject, undergraduates are sometimes asked whether there are irrational numbers a and b such that ab is rational. It's easy to prove that there are. First, consider a = b = √2. If √2√2 is rational, then we are done. Otherwise, take a = √2√2 and b = √2. Both are irrational, but ab = 2.

This is also a standard example of a non-constructive proof: it demonstrates conclusively that the numbers in question exist, but it does not tell you which of the two constructed pairs is actually the one that is wanted. Pinning down the real answer is tricky. The Gelfond-Schneider theorem establishes that it is in fact the second pair, as one would expect.


[Other articles in category /math] permanent link

A programmer had a problem...
A while back, I wrote an article in which I mentioned a programmer who had a problem, tried to solve it with weak references, and, as a result, had two problems. I said that weak references work unusually well in that little formula.

Yesterday I was about to make the same mistake. I had a problem, and weak references seemed like the solution. Fortunately, it was time to go home, which is a two-mile walk. Taking a two-mile walk is a great way to fix mistakes, especially the ones you haven't made yet. On this particular walk, I came to my senses and avoided the weak references.

The problem concerns the following classes and methods. You have a database object $db. You can call @rec = $db->lookup, which may return some record objects that represent records. You then call methods on the records, say $rec[3]->get_color, to extract data from them, or $rec[3]->set_color("purple"), to modify the data in the records. The updating is done in-memory only, and a later call to $db->flush writes all the updates back to the database.

The database object needs to store the changes that have been made but not yet written out. The easy way to do this is to have it store a change log of the modified record objects. So set_color first makes its change to the target record object, and then calls an internal _update method on the original database object to attach the record to the change log. Later on, flush will process this array, writing out the indicated changes.

In order for set_color to know which database to direct the _update call to, each record object must have a pointer back to the database that created it. This is convenient for other purposes too. Fine. But then if the record object is stored in the change log inside the database object, we now have a reference loop: the database contains a change log with a pointer to the record, which contains a pointer back to the database itself. This means that neither the database nor the record will ever be garbage collected. (This problem is common in complex Perl programs, and would simply vanish if Perl had even a slightly less awful garbage collector. Improvement is unlikely to occur before the release of Perl 6, now scheduled for October 28, 2073.)

My first reaction when faced with a problem like this one is to gurgle contentedly in my sleep, turn over, and pull the blankets over my head. This strategy is the primary contributor to my success as a programmer; it is somewhat superior to the typical programmer's response, which is to swing into action, overthink the problem, and come up with an elaborate solution. Aron Nimzovitch once said that the problem chess novices have is the irrepressible urge to always be doing something. Programmers are similar. They are all very bright people, very good at solving problems, and they solve problems all the time, even the ones that don't need to be solved.

I seem to be digressing. How unusual. In any case, this problem really did have to be solved. One wants the database object to flush out its pending changes at the time it becomes inacessible. If the object is never garbage collected, then the programmer must always remember to flush out the changes manually. Miss one call to flush, and your updates are lost. This is unacceptable. The primary purpose of a database is to record the updates. So I had to take my head out from under the covers, like it or not.

I thought about several solutions, and even tried one out, but it was too complicated and got me into a horrible tar pit, so I threw it away and started over. (That is another superior strategy that programmers don't exercise as often as they should. As Erik Naggum says, they will drive a hundred miles through a forest, stopping every five feet to cut down another tree, instead of pausing to wonder if maybe they shouldn't have driven off the road in the first place.)

Then I got the bright idea to use weak references, which seemed like just the thing. That's what weak references are for: breaking dependency loops so that things that need to be garbage collected can be. Fortunately, it was time to go, so I walked home instead of diving into the chyme-filled swimming pool of weak references.

With the weak references, you need to decide which reference to weaken. There is a reference to the record object, in the change log inside the database object. And there is a reference to the database object, in the record object. Which do you weaken?

If you weaken the reference to the record, you get a disaster:

        {
          my ($rec) = $db->lookup(...);
          $rec->set_color("purple");
        }
        $db->flush;
When the block is exited, the last strong reference to the record goes away, and the modified record evaporates, leaving nothing inside the database object. The flush method can see by the lingering ghost that there was something there it was supposed to deal with, but it no longer knows what. So that choice is doomed.

What if you weaken the reference inside the record, the one that points back to the database? That is hardly any better:

        my $rec;
        {
          my $db = FlatFile->new(...);
          ($rec) = $db->lookup(...);
        }
        $rec->set_color("purple");
We would like the database object to hang around as long as there are still some extant records from it. But because we weakened the references from the records to the database, it doesn't; it evaporates at the end of the block, leaving the record orphaned. The set_color method then fails, because the database to which it is supposed to write changes has evaporated.

Conclusion: I've heard it before, and it wasn't funny the first time.

On the walk home, I realized something else: actually storing the database data inside the record objects is a bad move. The general advice under which this is a bad move is something like Don't store the same data in two places. The specific problems in this instance are exemplified by this:

        my ($a) = $db->lookup(unique_id => "142857");
        my ($b) = $db->lookup(unique_id => "142857");
        $a->set_color("red");
        $b->set_color("purple");
        $a->color eq "purple";  # True or false?
Since $a and $b represent the same record, the answer should be true. But in the implementation I had (and still have, actually; I haven't fixed this yet) it is false. The set_color method on $b updates the data that is cached in object $b, but has no idea that it should also update the data cached in $a.

To work properly, $a and $b should be identical objects. One way to do this is to store an object in memory for every record in the database, and hand out these preconstructed objects as needed; then both calls to lookup return the same object. This is time- and memory-intensive. Another way to do this is to cache the record objects as they are constructed, and arrange for lookup to return the cached objects when appropriate. This is more complicated.

A simpler solution is not to store the data in memory at all. Record objects are always created as needed, but contain nothing but a database handle and some sort of locator information that says how to get the record data, should it be asked for. ("Any problem can be solved by another layer of indirection," they say, although it's not really true. Still, there are several classes of problems that can be solved by adding another layer of indirection, and this particular object identity problem could serve well as an exemplar of one of those classes.) Then modifications don't go into the record objects themselves. Instead, they go into the database object as an instruction to modify a certain record in a certain way.

This solution, however, presupposes that there is a good way to build locator information for a flat file and update it as needed. Fortunately, there is. I did a really good job of solving this problem a few years ago when I wrote the Tie::File module. It represents a text file as a Perl array, so a record locator can simply be an index into the array, and a record object then becomes something like:

        {
          db => $db,
          recno => 37,
        }
The change log inside the database object looks something like:

        { 0 => no change,
          1 => no change,
          2 => "color" field was set to "purple",
          3 => no change,
          4 => "size" field was set to "unusually large",
          ...
        }        
This happily gets rid of the garbage collection problem I had been trying to solve in the first place.

Using Tie::File also eliminates a lot of I/O issues that I had solved before, and gets all the I/O code out of the database module. I had already been thinking about getting rid of the explicit I/O and having the database module depend on Tie::File, and when I recognized the lurking record object identity problem, I was convinced that it had to happen sooner rather than later. Having done it, I'm really pleased with the outcome.


[Other articles in category /prog] permanent link

Red Flags world tour: New York City
My wife came up with a brilliant plan to help me make regular progress on my current book. The idea of the book is that I show how to take typical programs and repair and refurbish them. The result usually has between one-third and one-half less code, is usually a little faster, and sometimes has fewer bugs. Lorrie's idea was that I should schedule a series of talks for Perl Mongers groups. Before each talk, I would solicit groups to send me code to review; then I'd write up and deliver the talk, and then afterward I could turn the talk notes into a book chapter. The talks provide built-in, inflexible deadlines, and I love giving talks, so the plan will help keep me happy while writing the book.

The first of these talks was on Monday, in my home town of New York.

Order
Martin Chuzzlewit
Martin Chuzzlewit
with kickback
no kickback
'It makes no odds whether a man has a thousand pound, or nothing, there. Particular in New York, I'm told, where Ned landed.'

'New York, was it?' asked Martin, thoughtfully.

'Yes,' said Bill. 'New York. I know that, because he sent word home that it brought Old York to his mind, quite wivid, in consequence of being so exactly unlike it in every respect.'

(Charles Dickens, Martin Chuzzlewit, about which more in some future entry, perhaps.)

The New Yorkers gave me a wonderful welcome, and generously paid my expenses afterward. The only major hitch was that I accidentally wrote my talk about a submission that had come from London. Oops! I must be more careful in the future.

Each time I look at a new program it teaches me something new. Some people, perhaps, seem to be able to reason from general principles to specifics: if you tell them that common code in both branches of a conditional can be factored out, they will immediately see what you mean. Or so they would have you believe; I have my doubts. Anyway, whether they are telling the truth or not, I have almost none of that ability myself. I frequently tell people that I have very little capacity for abstract thought. They sometimes think I'm joking, but I'm not. What I mean is that I can't identify, remember, or understand general principles except as generalizations of specific examples. Whenever I want to study some problem, my approach is always to select a few typical-seeming examples and study them minutely to try to understand what they might have in common. Some people seem to be able to go from abstract properties to conclusions; I can only go from examples.

So my approach to understanding how to improve programs is to collect a bunch of programs, repair them, take notes, and see what sorts of repairs come up frequently, what techniques seem to apply to multiple programs, what techniques work on one program and fail on another, and why, and so on. Probably someone smarter than me would come up with a brilliant general theory about what makes bad programs bad, but that's not how my brain works. My brain is good at coming up with a body of technique. It's a limitation, but it's not all bad.


Order
Concrete Mathematics
Concrete Mathematics
with kickback
no kickback
The goal of generalization had become so fashionable that a generation of mathematicians had become unable to relish beauty in the particular, to enjoy the challenge of solving quantitative problems, or to appreciate the value of technique.

(Ronald L. Graham, Donald E. Knuth, Oren Patashnik, Concrete Mathematics.)

So anyway, here's something I learned from this program. I have this idea now that you should generally avoid the Perl . (string concatenation) operator, because there's almost always a better alternative. The typical use of the . operator looks like this:

          $html = "<a href='".$url."'>".$hot_text."</a>";
It's hard to see here what is code and what is data. You pretty much have to run the Perl lexer algorithm in your head. But Perl has another notation for concatenating strings: "$a$b" concatenates strings $a and $b. If you use this interpolation notation to rewrite the example above, it gets much easier to read:

          $html = "<a href='$url'>$hot_text</a>";
So when I do these classes, I always suggest that whenever you're going to use the . operator, you try writing it as an interpolation too and see which you like better.

This frequently brings on a question about what to do in cases like this:

          $tmpfilealrt = "alert_$daynum" . "_$day" . "_$mon.log" ;
Here you can't eliminate the . operators in this way, because you would get:
          $tmpfilealrt = "alert_$daynum_$day_$mon.log" ;
This fails because it wants to interpolate $daynum_ and $day_, rather than $daynum and $day. Perl has an escape hatch for this situation:
          $tmpfilealrt = "alert_${daynum}_${day}_$mon.log" ;
But it's not clear to me that that is an improvement on the version that used the . operator. The punctuation is only slightly reduced, and you've used an obscure notation that a lot of people won't recognize and that is visually similar to, but entirely unconnected with, hash notation.

Anyway, when this question would come up, I'd discuss it, and say that yeah, in that case it didn't seem to me that the . operator was inferior to the alternatives. But since my review of the program I talked about in New York on Monday, I know a better alternative. The author of that program wrote it like this:

          $tmpfilealrt = "alert_$daynum\_$day\_$mon.log" ;
When I saw it, I said "Duh! Why didn't I think of that?"


[Other articles in category /prs] permanent link

Benjamin Franklin and the Quakers

Order
Franklin's Autobiography
Franklin's Autobiography
with kickback
no kickback
Benjamin Franklin was not impressed with the Quakers. His Autobiography, which is not by any means a long book, contains at least five stories of Quaker hypocrisy. I remembered only two, and found the others when I was looking for these.

In one story, the firefighting company was considering contributing money to the drive to buy guns for the defense of Philadelphia against the English. A majority of board members was required, but twenty-two of the thirty board members were Quakers, who would presumably oppose such an outlay. But when the meeting time came, twenty-one of the Quakers were mysteriously absent from the meeting! Franklin and his friends agreed to wait a while to see if any more would arrive, but instead, a waiter came to report to him that eight of the Quakers were awaiting in a nearby tavern, willing to come vote in favor of the guns if necessary, but that they would prefer to remain absent if it wouldn't affect the vote, "as their voting for such a measure might embroil them with their elders and friends."

Franklin follows this story with a long discourse on the subterfuges used by Quakers to pretend that they were not violating their pacifist principles:

My being many years in the Assembly. . . gave me frequent opportunities of seeing the embarrassment given them by their principle against war, whenever application was made to them, by order of the crown, to grant aids for military purposes. . . . The common mode at last was, to grant money under the phrase of its being "for the king's use," and never to inquire how it was applied.

And a similar story, about a request to the Pennsylvania Assembly for money to buy gunpowder:

. . . they could not grant money to buy powder, because that was an ingredient of war; but they voted an aid to New England of three thousand pounds, to he put into the hands of the governor, and appropriated it for the purchasing of bread, flour, wheat, or other grain. Some of the council, desirous of giving the House still further embarrassment, advis'd the governor not to accept provision, as not being the thing he had demanded; but he reply'd, "I shall take the money, for I understand very well their meaning; other grain is gunpowder," which he accordingly bought, and they never objected to it.

And Franklin repeats an anecdote about William Penn himself:

The honorable and learned Mr. Logan, who had always been of that sect . . . told me the following anecdote of his old master, William Penn, respecting defense. It was war-time, and their ship was chas'd by an armed vessel, suppos'd to be an enemy. Their captain prepar'd for defense; but told William Penn and his company of Quakers, that he did not expect their assistance, and they might retire into the cabin, which they did, except James Logan, who chose to stay upon deck, and was quarter'd to a gun. The suppos'd enemy prov'd a friend, so there was no fighting; but when [Logan] went down to communicate the intelligence, William Penn rebuk'd him severely for staying upon deck, and undertaking to assist in defending the vessel, contrary to the principles of Friends, especially as it had not been required by the captain. This reproof, being before all the company, piqu'd [Mr. Logan], who answer'd, "I being thy servant, why did thee not order me to come down? But thee was willing enough that I should stay and help to fight the ship when thee thought there was danger."


[Other articles in category /religion] permanent link

On design
I'm writing this Perl module called FlatFile, which is supposed to provide lightweight simple access to flat-file databases, such as the Unix password file. An interesting design issue came up, and since I think that understanding is usually best served by minuscule examination of specific examples, that's what I'm going to do.

The basic usage of the module is as follows: You create a database object that represents the entire database:

        my $db = FlatFile->new(FILE => "/etc/passwd", 
                               FIELDS => ['username', 'password', 'uid', 'gid',
                                          'gecos', 'homedir', 'shell'],
                               FIELDSEP => ':',
                              ) or die ...;
Then you can do queries on the database:

        my @roots = $db->lookup(uid => 0);
This returns a list of Record objects. (Actually it returns a list of FlatFile::Record::A objects, where FlatFile::Record::A is a dynamically-generated class that was manufactured at the time you did the new call, and which inherits from FlatFile::Record, but we can ignore that here.) Once we have the Record objects, we can query them or modify them:

        for my $root (@roots) {
          if ($root->username eq 'root') {
            $root->set_shell('/bin/false');
          } else {
            $root->delete;
          }
        }
This loops over the records that were selected in the earlier call and examines the username field in each one. if the username is root, the program sets the shell in the record to /bin/false; otherwise it deletes the record entirely.

Since lookup returns all the matching records, there is the question of what this should do:

        my $root = $db->lookup(uid => 0);
Here we have provided enough room for at most one root user. What if there is more than one?

Every Perl function needs to make a decision about this issue. The function could be called in list context or in scalar context, and you need to choose the two behaviors sensibly. Here are some possibilities for what lookup might do if called in scalar context:

  1. die unconditionally

  2. return the number of matching records, analogous to the builtin grep function or the @array syntax

  3. return the single matching record, if there is only one, and die if there is more than one.

  4. return the first matching record, and discard the others

  5. return a reference to an array of all matching records

  6. return an iterator object which can be used to access all the matching records

There are probably some other reasonable possibilities.

How to decide on the best behavior? This is the kind of problem that I really enjoy. What will people expect? What will they want? What do they need?

Two important criteria are:

  1. Difficulty: Whatever I provide should be something that's not easy to get any other way.

  2. Usefulness: Whatever I provide should be something that people will use a lot.

The difficulty criterion argues strongly against behavior #5 (return an array), because it's too much like the current list context behavior. No matter what the method does in scalar context, no matter what design decision I make, the programmer will always be able to get behavior #5 very easily:

        my $ref = [ $db->lookup(...) ];
Or they can subclass the Record module and add a new one-line method that does the same:
        sub lookup_ref {
          my $self = shift;
          [ $self->lookup(@_) ];
        }
Similarly, behavior #2 (return a count) is so easy to get that supporting it directly would probably not be a good use of my code or my precious interface space:

        my $N_recs = () = $db->lookup(...);
I had originally planned to do #3 (require that the query produce a single record, on pain of death), and here's why: in my first forays into programming with this module, I frequently found myself writing things like my $rec = $db->lookup(...) without meaning to, and in spite of the fact that I had documented the behavior in scalar context as being undefined. I kept doing it unintentionally in cases where I expected only one record to be returned. So each time I wrote this code, I was putting in an implicit assumption that there would be only one match. I would have been quite surprised in each case if there had actually been multiple matches. That's the sort of assumption that you might like to have automatically checked.

I ran the question by the folks on IRC, and reaction against this design was generally negative. Folks said that it's not the module's job to try to discern the programmer's intention and enforce this inference by committing suicide.

I can certainly get behind that point of view. I once wrote an article complaining bitterly about modules that call die. I said it was like when you're having tea and crumpets on your 112-piece Spode china set, and you accidentally chip the teacup, and the butler comes running in, crying "Don't worry, Master! I'll take care of that for you!" and then he whips out a hammer and smashes all 112 pieces of china to tiny bits.

I don't think the point applies here, though. I had mentioned it in connection with the Text::ParseWords module, which would throw an exception if the input string was unparseable, hardly an uncommon occurrence, and one that was entirely unavoidable: if I knew that the string would be unparseable, I wouldn't be calling Text::ParseWords to parse it.

Folks on IRC said that when the method might call die, you have to wrap every call to it in an exception handler, which I certainly agree is a pain in the ass. But in this example, you do not have to do that. Here, to prevent the function from dying is very easy: just call it in list context; then it will never die. If what you want is behavior #4, to have it discard all the records but the first one, that is easy to get, regardless of the design I adopt for scalar context behavior:

        my ($rec) = $db->lookup(...);
This argues against #4 (return the first matching record) in the same way that we argued against #2 and #5 already: it's so very easy to do already, maybe we don't need an even easier way to do it. But if so, couldn't the programmer just:

        sub lookup_first {
          my $self = shift;
          my ($rec) = $self->lookup(@_);
          return $rec;
        }
A counterargument in favor of #4 might be based on the usefulness criterion: perhaps this behavior is so commonly wanted that we really do need an even easier way to do it.

I was almost persuaded by the strong opinion in favor of #4, but then Roderick Schertler spoke up in favor of #3, for basically the reasons I set forth. I consider M. Schertler to have higher-than-normal reliability on matters of this type, so his opinion counterbalances several of the counteropinions on the other side. #3 is not too difficult to get, but still scores higher than most of the others on the difficulty scale. There doesn't seem to be a trivial inline expression of it, as there was with #2, #4, and #5. You would have to actually write a method, or else do something nasty like:

        (my ($rec) = $db->lookup(...)) < 2 or die ...;
What about the other proposed behaviors? #1 (unconditional fatality) is simple, but both criteria seem to argue against it. It does, however, have the benefit of being a good temporary solution since it is easy to change without breaking backward compatibility. Were I to adopt it, it would be very unlikely (although not impossible) that anyone would write a program that would depend on that behavior; I would then be able to change it later on.

#6 (return an iterator object) is very tempting, because it is the only one that scores high on the difficulty criterion scale: it is difficult or impossible to do this any other way, so by providing it, I am providing a real service to users of the module, rather than yet another way to do the same thing. The module's user cannot implement a good iterator interface as a wrapper around lookup, because lookup always searches the entire database before it returns, and allocates enough memory to store every returned record, whereas a good iterator interface will search only as far as is necessary to find the next matching record, and will store only one record at a time.

This performance argument would be more important if we expected the databases to be very large. But since this is a module for manipulating plain text files, we can expect that they will not be too big, and perhaps the time and memory costs of searching them will be relatively small, so perhaps this design will score fairly low on the usefulness scale.

I still haven't made up my mind, although writing this article has pushed me strongly toward #6. I would be glad to receive email on the matter.


[Other articles in category /prog] permanent link

Yahtzee probability
In the game of Yahtzee, the players roll five dice and try to generate various combinations, such as five of a kind, or full house (a simultaneous pair and a three of a kind.) A fun problem is to calculate the probabilities of getting these patterns. In Yahtzee, players get to re-roll any or all of the dice, twice, so the probabilities depend in part on the re-rolling strategy you choose. But the first step in computing the probabilities is to calculate the chance of getting each pattern in a single roll of all five dice.

A related problem is to calculate the probability of certain poker hands. Early in the history of poker, rules varied about whether a straight beat a flush; players weren't sure which was more common. Eventually it was established that straights were more common than flushes. This problem is complicated by the fact that the deck contains a finite number of each card. With cards, drawing a 6 reduces the likelihood of drawing another 6; this is not true when you roll a 6 at dice.

With three dice, it's quite easy to calculate the likelihood of rolling various patterns:

PatternProbability
A A A6/ 216
A A B90/ 216
A B C120/ 216

A high school student would have no trouble with this. For pattern AAA, there are clearly only six possibilities.