The Universe of Discourse
           
Wed, 26 Mar 2008

The "z" command: output filtering
My last few articles ([1] [2] [p] [p-2]) have been about this z program. The first part of this article is a summary of that discussion, which you can skip if you remember it.

The idea of z is that you can do:

        z grep pattern files...
and it does approximately the same as:
        zgrep pattern files...
or you could do:
        z sed script files...
and it would do the same as:
        zsed script files...
if there were a zsed command, although there isn't.

Much of the discussion has concerned a problem with the implementation, which is that the names of the original compressed files are not available to the command, due to the legerdemain z must perform in order to make the uncompressed data available to the command. The problem is especially apparent with wc:

        % z wc *  
            411    2611   16988 ctime.blog
             71     358    2351 /proc/self/fd/3
            121     725    5053 /proc/self/fd/4
             51     380    2381 files-talk.blog
             48     145     885 find-uniq.pl
            288    2159   12829 /proc/self/fd/5
             95     665    4337 ssh-agent-revisted.blog
            221     941    6733 struct-inode.blog
            106     555    3976 sync-2.blog
            115     793    4904 sync.blog
            124     624    4208 /proc/self/fd/6
           1651    9956   64645 total

Here /proc/self/fd/3 and the rest should have been names ending in .gz, such as env-2.blog.gz.

Another possible solution

At the time I wrote the first article, it occurred to me briefly that it would be possible to have z capture the output of the command and attempt to translate /proc/self/fd/3 back to env-2.blog.gz or whatever is appropriate, because although the subcommand does not know the original filenames, z itself does. The code would look something like this. Instead of ending by execing the command, as the original version of z did:
  exec $command, @ARGV;
  die "Couldn't run '$command': $!.\n";
this revised version of z, which we might call zz, would end with the code to translate back to the original filenames:

  open my($out), "-|", $command, @ARGV
    or die "Couldn't run '$command': $!.\n";
  while (<$out>) {
    s{/proc/self/fd/(\d+)}{$old[$1]}g;
    print;
  }
Here @old is an array that translates from file descriptors back to the original filename.

At the time, I thought of doing this, and my immediate thought was "well, that is so obviously a terrible idea that it is not worth even mentioning", so I left it out. But since then at least five people have written to me to suggest it, so it appears that it is not obviously a terrible idea. I had to think a little deeper about why I thought it was a terrible idea.

Really the question is why I think this is a more terrible idea than the original z program was in the first place. Because one could say that z is garbling the output of its command, and the filtering code in zz is only un-garbling it. But I think this isn't the right way to look at it.

The output of the command has a certain format, a certain structure. We don't know ahead of time what that structure is, but it can be described for any particular command. For instance, the output of wc is always a sequence of lines where each line has four whitespace-separated fields, of which the first three are numerals and the last is a filename, and then a final total line at the end.

Similarly, the output of tar is a file in a complicated binary format, one which is documented somewhere and which is intelligible to other instances of the tar command that are trying to decode it.

The original behavior of z may alter the content of the command output to some extent, replacing some filenames with others. But it cannot disrupt the structure or the format of the file, ever. This is because the output of z tar is the output of tar, unmodified. The z program tampers with the arguments it gives to tar, but having done that it runs tar and lets tar do what it wants, and tar then must produce a tar-format output, possibly not the one it would have normally produced—the content might be a little different—but a properly-formatted one for sure. In particular, any program written to deal properly with the output of tar will still work with the output of z tar. The output might not have the same meaning, but we can say very particularly what the extent of the differences might be: if the output mentions filenames, then some of these might have changed from the true filenames to filenames of the form /proc/self/fd/37.

With zz, we cannot make any such guarantee. The output of zz tar zc foo.gz, for example, might be in proper .tar.gz format. But suppose the output of tar zc foo.gz creates compressed binary output that just happens to contain the byte sequence 2f70 726f 632f 7365 6c66 2f66 642f 33? (That is, "/proc/self/fd/3".) Then zz will silently replace these 15 bytes with the six bytes 666f 6f2e 677a.

What if the original sequence was understood as part of a sequence of 2-byte integers? The result is not even properly aligned. What if that initial 2f was a count? The resulting count (66) is much too long. The result would be utterly garbled and unintelligible to tar zx. What the tar command will do with a garbled input is not well-defined: it might dump core, or it might write out random garbage data, or overwrite essential files in the filesystem. We are into nasal demon territory. With the original z, we never get anywhere near the nasal demons.

I suppose the short summary here is that z treats its command as a black box, while zz pretends to understand what comes out of it. But zz's understanding is a false pretense. My experience says that programs should not screw around with things they don't understand, and this is why I instantly rejected the idea when I thought of it before.

One correspondent argued that the garbling is very unlikely, and proposed various techniques to make it even less likely, mostly by rewriting the input filenames to various long random strings. But I felt then that this was missing the point, and I still do. He says it is unlikely, but he doesn't know that it is unlikely, and indeed the unlikeliness depends on the format of the output of the command, which is precisely the unknown here. In my view, the difference between z and zz is that the changes that z makes are bounded, because you can describe them briefly, as I did above, and the changes that zz makes are unbounded, because there is no limit to what could happen as a result.

On the other hand, this correspondent made a good point that if the output of zz is not consumed by anything other than human eyeballs, there may be no real problem. And for some particular commands, such as wc, there is never any problem at all. So perhaps it's a good idea to add a command-line option to z to enable the zz behavior. I did this in my version, and I'm going to try it out and see how it goes.

Complete modified source code is available. (Diffs from previous version.)


[Other articles in category /Unix] permanent link

Tue, 25 Mar 2008

z-commands
The gzip distribution includes a command called zcat. Its command-line arguments can include any number of filenames, compressed or not, and it prints out the contents, uncompressing them on the fly if necessary. Sometime later a zgrep command appeared, which was similar but which also performed a grep search.

But for anything else, you either need to uncompress the files, or build a special tool. I have a utility that scans the web logs of blog.plover.com, and extracts a report about new referrers. The historical web logs are normally kept compressed, so I recently built in support for decompression. This is quite easy in Perl. Normally one scans a sequence of input files something like this:

        while (<>) {
          ... do something with $_ ...
        }
The <> operator implicitly scans all the lines in all the files named in the command-line arguments, opening a new file each time the previous one is exhausted.

To decompress the files on the fly, one can preprocess the command-line arguments:

        for (@ARGV) {
          if (/\.gz$/) {
            $_ = "gzip -dc $_ |";
          }
        }

        while (<>) {
          ... do something with $_ ...
        }
The for loop scans the command-line arguments, replacing each one that has the form foo.gz with gzip -dc foo.gz |. Perl's magic open semantics treat filenames specially if they end with a pipe symbol: a pipe to a command is opened instead. Of course, anyone can think of half a dozen ways in which this can go wrong. But Larry Wall's skill in making such tradeoffs has been a large factor in Perl's success.

But it bothered me to have to make this kind of change in every program that wanted to handle compressed files. We have zcat and zgrep; where are zcut, zpr, zrev, zwc, zcol, zbc, zsed, zawk, and so on? Echh.

But after I got to thinking about it, I decided that I could write a single z utility that would do a lot of the same things. Instead of this:

        zsed -e 's/:.*//' * | ...
where the * matches some files that have .gz suffixes and some that haven't, one would write:

        z sed -e 's/:.*//' * | ...
and it would Just Work. That's the idea, anyway.

If sed were written in Perl, z would have an easy job. It could rely on Perl's magic open, and simply preprocess the arguments before running sed:

        # hypothetical implementation of z
        #
        my $command = shift;
        for (@ARGV) {
          if (/\.gz$/) {
            $_ = "gzip -dc $_ |";
          }
        }
        exec $command, @ARGV;
        die "Couldn't run command '$command': $!\n";
But sed is not written in Perl, and has no magic open. So I have to play a trickier trick:

        for my $file (@ARGV) {
          if ($file =~ /\.gz$/) {
            unless (open($fhs[@fhs], "-|", "gzip", "-cd", $file)) {
              warn "Couldn't open file '$file': $!; skipping\n";
              next;
            }
            my $fd = fileno $fhs[-1];
            $_ = "/proc/self/fd/$fd";
          }
        }

        # warn "running $command @ARGV\n";
        exec $command, @ARGV;
        die "Couldn't run command '$command': $!\n";
This is a stripped-down version to illustrate the idea. For various reasons that I explained yesterday, it does not actually work. The complete, working source code is here.

The idea, as before, is that the program preprocesses the command-line arguments. But instead of replacing the arguments with pipe commands, which are not supported by open(2), the program sets up the pipes itself, and then directs the command to take its input from the pipes by specifying the appropriate items from /proc/self/fd.

The trick depends crucially on having /proc/self/fd, or /dev/fd, or something of the sort, because otherwise there's no way to trick the command into reading from a pipe when it thinks it is opening a file. (Actually there is at least one other way, involving FIFOs, which I plan to discuss tomorrow.) Most modern systems do have /proc/self/fd. That feature postdates my earliest involvement with Unix, so it isn't a ready part of my mental apparatus as perhaps it ought to be. But this utility seems to me like a sort of canonical application of /proc/self/fd, in the sense that, if you couldn't think what /proc/self/fd might be good for, then you could read this example and afterwards have a pretty clear idea.

The z utility has a number of flaws. Principally, the original filenames are gone. Here's a typical run with regular zgrep:

        % zgrep immediately *  
        ctime.blog:we want to update.  It is immediately copied into a register, and
        env-2.blog.gz:All five people who wrote to me about this immediately said "oh, yes,
        qmail-throttle.blog.gz:program continues immediately, possibly posting its message.  (It
        struct-inode.blog:is a symbolic link, its inode is returned immediately; iname() would
        sync.blog:and reports success back to the process immediately, even though the
But here's the same thing with z:

        % z grep immediately *  
        ctime.blog:we want to update.  It is immediately copied into a register, and
        /proc/self/fd/3:All five people who wrote to me about this immediately said "oh, yes,
        /proc/self/fd/5:program continues immediately, possibly posting its message.  (It
        struct-inode.blog:is a symbolic link, its inode is returned immediately; iname() would
        sync.blog:and reports success back to the process immediately, even though the
The problem is even more glaring in the case of commands like wc:

        % z wc *  
            411    2611   16988 ctime.blog
             71     358    2351 /proc/self/fd/3
            121     725    5053 /proc/self/fd/4
             51     380    2381 files-talk.blog
             48     145     885 find-uniq.pl
            288    2159   12829 /proc/self/fd/5
             95     665    4337 ssh-agent-revisted.blog
            221     941    6733 struct-inode.blog
            106     555    3976 sync-2.blog
            115     793    4904 sync.blog
            124     624    4208 /proc/self/fd/6
           1651    9956   64645 total

So perhaps z will not turn out to be useful enough to be more than a curiosity. But I'm not sure yet.

This is article #300 on my blog. Thanks for reading.

[ Addendum 20080322: There is a followup to this article. ]

[ Addendum 20080325: Another followup. ]


[Other articles in category /Unix] permanent link

Closed file descriptors: the answer
This is the answer to yesterday's article about a small program that had a mysterious error.

        my $command = shift;
        for my $file (@ARGV) {
          if ($file =~ /\.gz$/) {
            my $fh;
            unless (open $fh, "<", $file) {
              warn "Couldn't open $file: $!; skipping\n";
              next;
            }
            my $fd = fileno $fh;
            $file = "/proc/self/fd/$fd";
          }
        }

        exec $command, @ARGV;
        die "Couldn't run command '$command': $!\n";
When the loop exits, $fh is out of scope, and the filehandle it contains is garbage-collected, closing the file.

"Duh."

Several people suggested that it was because open files are not preserved across an exec, or because the meaning of /proc/self would change after an exec, perhaps because the command was being run in a separate process; this is mistaken. There is only one process here. The exec call does not create a new process; it reuses the same one, and it does not affect open files, unless they have been flagged with FD_CLOEXEC.

Abhijit Menon-Sen ran a slightly different test than I did:

        % z cat foo.gz bar.gz
        cat: /proc/self/fd/3: No such file or directory
        cat: /proc/self/fd/3: No such file or directory
As he said, this makes it completely obvious what is wrong, since the two files are both represented by the same file descriptor.


[Other articles in category /prog/perl] permanent link

The "z" command: alternative implementations
In yesterday's article I discussed a possibly-useful utility program named z, which has a flaw. To jog your memory, here is a demonstration:

        % z grep immediately *  
        ctime.blog:we want to update.  It is immediately copied into a register, and
        /proc/self/fd/3:All five people who wrote to me about this immediately said "oh, yes,
        /proc/self/fd/5:program continues immediately, possibly posting its message.  (It
        struct-inode.blog:is a symbolic link, its inode is returned immediately; iname() would
        sync.blog:and reports success back to the process immediately, even though the
For a detailed discussion, see the previous article.

Fixing this flaw seems difficult-to-impossible. As I said earlier, the trick is to fool the command into reading from a pipe when it thinks it is opening a file, and this is precisely what /proc/self/fd is for. But there is an older, even more widely-implemented Unix feature that does the same thing, namely the FIFO. So an alternative implementation creates one FIFO for each compressed file, with a gzip process writing to the FIFO, and tells the command to read from the FIFO. Since we have some limited control over the name of the FIFO, we can ameliorate the missing-filename problem to some extent. Say, for example, we create the FIFOs in /tmp/PID. Then the broken zgrep example above might look like this instead:

        % z grep immediately *  
        ctime.blog:we want to update.  It is immediately copied into a register, and
        /tmp/7516/env-2.blog.gz:All five people who wrote to me about this immediately said "oh, yes,
        /tmp/7516/qmail-throttle.blog.gz:program continues immediately, possibly posting its message.  (It
        struct-inode.blog:is a symbolic link, its inode is returned immediately; iname() would
        sync.blog:and reports success back to the process immediately, even though the
The output is an improvement, but it is not completely solved, and the cost is that the process and file management are much more complicated. In fact, the cost is so high that you have to wonder if it might not be simpler to replace z with a shell script that copies the data to a temporary directory, uncompresses the files, and runs the command on the uncompressed files, perhaps something along these lines:

        #!/bin/sh
        DIR=/tmp/$$
        mkdir $DIR

        COMMAND=$1
        shift
        cp -p "$@" $DIR

        cd $DIR
        gzip -d *
        $COMMAND *
This has problems too, but my point is that if you are willing to accept a crappy, semi-working solution along the lines of the FIFO one, simpler ones are at hand. You can compare the FIFO version directly with the shell script, and I think the FIFO version loses. The z implementation I have is a solution in a different direction, and different tradeoffs, and so might be preferable to it in a number of ways.

But as I said, I don't know yet.

[ Addendum 20080325: Several people suggested a fix that I had considered so unwise that I didn't even mention it. But after receiving the suggestion repeatedly, I wrote an article about it. ]


[Other articles in category /Unix] permanent link

Sun, 16 Mar 2008

Drawing lines
As part of this thing I sometimes do when I'm not writing in my blog—what is it called?—oh, now I remember.

As part of my job I had to produce the following display:

The idea here is that the user can fill in the names of three organisms into the form blanks, and the application will find all the studies in its database which conclude that those organisms are related in the indicated way. For example, the user can put "whale" and "hippo" in the top two blanks and "cow" in the bottom one, and the result will be all the studies that conclude (perhaps among other things) that whales and hippos are more closely related to each other than either is to cows. (I think "cothurnocystis bifida" is biologist jargon for cows.)

If you wanted to hear more about phylogeny, Java programming, or tree algorithms, you are about to be disappointed. The subject of my article today is those fat black lines.

The first draft of the page did not have the fat black lines. It had some incredibly awful ASCII-art that was not even properly aligned. Really it was terrible; it would have been better to have left it out completely. I will not make you look at it.

I needed the lines, so I popped down the "graphics" menu on my computer and looked for something suitable. I tried the Gimp first. It seems that the Gimp has no tool for drawing straight lines. If someone wants to claim that it does, I will not dispute the claim. The Gimp has a huge and complex control panel covered with all sorts of gizmos, and maybe one of those gizmos draws a straight line. I did not find one. I gave up after a few minutes.

Next I tried Dia. It kept selecting the "move the line around on the page" tool when I thought I had selected the "draw another line" tool. The lines were not constrained to a grid by default, and there was no obvious way to tell it that I wanted to draw a diagram smaller than a whole page. I would have had to turn the thing into a bitmap and then crop the bitmap. "By Zeus's Beard," I cried, "does this have to be so difficult?" Except that the oath I actually uttered was somewhat coarser and less erudite than I have indicated. I won't repeat it, but it started with "fuck" and ended with "this".

Here's what I did instead. I wrote a program that would read an input like this:

        >-v-<
        '-+-`
and produce a jpeg file that looks like this:

Or similarly this:

        .---,    
        |   >--, 
        '---`  '-
Becomes this:

You get the idea.

Now I know some of you are just itching to write to me and ask "why didn't you just use...?", so before you do that, let me remind you of two things. First, I had already wasted ten or fifteen minutes on "just use..." that didn't work. And second, this program only took twenty minutes to write.

The program depends on one key insight, which is that it is very, very easy to write a Perl program that generates a graphic output in "PBM" ("portable bitmap") format. Here is a typical PBM file:

        P1
        10 10
        1111111111
        1000000001
        1000000001
        1001111001
        1001111001
        1001111001
        1001111001
        1000000001
        1000000001
        1111111111
The P1 is a magic number that identifies the file format; it is always the same. The 10 10 warns the processor that the upcoming bitmap is 10 pixels wide and 10 pixels high. The following characters are the bitmap data. I'm not going to insult you by showing the 10×10 bitmap image that this represents.

PBM was invented about twenty years ago by Jef Poskanzer. It was intended to be an interchange format: say you want to convert images from format X to format Y, but you don't have a converter. You might, however, have a converter that turns X into PBM and then one that turns PBM into Y. Or if not, it might not be too hard to produce such converters. It is, in the words of the Extreme Programming guys, the Simplest Thing that Could Possibly Work.

There are also PGM (portable graymap) and PPM (portable pixmap) formats for grayscale and 24-bit color images as well. They are only fractionally more complicated.

Because these formats are so very, very simple, they have been widely adopted. For example, the JPEG reference implementation includes a sample cjpeg program, for converting an input to a JPEG file. The input it expects is a PGM or PPM file.

Writing a Perl program to generate a P?M file, and then feeding the output to pbmtoxbm or ppmtogif or cjpeg is a good trick, and I have used it many times. For example, I used this technique to generate a zillion little colored squares in this article about the Pólya-Burnside counting lemma. Sure, I could have drawn them one at a time by hand, and probably gone insane and run amuck with an axe immediately after, but the PPM technique was certainly much easier. It always wins big, and this time was no exception.

The program may be interesting as an example of this technique, and possibly also as a reminder of something else. The Perl community luminaries invest a lot of effort in demonstrating that not every Perl program looks like a garbage heap, that Perl can be as bland and aseptic as Java, that Perl is not necessarily the language that most closely resembles quick-drying shit in a tube, from which you can squirt out the contents into any shape you want and get your complete, finished artifact in only twenty minutes and only slightly smelly.

No, sorry, folks. Not everything we do is a brilliant, diamond-like jewel, polished to a luminous gloss with pages torn from one of Donald Knuth's books. This line-drawing program was squirted out of a tube, and a fine brown piece of engineering it is.

        #!/usr/bin/perl

        my ($S) = shift || 50;
$S here is "size". The default is to turn every character in the input into a 50×50 pixel tile. Here's the previous example with $S=10:

        my ($h, $w);
        my $output = [];
        while (<>) {
          chomp;
          $w ||= length();
          $h++;
          push @$output, convert($_);
        }  
The biggest defect in the program is right here: it assumes that each line will have the same width $w. Lines all must be space-padded to the same width. Fixing this is left as an easy exercise, but it wasn't as easy as padding the inputs, so I didn't do it.

The magic happens here:

        open STDOUT, "| pnmscale 1 | cjpeg" or die $!;
        print "P1\n", $w * $S, " ", $h * $S, "\n";
        print $_, "\n" for @$output;
        exit;
The output is run through cjpeg to convert the PBM data to JPEG. For some reason cjpeg doesn't accept PBM data, only PGM or PPM, however, so the output first goes through pnmscale, which resizes a P?M input. Here the scale factor is 1, which is a no-op, except that pnmscale happens to turn a PBM input into a PGM output. This is what is known in the business as a "trick". (There is a pbmtopgm program, but it does something different.)

If we wanted gif output, we could have used "| ppmtogif" instead. If we wanted output in Symbolics Lisp Machine format, we could have used "| pgmtolispm" instead. Ah, the glories of interchange formats.

I'm going to omit the details of convert, which just breaks each line into characters, calls convert_ch on each character, and assembles the results. (The complete source code is here if you want to see it anyway.) The business end of the program is convert_ch:

        # 
        sub convert_ch {
          my @rows;
          my $ch = shift;
          my $up = $ch =~ /[<|>^'`+]/i;
          my $dn = $ch =~ /[<|>V.,+]/i;
          my $lt = $ch =~ /[-<V^,`+]/i;
          my $rt = $ch =~ /[->V^.'+]/i;
These last four variables record whether the tile has a line from its center going up, down, left, or right respectively. For example, "|" produces a tile with lines coming up and down from the center, but not left or right. The /i in the regexes is because I kept writing v instead of V in the inputs.

          my $top = int($S * 0.4);
          my $mid = int($S * 0.2);
          my $bot = int($S * 0.4);
The tile is divided into three bands, of the indicated widths. This probably looks bad, or fails utterly, unless $S is a multiple of 5. I haven't tried it. Do you think I care? Hint: I haven't tried it.

          my $v0 = "0" x $S;
          my $v1 = "0" x $top . "1" x $mid . "0" x $bot;
          push @rows, ($up ? $v1 : $v0) x $top;
This assembles the top portion of the tile, including the "up" line, if there is one. Note that despite their names, $top also determines the width of the left portion of the tile, and $bot determines the width of the right portion. The letter "v" here is for "vertical".

Perhaps I should explain for the benefit of the readers of Planet Haskell (if any of them have read this far and not yet fainted with disgust) that "$a x $b" in Perl is like concat (replicate b a) in the better sorts of languages.

          my $ls = $lt ? "1" : "0";
          my $ms = ($lt || $rt || $up || $dn) ? "1" : "0";
          my $rs = $rt ? "1" : "0";
          push @rows, ($ls x $top . $ms x $mid . $rs x $bot) x $mid;
This assembles the middle section, including the "left" and "right" lines.

          push @rows, ($dn ? $v1 : $v0) x $bot;
This does the bottom section.

          return @rows;
        }
And we are done. Nothing to it. Adding diagonal lines would be a fairly simple matter.

Download the complete source code if you haven't seen enough yet.

There is no part of this program of which I am proud. Rather, I am proud of the thing as a whole. It did the job I needed, and it did it by 5 PM. Larry Wall once said that "a Perl script is correct if it's halfway readable and gets the job done before your boss fires you." Thank you, Larry.

No, that is not quite true. There is one line in this program that I'm proud of. I noticed after I finished that there is exactly one comment in this program, and it is blank. I don't know how that got in there, but I decided to leave it in. Who says program code can't be funny?


[Other articles in category /prog/perl] permanent link

Fri, 14 Mar 2008

Throttling qmail
This may well turn out to be another oops. Sometimes when I screw around with the mail system, it's a big win, and sometimes it's a big lose. I don't know yet how this will turn out.

Since I moved house, I have all sorts of internet-related problems that I didn't have before. I used to do business with a small ISP, and I ran my own web server, my own mail service, and so on. When something was wrong, or I needed them to do something, I called or emailed and they did it. Everything was fine.

Since moving, my ISP is Verizon. I have great respect for Verizon as a provider of telephone services. They have been doing it for over a hundred years, and they are good at it. Maybe in a hundred years they will be good at providing computer network services too. Maybe it will take less than a hundred years. But I'm not as young as I once was, and whenever that glorious day comes, I don't suppose I'll be around to see it.

One of the unexpected problems that arose when I switched ISPs was that Verizon helpfully blocks incoming access to port 80. I had moved my blog to outside hosting anyway, because the blog was consuming too much bandwidth, so I moved the other plover.com web services to the same place. There are still some things that don't work, but I'm dealing with them as I have time.

Another problem was that a lot of sites now rejected my SMTP connections. My address was in a different netblock. A Verizon DSL netblock. Remote SMTP servers assume that anybody who is dumb enough to sign up with Verizon is also too dumb to run their own MTA. So any mail coming from a DSL connection in Verizonland must be spam, probably generated by some Trojan software on some infected Windows box.

The solution here (short of getting rid of Verizon) is to relay the mail through Verizon's SMTP relay service. mail.plover.com sends to outgoing.verizon.net, and lets outgoing.verizon.net forward the mail to its final destination. Fine.

But but but.

If my machine sends more than X messages per Y time, outgoing.verizon.net will assume that mail.plover.com has been taken over by a Trojan spam generator, and cut off access. All outgoing mail will be rejected with a permanent failure.

So what happens if someone sends a message to one of the 500-subscriber email lists that I host here? mail.plover.com generates 500 outgoing messages, sends the first hundred or so through Verizon. Then Verizon cuts off my mail service. The mailing list detects 400 bounce messages, and unsubscribes 400 subscribers. If any mail comes in for another mailing list before Verizon lifts my ban, every outgoing message will bounce and every subscriber will be unsubscribed.

One solution is to get a better mail provider. Lorrie has an Earthlink account that comes with outbound mail relay service. But they do the same thing for the same reason. My Dreamhost subscription comes with an outbound mail relay service. But they do the same thing for the same reason. My Pobox.com account comes with an unlimited outbound mail relay service. But they require SASL authentication. If there's a SASL patch for qmail, I haven't been able to find it. I could implement it myself, I suppose, but I don't wanna.

So far there are at least five solutions that are on the "eh, maybe, if I have to" list:

  • Get a non-suck ISP
  • Find a better mail relay service
  • Hack SASL into qmail and send mail through Pobox.com
  • Do some skanky thing with serialmail
  • Get rid of qmail in favor of postfix, which presumably supports SASL
(Yeah, I know the Postfix weenies in the audience are shaking their heads sadly and wondering when the scales will fall from my eyes. They show up at my door every Sunday morning in their starched white shirts and their pictures of DJB with horns and a pointy tail...)

It also occurred to me in the shower this morning that the old ISP might be willing to sell me mail relaying and nothing else, for a small fee. That might be worth pursuing. It's gotta be easier than turning qmail-remote into a SASL mail client.

The serialmail thing is worth a couple of sentences, because there's an autoresponder on the qmail-users mailing-list that replies with "Use serialmail. This is discussed in the archives." whenever someone says the word "throttle". The serialmail suite, also written by Daniel J. Bernstein, takes a maildir-format directory and posts every message in it to some remote server, one message at a time. Say you want to run qmail on your laptop. Then you arrange to have qmail deliver all its mail into a maildir, and then when your laptop is connected to the network, you run serialmail, and it delivers the mail from the maildir to your mail relay host. serialmail is good for some throttling problems. You can run serialmail under control of a daemon that will cut off its network connection after it has written a certain amount of data, for example. But there seems to be no easy way to do what I want with serialmail, because it always wants to deliver all the messages from the maildir, and I want it to deliver one message.

There have been some people on the qmail-users mailing-list asking for something close to what I want, and sometimes the answer was "qmail was designed to deliver mail as quickly and efficiently as possible, so it won't do what you want." This is a variation of "Our software doesn't do what you want, so I'll tell you that you shouldn't want to do it." That's another rant for another day. Anyway, I shouldn't badmouth qmail-users mailing-list, because the archives did get me what I wanted. It's only a stopgap solution, and it might turn out to be a big mistake, but so far it seems okay, and so at last I am coming to the point of this article.

I hacked qmail to support outbound message rate throttling. Following a suggestion of Richard Lyons from the qmail-users mailing-list, it was much easier to do than I had initially thought.

Here's how it works. Whenever qmail wants to try to deliver a message to a remote address, it runs a program called qmail-remote. qmail-remote is responsible for looking up the MX records for the host, contacting the right server, conducting the SMTP conversation, and returning a status code back to the main component. Rather than hacking directly on qmail-remote, I've replaced it with a wrapper. The real qmail-remote is now in qmail-remote-real. The qmail-remote program is now written in Perl. It maintains a log file recording the times at which the last few messages were sent. When it runs, it reads the log file, and a policy file that says how quickly it is allowed to send messages. If it is okay to send another message, the Perl program appends the current time to the log file and invokes the real qmail-remote. Otherwise, it sleeps for a while and checks again.

The program is not strictly correct. It has some race conditions. Suppose the policy limits qmail to sending 8 messages per minute. Suppose 7 messages have been sent in the last minute. Then six instances of qmail-remote might all run at once, decide that it is OK to send a message, and send one. Then 13 messages have been sent in the last minute, which exceeds the policy limit. So far this has not been much of a problem. It's happened twice in the last few hours that the system sent 9 messages in a minute instead of 8. If it worries me too much, I can tell qmail to run only one qmail-remote at a time, instead of 10. On a normal qmail system, qmail speeds up outbound delivery by running multiple qmail-remote processes concurrently. On my crippled system, speeding up outbound delivery is just what I'm trying to avoid. Running at most one qmail-remote at a time will cure all race conditions. If I were doing the project over, I think I'd take out all the file locking and such, and just run one qmail-remote. But I didn't think of it in time, and for now I think I'll live with the race conditions and see what happens.

So let's see? What else is interesting about this program? I made at least one error, and almost made at least one more.

The almost-error was this: The original design for the program was something like:

  1. do
    • lock the history file, read it, and unlock it
    until it's time to send a message
  2. lock the history file, update it, and unlock it
  3. send the message
This is a classic mistake in writing programs that run concurrently and update a file. The problem is that process A update the file after process B reads but before B updates it. Then B's update will destroy A's.

One way to fix this is to have the processes append to the history file, but never remove anything from it. That is clearly not a sustainable strategy. Someone must remove expired entries from the history file.

Another fix is to have the read and the update in the same critical section:

  1. lock the history file
  2. do
    • read the history file
    until it's time to send a message
  3. update the history file and unlock it
  4. send the message
But that loop could take a long time, during which no other qmail-remote process can make progress. I had decided that I wanted to try to retain the concurrency, and so I wasn't willing to accept this.

Cleaning the history file could be done by a separate process that periodically locks the file and rewrites it. But instead, I have the qmail-remote processes to it on the fly:

  1. do
    • lock the history file, read it, and unlock it
    until it's time to send a message
  2. lock the history file, read it, update it, and unlock it
  3. send the message
I'm happy that I didn't actually make this mistake. I only thought about it.

Here's a mistake that I did make. This is the block of code that sleeps until it's time to send the message:

          while (@last >= $msgs) {
            my $oldest = $last[0];
            my $age = time() - $oldest;
            my $zzz = $time - $age + int(rand(3));
            $zzz = 1 if $zzz < 1;
       #    Log("Sleeping for $zzz secs");
            sleep $zzz;
            shift @last while $last[0] < time() - $time;
            load_policy();
          }
The throttling policy is expressed by two numbers, $msgs and $time, and the program tries to send no more than $msgs messages per $time seconds. The @last array contains a list of Unix epoch timestamps of the times at which the messages of the last $time seconds were sent. So the loop condition checks to see if fewer than $msgs messages were sent in the last $time seconds. If not, the program continues immediately, possibly posting its message. (It rereads the history file first, in case some other messages have been posted while it was asleep.)

Otherwise the program will sleep for a while. The first three lines in the loop calculate how long to sleep for. It sleeps until the time the oldest message in the history will fall off the queue, possibly plus a second or two. Then the crucial line:

            shift @last while $last[0] < time() - $time;
which discards the expired items from the history. Finally, the call to load_policy() checks to see if the policy has changed, and the loop repeats if necessary.

The bug is in this crucial line. if @last becomes empty, this line turns into an infinite busy-loop. It should have been:

            shift @last while @last && $last[0] < time() - $time;
Whoops. I noticed this this morning when my system's load was around 12, and eight or nine qmail-remote processes were collectively eating 100% of the CPU. I would have noticed sooner, but outbound deliveries hadn't come to a complete halt yet.

Incidentally, there's another potential problem here arising from the concurrency. A process will complete the sleep loop in at most $time+3 seconds. But then it will go back and reread the history file, and it may have to repeat the loop. This could go on indefinitely if the system is busy. I can't think of a good way to fix this without getting rid of the concurrent qmail-remote processes.

Here's the code. I hereby place it in the public domain. It was written between 1 AM and 3 AM last night, so don't expect too much.


[Other articles in category /Unix] permanent link

Sat, 08 Mar 2008

On risk
Consider the following game: You bet one dollar on the throw of a die. If the die comes up 6, you get your dollar back plus 25 more dollars. Otherwise, you lose your dollar. You can play as much as you want to. This is a great moneymaking proposition, because your expected winnings are four dollars on each game. Play a hundred times, you can expect to be about four hundred dollars ahead. Even if you're only allowed to play once, you would probably choose to play this game.

I pulled some sleight-of-hand in the previous paragraph. I said the game was a good deal "because" the expected winnings were positive. But that's not sufficient. If it were, the following game would also be a good deal: You bet one million dollars on the throw of a die. If the die comes up 6, you get your million back plus 25 million more. Otherwise, you lose your million.

For some people, the second game is a good deal. For most people, including me, it's obviously a very bad idea. To get a million dollars, I'd at least have to mortgage everything I owned. Then I'd be under a crushing debt for the rest of my life, with 83% likelihood. But the expected return of the two games is the same; this shows that a good expected return is not a sufficient condition for a good investment.

The difference, of course, is that the second game is much riskier than the first.

I think most people understand this, but nevertheless you still hear them say a lot of dumb stuff about risk. For example, many people like to say that the lottery is a stupidity tax on people who don't understand basic arithmetic, and that nobody would play the lottery unless they were very stupid, because it's trivial to see that the expected return is very poor.

I used to meet people at parties who said this. I would point out that by this reasoning, fire insurance is also a stupidity tax on people who don't understand basic arithmetic, because it's clear that the expected return on fire insurance is negative. I did get argument from folks from time to time, but it's really not arguable. If fire insurance didn't have an expected negative return for the customer, the insurance company would go out of business. In fact, the insurance company employs a whole department full of mathematicians whose job it is to make sure that the value of the premium you pay exceeds the expected cost of the benefits that the company will pay. So there are only three choices here:

  1. You're better at simple arithmetic than the insurance company's actuarial department, or
  2. You should avoid buying insurance, since it's just a sucker bet, or
  3. The issue of insurance and lotteries is a little more complex than that.
I believe that the answer, as usual, is #3. (Advice to people wishing to become smarter: Get in the habit of assuming that everything is more complex than you imagine.)

Once again the issue is not so much the expected return as it is risk. You pay the insurance premiums in order to mitigate the risk of a fire. One big fire could wipe you out completely. So you insure your house against fire so that you can't be completely wiped out. In return, you pay small, predictable sums of money regularly.

Another way to look at this is to consider the idea of a utility function. This is just a fancy term for the observation that the usefulness of money is not a linear function of the face value of the money. Once you have a million dollars, the utility of another hundred is much lower than it is to someone who only has ten thousand.

When you calculate expected returns, you need to calculate the expected increase of the utility, not the expected return of the nominal face value of the money. Consider this thought experiment: you may bet one cent on a game that will pay you ten thousand dollars if you win, which it will do one time in two million. Do you play? Well, maybe you do, because if you lose, so what? It's only one cent, and you will never miss it. The utility of one cent is essentially zero. The utility of ten thousand dollars, on the other hand, is very high, much higher than two million times zero. But if you like this game, you're open to the same charge of not understanding simple arithmetic as the lottery people are, because the expected return is very low, about the same as the lottery. The game is the same as the lottery, only the cost and the payoff are each a hundred times smaller.

In the fire insurance scenario, I am betting a small amount of money, with comparatively low utility, against a very large amount with much higher utility. One can view the lottery as analogous. If I buy a lottery ticket for $1, it's not because I misunderstand arithmetic. It's because the utility of $1 is low for me. I could blow $.85 on a candy bar tomorrow at lunch without thinking about it much. But the utility of winning millions is very high. With ten million dollars, I could pay off my mortgage, quit my job, and spend the rest of my life travelling around and writing articles. The value of even a hundred-millionth chance of this happening might well be higher than the value of gobbling one more candy bar that my body didn't need anyway.

Here's an exercise I've been doing lately, trying to estimate the value I ascribe to my own life. I am afraid that this is a trite subject, If so, I apologize. But if not, try it yourself, and you might discover something interesting. Suppose you have the option to play Russian Roulette, in return for which you will receive a fee of x. The gun has one million chambers, one of which holds a bullet. If you get the bullet, you die. Otherwise you collect the fee. What is the minimum value for x that will induce you to play? Would you play if x were one million dollars? I would. It's an almost sure million, and a million is a huge amount of money to me. And I probably take bigger than million-to-one risks every time I cross the street, so why not? So one might say that this demonstrates that my own estimate of the value of my own life is less than 1012 dollars.

Would I play for a thousand dollars? No, probably not. But where's the cutoff? Ten thousand is a maybe, a hundred thousand is a probably. (I rather suspect that the cutoff is on the same order of magnitude as the mortgage on my house. This thought threatens to open a whole can of disturbing philosophical worms.)

Now let's up the risk. I've already agreed to bet my life on a million-to-one chance in return for a million dollars. The expected-value theory says that I should also be willing to bet it on a thousand-to-one chance for a billion dollars. Am I? No way. The utility of a billion dollars is much less than a thousand times the utility of a million, for me. For Donald Trump, it might be different.

As a final exercise in thinking about risk, consider this: Folks at NASA estimate that your chance of being killed by a meteorite are on the order of 1 in 25,000. It's not because you're likely to be hit in the head. Nobody in recorded history has been killed by a meteor. It's because really big meteors do come by every so often, and when (not if, but when) one hits the earth, it'll kill just about everyone.

[ Addendum 20060425: There is a followup article to this one. ]


[Other articles in category ] permanent link

Tue, 04 Mar 2008

"Boolean" or "boolean"?
In a recent article I wrote:

... a logical negation function ... takes a boolean argument and returns a boolean result.
I worried for some time about whether to capitalize "boolean" here. But writing "Boolean" felt strange enough that I didn't actually try it to see how it looked on the page.

I looked at the the Big Dictionary, and all the citations were capitalized. But the most recent one was from 1964, so that was not much help.

Then I tried Google search for "boolean capitalized". The first hit was a helpful article by Eric Lippert. M. Lippert starts by pointing out that "Boolean" means "pertaining to George Boole", and so should be capitalized. That much I knew already.

But then he pointed out a countervailing consideration:

English writers do not usually capitalize the eponyms "shrapnel" (Henry Shrapnel, 1761-1842), "diesel" (Rudolf Diesel, 1858-1913), "saxophone" (Adolphe Sax, 1814-1894), "baud" (Emile Baudot, 1845-1903), "ampere" (Andre Ampere, 1775-1836), "chauvinist" (Nicolas Chauvin, 1790-?), "nicotine" (Jean Nicot, 1530-1600) or "teddy bear" (Theodore Roosevelt, 1858-1916).
Isn't that a great paragraph? I just had to quote the whole thing.

Lippert concluded that the tendency is to capitalize an eponym when it is an adjective, but not when it is a noun. (Except when it isn't that way; consider "diesel engine". English is what it is.)

I went back to my example to see if that was why I resisted capitalizing "Boolean":

... takes a boolean argument and returns a boolean result.
Hmm, no, that wasn't it. I was using "boolean" as an adjective in both places. Wasn't I?

Something seemed wrong. I tried changing the example:

... takes an integer argument and returns an integer result.
Aha! Notice "integer", not "integral". "Integral" would have been acceptable also, but that isn't analogous to the expression I intended. I wasn't using "boolean" as an adjective to modify "argument" and "result". I was using it as a noun to denote a certain kind of data, as part of a noun phrase. So it is a noun, and that's why I didn't want to capitalize it.

I would have been happy to have written "takes a boolean and returns a boolean", and I think that's the controlling criterion.

Sorry, George.


[Other articles in category /lang] permanent link

Mon, 03 Mar 2008

Ralph Johnson on design patterns
Last month I wrote an article about design patterns which attracted a lot of favorable attention in blog world. I started by paraphrasing Peter Norvig's observation that:

"Patterns" that are used recurringly in one language may be invisible or trivial in a different language.

and ended by concluding:

Patterns are signs of weakness in programming languages.

When we identify and document one, that should not be the end of the story. Rather, we should have the long-term goal of trying to understand how to improve the language so that the pattern becomes invisible or unnecessary.

Ralph Johnson, one of the four authors of the famous book Design Patterns, took note of my article and responded. I found Johnson's response really interesting, and curious in a number of ways. I think everyone who was interested in my article should read his too.

[ Addendum 20070127: The link above to Ralph Johnson's response is correct, but your client will be rejected if you are referred from here. To see his blog page, visit the page without clicking on the link. ]

Johnson raises several points. First there is a meta-issue to deal with. Johnson says:

He clearly thinks that what he says is surprising. And other people think it is surprising, too. That is surprising to me.
I did think that what I had to say was interesting and worth saying, of course, or I would not have said it. And I was not surprised to find that other people agreed with me.

One thing that I did find surprising is the uniformity of other people's surprise and interest. There were dozens of blog posts and comments in the following two weeks, all pretty much saying what a great article I had written and how right I was. I tracked the responses as carefully as I could, and I did not see any articles that called me a dumbass; I did not see any except for Johnson's that suggested that what I was saying was unsurprising.

We can't conclude from this that I am right, of course; people agree with all sorts of stupid crap. But we can conclude that that what I said was surprising and interesting, since people were surprised and interested by it, even people who already have some knowledge of this topic. Johnson is right to be surprised by this, because he thought this was obvious and well-known, and that it was clearly laid out in his book, and he was mistaken. Many or most of the readers of his book have completely missed this point. I didn't miss it, but I didn't get it from the book, either.

Johnson and his three co-authors wrote this book, Design Patterns, which has had a huge influence on the way that programming is practiced. I think a lot of that influence has been malign. Any practice can be corrupted, of course, by being reduced to its formal aspects and applied in a rote fashion. (There's a really superb discussion of this in A. Ya. Khinchin's essay On the Teaching of Mathematics, and a shorter discussion in Polya's How to Solve It, in the section on "Pedantry and Mastery".) That will happen to any successful movement, and the Gang of Four can't take all the blame for that.

But if they really intended that everyone should understand that each design pattern is a demonstration of a weakness in its target language, then they blew it, because it appears that hardly anyone understood that.

Let's pause for a moment to imagine an alternate universe in which the subtitle of the Design Patterns book was not "Elements of Reusable Object-Oriented Software" but "Solutions for Recurring Problems in Object-Oriented Languages". And let's imagine that in each section, after "Pattern name", "Intent", "Motivation", "Applicability", and so forth, there was another subsection titled "Prophylaxis" that went something like this: "The need for the Iterator pattern in C++ appears to be due partly to its inflexible type system and partly to its lack of abstract iteration structures. The iterator pattern is unnecessary in the Python language, which avoids these defects as follows: ... at the expense of ... . In Common Lisp, on the other hand, ... (etc.)".

I would have liked to have seen that universe, but I suppose it's too late now. Oh well.

Anyway, moving on from meta-issues to the issues themselves, Johnson continues:

At the very end, he says that patterns are signs of weakness in programming languages. This is wrong.
This is interesting, and I was going to address it later, but I now think that it's the first evidence of a conceptual mistake that Johnson has made that underlies his entire response to my article, so I'll take it up now.

At the very end of his response, Johnson says:

No matter how complicated your language will be, there will always be things that are not in the language. These things will have to be patterns. So, we can eliminate one set of patterns by moving them into the language, but then we'll just have to focus on other patterns. We don't know what patterns will be important 50 years from now, but it is a safe bet that programmers will still be using patterns of some sort.
Here we are in complete agreement. So, to echo Johnson, I was surprised that he would think this was surprising. But how can we be in complete agrement if what I said was "wrong"? There must be a misunderstanding somewhere.

I think I know where it is. When I said "[Design] Patterns are signs of weakness in programming languages," what I meant was something like "Each design pattern is a sign of a weakness in the programming language to which it applies." But it seems that Johnson thinks that I meant that the very existence of design patterns, at all, is a sign of weakness in all programming languages everywhere.

If I thought that the existence of design patterns, at all, was a sign that current programming languages are defective, as a group, I would see an endpoint to programming language development: someday, we would have a perfect überlanguage in which it would be unnecessary to use patterns because all possible patterns would have been built in already.

I think Johnson thinks this was my point. In the passage quoted above, I think he is addressing the idea of the überlanguage that incorporates all patterns everywhere at all levels of abstraction. And similarly:

Some people like languages with a lot of features. . . . I prefer simple languages.

And again:

No matter how complicated your language will be, there will always be things that are not in the language.

But no, I don't imagine that someday we will have the ultimate language, into which every conceivable pattern has been absorbed. So a lot of what Johnson has to say is only knocking down a straw man.

What I imagine is that when pattern P applies to language L, then, to the extent that some programmer on some project finds themselves needing to use P in their project, the use of P indicates a deficiency in language L for that project.

The absence of a convenient and simple way to do P in language L is not always a problem. You might do a project in language L that does not require the use of pattern P. Then the problem does not manifest, and, whatever L's deficiencies might be for other projects, it is not deficient in that way for your project.

This should not be difficult for anyone to understand. Perl might be a very nice language for writing a program to compile a bioinformatic data file into a more reasonable form; it might be a terrible language for writing a real-time missile guidance system. Its deficiencies operate in the missile guidance project in a way that they may not in the data munging project.

But to the extent that some deficiency does come up in your project, it is a problem, because you are implementing the same design over and over, the same arrangement of objects and classes, to accomplish the same purpose. If the language provided more support for solving this recurring design problem, you wouldn't need to use a "pattern". Consider again the example of the "subroutine" pattern in assembly language: don't you have anything better to do than redesign and re-implement the process of saving the register values in a stack frame, over and over? Well, yes, you do. And that is why you use a language that has that built in. Consider again the example of the "object-oriented class" pattern in C: don't you have anything better to do than redesign and re-implement object-oriented method dispatch with inheritance, over and over? Yes, you do. And that is why you use a language that has that built in, if that is what you need.

By Gamma, Helm, Johnson, and Vlissides' own definition, the problems solved by patterns are recurring problems, and programmers must address them recurringly.

If these problems recurred in every language, we might conclude that they were endemic to programming itself. We might not, but it's hard to say, since if there are any such problems, they have not yet been brought to my attention. Every pattern discovered so far seems to be specific to only a small subset of the world's languages.

So it seems a small step to conclude that these recurring, language-specific problems are actually problems with the languages themselves. No problem is a problem in every language, but rather each problem is a red arrow, pointing at a design flaw in the language in which it appears.

Johnson continues:

Patterns might be a sign of weakness, but they might be a sign of simplicity. . . .
I think this argument fails, in light of the examples I brought up in my original article. The argument is loaded by the use of the word "simplicity". As Einstein said, things should be as simple as possible, but no simpler. In assembly language, "subroutine call" is a pattern. Does Johnson or anyone seriously think that C++ or Smalltalk or Common Lisp or Java would be improved by having the "subroutine call" pattern omitted? The languages might be "simpler", but would they be better?

The alternative, remember, is to require the programmer to use a "pattern": to make them consult a manual of "patterns" to implement a "general arrangement of objects and classes" to solve the subroutine-call problem every time it comes up.

I guess you could interpret that as a sign of "simplicity", but it's the wrong kind of simplicity. Language designers have a hard problem to solve. If they don't put enough stuff into the language, it'll be too hard to use. But if they put in too much stuff, it'll be confusing and hard to program, like C++. One reason it's hard to be a language designer is that it's hard to know what to put in and what to leave out. There is an extremely complex tradeoff between simplicity and functionality.

But in the case of "patterns", it's much easier to understand the tradeoff. A pattern, remember, is a general method for solving "a recurring design problem". Patterns might be a sign of "simplicity", but if so, they are a sign of simplicity in the wrong place, a place where the language needs to be less simple and more featureful. Because patterns are solutions to recurring design problems.

If you're a language designer, and a "pattern" comes to your attention, then you have a great opportunity. The programmers using your language have a recurring problem. They have to implement the same solution to it, over and over. Clearly, this is a good place to try to expend some design effort; perhaps you can trade off a little simplicity for some functionality and fix the language so that the problem is a problem no longer.

Getting rid of one recurring design problem might create new ones. But if the new problems are operating at a higher level of abstraction, you may have a win. Getting rid of the need for the "subroutine call" pattern in assembly language opened up all sorts of new problems: when and how do I do recursion? When and how do I do coroutines?

Getting rid of the "object-oriented class" pattern in C created a need for higher-level patterns, including the ones described in the Design Patterns book. When people didn't have to worry about implementing inheritance themselves, a lot of their attention was freed up, and they could notice patterns like Façade.

As Alfred North Whitehead says, civilization advances by extending the number of important operations which we can perform without thinking about them. The Design Patterns approach seems to be to identify the important operations and then to think about them over and over and over and over and over.

Or so it seems to me. Johnson's next paragraph makes me wonder if I've completely missed his point, because it seems completely senseless to me:

There is a trade-off between putting something in your programming language and making it be a convention, or perhaps putting it in the library. Smalltalk makes "constructor" be a convention. Arithmetic is in the library, not in the language. Control structures and exception handling are from the library, not in the language.
Huh? Why does "library" matter? Unless I have missed something essential, whether something is in the "language" or the "library" is entirely an implementation matter, to be left to the discretion of the compiler writer. Is printf part of the C language, or its library? The library, everyone knows that. Oh, well, except that its behavior is completely standardized by the language standard, and it is completely permissible for the compiler writer to implement printf by putting a special case into the compiler that is enabled when the compiler happens to see the directive #include <stdio.h>. There is absolutely no requirement that printf be loaded from a separate file or anything like that.

Or consider Perl's dbmopen function. Prior to version 5.000, it was part of the "language", in some sense; in 5.000 and later, it became part of the "library". But what's the difference, really? I can't find any.

Is Johnson talking about some syntactic or semantic difference here? Maybe if I knew more about Smalltalk, I would understand his point. As it is, it seems completely daft, which I interpret to mean that there's something that went completely over my head.

Well, the whole article leaves me wondering if maybe I missed his point, because Johnson is presumably a smart guy, but his argument about the built-in features vs. libraries makes no sense to me, his argument about simplicity seems so clearly and obviously dismantled by his own definition of patterns, and his apparent attack on a straw man seems so obviously erroneous.

But I can take some consolation in the thought that if I did miss his point, I'm not the only one, because the one thing I can be sure of in all of this is that a lot of other people have been missing his point for years.

Johnson says at the beginning that he "wasn't sure whether to be happy or unhappy". If I had written a book as successful and widely read as Design Patterns and then I found out that everyone had completely misunderstood it, I think I would be unhappy. But perhaps that's just my own grumpy personality.

[ Addendum 20080303: Miles Gould wrote a pleasant and insightful article on Johnson's point about libraries vs. language features. As I surmised, there was indeed a valuable point that went over my head. I said I couldn't find any difference between "language" and "library", but, as M. Gould explains, there is an important difference that I did not appreciate in this context. ]


[Other articles in category /prog] permanent link

Facts you should know
The Minneapolis Star Tribune offers an article on Why is the sky blue? Facts you should know, subtitled "Scientists offer 10 basic questions to test your knowledge". [ The original article has been removed; here is another copy. ] I had been planning to write for a while on why the sky is blue, and how the conventional answers are pretty crappy. (The short answer is "Rayleigh scattering", but that's another article for another day. Even crappier are the common explanations of why the sea is blue. You often hear the explanation that the sea is blue because it reflects the sky. This is obviously nonsense. The surface of the sea does reflect the sky, perhaps, but when the sea is blue, it is a deep, beautiful blue all the way down. The right answer is, again, Rayleigh scattering.)

The author, Andrea L. Gawrylewski, surveyed a number of scientists and educators and asked them "What is one science question every high school graduate should be able to answer?" The questions follow.

  1. What percentage of the earth is covered by water?

    This is the best question that the guy from Woods Hole Oceanographic Institute can come up with?

    It's a plain factual question, something you could learn in two seconds. You can know the answer to this question and still have no understanding whatever of biology, meteorology, geology, oceanography, or any other scientific matter of any importance. If I were going to make a list of the ten things that are most broken about science education, it would be that science education emphasizes stupid trivia like this at the expense of substantive matters.

    For a replacement question, how about "Why is it important that three-fourths of the Earth's surface is covered with water?" It's easy to recognize a good question. A good question is one that is quick to ask and long to answer. My question requires a long answer. This one does not.

  2. What sorts of signals does the brain use to communicate sensations, thoughts and actions?

    This one is a little better. But the answer given, "The single cells in the brain communicate through electrical and chemical signals" is still disappointing. It is an answer at the physical level. A more interesting answer would discuss the protocol layers. How does the brain perform error correction? How is the information actually encoded? I may be mistaken, but I think this stuff is all still a Big Mystery.

    The question given asks about how the brain communicates thoughts. The answer given completely fails to answer this question. OK, the brain uses electrical and chemical signals. So how does the brain use electricity and chemicals to communicate thoughts, then?

  3. Did dinosaurs and humans ever exist at the same time?

    Here's another factual question, one with even less information content than the one about the water. This one at least has some profound philosophical implications: since the answer is "no", it implies that people haven't always been on the earth. Is this really the one question every high school graduate should be able to answer? Why dinosaurs? Why not, say, trilobites?

    I think the author (Andrew C. Revkin of the New York Times) is probably trying to strike a blow against creationism here. But I think a better question would be something like "what is the origin of humanity?"

  4. What is Darwin's theory of the origin of species?

    At last we have a really substantive question. I think it's fair to say that high school graduates should be able to give an account of Darwinian thinking. I would not have picked the theory of the origin of species, specifically, particularly because the origin of species is not yet fully understood. Instead, I would have wanted to ask "What is Darwin's theory of evolution by natural selection?" And in fact the answer given strongly suggests that this is the question that the author thought he was asking.

    But I can't complain about the subject matter. The theory of evolution is certainly one of the most important ideas in all of science.

  5. Why does a year consist of 365 days, and a day of 24 hours?

    I got to this question and sighed in relief. "Ah," I said, "at last, something subtle." It is subtle because the two parts of the question appear to be similar, but in fact are quite different. A year is 365 days long because the earth spins on its axis in about 1/365th the time it takes to revolve around the sun. This matter has important implications. For example, why do we need to have leap years and what would happen if we didn't?

    The second part of the question, however, is entirely different. It is not astronomical but historical. Days have 24 hours because some Babylonian thought it would be convenient to divide the day and the night into 12 hours each. It could just as easily have been 1000 hours. We are stuck with 365.2422 whether we like it or not.

    The answer given appears to be completely oblivious that there is anything interesting going on here. As far as it is concerned, the two things are exactly the same. "A year, 365 days" it says, "is the time it takes for the earth to travel around the sun. A day, 24 hours, is the time it takes for the earth to spin around once on its axis."

  6. Why is the sky blue?

    I have no complaint here with the question, and the answer is all right, I suppose. (Although I still have a fondness for "because it reflects the sea.") But really the issue is rather tricky. It is not enough to just invoke Rayleigh scattering and point out that the high-frequency photons are scattered a lot more than the low-frequency ones. You need to think about the paths taken by the photons: The ones coming from the sky have, of course, come originally from the sun in a totally different direction, hit the atmosphere obliquely, and been scattered downward into your eyes. The sun itself looks slightly redder because the blue photons that were heading directly toward your eyes are scattered away; this effect is quite pronounced when there is more scattering than usual, as when there are particles of soot in the air, or at sunset.

    The explanation doesn't end there. Since the violet photons are scattered even more than the blue ones, why isn't the sky violet? I asked several professors of physics this question and never got a good answer. I eventually decided that it was because there aren't very many of them; because of the blackbody radiation law, the intensity of the sun's light falls off quite rapidly as the frequency increases, past a certain point. And I was delighted to see that the Wikipedia article on Rayleigh scattering addresses this exact point and brings up another matter I hadn't considered: your eyes are much more sensitive to blue light than to violet.

    The full explanation goes on even further: to explain the blackbody radiation and the Rayleigh scattering itself, you need to use quantum physical theories. In fact, the failure of classical physics to explain blackbody radiation was the impetus that led Max Planck to invent the quantum theory in the first place.

    So this question gets an A+ from me: It's a short question with a really long answer.

  7. What causes a rainbow?

    I have no issue with this question. I don't know if I'd want to select it as the "one science question every high school graduate should be able to answer", but it certainly isn't a terrible choice like some of the others.

  8. What is it that makes diseases caused by viruses and bacteria hard to treat?

    The phrasing of this one puzzled me. Did the author mean to suggest that genetic disorders, geriatric disorders, and prion diseases are not hard to treat? No, I suppose not. But still, the question seems philosophically strange. Why says that diseases are hard to treat? A lot of formerly fatal bacterial diseases are easy to treat: treating cholera is just a matter of giving the patient IV fluids until it goes away by itself; a course of antibiotics and your case of the bubonic plague will clear right up. And even supposing that we agree that these diseases are hard to treat, how can you rule out "answers" like "because we aren't very clever"? I just don't understand what's being asked here.

    The answer gives a bit of a hint about what the question means. It begins "influenza viruses and others continually change over time, usually by mutation." If that's what you're looking for, why not just ask why there's no cure for the common cold?

  9. How old are the oldest fossils on earth?

    Oh boy, another stupid question about how much water there is on the surface of the earth. I guessed a billion years; the answer turns out to be about 3.8 billion years. I think this, like the one about the dinosaurs, is a question motivated by a desire to rule out creationism. But I think it's an inept way of doing so, and the question itself is a loser.

  10. Why do we put salt on sidewalks when it snows?

    Gee, why do we do that? Well, the salt depresses the freezing point of the water, so that it melts at a lower temperature, one, we hope, that is lower than the temperature outside, so that the snow melts. And if it doesn't melt, the salt is gritty and provides some traction when we walk on it.

    But why does the salt depress the freezing point? I don't know; I've never understood this. The answer given in the article is no damn good:

    Adding salt to snow or ice increases the number of molecules on the ground surface and makes it harder for the water to freeze. Salt can lower freezing temperatures on sidewalks to 15 degrees from 32 degrees.

    The second sentence really doesn't add anything at all, and the first one is so plainly nonsense I'm not even sure where to start ridiculing it. (If all that is required is an increase in the number of molecules, why won't it work to add more snow?)

    So let me think. The water molecules are joggling around, bumping into each other, and the snow is a low-energy crystalline state that they would like to fall into. At low temperatures, even when a molecule manages to joggle its way out of the crystal, it's likely to fall back in pretty quickly, and if not there's probably another molecule around that can fall in instead. At lower temperatures, the molecules joggle less, and there's an equilibrium in this in-and-out exchange that results in more ice and less water than at higher temperatures.

    When the salt is around, the salt molecules might fall into the holes in the ice crystal instead, get in the way of the water molecules, and prevent the crystal from re-forming, so that's going to shift the equilibrium in favor of water and against ice. So if you want to reach the same equilibrium that's normally reached at zero degrees, you need to subtract some of the joggling energy, to compensate for the interference of the salt, and that's why the freezing temperature goes down.

    I think that's right, or close to it,and it certainly sounds pretty good, but my usual physics disclaimer applies: While I know next to nothing about physics, I can spin a line of bullshit that sounds plausible enough to fool people, including myself, into believing it.

    (Is there a such a thing as a salt molecule? Or does it really take the form of isolated sodium and chlorine ions? I guess it doesn't matter much in this instance.)

    I think this question is a winner.

    [ Addendum 20060416: Allan Farrell's blog Bento Box has another explanation of this. It seems to me that M. Farrell knows a lot more about it than I do, but also that my own explanation was essentially correct. But there may be subtle errors in my explanantion that I didn't notice, so you may want to read the other one and compare.]

    [ Addendum 20070204: A correspondent at MIT provided an alternative explanantion. ]

So, totaling up, we have three trivial factual questions (amount of earth's surface covered with water, age of fossils, and how many angels can dance on the head of a pin), one giant mystery (information representation in the brain), and six good questions (theory of evolution, length of day and year, why the sky is blue, the rainbow, disease, and freezing point depression). Of these, I think the theory-of-evolution one is probably the best.

The questions overall were a lot better than the answers, which made me wonder if perhaps M. Gawrylewski had written the answers herself.


[Other articles in category /physics] permanent link

The Spite House

Order
New York's Architectural Holdouts
New York's Architectural Holdouts
with kickback
no kickback
The subject of really narrow buildings came up on Reddit last week, and my post about the "Spite House" was well-received. Since pictures of it seem to be hard to come by, I scanned the pictures from New York's Architectural Holdouts by Andrew Alpern and Seymour Durst.

The book is worth checking out, particularly if you are familiar with New York. The canonical architectural holdout occurs when a developer is trying to assemble a large parcel of land for a big building, and a little old lady refuses to sell her home. The book is full of astonishing pictures: skyscrapers built with holdout buildings embedded inside them and with holdout buildings wedged underneath them. Skyscrapers built in the shape of the letter E (with the holdouts between the prongs), the letter C (with the holdout in the cup), and the letter Y (with the holdout in the fork).


Photo credit: Jerry Callen
When Henry Siegel, a New York store owner, got news in 1898 that Macy's was going to build a gigantic new flagship store on Herald Square, he bought the corner lot for $375,000 to screw over his competitors. The Herald Square Macy's still has a notch cut out of its corner; see the picture at right. The Macy's store on Queens Boulevard is in the shape of a perfect circle, except for the little bit cut out of one side where the proverbial old lady (this time named Mary Sendek) refused to sell a 7×15-foot back corner of her lot for $200,000 because she wanted her dog to have a place to play. (Here's a satellite view of the building. The notch is clearly visible at the northwest corner, facing 55th Aveue.)

But anyway, the Spite House. The story, as told by Alpern and Durst, is that around 1882, Patrick McQuade wanted to build some houses on 82nd Street at Lexington Avenue. The adjoining parcel of land, around the corner on Lexington, was owned by Joseph Richardson, shown at left. If McQuade could acquire this parcel, he would be able to extend his building all the way to Lexington Avenue, and put windows on that side of the building. No problem: the parcel was a strip of land 102 feet long and five feet wide along Lexington, useless for any other purpose. Surely Richardson would sell.

McQuade offered $1,000, but Richardson demanded $5,000. Unwilling to pay, McQuade started building his houses anyway, complete with windows looking out on Richardson's five-foot-wide strip, which was unbuildable. Or so he thought.


Richardson built a building five feet wide and 102 feet long, blocking McQuade's Lexington Avenue windows. (Click the pictures for large versions.)

The building soon became known as the "Spite House". The photograph above was taken around 1895. Lexington Avenue is torn up for maintenance in this picture.

Richardson took advantage of a clause in the building codes that allowed him to build bay window extensions in his building. This allowed him to extend its maximum width 2'3" beyond the boundary of the lot. (Alpern and Durst say "In those days, such encroachments on the public sidewalks were not prohibited.") The rooms of the Spite House were in these bay window extensions, connected by extremely narrow hallways:


As you can see, the Spite House was divided into two dwellings, each with a separate entrance, four floors, and two rooms on each floor. The rooms were 7'3" wide and were connected by hallways 3'4" wide.

After construction was completed, Richardson moved into the Spite House and lived there until he died in 1897. The pictures below and at left are from that time.

The edge-on photograph below, showing the Spite House's 3'4" frontage on 82nd Street, was taken in 1912.

The Spite House was demolished in 1915.


Picture credits

The photograph of the Macy's Herald Square store is copyright ©2004 Jerry Callen, and is used with permission.

All other pictures and photographs are in the public domain. I took them from pages 122–124 of the book New York's Architectural Holdouts, by Alpern and Durst. The original sources, as given by Alpern and Durst, are as follows:

Collection of Andrew Alpern.

January 1897 issue of Scientific American.



New York Journal, 5 June 1897
New York Public Service Commission


[Other articles in category /tech] permanent link

Uniquely-decodable codes revisited
[ This is a followup to an earlier article. ]

Alan Morgan wrote to ask if there was a difference between uniquely-decodable (UD) codes for strings and for streams. That is, is there a code for which every finite string is UD, but for which some infinite sequence of symbols has multiple decodings.

I pondered this a bit, and after a little guessing came up with an example: { "a", "ab", "bb" } is UD, because it is a suffix code. But the stream "abbbbbbbbbb..." can be decoded in two ways.

After I found the example, I realized that I shouldn't have needed to guess, because I already knew that you sometimes have to see the last symbol of a string before you can know how to decode it, and in such a code, if there is no such symbol, the decoding must be ambiguous. The code above is UD, but to decode "abbbbbbbbbbbbbbb" you have to count the "b"s to figure out whether the first code word is "a" or "ab".

Let's say that a code is UD+ if it has the property that no two infinite sequences of code words have the same concatenation. Can we characterize the UD+ codes? Clearly, UD+ implies UD, and the example above shows that the converse is not true. A simple argument shows that all prefix codes are UD+. So the question now is, are there UD+ codes that are not prefix codes? I don't know.

[ Addendum 20080303: Gareth McCaughan points out that { "a", "ab" } is UD+ but not prefix. ]


[Other articles in category /CS] permanent link

Sun, 02 Mar 2008

Subtypes and polymorphism

[ Note: You are cautioned that this article is in the oops section of my blog, and so you should understand it as a description of a mistake that I have made. ]

For more than a year now my day job has involved work on a large project written entirely in Java. I warned my employers that I didn't have any professional experience in Java, but they hired me anyway. So I had to learn Java. Really learn it, I mean. I hadn't looked at it closely since version 1.2 or so.

Java 1.5 has parametrized types, which they call "generics". These looked pretty good to me at first, and a big improvement on the cruddy 1975-style type system Java had had before. But then I made a shocking discovery: If General is a subtype of Soldier, then List<General> is not a subtype of List<Soldier>.

In particular, you cannot:

        List<General> listOfGenerals = ...
        List<Soldier> listOfSoldiers = listOfGenerals;
For a couple of weeks I went around muttering to myself about what idiots the Java people must be. "Geez fuckin' Louise," I said. "The Hindley-Milner languages have had this right for twenty years. How hard would it have been for those Java idiots to pick up a damn book or something?"

I was, of course, completely wrong in all respects. The assignment above leads to problems that are obvious if you think about it a bit, and that should have been obvious to me, and would have been, except that I was so busy exulting in my superiority to the entire Java community that I didn't bother to think about it a bit.

You would like to be able to do this:

        Soldier someSoldier = ...;
        listOfSoldiers.add(someSoldier);
But if you do, you are setting up a type failure:

        General someGeneral = listOfGenerals.getLast();
        someGeneral.orderAttack(...);
Here listOfSoldiers and listOfGenerals refer to the same underlying object, so we put a common soldier into that object back on line 4, and then took it out again on line 5. But line 5 is expecting it to be a General, and it is not. So we either have a type failure right there, or else we have a General variable that holds a a Soldier object, and then on line 6 a mere private is allowed to order an attack, causing a run-time type failure. If we're lucky.

The language designers must forbid one of these operations, and the best choice appears to be to forbid the assignment of the List<General> object to the List<Soldier> variable on line 2. The other choices seem clearly much worse.

The canonical Java generics tutorial has an example just like this one, to explain precisely this feature of Java generics. I would have known this, and I would have saved myself two weeks of grumbling, if I had picked up a damn book or something.

Furthermore, my premise was flawed. The H-M languages (SML, Haskell, Miranda, etc.) have not had this right for twenty years. It is easy to get right in the absence of references. But once you add references the problem becomes notoriously difficult, and SML, for example, has gone through several different strategies for dealing with it, as the years passed and more was gradually learned about the problem.

The naive approach for SML is simple. It says that if α is any type, then there is a type ref α, which is the type of a reference that refers to a storage cell that contains a value of type α. So for example ref int is the type of a reference to an int value. There are three functions for manipulating reference types:

        ref    : α  ref α
        !      : ref α  α
        :=     : (ref α * α)  unit
The ref function takes a value and produces a reference to it, like & in C; if the original value had type α then the result has type ref α. The ! function takes a reference of type ref α and dereferences it, returning the value of type α that it refers to, like * in C. And the := function, usually written infix, takes a reference and a value, stores the value into the place that the reference points to, replacing what was there before, and returns nothing. So for example:

        val a = "Kitty cat";    (*   a : string       *)
        val r = ref a;          (*   r : ref string   *)
        r := "Puppy dog";
        print !r;
This prints Puppy dog. But this next example fails, as you would hope and expect:

        val a = "Kitty cat";
        val r = ref a;
        r := 37;               (* fails *)
because r has type ref string, but 37 has type int, and := requires that the type of the value on the right match the type referred to by the reference on the left.

That is the obvious, naive approach. What goes wrong, though? The canonical example is:

        fun id x = x             (*   id : α → α         *)
        val a = ref id;          (*   a : ref (α → α)    *)

        fun not true = false
          | not false = true ;   (*   not: bool  bool   *)
        a := not;

        (!a) 13                  
The key here is that a is a variable of type ref (α → α), that is, a reference to a cell that can hold a function whose argument is any type α and whose return value is the same type. Here it holds a reference to id, which is the identity function.

Then we define a logical negation function, not, which has type bool → bool, that is, it takes a boolean argument and returns a boolean result. Since this is a subtype of α → α, we can store this function in the cell referenced by a.

Then we dereference a, recovering the value it points to, which, since the assignment, is the not function. But since a has type ref (α → α), !a has type α → α, and so should be applicable to any value. So the application of !a to the int value 13 passes the type checker, and SML blithely applies the not function to 13.

I've already talked about SML way longer than I planned to, and I won't belabor you further with explanations of the various schemes that were hatched over the years to try to sort this out. Suffice it to say that the problem is still an open research area.

Java, of course, is all references from top to bottom, so this issue obtrudes. The Java people do not know the answer either.

The big error that I made here was to jump to the conclusion that the Java world must be populated with idiots who know nothing about type theory or Haskell or anything else that would have tipped them off to the error I thought they had committed. Probably most of them know nothing about that stuff, but there are a lot of them, and presumably some of them have a clue, and perhaps some of them even know a thing or two that I don't. I said a while back that people who want to become smarter should get in the habit of assuming that everything is more complex than they imagine. Here I assumed the opposite.

As P.J. Plauger once said in a similar circumstance, there is a name for people who are so stupid that they think everyone else is stupid instead.

Maybe I won't be that person next time.


[Other articles in category /oops] permanent link

The missing deltahedron
I recently wrote about the convex deltahedra, which are the eight polyhedra whose faces are all congruent equilateral triangles:

NameFacesEdgesVertices
Tetrahedron 464
Triangular dipyramid 695
Octahedron 8126
Pentagonal dipyramid 10157
Snub disphenoid 12188
Triaugmented triangular prism 14219
Gyroelongated square dipyramid 162410
Icosahedron 203012
The names are rather horrible, so I think that from now on I'll just refer to them as D4, D6, D8, D10, D12, D14, and D20.

The number of edges that meet at a vertex is its valence. Vertices in convex deltahedra have valences of 3, 4, or 5. The valence can't be larger than 5 because only six equilateral triangles will fit, and if you fit 6 then they lie flat and the polyhedron is not properly convex.

Let V3, V4, and V5 be the number of vertices of valences 3, 4, and 5, respectively. Then:

WhatV3V4V5
D44  
D623 
D8 6 
D10 52
D12 44
D14 36
D16 28
D20  12
There's a clear pattern here, with V3s turning into V4s two at a time until you reach the octahedron (D8) and then V4s turning into V5s one at a time until you reach the icosahedron (D20). But where is V4=1, V5=10? There's a missing deltahedron. I don't mean it's missing from the table; I mean it's missing from the universe.

Well, this is all oversubtle, I realized later, because you don't need to do the V3V4V5 analysis to see that something is missing. There are convex deltahedra with 4, 6, 8, 10, 12, 14, and 20 faces; what happened to 18?

Still, I did a little work on a more careful analysis that might shed some light on the 18-hedron situation. I'm still in the middle of it, but I'm trying to continue my policy of posting more frequent, partial articles.

Let V be the number of vertices in a convex deltahedron, E be the number of edges, and F be the number of faces.

We then have V = V3 + V4 + V5. We also have E = ½(3V3 + 4V4 + 5V5). And since each face has exactly 3 edges, we have 3F = 2E.

By Euler's formula, F + V = E + 2. Plugging in the stuff from the previous paragraph, we get 3V3 + 2V4 + V5 = 12.

It is very easy to enumerate all possible solutions of this equation. There are 19:

V3V4V5What
400D4
311
303
230D6
222
214
206
141
133
125
117
109
060D8
052D10
044D12
036D14
028D16
0110
0012D20
Solutions in green correspond to convex deltahedra. What goes wrong with the other 11 items?

(3,1,1) fails completely because to have V5 > 0 you need V ≥ 6. There isn't even a graph with (V3, V4, V5) = (3,1,1), much less a polyhedron.

There is a graph with (3,0,3), but it is decidedly nonplanar: it contains K3,3, plus an additional triangle. But the graph of any polyhedron must be planar, because you can make a little hole in one of the faces of the polyhedron and flatten it out without the edges crossing.

Another way to think about (3,0,3) is to consider it as a sort of triangular tripyramid. Each of the V5s shares an edge with each of the other five vertices, so the three V5s are all pairwise connected by edges and form a triangle. Each of the three V3s must be connected to each of the three vertices of this triangle. You can add two of the required V3s, by erecting a triangular pyramid on the top and the bottom of the triangle. But then you have nowhere to put the third pyramid.

On Thursday I didn't know what went wrong with (2,2,2); it seemed fine. (I found it a little challenging to embed it in the plane, but I'm not sure if it would still be challenging if it hadn't been the middle of the night.) I decided that when I got into the office on Friday I would try making a model of it with my magnet toy and see what happened.

It turned out that nothing goes wrong with (2,2,2). It makes a perfectly good non-convex deltahedron. It's what you get when you glue together three tetrahedra, face-to-face-to-face. The concavity is on the underside in the picture.


(2,0,6) was a planar graph too, and so the problem had to be geometric, not topological. When I got to the office, I put it together. It also worked fine, but the result is not a polyhedron. The thing you get could be described as a gyroelongated triangular dipyramid. That is, you take an octahedron and glue tetrahedra to two of its opposite faces. But then the faces of the tetrahedra are coplanar with the faces of the octahedron to which they abut, and this is forbidden in polyhedra. When that happens you're supposed to eliminate the intervening edge and consider the two faces to be a single face, a rhombus in this case. The resulting thing is not a polyhedron with 12 triangular faces, but one with six rhombic faces (a rhombohedron), essentially a squashed cube. In fact, it's exactly what you get if you make a cube from the magnet toy and then try to insert another unit-length rod into the diagonal of each of the six faces. You have to squash the cube to do this, of course, since the diagonals had length √2 before and length 1 after.


So there are several ways in which the triples (V3,V4,V5) can fail to determine a convex deltahedron: There is an utter topological failure, as with (3,1,1).

There is a planarity failure, which is also topological, but less severe, as with (3,0,3). (3,0,3) also fails because you can't embed it into R3. (I mean that you cannot embed its 3-skeleton. Of course you can embed its 1-skeleton in R3, but that is not sufficient for the thing to be a polyhedron.) I'm not sure if this is really different from the previous failure; I need to consider more examples. And (3,0,3) fails in yet another way: you can't even embed its 1-skeleton in R3 without violating the constraint that says that the edges must all have unit length. The V5s must lie at the vertices of an equilateral triangle, and then the three unit spheres centered at the V5s intersect at exactly two points of R3. You can put two of the V3s at these points, but this leaves nowhere for the third V3. Again, I'm not sure that this is a fundamentally different failure mode than the other two.

Another failure mode is that the graph might be embeddable into R3, and might satisfy the unit-edge constraint, but in doing so it might determine a concave polyhedron, like (2,2,2) does, or a non-polyhedron, like (2,0,6) does.

I still have six (V3,V4,V5) triples to look into. I wonder if there are any other failure modes?

I should probably think about (0,1,10) first, since the whole point of all this was to figure out what happened to D18. But I'm trying to work up from the simple cases to the harder ones.

I suppose the next step is to look up the proof that there are only eight convex deltahedra and see how it goes.

I suspect that (2,1,4) turns out to be nonplanar, but I haven't looked at it carefully enough to actually find a forbidden minor.

One thing that did occur to me today was that a triple (V3, V4, V5) doesn't necessarily determine a unique graph, and I need to look into that in more detail. I'll be taking a plane trip on Sunday and I plan to take the magnet toy with me and continue my investigations on the plane.

In other news, Iris and I went to my office this evening to drop off some books and pick up some stuff for the trip, including the magnet toy. Iris was very excited when she saw the collection of convex deltahedron models on my desk, each in a different color, and wanted to build models just like them. We got through all of them, except D10, because we ran out of ball bearings. By the end Iris was getting pretty good at building the models, although I think she probably wouldn't be able to do it without directions yet. I thought it was good work, especially for someone who always skips from 14 to 16 when she counts.

On the way home in the car, we were talking about how she was getting older and I rhapsodized about how she was learning to do more things, learning to do the old things better, learning to count higher, and so on. Iris then suggested that when she is older she might remember to include 15.


[Other articles in category /math] permanent link

Lazy square roots of power series
Lately for various reasons I have been investigating the differential equation:

$$(f(x))^2 + (f'(x))^2 = 1$$

where f' is the derivative of f. This equation has a couple of obvious solutions (f(x) = 1; f(x) = sin(x)) and a bunch of not-so-obvious ones. Since I couldn't solve the equation symbolically, I decided to fall back on power series. Representing f(x) as a0 + a1x + a2x2 + ... one can manipulate the power series and solve for a0, a1, a2, etc. In fact, this is exactly the application for which mathematicians first became intersted in power series. The big question is "once you have found a0, a1, etc., do these values correspond to a real function? And for what x does the power series expression actually make sense?" This question, springing from a desire to solve intractable differential equations, motivates a lot of the theoretical mathematics of the last hundred and fifty years.

Order
Higher-Order Perl
Higher-Order Perl
with kickback
no kickback
I decided to see if I could use the power series methods of chapter 6 of Higher-Order Perl to calculate a0, etc. So far, not yet, although I am getting closer. The key is that if $series is the series you want, and if you can calculate at least one term at the front of the series, and then express the rest of $series in terms of $series, you win. For example:

        # Perl
        my $series;
        $series = node(1, promise { scale(2, $series) } );
This is perfectly well-defined code; it runs fine and sets $series to be the series [1,2,4,8,16...]. In Haskell this is standard operating procedure:

        -- Haskell
        series = 1 : scale 2 series
But in Perl it's still a bit outré.

Similarly, the book shows, on page 323, how to calculate the reciprocal of a series s. Any series can be expressed as the sum of the first term and the rest of the terms:

s = head(s) + x·tail(s)
Now suppose that r = 1/s.

r = head(r) + x·tail(r)
we have:

rs = 1

(head(r) + x·tail(r))(head(s) + x·tail(s)) = 1

head(r)head(s) + x·head(r)·tail(s) + x·tail(r)·head(s) + x2·tail(r)tail(s) = 1

This shows (equating the constant terms on both sides) that head(r) = 1/head(s). And equating the non-constant terms then gives:

x·(1/head(s))·tail(s) + x·tail(r)·head(s) + x2·tail(r)tail(s) = 0

(1/head(s))·tail(s) + tail(r)·head(s) + x·tail(r)tail(s) = 0

tail(r) = (-1/head(s))·tail(s) / (head(s) + x·tail(s))

tail(r) = (-1/head(s))·tail(s) / s

tail(r) = (-1/head(s))·tail(sr

and we win. This same calculation appears on page 323, in a somewhat more confused form. (Also, I considered only the special case where head(s) = 1.) The code works just fine.

To solve the differential equation f2 + (f')2 = 1, I want to do something like this:

$$f = \sqrt{1 - {(f')}^{2}}$$

so I need to be able to take the square root of a power series. This does not appear in the book, and I have not seen it elsewhere. Here it is.

Say we want r2 = s, where s is known. Then write, as usual:

s = head(s) + x·tail(s)
r = head(r) + x·tail(r)
as before, and, since r2 = s, we have:

(head(r))2 + 2x head(r) tail(r) + x2(tail(r))2 = head(s) + x·tail(s)
so, equating coefficients on both sides, (head(r))2 = head(s), and head(r) = √(head(s)).

Subtracting the head(s) from both sides, and dividing by x:

2·head(r) tail(r) + x·(tail(r))2 = tail(s)

tail(r)·(2·head(r) + x·tail(r)) = tail(s)

tail(r)·(head(r) + r) = tail(s)

tail(r) = tail(s) / (√(head(s)) + r)

and we win. Or rather, we win once we write the code, which would be something like this:

        # Perl
        sub series_sqrt {
          my $s = shift;
          my ($s0, $st) = (head($s), tail($s));
          my $r0 = sqrt($s0);
          my $r;
          $r  = node($r0, 
                      promise {
                        divide($st,
                               add2(node($r0, undef),
                                    $r))
                      });
          return $r;
        }
I confess I haven't tried this in Perl yet, but I have high confidence that it will work. I actually did the implementation in Haskell:

        -- Haskell
        series_sqrt (s0:st) = r 
           where r  = r0 : (divide st (add [r0] r))
                 r0 = sqrt(s0)
And when I asked it for the square root of [1,1,0,0,0,...] (that is, of 1+x) it gave me back [1, 0.5, -0.125, -0.0625, ...], which is indeed correct.

The Perl code is skankier than I wish it were. A couple of years ago I said in an interview that "I wish Perl's syntax were less verbose." Some people were surprised by this at the time, since Perl programmers consider Perl's syntax to be quite terse. But comparison of the Perl and Haskell code above demonstrates the sort of thing I mean.

Part of ths issue here, of course, is that the lazy list data structure is built in to Haskell, but I have to do it synthetically in Perl, and so every construction of a lazy list structure in Perl is accompanied by a syntactic marker (such as node(...) or promise { ... }) that is absent, or nearly absent, from the Haskell.

But when I complained about Perl's verbose syntax in 2005, one thing I had specifically in mind was Perl's argument acquisition syntax, here represented by my $s = shift;. Haskell is much terser, with no loss of expressiveness. Haskell gets another win in the automatic destructuring bind: instead of explicitly calling head() and tail() to acquire the values of s0 and st, as in the Perl code, they are implicitly called by the pattern match (s0:st) in the Haskell code, which never mentions s at all. It is quite fair to ascribe this to a failure of Perl's syntax, since there's no reason in principle why Perl couldn't support this, at least for built-in data structures. For example, consider the Perl code:

        sub blah {
          my $href = shift();
          my $a = $href->{this};
          my $tmp = $href->{that};
          my $b = $tmp->[0];
          my $c = $tmp->[2];

          # Now do something with $a, $b, $c
        }
It would be much more convenient to write this as:
        sub blah {
          my { this => $a, that => [$b, undef, $c] } = shift();

          # Now do something with $a, $b, $c
        }
This is a lot easier to understand.

There are a number of interesting user-interface issues to ask about here: What if the assigned value is not in the expected form? Are $a, $b, and $c copied from $href or are they aliases into it? And so on. One easy way to dispense with all of these interesting questions (perhaps not in the best way) is to assert that this notation is just syntactic sugar for the long version.

I talked to Chip Salzenberg about this at one time, and he said he thought it would not be very hard to implement. But even if he was right, what is not very hard for Chip Salzenberg to do can turn out to be nearly impossible for us mortals.

[ Addendum 20071209: There's a followup article that shows several different ways of solving the differential equation, including the power-series method. ]

[ Addendum 20071210: I did figure out how to get Haskell to solve the differential equation. ]


[Other articles in category /math] permanent link

Imaginary units
Yesterday I had a phenomenally annoying discussion with the pedants on the IRC #math channel. Someone was talking about square roots, and for some reason I needed to point out that when you are considering square roots of negative numbers, it is important not to forget that there are two square roots.

I should back up and discuss square roots in more detail. The square root of x, written √x, is defined to be the number y such that y2 = x. Well, no, that actually contains a subtle error. The error is in the use of the word "the". When we say "the number y such that...", we imply that there is only one. But every number (except zero) has two square roots. For example, the square roots of 16 are 4 and -4. Both of these are numbers y with the property that y2 = 16.

In many contexts, we can forget about one of the square roots. For example, in geometry problems, all quantities are positive. (I'm using "positive" here to mean "≥ 0".) When we consider a right triangle whose legs have lengths a and b, we say simply that the hypotenuse has length √(a2 + b2), and we don't have to think about the fact that there are actually two square roots, because one of them is negative, and is nonsensical when discussing hypotenuses. In such cases we can talk about the square root function, sqrt(x), which is defined to be the positive number y such that y2 = x. There the use of "the" is justified, because there is only one such number. But pinning down which square root we mean has a price: the square root function applies only to positive arguments. We cannot ask for sqrt(-1), because there is no positive number y such that y2 = -1. For negative arguments, this simplification is not available, and we must fall back to using √ in its full generality.

In high school algebra, we all learn about a number called i, which is defined to be the square root of -1. But again, the use of the word "the" here is misleading, because "the" square root is not unique; -1, like every other number (except 0) has two square roots. We cannot avail ourselves of the trick of taking the positive one, because neither root is positive. And in fact there is no other trick we can use to distinguish the two roots; they are mathematically indistinguishable.

The annoying discussion was whether it was correct to say that the two roots are mathematically indistinguishable. It was annoying because it's so obviously true. The number i is, by definition, a number such that i2 = -1. This is its one and only defining property. Since there is another number which shares this single defining property, it stands to reason that this other root is completely interchangeable with i—mathematically indistinguishable from it, in other words.

This other square root is usually written "-i", which suggests that it's somehow secondary to i. But this is not the case. Every numerical property possessed by i is possessed by -i as well. For example, i3 = -i. But we can replace i with -i and get (-i)3 = -(-i), which is just as true. Euler's famous formula says that eix = cos x + i sin x. But replacing i with -i here we get e-ix = cos x + -i sin x, which is also true.

Well, one of them is i, and the other is -i, so can't you distinguish them that way? No; those are only expressions that denote the numbers, not the numbers themselves. There is no way to know which of the numbers is denoted by which expression, and, in fact, it does not even make much sense to ask which number is denoted by which expression, since the two numbers are entirely interchangeable. One is i, and one is -i, sure, but this is just saying that one is the negative of the other. But so too is the other the negative of the one.

One of the #math people pointed out that there is a well-known Im() function, the "imaginary part" function, such that Im(i) = 1, but Im(-i) = -1, and suggested, rather forcefully, that they could be distinguished that way. This, of course, is hopeless. Because in order to define the "imaginary part" function in the first place, you must start by making an entirely arbitrary choice of which square root of -1 you are using as the unit, and then define Im() in terms of this choice. For example, one often defines Im(z) as $z - \bar{z} \over 2i$. But in order to make this definition, you have to select one of the imaginary units and designate it as i and use it in the denominator, thus begging the question. Had you defined Im() with -i in place of i, then Im(i) would have been -1, and vice versa.

Similarly, one #math inhabitant suggested that if one were to define the complex numbers as pairs of reals (a, b), such that (a, b) + (c, d) = (a + c, b + d), (a, b) × (c, d) = (ac - bd, ad + bc), then i is defined as (0,1), not (0,-1). This is even more clearly begging the question, since the definition of i here is solely a traditional and conventional one; defining i as (0, -1) instead of (0,1) works exactly as well; we still have i2 = -1 and all the other important properties.

As IRC discussions do, this one then started to move downwards into straw man attacks. The #math folks then argued that i ≠ -i, and so the two numbers are indeed distinguishable. This would have been a fine counterargument to the assertion that i = -i, but since I was not suggesting anything so silly, it was just stupid. When I said that the numbers were indistinguishable, I did not mean to say that they were numerically equal. If they were, then -1 would have only one square root. Of course, it does not; it has two unequal, but entirely interchangeable, square roots.

The that the square roots of -1 are indistinguishable has real content. 1 has two square roots that are not interchangeable in this way. Suppose someone tells you that a and b are different square roots of 1, and you have to figure out which is which. You can do that, because among the two equations a2 = a, b2 = b, only one will be true. If it's the former, then a=1 and b=-1; if the latter, then it's the other way around. The point about the square roots of -1 is that there is no corresponding criterion for distinguishing the two roots. This is a theorem. But the result is completely obvious if you just recall that i is merely defined to be a square root of -1, no more and no less, and that -1 has two square roots.

Oh well, it's IRC. There's no solution other than to just leave. [ Addenda: Part 2 Part 3 Part 4 Part 5 ]


[Other articles in category /math] permanent link

Sat, 01 Mar 2008

More rational roots of polynomials
I have a big file of ideas for blog articles, and when I feel like writing but I can't think of a topic, I look over the file. An item from last April was relevant to yesterday's article about finding rational roots of polynomials. It's a trick I saw in the first edition (1768!) of the Encyclopædia Britannica.

Suppose you have a polynomial P(x) = xn + ...+ p = 0. If it has a rational root r, this must be an integer that divides p = P(0). So far so good.

But consider P(x-1). This is a different polynomial, and if r is a root of P(x), then r+1 is a root of P(x-1). So, just as r must divide P(0), r+1 must divide P(-1). And similarly, r-1 must divide P+1.

So we have an extension of the rational root theorem: instead of guessing that some factor r of P(0) is a root, and checking it to see, we first check to see if r+1 is a factor of P(-1), and if r-1 is a factor of P(1), and proceed with the full check only if these two auxiliary tests pass.

My notes conclude with:

Is this really less work than just trying all the divisors of P(0) directly?
Let's find out.

As in the previous article, say P(x) = 3x2 + 6x - 45. The method only works for monic polynomials, so divide everything by 3. (It can be extended to work for non-monic polynomials, but the result is just that you have to divide everything by 3, so it comes to the same thing.) So we consider x2 + 2x - 15 instead. Say r is a rational root of P(x). Then:

r-1 divides P(1) = -12
r divides P(0) = -15
r+1 divides P(-1)= -16

So we need to find three consecutive integers that respectively divide 12, 15, and 16. The Britannica has no specific technique for this; it suggests doing it by eyeball. In this case, 2–3–4 jumps out pretty quickly, giving the root 3, and so does 6–5–4, which is the root -5. But the method also yields a false root: 4–3–2 suggests that -3 might be a root, and it is not.

Let's see how this goes for a harder example. I wrote a little Haskell program that generated the random polynomial x4 - 26x3 + 240 x2 - 918x + 1215.

r-1 divides P(1) = 512 = 29
r divides P(0) = 1215= 35·5
r+1 divides P(-1)= 2400= 25·3·52

That required a fair amount of mental arithmetic, and I screwed up and got 502 instead of 512, which I only noticed because 502 is not composite enough; but had I been doing a non-contrived example, I would not have noticed the error. (Then again, I would have done the addition on paper instead of in my head.) Clearly this example was not hard enough because 2–3–4 and 4–5–6 are obviously solutions, and it will not always be this easy. I increased the range on my random number generator and tried again.

The next time, it came up with the very delightful polynomial x4 - 2735x3 + 2712101 x2 - 1144435245x + 170960860950, and I decidedd not going to go any farther with it. The table values are easy to calculate, but they will be on the order of 170960860950, and I did not really care to try to factor that.

I decided to try one more example, of intermediate difficulty. The program first gave me x4 - 25x3 + 107 x2 - 143x + 60, which is a lucky fluke, since it has a root at 1. The next example it produced had a root at 3. At that point I changed the program to generate polynomials that had integer roots between 10 and 20, and got x4 - 61x3 + 1364 x2 - 13220x + 46800.

r-1 divides P(1) = 34864 = 22·33·17·19
r divides P(0) = 46800= 24·32·52·13
r+1 divides P(-1)= 61446= 2·3·72·11·19

This is just past my mental arithmetic ability; I got 34884 instead of 34864 in the first row, and balked at factoring 61446 in my head. But going ahead (having used the computer to finish the arithmetic), the 17 and 19 in the first and last rows are suggestive, and there is indeed a 17–18–19 to be found. Following up on the 19 in the first row suggests that we look for 19–20–21, which there is, and following up on the 11 in the last row, hoping for a 9–10–11, finds one of those too. All of these are roots, and I do have to admit that I don't know any better way of discovering that. So perhaps the method does have some value in some cases. But I had to work hard to find examples for which it made sense. I think it may be more reasonable with 18th-century technology than it is with 21st-century technology.


[Other articles in category /math] permanent link

Algebra techniques that don't work, except when they do
In Problems I Can't Fix in the Lecture Hall, Rudbeckia Hirta describes the efforts of a student to solve the equation 3x2 + 6x - 45 = 0. She describes "the usual incorrect strategy selected by students who can't do algebra":

3x2 + 6x - 45 = 0
3x2 + 6x = 45
x(3x + 6) = 45

She says "I stopped him before he factored out the x.".

I was a bit surprised by this, because the work so far seemed reasonable to me. I think the only mistake was not dividing the whole thing by 3 in the first step. But it is not too late to do that, and even without it, you can still make progress. x(3x + 6) = 45, so if there are any integer solutions, x must divide 45. So try x = ±1, ±3, ±5, ±9, ±15 in roughly that order. (The "look for the wallet under the lamppost" principle.) x = 3 solves the equation, and then you can get the other root, x=-5, by further application of the same method, or by dividing the original polynomial by x-3, or whatever.

If you get rid of the extra factor of 3 in the first place, the thing is even easier, because you have x(x + 2) = 15, so x = ±1, ±3, or ±5, and it is obviously solved by x=3 and x=-5.

Now obviously, this is not always going to work, but it works often enough that it would have been the first thing I would have tried. It is a lot quicker than calculating b2 - 4ac when c is as big as 45. If anyone hassles you about it, you can get them off your back by pointing out that it is an application of the so-called rational root theorem.

But probably the student did not have enough ingenuity or number sense to correctly carry off this technique (he didn't notice the 3), so that M. Hirta's advice to just use the damn quadratic formula already is probably good.

Still, I wonder if perhaps such students would benefit from exposure to this technique. I can guess M. Hirta's answer to this question: these students will not benefit from exposure to anything.

[ Addendum 20080228: Robert C. Helling points out that I could have factored the 45 in the first place, without any algebraic manipulations. Quite so; I completely botched my explanation of what I was doing. I meant to point out that once you have x(x+2) = 15 and the list [1, 3, 5, 15], the (3,5) pair jumps out at you instantly, since 3+2=5. I spent so much time talking about the unreduced polynomial x(3x+6) that I forgot to mention this effect, which is much less salient in the case of the unreduced polynomial. My apologies for any confusion caused by this omission. ]

[ Addendum 20080301: There is a followup to this article. ]


[Other articles in category /math] permanent link