The Universe of Discourse


Fri, 14 Mar 2008

Drawing lines
As part of this thing I sometimes do when I'm not writing in my blog—what is it called?—oh, now I remember.

As part of my job I had to produce the following display:

The idea here is that the user can fill in the names of three organisms into the form blanks, and the application will find all the studies in its database which conclude that those organisms are related in the indicated way. For example, the user can put "whale" and "hippo" in the top two blanks and "cow" in the bottom one, and the result will be all the studies that conclude (perhaps among other things) that whales and hippos are more closely related to each other than either is to cows. (I think "cothurnocystis bifida" is biologist jargon for cows.)

If you wanted to hear more about phylogeny, Java programming, or tree algorithms, you are about to be disappointed. The subject of my article today is those fat black lines.

The first draft of the page did not have the fat black lines. It had some incredibly awful ASCII-art that was not even properly aligned. Really it was terrible; it would have been better to have left it out completely. I will not make you look at it.

I needed the lines, so I popped down the "graphics" menu on my computer and looked for something suitable. I tried the Gimp first. It seems that the Gimp has no tool for drawing straight lines. If someone wants to claim that it does, I will not dispute the claim. The Gimp has a huge and complex control panel covered with all sorts of gizmos, and maybe one of those gizmos draws a straight line. I did not find one. I gave up after a few minutes.

Next I tried Dia. It kept selecting the "move the line around on the page" tool when I thought I had selected the "draw another line" tool. The lines were not constrained to a grid by default, and there was no obvious way to tell it that I wanted to draw a diagram smaller than a whole page. I would have had to turn the thing into a bitmap and then crop the bitmap. "By Zeus's Beard," I cried, "does this have to be so difficult?" Except that the oath I actually uttered was somewhat coarser and less erudite than I have indicated. I won't repeat it, but it started with "fuck" and ended with "this".

Here's what I did instead. I wrote a program that would read an input like this:

        >-v-<
        '-+-`
and produce a jpeg file that looks like this:

Or similarly this:

        .---,    
        |   >--, 
        '---`  '-
Becomes this:

You get the idea.

Now I know some of you are just itching to write to me and ask "why didn't you just use...?", so before you do that, let me remind you of two things. First, I had already wasted ten or fifteen minutes on "just use..." that didn't work. And second, this program only took twenty minutes to write.

The program depends on one key insight, which is that it is very, very easy to write a Perl program that generates a graphic output in "PBM" ("portable bitmap") format. Here is a typical PBM file:

        P1
        10 10
        1111111111
        1000000001
        1000000001
        1001111001
        1001111001
        1001111001
        1001111001
        1000000001
        1000000001
        1111111111
The P1 is a magic number that identifies the file format; it is always the same. The 10 10 warns the processor that the upcoming bitmap is 10 pixels wide and 10 pixels high. The following characters are the bitmap data. I'm not going to insult you by showing the 10×10 bitmap image that this represents.

PBM was invented about twenty years ago by Jef Poskanzer. It was intended to be an interchange format: say you want to convert images from format X to format Y, but you don't have a converter. You might, however, have a converter that turns X into PBM and then one that turns PBM into Y. Or if not, it might not be too hard to produce such converters. It is, in the words of the Extreme Programming guys, the Simplest Thing that Could Possibly Work.

There are also PGM (portable graymap) and PPM (portable pixmap) formats for grayscale and 24-bit color images as well. They are only fractionally more complicated.

Because these formats are so very, very simple, they have been widely adopted. For example, the JPEG reference implementation includes a sample cjpeg program, for converting an input to a JPEG file. The input it expects is a PGM or PPM file.

Writing a Perl program to generate a P?M file, and then feeding the output to pbmtoxbm or ppmtogif or cjpeg is a good trick, and I have used it many times. For example, I used this technique to generate a zillion little colored squares in this article about the Pólya-Burnside counting lemma. Sure, I could have drawn them one at a time by hand, and probably gone insane and run amuck with an axe immediately after, but the PPM technique was certainly much easier. It always wins big, and this time was no exception.

The program may be interesting as an example of this technique, and possibly also as a reminder of something else. The Perl community luminaries invest a lot of effort in demonstrating that not every Perl program looks like a garbage heap, that Perl can be as bland and aseptic as Java, that Perl is not necessarily the language that most closely resembles quick-drying shit in a tube, from which you can squirt out the contents into any shape you want and get your complete, finished artifact in only twenty minutes and only slightly smelly.

No, sorry, folks. Not everything we do is a brilliant, diamond-like jewel, polished to a luminous gloss with pages torn from one of Donald Knuth's books. This line-drawing program was squirted out of a tube, and a fine brown piece of engineering it is.

        #!/usr/bin/perl

        my ($S) = shift || 50;
$S here is "size". The default is to turn every character in the input into a 50×50 pixel tile. Here's the previous example with $S=10:

        my ($h, $w);
        my $output = [];
        while (<>) {
          chomp;
          $w ||= length();
          $h++;
          push @$output, convert($_);
        }  
The biggest defect in the program is right here: it assumes that each line will have the same width $w. Lines all must be space-padded to the same width. Fixing this is left as an easy exercise, but it wasn't as easy as padding the inputs, so I didn't do it.

The magic happens here:

        open STDOUT, "| pnmscale 1 | cjpeg" or die $!;
        print "P1\n", $w * $S, " ", $h * $S, "\n";
        print $_, "\n" for @$output;
        exit;
The output is run through cjpeg to convert the PBM data to JPEG. For some reason cjpeg doesn't accept PBM data, only PGM or PPM, however, so the output first goes through pnmscale, which resizes a P?M input. Here the scale factor is 1, which is a no-op, except that pnmscale happens to turn a PBM input into a PGM output. This is what is known in the business as a "trick". (There is a pbmtopgm program, but it does something different.)

If we wanted gif output, we could have used "| ppmtogif" instead. If we wanted output in Symbolics Lisp Machine format, we could have used "| pgmtolispm" instead. Ah, the glories of interchange formats.

I'm going to omit the details of convert, which just breaks each line into characters, calls convert_ch on each character, and assembles the results. (The complete source code is here if you want to see it anyway.) The business end of the program is convert_ch:

        # 
        sub convert_ch {
          my @rows;
          my $ch = shift;
          my $up = $ch =~ /[<|>^'`+]/i;
          my $dn = $ch =~ /[<|>V.,+]/i;
          my $lt = $ch =~ /[-<V^,`+]/i;
          my $rt = $ch =~ /[->V^.'+]/i;
These last four variables record whether the tile has a line from its center going up, down, left, or right respectively. For example, "|" produces a tile with lines coming up and down from the center, but not left or right. The /i in the regexes is because I kept writing v instead of V in the inputs.

          my $top = int($S * 0.4);
          my $mid = int($S * 0.2);
          my $bot = int($S * 0.4);
The tile is divided into three bands, of the indicated widths. This probably looks bad, or fails utterly, unless $S is a multiple of 5. I haven't tried it. Do you think I care? Hint: I haven't tried it.

          my $v0 = "0" x $S;
          my $v1 = "0" x $top . "1" x $mid . "0" x $bot;
          push @rows, ($up ? $v1 : $v0) x $top;
This assembles the top portion of the tile, including the "up" line, if there is one. Note that despite their names, $top also determines the width of the left portion of the tile, and $bot determines the width of the right portion. The letter "v" here is for "vertical".

Perhaps I should explain for the benefit of the readers of Planet Haskell (if any of them have read this far and not yet fainted with disgust) that "$a x $b" in Perl is like concat (replicate b a) in the better sorts of languages.

          my $ls = $lt ? "1" : "0";
          my $ms = ($lt || $rt || $up || $dn) ? "1" : "0";
          my $rs = $rt ? "1" : "0";
          push @rows, ($ls x $top . $ms x $mid . $rs x $bot) x $mid;
This assembles the middle section, including the "left" and "right" lines.

          push @rows, ($dn ? $v1 : $v0) x $bot;
This does the bottom section.

          return @rows;
        }
And we are done. Nothing to it. Adding diagonal lines would be a fairly simple matter.

Download the complete source code if you haven't seen enough yet.

There is no part of this program of which I am proud. Rather, I am proud of the thing as a whole. It did the job I needed, and it did it by 5 PM. Larry Wall once said that "a Perl script is correct if it's halfway readable and gets the job done before your boss fires you." Thank you, Larry.

No, that is not quite true. There is one line in this program that I'm proud of. I noticed after I finished that there is exactly one comment in this program, and it is blank. I don't know how that got in there, but I decided to leave it in. Who says program code can't be funny?


[Other articles in category /prog/perl] permanent link