The Universe of Discourse


Wed, 21 Feb 2007

A bug in HTML generation
A few days ago I hacked on the TeX plugin I wrote for Blosxom so that it would put the TeX source code into the ALT attributes of the image elements it generated.

But then I started to see requests in the HTTP error log for URLs like this:

    /pictures/blog/tex/total-die-rolls.gif$${6/choose%20k}k!{N!/over%20/prod%20{i!}^{n_i}{n_i}!}/qquad%20/hbox{/rm%20where%20$k%20=%20/sum%20n_i$}$$.gif
Someone must be referring people to these incorrect URLs, and it is presumably me. The HTML version of the blog looked okay, so I checked the RSS and Atom files, and found that, indeed, they were malformed. Instead of <img src="foo.gif" alt="$TeX$">, they contained codes for <img src="foo.gif$TeX$">.

I tracked down and fixed the problem. Usually when I get a bug like this, I ask myself what I could learn from it. This one is unusual. I can't think of much. Here's the bug.

The <img> element is generated by a function called imglink. The arguments to imglink are the filename that contains the image (for use in the SRC attribute) and the text for the ALT attribute. The ALT text is optional. If it is omitted, the function tries to locate the TeX source code and fetch it. If this attempt fails, it continues anyway, and omits the ALT attribute. Then it generates and returns the HTML:

        sub imglink {
          my $file = shift;
          ...

          my $alt = shift || fetch_tex($file);

          ...
          $alt = qq{alt="$alt"} if $alt;

          qq{<img $alt border=0 src="$url">};
        }
This function is called from several places in the plugin. Sometimes the TeX source code is available at the place from which the call comes, and the code has return imglink($file, $tex); sometimes it isn't and the code has return imglink($file) and hopes that the imglink function can retrieve the TeX.

One such place is the branch that handles generation of tags for every type of output except HTML. When generating the HTML output, the plugin actually tries to run TeX and generate the resulting image file. For other types of output, it assumes that the image file is already prepared, and just calls imglink to refer to an image that it presumes already exists:

  return imglink($file, $tex) unless $blosxom::flavour eq "html";
The bug was that I had written this instead:

  return imglink($file. $tex) unless $blosxom::flavour eq "html";
The . here is a string concatenation operator.

It's a bit surprising that I don't make more errors like this than I do. I am a very inaccurate typist.

Stronger type checking would not have saved me here. Both arguments are strings, concatenation of strings is perfectly well-defined, and the imglink function was designed and implemented to accept either one or two arguments.

The function did note the omission of the $tex argument, attempted to locate the TeX source code for the bizarrely-named file, and failed, but I had opted to have it recover and continue silently. I still think that was the right design. But I need to think about that some more.

The only lesson I have been able to extract from this so far is that I need a way of previewing the RSS and Atom outputs before publishing them. I do preview the HTML output, but in this case it was perfectly correct.


[Other articles in category /prog/bug] permanent link