The Universe of Discourse

Thu, 04 Oct 2007

The world's worst macro preprocessor: postmortem
I see that the world's worst macro processor, subject of a previous article, is a little over a year old. A year ago I said that it was a huge success. I think it's time for a postmortem analysis.

My overall assessment is that it has been a huge success, and that if I were doing it over I would do it the same way.

A recent article contained a bunch of red and blue dots:

Well, clearly you can do four: . And then you can add another red one on the end: . And then another that could be either red or blue: . And then the next can be either color, say blue: .

I typed this using these macros:

        #define R* <span style="color: red">&bull;</span>
        #define B* <span style="color: blue">&bull;</span>
        #define Y* <span style="color: yellow">&bull;</span>

Without the macro processor, I would have had to suffer a lot. Then, a little while later, I needed to prepare this display:

••••••••••••••••••••••••••
••••••••••••••••••••••••••
••••••••••••••••••••••••••
••••••••••••••••••••••••••
••••••••••••••••••••••••••
••••••••••••••••••••••••••
••••••••••••••••••••••••••
••••••••••••••••••••••••••

No problem; the lines just look like R*R*B*B*R*R*B*Y*B*Y*Y*R*Y*R*R*B*R*B*B*Y*R*Y*Y*B*Y*B*.

Some time later I realized that this display would be totally illegible to the blind, the color-blind, and people using text-only browsers. So I just changed the macros:

        #define R* <span style="color: red">R</span>
        #define B* <span style="color: blue">B</span>
        #define Y* <span style="color: yellow">Y</span>

Problem solved. A row of colored dots instantly becomes R R B B R B B. And a good thing, too, because I discovered afterward that a lot of aggregators, like Bloglines and FeedBurner, discard the color information.
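The mechanism behind this can be sketched in a few lines of Python. This is only an illustration, not the actual plugin code: the function name `expand` and the details of how `#define` lines are parsed are assumptions. The sketch collects the definitions from the article, then replaces every macro name in the remaining text in a single pass, so changing the definitions changes every use at once.

```python
import re

def expand(article):
    """Collect '#define NAME REPLACEMENT' lines, then expand the remaining text once."""
    macros = {}
    body = []
    for line in article.splitlines():
        m = re.match(r"\s*#define\s+(\S+)\s+(.*)", line)
        if m:
            macros[m.group(1)] = m.group(2)
        else:
            body.append(line)
    text = "\n".join(body)
    if not macros:
        return text
    # One alternation over all macro names, applied in a single pass.
    pattern = re.compile("|".join(re.escape(name) for name in macros))
    return pattern.sub(lambda match: macros[match.group(0)], text)

BODY = "R*R*B*"

# Same article body, two different sets of definitions.
as_dots = expand(
    '#define R* <span style="color: red">&bull;</span>\n'
    '#define B* <span style="color: blue">&bull;</span>\n' + BODY
)
as_letters = expand(
    '#define R* <span style="color: red">R</span>\n'
    '#define B* <span style="color: blue">B</span>\n' + BODY
)
print(as_dots)
print(as_letters)
```

The article body never changes; only the two `#define` lines do.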

I find that I've used the macro feature 114 times so far. The most common use has been:

        #define ^2 <sup>2</sup>

But I also have files with:

        #define r2 &radic;2

That last one appears in three files. Clearly, making the macros local to files was a good decision.

Those uses are pretty typical. A less typical one is:

        #define <OVL> <span style="text-decoration: overline">
        #define </OVL> </span>

This is the sort of thing that you can get away with on a one-time basis, but which you wouldn't want to make a convention of. Since the purpose of the macro processor is to enable such hacks for the duration of a single article, it's all good.

I did run into at least one problem: I was writing an article in which I had defined ^i to abbreviate <sup><i>i</i></sup>. And then several paragraphs later I had a TeX formula that contained the ^i sequence in its TeX meaning. This was being replaced with a bunch of HTML, which was then passed to TeX, which then produced the wrong output.

One can solve this by reordering the plugins. If I had put the TeX plugin before the macro plugin, the problem would have gone away, because the TeX plugin would have replaced the TeX formula with an image element before the macro plugin ever saw the ^i.

This approach has many drawbacks. One is that it would no longer have been possible to use Blosxom macros in a TeX formula. I wasn't willing to foreclose this possibility, and I also wasn't sure that I hadn't done it somewhere. If I had, the TeX formula that depended on the macro expansion would have broken. And this is a risk whenever you move the macro plugin: if you move it from before plugin X to after plugin X, you have to worry that maybe something in some article depended on the text passed to X having been macro-processed.
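The ordering trade-off can be seen in a toy pipeline. Everything here is a stand-in made up for illustration: the `$$...$$` formula delimiter, the `formulaN.png` file names, and the plugin function names are assumptions, not how Blosxom or the real TeX plugin works. The list `seen` records what the pretend TeX plugin receives under each ordering.

```python
import re

MACROS = {"^i": "<sup><i>i</i></sup>"}

def macro_plugin(text):
    """Expand every macro in one pass."""
    pattern = re.compile("|".join(re.escape(name) for name in MACROS))
    return pattern.sub(lambda m: MACROS[m.group(0)], text)

def make_tex_plugin(seen):
    """Build a pretend TeX plugin that logs each formula it is handed."""
    def tex_plugin(text):
        def render(m):
            seen.append(m.group(1))  # the text "TeX" would actually receive
            return f'<img src="formula{len(seen)}.png">'
        return re.sub(r"\$\$(.+?)\$\$", render, text)
    return tex_plugin

def run(text, plugins):
    """Apply the plugins in order, each one seeing the previous one's output."""
    for plugin in plugins:
        text = plugin(text)
    return text

article = "a^i and $$x^i$$"

seen_macros_first = []
macros_first = run(article, [macro_plugin, make_tex_plugin(seen_macros_first)])

seen_tex_first = []
tex_first = run(article, [make_tex_plugin(seen_tex_first), macro_plugin])

print(seen_macros_first)  # formula already contains HTML
print(seen_tex_first)     # formula reaches "TeX" untouched
```

With macros first, the formula is handed to the TeX stand-in with HTML already spliced into it. With TeX first, the formula arrives intact, but by the same token no macro can ever be used inside a formula.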

When I installed the macro processor, I placed it first in plugin order for precisely this reason. Moving the macro substitution later would have required me to remember which plugins would be affected by the macro substitutions and which not. With the macro processing first, the question has a simple answer: all of them are affected.

Also, I didn't ever want to have to worry that some macro definition might mangle the output of some plugin. What if you are hacking on some plugin, and you change it to return <span style="Foo"> instead of <span style="foo">, and then discover that three articles you wrote back in 1997 are now totally garbled because they contained #define Foo >WUGGA<? It's just too unpredictable. Having the macro processing occur first means that you can always see in the original article file just what might be macro-replaced.

So I didn't reorder the plugins.

Another way to solve the TeX ^i problem would have been to do something like this:

        #define ^i <sup><i>i</i></sup>
        #define ^*i ^i

with the idea that I could write ^*i in the TeX formula, and the macro processor would replace it with ^i after it was done replacing all the ^i's.

At present the macro processor does not define any order to macro replacements, but it does guarantee to replace each string only once. That is, the results of macro replacement are not themselves searched for macro replacement. This limits the power of the macro system, but I think that is a good thing. One of the powers that is thus proscribed is the power to get stuck in an infinite loop.
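To make the single-substitution guarantee concrete, here is a small Python sketch. It is not the plugin's code; `expand_once` and `expand_chained` are hypothetical names, and the chained version is included only as a contrast. The single-pass version scans the input once and never looks at its own output, so the `^i` produced from `^*i` survives. The chained version applies one macro after another, so later macros see the output of earlier ones.

```python
import re

HTML_I = "<sup><i>i</i></sup>"

def expand_once(text, macros):
    """Replace every macro in one left-to-right pass; output is never rescanned."""
    pattern = re.compile("|".join(re.escape(name) for name in macros))
    return pattern.sub(lambda m: macros[m.group(0)], text)

def expand_chained(text, macros):
    """Contrast case: one str.replace per macro, so results get re-expanded."""
    for name, replacement in macros.items():
        text = text.replace(name, replacement)
    return text

source = "the x^i term, formula: x^*i"

single = expand_once(source, {"^i": HTML_I, "^*i": "^i"})
# Here ^*i happens to be processed first, and its output is then caught by ^i.
chained = expand_chained(source, {"^*i": "^i", "^i": HTML_I})

print(single)
print(chained)
```

In the single-pass result the formula keeps its literal `^i`; in the chained result it has been turned into HTML, which is exactly the failure the `^*i` trick is meant to avoid.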

It occurs to me now that although I call it the world's worst macro system, perhaps that doesn't give me enough credit for doing good design that might not have been obvious. I had forgotten about my choice of single-substitution behavior, but looking back on it a year later, I feel pleased with myself for it, and imagine that a lot of people would have made the wrong choice instead.

(A brief digression: unlimited, repeated substitution is a bad move here because it is complex—much more complex than it appears. A macro system with single substitution is nothing much, but a macro system with repeated substitution is a programming language. The semantics of the λ-calculus is nothing more than simple substitution, repeated as necessary, and the λ-calculus is a maximally complex computational engine. Term-rewriting systems are a more obvious theoretical example, and TeX is a better-known practical example of this phenomenon. I was sure I did not want my macro system to be a programming language, so I avoided repeated substitution.)

Because each input text is substituted at most once, the processor's refusal to define the order of the replacements is not something you have to think about, as long as your macros are prefix-unique. (That is, as long as none is a prefix of another.) So you shouldn't define:

        #define foo   bar
        #define fool  idiot

because then you don't know if foolish turns into barlish or idiotish. This is not a big deal in practice.
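The ambiguity is easy to reproduce in a sketch. The code below is an illustration in Python, not the macro processor itself: `expand_once` is a hypothetical single-pass expander built on regex alternation, where the result for `foolish` depends purely on the order in which the macro names happen to be listed. `prefix_conflicts` is an assumed helper showing how one might check a set of macro names for prefix-uniqueness.

```python
import re
from itertools import permutations

def expand_once(text, macros):
    """Single-pass expansion; alternatives are tried in dictionary order."""
    pattern = re.compile("|".join(re.escape(name) for name in macros))
    return pattern.sub(lambda m: macros[m.group(0)], text)

def prefix_conflicts(names):
    """Return (shorter, longer) pairs where one macro name is a prefix of another."""
    return sorted((a, b) for a, b in permutations(names, 2) if b.startswith(a))

# Same two macros, listed in opposite orders.
foo_first = expand_once("foolish", {"foo": "bar", "fool": "idiot"})
fool_first = expand_once("foolish", {"fool": "idiot", "foo": "bar"})

print(foo_first, fool_first)
print(prefix_conflicts(["foo", "fool", "^2"]))
```

A processor that wanted a defined answer could sort the names longest-first, but that would be a second feature; simply avoiding prefix collisions keeps the question from arising.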

Well, anyway, I did not solve the problem with #define ^*i ^i. I took a much worse solution, which was to hack a #undefall directive into the macro processor. In my original article, I boasted that the macro processor "has exactly one feature". Now it has two, and it's not an improvement. I disliked the new feature at the time, and now that I'm reviewing the decision, I think I'm going to take it out.

I see that I did use the double-macro solution elsewhere. In the article about Gödel and the U.S. Constitution, I macroed an abbreviation for the umlaut:

        #define Godel G&ouml;del

But this sequence also occurred in the URLs in the link elements, and the substitution broke the links. I should probably have changed this to:

        #define Go:del G&ouml;del
        #define GODEL Godel

      #define PAa prosopagnosia