The Universe of Discourse
           
Sat, 28 Jan 2006

An unusually badly designed bit of software
I am inaugurating this new section of my blog, which will contain articles detailing things I have screwed up.

Back on 19 January, I decided that readers might find it convenient if, when I mentioned a book, there was a link to buy the book. I was planning to write a lot about what books I was reading, and perhaps if I was convincing enough about how interesting they were, people would want their own copies.

The obvious way to do this is just to embed the HTML for the book link directly into each entry in the appropriate place. But that is a pain in the butt, and if you want to change the format of the book link, there is no good way to do it. So I decided to write a Blosxom plugin module that would translate some sort of escape code into the appropriate HTML. The escape code would only need to contain one bit of information about the book, say its ISBN, and then the plugin could fetch the other information, such as the title and price, from a database.

The initial implementation allowed me to put <book>1558607013</book> tags into an entry, and the plugin would translate this to the appropriate HTML. (There's an example on the right: an order box for Higher-Order Perl, with "with kickback" and "no kickback" links.) The 1558607013 was the ISBN. The plugin would look up this key in a Berkeley DB database, where it would find the book title and Barnes and Noble image URL. Then it would replace the <book> element with the appropriate HTML. I did a really bad job with this plugin and had to rewrite it.

Since Berkeley DB only maps string keys to single string values, I had stored the title and image URL as a single string, with a colon character in between. That was my first dumb mistake, since book titles frequently include colons. I ran into this right away, with Voyages and Discoveries: Selections from Hakluyt's Principal Navigations.
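
To see why the colon was a poor choice, here is a minimal sketch in Python. It is only an illustration, not the plugin's code (the plugin was a Blosxom module). The helper names pack and unpack are hypothetical, the image value is a placeholder because the real one isn't given here, and splitting at the first colon is an assumption about how the record was parsed.

```python
# Hypothetical illustration of packing two fields into one string value.
TITLE = "Voyages and Discoveries: Selections from Hakluyt's Principal Navigations"
IMAGE = "IMAGE-URL-PLACEHOLDER"  # stand-in; the real image URL is not given

def pack(title, image, sep):
    """Join the two fields into the single string a key/value store can hold."""
    return title + sep + image

def unpack(record, sep):
    """Split the stored string back into (title, image) at the first separator."""
    title, image = record.split(sep, 1)
    return title, image

# With a colon separator, the title is cut at its own colon.
colon_title, colon_image = unpack(pack(TITLE, IMAGE, ":"), ":")

# With a separator that does not occur in the title (a tab here), it round-trips.
tab_title, tab_image = unpack(pack(TITLE, IMAGE, "\t"), "\t")

print(repr(colon_title))
print(repr(tab_title))
```

With the colon, the recovered title is cut off at "Voyages and Discoveries", and the rest of the title ends up glued to the front of the image field.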

This, however, was a minor error. I had made two major errors. One was that the <book>1558607013</book> tags were unintelligible. There was no way to look at one and know what book was being linked without consulting the database.

But even this wouldn't have been a disaster without the other big mistake, which was to use Berkeley DB. Berkeley DB is a great package. It provides fast keyed lookup even if you have millions of records. I don't have millions of records. I will never have millions of records. Right now, I have 15 records. In a year, I might have 200.

The price I paid for fast access to the millions of records I don't have is that the database is not a text file. If it were a text file, I could look up <book>1558607013</book> by using grep. Instead, I need a special tool to dump out the database in text form, and pipe the output through grep. I can't use my text editor to add a record to the database; I had to write a special tool to do that. If I use the wrong ISBN by mistake, I can't just correct it; I have to write a special tool to delete the item from the database, and then I have to insert the new record.

When I decided to change the field separator from colon to \x22, I couldn't just M-x replace-string; I had to write a special tool. If I later decided to add another field to the database, I wouldn't be able to enter the new data by hand; I'd have to write a special tool.

On top of all that, for my database, Berkeley DB was probably slower than the flat text file would have been. The Berkeley DB file was 12,288 bytes long. It had an index, which Berkeley DB had to consult first, before it could fetch the data. Loading the Berkeley DB module took time too. The text file is 845 bytes long and can be read entirely into memory. Doing so requires only builtin functions and only a single trip to the disk.
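
As a rough illustration of how little machinery the text-file approach needs, here is a sketch in Python (not the plugin's own code). The file name books.txt, the function name load_records, and the tab-separated layout are assumptions made for the example. The whole file is read with one read() call, and the lookup is a plain grep-style scan.

```python
import os
import tempfile

def load_records(path):
    """Read the whole file with a single read() and split it into rows of fields."""
    with open(path, encoding="utf-8") as fh:
        text = fh.read()
    return [line.split("\t") for line in text.splitlines() if line.strip()]

# Demo with a throwaway file; "books.txt" is a hypothetical name.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "books.txt")
    with open(path, "w", encoding="utf-8") as fh:
        fh.write("HOP\t1558607013\tHigher-Order Perl\t9072008\n")
        fh.write("DDI\t068482471X\tDarwin's Dangerous Idea\t1363778\n")

    records = load_records(path)

    # The grep-style lookup: keep every row that has the ISBN as one of its fields.
    matches = [row for row in records if "1558607013" in row]

print(matches)
```

For a file of a few hundred records, a linear scan like this is unlikely to be a bottleneck, and the same file can still be searched with grep or fixed in a text editor.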

I redid the plugin module to use a flat text file with tab-separated columns:

        HOP	1558607013	Higher-Order Perl	9072008
        DDI	068482471X	Darwin's Dangerous Idea	1363778
        Autobiog	0760768617	Franklin's Autobiography	9101737
        VoyDD	0486434915	The Voyages of Doctor Dolittle	7969205
        Brainstorms	0262540371	Brainstorms	1163594
        Liber Abaci	0387954198	Liber Abaci	6934973
        Perl Medic	0201795264	Perl Medic	7254439
        Perl Debugged	0201700549	Perl Debugged	3942025
        CLTL2	1555580416	Common Lisp: The Language	3851403
        Frege	0631194452	The Frege Reader	8619273
        Ingenious Franklin	0812210670	Ingenious Dr. Franklin	977000
The columns are a nickname ("HOP" for Higher-Order Perl, for example), the ISBN, the full title, and the image URL (shown above as just a number). The plugin will accept either <book>1558607013</book> or <book>HOP</book> to designate Higher-Order Perl. I only use the nicknames now, but I let it accept ISBNs for backward compatibility so I wouldn't have to go around changing all the <book> elements I had already written.
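
The two-key lookup can be sketched like this in Python. This is an illustration under assumptions, not the plugin itself: the names build_index and expand_book_tags are hypothetical, and the generated markup is a placeholder, since the plugin's real HTML output isn't shown in this article. Each record is indexed under both its nickname and its ISBN, so either form of the tag finds the same book.

```python
import re

# Two rows copied from the table above: nickname, ISBN, title, image field.
ROWS = [
    ("HOP", "1558607013", "Higher-Order Perl", "9072008"),
    ("DDI", "068482471X", "Darwin's Dangerous Idea", "1363778"),
]

def build_index(rows):
    """Index every record under both its nickname and its ISBN."""
    index = {}
    for nickname, isbn, title, image in rows:
        record = {"nickname": nickname, "isbn": isbn, "title": title, "image": image}
        index[nickname] = record
        index[isbn] = record
    return index

BOOK_TAG = re.compile(r"<book>(.*?)</book>")

def expand_book_tags(text, index):
    """Replace each <book>KEY</book> with placeholder markup for the matching record."""
    def replace(match):
        record = index.get(match.group(1).strip())
        if record is None:
            return match.group(0)  # unknown key: leave the tag untouched
        # Placeholder output only; the real plugin's HTML is not reproduced here.
        return "<i>{title}</i> (ISBN {isbn})".format(**record)
    return BOOK_TAG.sub(replace, text)

index = build_index(ROWS)
old_style = expand_book_tags("I wrote <book>1558607013</book>.", index)
new_style = expand_book_tags("I wrote <book>HOP</book>.", index)
print(old_style)
print(new_style)
```

Both calls produce the same text, which is the backward compatibility described above. A tag whose key is not in the file is left as it is, so a typo stays visible in the entry instead of silently disappearing.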

Now I'm going to go off and write "just use a text file, fool!" a hundred times.

