Sat, 28 Jan 2006
Back on 19 January, I decided that readers might find it convenient if, when I mentioned a book, there was a link to buy the book. I was planning to write a lot about what books I was reading, and perhaps if I was convincing enough about how interesting they were, people would want their own copies.
The obvious way to do this is just to embed the HTML for the book link directly into each entry in the appropriate place. But that is a pain in the butt, and if you want to change the format of the book link, there is no good way to do it. So I decided to write a Blosxom plugin module that would translate some sort of escape code into the appropriate HTML. The escape code would only need to contain one bit of information about the book, say its ISBN, and then the plugin could fetch the other information, such as the title and price, from a database.
The initial implementation allowed me to put <book>1558607013</book> tags into an entry, and the plugin would translate this to the appropriate HTML. (There's an example on the right.
Since Berkeley DB only maps string keys to single string values, I had stored the title and image URL as a single string, with a colon character in between. That was my first dumb mistake, since book titles frequently include colons. I ran into this right away, with Voyages and Discoveries: Selections from Hakluyt's Principal Navigations.
This, however, was a minor error. I had made two major errors. One was that the <book>1558607013</book> tags were unintelligible. There was no way to look at one and know what book was being linked without consulting the database.
But even this wouldn't have been a disaster without the other big mistake, which was to use Berkeley DB. Berkeley DB is a great package. It provides fast keyed lookup even if you have millions of records. I don't have millions of records. I will never have millions of records. Right now, I have 15 records. In a year, I might have 200.
The price I paid for fast access to the millions of records I don't have is that the database is not a text file. If it were a text file, I could look up <book>1558607013</book> by using grep. Instead, I need a special tool to dump out the database in text form, and pipe the output through grep. I can't use my text editor to add a record to the database; I had to write a special tool to do that. If I use the wrong ISBN by mistake, I can't just correct it; I have to write a special tool to delete an item from the database and then I have to insert the new record.
When I decided to change the field separator from colon to \x22, I couldn't just M-x replace-string; I had to write a special tool. If I later decided to add another field to the database, I wouldn't be able to enter the new data by hand; I'd have to write a special tool.
On top of all that, for my database, Berkeley DB was probably slower than the flat text file would have been. The Berkeley DB file was 12,288 bytes long. It has an index, which Berkeley DB must consult first, before it can fetch the data. Loading the Berkeley DB module takes time too. The text file is 845 bytes long and can be read entirely into memory. Doing so requires only builtin functions and only a single trip to the disk.
I redid the plugin module to use a flat text file with tab-separated columns:
HOP 1558607013 Higher-Order Perl 9072008 DDI 068482471X Darwin's Dangerous Idea 1363778 Autobiog 0760768617 Franklin's Autobiography 9101737 VoyDD 0486434915 The Voyages of Doctor Dolittle 7969205 Brainstorms 0262540371 Brainstorms 1163594 Liber Abaci 0387954198 Liber Abaci 6934973 Perl Medic 0201795264 Perl Medic 7254439 Perl Debugged 0201700549 Perl Debugged 3942025 CLTL2 1555580416 Common Lisp: The Language 3851403 Frege 0631194452 The Frege Reader 8619273 Ingenious Franklin 0812210670 Ingenious Dr. Franklin 977000The columns are a nickname ("HOP" for Higher-Order Perl, for example), the ISBN, the full title, and the image URL. The plugin will accept either <book>1558607013</book> or <book>HOP</book> to designate Higher-Order Perl. I only use the nicknames now, but I let it accept ISBNs for backward compatibility so I wouldn't have to go around changing all the <book> elements I had already done.
Now I'm going to go off and write "just use a text file, fool!" a hundred times.