The Universe of Discourse
           
Tue, 31 Jan 2006

Petard
A petard is a Renaissance-era bomb, basically a big firecracker: a box or small barrel of gunpowder with a fuse attached. Those hissing black exploding spheres that you see in Daffy Duck cartoons are petards. Outside of cartoons, you are most likely to encounter the petard in the phrase "hoist with his own petard", which is from Hamlet. Rosencrantz and Guildenstern are being sent to England with the warrant for Hamlet's death; Hamlet alters the warrant to contain R&G's names instead of his own. "Hoist", of course, means "raised", and Hamlet is saying that it is amusing to see someone screw up his own petard and blow himself sky-high with it.

This morning I read in On Food and Cooking that there's a kind of fried choux pastry called pets de soeurs ("nuns' farts") because they're so light and delicate. That brought to mind Le Pétomane, the world-famous theatrical fartmaster. Then there was a link on reddit titled "Xmas Petard (cool gif video!)" which got me thinking about petards, and it occurred to me that "petard" was probably akin to pets, because it makes a bang like a fart. And hey, I was right; how delightful.

Another fart-related word is "partridge", so named because its call sounds like a fart.


[Other articles in category /lang/etym] permanent link

Mon, 30 Jan 2006

Rotten code in a ProFTPD plugin module
One of my work colleagues asked me to look at a piece of C source code today. He was tracking down a bug in the FTP server. He thought he had traced it to this spot, and wanted to know if I concurred and if I agreed with his suggested change.

Here's the (exceptionally putrid) (relevant portion of the) code:

static int gss_netio_write_cb(pr_netio_stream_t *nstrm, char *buf,size_t buflen) {

    int     count=0;
    int     total_count=0;        
    char    *p;

    OM_uint32   maj_stat, min_stat;
    OM_uint32   max_buf_size;

    ...
    /* max_buf_size = maximal input buffer size */
    p=buf;
    while ( buflen > total_count ) { 
        /* */ 
        if ( buflen - total_count > max_buf_size ) {
            if ((count = gss_write(nstrm,p,max_buf_size)) != max_buf_size )
                return -1;
        } else {
            if ((count = gss_write(nstrm,p,buflen-total_count)) != buflen-total_count )
                return -1;
        }       
        total_count = buflen - total_count > max_buf_size ? total_count + max_buf_size : buflen;
        p=p+total_count;
    }

    return buflen;  
}
(You know there's something wrong when the comment says "maximal input buffer size", but the buffer is for performing output. I have not looked at any of the other code in this module, which is 2,800 lines long, so I do not know if this chunk is typical.) Mr. Colleague suggested that p=p+total_count was wrong, and should be replaced with p=p+max_buf_size. I agreed that it was wrong, and that his change would fix the problem, although I suggested that p += count would be a better change. Mr. Colleague's change, although it would no longer manifest the bug, was still "wrong" in the sense that it would leave p pointing to a garbage location (and incidentally invokes behavior not defined by the C language standard) whereas my change would leave p pointing to the end of the buffer, as one would expect.

Since this is a maintenance programming task, I recommended that we not touch anything not directly related to fixing the bug at hand. But I couldn't stop myself from pointing out that the code here is remarkably badly written. Did I say "exceptionally putrid" yet? Oh, I did.

Good. It stinks like a week-old fish.

The first thing to notice is that the expression buflen - total_count appears four times in only nine lines of code—five if you count the buflen > total_count comparison. This strongly suggests that the algorithm would be more clearly expressed in terms of whatever buflen - total_count really is. Since buflen is the total number of characters to be written, and total_count is the number of characters that have been written, buflen - total_count is just the number of characters remaining. Rather than computing the same expression four times, we should rewrite the loop in terms of the number of characters remaining.

    size_t left_to_write = buflen;
    while ( left_to_write > 0 ) { 
        /* */ 
        if ( left_to_write > max_buf_size ) {
            if ((count = gss_write(nstrm,p,max_buf_size)) != max_buf_size )
                return -1;
        } else {
            if ((count = gss_write(nstrm,p,left_to_write)) != left_to_write )
                return -1;
        }       
        total_count = left_to_write > max_buf_size ? total_count + max_buf_size : buflen;
        p=p+total_count;
        left_to_write -= count;
    }
Now we should notice that the two calls to gss_write are almost exactly the same. Duplicated code like this can almost always be eliminated, and eliminating it almost always produces a favorable result. In this case, it's just a matter of introducing an auxiliary variable to record the amount that should be written:

    size_t left_to_write = buflen, write_size;
    while ( left_to_write > 0 ) { 
        write_size = left_to_write > max_buf_size ? max_buf_size : left_to_write;
        if ((count = gss_write(nstrm,p,write_size)) != write_size )
                return -1;
        total_count = left_to_write > max_buf_size ? total_count + max_buf_size : buflen;
        p=p+total_count;
        left_to_write -= count;
    }
At this point we can see that write_size is going to be max_buf_size for every write except possibly the last one, so we can simplify the logic that maintains it:

    size_t left_to_write = buflen, write_size = max_buf_size;
    while ( left_to_write > 0 ) { 
        if (left_to_write < max_buf_size) 
            write_size = left_to_write;
        if ((count = gss_write(nstrm,p,write_size)) != write_size )
                return -1;
        total_count = left_to_write > max_buf_size ? total_count + max_buf_size : buflen;
        p=p+total_count;
        left_to_write -= count;
    }
Even if we weren't here to fix a bug, we might notice something fishy: left_to_write is being decremented by count, but p, the buffer position, is being incremented by total_count instead. In fact, this is exactly the bug that was discovered by Mr. Colleague. Let's fix it:

    size_t left_to_write = buflen, write_size = max_buf_size;
    while ( left_to_write > 0 ) { 
        if (left_to_write < max_buf_size) 
            write_size = left_to_write;
        if ((count = gss_write(nstrm,p,write_size)) != write_size )
                return -1;
        total_count = left_to_write > max_buf_size ? total_count + max_buf_size : buflen;
        p += count;
        left_to_write -= count;
    }
We could fix up the line that maintains the total_count variable so that it would be correct, but since total_count isn't used anywhere else, let's just delete it.

    size_t left_to_write = buflen, write_size = max_buf_size;
    while ( left_to_write > 0 ) { 
        if (left_to_write < max_buf_size) 
            write_size = left_to_write;
        if ((count = gss_write(nstrm,p,write_size)) != write_size )
                return -1;
        p += count;
        left_to_write -= count;
    }
Finally, if we change the != write_size test to < 0, the function will correctly handle partial writes, should gss_write be modified in the future to perform them:
    size_t left_to_write = buflen, write_size = max_buf_size;
    while ( left_to_write > 0 ) { 
        if (left_to_write < max_buf_size) 
            write_size = left_to_write;
        if ((count = gss_write(nstrm,p,write_size)) < 0 )
                return -1;
        p += count;
        left_to_write -= count;
    }
We could trim one more line of code and one more state change by eliminating the modification of p:

    size_t left_to_write = buflen, write_size = max_buf_size;
    while ( left_to_write > 0 ) { 
        if (left_to_write < max_buf_size) 
            write_size = left_to_write;
        if ((count = gss_write(nstrm,p+buflen-left_to_write,write_size)) < 0 )
                return -1;
        left_to_write -= count;
    }
I'm not sure I think that is an improvement. (My idea is that if we do this, it would be better to create a p_end variable up front, set to p+buflen, and then use p_end - left_to_write in place of p+buflen-left_to_write. But that adds back another variable, although it's a constant one, and the backward logic in the calculation might be more confusing than the thing we were replacing. Like I said, I'm not sure. What do you think?)

Anyway, I am sure that the final code is a big improvement on the original in every way. It has fewer bugs, both active and latent. It has the same number of variables. It has six lines of logic instead of eight, and they are simpler lines. I suspect that it will be a bit more efficient, since it's doing the same thing in the same way but without the redundant computations, although you never know what the compiler will be able to optimize away.
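For anyone who wants to try the final loop outside the FTP server, here is a minimal self-contained sketch. The names struct sink, chunk_write, and write_all are hypothetical stand-ins invented for this example; they are not the module's pr_netio_stream_t or gss_write, whose real behavior is not shown here. The sketch departs from the listing above in one respect, which is an assumption on my part: it tests for <= 0 rather than < 0, so that a writer that returns 0 cannot make the loop spin forever.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the output stream; not the ProFTPD type. */
struct sink {
    char   data[256];   /* bytes accepted so far */
    size_t used;        /* number of valid bytes in data */
    size_t max_accept;  /* most bytes taken per call, to force partial writes */
};

/* Hypothetical stand-in for gss_write: accepts at most max_accept bytes
   per call and returns the number accepted, or -1 if the sink is full. */
int chunk_write(struct sink *s, const char *buf, size_t len) {
    size_t room = sizeof s->data - s->used;
    if (room == 0)
        return -1;
    if (len > s->max_accept)
        len = s->max_accept;
    if (len > room)
        len = room;
    memcpy(s->data + s->used, buf, len);
    s->used += len;
    return (int)len;
}

/* Write buflen bytes in pieces of at most max_buf_size, advancing by
   however many bytes each call actually accepted. */
int write_all(struct sink *s, const char *buf, size_t buflen,
              size_t max_buf_size) {
    const char *p = buf;
    size_t left_to_write = buflen, write_size = max_buf_size;
    int count;

    assert(max_buf_size > 0);
    while (left_to_write > 0) {
        if (left_to_write < max_buf_size)
            write_size = left_to_write;
        /* <= 0 instead of < 0: a zero-byte write would otherwise loop forever */
        if ((count = chunk_write(s, p, write_size)) <= 0)
            return -1;
        p += count;
        left_to_write -= (size_t)count;
    }
    return (int)buflen;
}
```

Setting max_accept smaller than max_buf_size forces every call to be a partial write, which is the case the original != test would have reported as a failure.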

Right now I'm engaged in writing a book about this sort of cleanup and renovation for Perl programs. I've long suspected that the same sort of processes could be applied to C programs, but this is the first time I've actually done it.

Order
Advanced Unix Programming
Advanced Unix Programming
with kickback
no kickback
The funny thing about this code is that it's performing a task that I thought every C programmer would already have known how to do: block-writing of a bufferful of data. Examples of the right way to do this are all over the place. I first saw it done in Marc J. Rochkind's superb book Advanced Unix Programming around 1989. (I learned from the first edition, but the link to the right is for the much-expanded second edition that came out in 2004.) I'm sure it must pop up all over the Stevens books.

But the really exciting thing I've learned about code like this is that it doesn't matter if you don't already know how to do it right, because you can turn the wrong code into the right code, as we did here, by noticing a few common problems, like duplicate tests and repeated subexpressions, and applying a few simple refactorizations to get rid of them. That's what my book will be about.

(I am also very pleased that it has taken me 37 blog entries to work around to discussing any programming-related matters.)


[Other articles in category /prog] permanent link

Google query roundup
Now that I have a reasonably-sized body of blog posts, my blog is starting to attract Google queries. It's really exciting when someone visits one of my pages looking for something incredibly specific and obscure and I can tell from their query in the referrer log that I have unknowingly written exactly the document they were hoping to find. That's one of the wonders of the Internet thingy.

For example:

	   1 monkey rope banana weight
	   1 "how long is the banana"
	   3 monkey's mother problem
	   1 "basil brown" carrot juice
	   1 story about diophantus,how old was diophantus when he got married
(Numbers indicate the number of hits on my pages that were referred by the indicated query.)

And this visitor got rather more than they wanted:

	   1 what pennsylvanian can we thank for daylight savings time
I imagine a middle-schooler, working on her homework. The middle-schooler is now going to have to go back to her teacher and tell her that she was wrong, and that Franklin did not invent DST, and a lot of other stuff that middle-school teachers usually do not want to be bothered with. I hope it works out well. Or perhaps the middle-schooler will just write down "Benjamin Franklin" and leave it at that, which would be cynical but effective.

Although you'd think that by now the middle schooler would have figured out that questions that start with "What Pennsylvanian can we thank for..." are about Benjamin Franklin with extremely high probability.

I think this person was probably fairly happy:

	   3 franklin "restoration of life by sun rays"
The referenced page includes the title of a book that contains the relevant essay, with a link to the bookseller. The only way the searcher could be happier is if they found the text of the essay itself.

Similarly, I imagine that this person was pleased:

	   1 monarch-like butterfly
Perhaps they couldn't remember the name of the Viceroy butterfly, and my article reminded them.

Some of the queries are intriguing. I wonder what this person was looking for?

	   1 spanish armada & monkey
I'd love to know the story of the Monkey and the Spanish Armada. If there isn't one already, someone should invent one.

	   1 there is a cabinet with 12 drawers. each drawer is opened
	     only once. in each drawer are about 30 compartments, with
	     only 7 names.
This one was so weird that I had to do the search myself. It's a puzzle on a page described as "Quick Riddles: Easy puzzles, riddles and brainteasers you can solve on sight"; the question was "what is it?" Presumably it's some sort of calendrical object, containing pills or some other item to be dispensed daily. I looked at the answer on the web page, which is just "the calendar". I have not seen any calendars with drawers and compartments, so I suppose they were meant metaphorically. I think it's a pretty crappy riddle.

Sometimes I know that the searchers did not find what they were looking for.

	   1 eliminate debt using linear math
I don't know what this was, but it reminds me of when I was teaching math at the Johns Hopkins CTY program. One of my fellow instructors told me sadly that he had a student whose uncle had invented a brilliant secret system for making millions of dollars in the stock market. The student had been sent to math camp to learn trigonometry so that he would be able to execute the system for his uncle. Kids get sent to math camp for a lot of bad reasons, but I think that one was the winner.

	   1 armonica how many people can properly use it
This one is a complete miss. The armonica (or "glass harmonica") is a kind of musical instrument. (Who can guess what Pennsylvanian we have to thank for it?) As all ill-behaved children know, you can make a water glass sing by rubbing its edge with a damp fingertip. The armonica is a souped-up version of this. There is a series of glass bowls in graduated sizes, mounted on a revolving spindle. The operator touches the rims of the revolving bowls with his fingers; this makes them vibrate. The smaller bowls produce higher tones. The sound is very ethereal, not like any other instrument.

I had the good fortune to attend an armonica recital by Dean Shostak as part of the Philadelphia Fringe Festival a few years ago. Mr. Shostak is one of very few living armonica players. (He says that there are seven others.) The armonica is not popular because it is bulky, hard to manufacture, and difficult to play. The bowls must be constructed precisely, by a skilled glassblower, to almost the right pitch, and then carefully filed down until they are exactly right. If you overfile one, it is junk. If a bowl goes out of tune, it must be replaced; this requires that all the other bowls be unmounted from the spindle. The bowls are fragile and break easily.

The operator's hands must be perfectly clean, because the slightest amount of lubrication prevents the operator from setting the glass vibrating. The operator must keep his fingertips damp at all times, continually wetting them from a convenient bowl of water. By the end of a concert, his fingers are all pruney and have been continually rubbed against the rotating bowls; this limits the amount of time the instrument can be played.

Shostak's web site has some samples that you can listen to. Unfortunately, it does not also have any videos of him playing the instrument.

	   1 want did an wang invent 
This one was also a miss; the poor querent found my page about medieval Chinese type management instead.

An Wang invented the magnetic core memory that was the principal high-speed memory for computers through the 1950s and 1960s. In this memory technology, each bit was stored in a little ferrite doughnut, called a "core". If the magnetic field went one way through the doughnut, it represented a 0; the other way was a 1. Thousands of these cores would be strung on wire grids. Each core was on one vertical and one horizontal wire. The computer could modify the value of the bit by sending current on the core's horizontal wire and vertical wire simultaneously. The two currents individually were too small to modify the other bits in the same row and column. If the bit was actually changed, the resulting effect on the current could be detected; this is how bits were read: You'd try to write a 1, and see if that caused a change in the bit value. Then if it turned out to have been a 0, you'd put it back the way it was.
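The read-by-writing trick is easier to follow in code. This is only a toy model of the scheme as described above, not a simulation of any real hardware; struct core_plane, core_write, and core_read are names made up for the illustration, and the 4-by-4 grid size is arbitrary.

```c
#include <assert.h>
#include <stdbool.h>

#define ROWS 4
#define COLS 4

/* Hypothetical model of one core plane: each core holds one bit. */
struct core_plane {
    bool bit[ROWS][COLS];
};

/* Drive one row wire and one column wire together; only the core at
   their intersection sees enough current to be set to `value`.
   Returns true if the core actually flipped, which stands in for the
   change in current that the hardware could detect. */
bool core_write(struct core_plane *m, int row, int col, bool value) {
    assert(row >= 0 && row < ROWS && col >= 0 && col < COLS);
    bool flipped = (m->bit[row][col] != value);
    m->bit[row][col] = value;
    return flipped;
}

/* Destructive read: try to write a 1. If the core flipped, it held a 0,
   so put the 0 back; if nothing changed, it already held a 1. */
bool core_read(struct core_plane *m, int row, int col) {
    bool was_zero = core_write(m, row, col, true);
    if (was_zero)
        core_write(m, row, col, false);  /* restore the original 0 */
    return !was_zero;
}
```

Reading a core that holds a 0 therefore costs two writes, and reading one that holds a 1 costs a single write that changes nothing.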

The cores themselves were cheap and easy to manufacture. You mix powdered iron with ceramic, stamp it into the desired shape in a mold, and bake it in a kiln. Stringing cores into grids was more expensive, and was done by hand.

As the technology improved, the cores themselves got smaller and the grids held more and more of them. Cores from the 1950s were about a quarter-inch in diameter; cores from the late 1960s were about one-quarter that size. They were finally obsoleted in the 1970s by integrated circuits.

When I was in high school in New York in the 1980s, it was still possible to obtain ferrite cores by the pound from the surplus-electronics stores on Canal Street. By the 1990s, the cores were gone. You can still buy them online.

An Wang got very rich from the invention and was able to found Wang computers. Around 1980 my mother's employer had a Wang word-processing system. It was a marvel that took up a large space and cost $15,000. ($35,000 in 2006 dollars.) She sometimes brought me in on weekends so that I could play with it. Such systems, the first word processors, were tremendously popular between 1976 and 1981. They invented the form, which, as I recall, was not significantly different from the word processors we have today. Of course, these systems were doomed, replaced by cheap general-purpose machines within a few years.

The undergraduate dormitories at Harvard University are named mostly for Harvard's presidents: Mather House, Dunster House, Eliot House, and so on. One exception was North House. A legend says Harvard refused an immense donation from Wang, whose successful company was based in Cambridge, because it came with the condition that North House be renamed after him. (Similarly, one sometimes hears it said that the Houses are named for all the first presidents of Harvard, except for president number 3, Leonard Hoar, who was skipped. It's not true; numbers 2, 4, and 5 were skipped also.)


[Other articles in category /google-roundup] permanent link

Sun, 29 Jan 2006

G.H. Hardy on analytic number theory and other matters

A while back I was in the Penn math and physics library browsing in the old books, and I ran across Ramanujan: Twelve Lectures on Subjects Suggested by His Life and Work by G.H. Hardy. Srinivasa Ramanujan was an unknown amateur mathematician in India; one day he sent Hardy some of the theorems he had been proving. Hardy was boggled; many of Ramanujan's theorems were unlike anything he had ever seen before. Hardy said that the formulas in the letter must be true, because if they were not true, no one would have had the imagination to invent them. Here's a typical example:

Hardy says that it was clear that Ramanujan was either a genius or a confidence trickster, and that confidence tricksters of that caliber were much rarer than geniuses, so he was prepared to give him the benefit of the doubt.

But anyway, the main point of this note is to present the following quotation from Hardy. He is discussing analytic number theory:

The fact remains that hardly any of Ramanujan's work in this field had any permanent value. The analytic theory of numbers is one of those exceptional branches of mathematics in which proof really is everything and nothing short of absolute rigour counts. The achievement of the mathematicians who found the Prime Number Theorem was quite a small thing compared with that of those who found the proof. It is not merely that in this theory (as Littlewood's theorem shows) you can never be quite sure of the facts without the proof, though this is important enough. The whole history of the Prime Number Theorem, and the other big theorems of the subject, shows that you cannot reach any real understanding of the structure and meaning of the theory, or have any sound instincts to guide you in further research, until you have mastered the proofs. It is comparatively easy to make clever guesses; indeed there are theorems like "Goldbach's Theorem", which have never been proved and which any fool could have guessed.

(G.H. Hardy, Ramanujan.)

Some notes about this:

  1. Notice that this implies that in most branches of mathematics, you can get away with less than absolute rigor. I think that Hardy is quite correct here. (This is a rather arrogant remark, since Hardy is much more qualified than I am to be telling you what counts as worthwhile mathematics and what it is like. But this is my blog.) In most branches of mathematics, the difficult part is understanding the objects you are studying. If you understand them well enough to come up with a plausible conjecture, you are doing well. And in some mathematical pursuits, the proof may even be secondary. Consider, for example, linear programming problems. The point of the theory is to come up with good numerical solutions to the problems. If you can do that, your understanding of the mathematics is in some sense unimportant. If you invent a good algorithm that reliably produces good answers reasonably efficiently, proving that the algorithm is always efficient is of rather less value. In fact, there is such an algorithm—the "simplex algorithm"—and it is known to have exponential time in the worst case, a fact which is of decidedly limited practical interest.

    In analytic number theory, however, two facts weigh in favor of rigor. First, the objects you are studying are the positive integers. You already have as much intuitive understanding of them as you are ever going to have; you are not, through years of study and analysis, going to come to a clearer intuition of the number 3. And second, analytic number theory is much more inward-looking than most mathematics. The applications to the rest of mathematics are somewhat limited, and to the wider world even more limited. So a guessed or conjectured theorem is unlikely to have much value; the value is in understanding the theorem itself, and if you don't have a rigorous proof, you don't really understand the theorem.

    Hardy's example of the Goldbach conjecture is a good one. In the 18th Century, Christian Goldbach, who was nobody in particular, conjectured that every even number is the sum of two primes. Nobody doubts that this is true. It's certainly true for all small even numbers, and for large ones, you have lots and lots of primes to choose from. No proof, however, is in view. (The primes are all about multiplication. Proving things about their additive properties is swimming upstream.) And nobody particularly cares whether the conjecture is true or not. So what if every even number is the sum of two primes? But a proof would involve startling mathematics, deep understanding of something not even guessed at now, powerful techniques not currently devised. The proof itself would have value, but the result doesn't.

    Fermat's theorem (the one about a^n + b^n = c^n) is another example of this type. Not that Fermat was in any sense a fool to have conjectured it. But the result itself is of almost no interest. Again, all the value is in the proof, and the techniques that were required to carry it through.

  2. The Prime Number Theorem that Hardy mentions is the theorem about the average density of the prime numbers. The Greeks knew that there were an infinite number of primes. So the next question to ask is what fraction of integers are prime. Are the primes sparse, like the squares? Or are they common, like multiples of 7? The answer turns out to be somewhere in between.

    Of the integers 1–10, four (2, 3, 5, 7) are prime, or 40%. Of the integers 1–100, 25% are prime. Of the integers 1–1000, 16.8% are prime. What's the relationship?

    The relationship turns out to be amazing: Of the integers 1–n, about 1/log(n) are prime. Here's a graph: the red line is the fraction of the numbers 1–n that are prime; the green line is 1/log(n):

    It's not hard to conjecture this, and I think it's not hard to come up with offhand arguments why it should be so. But, as Hardy says, proving it is another matter, and that's where the real value is, because to prove it requires powerful understanding and sophisticated technique, and the understanding and technique will be applicable to other problems.

    The theorem of Littlewood that Hardy refers to is a related matter.
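The percentages quoted in note 2 are easy to check by machine. The following is a small sketch using a sieve of Eratosthenes; count_primes and prime_fraction are names chosen for this example only, and nothing here bears on the proof, which is the part Hardy says matters.

```c
#include <assert.h>
#include <stdlib.h>

/* Count the primes in 1..n with a sieve of Eratosthenes.
   Returns -1 if memory for the sieve cannot be allocated. */
long count_primes(long n) {
    assert(n >= 0);
    if (n < 2)
        return 0;
    char *composite = calloc((size_t)n + 1, 1);
    if (composite == NULL)
        return -1;
    long count = 0;
    for (long i = 2; i <= n; i++) {
        if (composite[i])
            continue;
        count++;                      /* i was never crossed off, so it is prime */
        if (i <= n / i) {             /* only cross off when i*i <= n */
            for (long j = i * i; j <= n; j += i)
                composite[j] = 1;
        }
    }
    free(composite);
    return count;
}

/* Fraction of the integers 1..n that are prime
   (negative if the sieve could not be allocated). */
double prime_fraction(long n) {
    assert(n >= 1);
    return (double)count_primes(n) / (double)n;
}
```

For n = 10, 100, and 1000 this gives 4, 25, and 168 primes, matching the 40%, 25%, and 16.8% above. For comparison, 1/log(1000), with the natural logarithm, is about 0.145, so at this scale the estimate is only roughly right; the theorem says the ratio of the two quantities approaches 1 as n grows.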

Order
A Mathematician's Apology
A Mathematician's Apology
with kickback
no kickback
Hardy was an unusual fellow. Toward the end of his life, he wrote an essay called A Mathematician's Apology in which he tried to explain why he had devoted his life to pure mathematics. I found it an extraordinarily compelling piece of writing. I first read it in my teens, at a time when I thought I might become a professional mathematician, and it's had a strong influence on my life. The passage that resonates most for me is this one:

A man who sets out to justify his existence and his activities has to distinguish two different questions. The first is whether the work which he does is worth doing; and the second is why he does it, whatever its value may be. The first question is often very difficult, and the answer very discouraging, but most people will find the second easy enough even then. Their answers, if they are honest, will usually take one or another of two forms . . . the first . . . is the only answer which we need consider seriously.

(1) 'I do what I do because it is the one and only thing I can do at all well. . . . I agree that it might be better to be a poet or a mathematician, but unfortunately I have no talents for such pursuits.'

I am not suggesting that this is a defence which can be made by most people, since most people can do nothing at all well. But it is impregnable when it can be made without absurdity. . . It is a tiny minority who can do anything really well, and the number of men who can do two things well is negligible. If a man has any genuine talent, he should be ready to make almost any sacrifice in order to cultivate it to the full.

And that, ultimately, is why I didn't become a mathematician. I don't have the talent for it. I have no doubt that I could have become a quite competent second-rate mathematician, with a secure appointment at some second-rate college, and a series of second-rate published papers. But as I entered my mid-twenties, it became clear that although I wouldn't ever be a first-rate mathematician, I could be a first-rate computer programmer and teacher of computer programming. I don't think the world is any worse off for the lack of my mediocre mathematical contributions. But by teaching I've been able to give entertainment and skill to a lot of people.

When I teach classes, I sometimes come back from the mid-class break and ask if there are any questions about anything at all. Not infrequently, some wag in the audience asks why the sky is blue, or what the meaning of life is. If you're going to do something as risky as asking for unconstrained questions, you need to be ready with answers. When people ask why the sky is blue, I reply "because it reflects the sea." And the first time I got the question about the meaning of life, I was glad that I had thought about this beforehand and so had an answer ready. "Find out what your work is," I said, "and then do it as well as you can." I am sure that this idea owes a lot to Hardy. I wouldn't want to say that's the meaning of life for everyone, but it seems to me to be a good answer, so if you are looking for a meaning of life, you might try that one and see how you like it.

(Incidentally, I'm not sure it makes sense to buy a copy of this book, since it's really just a long essay. My copy, which is the same as the one I've linked above, ekes it out to book length by setting it in a very large font with very large margins, and by prepending a fifty-page(!) introduction by C.P. Snow.)


[Other articles in category /math] permanent link

Sat, 28 Jan 2006

An unusually badly designed bit of software
I am inaugurating this new section of my blog, which will contain articles detailing things I have screwed up.

Back on 19 January, I decided that readers might find it convenient if, when I mentioned a book, there was a link to buy the book. I was planning to write a lot about what books I was reading, and perhaps if I was convincing enough about how interesting they were, people would want their own copies.

The obvious way to do this is just to embed the HTML for the book link directly into each entry in the appropriate place. But that is a pain in the butt, and if you want to change the format of the book link, there is no good way to do it. So I decided to write a Blosxom plugin module that would translate some sort of escape code into the appropriate HTML. The escape code would only need to contain one bit of information about the book, say its ISBN, and then the plugin could fetch the other information, such as the title and price, from a database.

The initial implementation allowed me to put <book>1558607013</book> tags into an entry, and the plugin would translate this to the appropriate HTML. (There's an example on the right.)

Order
Higher-Order Perl
Higher-Order Perl
with kickback
no kickback
The 1558607013 was the ISBN. The plugin would look up this key in a Berkeley DB database, where it would find the book title and Barnes and Noble image URL. Then it would replace the <book> element with the appropriate HTML. I did a really bad job with this plugin and had to rewrite it.

Since Berkeley DB only maps string keys to single string values, I had stored the title and image URL as a single string, with a colon character in between. That was my first dumb mistake, since book titles frequently include colons. I ran into this right away, with Voyages and Discoveries: Selections from Hakluyt's Principal Navigations.

This, however, was a minor error. I had made two major errors. One was that the <book>1558607013</book> tags were unintelligible. There was no way to look at one and know what book was being linked without consulting the database.

But even this wouldn't have been a disaster without the other big mistake, which was to use Berkeley DB. Berkeley DB is a great package. It provides fast keyed lookup even if you have millions of records. I don't have millions of records. I will never have millions of records. Right now, I have 15 records. In a year, I might have 200.

The price I paid for fast access to the millions of records I don't have is that the database is not a text file. If it were a text file, I could look up <book>1558607013</book> by using grep. Instead, I need a special tool to dump out the database in text form, and pipe the output through grep. I can't use my text editor to add a record to the database; I had to write a special tool to do that. If I use the wrong ISBN by mistake, I can't just correct it; I have to write a special tool to delete an item from the database and then I have to insert the new record.

When I decided to change the field separator from colon to \x22, I couldn't just M-x replace-string; I had to write a special tool. If I later decided to add another field to the database, I wouldn't be able to enter the new data by hand; I'd have to write a special tool.

On top of all that, for my database, Berkeley DB was probably slower than the flat text file would have been. The Berkeley DB file was 12,288 bytes long. It has an index, which Berkeley DB must consult first, before it can fetch the data. Loading the Berkeley DB module takes time too. The text file is 845 bytes long and can be read entirely into memory. Doing so requires only builtin functions and only a single trip to the disk.

I redid the plugin module to use a flat text file with tab-separated columns:

        HOP	1558607013	Higher-Order Perl	9072008
        DDI	068482471X	Darwin's Dangerous Idea	1363778
        Autobiog	0760768617	Franklin's Autobiography	9101737
        VoyDD	0486434915	The Voyages of Doctor Dolittle	7969205
        Brainstorms	0262540371	Brainstorms	1163594
        Liber Abaci	0387954198	Liber Abaci	6934973
        Perl Medic	0201795264	Perl Medic	7254439
        Perl Debugged	0201700549	Perl Debugged	3942025
        CLTL2	1555580416	Common Lisp: The Language	3851403
        Frege	0631194452	The Frege Reader	8619273
        Ingenious Franklin	0812210670	Ingenious Dr. Franklin	977000
The columns are a nickname ("HOP" for Higher-Order Perl, for example), the ISBN, the full title, and the image URL. The plugin will accept either <book>1558607013</book> or <book>HOP</book> to designate Higher-Order Perl. I only use the nicknames now, but I let it accept ISBNs for backward compatibility so I wouldn't have to go around changing all the <book> elements I had already done.
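Here is a minimal sketch of that lookup, written in Python rather than the Perl the plugin itself uses. The names load_books and SAMPLE are made up for illustration, and the fourth column is treated as an opaque string. The one design point it shows is that each record is filed under both its nickname and its ISBN, so either form of <book> element finds the same row.

```python
# A few rows copied from the listing above; the columns are separated by tabs.
SAMPLE = (
    "HOP\t1558607013\tHigher-Order Perl\t9072008\n"
    "Liber Abaci\t0387954198\tLiber Abaci\t6934973\n"
    "CLTL2\t1555580416\tCommon Lisp: The Language\t3851403\n"
)

def load_books(lines):
    """Index every record under both its nickname and its ISBN (hypothetical helper)."""
    index = {}
    for line in lines:
        line = line.strip()              # drop the newline and any indentation
        if not line:
            continue
        # Split on tabs only: nicknames and titles may contain spaces and colons.
        nickname, isbn, title, image = line.split("\t")
        record = {"nickname": nickname, "isbn": isbn, "title": title, "image": image}
        index[nickname] = record
        index[isbn] = record
    return index

books = load_books(SAMPLE.splitlines())
print(books["HOP"]["title"])           # Higher-Order Perl
print(books["1558607013"]["title"])    # Higher-Order Perl
print(books["CLTL2"]["title"])         # Common Lisp: The Language
```

Because the separator is a tab, a title with a colon in it comes through intact, and the whole file can still be searched with grep or fixed in a text editor.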

Now I'm going to go off and write "just use a text file, fool!" a hundred times.


[Other articles in category /oops] permanent link

Fri, 27 Jan 2006

Travels of Mirza Abu Taleb Khan
In a couple of recent posts, I talked about the lucky finds you can have when you browse at random in strange libraries. Sometimes the finds don't turn out so well.

I'm an employee of the University of Pennsylvania, and one of the best fringe benefits of the job is that I get unrestricted access to the library and generous borrowing privileges. A few weeks ago I was up there, and found my way somehow into the section with the travel books. I grabbed a bunch, one of which was the source for my discussion of the dot product in 1580. Another was Travels of Mirza Abu Taleb Khan, written around 1806, and translated into English and published in 1814.

Travels is the account of a Persian nobleman who fell upon hard times in India and decided to take a leave of absence and travel to Europe. His travels lasted from 1799 through August 1803, and when he got back to Calcutta, he wrote up an account of his journey for popular consumption.

Wow, what a find, I thought, when I discovered it in the library. How could such a book fail to be fascinating? But if you take that as a real question, not as a rhetorical one, an answer comes to mind immediately: Mirza Abu Taleb does not have very much to say!

A large portion of the book drops the names of the many people that Mirza Abu Taleb met with, had dinner with, went riding with, went drinking with, or attended a party at the house of. Opening the book at random, for example, I find:

The Duke of Leinster, the first of the nobles of this kingdom honoured me with an invitation; his house is the most superb of any in Dublin, and contains a very numerous and valuable collection of statues and paintings. His grace is distinguished for the dignity of his manners, and the urbanity of his disposition. He is blessed with several angelic daughters.

There you see how to use sixty-two words to communicate nothing. How fascinating it might have been to hear about the superbities of the Duke's house. How marvelous to have seen even one of the numerous and valuable statues. How delightful to meet one of his several angelic daughters. How unfortunate that Abu Taleb's powers of description have been exhausted and that we don't get to do any of those things. "Dude, I saw the awesomest house yesterday! I can't really describe it, but it was really really awesome!"

Here's another:

[In Paris] I also had the pleasure of again meeting my friend Colonel Wombell, from whom I experienced so much civility in Dublin. He was rejoiced to see me, and accompanied me to all the public places. From Mr. and Miss Ogilvy I received the most marked attention.

I could quote another fifty paragraphs like those, but I'll spare you.

Even when Abu Taleb has something to say, he usually doesn't say it:

I was much entertained by an exhibition of Horsemanship, by Mr. Astley and his company. They have an established house in London, but come over to Dublin for four or five months in every year, to gratify the Irish, by displaying their skill in this science, which far surpasses any thing I ever saw in India.

Oh boy! I can't wait to hear about the surpassing horsemanship. Did they do tricks? How many were in the company? Was it men only, or both men and women? Did they wear glittery costumes? What were the horses like? Was the exhibition indoors or out? Was the crowd pleased? Did anything go wrong?

I don't know. That's all there is about Mr. Astley and his company.

Almost the whole book is like this. Abu Taleb is simply not a good observer. Good writers in any language can make you feel that you were there at the same place and the same time, seeing what they saw and hearing what they heard. Abu Taleb doesn't understand that one good specific story is worth a pound of vague, obscure generalities. This defect spoils nearly every part of the book in one degree or another:

[The Irish] are not so intolerant as the English, neither have they austerity and bigotry of the Scotch. In bravery and determination, hospitality, and prodigality, freedom of speech and open-heartedness, they surpass the English and the Scotch, but are deficient in prudence and sound judgement: they are nevertheless witty, and quick of comprehension.

But every once in a while you come upon an anecdote or some other specific. I found the next passage interesting:

Thus my land lady and her children soon comprehended my broken English; and what I could not explain by language, they understood by signs. . . . When I was about to leave them, and proceed on my journey, many of my friends appeared much affected, and said: "With your little knowledge of the language, you will suffer much distress in England; for the people there will not give themselves any trouble to comprehend your meaning, or to make themselves useful to you." In fact, after I had resided for a whole year in England, and could speak the language a hundred times better than on my first arrival, I found much more difficulty in obtaining what I wanted, than I did in Ireland.

Aha, so that's what he meant by "quick of comprehension". Thanks, Mirza.

Here's another passage I liked:

In this country and all through Europe, but especially in France and in Italy, statues of stone and marble are held in high estimation, approaching to idolatry. Once in my presence, in London, a figure which had lost its head, arms, and legs, and of which, in short, nothing but the trunk remained, was sold for 40,000 rupees (£5000). It is really astonishing that people possessing so much knowledge and good sense, and who reproach the nobility of Hindoostan with wearing gold and silver ornaments like women, should be thus tempted by Satan to throw away their money upon useless blocks. There is a great variety of these figures, and they seem to have appropriate statues for every situation. . .

Oh no---he isn't going to stop there, is he? No! We're saved!

. . . thus, at the doors or gates, they have huge janitors; in the interior they have figures of women dancing with tambourines and other musical instruments; over the chimney-pieces they place some of the heathen deities of Greece; in the burying grounds they have the statues of the deceased; and in the gardens they put up devils, tigers, or wolves in pursuit of a fox, in hopes that animals, on beholding these figures will be frightened, and not come into the garden.

If more of the book were like that, it would be a treasure. But you have to wait a long time between such paragraphs.

There are plenty of good travel books in the world. Kon-Tiki, for example. In Kon-Tiki, Thor Heyerdahl takes you across the Pacific Ocean on a balsa wood raft. Every detail is there: how and why they built the raft, and the troubles they went to to get the balsa, and to build it, and to launch it. How it was steered, and where they kept the food and water. What happened to the logs as they got gradually more waterlogged and the incessant rubbing of the ropes wore them away. What they ate, and drank, and how they cooked and slept and shat. What happened in storms and calm. The fish that came to visit, and how every morning the first duty of the day's cook was to fry up the flying fish that had landed on the roof of the cabin in the night. Every page has some fascinating detail that you would not have been able to invent yourself, and that's what makes it worth reading, because what's the point of reading a book that you could have invented yourself?

Another similarly good travel book is Sir Richard Francis Burton's 1853 account of his pilgrimage to Mecca. Infidels were not allowed in the holy city of Mecca. Burton disguised himself as an Afghan and snuck in. I expect I'll have something to say about this book in a future article.



[Other articles in category /book] permanent link

Thu, 26 Jan 2006

The octopus and the creation of the cosmos
In an earlier post, I mentioned the lucky finds you sometimes make when you're wandering at random in a library. Here's another such. In 2001 I was in Boston with my wife, who was attending the United States Figure Skating Championships. Instead of attending the Junior Dance Compulsories, I went to the Boston Public Library, where I serendipitously unearthed the following treasure:

Although we have the source of all things from chaos, it is a chaos which is simply the wreck and ruin of an earlier world....The drama of creation, according to the Hawaiian account, is divided into a series of stages, and in the very first of these life springs from the shadowy abyss and dark night...At first the lowly zoophytes and corals come into being, and these are followed by worms and shellfish, each type being declared to conquer and destroy its predecessor, a struggle for existence in which the strongest survive....As type follows type, the accumulating slime of their decay raises land above the waters, in which, as spectator of all, swims the octopus, the lone survivor of an earlier world.

(Mythology of All Races, vol. ix ("Oceanic"), R.B. Dixon. Thanks to the wonders of the Internet, you can now read the complete text online.)

Everyone, it seems, recognizes the octopus as a weird alien, unique in our universe.


[Other articles in category /bio/octopus] permanent link

More irrational numbers
Gaal Yahas has written in with a delightfully simple proof that a particular number is irrational. Let x = log_2 3; that is, let x be the number such that 2^x = 3. If x is rational, then, since x > 0, we have 2^(a/b) = 3 and so 2^a = 3^b, where a and b are positive integers. But the left side is even and the right side is odd, so there are no such integers, and x must be irrational.

As long as I am on the subject, undergraduates are sometimes asked whether there are irrational numbers a and b such that a^b is rational. It's easy to prove that there are. First, consider a = b = √2. If √2^√2 is rational, then we are done. Otherwise, take a = √2^√2 and b = √2. Both are irrational, but a^b = (√2^√2)^√2 = √2^(√2·√2) = √2^2 = 2.

This is also a standard example of a non-constructive proof: it demonstrates conclusively that the numbers in question exist, but it does not tell you which of the two constructed pairs is actually the one that is wanted. Pinning down the real answer is tricky. The Gelfond-Schneider theorem establishes that it is in fact the second pair, as one would expect.


[Other articles in category /math] permanent link

"Farther" vs. "further"
People mostly use "farther" and "further" interchangeably. What's the difference?

I looked it up in the dictionary, and it turns out it's simple. "Farther" means "more far". "Further" means "more forward".

"Further" does often connote "farther", because something that is further out is usually farther away, and so in many cases the two are interchangeable. For example, "Hitherto shalt thou come, but no further" (Job 38:11.)

But now when I see people write things like "China Steps Further Back From Democracy" (The New York Times, 26 November 1995) or, even worse, "Big Pension Plans Fall Further Behind" (Washington Post, 7 June 2005) it freaks me out.

Google finds 3.2 million citations for "further back", and 9.5 million for "further behind", so common usage is strongly in favor of this. But a quick check of the OED does not reveal much historical confusion between these two. Of the citations there, I can only find one that rings my alarm bell. ("1821 J. BAILLIE Metr. Leg., Wallace lvi, In the further rear.")


[Other articles in category /lang] permanent link

The square root of 2 is irrational
I heard some story that the Pythagoreans tried to cover this up by drowning the guy who discovered it, but I don't know if it's true and probably nobody else does either.

The usual proof goes like this. Suppose that √2 is rational; then there are integers a and b with a / b = √2, where a / b is in lowest terms. Then a^2 / b^2 = 2, and a^2 = 2b^2. Since the right-hand side is even, so too must the left-hand side be, and since a^2 is even, a must also be even. Then a = 2k for some integer k, and we have 4k^2 = 2b^2, and so 2k^2 = b^2. But then since the left-hand side is even, so too must the right-hand side be, and since b^2 is even, b must also be even. But since a and b are both even, a / b was not in lowest terms, a contradiction. So no such a and b can exist, and √2 is irrational.

There are some subtle points that are glossed over here, but that's OK; the proof is correct.

A number of years ago, a different proof occurred to me. It goes like this:

Suppose that √2 is rational; then there are integers a and b with a / b = √2, where a / b is in lowest terms. Since a and b have no common factors, nor do a^2 and b^2, and a^2 / b^2 = 2 is also in lowest terms. Since the representation of rational numbers by fractions in lowest terms is unique, and a^2 / b^2 = 2/1, we have a^2 = 2. But there is no such integer a, a contradiction. So no such a and b can exist, and √2 is irrational.

This also glosses over some subtle points, but it also seems to be correct.

I've been pondering this off and on for several years now, and it seems to me that the second proof is simpler in some ways and more complex in others. These differences are all hidden in the subtle points I alluded to.

For example, consider the fact that both proofs should go through just as well for 3 as for 2. They do. And both should fail for 4, since √4 is rational. Where do these failures occur? The first proof concludes that since a^2 is even, a must be also. This is simple. And this is the step that fails if you replace 2 with 4: the corresponding deduction is that since a^2 is a multiple of 4, a must be also. This is false. Fine.

You would also like the proof to go through successfully for 12, because √12 is irrational. But instead it fails, because the crucial step is that since a^2 is divisible by 12, a must be also, and this step is false.

You can fix this, but you have to get tricky. To make it go through for 12, you have to say that a^2 is divisible by 3, and so a must be also. To do it in general for √n requires some fussing.
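To see concretely where the first proof's key step breaks, here is a small Python sketch. It searches for the smallest a whose square is a multiple of n even though a itself is not. The function name counterexample and the search bound of 1000 are arbitrary choices for illustration; a result of None only means nothing was found below the bound, although for prime n no such a exists at all.

```python
def counterexample(n, limit=1000):
    """Return the smallest a in 1..limit with n dividing a*a but not a, else None."""
    for a in range(1, limit + 1):
        if (a * a) % n == 0 and a % n != 0:
            return a
    return None

for n in (2, 3, 4, 12):
    print(n, counterexample(n))
# 2 None
# 3 None
# 4 2
# 12 6
```

For 4 the step fails at a = 2, and for 12 it fails at a = 6, which is why the first proof needs the detour through divisibility by 3.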

The second proof, however, works whenever it should and fails whenever it should fail. The failure for √4 is in the final step, and it is totally transparent: "we have a^2 = 4," it says, "but there is no such integer....oops, yes there is." And, unlike the first proof, it works just fine for 12, with no required fussery: "we have a^2 = 12. But there is no such integer, a contradiction."

The second proof depends on the (unproved) fact that lowest-term fractions are unique. This is actually a very strong theorem. It is true in the integers, but not in general domains. (More about this in the future, probably.) Is this a defect? I'm not sure. On the one hand, this could be seen as pulling the wool over the readers' eyes, or using a heavy theorem to prove a light one. On the other hand, this is a very interesting connection, and raises the question of whether the corresponding theorems are true in general domains. The first proof also does some wool-pulling, and it's rather more complicated-looking than the second. And whereas the first one appears simple, and is actually more complex than it seems, the point of complexity in the second proof is right out in the open, inviting question.

The really interesting thing here is that you always see the first proof quoted, never the second. When I first discovered the second proof I pulled a few books off the shelf at random to see how the proof went; it was invariably the first one. For a while I wondered if perhaps the second proof had some subtle mistake I was missing, but I'm pretty sure it doesn't.

[ Addendum 20070220: a later article discusses an awesome geometric proof by Tom M. Apostol. Check it out. ]


[Other articles in category /math] permanent link

Wed, 25 Jan 2006

Morphogenetic puzzles
In a recent post, I briefly discussed puzzling issues of morphogenesis: when a caterpillar pupates, how do its cells know how to reorganize into a butterfly? When the blastocyst grows inside a mammal, how do its cells know what shape to take? I said it was all a big mystery.

A reader, who goes by the name of Omar, wrote to remind me of the "Hox" (short for "homeobox") genes discussed by Richard Dawkins in The Ancestor's Tale. (No "buy this" link; I only do that for books I've actually read and recommend.) These genes are certainly part of the story, just not the part I was wondering about.

The Hox genes seem to be the master controls for notifying developing cells of their body locations. The proteins they manufacture bind with DNA and enable or disable other genes, which in turn manufacture proteins that enable still other genes, and so on. A mutation to the Hox genes, therefore, results in a major change to the animal's body plan. Inserting an additional copy of a Hox gene into an invertebrate can cause its offspring to have duplicated body segments; transposing the order of the genes can mix up the segments. One such mutation, occurring in fruit flies, is called antennapedia, and causes the flies' antennae to be replaced by fully-formed legs!

So it's clear that these genes play an important part in the overall body layout.

But the question I'm most interested in right now is how the small details are implemented. That's why I specifically brought up the example of a ring finger.

Or consider that part of the ring finger turns into a fingernail bed and the rest doesn't. The nail bed is distally located, but the most distal part of the finger nevertheless decides not to be a nail bed. And the ventral part of the finger at the same distance also decides not to be a nail bed.

Meanwhile, the ear is growing into a very complicated but specific shape with a helix and an antihelix and a tragus and an antitragus. How does that happen? How do the growing parts communicate between each other so as to produce that exact shape? (Sometimes, of course, they get confused; look up accessory tragus for example.)

In computer science there is a series of related problems called "firing squad problems". In the basic problem, you have a line of soldiers. You can communicate with the guy at one end, and other than that each soldier can only communicate with the two standing next to him. The idea is to give the soldiers a protocol that allows them to synchronize so that they all fire their guns simultaneously.

It seems to me that the embryonic cells have a much more difficult problem of the same type. Now you need the soldiers to get into an extremely elaborate formation, even though each soldier can only see and talk to the soldiers next to him.

Omar suggested that the Hox genes contain the answer to how the fetal cells "know" whether to be a finger and not a kneecap. But I think that's the wrong way to look at the problem, and one that glosses over the part I find so interesting. No cell "becomes a finger". There is no such thing as a "finger cell". Some cells turn into hair follicles and some turn into bone and some turn into nail bed and some turn into nerves and some turn into oil glands and some turn into fat, and yet you somehow end up with all the cells in the right places turning into the right things so that you have a finger! And the finger has hair on the first knuckle but not the second. How do the cells know which knuckle they are part of? At the end of the finger, the oil glands are in the grooves and not on the ridges. How do the cells know whether they will be at the ridges or the grooves? And the fat pad is on the underside of the distal knuckle and not all spread around. How do the cells know that they are in the middle of the ventral surface of the distal knuckle, but not too close to the surface?

Somehow the fat pad arises in just the right place, and decides to stop growing when it gets big enough. The hair cells arise only on the dorsal side and the oil glands only on the ventral side.

How do they know all these things? How does the cell decide that it's in the right place to differentiate into an oil gland cell? How does the skin decide to grow in that funny pattern of ridges and grooves? And having decided that, how do the skin cells know whether they're positioned at the appropriate place for a ridge or a groove? Is there a master control that tells all the cells everything at once? I bet not; I imagine that the cells conduct chemical arguments with their neighbors about who will do which job.

One example of this kind of communication is phyllotaxis, the way plants decide how to distribute their leaves around the stem. Under certain simple assumptions, there is an optimal way to do this: you want to go around the stem, putting each leaf about 360°/φ farther than the previous one, where φ is ½(1+√5). (More about this in some future post.) And in fact many plants do grow in just this pattern. How does the plant do such an elaborate calculation? It turns out to be simple: Suppose leafing is controlled by the buildup of some chemical, and a leaf comes out when the chemical concentration is high. But when a leaf comes out, it also depletes the concentration of the chemical in its vicinity, so that the next leaf is more likely to come out somewhere else. Then the plant does in fact get leaves with very close to optimal placement. Each leaf, when it comes out, warns the nearby cells not to turn into a leaf themselves---not until the rest of the stem is full, anyway. I imagine that the shape of the ear is constructed through a more complicated control system of the same sort.
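As a rough illustration of the target pattern only, here is a Python sketch that places each leaf 360°/φ around the stem from the previous one and reports the narrowest gap between neighboring leaves. It does not model the chemical mechanism at all; the names leaf_angles and smallest_gap are invented for the example.

```python
import math

PHI = (1 + math.sqrt(5)) / 2     # the golden ratio, ½(1+√5)
STEP = 360 / PHI                 # degrees from one leaf to the next, about 222.49

def leaf_angles(count):
    """Angular positions, in degrees, of the first `count` leaves around the stem."""
    return [(k * STEP) % 360 for k in range(count)]

def smallest_gap(angles):
    """Narrowest angular gap between neighboring leaves, including the wraparound."""
    ordered = sorted(angles)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    gaps.append(360 - ordered[-1] + ordered[0])
    return min(gaps)

print(round(STEP, 2))            # 222.49
for n in (5, 8, 13):
    print(n, round(smallest_gap(leaf_angles(n)), 2))
```

Going 222.49° one way around is the same as going about 137.5° the other way, which is the figure usually quoted for this angle.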


[Other articles in category /bio] permanent link

Red Flags world tour: New York City
My wife came up with a brilliant plan to help me make regular progress on my current book. The idea of the book is that I show how to take typical programs and repair and refurbish them. The result usually has between one-third and one-half less code, is usually a little faster, and sometimes has fewer bugs. Lorrie's idea was that I should schedule a series of talks for Perl Mongers groups. Before each talk, I would solicit groups to send me code to review; then I'd write up and deliver the talk, and then afterward I could turn the talk notes into a book chapter. The talks provide built-in, inflexible deadlines, and I love giving talks, so the plan will help keep me happy while writing the book.

The first of these talks was on Monday, in my home town of New York.

'It makes no odds whether a man has a thousand pound, or nothing, there. Particular in New York, I'm told, where Ned landed.'

'New York, was it?' asked Martin, thoughtfully.

'Yes,' said Bill. 'New York. I know that, because he sent word home that it brought Old York to his mind, quite wivid, in consequence of being so exactly unlike it in every respect.'

(Charles Dickens, Martin Chuzzlewit, about which more in some future entry, perhaps.)

The New Yorkers gave me a wonderful welcome, and generously paid my expenses afterward. The only major hitch was that I accidentally wrote my talk about a submission that had come from London. Oops! I must be more careful in the future.

Each time I look at a new program it teaches me something new. Some people, perhaps, seem to be able to reason from general principles to specifics: if you tell them that common code in both branches of a conditional can be factored out, they will immediately see what you mean. Or so they would have you believe; I have my doubts. Anyway, whether they are telling the truth or not, I have almost none of that ability myself. I frequently tell people that I have very little capacity for abstract thought. They sometimes think I'm joking, but I'm not. What I mean is that I can't identify, remember, or understand general principles except as generalizations of specific examples. Whenever I want to study some problem, my approach is always to select a few typical-seeming examples and study them minutely to try to understand what they might have in common. Some people seem to be able to go from abstract properties to conclusions; I can only go from examples.

So my approach to understanding how to improve programs is to collect a bunch of programs, repair them, take notes, and see what sorts of repairs come up frequently, what techniques seem to apply to multiple programs, what techniques work on one program and fail on another, and why, and so on. Probably someone smarter than me would come up with a brilliant general theory about what makes bad programs bad, but that's not how my brain works. My brain is good at coming up with a body of technique. It's a limitation, but it's not all bad.


The goal of generalization had become so fashionable that a generation of mathematicians had become unable to relish beauty in the particular, to enjoy the challenge of solving quantitative problems, or to appreciate the value of technique.

(Ronald L. Graham, Donald E. Knuth, Oren Patashnik, Concrete Mathematics.)

So anyway, here's something I learned from this program. I have this idea now that you should generally avoid the Perl . (string concatenation) operator, because there's almost always a better alternative. The typical use of the . operator looks like this:

          $html = "<a href='".$url."'>".$hot_text."</a>";
It's hard to see here what is code and what is data. You pretty much have to run the Perl lexer algorithm in your head. But Perl has another notation for concatenating strings: "$a$b" concatenates strings $a and $b. If you use this interpolation notation to rewrite the example above, it gets much easier to read:

          $html = "<a href='$url'>$hot_text</a>";
So when I do these classes, I always suggest that whenever you're going to use the . operator, you try writing it as an interpolation too and see which you like better.

This frequently brings on a question about what to do in cases like this:

          $tmpfilealrt = "alert_$daynum" . "_$day" . "_$mon.log" ;
Here you can't eliminate the . operators in this way, because you would get:
          $tmpfilealrt = "alert_$daynum_$day_$mon.log" ;
This fails because it wants to interpolate $daynum_ and $day_, rather than $daynum and $day. Perl has an escape hatch for this situation:
          $tmpfilealrt = "alert_${daynum}_${day}_$mon.log" ;
But it's not clear to me that that is an improvement on the version that used the . operator. The punctuation is only slightly reduced, and you've used an obscure notation that a lot of people won't recognize and that is visually similar to, but entirely unconnected with, hash notation.

Anyway, when this question would come up, I'd discuss it, and say that yeah, in that case it didn't seem to me that the . operator was inferior to the alternatives. But since my review of the program I talked about in New York on Monday, I know a better alternative. The author of that program wrote it like this:

          $tmpfilealrt = "alert_$daynum\_$day\_$mon.log" ;
When I saw it, I said "Duh! Why didn't I think of that?"


[Other articles in category /prs] permanent link

Vitamin A poisoning

In an earlier post I remarked that "The liver of arctic animals . . . has a toxically high concentration of vitamin D". Dennis Taylor has pointed out that this is mistaken; I meant to say "vitamin A". Thanks, Dennis.

B and C vitamins are not toxic in large doses; they are water-soluble so that excess quantities are easily excreted. Vitamins A and D are not water-soluble, so excess quantities are harder to get rid of. Apparently, though, the liver is capable of storing very large quantities of vitamin D, so that vitamin D poisoning is extremely rare.

The only cases of vitamin A poisoning I've heard of concerned either people who ate the livers of polar bears, walruses, sled dogs, or other arctic animals, or else health food nuts who consumed enormous quantities of pure vitamin A in a misguided effort to prove how healthy it is. In On Food and Cooking, Harold McGee writes:

In the space of 10 days in February of 1974, an English health food enthusiast named Basil Brown took about 10,000 times the recommended requirement of vitamin A, and drank about 10 gallons of carrot juice, whose pigment is a precursor of vitamin A. At the end of those ten days, he was dead of severe liver damage. His skin was bright yellow.

(First edition, p. 536.)

There was a period in my life in which I was eating very large quantities of carrots. (Not for any policy reason; just because I like carrots.) I started to worry that I might hurt myself, so I did a little research. The carrots themselves don't contain vitamin A; they contain beta-carotene, which the body converts internally to vitamin A. The beta-carotene itself is harmless, and excess is easily eliminated. So eat all the carrots you want! You might turn orange, but it probably won't kill you.


[Other articles in category /bio] permanent link

Tue, 24 Jan 2006

Butterflies
Yesterday I visited the American Museum of Natural History in New York City, for the first time in many years. They have a special exhibit of butterflies. They get pupae shipped in from farms, and pin the pupae to wooden racks; when the adults emerge, they get to flutter around in a heated room that is furnished with plants, ponds of nectar, and cut fruit.

The really interesting thing I learned was that chrysalises are not featureless lumps. You can see something of the shape of the animal in them. (See, for example, this Wikipedia illustration.) The caterpillar has an exoskeleton, which it molts several times as it grows. When time comes to pupate, the chrysalis is in fact the final exoskeleton, part of the animal itself. A cocoon is different: it is a case made of silk or leaves that is not part of the animal; the animal builds it and lives inside. When you think of a featureless round lump, you're thinking of a cocoon.

Until recently, I had the idea that the larva's legs get longer, wings sprout, and so forth, but it's not like that at all. Instead, inside the chrysalis, almost the entire animal breaks down into a liquid! The metamorphosis then reorganizes this soup into an adult. I asked the explainer at the Museum if the individual cells retained their identities, or if they were broken down into component chemicals. She didn't know, unfortunately. I hope to find this out in coming weeks.

How does the animal reorganize itself during metamorphosis? How does its body know what new shape to grow into? It's all a big mystery. It's nice that we still have big mysteries. Not all mysteries have survived the scientific revolution. What makes the rain fall and the lightning strike? Solved problems. What happens to the food we eat, and why do we breathe? Well-understood. How does the butterfly reorganize itself from caterpillar soup? It's a big puzzle.

A related puzzle is how a single cell turns into a human baby during gestation. For a while, the thing doubles, then doubles again, and again, becoming roughly spherical, as you'd expect. But then stuff starts to happen: it dimples, and folds over; three layers form, a miracle occurs, and eventually you get a small but perfectly-formed human being. How do the cells in the fingers decide to turn into fingers? How do the cells in the fourth finger know they're one finger from one side of the hand and three fingers from the other side? Maybe the formation of the adult insect inside the chrysalis uses a similar mechanism. Or maybe it's completely different. Both possibilities are mind-boggling.

This is nowhere near being the biggest pending mystery; I think we at least have some idea of where to start looking for the answer. Contrast this with the question of how it is we are conscious, where nobody even has a good idea of what the question is.

Other caterpillar news: chrysalides are so named because they often have a bright golden sheen, or golden features. (Greek "khrusos" is "gold".) The Wikipedia picture of this is excellent too. The "gold" is a yellow pigmented area covered with a shiny coating. The explainer said that some people speculate that it helps break up the outlines of the pupa and camouflage it.

I asked if the chrysalis of the viceroy butterfly, which, as an adult, resembles the poisonous monarch butterfly, also resembled the monarch's chrysalis. The answer: no, they look completely different. Isn't that interesting? You'd think that the pupa would get at least as much benefit from mimicry as the adult. One possible explanation why not: most pupae don't make it to adulthood anyway, so the marginal benefit to the species from mimicry in the pupal stage is small compared with the benefit in the adult stage. Another: the pupa's main defense, which is not available to the adult, is to be difficult to see; beyond that it doesn't matter much what happens if it is seen. Which is correct? I don't know.

For a long time folks thought that the monarch was poisonous and the viceroy was not, and that the viceroy's monarch-like coloring tricked predators into avoiding it unnecessarily. It's now believed that both species are poisonous and bad-tasting, and that their similar coloring therefore protects both species. A predator who eats one will avoid both in the future. The former kind of mimicry is called Batesian; the latter, Müllerian.

The monarch butterfly does not manufacture its toxic and bad-tasting chemicals itself. It is poisonous because it ingests poisonous chemicals in its food, which I think is milkweed plants. Plant chemistry is very weird. Think of all the poisonous foods you've ever heard of. Very few of them are animals. (The only poisonous meat I can think of offhand is the liver of arctic animals, which has a toxically high concentration of vitamin D.) If you're stuck on a desert island, you're a lot safer eating strange animals than you are eating strange berries.


[Other articles in category /bio] permanent link

Franklin and Daylight Saving Time
You often hear it asserted that Benjamin Franklin was the inventor of daylight saving time. But it's really not true.

The essential feature of DST is that there is an official change to the civil calendar to move back all the real times by one hour. Events that were scheduled to occur at noon now occur at 11 AM, because all the clocks say noon when it's really 11 AM.

The proposal by Franklin that's cited as evidence that he invented DST doesn't propose any such thing. It's a letter to the editors of The Journal of Paris, originally sent in 1784. There are two things you should know about this letter: First, it's obviously a joke. And second, what it actually proposes is just that people should get up earlier!

I went home, and to bed, three or four hours after midnight. . . . An accidental sudden noise waked me about six in the morning, when I was surprised to find my room filled with light. . . I got up and looked out to see what might be the occasion of it, when I saw the sun just rising above the horizon, from whence he poured his rays plentifully into my chamber. . .

. . . still thinking it something extraordinary that the sun should rise so early, I looked into the almanac, where I found it to be the hour given for his rising on that day. . . . Your readers, who with me have never seen any signs of sunshine before noon, and seldom regard the astronomical part of the almanac, will be as much astonished as I was, when they hear of his rising so early; and especially when I assure them, that he gives light as soon as he rises. I am convinced of this. I am certain of my fact. One cannot be more certain of any fact. I saw it with my own eyes. And, having repeated this observation the three following mornings, I found always precisely the same result.

I considered that, if I had not been awakened so early in the morning, I should have slept six hours longer by the light of the sun, and in exchange have lived six hours the following night by candle-light; and, the latter being a much more expensive light than the former, my love of economy induced me to muster up what little arithmetic I was master of, and to make some calculations. . .

Franklin then follows with a calculation of the number of candles that would be saved if everyone in Paris got up at six in the morning instead of at noon, and how much money would be saved thereby. He then proposes four measures to encourage this: that windows be taxed if they have shutters; that "guards be placed in the shops of the wax and tallow chandlers, and no family be permitted to be supplied with more than one pound of candles per week"; that travelling by coach after sundown be forbidden; and that church bells be rung and cannon fired in the street every day at dawn.

Franklin finishes by offering his brilliant insight to the world free of charge or reward:

I expect only to have the honour of it. And yet I know there are little, envious minds, who will, as usual, deny me this and say, that my invention was known to the ancients, and perhaps they may bring passages out of the old books in proof of it. I will not dispute with these people, that the ancients knew not the sun would rise at certain hours; they possibly had, as we have, almanacs that predicted it; but it does not follow thence, that they knew he gave light as soon as he rose. This is what I claim as my discovery.

As usual, the complete text is available online.

OK, I'm not done yet. I think the story of how I happened to find this out might be instructive.

I used to live at 9th and Pine streets, across from Pennsylvania Hospital. (It's the oldest hospital in the U.S.) Sometimes I would get tired of working at home and would go across the street to the hospital to read or think. Hospitals in general are good for that: they are well-equipped with lounges, waiting rooms, comfortable chairs, sofas, coffee carts, cafeterias, and bathrooms. They are open around the clock. The staff do not check at the door to make sure that you actually have business there. Most of the people who work in the hospital are too busy to notice if you have been hanging around for hours on end, and if they do notice they will not think it is unusual; people do that all the time. A hospital is a great place to work unmolested.

Pennsylvania Hospital is an unusually pleasant hospital. The original building is still standing, and you can go see the cornerstone that was laid in 1755 by Franklin himself. It has a beautiful flower garden, with azaleas and wisteria, and a medicinal herb garden. Inside, the building is decorated with exhibits of art and urban archaeology, including a fire engine that the hospital acquired in 1780, and a massive painting of Christ healing the sick, originally painted by Benjamin West so that the hospital could raise funds by charging people a fee to come look at it. You can visit the 19th-century surgical amphitheatre, with its observation gallery. Even the food in the cafeteria is way above average. (I realize that that is not saying much, since it is, after all, a hospital cafeteria. But it was sufficiently palatable to induce me to eat lunch there from time to time.)

Having found so many reasons to like Pennsylvania Hospital, I went to visit their web site to see what else I could find out. I discovered that the hospital's clinical library, adjacent to the surgical amphitheatre, was open to the public. So I went to visit a few times and browsed the stacks.

Mostly, as you would expect, they had a lot of medical texts. But on one of these visits I happened to notice a copy of Ingenious Dr. Franklin: Selected Scientific Letters of Benjamin Franklin on the shelf. This caught my interest, so I sat down with it. It contained all sorts of good stuff, including Franklin's letter on "Daylight Saving". Here is the table of contents:

Preface
The Ingenious Dr. Franklin
Daylight Saving
Treatment for Gout
Cold Air Bath
Electrical Treatment for Paralysis
Lead Poisoning
Rules of Health and Long Life
The Art of Procuring Pleasant Dreams
Learning to Swim
On Swimming
Choosing Eye-Glasses
Bifocals
Lightning Rods
Advantage of Pointed Conductors
Pennsylvanian Fireplaces
Slaughtering by Electricity
Canal Transportation
Indian Corn
The Armonica
First Hydrogen Balloon
A Hot-Air Balloon
First Aerial Voyage by Man
Second Aerial Voyage by Man
A Prophecy on Aerial Navigation
Magic Squares
Early Electrical Experiments
Electrical Experiments
The Kite
The Course and Effect of Lightning
Character of Clouds
Musical Sounds
Locating the Gulf Stream
Charting the Gulf Stream
Depth of Water and Speed of Boats
Distillation of Salt Water
Behavior of Oil on Water
Earliest Account of Marsh Gas
Smallpox and Cancer
Restoration of Life by Sun Rays
Cause of Colds
Definition of a Cold
Heat and Cold
Cold by Evaporation
On Springs
Tides and Rivers
Direction of Rivers
Salt and Salt Water
Origin of Northeast Storms
Effect of Oil on Water
Spouts and Whirlwinds
Sun Spots
Conductors and Non-Conductors
Queries on Electricity
Magnetism and the Theory of the Earth
Nature of Lightning
Sound
Prehistoric Animals of the Ohio
Toads Found in Stone
Checklist of Letters and Papers
List of Correspondents
List of a Few Additional Letters

I'm sure that anyone who bothers to read my blog would find at least some of those items appealing. I certainly did.

Anyway, the moral of the story, as I see it, is: If you make your way into strange libraries and browse through the stacks, sometimes you find some good stuff, so go do that once in a while.


[Other articles in category /calendar] permanent link

Mon, 23 Jan 2006

The Bowdlerization of Dr. Dolittle

In 1920 Hugh Lofting wrote and illustrated The Story of Doctor Dolittle, an account of a small-town English doctor around 1840 who learns to speak the languages of animals and becomes the most successful veterinarian the world has ever seen. The book was a tremendous success, and spawned thirteen sequels, two posthumously. The 1922 sequel, The Voyages of Doctor Dolittle, won the prestigious Newbery award. The books have been reprinted many times, and the first two are now in the public domain in the USA, barring any further meddling by Congress with the copyright statute. The Voyages of Doctor Dolittle was one of my favorite books as a child, and I know it by heart. I returned the original 1922 copy that I had to my grandmother shortly before she died, and replaced it with a 1988 reprinting, the "Dell Centenary Edition". On reading the new copy, I discovered that some changes had been made to the text—I had heard that a recent edition of the books had attempted to remove racist references from them, and I discovered that my new 1988 copy was indeed this edition.

The 1988 reprinting contains an afterword by Christopher Lofting, the son of Hugh Lofting, which explains why the changes were made:

When it was decided to reissue the Doctor Dolittle books, we were faced with a challenging opportunity and decision. In some of the books there were certain incidents depicted that, in light of today's sensitivities, were considered by some to be disrespectful to ethnic minorities and, therefore, perhaps inappropriate for today's young reader. In these centenary editions, this issue is addressed.

. . . After much soul-searching the consensus was that changes should be made. The deciding factor was the strong belief that the author himself would have immediately approved of making the alterations. Hugh Lofting would have been appalled at the suggestion that any part of his work could give offense and would have been the first to have made the changes himself. In any case, the alterations are minor enough not to interfere with the style and spirit of the original.

This note will summarize some of the changes to The Voyages of Doctor Dolittle. I have not examined the text exhaustively. I worked from memory, reading the Centenary Edition, and when I thought I noticed a change, I crosschecked the text against the Project Gutenberg version of the original text. So this does not purport to be a complete listing of all the changes that were made. But I do think it is comprehensive enough to give a sense of what was changed.

Many of the changes concern Prince Bumpo, a character who first appeared in The Story of Doctor Dolittle. Bumpo is a black African prince, who, at the beginning of Voyages, is in England, attending school at Oxford. Bumpo is a highly sympathetic character, but also a comic one. In Voyages his speech is sprinkled with inappropriate "Oxford" words: he refers to "the college quadrilateral", and later says "I feel I am about to weep from sediment", for example. Studying algebra makes his head hurt, but he says "I think Cicero's fine—so simultaneous. By the way, they tell me his son is rowing for our college next year—charming fellow." None of this humor at Bumpo's expense has been removed from the Centenary Edition.

Bumpo's first appearance in the book, however, has been substantially cut:

The Doctor had no sooner gone below to stow away his note-books than another visitor appeared upon the gang-plank. This was a most extraordinary-looking black man. The only other negroes I had seen had been in circuses, where they wore feathers and bone necklaces and things like that. But this one was dressed in a fashionable frock coat with an enormous bright red cravat. On his head was a straw hat with a gay band; and over this he held a large green umbrella. He was very smart in every respect except his feet. He wore no shoes or socks.

In the revised edition, this is abridged to:

The Doctor had no sooner gone below to stow away his note-books than another visitor appeared upon the gang-plank. This was a black man, very fashionably dressed. (p. 128)

I think it's interesting that they excised the part about Bumpo being barefooted, because the explanation of his now unmentioned barefootedness still appears on the following page. (The shoes hurt his feet, and he threw them over the wall of "the college quadrilateral" earlier that morning.) Bumpo's feet make another appearance later on:

I very soon grew to be quite fond of our funny black friend Bumpo, with his grand way of speaking and his enormous feet which some one was always stepping on or falling over.

The only change to this in the revised version is the omission of the word 'black'. (p. 139)

This is typical. Most of the changes are excisions of rather ordinary references to the skin color of the characters. For example, the original:

It is quite possible we shall be the first white men to land there. But I daresay we shall have some difficulty in finding it first."

The bowdlerized version omits 'white men'. (p. 120)

Another typical cut:

"Great Red-Skin," he said in the fierce screams and short grunts that the big birds use, "never have I been so glad in all my life as I am to-day to find you still alive."

In a flash Long Arrow's stony face lit up with a smile of understanding; and back came the answer in eagle-tongue.

"Mighty White Man, I owe my life to you. For the remainder of my days I am your servant to command."

(Long Arrow has been buried alive for several months in a cave.) The revised edition replaces "Great Red-Skin" with "Great Long Arrow", and "Mighty White Man" with "Mighty Friend". (p. 223)

Another, larger change of this type, where apparently value-neutral references to skin color have been excised, is in the poem "The Song of the Terrible Three" at the end of part V, chapter 5. The complete poem is:

THE SONG OF THE TERRIBLE THREE

Oh hear ye the Song of the Terrible Three
And the fight that they fought by the edge of the sea.
Down from the mountains, the rocks and the crags,
Swarming like wasps, came the Bag-jagderags.

Surrounding our village, our walls they broke down.
Oh, sad was the plight of our men and our town!
But Heaven determined our land to set free
And sent us the help of the Terrible Three.

One was a Black—he was dark as the night;
One was a Red-skin, a mountain of height;
But the chief was a White Man, round like a bee;
And all in a row stood the Terrible Three.

Shoulder to shoulder, they hammered and hit.
Like demons of fury they kicked and they bit.
Like a wall of destruction they stood in a row,
Flattening enemies, six at a blow.

Oh, strong was the Red-skin fierce was the Black.
Bag-jagderags trembled and tried to turn back.
But 'twas of the White Man they shouted, "Beware!
He throws men in handfuls, straight up in the air!"

Long shall they frighten bad children at night
With tales of the Red and the Black and the White.
And long shall we sing of the Terrible Three
And the fight that they fought by the edge of the sea.

The ten lines in boldface have been excised in the revised version. Also in this vicinity, the phrase "the strength and weight of those three men of different lands and colors" has been changed to omit "and colors". (pp. 242-243)

Here's an interesting change:

Long Arrow said they were apologizing and trying to tell the Doctor how sorry they were that they had seemed unfriendly to him at the beach. They had never seen a white man before and had really been afraid of him—especially when they saw him conversing with the porpoises. They had thought he was the Devil, they said.

The revised edition changes 'a white man' to 'a man like him' (which seems rather vague) and makes 'devil' lower-case.

In some cases the changes seem completely bizarre. When I first heard that the books had been purged of racism I immediately thought of this passage, in which the protagonists discover that a sailor has stowed away on their boat and eaten all their salt beef (p. 142):

"I don't know what the mischief we're going to do now," I heard her whisper to Bumpo. "We've no money to buy any more; and that salt beef was the most important part of the stores."

"Would it not be good political economy," Bumpo whispered back, "if we salted the able seaman and ate him instead? I should judge that he would weigh more than a hundred and twenty pounds."

"How often must I tell you that we are not in Jolliginki," snapped Polynesia. "Those things are not done on white men's ships—Still," she murmured after a moment's thought, "it's an awfully bright idea. I don't suppose anybody saw him come on to the ship—Oh, but Heavens! we haven't got enough salt. Besides, he'd be sure to taste of tobacco."

I was expecting major changes to this passage, or its complete removal. I would never have guessed the changes that were actually made. Here is the revised version of the passage, with the changed part marked in boldface:

"I don't know what the mischief we're going to do now," I heard her whisper to Bumpo. "We've no money to buy any more; and that salt beef was the most important part of the stores."

"Would it not be good political economy," Bumpo whispered back, "if we salted the able seaman and ate him instead? I should judge that he would weigh more than a hundred and twenty pounds."

"Don't be silly," snapped Polynesia. "Those things are not done anymore.—Still," she murmured after a moment's thought, "it's an awfully bright idea. I don't suppose anybody saw him come on to the ship—Oh, but Heavens! we haven't got enough salt. Besides, he'd be sure to taste of tobacco."

The reference to 'white men' has been removed, but the rest of the passage, which I would consider to be among the most potentially offensive of the entire book, with its association of Bumpo with cannibalism, is otherwise unchanged. I was amazed. It is interesting to notice that the references to cannibalism have been excised from a passage on page 30:

"There were great doings in Jolliginki when he left. He was scared to death to come. He was the first man from that country to go abroad. He thought he was going to be eaten by white cannibals or something.

The revised edition cuts the sentence about white cannibals. The rest of the paragraph continues:

"You know what those niggers are—that ignorant! Well!—But his father made him come. He said that all the black kings were sending their sons to Oxford now. It was the fashion, and he would have to go. Bumpo wanted to bring his six wives with him. But the king wouldn't let him do that either. Poor Bumpo went off in tears—and everybody in the palace was crying too. You never heard such a hullabaloo."

The revised version reads:

"But his father made him come. He said that all the African kings were sending their sons to Oxford now. It was the fashion, and he would have to go. Poor Bumpo went off in tears—and everybody in the palace was crying too. You never heard such a hullabaloo."

The six paragraphs that follow this, which refer to the Sleeping Beauty subplot from the previous book, The Story of Doctor Dolittle, have been excised. (More about this later.)

There are some apparently trivial changes:

"Listen," said Polynesia, "I've been breaking my head trying to think up some way we can get money to buy those stores with; and at last I've got it."

"The money?" said Bumpo.

"No, stupid. The idea—to make the money with."

The revised edition omits 'stupid'. (p. 155) On page 230:

"Poor perishing heathens!" muttered Bumpo. "No wonder the old chief died of cold!"

becomes

"No wonder the old chief died of cold!" muttered Bumpo.

I gather from other people's remarks that the changes to The Story of Doctor Dolittle were much more extensive. In Story (in which Bumpo first appears) there is a subplot that concerns Bumpo wanting to be made into a white prince. The doctor agrees to do this in return for help escaping from jail.

When I found out this had been excised, I thought it was unfortunate. It seems to me that it was easy to view the original plot as a commentary on the cultural appropriation and racism that accompany colonialism. (Bumpo wants to be a white prince because he has become obsessed with European fairy tales, Sleeping Beauty in particular.) Perhaps had the book been left intact it might have sparked discussion of these issues. I'm told that this subplot was replaced with one in which Bumpo wants the Doctor to turn him into a lion.


[Other articles in category /book] permanent link

Fri, 20 Jan 2006

Franklin is indeed 300 years old
I can now happily report that my determination that Benjamin Franklin is only 299 years old this year was mistaken. To my relief, Franklin is really 300 years old after all.

After hearing an alternative analysis from Corprew Reed, I double-checked with Daniel K. Richter, a Professor of History at the University of Pennsylvania, and director of the new McNeil Center for Early American Studies.

Richter confirms Reed's analysis: By the 18th century, nearly everyone was reckoning years to start on 1 January except certain official legal documents. The official change of New Year's day was only to bring the legal documents into conformance with what everyone was already doing. So when Franklin's birthdate is reported as 6 January 1706, it means 1706 according to modern reckoning (that is, January 300 years ago) and not 1706 in the "official" reckoning (which would have been only 299 years ago).
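
The reckoning can be made concrete with a little arithmetic. Here is a minimal Python sketch, assuming (per the Calendar Act quoted in my earlier post) that before the 1752 reform the legal year in England and its colonies began on 25 March, while common usage began it on 1 January. The function name legal_year is my own invention for illustration, not anything from a standard library.

```python
def legal_year(common_year: int, month: int, day: int) -> int:
    """Map a date given in common usage (year starts 1 January) to the
    English legal year (year starts 25 March).

    Assumption: only meaningful before the 1752 reform, so later
    years are rejected rather than silently mishandled.
    """
    if common_year >= 1752:
        raise ValueError("legal and common years coincide from 1752 on")
    # Dates from 1 January through 24 March still belonged to the
    # previous legal year.
    if (month, day) < (3, 25):
        return common_year - 1
    return common_year


# Franklin's birthdate as usually reported: 6 January 1706.
birth_common = 1706
birth_legal = legal_year(birth_common, 1, 6)  # legal year 1705

# Age in 2006 is counted from the common-usage year.
age_in_2006 = 2006 - birth_common  # 300
```

So the same winter day is 1706 in common usage and 1705 in the legal reckoning, and counting from the common-usage year gives 300.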

Deke Kassabian also wrote in with a helpful reference, referring me to an article that appeared Wednesday in Slate. The relevant part says:

. . . according to documents from Boston's city registrar, he actually came into the world on the old-style Jan. 6, 1705. So, this year's tricentennial is right on time.

So the matter is cleared up, and in the best possible way. Many thanks to Deke, Corprew, and Professor Richter.


[Other articles in category /calendar] permanent link

Thu, 19 Jan 2006

Franklin is probably 300 years old after all
In a recent post, I surmised that Benjamin Franklin is only 299 years old this year, not 300, because of rejiggering of the start of the calendar year in England and its colonies in 1751/1752.

However, Corprew Reed writes to suggest that I am mistaken. Reed points out that although the legal start of the year prior to 1752 was 25 March, the common usage was to cite 1 January as the start of the year. The British Calendar Act of 1751 even says as much:

WHEREAS the legal Supputation of the Year . . . according to which the Year beginneth on the 25th Day of March, hath been found by Experience to be attended with divers Inconveniencies, . . . as it differs . . . from the common Usage throughout the whole Kingdom. . .

So Reed suggests that when Franklin (and others) report his birthdate as being 6 January 1706, they are referring to "common usage", the winter of the official, legal year 1705, and thus that Franklin really was born exactly 300 years ago as of Tuesday.

If so, this would be a great relief to me. It was really bothering me that everyone might be celebrating Franklin's 300th birthday a year early without realizing it.

I'm going to try to see who here at