The Universe of Discourse

Tue, 03 Jan 2012

Eta-reduction in Haskell and English
The other day Iris and I were putting together a model, and she asked what a certain small green part was for. I said "It's a thing for connecting a thing to another thing."

Iris objected that this was a completely unhelpful explanation, but I disagreed. I would have agreed that it was an excessively verbose explanation, but she didn't argue that point.

Later, it occurred to me that Haskell has a syntax for eliding unnecessary variables in cases like this. In Haskell, one can abbreviate the expression

        λx → λy → x + y

to just (+). (Perl users may find it helpful to know that the Perl equivalent of the expression above is sub { my ($x) = @_; return sub { my ($y) = @_; return $x +$y } }.) This is an example of a general transformation called η-reduction. In general, for any function f, λxf x is a function that takes an argument x and returns f x. But that's exactly what f does. So we can replace the longer version with the shorter version, and that's η-reduction, or we can go the other way, which is η-expansion.

Anyway, once I thought of this it occurred to me that, just like the longer expression could be reduced to (+), my original explanation that the small green part was "a thing for connecting a thing to another thing" could be η-reduced to "a connector".

Perhaps if I had said that in the first place Iris would not have complained.

Happy new year, all readers.

Wed, 20 May 2009

No flimping
Advance disclaimer: I am not a linguist, have never studied linguistics, and am sure to get some of the details wrong in this article. Caveat lector.

There is a standard example in linguistics that is attached to the word "flimp". The idea it labels is that certain grammatical operations are restricted in the way they behave, and cannot reach deeply into grammatical structures and rearrange them.

For instance, you can ask "What did you use to see the girl on the hill in the blue dress?" and I can reply "I used a telescope to see the girl on the hill in the blue dress". Here "the girl on the hill in the blue dress" is operating as a single component, which could, in principle, be arbitrarily long. ("The girl on the hill that was fought over in the war between the two countries that have been at war since the time your mother saw that monkey climb the steeple of the church...") This component can be extracted whole from one sentence and made the object of a new sentence, or the subject of some other sentence.

But certain other structures are not transportable. For example, in "Bill left all his money to Fred and someone", one can reach down as far as "Fred and someone" and ask "What did Bill leave to Fred and someone?" but one cannot reach all the way down to "someone" and ask "Who did Bill leave all his money to Fred and"?

Under certain linguistic theories of syntax, analogous constraints rule out the existence of certain words. "Flimped" is the hypothetical nonexistent word which, under these theories, cannot exist. To flimp is to kiss a girl who is allergic to. For example, to flimp coconuts is to kiss a girl who is allergic to coconuts. (The grammatical failure in the last sentence but one illustrates the syntactic problem that supposedly rules out the word "flimped".

I am not making this up; for more details (from someone who, unlike me, may know what he is talking about) See Word meaning and Montague grammar by David Dowty, p. 236. Dowty cites the earlier sources, from 1969–1973 who proposed this theory in the first place. The "flimped" example above is exactly the same as Dowty's, and I believe it is the standard one.

Dowty provides a similar, but different example: there is not, and under this theory there cannot be, a verb "to thork" which means "to lend your uncle and", so that "John thorked Harry ten dollars" would mean "John lent his uncle and Harry ten dollars".

I had these examples knocking around in my head for many years. I used to work for the University of Pennsylvania Computer and Information Sciences department, and from my frequent contacts with various cognitive-science types I acquired a lot of odds and ends of linguistic and computational folklore. Michael Niv told me this one sometime around 1992.

The "flimp" thing rattled around my head, surfacing every few months or so, until last week, when I thought of a counterexample: Wank.

The verb "to wank to" means "to rub one's genitals while considering", and so seems to provide a countexample to the theory that says that verbs of this type are illegal in English.

When I went to investigate, I found that the theory had pretty much been refuted anyway. The Dowty book (published 1979) produced another example: "to cuckold" is "to have sexual intercourse with the woman who is married to".

Some Reddit person recently complained that one of my blog posts had no point. Eat this, Reddit person.

Fri, 08 May 2009

Most annoying phrase known to man?
I have been wasting time, those precious minutes of my life that will never return, by eliminating the odious phrase "known to man" from Wikipedia articles. It is satisfying, in much the same way as doing the crossword puzzle, or popping bubble wrap.

In the past I have gone on search-and-destroy missions against certain specific phrases, for example "It should be noted that...", which can nearly always be replaced with "" with no loss of meaning. But "known to man" is more fun.

One pleasant property of this phrase is that one can sidestep the issue of whether "man" is gender-neutral. People on both sides of this argument can still agree that "known to man" is best replaced with "known". For example:

• The only albino gorilla known to man...
• The most reactive and electronegative substance known to man...
• Copper and iron were known to man well before the copper age and iron age...
In examples like these, "to man" is superfluous, and one can delete it with no regret.

As a pleonasm and a cliché, "known to man" is a signpost to prose that has been written by someone who was not thinking about what they were saying, and so one often finds it amid other prose that is pleonastic and clichéd. For example:

Diamond ... is one of the hardest naturally occurring material known (another harder substance known today is the man-made substance aggregated diamond nanorods which is still not the hardest substance known to man).
Which I trimmed to say:

Diamond ... is one of the hardest naturally-occurring materials known. (Some artificial substances, such as aggregated diamond nanorods, are harder.)
Many people ridicule Strunk and White's fatuous advice to "omit needless words"—if you knew which words were needless, you wouldn't need the advice—but all editors know that beginning writers will use ten words where five will do. The passage above is a good example.

Can "known to man" always be improved by replacement with "known"? I might have said so yesterday, but I mentioned the issue to Yaakov Sloman, who pointed out that the original use was meant to suggest a contrast not with female knowledge but with divine knowledge, an important point that completely escaped my atheist self. In light of this observation, it was easy to come up with a counterexample: "His acts descended to a depth of evil previously unknown to man" partakes of the theological connotations very nicely, I think, and so loses some of its force if it is truncated to "... previously unknown". I suppose that many similar examples appear in the work of H. P. Lovecraft.

It would be nice if some of the Wikipedia examples were of this type, but so far I haven't found any. The only cases so far that I haven't changed are all direct quotations, including several from the introductory narration of The Twilight Zone, which asserts that "There is a fifth dimension beyond that which is known to man...". I like when things turn out better than I expected, but this wasn't one of those times. Instead, there was one example that was even worse than I expected. Bad writing it may be, but the wrongness of "known to man" is at least arguable in most cases. (An argument I don't want to make today, although if I did, I might suggest that "titanium dioxide is the best whitening agent known to man" be rewritten as "titanium dioxide is the best whitening agent known to persons of both sexes with at least nine and a half inches of savage, throbbing cockmeat.") But one of the examples I corrected was risibly inept, in an unusual way:

Wonder Woman's Amazon training also gave her limited telepathy, profound scientific knowledge, and the ability to speak every language known to man.
I have difficulty imagining that the training imparted to Diana, crown princess of the exclusively female population of Paradise Island, would be limited to languages known to man.

Earle Martin drew my attention to the Wikipedia article on "The hardest metal known to man". I did not dare to change this.

[ Addendum 20090515: There is a followup article. ]

Sun, 15 Feb 2009

"She is not 'your' girlfriend," said this knucklehead. "She does not belong to you."
Through pure happenstance, I discovered last night that there is an account of this same bit of equivocation in Plato's Euthydemus. In this dialogue, Socrates tells of a sophist named Dionysodorus, who is so clever that he can refute any proposition, whether true or false. Here Dionysodorus demonstrates that Ctesippus's father is a dog:

You say that you have a dog.

Yes, a villain of a one, said Ctesippus.

And he has puppies?

Yes, and they are very like himself.

And the dog is the father of them?

Yes, he said, I certainly saw him and the mother of the puppies come together.

And is he not yours?

To be sure he is.

Then he is a father, and he is yours; ergo, he is your father, and the puppies are your brothers.

So my knuckleheaded interlocutor was not even being original.

I gratefully acknowledge the gift of Thomas Guest. Thank you very much!

Fri, 31 Oct 2008

A proposed correction to an inconsistency in English orthography
English contains exactly zero homophones of "zero", if one ignores the trivial homophone "zero", as is usually done.

English also contains exactly one homophone of "one", namely "won".

English does indeed contain two homophones of "two": "too" and "to".

However, the expected homophones of "three" are missing. I propose to rectify this inconsistency. This is sure to make English orthography more consistent and therefore easier for beginners to learn.

I suggest the following:

thrie
threigh
thurry
I also suggest the founding of a well-funded institute with the following mission:

1. Determine the meanings of these three new homophones
2. Conduct a public education campaign to establish them in common use
3. Lobby politicians to promote these new words by legislation, educational standards, public funding, or whatever other means are appropriate
4. Investigate the obvious sequel issues: "four" has only "for" and "fore" as homophones; what should be done about this?
Obviously, the director of this institute should be a thoughtful, far-seeing individual who will not allow his good judgement to be clouded by the generous salary. I refer, of course, to myself.

Happy Halloween. All Hail Discordia.

Wed, 14 May 2008

More artificial Finnish
Several Finns wrote to me to explain in some detail what was wrong with the artificial Finnish in yesterday's article. As I surmised, the words "ssän" and "kkeen" are lexically illegal in Finnish. There were a number of similar problems. For example, my sample output included the non-word "t". I don't know how this could have happened, since the input probably didn't include anything like that, and the Markov process I used to generate it shouldn't have done so. But the code is lost, so I suppose I'll never know.

Of the various comments I received, perhaps the most interesting was from Ilmari Vacklin. ("Vacklin", huh? If my program had generated "Vacklin", the Finns would have been all over the error.) M. Vacklin pointed out that a number of words in my sample output violated the Finnish rules of vowel harmony.

(M. Vacklin also suggested that my article must have been inspired by this comic, but it wasn't. I venture to guess that the Internet is full of places that point out that you can manufacture pseudo-Finnish by stringing together a lot of k's and a's and t's; it's not that hard to figure out. Maybe this would be a good place to mention the word "saippuakauppias", the Finnish term for a soap-dealer, which was in the Guinness Book of World Records as the longest commonly-used palindromic word in any language.)

Anyway, back to vowel harmony. Vowel harmony is a phenomenon found in certain languages, including Finnish. These languages class vowels into two antithetical groups. Vowels from one group never appear in the same word as vowels from the other group. When one has a prefix or a suffix that normally has a group A vowel, and one wants to join it to a word with group B vowels, the vowel in the suffix changes to match. This happens a lot in Finnish, which has a zillion suffixes. In many languages, including Finnish, there is also a third group of vowels which are "neutral" and can be mixed with either group A or with group B.

Modern Korean does not have vowel harmony, mostly, but Middle Korean did have it, up until the early 16th century. The Korean alphabet was invented around 1443, and the notation for the vowels reflected the vowel harmony:

[ Addendum 20080517: The following paragraph about vowel harmony contains significant errors of fact. I got the groups wrong. ]

The first four vowels in this illustration, with the vertical lines, were incompatible with the second four vowels, the ones with the horizontal lines. The last two vowels were neutral, as was another one, not shown here, which was written as a single dot and which has since fallen out of use. Incidentally, vowel harmony is an unusual feature of languages, and its presence in Korean has led some people to suggest that it might be distantly related to Turkish.

The vowel harmony thing is interesting in this context for the following reason. My pseudo-Finnish was generated by a Markov process: each letter was selected at random so as to make the overall frequency of the output match that of real Finnish. Similarly, the overall frequency of two- and three-letter sequences in pseudo-Finnish should match that in real Finnish. Is this enough to generate plausible (although nonsensical) Finnish text? For English, we might say maybe. But for Finnish the answer is no, because this process does not respect the vowel harmony rules. The Markov process doesn't remember, by the time it gets to the end of a long word, whether it is generating a word in vowel category A or B, and so it doesn't know which vowels it whould be generating. It will inevitably generate words with mixed vowels, which is forbidden. This problem does not come up in the generation of pseudo-English.

None of that was what I was planning to write about, however. What I wanted to do was to present samples of pseudo-Finnish generated with various tunings of the Markov process.

The basic model is this: you choose a number N, say 2, and then you look at some input text. For each different sequence of N characters, you count how many times that sequence is followed by "a", how many times it is followed by "b", and so on.

Then you start generating text at random. You pick a sequence of N characters arbitrarily to start, and then you generate the next character according to the probabilities that you calculated. Then you look at the last N characters (the last N-1 from before, plus the new one) and repeat. You keep doing that until you get tired.

For example, suppose we have N=2. Then we have a big table whose keys are 2-character strings like "ab", and then associated with each such string, a table that looks something like this:
 r 54.52 a 15.89 i 10.41 o 7.95 l 4.11 e 3.01 u 1.1 space 0.82 : 0.55 t 0.55 , 0.27 . 0.27 b 0.27 s 0.27
So in the input to this process, "ab" was followed by "r" more than 54% of the time, by "a" about 16% of the time, and so on. And when generating the output, every time our process happens to generate "ab", it will follow by generating an "r" 54.52% of the time, an "a" 15.89% of the time, and so on.

Whether to count capital letters as the same as lowercase, and what to do about punctuation and spaces and so forth, are up to the designer.

Here, as examples, are some samples of pseudo-English, generated with various N. The input text was the book of Genesis, which is not entirely typical. In each case, I deleted the initial N characters and the final partial word, cleaned up the capitalization by hand, and appended a final period.

N=0
Lt per f idd et oblcs hs hae:uso ar w aaolt y tndh rl ohn otuhrthpboleel.ee n synenihbdrha,spegn.
N=1
Cachand t wim, heheethas anevem blsant ims, andofan, ieahrn anthaye s, lso iveeti alll t tand, w.
N=2
Ged hich callochbarthe of th to tre said nothem, and rin ing of brom. My and he behou spend the.
N=3
Sack one eved of and refor ther of the hand he will there that in the ful, when it up unto rangers.
It should be clear that the quality improves as one increases the N parameter. The N=3 sample has mostly real words, and the few nonsense ones it contains ("eved", "ful") are completely plausible English. N=2, on the other hand, is mostly nonsense, although it's mostly plausible nonsense. Even "callochbarthe" is almost plausible. (The unfortunate "chb" in the middle is just bad luck. It occurs because Genesis 36 mentions Baalhanan the son of Achbor.) The N=1 sample is recognizably bogus; no English word looks like "ieahrn", and the triple "l" in "alll" is nearly impossible. (I did once write to Jesse Sheidlower, an editor of the Big Dictionary, to ask his advice about whether "ballless" should be hyphenated.)

I have prepared samples of pseudo-Finnish of various qualities. The input here was a bunch of text I copied out of Finnish Wikipedia. (Where else? If you need Finnish text in 1988, you get it from the Usenet fi.talk group; if you need Finnish text in 2008, you get it from Finnish Wikipedia.) I did a little bit of manual cleanup, as with the English, but not too much.

N=0
Vtnnstäklun so so rl sieesjo.Aiijesjeäyuiotiannorin traäl.N vpojanti jonn oteaanlskmt enhksaiaaiiv oenlulniavas. Rottlatutsenynöisu iikännam e lavantkektann eaagla admikkosulssmpnrtinrkudilsorirumlshsmoti,anlosa anuioessydshln.Atierisllsjnlu e.Itatlosyhi vnko ättr otneän akho smalloailäi jiaat kajvtaopnasneilstio tntin einteaonaiimotn:r apoya oruasnainttotne wknaiossäelaäinoev aobrs,vteorlokynv. Aevsrikhanä tp s s oälnlke rvmi il ynae nara ign ssm lkimttbhineaatismäi tst lli ahaltineshne kr keöunv ah s itenh s .Ia pa elstpnanmnuiksriil anaalnttt mr ti.Ooa ka eee eiiei,tnees äusee a nanhetv.Iopkijeatatits,i l eklbiik suössmap tioaotaktdiir rkeaviohiesotkeagarihv nnadvö jlape öt kaeakmjkhykoto tnt iunnuyknnelu rutliie.Leva eiriaösnaj,rk oyumtsle,iioa,aspa aeiaä wsuinn eta y tvati klssviutkuaktmlpnheomi.T akapskushhnuksnhnnheaaaaussitseminmpnamäiaä pät.Kaaaabl unnionuhnpa iaes,outka.Cväinvkshvrnlteeoea rmi re suodmpr autlysa tnliaanäass. Srs rnvrtsita kmidusvjn tii.
N=1
Ava pän svun kerekent lsita batävomenasttenerga kovosuujalules rma punntäni rtraliksainoi van eukällä. Enäkukänesinntampalä ttan kolpäsäkyönsllvitivenestakkesenelussivaliite kuuksä kttteni einsuekeita kuterissalietäkilpöikalit ojatäjä pinsin atollukole idoitenn kkaorhjajasteden en vuolynkoiverojaa hta puon ehalan vaivä ihoshäositi. Hde setua tämpitydi makta jasyn sää oinncgrkai jeeten. Ljalanekikeri toiskkksypohoin ta yö atenesällväkeesaatituuun. Paait pukata tuon ktusumitttan zagaleskli va kkanäsin siikutytowhenttvosa veste eten vunovivä. Vorytellkeeni stan jä taa eka kaine ja kurenntonsin kyn o nta ja. Aisst urksetaka. Hotimivaa ta mppussternallai ja. Hdä on koraleerermohtydelen on jon. Rgienon kulinoilisälsa ja holälimmpa vitin, kukausoompremänn ra, palestollebilsen kaalesta, oina. Blilullaushoingiötideispaanoksiton, mulurklimi kermalli pota atebau lmomarymin kypa hta vanon tin kela vanaspoita s kulitekkäjen jäleetuolpan, veesalekäilin oii. Häreli. Ymialisstermimpriekaksst on.
N=2
Omaalis onino osa josa hormastaaraktse tyi altäänä tyntellevääostoidesenä, la siä vuansilliana inöön akalkuulukempellys kisä nen myöhelyaminenkiemostamahti omuonsa onite oni kusissa. Kungin sykynteillalkaai ellahasiteisuunnaja eroniemmin javai musuuasinä, sittan tusuovatkryt tormon vuolisenitiivansaliuotkietjuuta sensa. Kutumppalvinen. Vaikintolat hän ja kilkuossa osa koiseuvo keyhdysvisakeemppolowistoisijouliuodosijolasissän muoli ogro soluksi valuksasverix intetormon patlantaan et muiksen paiettaatulun kan vuomesyklees ovain pun. Sesva sa hänerittämpiraun tyi vuoden sälisen sän yhtiit, set tämpiraalletä. Senssaikanoje leemp:tabeten ain raa olliukettyi su. Solulukuuttellerrotolit hee säkinessa hän sekketäärinenvaikeihakti umallailuksin sestunno klossi ilunuta. Klettisaa osen vua vuola, jani ja hinangia en ta kaineemonimien polin barkiviäliukkuta joseseva. Ebb rautta onistärään on ml jokoulistä oheksi anoton allysvallelsiliineuvoja kutuko ala ulkietutablohitkain. Ituno.
N=3
Ävivät mena osakeyhti yhdysvalmiininäkin rakenne tuliitä hermoni ja umpirauhastui liin baryshnikoneja. Ain viljelukuullisää olisäke spesideksyylikoliittu latvia. Helsina hän solukeskuksen kannumme, peri palkin vieskeinä sisään on orgaan poikanssisäätelukauno klee laisenäläinen tavastui kauno on länteen muttava hän voimista kilometsästymistettäjän lehtiöiksitoreisö. Sitoutuvat mukalle. Ainettiin sisäke suomaihin, jouluun. Verenkilpalveli valtaineen opisteri poli ohjasionee rakennuttikolan aivastisenäläistuu kehittisetoja, rajahormaailmanajan kulkopuolesti kuluu mooliitoutuvat ovat olle. Ainen yhdysvaltai valiolähtiöiksi vasta, S. Muidentilaisteri jotka verenkirovin verenkiehumistä nelle väliaivoittynyt baleviiliukoisiin maailmestavarasta, jokakuudessa laisu. Sai rakeyhti yhtiö eli gluksessa. Ebbin, ja linnosakkeen hormonien I hallistehtiin kilpirasvua jaajana hormaailusta kunnetteluskäyttöön suomalaivat yhdysvalmistämistammonit veteet olimistuvatta. Hormon oli rautta.
Before anyone objects to the non-word "ml" in the N=2 sample, let me explain that this is the standard abbreviation for "millilitra". The "i" in the N=3 sample was a puzzle, since Marko Heiskanen assures me that Finnish has no one-letter words. But it appears in my sample in connection with Sukselaisen I hallitus, whatever that is, so I capitalized it.

I must say that I found "yhdysvalmistämistammonit" rather far-fetched, even in Finnish. But then I discovered that "yhdeksänkymmenvuotiaaksi" and "yhdysvalloissakaan" are genuine, so who am I to judge?

[ Addendum 20080601: Some additional notes. ]

Mon, 12 May 2008
 Order Symbols, Signals, and Noise with kickback no kickback
By 1988 or 1989 I had read in several places, most recently in J. R. Pierce's Symbols, Signals, and Noise, that if you compile a table of the relative frequencies of three-letter sequences (trigraphs) in English text, and then generate random text with the same trigraph frequencies, the result cannot be distinguished from meaningful English text except by people who actually know English. Examples were provided, containing weird but legitimate-sounding words like "deamy" and "grocid", and the claim seemed plausible. But since I did actually know English, I could not properly evaluate it.

But around that time the Internet was just beginning to get into full swing. The Finnish government was investing a lot of money in networking infrastructure, and a lot of people in Finland were starting to appear on the Internet.

I have a funny story about that: Around the same time, a colleague named Marc Edgar approached me in the computer lab to ask if I knew of any Internet-based medium he could use to chat with his friend at the University of Oulu. I thought at first that he was putting me on (and maybe he was) because in 1989 the University of Oulu was just about the only place in the world where a large number of people were accessible via internet chat, IRC having been invented there the previous autumn.

A new set of Finnish-language newsgroups had recently appeared on Usenet, and people posted to them in Finnish. So I had access to an unlimited supply of computer-readable Finnish text, something which would have been unthinkable a few years before, and I could do the experiment in Finnish.

I wrote up the program, which is not at all difficult, gathered Finnish news articles, and produced the following sample:

Uttavalon estaa ain pahalukselle? Min omatunu selle menneet hy, toista. Palveljen alh tkö an välin oli ei alkohol pisten jol elenin. Että, ille, ittavaikki oli nim tor taisuuristä usein an sie a in sittä asia krista sillo si mien loinullun, herror os; riitä heitä suurinteen palve in kuk usemma. Tomalle, äs nto tai sattia yksin taisiä isiäk isuuri illää hetorista. Varsi kaikenlaineet ja pu distoja paikelmai en tulissa sai itsi mielim ssän jon sn ässäksi; yksen kos oihin! Jehovat oli kukahdol ten on teistä vak kkiasian aa itse ee eik tse sani olin mutta todistanut t llisivat oisessa sittä on raaj a vaisen opinen. Ihmisillee stajan opea tajat ja jumalang, sitten per sa ollut aantutta että voinen opeten. Ettuj, jon käs iv telijoitalikantaminun hä seen jälki yl nilla, kkeen, vaaraajil tuneitteistamaan same?

In those days, the world was 7-bit, and Finnish text was posted in a Finnish national variant of ASCII that caused words like "tkö an välin" to look like "tk| an v{lin". The presence of the curly braces heightened the apparent similarity, because that was all you could see at first glance.

At the time I was pleased, but now I think I see some defects. There are some vowelless words, such as "sn" and "t", which I think doesn't happen in Finnish. Some other words look defective: "ssän" and "kkeen", for example. Also, my input sample wasn't big enough, so once the program generated "alk" it was stuck doing the rest of "alkohol". Still, I think this could pass for Finnish if the reader wasn't paying much attention. I was satisfied with the results of the experiment, and was willing to believe that randomly-contructed English really did look enough like English to fool a non-English-speaking observer.

[ Addendum 20080514: There is a followup to this article. ]

[ Addendum 20080601: Some additional notes. ]

Tue, 04 Mar 2008

... a logical negation function ... takes a boolean argument and returns a boolean result.
I worried for some time about whether to capitalize "boolean" here. But writing "Boolean" felt strange enough that I didn't actually try it to see how it looked on the page.

I looked at the the Big Dictionary, and all the citations were capitalized. But the most recent one was from 1964, so that was not much help.

Then I tried Google search for "boolean capitalized". The first hit was a helpful article by Eric Lippert. M. Lippert starts by pointing out that "Boolean" means "pertaining to George Boole", and so should be capitalized. That much I knew already.

But then he pointed out a countervailing consideration:

English writers do not usually capitalize the eponyms "shrapnel" (Henry Shrapnel, 1761-1842), "diesel" (Rudolf Diesel, 1858-1913), "saxophone" (Adolphe Sax, 1814-1894), "baud" (Emile Baudot, 1845-1903), "ampere" (Andre Ampere, 1775-1836), "chauvinist" (Nicolas Chauvin, 1790-?), "nicotine" (Jean Nicot, 1530-1600) or "teddy bear" (Theodore Roosevelt, 1858-1916).
Isn't that a great paragraph? I just had to quote the whole thing.

Lippert concluded that the tendency is to capitalize an eponym when it is an adjective, but not when it is a noun. (Except when it isn't that way; consider "diesel engine". English is what it is.)

I went back to my example to see if that was why I resisted capitalizing "Boolean":

... takes a boolean argument and returns a boolean result.
Hmm, no, that wasn't it. I was using "boolean" as an adjective in both places. Wasn't I?

Something seemed wrong. I tried changing the example:

... takes an integer argument and returns an integer result.
Aha! Notice "integer", not "integral". "Integral" would have been acceptable also, but that isn't analogous to the expression I intended. I wasn't using "boolean" as an adjective to modify "argument" and "result". I was using it as a noun to denote a certain kind of data, as part of a noun phrase. So it is a noun, and that's why I didn't want to capitalize it.

I would have been happy to have written "takes a boolean and returns a boolean", and I think that's the controlling criterion.

Sorry, George.

Mon, 18 Feb 2008

Cornaptious
Once I was visiting my grandparents while home from college. We were in the dining room, and they were talking about a book they were reading, in which the author had used a word they did not know: cornaptious. I didn't know it either, and got up from the table to look it up in their Webster's Second International Dictionary. (My grandfather, who was for his whole life a both cantankerous and a professional editor, loathed the permissive and descriptivist Third International. The out-of-print Second International Edition was a prized Christmas present that in those days was hard to find.)

Webster's came up with nothing. Nothing but "corniculate", anyway, which didn't appear to be related. At that point we had exhausted our meager resources. That's what things were like in those days.

The episode stuck with me, though, and a few years later when I became the possessor of the First Edition of the Oxford English Dictionary, I tried there. No luck. Some time afterwards, I upgraded to the Second Edition. Still no luck.

 Order The Lyre of Orpheus with kickback no kickback
Years went by, and one day I was reading The Lyre of Orpheus, by Robertson Davies. The unnamed Dean of the music school describes the brilliant doctoral student Hulda Schnakenburg:

"Oh, she's a foul-mouthed, cornaptious slut, but underneath she is all untouched wonderment."
"Aha," I said. "So this is what they were reading that time."

More years went by, the oceans rose and receded, the continents shifted a bit, and the Internet crawled out of the sea. I returned to the problem of "cornaptious". I tried a Google book search. It found one use only, from The Lyre of Orpheus. The trail was still cold.

But wait! It also had a suggestion: "Did you mean: carnaptious", asked Google.

Ho! Fifty-six hits for "carnaptious", all from books about Scots and Irish. And the OED does list "carnaptious". "Sc. and Irish dial." it says. It means bad-tempered or quarrelsome. Had Davies spelled it correctly, we would have found it right away, because "carnaptious" does appear in Webster's Second.

So that's that then. A twenty-year-old spelling error cleared up by Google Books.

[ Addendum 20080228: The Dean's name is Wintersen. Geraint Powell, not the Dean, calls Hulda Schnakenburg a cornaptious slut. ]

Thu, 31 Jan 2008

Unnecessary imprecision

McCain has won all of the state's 57 delegates, and the last primary before voters in more than 20 states head to the polls next Tuesday.
Why "more than 20 states"? Why not just say "23 states", which is shorter and conveys more information?

I'm not trying to pick on CTV here. A Google News search finds 42,000 instances of "more than 20", many of which could presumably be replaced with "26" or whatever. Well, I had originally written "most of which", but then I looked at some examples, and found that the situation is better than I thought it would be. Here are the first ten matches:

1. Australian Stocks Complete Worst Month in More Than 20 Years
2. It said the US air force committed more than 20 cases of aerial espionage by U-2 strategic espionage planes this month.
3. Farmland prices have climbed more than 20% over the past year in many Midwestern states...
4. "We have had record-breaking growth in our monthly shipments, as much as more than 20 percent improvements per month," said Christopher Larkins, President...
5. More than 20 people, including a district officer, were injured when two bombs exploded outside a stadium in the town yesterday...
6. By a vote of 14-7, the Senate Finance Committee last night voted to deliver $500 tax rebates to more than 20 million American senior citizens... 7. 9 killed, more than 20 injured in bus accident 8. While Tuesday's results may not lock up the nomination for either candidate, Democrats will have their say in more than 20 states... 9. Facing the potential anointment of his rival, John McCain, Romney has less than a week to convince voters in more than 20 states that... 10. More than 20 Aberdeen citizens qualified for elections as April ... #1 may be legitimate, if the previous worst month was less than 21 years ago. Similarly #6 is legitimate if the number of senior citizens is close to 20 million, say around 20,400,000, particularly since the number may not be known with high precision. #2 may be legitimate, if the number of cases of aerial espionage is not known with certitude, or if the anonymous source really did say "more than 20". Similarly #4 is entirely off the hook since it is a quotation. #3 may be legitimate if the price of farmland is uncertain and close to 20%. #5 is probably a loser. #7 is definitely a loser: it was the headline of an article that began "Nine people were killed and at least 22 injured when...". The headline could certainly have been "9 killed, 22 injured in bus accident". #8 and #9 are losers, but they are the same example with which I began the article, so they don't count. #10 is a loser. So I have, of eight examples (disregarding #8 and #9) three certain or near-certain failures (#5, #7, and #10), one certain non-failure (#4), and four cases to which I am willing to extend the benefit of the doubt. This is not as bad as I feared. I like when things turn out better than I thought they would. But I really wonder what is going on with all these instances of "more than 20 states". Is it just sloppy writing? Or is there some benefit that I am failing to appreciate? Sun, 06 Jan 2008 Squillions A while back I looked up "zillion" in Wikipedia, which is an alias for the Wikipedia article about "Indefinite and fictitious numbers". The article includes a large number of synonyms for "zillion", such as bajillion, kajillion, gazillion, and so forth. For some reason the word "squillion" caught my eye, and I noticed that the citation was from Terry Pratchett: "And you owe me a million billion trillion zillion squillion dollars." This suggested to me that "squillion" might be a nonce-word, one made up on the spot by Pratchett for that one sentence, in which case it should not be in the Wikipedia article. Google book search is a good way to answer questions like that, because if "squillion" is widely used, you will find a lot of examples of it. And indeed it is widely used, and I did find a lot of examples of it. So there was no need to remove it from the article. One of the Google hits was from the Cormac Ó Cuilleanáin translation of Giovanni Boccaccio's Decameron. The Decameron is a great classic of Italian Renaissance literature, probably the greatest classic that Italian has, after Dante's Divine Comedy. It was written around 1350. In this particular chapter (the tenth story on the sixth day, if you want to look it up) Guccio, a priest, is trying to seduce a hideous kitchen-maid: He sat himself down by the fire—although this was August—and struck up a conversation with the wench in question (Nuta by name), informing her that he was by rights a member of the gentry and had more than a squillion florins in the bank, not counting those he had to give to other people... The kitchen-maid, by the way, is described as having "a pair of tits like two baskets of manure". This was amusing, and as I had never read the Decameron, I wanted to read more, and learn how it turned out. But the Google excerpt was limited, so I asked the library to get me a copy of that version of the Decameron. Of course they have many copies on the shelf, but not that particular translation. So I asked the interlibrary loan people for it, and they got it for me. When it arrived, I was rather dismayed. The ILL people get the book from the most convenient place, and that means that it often comes from the Drexel library, up the street, or the Temple library, across town, or the West Chester Community College library, or Lehigh University, about an hour away in Bethlehem. (Steel Bethlehem, of course, not Jesus Bethlehem.) The farthest I had ever gotten a book from was an extremely obscure quilting manual that Lorrie asked for; it eventually arrived from the Sno-Isles regional library system of Marysville, Washington. But this copy of the Decameron came from the Sloman library of the University of Essex. I was so shocked that I had to look it up online to make sure that it was not Essex, New Jersey, or something like that. I was not. It was East Saxony. I was upset because I felt that the trouble and effort had been wasted. If I had known that the nearest available copy of Cormac Ó Cuilleanáin's translation was in Essex, I would have been happy to take a different version that was on the shelf. And then to top it off, I had hardly begun to read it before it came due and had to be sent back to Essex. So I went to the library and got another Decameron, this one translated by Mark Musa and Peter Bondanella. Here is the corresponding passage: Although it was still August, he took a seat near the fire and began to talk with the girl, whose name was Nuta, telling her that he was a gentleman by procuration, that he had more than a thousand hundreds of florins (not counting those he had to give away to others), ... And there is a footnote on "thousand hundreds" explaining "Guccio invents this amount, as well as the previous phrase 'by procuration,' in order to impress his lady." By the way, in this version, Nuta has "a pair of tits that looked like two clumps of cowshit". Anyway, I think I liked "squillions" better than "thousand hundreds", although I suppose "thousand hundreds" is probably a more literal translation. Well, I can find this out. Of course, one can find the Decameron online in Italian; the copyright expired about five hundred years ago. Here it is in Italian, courtesy of Brown University: E ancora che d'agosto fosse, postosi presso al fuoco a sedere, cominciò con costei, che Nuta aveva nome, a entrare in parole e dirle che egli era gentile uomo per procuratore e che egli aveva de' fiorini piú di millantanove, senza quegli che egli aveva a dare altrui,... I think the word that is being translated here is "millantanove", although I can't be entirely sure, because I don't know Italian. Once again, though, I am surprised at how easy it is to read a passage in an unintelligible foreign language when I already know what it is going to say. (I wrote about this back in April 2006, and it occurs to me now that that would be a fun topic for an article.) The 1903 translation that Brown University provides is "more florins than could be reckoned", which does not seem to me to capture the flavor of the original, and does not seem to be a literal translation either. "Millantanove" seems to me to be a made-up word resembling "mille" = "thousand". But as I said, I don't know Italian. Nuta in this version has "a pair of breasts that shewed as two buckets of muck". Feh. The Italian is "con un paio di poppe che parean due ceston da letame". The operative phrase here seems to be "ceston da letame". I don't know what those words mean, but, happily, Italian Wikipedia has an article about letame, and as the picture makes clear, it is indeed manure. Oh, did you want this article to have a point? Too bad. I recommend the Decameron. It is funny and salacious. There are a lot of stories about women cheating on their husbands, and then getting away with it through some clever trick, and then everyone who hears the story laughs and admires the cleverness of the ladies. (The counterpoint to this is that there are a number of stories of wife-beating, in which everyone who hears the story laughs and admires the wisdom of the husbands. I don't like that so much.) There are farcical stories of bed-swapping and wife-swapping, and one story about an abbess who comes out of her cell to berate a nun for having her lover in to visit, but the abbess is wearing a pair of men's trousers on her head instead of her wimple. Oops. This reminds me of when I was in high school, I was talking to one of my friends, who opted to study French, and this friend told me studying French is fun, because when you get to the third year and start reading real French literature, you read that great classic of French Literature, La Vie de Gargantua et de Pantagruel. If you have not read this master treasure of French culture, I should explain that the first chapter is mainly taken up with Gargantua and Pantagruel having a discussion about what is the best sort of thing to wipe your ass with, and it goes on from there. I took Latin, and in third-year Latin we read the orations of Cicero against Cataline. Fun stuff, but not the sort of thing that has you rushing to translate the next word. I was going to write an article about symmetries of the dodecahedron, and an interesting problem suggested to me by these balloon displays that I saw at the local Mazda dealership, but eh, this was a lot easier. Gargantua and Pantagruel eventually agree that the answer is a live goose. [ Addendum 20080201: More about 'milliantanove'. ] Sat, 05 Jan 2008 Pepys' footballs explained Walt Mankowski wrote to me with the explanation of Samuel Pepys' footballs: They are not clods of mud, as I guessed, nor horse droppings, as another correspondent suggested, but... footballs. Walt found a reference in Montague Shearman's 1887 book on the history of football in England that specifically mentions this. Folks were playing football in the street, and because of this, Pepys took his coach to Sir Philip Warwicke's, rather than walking. I didn't ask, but I presume Walt found this by doing some straightforward Google search for "pepys footballs" or something of the sort. For some reason, this did not even occur to me. Once Big Dictionary failed me, I was stumped. Perhaps this marks me as a member of the pre-Internet generation. I imagined this morning that this episode would be repeated, with my daughter Iris in place of Walt. "Oh, Daddy! You're so old-fashioned. Just use a Google search." Anyway, inspired by Walt's example, or by what I imagined Walt's example to be, I did the search myself, and found the Shearman reference, as well as the following discussion in William Carew Hazlitt's Faiths and Folklore of 1905: Mission, writing about 1690, says: "In winter foot-ball is a useful and charming exercise. It is a leather ball about as big as one's head, fill'd with wind. This is kick'd about from one to t'other in the streets, by him that can get at it, and that is all the art of it." This book looks like it would be good reading in general. [ Addendum 20080106: This is not the William Hazlitt, but his grandson. Thank you, Wikipedia. ] Thanks very much, Walt. Fri, 04 Jan 2008 Footballs? The diary of Samuel Pepys for Tuesday, 3 January 1664/5 says: Up, and by coach to Sir Ph. Warwicke's, the streete being full of footballs, it being a great frost, and found him and Mr. Coventry walking in St. James's Parke. "The street being full of footballs?" Huh? I tried looking in the Big Dictionary, and it was no help at all. My best guess is that it's big chunks of frozen mud that you have to kick out of the way. Do any gentle readers know for sure? The Diary of Samuel Pepys has a syndication feed you can subscribe to. You get a diary entry every day or so, with all the names and places linked to a glossary. It's fun reading. [ Addendum 20080105: The answer. ] Sun, 16 Sep 2007 Thank you very much for that bulletin I'm about to move house, and so I'm going through a lot of old stuff and throwing it away. I just unearthed the decorations from my office door circa 1994. I want to record one of these here before I throw it away and forget about it. It's a clipping from the front page of the New York Times from 11 April, 1992. It is noteworthy for its headline, which only one column wide, but at the very top of page A1, above the fold. It says: FIGHTING IMPERILS EFFORTS TO HALT WAR IN YUGOSLAVIA Sometimes good articles get bad headlines. Often the headlines are tacked on just before press time by careless editors. Was this a good article afflicted with a banal headline? Perhaps they meant there was internecine squabbling among the diplomats charged with the negotiations? No. If you read the article it turned out that it was about how darn hard it was to end the war when folks kept shooting at each other, dad gum it. I hear that the headline the following week was DOG BITES MAN, but I don't have a clipping of that. Wed, 16 May 2007 Moziz Addums Last July at a porch sale I obtained a facsimile copy of Housekeeping in Old Virginia, by M.C. Tyree, originally published in 1879. I had been trying to understand the purpose of ironing. Ironing makes the clothes look nice, but it must have also served some important purpose, essential for life, that I don't now understand. In the Laura Ingalls Wilder Little House books, Laura recounts a common saying that scheduled the week's work: Wash on Monday Iron on Tuesday Mend on Wednesday Churn on Thursday Clean on Friday Bake on Saturday Rest on Sunday You bake on Saturday so that you have fresh bread for Sunday dinner. You wash on Monday because washing is backbreaking labor and you want to do it right after your day of rest. You iron the following day before the washed clothes are dirty again. But why iron at all? If you don't wash the clothes or clean the house, you'll get sick and die. If you don't bake, you won't have any bread, and you'll starve. But ironing? In my mind it was categorized with dusting, as something people with nice houses in the city might do, but not something that Ma Ingalls, three miles from the nearest neighbor, would concern herself with. But no. Ironing, and starching with the water from boiled potatoes, was so important that it got a whole day to itself, putting it on par with essential activities like cleaning and baking. But why? A few months later, I figured it out. In this era of tumble-drying and permanent press, I had forgotten what happens to fabrics that are air dried, and did not understand until I was on a trip and tried to air-dry a cotton bath towel. Air-dried fabrics come out not merely wrinkled but corrugated, like an accordion, or a washboard, and are unusable. Ironing was truly a necessity. Anyway, I was at this porch sale, and I hoped that this 1879 housekeeping book might provide the answer to the ironing riddle. It turned out to be a cookbook. There is plenty to say about this cookbook anyway. It comes recommended by many notable ladies, including Mrs. R.B. Hayes. (Her husband was President of the United States.) She is quoted on the flyleaf as being "very much pleased" with the cookbook. Some of the recipes are profoundly unhelpful. For example, p.106 has: Boiled salmon. After the fish has been cleaned and washed, dry it and sew it up in a cloth; lay in a fish-kettle, cover with warm water, and simmer until done and tender. Just how long do I simmer it? Oh, until it is "done" and "tender". All right, I will just open up the fish kettle and poke it to see. . . except that it is sewed up in a cloth. Hmmm. You'd think that if I'm supposed to simmer this fish that has been sewn up in a cloth, the author of the recipe might advise me on how long until it is "done". "Until tender" is a bit of a puzzle too. In my experience, fish become firmer and less tender the longer you simmer them. Well, I have a theory about this. The recipe is attributed to "Mrs. S.T.", and consulting the index of contributors, I see that it is short for "Mrs. Samuel Tyree", presumably the editor's mother-in-law. Having a little joke at her expense, perhaps? There are a lot of other interesting points, which may appear here later. For example, did you know that the most convenient size hog for household use is one of 150 to 200 pounds? And the cookbook contains recipes not only for tomato catsup, but also pepper catsup, mushroom catsup, and walnut catsup. But the real reason I brought all this up is that page 253–254 has the following item, attributed to "Moziz Addums": Resipee for cukin kon-feel Pees. Gether your pees 'bout sun-down. The folrin day, 'bout leven o'clock, gowge out your pees with your thum nale, like gowgin out a man's ey-ball at a kote house. Rense your pees, parbile them, then fry 'erm with some several slices uv streekd middlin, incouragin uv the gravy to seep out and intermarry with your pees. When modritly brown, but not scorcht, empty intoo a dish. Mash 'em gently with a spune, mix with raw tomarters sprinkled with a little brown shugar and the immortal dish ar quite ready. Eat a hepe. Eat mo and mo. It is good for your genral helth uv mind and body. It fattens you up, makes you sassy, goes throo and throo your very soul. But why don't you eat? Eat on. By Jings. Eat. Stop! Never, while thar is a pee in the dish. This was apparently inserted for humorous effect. Around the time the cookbook was written, there was quite a vogue for dialectal humor of this type, most of which has been justly forgotten. Probably the best-remembered practitioner of this brand of humor was Josh Billings, who I bet you haven't heard of anyway. Tremendously popular at the time, almost as much so as Mark Twain, his work is little-read today; the joke is no longer funny. The exceptionally racist example above is in many ways typical of the genre. One aspect of this that is puzzling to us today (other than the obvious "why was this considered funny?") is that it's not clear exactly what was supposed to be going on. Is the idea that Moziz Addums wrote this down herself, or is this a transcript by a literate person of a recipe dictated by Moziz Addums? Neither theory makes sense. Where do the misspellings come from? In the former theory, they are Moziz Addums' own misspellings. But then we must imagine someone literate enough to spell "intermarry" and "immortal" correctly, but who does not know how to spell "of". In the other theory, the recipe is a transcript, and the misspellings have been used by the anonymous, literate transcriber to indicate Moziz Addums' unusual or dialectal pronunciations, as with "tomarters", perhaps. But "uv" is the standard (indeed, the only) pronunciation of "of", which wrecks this interpretation. (Spelling "of" as "uv" was the signature of Petroleum V. Nasby, another one of those forgotten dialectal humorists.) And why did the transcriber misspell "peas" as "pees"? So what we have here is something that nobody could possibly have written or said, except as an inept parody of someone else's speech. I like my parody to be rather less artificial. All of this analysis would be spoilsportish if the joke were actually funny. E.B. White famously said that "Analyzing humor is like dissecting a frog. Few people are interested and the frog dies of it." Here, at least, the frog had already been dead for a hundred years dead before I got to it. [ Addendum 20100810: In case you were wondering, "kon-feel pees" are actually "cornfield peas", that is, peas that have been planted in between the rows of corn in a cornfield. ] Tue, 15 May 2007 Ambiguous words and dictionary hacks A Mexican gentleman of my acquaintance, Marco Antonio Manzo, was complaining to me (on IRC) that what makes English hard was the large number of ambiguous words. For example, English has the word "free" where Spanish distinguishes "gratis" (free like free beer) from "libre" (free like free speech). I said I was surprised that he thought that was unique to English, and said that probably Spanish had just as many "ambiguous" words, but that he just hadn't noticed them. I couldn't think of any Spanish examples offhand, but I knew some German ones: in English, "suit" can mean a lawsuit, a suit of clothes, or a suit of playing cards. German has different words for all of these. In German, the suit of a playing card is its "farbe", its color. So German distinguishes between suit of clothes and suit of playing cards, which English does not, but fails to distinguish between colors of paint and suit of playing cards, which English does. Every language has these mismatches. Korean has two words for "thin", one meaning thin like paper and the other meaning thin like string. Korean distinguishes father's sister ("komo") from mother's sister ("imo") where English has only "aunt". Anyway, Sr. Manzo then went to lunch, and I wanted to find some examples of concepts distinguished by English but not by Spanish. I did this with a dictionary hack. A dictionary hack is when you take a plain text dictionary and do some sort of rough-and-ready processing on it to get an 80% solution to some problem. The oldest dictionary hack I know of is the old Unix rhyming dictionary hack:  rev /usr/dict/words | sort | rev > rhyming.txt  This takes the Unix word list and turns it into a semblance of a rhyming dictionary. It's not an especially accurate semblance, but you can't beat the price. ... ugh Marlborough choreograph Guelph Wabash Hugh Scarborough lithograph Adolph cash McHugh thorough electrocardiograph Randolph dash Pugh trough electroencephalograph Rudolph leash laugh sough nomograph triumph gash bough tough tomograph lymph hash cough tanh seismograph nymph lash dough Penh phonograph philosoph clash sourdough sinh chronograph Christoph eyelash hough oh polarograph homeomorph flash though pharaoh spectrograph isomorph backlash although Shiloh Addressograph polymorph whiplash McCullough pooh chromatograph glyph splash furlough graph autograph anaglyph slash slough paragraph epitaph petroglyph mash enough telegraph staph myrrh smash rough radiotelegrap aleph ash gnash through calligraph Joseph Nash Monash breakthrough epigraph caliph bash rash borough mimeograph Ralph abash brash ...  It figures out that "clash" rhymes with "lash" and "backlash", but not that "myrrh" rhymes with "purr" or "her" or "sir". You can of course, do better, by using a text file that has two columns, one for orthography and one for pronunciation, and sorting it by reverse pronunciation. But like I said, you won't beat the price. But I digress. Last week I pulled an excellent dictionary hack. I found the Internet Dictionary Project's English-Spanish lexicon file on the web with a quick Google search; it looks like this:  a un, uno, una[Article] aardvark cerdo hormiguero aardvark oso hormiguero[Noun] aardvarks cerdos hormigueros aardvarks osos hormigueros ab prefijo que indica separacio/n aback hacia atras aback hacia atr´s,take aback, desconcertar. En facha. aback por sopresa, desprevenidamente, de improviso aback atra/s[Adverb] abacterial abacteriano, sin bacterias abacus a/baco abacuses a/bacos abaft A popa (towards stern)/En popa (in stern) abaft detra/s de[Adverb] abalone abulo/n abalone oreja de mar (molusco)[Noun] abalone oreja de mar[Noun] abalones abulones abalones orejas de mar (moluscos)[Noun] abalones orejas de mar[Noun] abandon abandonar abandon darse por vencido[Verb] abandon dejar abandon desamparar, desertar, renunciar, evacuar, repudiar abandon renunciar a[Verb] abandon abandono[Noun] abandoned abandonado abandoned dejado ...  Then I did:  sort +1 idengspa.txt | perl -nle '($ecur, $scur) = split /\s+/,$_, 2;
print "$eprev$ecur $scur" if$sprev eq $scur && substr($eprev, 0, 1) ne substr($ecur, 0, 1); ($eprev, $sprev) = ($ecur, \$scur)'


The sort sorts the lexicon into Spanish order instead of English order. The Perl thing comes out looking a lot more complicated than it ought. It just says to look and print consecutive items that have the same Spanish, but whose English begins with different letters. The condition on the English is to filter out items where the Spanish is the same and the English is almost the same, such as:

 blond blonde rubio cake cakes tarta oceanographic oceanographical oceanografico[Adjective] palaces palazzi palacios[Noun] talc talcum talco taxi taxicab taxi

It does filter out possible items of interest, such as:

 carefree careless sin cuidado

But since the goal is just to produce some examples, and this cheap hack was never going to generate an exhaustive list anyway, that is all right.

The output is:

        at letter a
actions stock acciones[Noun]
accredit certify acreditar
high tall alto
antecedents backgrounds antecedentes
(...complete output...)

A lot of these are useless, genuine synonyms. It would be silly to suggest that Spanish fails to preserve the English distinction between "marry" and "wed", between "ale" and "beer", between "desire" and "yearn", or between "vest" and "waistcoat". But some good possibilities remain.

Of these, some probably fail for reasons that only a Spanish-speaker would be able to supply. For instance, is "el pastel" really the best translation of both "cake" and "pie"? If so, it is an example of the type I want. But perhaps it's just a poor translation; perhaps Spanish does have this distinction; say maybe "torta" for "cake" and "empanada" for "pie". (That's what Google suggests, anyway.)

Another kind of failure arises because of idioms. The output:

        exactly o'clock en punto

is of this type. It's not that Spanish fails to distinguish between the concepts of "exactly" and "o'clock"; it's that "en punto" (which means "on the point of") is used idiomatically to mean both of those things: some phrase like "en punto tres" ("on the point of three") means "exactly three" and so, by analogy, "three o'clock". I don't know just what the correct Spanish phrases are, but I can guess that they'll be something like this.

Still, some of the outputs are suggestive:

 high tall alto low small bajo[Adjective] babble fumble balbucear[Verb] jealous zealous celoso contest debate debate[Noun] forlorn stranded desamparado[Adjective] docile meek do/cil[Adjective] picture square el cuadro fourth room el cuarto collar neck el cuello idiom language el idioma[Noun] clock watch el reloj floor ground el suelo ceiling roof el techo knife razor la navaja feather pen la pluma cloudy foggy nublado

I put some of these to Sr. Manzo, and he agreed that some were indeed ambiguous in Spanish. I wouldn't have known what to suggest without the dictionary hack.

Mon, 14 May 2007

Bryan and his posse
Today upon the arrival of a coworker and his associates, I said "Oh, here comes Bryan and his posse". My use of "posse" here drew some comment. I realized I was not completely sure what "posse" meant. I mostly knew it from old West contexts: the Big Dictionary has quotes like this one, from 1901:

A pitched battle was fought..at Rockhill, Missouri, between the Sheriff's posse and the miners on strike.
 Order Me and My Little Brain with kickback no kickback
I first ran across the word in J.D. Fitzgerald's Great Brain books. At least in old West contexts, the word refers to a gang of men assembled by some authority such as a sheriff or a marshal, to perform some task, such as searching for a lost person, apprehending an outlaw, or blasting some striking miners. This much was clear to me before.

From the context and orthography, I guessed that it was from Spanish. But no, it's not. It's Latin! "Posse" is the Latin verb "to be able", akin to English "possible" and ultimately to "potent" and related words. I'd guessed something like this, supposing English "posse" was akin to some Spanish derivative of the Latin. But it isn't; it's direct from Latin: "posse" in English is short for posse comitatus, "force of the county".

The Big Dictionary has citations for "posse comitatus" back to 1576:

Mr. Sheryve meaneth in person to repayre thither & with force to bryng hym from Aylesham, Whomsoever he fyndeth to denye the samet & suerly will with Posse Comitatus fetch hym from this new erected pryson to morrow.

"Sheryve" is "Sheriff". (If you have trouble understanding this, try reading it aloud. English spelling changed more than its pronunciation since 1576.)

I had heard the phrase before in connection with the Posse Comitatus Act of U.S. law. This law, passed in 1878, is intended to prohibit the use of the U.S. armed forces as Posse Comitatus—that is, as civilian law enforcement. Here the use is obviously Latin, and I hadn't connected it before with the sheriff's posse. But they are one and the same.

Mon, 04 Dec 2006

Clockwise
A couple of weeks ago I was over at a friend's house, and was trying to explain to her two-year-old daughter which way to turn the knob on her Etch-a-Sketch. But I couldn't tell her to turn it clockwise, because she can't tell time yet, and has no idea which way is clockwise.

It occurs to me now that I may not be giving her enough credit; she may know very well which way the clock hands go, even though she can't tell time yet. Two-year-olds are a lot smarter than most people give them credit for.

Anyway, I then began wonder what "clockwise" and "counterclockwise" were called before there were clocks with hands that went around clockwise. But I knew the answer to that one: "widdershins" is counterclockwise; "deasil" is clockwise.

Or so I thought. This turns out not to be the answer. "Deasil" is only cited by the big dictionary back to 1771, which postdates clocks by several centuries. "Widdershins" is cited back to 1545. "Clockwise" and "counter-clockwise" are only cited back to 1888! And a full-text search for "clockwise" in the big dictionary turns up nothing else. So the question of what word people used in 1500 is still a mystery to me.

That got me thinking about how asymmetric the two words "deasil" and "widdershins" are; they have nothing to do with each other. You'd expect a matched set, like "clockwise" and "counterclockwise", or maybe something based on "left" and "right" or some other pair like that. But no. "Widdershins" means "the away direction". I thought "deasil" had something to do with the sun, or the day, but apparently not; the "dea" part is akin to dexter, the right hand, and the "sil" part is obscure. Whereas the "shins" part of "widdershins" does have something to do with the sun, at least by association. That is, it is not related historically to the sun, except that some of the people using the word "widdershins" were apparently thinking it was actually "widdersun". What a mess. And the words have nothing to do with each other anyway, as you can see from the histories above; "widdershins" is 250 years older than "deasil".

The OED also lists "sunways", but the earliest citation is the same as the one for deasil.

Anyway, I did not know any of this at the time, and imagined that "deasil" meant "in the direction of the sun's motion". Which it is; the sun goes clockwise through the sky, coming up on the left, rising to its twelve-o'-clock apex, and then descending on the right, the way the hands of a clock do. (Perhaps that's why the early clockmakers decided to make the hands of the clock go that way in the first place. Or perhaps it's because of the (closely related) reason that that's the direction that the shadow on a sundial moves.)

And then it hit me that in the southern hemisphere, the sun goes the other way: instead of coming up on the left, and going down on the right, the way clock hands do, it comes up on the right and goes down on the left. Wowzers! How bizarre.

I'm a bit sad that I figured this out before actually visiting the southern hemisphere and seeing it for myself, because I think I would have been totally freaked out on that first morning in New Zealand (or wherever). But now I'm forewarned that the sun goes the wrong way down there and it won't seem so bizarre when I do see it for the first time.

Mon, 27 Nov 2006

Several people wrote to complain that I mismatched the cities and the nicknames in this sentence:

The American League [has] the Boston Royals, the Kansas City Tigers, the Detroit Indians, the Oakland Orioles...

My apologies for the error. It should have been the Boston Tigers, the Kansas City Indians, the Detroit Orioles, and the Oakland Royals.

Phil Varner reminded me that the Chicago Bulls are in fact a "local color" name; they are named in honor of the Chicago stockyards.

This raises a larger point, brought up by Dave Vasilevsky: My classification of names into two categories conflates some issues. Some names are purely generic, like the Boston Red Sox, and can be transplanted anywhere. Other names are immovable, like the Philadelphia Phillies. In between, we have a category of names, like the Bulls, which, although easily transportable, are in fact local references.

The Milwaukee Brewers are a good baseball example. The Brewers were named in honor of the local German culture and after Milwaukee's renown as a world center of brewing. Nobody would deny that this is a "local color" type name. But the fact remains that many cities have breweries, and the name "Brewers" would work well in many places. The Philadelphia Brewers wouldn't be a silly name, for example. The only place in the U.S. that I can think of offhand that fails as a home for the Brewers is Utah; the Utah Brewers would be a bad joke. (This brings us full circle to the observation about the Utah Jazz that inspired the original article.)

The Baltimore Orioles are another example. I cited them as an example of a generic and easily transportable name. But the Baltimore Oriole is in fact a "local color" type name; the Baltimore Oriole is named after Lord Baltimore, and is the state bird of Maryland. (Thanks again to Dave Vasilevsky and to Phil Gregory for pointing this out.)

Or consider the Seattle Mariners. The name is supposed to suggest the great port of Seattle, and was apparently chosen for that reason. (I have confirmed that the earlier Seattle team, the Seattle Pilots, was so-called for the same reason.) But the name is transportable to many other places: it's easy to imagine alternate universes with the New York Mariners, the Brooklyn Mariners, the San Francisco Mariners, or the Boston Mariners. Or even all five.

And similarly, although in the previous article I classed the New York Yankees with the "local color" names, based on the absurdity of the Selma or the Charleston Yankees, the truth is that the Boston Yankees only sounds strange because it didn't actually happen that way.

I thought about getting into a tremendous cross-check of all 870 name-city combinations, but decided it was too much work. Then I thought about just classing the names into three groups, and decided that the issue is too complex to do that. For example, consider the Florida Marlins. Local color, certainly. But immovable? Well, almost. The Toronto Marlins or the Kansas City Marlins would be jokes, but the Tampa Bay Marlins certainly wouldn't be. And how far afield should I look? I want to class the Braves as completely generic, but consideration of the well-known class AA Bavarian League Munich Braves makes it clear that "Braves" is not completely generic.

So in ranking by genericity, I think I'd separate the names into the following tiers:

1. Pirates, Cubs, Reds, Cardinals, Giants, Red Sox, Blue Jays, White Sox, Tigers, Royals, Athletics
2. Braves, Mets, Dodgers, Orioles, Yankees, Indians, Angels, Mariners, Nationals, Brewers
3. Marlins, Astros, Diamondbacks, Rockies, Padres, Devil Rays, Twins
4. Phillies, Rangers
The Texas Rangers are a bit of an odd case. Rangers ought to be movable—but the name loses so much if you do. You can't even move the name to Arlington (the Arlington Rangers?), and the Rangers already play in Arlington. So I gave them the benefit of the doubt and put them in group 4.

Readers shouldn't take this classification as an endorsement of the Phillies' nickname, which I think is silly. I would have preferred the Philadelphia Brewers. Or even the Philadelphia Cheese Steaks. Maybe they didn't need the extra fat, but wouldn't it have been great if the 1993 Phillies had been the 1993 Cheese Steaks instead? Doesn't John Kruk belong on a team called the Cheese Steaks?

Another oddity, although not from baseball: In a certain sense, the Montreal Canadiens have an extremely generic name. And yet it's clearly not generic at all!

Fri, 24 Nov 2006

Etymological oddity
Sometimes you find words that seem like they must be related, and then it turns out to be a complete coincidence.

Consider pen and pencil.

Pen is from French penne, a long feather or quill pen, akin to Italian penne (the hollow, ribbed pasta), and ultimately to the word feather itself.

Pencil is from French pincel, a paintbrush, from Latin peniculus, also a brush, from penis, a tail, which is also the source of the English word penis.

A couple of weeks ago someone edited the Wikipedia article on "false cognates" to point out that day and diary are not cognate. "No way," I said, "it's some dumbass putting dumbassery into Wikipedia again." But when I checked the big dictionary, I found that it was true. They are totally unrelated. Diary is akin to Spanish dia, Latin dies, and other similar words, as one would expect. Day, however, is "In no way related to L. dies..." and is akin to Sanskrit dah = "to burn", Lithuania sagas = "hot season", and so forth.

Wed, 22 Nov 2006

Baseball team nicknames
Lorrie and I were in the car, and she noticed another car with a Detroit Pistons sticker. She remarked that "Pistons" was a good name for a basketball team, and particularly for one from Detroit. I agreed. But then she mentioned the Utah Jazz, a terrible mismatch, and asked me how that happened to be. Even if you don't know, you can probably guess: They used to be the New Orleans Jazz, and the team moved to Utah. They should have changed the name to the Teetotalers or the Salt Flats or something, but they didn't, so now we have the Utah Jazz. I hear that next month they're playing the Miami Fightin' Irish.

That got us thinking about how some sports team names travel, and others don't. Jazz didn't. The Miami Heat could trade cities or names with the Phoenix Suns and nobody would notice. But consider the Chicago Bulls. They could pick up and move anywhere, anywhere at all, and the name would still be fine, just fine. Kansas City Bulls? Fine. Honolulu Bulls? Fine. Marsaxlokk Bulls? Fine.

We can distinguish two categories of names: the "generic" names, like "Bulls", and the "local color" names, like "Pistons". But I know more about baseball, so I spent more time thinking about baseball team names.

In the National League, we have the generic Braves, Cardinals, Cubs, Giants, Pirates, and Reds, who could be based anywhere, and in some cases were. The Braves moved from Boston to Milwaukee to Atlanta, although to escape from Boston they first had to change their name from the Beaneaters. The New York Giants didn't need to change their name when they moved to San Francisco, and they won't need to change their name when they move to Jyväskylä next year. (I hear that the Jyväskylä city council offered them a domed stadium and they couldn't bear to say no.)

On the other hand, the Florida Marlins, Arizona Diamondbacks, and Colorado Rockies are clearly named after features of local importance. If the Marlins were to move to Wyoming, or the Rockies to Nebraska, they would have to change their names, or turn into bad jokes. Then again, the Jazz didn't change their name when they moved to Utah.

The New York Mets are actually the "Metropolitans", so that has at least an attempt at a local connection. The Washington Nationals ditto, although the old name of the Washington Senators was better. At least in that one way. Who could root for a team called the Washington Senators? (From what I gather, not many people could.)

The Nationals replaced the hapless Montreal Expos, whose name wasn't very good, but was locally related: they were named for the 1967 Montreal World's Fair. Advice: If you're naming a baseball team, don't choose an event that will close after a year, and especially don't choose one that has already closed.

The Houston Astros, and their Astrodome filled with Astroturf, are named to recall the NASA manned space center, which opened there in 1961. The Philadelphia club is called the Phillies, which is not very clever, but is completely immovable. Boston Phillies, anyone? Pittsburgh Phillies? New York Phillies? No? I didn't think so.

I don't know why the San Diego Padres are named that, but there is plenty of Spanish religious history in the San Diego area, so I am confident in putting them in the "local color" column. Milwaukee is indeed full of Brewers; there are a lot of Germans up there, brewing up lager. (Are they back in the National League again? They seem to switch leagues every thirty years.)

That leaves just the Los Angeles Dodgers, who are a bit of an odd case. The team, as you know, was originally the Brooklyn Dodgers. The "Dodgers" nickname, as you probably didn't know, is short for "Trolley Dodgers". The Los Angeles Trolley Dodgers is almost as bad a joke as the Nebraska Rockies. Fortunately, the "Trolley" part was lost a long time ago, and we can now imagine that the team is the Los Angeles Traffic Dodgers. So much for the National League; we have six generic names out of 16, counting the Traffic Dodgers in the "local color" group, and ignoring the defunct Expos.

The American League does not do so well. They have the Boston Royals, the Kansas City Tigers, the Detroit Indians, the Oakland Orioles, and three teams that are named after sox: the Red, the White, and the Athletics.

Then there are the Blue Jays. They were originally owned by Labatt, a Canadian brewer of beer, and were so-named to remind visitors to the park of their flagship brand, Labatt's Blue. I might have a harder time deciding which group to put them in, if it weren't for the (1944-1945) Philadelphia Blue Jays. If the name is generic enough to be transplanted from Toronto to Philadelphia, it is generic. I have no idea what name the Toronto club could choose if they wanted to avail themselves of the "local color" option rather than the "generic" option; it's tempting to make a cruel joke and suggest that the name most evocative of Toronto would be the Toronto Generics. But no, that's unfair. They could always call their baseball club the Toronto Hockey Fans.

Anyway, moving on, we have the New York Yankees, which is not the least generic possible name, but clearly qualifies as "local color" once you pause to think about the Charleston Yankees, the Shreveport Yankees, and the Selma Yankees. The Tampa Bay Devil Rays are clearly "local color". The Minnesota Twins play in the Twin Cities of Minneapolis and St. Paul. The California, Anaheim, or Los Angeles Angels, whatever they're called this week, are evidently named for the city of Los Angeles. I would ridicule the Los Angeles Angels for having a redundant name, but as an adherent of the Philadelphia Phillies, I am living in a glass house.

The Texas Rangers are named for the famous Texas Rangers. I don't know exactly why the Seattle club is named Mariners; I wouldn't have considered Seattle to be an unusually maritime city, but their previous team was the Seattle Pilots, so the folks in Seattle must think of themselves so, and I'm willing to go along with it.

The tally for the American League is therefore eight generic, six local color. The total for Major League Baseball as a whole is 14 generic names out of 30.

This is a lot better than the Japanese Baseball League, which has a bunch of teams with names like the Lions, Tigers, Dragons, Giants, and Fighters. They make up for this somewhat in the names of the teams' corporate sponsors, so, for example, the Nippon Ham Fighters. They are sponsored by Nippon Ham, which does not make it any less funny. And the Yakult Swallows, which, if you interpret it as a noun phrase, sounds just a little bit like a gay porn flick set in Uzbekistan.

Incidentally, my favorite team name is the Wilmington Blue Rocks. The Blue Rocks' mascot is, alas, not a rock but a moose. Sometimes I dream of a team from Lansing, Michigan, called the Lansing Boils, but I know it will remain an unfulfilled fantasy.

[ Warning for non-Americans: Almost, but not quite everything in this article is the truth. Marsaxlokk does not actually have a Major League baseball club yet; however, they do have a class-A affiliate in the Mediterranean league, called the Marsaxlokk Moghzaskops. Also, the Giants are not scheduled to move to Jyväskylä until after the 2008 season. ]

[ Addendum 20061127: There is a followup article to this one. ]

Sat, 07 Oct 2006

Bone names
Names of bones are usually Latin. They come in two types. One type is descriptive. The auditory ossicles (that's Latin for "little bones for hearing") are named in English the hammer, anvil, and stirrup, and their formal, Latin names are the malleus ("hammer"), incus ("anvil"), and stapes ("stirrup")

The fibula is the small bone in the lower leg; it's named for the Latin fibula, which is a kind of Roman safety pin. The other leg bone, the tibia, is much bigger; that's the frame of the pin, and the fibula makes the thin sharp part.

The kneecap is the patella, which is a "little pan". The big, flat parietal bone in the skull is from paries, which is a wall or partition. The clavicle, or collarbone, is a little key.

"Pelvis" is Latin for "basin". The pelvis is made of four bones: the sacrum, the coccyx, and the left and right os innominata. Sacrum is short for os sacrum, "the sacred bone", but I don't know why it was called that. Coccyx is a cuckoo bird, because it looks like a cuckoo's beak. Os innominatum means "nameless bone": they gave up on the name because it doesn't look like anything. (See illustration to right.)

On the other hand, some names are not descriptive: they're just the Latin words for the part of the body that they are. For example, the thighbone is called the femur, which is Latin for "thigh". The big lower arm bone is the ulna, Latin for "elbow". The upper arm bone is the humerus, which is Latin for "shoulder". (Actually, Latin is umerus, but classical words beginning in "u" often acquire an initial "h" when they come into English.) The leg bone corresponding to the ulna is the tibia, which is Latin for "tibia". It also means "flute", but I think the flute meaning is secondary—they made flutes out of hollowed-out tibias.

Some of the nondescriptive names are descriptive in Latin, but not in English. The vertebra in English are so called after Latin vertebra, which means the vertebra. But the Latin word is ultimately from the verb vertere, which means to turn. (Like in "avert" ("turn away") and "revert" ("turn back").) The jawbone, or "mandible", is so-called after mandibula, which means "mandible". But the Latin word is ultimately from mandere, which means to chew.

The cranium is Greek, not Latin; kranion (or κρανιον, I suppose) is Greek for "skull". Sternum, the breastbone, is Greek for "chest"; carpus, the wrist, is Greek for "wrist"; tarsus, the ankle, is Greek for "instep". The zygomatic bone of the face is yoke-shaped; ζυγος ("zugos") is Greek for "yoke".

The hyoid bone is the only bone that is not attached to any other bone. (It's located in the throat, and supports the base of the tongue.) It's called the "hyoid" bone because it's shaped like the letter "U". This used to puzzle me, but the way to understand this is to think of it as the "U-oid" bone, which makes sense, and then to remember two things. First, that classical words beginning in "u" often acquire an initial "h" when they come into English, as "humerus". And second, classical Greek "u" always turns into "y" in Latin. You can see this if you look at the shape of the Greek letter capital upsilon, which looks like this: Υ. Greek αβυσσος ("abussos" = "without a bottom") becomes English "abyss"; Greek ανωνυμος ("anonumos") becomes English "anonymous"; Greek υπος ("hupos"; there's supposed to be a diacritical mark on the υ indicating the "h-" sound, but I don't know how to type it) becomes "hypo-" in words like "hypothermia" and "hypodermic". So "U-oid" becomes "hy-oid".

(Other parts of the body named for letters of the alphabet are the sigmoid ("S-shaped") flexure of the colon and the deltoid ("Δ-shaped") muscle in the arm. The optic chiasm is the place in the head where the optic nerves cross; "chiasm" is Greek for a crossing-place, and is so-called after the Greek letter Χ.)

The German word for "auditory ossicles" is Gehörknöchelchen. Gehör is "for hearing". Knöchen is "bones"; Knöchelchen is "little bones". So the German word, like the Latin phrase "auditory ossicles", means "little bones for hearing".

Mon, 10 Jul 2006

Phrasal verbs
My mom teaches English to visiting foreign students, and last time I met her she was talling me about phrasal verbs. A phrasal verb is a verb that incorporates a preposition. Examples include "speed up", "try out", "come across", "go off", "turn down". The prepositional part is uninflected, so "turns down", "turned down", "turning down", not *"turn downs", *"turn downed", *"turn downing". My mom says she uses a book that has a list of all of them; there are several hundred. She was complaining specifically about "go off", which has an unusually peculiar meaning: when the alarm clock goes off in the morning, it actually goes on.

This reminded me that "slow up" and "slow down" are synonymous. And there is "speed up", but no "speed down". And you cannot understand "stand down" by analogy with "stand up", "sit up", and "sit down". And you also cannot understand "nose job" by analogy with "hand job". But I digress.

One of the things about the phrasal verbs that gives the foreign students so much trouble is that the verbs don't all obey the same rules. For example, some are separable and some not. Consider "turned down". I can turn down the thermostat, but I can also turn the thermostat down. And I can try out my new game, and I can also try my new game out. And I can stand up my blind date, and I can stand my blind date up. But while I can come across a fountain in the park, I can't *come a fountain across in the park. And while I can go off to Chicago, I can't *go to Chicago off. There's no way to know which of these work and which not, except just by memorizing which are allowed and which not.

And sometimes the separable ones can't be unseparated. I can give back the map, and I can give the map back, and I can give it back, but I can't *give back it. I can hold up the line, and I can hold the line up, and I can hold us up, but I can't *hold up us. I don't know what the rule is exactly, and I don't want to go to the library again to get the Cambridge Grammar, because last time I did that I dropped it on my toe.

I hadn't realized any of this until I read this article about them, but when I did, I had a sudden flash of insight. I had not realized before what was going on when someone set up us the bomb. "Set up" is separable: I can set up the bomb, or set the bomb up, or someone can set us up. But "us", as noted above, is not deseperable, so you cannot have *set up us. But I think I understand the mistake better now than I did before; it seems less like a complete freak and more like a member of a common type of error.

Wed, 05 Apr 2006

TeX and the long S
It just occurs to me, reading today's article, that the final sentence is one of the strangest I've written in quite a while. It says:

stock TeX does not have any way to make a long medial s.

This is a strange thing to say because TeX was principally designed as a mathematical typesetting system, and one of the most common of all mathematical notations is the integral sign:

$$\int_a^b f'(x) dx = f(b) - f(a)$$

And the integral sign $$\int$$ is nothing more than an old-style long s; the 's' is for 'sum'.

Strange or not, the substance of my remark is correct, since standard TeX's fonts do not provide a long s in a size suitable for use in running text in place of a regular s.

On baroque long S
Jokes about the long medial 's' are easy to make. Stan Freberg's album Stan Freberg Presents: The United States of America, Volume I: The Early Years has a scene in which John Adams or Benjamin Franklin or one of those guys is reading Thomas Jefferson's draft of the Declaration of Independence: "'Life, liberty, and the purfuit of happinefs'? Tom, all your s's look like f's!"

A story by Frances Warfield, appropriately titled "Fpafm", gets probably as much juice out of the joke as there is to be got. I believe the copyright has expired, so here it is, in its entirety:

## FPAFM

### by Frances Warfield

I ordered ham and eggs, as I always do on the diner, and then, as I always do, looked around for pamphlets. There was one handy, "Echoes of Colonial Days," it was called, "being a little fouvenir iffued from time to time, for the benefit of the guefts of The Baltimore & Ohio Railroad Company as a reminder of the pleafant moments fpent..." Involuntarily, my lips began to move. I reached for a pencil. But the man across from me already had his pencil out. He had written:

"Oh, fay can you fee?"

I said, "Fing Fomething Fimple."

"Filly, ifn't it?" he said, and kept on writing.

I wrote: "Fing a Fong of Fixpence."

"Oh, ftop the fongs," he said, "Too eafy." He wrote: "The Courtfhip of Miles Fandifh," "I fee a fquirrel," "I undereftimate ftatefmanfhip," "My fifter feems fuperfenfitive," and, seeing that I did not appreciate the last one, which he evidently thought very fine, he wrote: "Forry to fee you fo ftupid."

I ate my lunch grouchily. How could I help it if he was in practice and I was not? He had probably taken this train before.

"Pafs the falt," I said.

"Pleafe pafs the falt," he triumphed.

I paid no attention. "Waiter!" I said. The waiter did not budge.

"You muft fpeak the language," said the man opposite me. "Fay, Fteward!"

The waiter jumped to attention. "Fir?" he said.

"Pleafe fill the faltcellar."

"The faltcellar fhall be replenifhed inftantly," replied the waiter, with a superior gleam in his eyes.

I smiled and my companion unbent a little. "Let's try for hard ones," he invited.

"Farcafm," he said.

"Fubftance."

"Fubfiftence," he scored.

"Fcythe."

"S's inside now," he ruled.

Perfuafive," I said instantly.

"Lanfguifh."

"Bafilifk."

"Quiefcent."

"Nonfenfe," I finished. "Fon of a fpeckled fea monfter."

"Ftep-fon of a poifonous fnake!" he cried.

"You don't fay fo!" I retorted.

"I do fay fo," he replied, getting up and leaving the diner.

"Fool!" I called after him, fniffiling.

Well, fo much for that.

Reading Baroque scientific papers, you see a lot of long-medial-s. Opening to a random page of the Philosophical Experiments and Observations of Robert Hooke, for example, we have:

The ſecond Experiment, was made, to ſhew a Way, how to find the true and comparative Expanſion of any metal, when melted, and ſo to compare it both with the Expanſion of the ſame metal, when ſolid, and likewiſe with the Expanſion of any other, either fluid or ſolid Body.

As I read more of this sort of thing, I went through several phases. At first it I just found it confusing. Then later I started to get good at reading the words with f's instead of s's and it became funny. ("Fhew! Folid! Hee hee!") Then it stopped being funny, although I still noticed it and found it quaint and charming. Also a constant reminder of how learned and scholarly I am, to be reading this old stuff. (Yes, I really do think this way. Pathetic, isn't it? And you are an enabler of this pathetic behavior.) Then eventually I didn't notice it any more, except in a few startling cases, such as when Dr. Hooke wrote on the tendency of ice to incorporate air bubbles while freezing, and said "...at the ſame time it may not be ſaid to ſuck it in".

What hasn't happened, however: it hasn't become completely transparent. The long s really does look a lot like an f, so much so that I can find it confusing when the context doesn't help me out. The fact that these books are always facsimiles and that the originals were printed on coarse paper and the ink has smudged, does not make it any easier to tell when one is looking at an s and when at an f. So far, the most difficult instance I have encountered involved a reference to "the Learned Dr. Voſſius". Or was it Voffius? Or Vofſius? Or was it Voſfius? Well, I found out later it was indeed Vossius; this is Dr. Gerhard Johann Voss (1577-1649), Latinized to "Vossius". But I was only able to be sure because I encountered the name somewhere else with the short s's.

This typographic detail raises a question of scholarly ethics that I don't know how to answer. In an earlier article, I needed to show how 17th-century writers referred to dates early in the year, which in common nomenclature occurred during one year, but which legally were part of the preceding year. Simply quoting one of these writers wasn't enough, because the date was disambiguated typographically, with the digit for the legal year directly above the digit for the conventional year. So I programmed TeX to demonstrate the typography:

But this raised another problem: to what degree should I reproduce the original typography? There is a scale here of which substitutions are more or less permissible:

1. Most permissible is to replace the original 17th-century font with a modern one.

2. Slightly less permissible would be to reduce the heavy 17th-century usage of italic face, in Royal Society for example, replacing it with roman typefaces.

3. Slightly less permissible still would be to replace the 17th-century capitalization conventions with 20th-century conventions. For example, in C20 we would not capitalize "Remark".

4. Then can I replace obsolete 17th-century contractions such as "consider'd" with 20th-century equivalents such as "considered"? If that is acceptable, then what about "'tis"? Can I replace "3dly" with "thirdly"?

5. Can I replace obsolete Baroque spellings such as "plaister", "fatt", and "it self" with "plaster", "fat", and "itself"?

6. Can I replace obsolete Baroquisms such as "strow'd" in "strow'd on Ice" with "strewn", or "stopple" with "stopper"?

7. At the bottom of the list, I could just rewrite the whole thing in a modern style and pass it off as what Derham actually wrote.
It seems to me that replacing the long medial s's with short ones is toward the top of this scale. By doing this, I am not changing the spelling, because a long medial s is still an s; I am just replacing one s with another, and this is akin to changing the font. And anyway, my choice was forced, because stock TeX does not have any way to make a long medial s.

Sun, 12 Mar 2006

Naomi Wolf and Big Ethel
Aaron Swartz has done a text search of The Beauty Myth and concluded that Wolf never intended Big Ethel to serve as an example of intelligence, contrary to what I asserted in my previous article. M. Swartz says:

Judging from a search on Amazon, the only time Ethel is mentioned is in the context of noting that an attractive woman is often paired with an unattractive one: "... Veronica and Ethel in Riverdale; ... and so forth. Male culture seems happiest to imagine two women together when they are defined as being one winner and one loser in the beauty myth." (59f)

I still question the aptness of the example, since, again, the principal case in which two women are imagined together in Archie comics is not Veronica and Ethel, but Veronica and Betty, both of whom are portrayed as "winners". Betty and Veronica are major characters; Ethel is not. But the error isn't nearly as serious as the one I said Wolf had made.

The most serious error here is mine: I should have considered and discussed the possibility that my friend was misquoting Wolf. That I didn't do this was unfair to Wolf and entirely my fault. Since I haven't read the book myself, I should have realized what shaky ground I was on, and taken pains to point this out. And yet other possibilities are:

• That my friend didn't misquote Wolf at all, and I misunderstood her at the time, or
• that my friend correctly quoted Wolf and I understood her at the time, but my memory of the episode (which occurred around 1993) is faulty.
I took Vallely to task for poor research and for failing to pick up a dictionary to confirm some of his assertions. Had I taken my own advice, I would have checked to see what Wolf said before commenting on it. My disclaimer in the original article that I had not read the book relieves me of only part of the responsibility for this failure.

On saying too much, or, bad things come in threes
Long ago, I had a conversation with a woman who had recently read Naomi Wolf's book The Beauty Myth. She was extolling the book, which I had not read, and mentioned that Wolf had an extensive discussion of the popular dichotomy between beauty and intelligence. She told me that Wolf had cited Archie comics as containing an example of this dichotomy, in the characters of Veronica and Big Ethel.

I had been nodding and agreeing up to that point. But at the mention of Big Ethel I was quite startled, and said that that spoiled the argument for me, and made me doubt the conclusion. I now had doubts about what had seemed so plausible a moment before.

Veronica is indeed one half of a contrasting pair in Archie comics. But Veronica and Big Ethel? No. Veronica is not complementary to Big Ethel. The counterpart of Veronica is Betty. The contrast is not between beauty and brains but between rich and poor, and between their derived properties, spoiled and sweet. A good point could be made about Veronica and Betty, but it was not the point that Wolf wanted to make; her citation of Veronica and Big Ethel as exemplifying the opposition of beauty and intelligence was just bizarre. Big Ethel, to my knowledge, has never been portrayed as unusually intelligent. She is characterized by homeliness and by her embarrassing and unrequited attraction to Jughead, not by intelligence.

Why would this make me doubt the conclusion of Wolf's argument? Because I had been fully ready to believe the conclusion, that our culture manufactures a division between attractiveness and intelligence for women, and makes them choose one or the other. I had imagined that it would be easy to produce examples demonstrating the point. But the example Wolf chose was completely inept. And, as I said at the time, "Naomi Wolf is very smart, and has studied this closely and thought about it for a long time. If that is the best example that she can come up with, then perhaps I'm wrong, and there really aren't as many examples as I thought there would be." Without the example, I would have agreed with the conclusion. With the example, intended to support the conclusion, I wasn't so sure.

Now, I come to the real point of this note. Paul Vallely has written an article for The Independent on "How Islamic inventors changed the world". He lists twenty of the most influential contributions of the Muslim world, including the discovery of coffee, inoculation, and the fountain pen. I am not so clear on the history of the technology here. Some of it I know is correct; some is plausible; some is extremely dubious. (The crank, not invented before 1206? Please.) But the whole article is spoiled for me, except as a topic of derision, because of three errors.

Item #1 concerns the discovery of the coffee bean. One might expect this to have been discovered in prehistoric times by local Ethiopians, long before the founding of Islam. But I'm in no position to argue with it, and I was ready to give Vallely the benefit of the doubt.

Item #2 on Vallely's list was more worrying. It says "Ibn al-Haitham....set up the first Camera Obscura (from the Arab word qamara for a dark or private room)." It may or may not be true that "qamara" is an "Arab word" (by which I suppose Vallely means an "Arabic word") for "chamber", but it is certainly true that this word, if it exists, is not the source of the English word "camera". I don't know from "qamara", but "camera obscura" is Latin for "dark chamber". "Camera" means "chamber" in Latin and has for thousands of years. The two words, in fact, are etymologically the same, which is why they have almost the same spelling. It is for this reason that the part of a legal hearing held in the judge's private chambers is said to be "in camera".

There might be an Arabic word "qamara", for all I know. If there is, it might be derived from the Latin. (The Latin word is not derived from Arabic, either; it is from Greek καμαρα, which refers to anything with an arched cover.) Two things are sure: The English word "camera" is not derived from Arabic, and Vallely did not bother to pick up a dictionary before he said that it was.

Anyone can make a mistake. But I started to get excited when I read item 3, which is about the game of chess. Vallely says "The word rook comes from the Persian rukh, which means chariot." This is true, sort of, but it is off in a subtle way. The rooks or castles of modern chess did start out as chariots. (Moving castles around never did make much sense.) And "rook" is indeed from Persian rukh. But rukh doesn't exactly mean a chariot. It means a chariot in the game of chess. The Persian word for a chariot outside of chess was different. (I don't remember what it was.) Saying that rukh is the Persian word for chariot is like saying that "rook" is the English word for castle.

I was only on item 3 and had already encountered one serious error of etymology and one other item which although it wasn't exactly an error, was peculiar. I considered that I wouldn't really have enough material for a blog post, unless Vallely made at least one more serious mistake. But there were still 17 of 20 items left. So I read on. Would Vallely escape?

No, or I would not have written this article. Item 17 says "The modern cheque comes from the Arabic saqq, a written vow to pay for goods when they were delivered...". But no. The correct etymology is fascinating and bizarre. "Cheque" is derived from Norman French "exchequer", which was roughly the equivalent of the treasury and internal revenue department in England starting around 1300. Why was the internal revenue department called the exchequer? Because it was named after the chessboard, which was also called "exchequer".

What do chessboards have to do with internal revenue? Ah, I am glad you wondered. Hindu-Arabic numerals had not yet become popular in Europe; numbers were still recorded using Roman numerals. It is extremely difficult to calculate efficiently with Roman numerals. How, then did the internal revenue department calculate taxes owed and amounts payable?

They used an abacus. But it wasn't an abacus like modern Chinese or Japanese abacuses, with beads strung on wires. A medieval European abacus was a table with a raised edge and a grid of squares ruled on it. The columns of squares represented ones, tens, hundreds, and so on. You would put metal counters, called jettons, on the squares to represent numbers. Three jettons on a "hundred" square represented three hundred; four jettons on the square to its right represented forty. Each row of squares recorded a separate numeral. To add two numerals together, just take the jettons from one row, move them to the other row, and then resolve the carrying appropriately: Ten jettons on a square can be removed and replaced with a single jetton on the square to the left.

The internal revenue department, the "exchequer", got its name from these counting-boards covered with ruled squares like chessboards.

(The word "exchequer" meaning a chessboard was derived directly from the name of the game: Old French eschecs, Medieval Latin scacci, and so on, all from shah, which means "king" in Persian. The word "checkered" is also closely related.)

So, in summary: the game is "chess", or eschek in French; the board is therefore exchequer, and since the counting-tables of the treasury department look like chessboards, the treasury department itself becomes known as the exchequer. The treasury department, like all treasury departments, issues notes promising to pay certain sums at certain times, and these notes are called "exchequer notes" or just "exchequers", later shortened (by the English) to "cheques" or (by Americans) to "checks". Arabic saqq, if there is such a word, does not come into it. Once again, it is clear that Vallely's research was shoddy.

While I was writing up this article, yet another serious error came to light. Item 11 says "The windmill was invented in 634 for a Persian caliph...". Now, I am not very knowledgeable about history, and my historical education is very poor. But that was so peculiar that it startled even me. 634 seemed to me much too early for any clever inventions to be attributed to Muslims. Then I looked it up, and so it was. Muhammad himself had only died in 632.

As for the Persian caliph Vallely mentions, he did not exist. The caliphs are the successors of Muhammad, so of course there was one in 634---the first one, in fact. Abu Bakr reigned from the death of the Prophet in 632 until his own death in 634; he was succeeded by Umar. Neither was Persian. They were both Arabs, as you would expect of Muslim leaders in 634. There were no Persian caliphs in 634.

My own ignorance of Islam and its history is vast and deep, but at least I had a vague idea that 634 was extremely early. Vallely could have looked up the date of the founding of the caliphate as easily as I did. Why didn't he? Well, perhaps it was just a typo, and should have said 834 or 934. In that case it's just poor editing and inattention. But perhaps it was a genuine factual error, in which case Vallely was not only not paying attention, but is apparently even less familiar with Islamic history than I am, difficult as that is to achieve. In which case we have this article about the twenty greatest contributions of Islam written by a guy who literally does not know the first thing about Islam.

And so this article, which I hoped to enjoy, was spoiled by a series of errors. I am very sympathetic to the idea that the brilliant history of Islamic science and engineering has been neglected by European scholarship. One of my very first blog posts was about the Islamic use of algebra to solve complex probate problems. Just last week I was reading about al-Biruni's invention, around 1000 years ago, of an improved method for measuring the size of the earth, a topic that Vallely treats as item 18. But after reading Vallely's article, I worried a bit that the case might have been overstated. Perhaps the contributions of Muslims are not as large as I had thought?

Fortunately, there was an alternative: the conclusion is correct, and the inept support from the author speaks only to the author's ineptness, not to the validity of the conclusion. I did not have that alternative with Naomi Wolf, who is not inept. (Also, see this addendum.)

With only cursory attention, I found three major errors of fact in this one short article. How many more did I miss, I wonder? Did Abbas ibn Firnas really invent a working parachute, as Vallely says? Maybe it was someone else. Maybe there was no parachute. Maybe there was, but it didn't work. Maybe the whole thing is a propaganda invention by someone who wants to promote Islam, and has suckered Vallely into repeating fiction. Maybe all of these. Someone knows the truth, but it isn't me, and I can't trust Vallely.

Were the Turks vaccinating people eighty years before the Europeans, or did Vallely swallow a tall tale? I don't know, and I can't trust Vallely.

People sometimes joke "I am stupider for having read this," but I really believe this was the case here. The article is worse than useless, because it has polluted my brain with a lot of unreliable non-information. I will have to be careful not to think that quilted fabrics were first brought to Europe by the crusaders, who got them from the Muslims. My real fear is that the "fact" will remain in my brain for years, long after I have forgotten how unreliable Vallely is, and that I will bring it out again as real information, which it is not. True or not, it is too unreliable to be information.

The best I can hope for now is that I will forget everything Vallely says, and meet the true parts again somewhere else in the future. In the meantime, I am worse off for having read it.

Thu, 02 Feb 2006

Petard corrections
Eric Cholet has written in to mention that he is familiar with the fried choux pastry that I mentioned yesterday, but under the name pets de nonne, not pets de soeurs, as I said. (Nonne, of course, is "nun". The word soeur is literally "sister", but in this context means "nun". ) I had cited On Food and Cooking as mentioning pets de soeurs, but it agrees with Eric, not with me.

It appears, though, that many people do use the name pets de soeurs to refer to these fritters, and some people also use it to refer to a kind of soda-raised cinnamon roll. Citations to various cookbooks are available through the usual searches.

Eric also points out that petard is the current word for a firecracker, and also now refers to a doobie. I was already aware of this because pictures of those things appeared when I did Google image seach for petard. Thank you, Eric.

Tue, 31 Jan 2006

Petard
A petard is a Renaissance-era bomb, basically a big firecracker: a box or small barrel of gunpowder with a fuse attached. Those hissing black exploding spheres that you see in Daffy Duck cartoons are petards. Outside of cartoons, you are most likely to encounter the petard in the phrase "hoist with his own petard", which is from Hamlet. Rosencrantz and Guildenstern are being sent to England with the warrant for Hamlet's death; Hamlet alters the warrant to contain R&G's names instead of his own. "Hoist", of course, means "raised", and Hamlet is saying that it is amusing to see someone screw up his own petard and blow himself sky-high with it.

 Order On Food and Cooking with kickback no kickback
This morning I read in On Food in Cooking that there's a kind of fried choux pastry called pets de soeurs ("nuns' farts") because they're so light and delicate. That brought to mind Le Pétomane, the world-famous theatrical fartmaster. Then there was a link on reddit titled "Xmas Petard (cool gif video!)" which got me thinking about petards, and it occurred to me that "petard" was probably akin to pets, because it makes a bang like a fart. And hey, I was right; how delightful.

Another fart-related word is "partridge", so named because its call sounds like a fart.

Thu, 26 Jan 2006

"Farther" vs. "further"
People mostly use "farther" and "further" interchangeably. What's the difference?

I looked it up in the dictionary, and it turns out it's simple. "Farther" means "more far". "Further" means "more forward".

"Further" does often connote "farther", because something that is further out is usually farther away, and so in many cases the two are interchangeable. For example, "Hitherto shalt thou come, but no further" (Job 38:11.)

But now when I see people write things like China Steps Further Back From Democracy (The New York Times, 26 November 1995) or, even worse, Big Pension Plans Fall Further Behind (Washington Post, 7 June 2005) it freaks me out.

Google finds 3.2 million citations for "further back", and 9.5 million for "further behind", so common usage is strongly in favor of this. But a quick check of the OED does not reveal much historical confusion between these two. Of the citations there, I can only find one that rings my alarm bell. ("1821 J. BAILLIE Metr. Leg., Wallace lvi, In the further rear.")

Thu, 01 Jan 1970

Last night I gave a talk for the New York Perl Mongers, and got to see a number of people that I like but don't often see. Among these was Michael Fischer, who told me of a story about myself that I had completely forgotten, but I think will be of general interest.

 Order Oulipo Compendium with kickback no kickback

The front end of the story is this: Michael first met me at some conference, shortly after the publication of Higher-Order Perl, and people were coming up to me and presenting me with copies of the book to sign. In many cases these were people who had helped me edit the book, or who had reported printing errors; for some of those people I would find the error in the text that they had reported, circle it, and write a thank-you note on the same page. Michael did not have a copy of my book, but for some reason he had with him a copy of Oulipo Compendium, and he presented this to me to sign instead.

Oulipo is a society of writers, founded in 1960, who pursue “constrained writing”. Perhaps the best-known example is the lipogrammatic novel La Disparition, written in 1969 by Oulipo member Georges Perec, entirely without the use of the letter e. Another possibly well-known example is the Exercises in Style of Raymond Queneau, which retells the same vapid anecdote in 99 different styles. The book that Michael put in front of me to sign is a compendium of anecdotes, examples of Oulipan work, and other Oulipalia.

What Michael did not realize, however, was that the gods of fate were handing me an opportunity. He says that I glared at him for a moment, then flipped through the pages, found the place in the book where I was mentioned, circled it, and signed that.

The other half of that story is how I happened to be mentioned in Oulipo Compendium.

Back in the early 1990s I did a few text processing projects which would be trivial now, but which were unusual at the time, in a small way. For example, I constructed a concordance of the King James Bible, listing, for each word, the number of every verse in which it appeared. This was a significant effort at the time; the Bible was sufficiently large (around five megabytes) that I normally kept the files compressed to save space. This project was surprisingly popular, and I received frequent email from strangers asking for copies of the concordance.

Another project, less popular but still interesting, was an anagram dictionary. The word list from Webster's Second International dictionary was available, and it was an easy matter to locate all the anagrams in it, and compile a file. Unlike the Bible concordance, which I considered inferior to simply running grep, I still have the anagram dictionary. It begins:

aal ala
aam ama
Aarhus (See arusha')
Aaronic (See Nicarao')
Aaronite aeration
Aaru aura


And ends:

zoosporic sporozoic
zootype ozotype
zyga gazy
zygal glazy


The cross-references are to save space. When two words are anagrams of one another, both are listed in both places. But when three or more words are anagrams, the words are listed in one place, with cross-references in the other places, so for example:

Ateles teasel stelae saltee sealet
saltee (See Ateles')
sealet (See Ateles')
stelae (See Ateles')
teasel (See Ateles')


saves 52 characters over the unabbreviated version. Even with this optimization, the complete anagram dictionary was around 750 kilobytes, a significant amount of space in 1991. A few years later I generated an improved version, which dispensed with the abbreviation, by that time unnecessary, and which attempted, sucessfully I thought, to score the anagrams according to interestingness. But I digress.

One day in August of 1994, I received a query about the anagram dictionary, including a question about whether it could be used in a certain way. I replied in detail, explaining what I had done, how it could be used, and what could be done instead, and the result was a reply from Harry Mathews, another well-known member of the Oulipo, of which I had not heard before. Mr. Mathews, correctly recognizing that I would be interested, explained what he was really after:

A poetic procedure created by the late Georges Perec falls into the latter category. According to this procedure, only the 11 commonest letters in the language can be used, and all have to be used before any of them can be used again. A poem therefore consists of a series of 11 multi-word anagrams of, in French, the letters e s a r t i n u l o c (a c e i l n o r s t). Perec discovered only one one-word anagram for the letter-group, "ulcerations", which was adopted as a generic name for the procedure.

Mathews wanted, not exactly an anagram dictionary, but a list of words acceptable for the English version of "ulcerations". They should contain only the letters a d e h i l n o r s t, at most once each. In particular, he wanted a word containing precisely these eleven letters, to use as the translation of "ulcerations".

Producing the requisite list was much easier then producing the anagram dictionary iself, so I quickly did it and sent it back; it looked like this:

a A a
d D d
e E e
h H h
i I i
l L l
n N n
o O o
r R r
s S s
t T t
ae ae ea
ah Ah ah ha
...
lost lost lots slot
nors sorn
nort torn tron
nost snot
orst sort
...
deilnorst nostriled
ehilnorst nosethirl
aehilnorst hortensial
`

The leftmost column is the alphabetical list of letters. This is so that if you find yourself needing to use the letters 'a d e h s' at some point in your poem, you can jump to that part of the list and immediately locate the words containing exactly those letters. (It provides somewhat less help for discovering the shorter words that contain only some of those letters, but there is a limit to how much can be done with static files.)

As can be seen at the end of the list, there were three words that each used ten of the eleven required letters: “hortensial”, “threnodial”, “disenthral”, but none with all eleven. However, Mathews replied:

You have found the solution to my immediate problem: "threnodial" may only have 10 letters, but the 11th letter is "s". So, as an adjectival noun, "threnodials" becomes the one and only generic name for English "Ulcerations". It is not only less harsh a word than the French one but a sorrowfully appropriate one, since the form is naturally associated with Georges Perec, who died 12 years ago at 46 to the lasting consternation of us all.

(A threnody is a hymn of mourning.)

A few years later, the Oulipo Compendium appeared, edited by Mathews, and the article on Threnodials mentions my assistance. And so it was that when Michael Fischer handed me a copy, I was able to open it up to the place where I was mentioned.

[ Addendum 20140428: Thanks to Philippe Bruhat for some corrections: neither Perec nor Mathews was a founding member of Oulipo. ]