The Universe of Discourse
           
Thu, 15 May 2008

Luminous band-aids
Last night after bedtime Iris asked for a small band-aid for her knee. I went into the bathroom to get one, and unwrapped it in the dark.

The band-aid itself is circular, about 1.5 cm in diameter. It is sealed between two pieces of paper, each about an inch square, that have been glued together along the four pairs of edges. There is a flap at one edge that you pull, and then you can peel the two glued-together pieces of paper apart to get the band-aid out.

As I peeled apart the two pieces of paper in the dark, there was a thin luminous greenish line running along the inside of the wrapper at the place the papers were being pulled away from each other. The line moved downward following the topmost point of contact between the papers as I pulled the papers apart. It was clearly visible in the dark.

I've never heard of anything like this; the closest I can think of is the thing about how wintergreen Life Savers glow in the dark when you crush them.

My best guess is that it's a static discharge, but I don't know. I don't have pictures of the phenomenon itself, and I'm not likely to be able to get any. But the band-aids look like this:

Have any of my Gentle Readers seen anything like this before? A cursory Internet search has revealed nothing of value.


[Other articles in category /physics] permanent link

Wed, 14 May 2008

More artificial Finnish
Several Finns wrote to me to explain in some detail what was wrong with the artificial Finnish in yesterday's article. As I surmised, the words "ssän" and "kkeen" are lexically illegal in Finnish. There were a number of similar problems. For example, my sample output included the non-word "t". I don't know how this could have happened, since the input probably didn't include anything like that, and the Markov process I used to generate it shouldn't have done so. But the code is lost, so I suppose I'll never know.

Of the various comments I received, perhaps the most interesting was from Ilmari Vacklin. ("Vacklin", huh? If my program had generated "Vacklin", the Finns would have been all over the error.) M. Vacklin pointed out that a number of words in my sample output violated the Finnish rules of vowel harmony.

(M. Vacklin also suggested that my article must have been inspired by this comic, but it wasn't. I venture to guess that the Internet is full of places that point out that you can manufacture pseudo-Finnish by stringing together a lot of k's and a's and t's; it's not that hard to figure out. Maybe this would be a good place to mention the word "saippuakauppias", the Finnish term for a soap-dealer, which was in the Guinness Book of World Records as the longest commonly-used palindromic word in any language.)

Anyway, back to vowel harmony. Vowel harmony is a phenomenon found in certain languages, including Finnish. These languages class vowels into two antithetical groups. Vowels from one group never appear in the same word as vowels from the other group. When one has a prefix or a suffix that normally has a group A vowel, and one wants to join it to a word with group B vowels, the vowel in the suffix changes to match. This happens a lot in Finnish, which has a zillion suffixes. In many languages, including Finnish, there is also a third group of vowels which are "neutral" and can be mixed with either group A or with group B.

Modern Korean does not have vowel harmony, mostly, but Middle Korean did have it, up until the early 16th century. The Korean alphabet was invented around 1443, and the notation for the vowels reflected the vowel harmony:

The first four vowels in this illustration, with the vertical lines, were incompatible with the second four vowels, the ones with the horizontal lines. The last two vowels were neutral, as was another one, not shown here, which was written as a single dot and which has since fallen out of use. Incidentally, vowel harmony is an unusual feature of languages, and its presence in Korean has led some people to suggest that it might be distantly related to Turkish.

The vowel harmony thing is interesting in this context for the following reason. My pseudo-Finnish was generated by a Markov process: each letter was selected at random so as to make the overall frequency of the output match that of real Finnish. Similarly, the overall frequency of two- and three-letter sequences in pseudo-Finnish should match that in real Finnish. Is this enough to generate plausible (although nonsensical) Finnish text? For English, we might say maybe. But for Finnish the answer is no, because this process does not respect the vowel harmony rules. The Markov process doesn't remember, by the time it gets to the end of a long word, whether it is generating a word in vowel category A or B, and so it doesn't know which vowels it whould be generating. It will inevitably generate words with moxed vowels, which is forbidden. This problem does not come up in the generation of pseudo-English.

None of that was what I was planning to write about, however. What I wanted to do was to present samples of pseudo-Finnish generated with various tunings of the Markov process.

The basic model is this: you choose a number N, say 2, and then you look at some input text. For each different sequence of N characters, you count how many times that sequence is followed by "a", how many times it is followed by "b", and so on.

Then you start generating text at random. You pick a sequence of N characters arbitrarily to start, and then you generate the next character according to the probabilities that you calculated. Then you look at the last N characters (the last N-1 from before, plus the new one) and repeat. You keep doing that until you get tired.

For example, suppose we have N=2. Then we have a big table whose keys are 2-character strings like "ab", and then associated with each such string, a table that looks something like this:
r 54.52
a 15.89
i 10.41
o 7.95
l 4.11
e 3.01
u 1.10
space 0.82
: 0.55
t 0.55
, 0.27
. 0.27
b 0.27
s 0.27
So in the input to this process, "ab" was followed by "r" more than 54% of the time, by "a" about 16% of the time, and so on. And when generating the output, every time our process happens to generate "ab", it will follow by generating an "r" 54.52% of the time, an "a" 15.89% of the time, and so on.

Whether to count capital letters as the same as lowercase, and what to do about punctuation and spaces and so forth, are up to the designer.

Here, as examples, are some samples of pseudo-English, generated with various N. The input text was the book of Genesis, which is not entirely typical. In each case, I deleted the initial N characters and the final partial word, cleaned up the capitalization by hand, and appended a final period.

N=0
Lt per f idd et oblcs hs hae:uso ar w aaolt y tndh rl ohn otuhrthpboleel.ee n synenihbdrha,spegn.
N=1
Cachand t wim, heheethas anevem blsant ims, andofan, ieahrn anthaye s, lso iveeti alll t tand, w.
N=2
Ged hich callochbarthe of th to tre said nothem, and rin ing of brom. My and he behou spend the.
N=3
Sack one eved of and refor ther of the hand he will there that in the ful, when it up unto rangers.
It should be clear that the quality improves as one increases the N parameter. The N=3 sample has mostly real words, and the few nonsense ones it contains ("eved", "ful") are completely plausible English. N=2, on the other hand, is mostly nonsense, although it's mostly plausible nonsense. Even "callochbarthe" is almost plausible. (The unfortunate "chb" in the middle is just bad luck. It occurs because Genesis 36 mentions Baalhanan the son of Achbor.) The N=1 sample is recognizably bogus; no English word looks like "ieahrn", and the triple "l" in "alll" is nearly impossible. (I did once write to Jesse Sheidlower, an editor of the Big Dictionary, to ask his advice about whether "ballless" should be hyphenated.)

I have prepared samples of pseudo-Finnish of various qualities. The input here was a bunch of text I copied out of Finnish Wikipedia. (Where else? If you need Finnish text in 1988, you get it from the Usenet fi.talk group; if you need Finnish text in 2008, you get it from Finnish Wikipedia.) I did a little bit of manual cleanup, as with the English, but not too much.

N=0
Vtnnstäklun so so rl sieesjo.Aiijesjeäyuiotiannorin traäl.N vpojanti jonn oteaanlskmt enhksaiaaiiv oenlulniavas. Rottlatutsenynöisu iikännam e lavantkektann eaagla admikkosulssmpnrtinrkudilsorirumlshsmoti,anlosa anuioessydshln.Atierisllsjnlu e.Itatlosyhi vnko ättr otneän akho smalloailäi jiaat kajvtaopnasneilstio tntin einteaonaiimotn:r apoya oruasnainttotne wknaiossäelaäinoev aobrs,vteorlokynv. Aevsrikhanä tp s s oälnlke rvmi il ynae nara ign ssm lkimttbhineaatismäi tst lli ahaltineshne kr keöunv ah s itenh s .Ia pa elstpnanmnuiksriil anaalnttt mr ti.Ooa ka eee eiiei,tnees äusee a nanhetv.Iopkijeatatits,i l eklbiik suössmap tioaotaktdiir rkeaviohiesotkeagarihv nnadvö jlape öt kaeakmjkhykoto tnt iunnuyknnelu rutliie.Leva eiriaösnaj,rk oyumtsle,iioa,aspa aeiaä wsuinn eta y tvati klssviutkuaktmlpnheomi.T akapskushhnuksnhnnheaaaaussitseminmpnamäiaä pät.Kaaaabl unnionuhnpa iaes,outka.Cväinvkshvrnlteeoea rmi re suodmpr autlysa tnliaanäass. Srs rnvrtsita kmidusvjn tii.
N=1
Ava pän svun kerekent lsita batävomenasttenerga kovosuujalules rma punntäni rtraliksainoi van eukällä. Enäkukänesinntampalä ttan kolpäsäkyönsllvitivenestakkesenelussivaliite kuuksä kttteni einsuekeita kuterissalietäkilpöikalit ojatäjä pinsin atollukole idoitenn kkaorhjajasteden en vuolynkoiverojaa hta puon ehalan vaivä ihoshäositi. Hde setua tämpitydi makta jasyn sää oinncgrkai jeeten. Ljalanekikeri toiskkksypohoin ta yö atenesällväkeesaatituuun. Paait pukata tuon ktusumitttan zagaleskli va kkanäsin siikutytowhenttvosa veste eten vunovivä. Vorytellkeeni stan jä taa eka kaine ja kurenntonsin kyn o nta ja. Aisst urksetaka. Hotimivaa ta mppussternallai ja. Hdä on koraleerermohtydelen on jon. Rgienon kulinoilisälsa ja holälimmpa vitin, kukausoompremänn ra, palestollebilsen kaalesta, oina. Blilullaushoingiötideispaanoksiton, mulurklimi kermalli pota atebau lmomarymin kypa hta vanon tin kela vanaspoita s kulitekkäjen jäleetuolpan, veesalekäilin oii. Häreli. Ymialisstermimpriekaksst on.
N=2
Omaalis onino osa josa hormastaaraktse tyi altäänä tyntellevääostoidesenä, la siä vuansilliana inöön akalkuulukempellys kisä nen myöhelyaminenkiemostamahti omuonsa onite oni kusissa. Kungin sykynteillalkaai ellahasiteisuunnaja eroniemmin javai musuuasinä, sittan tusuovatkryt tormon vuolisenitiivansaliuotkietjuuta sensa. Kutumppalvinen. Vaikintolat hän ja kilkuossa osa koiseuvo keyhdysvisakeemppolowistoisijouliuodosijolasissän muoli ogro soluksi valuksasverix intetormon patlantaan et muiksen paiettaatulun kan vuomesyklees ovain pun. Sesva sa hänerittämpiraun tyi vuoden sälisen sän yhtiit, set tämpiraalletä. Senssaikanoje leemp:tabeten ain raa olliukettyi su. Solulukuuttellerrotolit hee säkinessa hän sekketäärinenvaikeihakti umallailuksin sestunno klossi ilunuta. Klettisaa osen vua vuola, jani ja hinangia en ta kaineemonimien polin barkiviäliukkuta joseseva. Ebb rautta onistärään on ml jokoulistä oheksi anoton allysvallelsiliineuvoja kutuko ala ulkietutablohitkain. Ituno.
N=3
Ävivät mena osakeyhti yhdysvalmiininäkin rakenne tuliitä hermoni ja umpirauhastui liin baryshnikoneja. Ain viljelukuullisää olisäke spesideksyylikoliittu latvia. Helsina hän solukeskuksen kannumme, peri palkin vieskeinä sisään on orgaan poikanssisäätelukauno klee laisenäläinen tavastui kauno on länteen muttava hän voimista kilometsästymistettäjän lehtiöiksitoreisö. Sitoutuvat mukalle. Ainettiin sisäke suomaihin, jouluun. Verenkilpalveli valtaineen opisteri poli ohjasionee rakennuttikolan aivastisenäläistuu kehittisetoja, rajahormaailmanajan kulkopuolesti kuluu mooliitoutuvat ovat olle. Ainen yhdysvaltai valiolähtiöiksi vasta, S. Muidentilaisteri jotka verenkirovin verenkiehumistä nelle väliaivoittynyt baleviiliukoisiin maailmestavarasta, jokakuudessa laisu. Sai rakeyhti yhtiö eli gluksessa. Ebbin, ja linnosakkeen hormonien I hallistehtiin kilpirasvua jaajana hormaailusta kunnetteluskäyttöön suomalaivat yhdysvalmistämistammonit veteet olimistuvatta. Hormon oli rautta.
Before anyone objects to the non-word "ml" in the N=2 sample, let me explain that this is the standard abbreviation for "millilitra". The "i" in the N=3 sample was a puzzle, since Marko Heiskanen assures me that Finnish has no one-letter words. But it appears in my sample in connection with Sukselaisen I hallitus, whatever that is, so I capitalized it.

I must say that I found "yhdysvalmistämistammonit" rather far-fetched, even in Finnish. But then I discovered that "yhdeksänkymmenvuotiaaksi" and "yhdysvalloissakaan" are genuine, so who am I to judge?


[Other articles in category /lang] permanent link

Mon, 12 May 2008

Artificial Finnish

Order
Symbols, Signals, and Noise
Symbols, Signals, and Noise
with kickback
no kickback
By 1988 or 1989 I had read in several places, most recently in J. R. Pierce's Symbols, Signals, and Noise, that if you compile a table of the relative frequencies of three-letter sequences (trigraphs) in English text, and then generate random text with the same trigraph frequencies, the result cannot be distinguished from meaningful English text except by people who actually know English. Examples were provided, containing weird but legitimate-sounding words like "deamy" and "grocid", and the claim seemed plausible. But since I did actually know English, I could not properly evaluate it.

But around that time the Internet was just beginning to get into full swing. The Finnish government was investing a lot of money in networking infrastructure, and a lot of people in Finland were starting to appear on the Internet.

I have a funny story about that: Around the same time, a colleague named Marc Edgar approached me in the computer lab to ask if I knew of any Internet-based medium he could use to chat with his friend at the University of Oulu. I thought at first that he was putting me on (and maybe he was) because in 1989 the University of Oulu was just about the only place in the world where a large number of people were accessible via internet chat, IRC having been invented there the previous autumn.

A new set of Finnish-language newsgroups had recently appeared on Usenet, and people posted to them in Finnish. So I had access to an unlimited supply of computer-readable Finnish text, something which would have been unthinkable a few years before, and I could do the experiment in Finnish.


I wrote up the program, which is not at all difficult, gathered Finnish news articles, and produced the following sample:

Uttavalon estaa ain pahalukselle? Min omatunu selle menneet hy, toista. Palveljen alh tkö an välin oli ei alkohol pisten jol elenin. Että, ille, ittavaikki oli nim tor taisuuristä usein an sie a in sittä asia krista sillo si mien loinullun, herror os; riitä heitä suurinteen palve in kuk usemma. Tomalle, äs nto tai sattia yksin taisiä isiäk isuuri illää hetorista. Varsi kaikenlaineet ja pu distoja paikelmai en tulissa sai itsi mielim ssän jon sn ässäksi; yksen kos oihin! Jehovat oli kukahdol ten on teistä vak kkiasian aa itse ee eik tse sani olin mutta todistanut t llisivat oisessa sittä on raaj a vaisen opinen. Ihmisillee stajan opea tajat ja jumalang, sitten per sa ollut aantutta että voinen opeten. Ettuj, jon käs iv telijoitalikantaminun hä seen jälki yl nilla, kkeen, vaaraajil tuneitteistamaan same?

In those days, the world was 7-bit, and Finnish text was posted in a Finnish national variant of ASCII that caused words like "tkö an välin" to look like "tk| an v{lin". The presence of the curly braces heightened the apparent similarity, because that was all you could see at first glance.

At the time I was pleased, but now I think I see some defects. There are some vowelless words, such as "sn" and "t", which I think doesn't happen in Finnish. Some other words look defective: "ssän" and "kkeen", for example. Also, my input sample wasn't big enough, so once the program generated "alk" it was stuck doing the rest of "alkohol". Still, I think this could pass for Finnish if the reader wasn't paying much attention. I was satisfied with the results of the experiment, and was willing to believe that randomly-contructed English really did look enough like English to fool a non-English-speaking observer.

[ Addendum 20080514: There is a followup to this article. ]


[Other articles in category /lang] permanent link

Thu, 08 May 2008

The Origin of Consciousness

Order
The Origin of Consciousness in the Breakdown of the Bicameral Mind
The Origin of Consciousness in the Breakdown of the Bicameral Mind
with kickback
no kickback
One of my favorite books is The Origin of Consciousness in the Breakdown of the Bicameral Mind, by Julian Jaynes, a psychologist at Princeton University. Nearly everyone seems to agree that this is either a work of profound genius, or of profound crackpottery, and also that they aren't sure which it is. Jaynes' theory, as nearly as I can summarize the book, is something like this:

Human consciousness (which Jaynes describes and defines in considerable detail) is a relatively recent development, dating back at most only about 3,000 years or so.

There is the shocking part of the theory. Most people probably imagine consciousness arising much, much earlier, perhaps before language. Jaynes disagrees. In his theory, language, and in particular its mediation of thought through the use of metaphors, is an essential prerequisite for consciousness. And his date for the development of consciousness means that human consciousness would postdate several other important developments, such as metalworking, large-scale agriculture, complex hierarchical social structures, and even writing. Jaynes thinks that the development of consciousness is a historical event and is attested to by written history. He tries to examine the historical record to find evidence not only of preconscious culture, but of the tremendous upheavals that both caused and were the result of the arrival of consciousness.

If preconscious humans farmed, built temples and granaries, and kept records, they must have had some sort of organizing behavior that sufficed in place of consciousness. Jaynes believes that prior to the development of consciousness, humans had a very different mentality. When you or I need to make a decision, we construct a mental narrative, in which we imagine ourselves trying several courses of action, and attempt to predict the possible consequences. Jaynes claims that Bronze Age humans did not do this. What then?

Instead, says Jaynes, the two halves of the brain were less well-integrated in preconscious humans than they are today. The preconscious mentality was "bicameral", with the two halves of the brain operating more independently, and sometimes at odds with each other. The left hemisphere, as today, was usually dominant. Faced with a difficult decision, preconscious human would wait, possibly undergoing (and perhaps even encouraging) an increasingly agitated physical state, until they heard the voice of a god directing them what to do. These hallucinated voices were generated by the right hemisphere of the brain, and projected internally into the left hemisphere.

For example, when the Iliad says that the goddess Athena spoke to Achilles, and commanded and physically restrained him from killing Agamemnon, it is not fabulating: Achilles' right brain hallucinated the voice of the Goddess and restrained him.

In Jaynes' view, there is a large amount of varied literary, anthropological, and neurological evidence supporting this admittedly bizarre hypothesis. For example, he compares the language used in the Biblical Book of Amos (bicameral) with that in Ecclesiastes (conscious). He finds many examples of records from the right period of history bewailing the loss of the guidance of the gods, the stilling of their voices, and the measures that people took, involving seers and prophets, to try to bring the guiding voices back.

Jaynes speculates that mental states such as schizophrenia, which are frequently accompanied by irresistible auditorily hallucinated commands, may be throwbacks to the older, "bicameral" mental state.

Whether you find the theory amazingly brilliant or amazingly stupid, I urge to to withhold judgment until you have read the book. It is a fat book, and there is a mass of fascinating detail. As I implied, it's either a work of profound genius or of profound crackpottery, and I'm not sure which. (Yaakov Sloman tells me that the response to Wittgenstein's Tractatus Logico-Philosophicus was similarly ambivalent when it was new. I think the consensus is now on the genius side.) Either way, it is quite fascinating. There needs to be some theory to account for the historical development of consciousness, and as far as I know, this is the only one on offer.

Anyway, I did not mean to get into this in so much detail. The reason I brought this up is that because of my continuing interest in Jaynes' theory, and how it is viewed by later scholars, I am reading Muses, Madmen, and Prophets: Rethinking the History, Science, and Meaning of Auditory Hallucination by Daniel B. Smith. I am not very far into it yet, but Smith has many interesting things to say about auditory hallucinations, their relationship to obsessive-compulsive disorder, and other matters.

On page 37 Smith mentions a paper, which as he says, has a wonderful title: "Involuntary Masturbation as a Manifestation of Stroke-Related Alien Hand Syndrome". Isn't that just awesome? It gets you coming and going, like a one-two punch. First there's the involuntary masturbation, and while you're still reeling from that it follows up with "alien hand syndrome".

To save you the trouble of reading the paper, I will summarize. The patient is a 72-year-old male. He has lesions in his right frontal lobe. He is experiencing "alien hand syndrome", where his hand seems to be under someone else's control, grabbing objects, like the TV remote control, or grabbing pieces of chicken off his plate and feeding them to him, when what he wanted to do was feed himself with the fork in his right hand. "During his hospital stay, the patient expressed frustration and dismay when he realized that he was masturbating publicly and with his inability to voluntarily release his grasp of objects in the left hand."

Reaction time tests of his hands revealed that when the left hand was under his conscious control, it suffered from a reaction time delay, but when it was under the alien's control, it didn't.

Whee, freaky.


[Other articles in category /brain] permanent link

Thu, 01 May 2008

At that moment, the novice was enlightened...
Presented without further comment, a conversation I had yesterday on IRC. I am yrlnry:

--> You are now talking on #ubuntu
23:37<yrlnry>I upgraded to HH this afternoon. Since the upgrade, when I select a URL in gnome-terminal and then pick the "open this link" menu item, the link doesn't open in my browser. Instead, I get a dialog that says "Could not open the address "http://...": There was an error launching the default action command associated with this location." How can I fix this, or find out what the "error" was?
23:38<lpkmgj> yrlnry: this happeds in Windows
   yrlnry: i get that in Windows 2
23:39<yrlnry> lpkmgj: thanks! that fixed my problem!
 <lpkmgj>yrlnry: sarcasm?
 <yrlnry>lpkmgj: No!
 <lpkmgj>yrlnry: right ....
23:40<yrlnry>lpkmgj: WHen you said that, I realized that the problem was that HH had installed Firefox 3, and that the terminal program wants to use the default browser, which is FF2, which is no longer present since the upgrade.
 <yrlnry>lpkmgj: so I told FF3 to make itself the default browser, and the problem went away.
 <lpkmgj>yrlnry: oh, well glad i helped : )

(I have changed the name of the other person.)


[Other articles in category /tech] permanent link

Wed, 23 Apr 2008

Recounting the rationals
I just read a really excellent math paper, Recounting the rationals, by Calkin and Wilf.

Let b(n) be the number of ways of adding up powers of 2 to get n, with each power of 2 used no more than twice. So, for example, b(5) = 2, because there are 2 ways to get 5:
5 = 4 + 1
 = 2 + 2 + 1

And b(10) = 5, because there are 5 ways to get 10:
10 = 8 + 2
  = 8 + 1 + 1
  = 4 + 4 + 2
  = 4 + 4 + 1 + 1
  = 4 + 2 + 2 + 1 + 1

The sequence of values of b(n) begins as follows:

1 1 2 1 3 2 3 1 4 3 5 2 5 3 4 1 5 4 7 3 8 5 7 2 7 5 8 3 7 4 5 ...
Now consider the sequence b(n) / b(n+1). This is just what you get if you take two copies of the b(n) sequence and place one over the other, with the bottom one shifted left one place, like this:

    1 1 2 1 3 2 3 1 4 3 5 2 5 3 4 1 5 4 7 3 8 5 7 2 7 5 8 3 7 4 5 ...
    - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
    1 2 1 3 2 3 1 4 3 5 2 5 3 4 1 5 4 7 3 8 5 7 2 7 5 8 3 7 4 5 ...
Reading each pair as a rational number, we get the sequence b(n) / b(n+1), which is 1/1, 1/2, 2/1, 1/3, 3/2, 2/3, 3/1, 1/4, 4/3, 3/5, 5/2, ... .

Here is the punchline: This sequence contains each positive rational number exactly once.

If you are just learning to read math papers, or you think you might like to learn to read them, the paper in which this is proved would be a good place to start. It is serious research mathematics, but elementary. It is very short. The result is very elegant. The proofs are straightforward. The techniques used are typical and widely applicable; there is no weird ad-hockery. The discussion in the paper is sure to inspire you to tinker around with it more on your own. All sorts of nice things turn up. The b(n) sequence satisfies a simple recurrence, the fractions organize themselves neatly into a tree structure, and everything is related to everything else. Check it out.

Thanks to Brent Yorgey for bringing this to my attention. I saw it in this old blog article, but then discovered he had written a six-part series about it. I also discovered that M. Yorgey independently came to the same conclusion that I did about the paper: it would be a good first paper to read.

[ Addendum 20080505: Brad Clow agrees that it was a good place to start. ]


[Other articles in category /math] permanent link

Fri, 18 Apr 2008

A few notes on "The Manticore"

Order
The Manticore
The Manticore
with kickback
no kickback
This past week I reread Robertson Davies' 1972 novel The Manticore, which is a sequel to his much more famous novel Fifth Business (1970). I've read Fifth Business and its other sequel, World of Wonders (1975), several times each, but I found The Manticore much less compelling, and this is only the second time I have read it.

Here are a few miscellaneous notes about The Manticore.

Early memories

Here is the David Staunton's earliest memory, from chapter 2, section 1. (Page 87 in my Penguin paperback edition.)

Dr. Von Haller: What is the earliest recollection you can honestly vouch for?

Myself: Oh, that's easy. I was standing in my grandmother's garden, in warm sunlight, looking into a deep red peony. As I recall it, I wasn't much taller than the peony. It was a moment of very great—perhaps I shouldn't say happiness, because it was really an intense absorption. The whole world, the whole of life, and I myself, became a warm, rich, peony-red.

Here is the earliest memory of Francis Cornish, the protagonist of Davies' novel What's Bred in the Bone (1985):

It was in a garden that Francis Cornish first became truly aware of himself as a creature observing a world apart from himself. He was almost three years old, and he was looking deep into a splendid red peony.
That is the opening sentence of part two, page 63 in my Penguin Books copy.

The sideboard

This is from chapter 3 of The Manticore, David's diary entry of Dec. 20:

Inside, it is filled with ... gigantic pieces of furniture on which every surface has been carved within an inch of its life with fruits, flowers, birds, hares, and even, on one thing which seems to be an altar to greed but is more probably a sideboard, full-sized hounds; six of them with real bronze chains on their collars.

The following quotation is from Davies' 1984 New York Times article "In a Welsh Border House, the Legacy of the Victorians", a reminiscence of the house his father lived in after his retirement in 1950:

Until my father had it dismantled and removed to a stable, the Great Hall was dominated by what I can only call an altar to gluttony against the south wall. It was a German sideboard of monumental proportions that the Naylors had acquired at the Great Exhibition of 1851. Every fruit, flower, meat, game, and edible was carved on it in life size, including four large hounds, chained to the understructure with wooden chains, so cunningly wrought that they could be moved, like real chains.
This is reprinted in The Enthusiasms of Robertson Davies, Judith Skelton Grant, ed.

What do Canadians think of Saints?

Davies has said on a number of occasions that in Fifth Business he wanted to write about the nature of sainthood, and in particular how Canadians would respond if they found that they had a true saint among them. For example, in his talk "What May Canada Expect from Her Writers?" (reprinted in One Half of Robertson Davies, pp. 139–140) he says:

For many years the question occurred to me at intervals: What would Canada do with a saint, if such a strange creature were to appear within our borders? I thought Canada would reject the saint because Canada has no use for saints, because saints hold unusual opinions, and worst of all, saints do not pay. So in 1970 I wrote a book, called Fifth Business, in which that theme played a part.
Fifth Business does indeed treat this theme extensively and subtly. In The Manticore he is somewhat less subtle. A perpetual criticism I have of Davies is that he is never content to trust the reader to understand him. He always gets worried later that the reader is not clever enough, and he always comes back to hammer in his point a little more obviously.

For example, Fifth Business ends with the question "Who killed Boy Staunton?" and a cryptic, oracular answer. But Davies was unable to resist the temptation to explain his answer for the benefit of people unable or unwilling to puzzle out their own answers, and the end of The Manticore includes a detailed explanation. I think there might be an even plainer explanation in World of Wonders, but I forget. I have a partly-finished essay in progress discussing this tendency in Davies' writing, but I don't know when it will be done; perhaps never.

What would Canada think of a saint? Fifth Business is one answer, a deep and brilliant one. But Davies was not content to leave it there. He put a very plain answer into The Manticore. This is again from David's diary entry of Dec. 20 (p. 280):

Eisengrim's mother had been a dominant figure in his own life. He spoke of her as "saintly," which puzzles me. Wouldn't Netty have mentioned someone like that?

David's old nurse Netty did indeed mention Eisengrim's mother, although David didn't know that that was who was being mentioned. The mention appears in chapter 2, section 6, p. 160:

She had some awful piece of lore from Deptford to bring out. It seems there had been some woman there when she was a little girl who had always been "at it" and had eventually been discovered in a gravel pit, "at it" with a tramp; of course this woman had gone stark, staring mad and had had to be kept in her house, tied up.

If you want to know what Robertson Davies thinks that Canada would make of a saint, but you don't want to read and ponder Fifth Business to find out, there you have it in one sentence.

[ Addendum: The New York Times review of The Manticore is interesting for several reasons. The title is misspelled in the headline: "The Manitcore". The review was written by a then-unknown William Kennedy, who later became the author of Ironweed (which won the Pulitzer Prize) and other novels. Check it out. ]


[Other articles in category /book] permanent link

Thu, 17 Apr 2008

Is blood a transitive relation?
When you're first teaching high school students about the idea of a relation, you give examples of the important properties of relations. Relations can be some, none, or all of reflexive, symmetric, antisymmetric, or transitive. You start with examples that everyone is already familiar with, such as the identity relation, which is reflexive, symmetric, and transitive, and the ≤ relation, which is antisymmetric and transitive. Other good examples include familial relations: "sister-in-law of" is symmetric on the set of women, but not on the larger set of people; "ancestor of" is transitive but not symmetric.

It might seem at first glance that "is related to" is transitive, but, at least under conventional definitions, it isn't, because my wife is not related to my cousins.

(I was once invited to speak at Haverford College, and, since I have no obvious qualifications in the topic on which I was speaking, I was asked how I had come to be there. I explained that it was because my wife's mother's younger brother's daughter's husband's older brother's wife was the chair of the mathematics department. Remember, it's not what you know, it's who you know.)

I think I had sometimes tried to turn "related to" into a transitive relation by restricting it to "is related to by blood". This rules out the example above of my wife being unrelated to my cousins, because my relationship with my wife is not one of blood. I don't quite remember using "related by blood" as an example of a transitive relation, but I think I might have, because I was quite surprised when I realized that it didn't work. I spent a lot of time that morning going over my counterexample in detail, writing it up in my head, as it were. I was waiting around in Trevose for the mechanic to finish examining my car, and had nothing better to do that morning. If I had had a blog then, I would probably have posted it. But it is a good thing that I didn't, because in spite of all my thought about it, I missed something important.

The example is as follows. A and B have a child, X. (You may suppose that they marry beforehand, and divorce afterward, if your morality requires it.) Similarly, C and D have a child, Z. Then B and C have a child Y. Y is now the half-sibling of both X and Z, and so is unquestionably a blood relative of both, but X and Z are entirely unrelated. They are not even step-siblings.

Well, this is all very well, but the reason I have filed it under oops/, and the reason it's a good thing I didn't post it on my (then nonexistent) blog is that this elaborate counterexample contains a much simpler one: X is the child and hence the blood relative of both A and B, who are not in general related to each other. C, D, Y, and Z are wholly unnecessary.

I wish I had some nice conclusion to draw here, but if there's something I could learn from it I can't think would it might be.


[Other articles in category /oops] permanent link

Fri, 28 Mar 2008

Suffering from "make install"
I am writing application X, which uses the nonstandard perl modules DBI, DBD::SQLite, and Template. These might not be available on the target system, so I got the idea to include them in the distribution for X and have the build process for X build and install the modules. X already carries its own custom Perl modules in X/lib anyway, so I can just install DBI and the others into X/lib and everything will Just Work. Or so I thought.

After building DBI, for example, how do you get it to install itself into X/lib instead of the default system-wide location, which only the super-user has permission to modify?

There are at least five solutions to this common problem.

Uh-oh. If solution #1 had worked, people would not have needed to invent solution #2. If solution #2 had worked, people would not have needed to invent solution #3. Since there are five solutions, there is a good chance that none of them work.

You can, I am informed:

  • Set PREFIX=X when building the Makefile
  • Set INSTALLDIRS=vendor and VENDORPREFIX=X when building the Makefile
    • Or maybe instead of VENDORPREFIX you need to set INSTALLVENDORLIB or something
    • Or maybe instead of setting them while building the Makefile you need to set them while running the make install target
  • Set LIB=X/lib when building the Makefile
  • Use PAR
  • Use local::lib
Some of these fail by being excessively complicated. Some fail by addressing a larger problem set that is too large. For example, I do not want to do whatever PAR does; I just want to install the damn modules into X/lib where the application can find them.

Some of these items fail because they just plain fail. For example, the first thing everyone says is that you can just set PREFIX to X. No, because then the module Foo does not go into X/lib/Foo.pm. It goes into X/Foo/lib/perl5/site_perl/5.12.23/Foo.pm. Which means that if X does use lib 'X/lib'; it will not be able to find Foo.

The manual (which goes by the marvelously obvious and easily-typed name of ExtUtils::MakeMaker, by the way) is of limited help. It recommends solving the problem by travelling to Paterson, NJ, gouging your eyes out with your mom's jewelry, and then driving over the Passaic River falls. Ha ha, just kidding. That would be a big improvement on what it actually suggests, for three reasons. First, it is clear and straightforward. Second, it would feel better than the stuff it does suggest. And third, it would actually solve your problem, although obliquely.

It turns out there is a simple solution that doesn't involve travelling to New Jersey. The first thing you have to do is give up entirely on trying to use make install to install the modules. It is completely broken for this application, because even if the destination could somehow be forced to be what you wanted—and, after all, why would you expect that make install would let you configure the destination directory in a simple fashion?—it would still install not only the contents of MODULE/lib, but also the contents of MODULE/bin, MODULE/man, MODULE/share, MODULE/pus, MODULE/dork, MODULE/felch, and MODULE/scrotum, some of which you probably didn't want.

So no. But the solution is actually simple. The normal module build process (as distinct from the install process) puts all this crap under MODULE/blib. The test suite is run against the blib installation. So the test programs have the same problem that X has. If they can find the stuff under blib, so can X, by replicating the layout under blib and then doing what the test suite does.

In fact, the modules are installed into the proper subdirectories of MODULE/blib/lib. So the simple solution is just to build the module and then, instead of trying to get the installer to put the right stuff in the right place, use cp -pr MODULE/blib/lib/* X/lib. Problem solved.

For modules with a shared library, you need to copy MODULE/blib/arch/auto/* into X/lib/auto also.

I remember suffering over this at least ten years ago, when a student in a class I was teaching asked me how to do it and I let ExtUtils::MakeMaker make a monkey of me. I was amazed to find myself suffering over it once again. I am relieved to have found the right answer.

This is one of those days when I am not happy with software. It sometimes surprises me how many of those days involve make.

Dennis Ritchie once said that "make is like Pascal. Everybody likes it, so they go in and change it." I never really thought about this before, but it now occurs to me that probably Ritchie meant that they like make in about the same way that they like bladder stones. Because Dennis Ritchie probably does not like Pascal, and actually nobody else likes Pascal either. They may say they do, and they may even think they do, but if you look a little closer it always turns out that the thing they like is not actually Pascal, but some language that more or less resembles Pascal. Unfortunately, the changes people make to make tend to make it bigger and wartier, and this improves make about as much as it would improve a bladder stone.

I would like to end this article on a positive note. If you haven't already, please read Recursive make Considered Harmful and be prepared to be blinded by the Glorious Truth therein.


[Other articles in category /prog] permanent link

Tue, 25 Mar 2008

The "z" command: output filtering
My last few articles ([1] [2] [p] [p-2]) have been about this z program. The first part of this article is a summary of that discussion, which you can skip if you remember it.

The idea of z is that you can do:

        z grep pattern files...
and it does approximately the same as:
        zgrep pattern files...
or you could do:
        z sed script files...
and it would do the same as:
        zsed script files...
if there were a zsed command, although there isn't.

Much of the discussion has concerned a problem with the implementation, which is that the names of the original compressed files are not available to the command, due to the legerdemain z must perform in order to make the uncompressed data available to the command. The problem is especially apparent with wc:

        % z wc *  
            411    2611   16988 ctime.blog
             71     358    2351 /proc/self/fd/3
            121     725    5053 /proc/self/fd/4
             51     380    2381 files-talk.blog
             48     145     885 find-uniq.pl
            288    2159   12829 /proc/self/fd/5
             95     665    4337 ssh-agent-revisted.blog
            221     941    6733 struct-inode.blog
            106     555    3976 sync-2.blog
            115     793    4904 sync.blog
            124     624    4208 /proc/self/fd/6
           1651    9956   64645 total

Here /proc/self/fd/3 and the rest should have been names ending in .gz, such as env-2.blog.gz.

Another possible solution

At the time I wrote the first article, it occurred to me briefly that it would be possible to have z capture the output of the command and attempt to translate /proc/self/fd/3 back to env-2.blog.gz or whatever is appropriate, because although the subcommand does not know the original filenames, z itself does. The code would look something like this. Instead of ending by execing the command, as the original version of z did:
  exec $command, @ARGV;
  die "Couldn't run '$command': $!.\n";
this revised version of z, which we might call zz, would end with the code to translate back to the original filenames:

  open my($out), "-|", $command, @ARGV
    or die "Couldn't run '$command': $!.\n";
  while (<$out>) {
    s{/proc/self/fd/(\d+)}{$old[$1]}g;
    print;
  }
Here @old is an array that translates from file descriptors back to the original filename.

At the time, I thought of doing this, and my immediate thought was "well, that is so obviously a terrible idea that it is not worth even mentioning", so I left it out. But since then at least five people have written to me to suggest it, so it appears that it is not obviously a terrible idea. I had to think a little deeper about why I thought it was a terrible idea.

Really the question is why I think this is a more terrible idea than the original z program was in the first place. Because one could say that z is garbling the output of its command, and the filtering code in zz is only un-garbling it. But I think this isn't the right way to look at it.

The output of the command has a certain format, a certain structure. We don't know ahead of time what that structure is, but it can be described for any particular command. For instance, the output of wc is always a sequence of lines where each line has four whitespace-separated fields, of which the first three are numerals and the last is a filename, and then a final total line at the end.

Similarly, the output of tar is a file in a complicated binary format, one which is documented somewhere and which is intelligible to other instances of the tar command that are trying to decode it.

The original behavior of z may alter the content of the command output to some extent, replacing some filenames with others. But it cannot disrupt the structure or the format of the file, ever. This is because the output of z tar is the output of tar, unmodified. The z program tampers with the arguments it gives to tar, but having done that it runs tar and lets tar do what it wants, and tar then must produce a tar-format output, possibly not the one it would have normally produced—the content might be a little different—but a properly-formatted one for sure. In particular, any program written to deal properly with the output of tar will still work with the output of z tar. The output might not have the same meaning, but we can say very particularly what the extent of the differences might be: if the output mentions filenames, then some of these might have changed from the true filenames to filenames of the form /proc/self/fd/37.

With zz, we cannot make any such guarantee. The output of zz tar zc foo.gz, for example, might be in proper .tar.gz format. But suppose the output of tar zc foo.gz creates compressed binary output that just happens to contain the byte sequence 2f70 726f 632f 7365 6c66 2f66 642f 33? (That is, "/proc/self/fd/3".) Then zz will silently replace these 15 bytes with the six bytes 666f 6f2e 677a.

What if the original sequence was understood as part of a sequence of 2-byte integers? The result is not even properly aligned. What if that initial 2f was a count? The resulting count (66) is much too long. The result would be utterly garbled and unintelligible to tar zx. What the tar command will do with a garbled input is not well-defined: it might dump core, or it might write out random garbage data, or overwrite essential files in the filesystem. We are into nasal demon territory. With the original z, we never get anywhere near the nasal demons.

I suppose the short summary here is that z treats its command as a black box, while zz pretends to understand what comes out of it. But zz's understanding is a false pretense. My experience says that programs should not screw around with things they don't understand, and this is why I instantly rejected the idea when I thought of it before.

One correspondent argued that the garbling is very unlikely, and proposed various techniques to make it even less likely, mostly by rewriting the input filenames to various long random strings. But I felt then that this was missing the point, and I still do. He says it is unlikely, but he doesn't know that it is unlikely, and indeed the unlikeliness depends on the format of the output of the command, which is precisely the unknown here. In my view, the difference between z and zz is that the changes that z makes are bounded, because you can describe them briefly, as I did above, and the changes that zz makes are unbounded, because there is no limit to what could happen as a result.

On the other hand, this correspondent made a good point that if the output of zz is not consumed by anything other than human eyeballs, there may be no real problem. And for some particular commands, such as wc, there is never any problem at all. So perhaps it's a good idea to add a command-line option to z to enable the zz behavior. I did this in my version, and I'm going to try it out and see how it goes.

Complete modified source code is available. (Diffs from previous version.)


[Other articles in category /Unix] permanent link

Sat, 22 Mar 2008

The "z" command: alternative implementations
In yesterday's article I discussed a possibly-useful utility program named z, which has a flaw. To jog your memory, here is a demonstration:

        % z grep immediately *  
        ctime.blog:we want to update.  It is immediately copied into a register, and
        /proc/self/fd/3:All five people who wrote to me about this immediately said "oh, yes,
        /proc/self/fd/5:program continues immediately, possibly posting its message.  (It
        struct-inode.blog:is a symbolic link, its inode is returned immediately; iname() would
        sync.blog:and reports success back to the process immediately, even though the
For a detailed discussion, see the previous article.

Fixing this flaw seems difficult-to-impossible. As I said earlier, the trick is to fool the command into reading from a pipe when it thinks it is opening a file, and this is precisely what /proc/self/fd is for. But there is an older, even more widely-implemented Unix feature that does the same thing, namely the FIFO. So an alternative implementation creates one FIFO for each compressed file, with a gzip process writing to the FIFO, and tells the command to read from the FIFO. Since we have some limited control over the name of the FIFO, we can ameliorate the missing-filename problem to some extent. Say, for example, we create the FIFOs in /tmp/PID. Then the broken zgrep example above might look like this instead:

        % z grep immediately *  
        ctime.blog:we want to update.  It is immediately copied into a register, and
        /tmp/7516/env-2.blog.gz:All five people who wrote to me about this immediately said "oh, yes,
        /tmp/7516/qmail-throttle.blog.gz:program continues immediately, possibly posting its message.  (It
        struct-inode.blog:is a symbolic link, its inode is returned immediately; iname() would
        sync.blog:and reports success back to the process immediately, even though the
The output is an improvement, but it is not completely solved, and the cost is that the process and file management are much more complicated. In fact, the cost is so high that you have to wonder if it might not be simpler to replace z with a shell script that copies the data to a temporary directory, uncompresses the files, and runs the command on the uncompressed files, perhaps something along these lines:

        #!/bin/sh
        DIR=/tmp/$$
        mkdir $DIR

        COMMAND=$1
        shift
        cp -p "$@" $DIR

        cd $DIR
        gzip -d *
        $COMMAND *
This has problems too, but my point is that if you are willing to accept a crappy, semi-working solution along the lines of the FIFO one, simpler ones are at hand. You can compare the FIFO version directly with the shell script, and I think the FIFO version loses. The z implementation I have is a solution in a different direction, and different tradeoffs, and so might be preferable to it in a number of ways.

But as I said, I don't know yet.

[ Addendum 20080325: Several people suggested a fix that I had considered so unwise that I didn't even mention it. But after receiving the suggestion repeatedly, I wrote an article about it. ]


[Other articles in category /Unix] permanent link

Fri, 21 Mar 2008

z-commands
The gzip distribution includes a command called zcat. Its command-line arguments can include any number of filenames, compressed or not, and it prints out the contents, uncompressing them on the fly if necessary. Sometime later a zgrep command appeared, which was similar but which also performed a grep search.

But for anything else, you either need to uncompress the files, or build a special tool. I have a utility that scans the web logs of blog.plover.com, and extracts a report about new referrers. The historical web logs are normally kept compressed, so I recently built in support for decompression. This is quite easy in Perl. Normally one scans a sequence of input files something like this:

        while (<>) {
          ... do something with $_ ...
        }
The <> operator implicitly scans all the lines in all the files named in the command-line arguments, opening a new file each time the previous one is exhausted.

To decompress the files on the fly, one can preprocess the command-line arguments:

        for (@ARGV) {
          if (/\.gz$/) {
            $_ = "gzip -dc $_ |";
          }
        }

        while (<>) {
          ... do something with $_ ...
        }
The for loop scans the command-line arguments, replacing each one that has the form foo.gz with gzip -dc foo.gz |. Perl's magic open semantics treat filenames specially if they end with a pipe symbol: a pipe to a command is opened instead. Of course, anyone can think of half a dozen ways in which this can go wrong. But Larry Wall's skill in making such tradeoffs has been a large factor in Perl's success.

But it bothered me to have to make this kind of change in every program that wanted to handle compressed files. We have zcat and zgrep; where are zcut, zpr, zrev, zwc, zcol, zbc, zsed, zawk, and so on? Echh.

But after I got to thinking about it, I decided that I could write a single z utility that would do a lot of the same things. Instead of this:

        zsed -e 's/:.*//' * | ...
where the * matches some files that have .gz suffixes and some that haven't, one would write:

        z sed -e 's/:.*//' * | ...
and it would Just Work. That's the idea, anyway.

If sed were written in Perl, z would have an easy job. It could rely on Perl's magic open, and simply preprocess the arguments before running sed:

        # hypothetical implementation of z
        #
        my $command = shift;
        for (@ARGV) {
          if (/\.gz$/) {
            $_ = "gzip -dc $_ |";
          }
        }
        exec $command, @ARGV;
        die "Couldn't run command '$command': $!\n";
But sed is not written in Perl, and has no magic open. So I have to play a trickier trick:

        for my $file (@ARGV) {
          if ($file =~ /\.gz$/) {
            unless (open($fhs[@fhs], "-|", "gzip", "-cd", $file)) {
              warn "Couldn't open file '$file': $!; skipping\n";
              next;
            }
            my $fd = fileno $fhs[-1];
            $_ = "/proc/self/fd/$fd";
          }
        }

        # warn "running $command @ARGV\n";
        exec $command, @ARGV;
        die "Couldn't run command '$command': $!\n";
This is a stripped-down version to illustrate the idea. For various reasons that I explained yesterday, it does not actually work. The complete, working source code is here.

The idea, as before, is that the program preprocesses the command-line arguments. But instead of replacing the arguments with pipe commands, which are not supported by open(2), the program sets up the pipes itself, and then directs the command to take its input from the pipes by specifying the appropriate items from /proc/self/fd.

The trick depends crucially on having /proc/self/fd, or /dev/fd, or something of the sort, because otherwise there's no way to trick the command into reading from a pipe when it thinks it is opening a file. (Actually there is at least one other way, involving FIFOs, which I plan to discuss tomorrow.) Most modern systems do have /proc/self/fd. That feature postdates my earliest involvement with Unix, so it isn't a ready part of my mental apparatus as perhaps it ought to be. But this utility seems to me like a sort of canonical application of /proc/self/fd, in the sense that, if you couldn't think what /proc/self/fd might be good for, then you could read this example and afterwards have a pretty clear idea.

The z utility has a number of flaws. Principally, the original filenames are gone. Here's a typical run with regular zgrep:

        % zgrep immediately *  
        ctime.blog:we want to update.  It is immediately copied into a register, and
        env-2.blog.gz:All five people who wrote to me about this immediately said "oh, yes,
        qmail-throttle.blog.gz:program continues immediately, possibly posting its message.  (It
        struct-inode.blog:is a symbolic link, its inode is returned immediately; iname() would
        sync.blog:and reports success back to the process immediately, even though the
But here's the same thing with z:

        % z grep immediately *  
        ctime.blog:we want to update.  It is immediately copied into a register, and
        /proc/self/fd/3:All five people who wrote to me about this immediately said "oh, yes,
        /proc/self/fd/5:program continues immediately, possibly posting its message.  (It
        struct-inode.blog:is a symbolic link, its inode is returned immediately; iname() would
        sync.blog:and reports success back to the process immediately, even though the
The problem is even more glaring in the case of commands like wc:

        % z wc *  
            411    2611   16988 ctime.blog
             71     358    2351 /proc/self/fd/3
            121     725    5053 /proc/self/fd/4
             51     380    2381 files-talk.blog
             48     145     885 find-uniq.pl
            288    2159   12829 /proc/self/fd/5
             95     665    4337 ssh-agent-revisted.blog
            221     941    6733 struct-inode.blog
            106     555    3976 sync-2.blog
            115     793    4904 sync.blog
            124     624    4208 /proc/self/fd/6
           1651    9956   64645 total

So perhaps z will not turn out to be useful enough to be more than a curiosity. But I'm not sure yet.

This is article #300 on my blog. Thanks for reading.

[ Addendum 20080322: There is a followup to this article. ]

[ Addendum 20080325: Another followup. ]


[Other articles in category /Unix] permanent link