The Universe of Discourse

Tue, 07 Apr 2020

Fern motif experts on the Internet

I live near Woodlands Cemetery and by far the largest monument there, a thirty-foot obelisk, belongs to Thomas W. Evans, who is an interesting person. In his life he was a world-famous dentist, whose clients included many crowned heads of Europe. He was born in Philadelphia, and left land to the University of Pennsylvania to found a dental school, which to this day is located at the site of Evans’ former family home at 40th and Spruce Street.

A few days ago my family went to visit the cemetery and I insisted on visiting the Evans memorial.

A young girl, seen from the back, is climbing a large stone
monument.  She is wearing black boots, blue jeans, and a black leather
jacket.  She is about six feet off the ground. Attached to the monument to her right is a
green copper plate that says  (among other things) ‘In memory of
DR. THOMAS WILLIAM EVANS’.  In the background is a tree, and other, smaller
monuments can be seen.

The obelisk has this interesting ornament:

Description below.

The thing around the middle is evidently a wreath of pine branches, but what is the thing in the middle? Some sort of leaf, or frond perhaps? Or is it a feather? If Evans had been a writer I would have assumed it was a quill pen, but he was a dentist. Thanks to the Wonders of the Internet, I was able to find out.

First I took the question to Reddit's /r/whatisthisthing forum. Reddit didn't have the answer, but Reddit user @hangeryyy had something better: they observed that there was a fad for fern decorations, called pteridomania, in the second half of the 19th century. Maybe the thing was a fern.

I was nerdsniped by pteridomania and found out that a book on pteridomania had been written by Dr. Sarah Whittingham, who goes by the encouraging Twitter name of @DrFrond.

Dr. Whittingham's opinion is that this is not a fern frond, but a palm frond. The question has been answered to my full and complete satisfaction.

My thanks to Dr. Whittingham, @hangeryyy, and the /r/whatisthisthing community.


Mon, 06 Apr 2020

Anglo-Saxon and Hawai‘ian Wikipedias

Yesterday browsing the list of Wikipedias I learned there is an Anglo-Saxon Wikipedia. This seems really strange to me for several reasons: Who is writing it? And why?

And there is a vocabulary problem. Not just because Anglo-Saxon is dead, and one wouldn't expect it to have any words for anything not invented in the last 900 years or so. But also, there are very few extant Anglo-Saxon manuscripts, so we don't have a lot of vocabulary, even for things that had been invented 900 years ago.

Helene Hanff said:

I have these guilts about never having read Chaucer but I was talked out of learning Early Anglo-Saxon / Middle English by a friend who had to take it for her Ph.D. They told her to write an essay in Early Anglo-Saxon on any-subject-of-her-own-choosing. “Which is all very well,” she said bitterly, “but the only essay subject you can find enough Early Anglo-Saxon words for is ‘How to Slaughter a Thousand Men in a Mead Hall’.”

I don't read Anglo-Saxon but if you want to investigate, you might look at the Anglo-Saxon article about the Maybach Exelero (a hēahfremmende sportƿægn), Barack Obama, or taekwondo. I am pre-committing to not getting sucked into this, but sportƿægn is evidently intended to mean “sportscar” (the ƿ is an obsolete letter called wynn and is approximately a W, so that ƿægn is “wagon”) and I think that fremmende is “foreign” and hēah is something like "high" or "very". But I'm really not sure.

Anyway Wikipedia reports that the Anglo-Saxon Wikipedia has 3,197 articles (although most are very short) and around 30 active users. In contrast, the Hawai‘ian Wikipedia has 3,919 articles and only around 14 active users, and that is a language that people actually speak.


Caricatures of Nazis and the number four in Russian

[ Warning: this article is kinda all over the place. ]

I was looking at this awesome poster by D. Moor (Д. Моор), one of Russia's most famous political poster artists:

A Soviet propaganda poster, black,
with the foreground in yellowish-beige and a border of the same
color.  It depicts caricatures of the faces of Himmler, Göring,
Hitler, and Goebbels, labeled on the left with their names in
Russian.  Each name begins with the Russian letter Г, which is shaped
like an upside-down letter L.  Further description is below.

(original source at Artchive.RU)

This is interesting for a couple of reasons. First, in Russian, “Himmler”, “Göring”, “Hitler”, and “Goebbels” all begin with the same letter, ‘Г’, which is homologous to ‘G’. (Similarly, Harry Potter in Russian is Га́рри, ‘Garri’.)

I also love the pictures, and especially Goebbels. These four men were so ugly, each in his own distinctively loathsome way. The artist has done such a marvelous job of depicting them, highlighting their various hideousness. It's exaggerated, and yet not unfair; these are really good likenesses! It's as if D. Moor had drawn a map of all the ways in which these men were ugly.

My all-time favorite depiction of Goebbels is this one, by Boris Yefimov (Бори́с Ефи́мов):

A poster in black, blue, yellow, and muddy green, depicting
Goebbels as a hideous mashup with
Mickey Mouse. His tail divides into four at the end and is shaped like
a swastika.  His yellow-gloved hands are balled into fists and spittle
is flying from his mouth. The poster is captioned (in English) at the top: “WHAT
IS AN ‘ARYAN’?  He is HANDSOME” and at the bottom “AS GOEBBELS”.

For comparison, here's the actual Goebbels:

Actual archival photograph of Goebbels, in right profile, just
like Mickey Mouse Goebbels in the previous picture, but from the chest
up.  His mouth is
closed and he is wearing a wool suit, white shirt with collar, and a
wide necktie.

Looking at pictures of Goebbels, I had often thought “That is one ugly guy,” but never been able to put my finger on what specifically was wrong with his face. But since seeing the Yefimov picture, I have never been able to look at a picture of Goebbels without thinking of a rat. D. Moor has also drawn Goebbels as a tiny rat, scurrying around the baseboards of his poster.

Anyway, that was not what I had planned to write about. The right-hand side of D. Moor's poster imagines the initial ‘Г’ of the four Nazis’ names as the four bent arms of the swastika. The captions underneath mean “first Г”, “second Г” and so on.

[ Addendum: Darrin Edwards explains the meaning here that had escaped me:

One of the Russian words for shit is "govno" (говно). A euphemism for this is to just use the initial g; so "something na g" is roughly equivalent to saying "a crappy something". So the title "vse na g" (all on g) is literally "they all start with g" but pretty blatantly means "they're all crap" or "what a bunch of crap". I believe the trick of constructing the swastika out of four g's is meant to extend this association from the four men to the entire movement…

Thank you, M. Edwards! ]

Looking at the fourth one, четвертое /chetvyertoye/, I had a sudden brainwave. “Aha,” I thought, “I bet this is akin to Greek “tetra”, and the /t/ turned into /ch/ in Russian.”

Well, now that I'm writing it down it doesn't seem that exciting. I now remember that all the other Russian number words are clearly derived from PIE just as Greek, Latin, and German are:

English  German  Latin     Greek                Russian
one      ein     unum      εἷς (eis)            оди́н (odeen)
two      zwei    duo       δύο (dyo)            два (dva)
three    drei    trēs      τρεῖς (treis)        три (tri)
four     vier    quattuor  τέτταρες (tettares)  четы́ре (chyetirye)
five     fünf    quinque   πέντε (pente)        пять (pyat’)

In Latin that /t/ turned into a /k/ and we get /quadra/ instead of /tetra/. The Russian Ч /ch/ is more like a /t/ than it is like a /k/.

The change from /t/ to /f/ in English and /v/ in German is a bit weird. (The Big Dictionary says it “presents anomalies of which the explanation is still disputed”.) The change from the /p/ of ‘pente’ to the /f/ of ‘five’ is much more typical. (Consider Latin ‘pater’, ‘piscis’, ‘ped’ and the corresponding English ‘father’, ‘fish’, ‘foot’.) This is called Grimm's Law, yeah, after that Grimm.

The change from /q/ in quinque to /p/ in pente is also not unusual. (The ancestral form in PIE is believed to have been more like the /q/.) There's a classification of Celtic languages into P-Celtic and Q-Celtic that's similar, exemplified by the change from the Irish patronymic prefix Mac- into the Welsh patronymic map or ap.

I could probably write a whole article comparing the numbers from one to ten in these languages. (And Sanskrit. Wouldn't want to leave out Sanskrit.) The line for ‘two’ would be a great place to begin because all those words are basically the same, with only minor and typical variations in the spelling and pronunciation. Maybe someday.


Sun, 05 Apr 2020

Screensharing your talk slides is skeuomorphic

Back when the Web was much newer, and people hadn't really figured it out yet, there was an attempt to bring a dictionary to the web. Like a paper dictionary, its text was set in a barely-readable tiny font, and there were page breaks in arbitrary places. That is a skeuomorph: it's an incidental feature of an object that persists even in a new medium where the incidental feature no longer makes sense.

Anyway, I was scheduled to give a talk to the local Linux user group last week, and because of current conditions we tried doing it as a videoconference. I thought this went well!

We used Jitsi Meet, which I thought worked quite well, and which I recommend.

The usual procedure is for the speaker to have some sort of presentation materials, anachronistically called “slides”, which they display one at a time to the audience. In the Victorian age these were glass plates, and the image was projected on a screen with a slide projector. Later developments replaced the glass with celluloid or other transparent plastic, and then with digital projectors. In videoconferences, the slides are presented by displaying them on the speaker's screen, and then sharing the screen image to the audience.

This last development is skeuomorphic. When the audience is together in a big room, it might make sense to project the slide images on a shared screen. But when everyone is looking at the talk on their own separate screen anyway, why make them all use the exact same copy?

Instead, I published the slides on my website ahead of time, and sent the link to the attendees. They had the option to follow along on the web site, or to download a copy and follow along in their own local copy.

This has several advantages:

  1. Each audience member can adjust the monitor size, font size, and colors to suit their own viewing preferences.

    With the screenshare, everyone is stuck with whatever I have chosen. If my font is too small for one person to read, they are out of luck.

  2. The audience can see the speaker. Instead of using my outgoing video feed to share the slides, I could share my face as I spoke. I'm not sure how common this is, but I hate attending lectures given by disembodied voices. And I hate even more being the disembodied voice. Giving a talk to people I can't see is creepy. My one condition to the Linux people was that I had to be able to see at least part of the audience.

  3. With the slides under their control, audience members can go back to refer to earlier material, or skip ahead if they want. Haven't you had the experience of having the presenter skip ahead to the next slide before you had finished reading the one you were looking at? With this technique, that can't happen.

Some co-workers suggested the drawback that it might be annoying to try to stay synchronized with the speaker. It didn't take me long to get in the habit of saying “Next slide, #18” or whatever as I moved through the talk. If you try this, be sure to put numbers on the slides! (This is a good practice anyway, I have found.) I don't know if my audience found it annoying.

The whole idea only works if you can be sure that everyone will have suitable display software for your presentation materials. If you require WalSoft AwesomePresent version 18.3, it will be a problem. But for the past 25 years I have made my presentation materials in HTML, so this wasn't an issue.

If you're giving a talk over videoconference, consider trying this technique.

[ Addendum: I should write an article about all the many ways in which HTML has been a good choice. ]


Fri, 27 Mar 2020

Pauli chess

Last week Pierre-Françoys Brousseau and I invented a nice chess variant that I've never seen before. The main idea is: two pieces can be on the same square. Sometimes when you try to make a drastic change to the rules, what you get fails completely. This one seemed to work okay. We played a game and it was fun.

Specifically, our rules say:

  1. All pieces move and capture the same as in standard chess, except:

  2. Up to two pieces may occupy the same square.

  3. A piece may move into an occupied square, but not through it.

  4. A piece moving into a square occupied by a piece of the opposite color has the option to capture it or to share the square.

  5. Pieces of opposite colors sharing a square do not threaten one another.

  6. A piece moving into a square occupied by two pieces of the opposite color may capture either, but not both.

  7. Castling is permitted, but only under the same circumstances as standard chess. Pieces moved during castling must move to empty squares.
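Rules 2 through 6 can be sketched as a small function that lists what a piece may do when it arrives on a square. This is only an illustration under stated assumptions: the function name, the color strings, and the option labels are hypothetical, and letting a piece capture the enemy on a square that holds one piece of each color is an assumption, since the rules above do not address that case explicitly.

```python
def arrival_options(mover, occupants):
    """List what a piece of color `mover` may do on reaching a square.

    `occupants` holds the colors of the pieces already on the square.
    All names and option labels here are hypothetical.
    """
    if len(occupants) > 2:
        raise ValueError("rule 2: at most two pieces per square")
    options = []
    if not occupants:
        options.append("move")       # ordinary move to an empty square
    elif len(occupants) == 1:
        options.append("share")      # rules 3 and 4: join the lone occupant
    # Rules 4 and 6: any single enemy piece on the square may be captured.
    for index, color in enumerate(occupants):
        if color != mover:
            options.append(f"capture piece {index}")
    return options


print(arrival_options("white", []))
print(arrival_options("white", ["black"]))
print(arrival_options("white", ["black", "black"]))
print(arrival_options("white", ["white", "white"]))
```

The last call returns an empty list: a square already holding two friendly pieces is full, so the move is simply not available.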

Miscellaneous notes

Pierre-Françoys says he wishes that more than two pieces could share a square. I think it could be confusing. (Also, with the chess set we had, more than two did not really fit within the physical confines of the squares.)

Similarly, I proposed the castling rule because I thought it would be less confusing. And I did not like the idea that you could castle on the first move of the game.

The role of pawns is very different than in standard chess. In this variant, you cannot stop a pawn from advancing by blocking it with another pawn.

Usually when you have the chance to capture an enemy piece that is alone on its square you will want to do that, rather than move your own piece into its square to share space. But it is not hard to imagine that in rare circumstances you might want to pick a nonviolent approach, perhaps to avoid a stalemate.

Some discussion of similar variants is on Chess Stack Exchange.

The name “Pauli Chess” is inspired by the Pauli exclusion principle, which says that no more than two electrons can occupy the same atomic orbital.


Tue, 24 Mar 2020

git log --author=... confused me

Today I was looking for recent commits by co-worker Fred Flooney, address, so I did

    git log --author=ffloo

but nothing came up. I couldn't remember if --author would do a substring search, so I tried

    git log --author=fflooney
    git log

and still nothing came up. “Okay,” I said, “probably I have Fred's address wrong.” Then I did

    git log --format=%ae | grep ffloo

The --format=%ae means to just print out commit author email addresses, instead of the usual information. This command did produce many commits with the author address

I changed this to

    git log --format='%H %ae' | grep ffloo

which also prints out the full hash of the matching commits. The first one was 542ab72c92c2692d223bfca4470cf2c0f2339441.

Then I had a perplexity. When I did

    git log -1 --format='%H %ae' 542ab72c92c2692d223bfca4470cf2c0f2339441

it told me the author email address was But when I did

    git show 542ab72c92c2692d223bfca4470cf2c0f2339441

the address displayed was

The answer is, the repository might have a file in its root named .mailmap that says “If you see this name and address, pretend you saw this other name and address instead.” Some of the commits really had been created with the address I was looking for, fflooney. But the .mailmap said that the canonical version of that address was fredf@. Nearly all Git operations use the canonical address. The git-log --author option searches the canonical address, and git-show and git-log, by default, display the canonical address.

But my --format=%ae overrides the default behavior; %ae explicitly requests the actual address. To display the canonical address, I should have used --format=%aE instead.
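The difference between the recorded address and the canonical one can be modeled in a few lines of Python. This is a toy sketch, not how Git is implemented: the addresses, commit names, and function names are all hypothetical placeholders, and the real --author option matches against the whole author line rather than the address alone.

```python
import re

# Hypothetical mailmap and commits; every address here is a placeholder.
MAILMAP = {"old@example.com": "canonical@example.com"}
COMMITS = [
    ("c1", "old@example.com"),
    ("c2", "canonical@example.com"),
    ("c3", "other@example.com"),
]


def canonical(address):
    """Apply the mailmap: rewrite a recorded address to its canonical form."""
    return MAILMAP.get(address, address)


def log_author(pattern):
    """Rough analogue of `git log --author=PATTERN`: search canonical addresses."""
    return [name for name, raw in COMMITS if re.search(pattern, canonical(raw))]


def format_ae():
    """Rough analogue of --format=%ae: addresses exactly as recorded."""
    return [raw for _, raw in COMMITS]


def format_aE():
    """Rough analogue of --format=%aE: addresses after the mailmap is applied."""
    return [canonical(raw) for _, raw in COMMITS]


print(log_author("old"))                          # finds nothing
print([a for a in format_ae() if "old" in a])     # grep on %ae output finds it
print(log_author("canonical"))                    # finds both c1 and c2
```

Searching for the old address finds nothing, while grepping the raw %ae output does find it, which is the same mismatch described above.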

Also, I learned that --author= does not just a substring search but a regex search. I asked it for --author=d* and was puzzled when it produced commits written by people with no d. This is a beginner mistake: d* matches zero or more instances of d, and every name contains zero or more instances of d. (I had thought that the * would be like a shell glob.)

Also, I learned that --author=d+ matches only authors that contain the literal characters d+. If you want the + to mean “one or more” you need --author=d\+.
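The d* surprise is easy to reproduce outside Git. The sketch below uses Python's re and fnmatch modules on a made-up list of names. One caveat: Python's regex syntax is not the one Git uses by default, and in Python a bare + already means “one or more”, so only the d* behavior and the contrast with a shell-style glob carry over directly.

```python
import fnmatch
import re

# Hypothetical author names, for illustration only.
names = ["fred", "dana", "alice"]

# As a regex, d* matches zero or more d's, so it matches somewhere in every name.
regex_star = [n for n in names if re.search("d*", n)]

# In Python's syntax, d+ requires at least one d.
regex_plus = [n for n in names if re.search("d+", n)]

# As a shell-style glob, d* means "starts with d".
glob_star = [n for n in names if fnmatch.fnmatchcase(n, "d*")]

print(regex_star)
print(regex_plus)
print(glob_star)
```

The regex d* selects all three names, d+ selects only the two that contain a d, and the glob d* selects only the one that starts with d.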

Thanks to Cees Hek, Gerald Burns, and Val Kalesnik for helping me get to the bottom of this.

The .mailmap thing is documented in git-check-mailmap.

[ Addendum: I could also have used git-log --no-use-mailmap ..., had I known about this beforehand. ]


Sun, 16 Feb 2020


Over on the other blog I said “Midichlorians predated The Phantom Menace.” No, the bacterium was named years after the movie was released.

Thanks to Eyal Joseph Minsky-Fenick and Shreevatsa R. for (almost simultaneously) pointing out this mistake.


Thu, 13 Feb 2020

Gentzen's rules for natural deduction

Here is Gerhard Gentzen's original statement of the rules of Natural Deduction (“ein Kalkül für ‘natürliche’, intuitionistische Herleitungen”):

from Gentzen's 1934 paper, titled “Die Schlußfiguren-Schemata”.  The
table is laid out in three lines, with the rules for ‘and’ and ‘or’,
then the rules for ‘exists’ and ‘for all’, and then the rules for
‘implies’, ‘not’, and ‘false’.  The variable names are written in
old-style German black-letter font, but otherwise the presentation is
almost identical to the modern form.

Natural deduction looks pretty much exactly the same as it does today, although the symbols are a little different. But only a little! Gentzen has not yet invented !!\land!! for logical and, and is still using !!\&!!. But he has invented !!\forall!!. The style of the !!\lnot!! symbol is a little different from what we use now, and he has that tent thingy !!⋏!! where we would now use !!\bot!!. I suppose !!⋏!! didn't catch on because it looks too much like !!\land!!. (He similarly used !!⋎!! to mean !!\top!!, but as usual, that doesn't appear in the deduction rules.)

We still use Gentzen's system for naming the rules. The notations “UE” and “OB” for example, stand for “und-Einführung” and “oder-Beseitigung”, which mean “and-introduction” and “or-elimination”.
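For concreteness, here is a LaTeX rendering of the two rules just named, and-introduction (UE) and or-elimination (OB). It is a modern re-typesetting under stated assumptions, not a reproduction of Gentzen's page: it uses `\mathfrak` for the black-letter variables and assumes the amsmath and amssymb packages are loaded.

```latex
\[
\frac{\mathfrak{A} \qquad \mathfrak{B}}
     {\mathfrak{A} \mathbin{\&} \mathfrak{B}}\ \textit{UE}
\qquad\qquad
\frac{\mathfrak{A} \lor \mathfrak{B}
      \qquad
      \begin{matrix} [\mathfrak{A}] \\ \vdots \\ \mathfrak{C} \end{matrix}
      \qquad
      \begin{matrix} [\mathfrak{B}] \\ \vdots \\ \mathfrak{C} \end{matrix}}
     {\mathfrak{C}}\ \textit{OB}
\]
```

The bracketed formulas in OB are the assumptions that are discharged when the rule is applied.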

Gentzen says (footnote 4, page 178) that he got the !!\lor, \supset, \exists!! signs from Russell, but he didn't want to use Russell's signs !!\cdot, \equiv, \sim, ()!! because they already had other meanings in mathematics. He took the !!\&!! from Hilbert, but Gentzen disliked his other symbols. Gentzen objected especially to the “uncomfortable” overbar that Hilbert used to indicate negation (“[Es] stellt eine Abweichung von der linearen Anordnung der Zeichen dar”). He attributes his symbols for logical equivalence (!!\supset\subset!!) and negation to Heyting, and explains that his new !!\forall!! symbol is analogous to !!\exists!!. I find it remarkable how quickly this caught on. Gentzen also later replaced !!\&!! with !!\land!!. Of the rest, the only one that didn't stick was !!\supset\subset!! in place of !!\equiv!!. But !!\equiv!! is much less important than the others, being merely an abbreviation.

Gentzen died at age 35, a casualty of the Second World War.

Source: Gerhard Gentzen, “Untersuchungen über das logische Schließen I”, pp. 176–210, Mathematische Zeitschrift v. 39, Springer, 1935. The display above appears on page 186.

[ Addendum 20200214: Thanks to Andreas Fuchs for correcting my German grammar. ]


Thu, 06 Feb 2020

Major screwups in mathematics: example 3

[ Previously: “Cases in which some statement S was considered to be proved, and later turned out to be false”. ]

In 1905, Henri Lebesgue claimed to have proved that if !!B!! is a subset of !!\Bbb R^2!! with the Borel property, then its projection onto a line (the !!x!!-axis, say) is a Borel subset of the line. This is false. The mistake was apparently noticed some years later by Andrei Souslin. In 1917 Souslin and Luzin defined an analytic set as the projection of a Borel set. All Borel sets are analytic, but, contrary to Lebesgue's claim, the converse is false. These sets are counterexamples to the plausible-seeming conjecture that all measurable sets are Borel.

I would like to track down more details about this. This Math Overflow post summarizes Lebesgue's error:

It came down to his claim that if !!\{A_n\}!! is a decreasing sequence of subsets in the plane with intersection !!A!!, then the projected sets in the line intersect to the projection of !!A!!. Of course this is nonsense. Lebesgue knew projection didn't commute with countable intersections, but apparently thought that by requiring the sets to be decreasing this would work.
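A small example shows why the decreasing condition does not rescue the claim. This is a standard counterexample supplied here for illustration, not one taken from the Math Overflow post; !!\pi!! denotes projection onto the !!x!!-axis.

```latex
\[
A_n = \{0\} \times [n, \infty) \subset \Bbb R^2,
\qquad A_1 \supset A_2 \supset \cdots,
\qquad A = \bigcap_n A_n = \emptyset .
\]
\[
\pi(A_n) = \{0\} \ \text{for every } n,
\quad\text{so}\quad
\bigcap_n \pi(A_n) = \{0\} \ne \emptyset = \pi(A).
\]
```

Each !!A_n!! is a closed vertical ray, so the sets are decreasing and Borel, yet the projections intersect to a point while the intersection itself is empty.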


Tue, 28 Jan 2020

James Blaine keeps turning up

Today I learned that James Blaine (U.S. Speaker of the House, senator, perennial presidential candidate, and Secretary of State under Presidents Garfield, Arthur, and Harrison; previously) was the namesake of the notorious “Blaine Amendments”. These are still an ongoing legal issue!

The Blaine Amendment was a proposed U.S. constitutional amendment rooted in anti-Catholic, anti-immigrant sentiment, at a time when the scary immigrant bogeymen were Irish and Italian Catholics.

The amendment would have prevented the U.S. federal government from providing aid to any educational institution with a religious affiliation; the specific intention was to make Catholic parochial schools ineligible for federal education funds. The federal amendment failed, but many states adopted it and still have it in their state constitutions.

Here we are 150 years later and this is still an issue! It was the subject of the 2017 Supreme Court case Trinity Lutheran Church of Columbia, Inc. v. Comer. My quick summary is:

  1. The Missouri state Department of Natural Resources had a program offering grants to licensed daycare facilities to resurface their playgrounds with shredded tires.

  2. In 2012, a daycare facility operated by Trinity Lutheran church ranked fifth out of 44 applicants according to the department’s criteria.

  3. 14 of the 44 applicants received grants, but Trinity Lutheran's daycare was denied, because the Missouri constitution has a Blaine Amendment.

  4. The Court found (7–2) that denying the grant to an otherwise qualified daycare just because of its religious affiliation was a violation of the Constitution's promises of free exercise of religion. (Full opinion)

It's interesting to me that now that Blaine is someone I recognize, he keeps turning up. He was really important, a major player in national politics for thirty years. But who remembers him now?


Fri, 17 Jan 2020

Pylgremage of the Sowle

As Middle English goes, Pylgremage of the Sowle (unknown author, 1413) is much easier to read than Chaucer:

He hath iourneyed by the perylous pas of Pryde, by the malycious montayne of Wrethe and Enuye, he hath waltred hym self and wesshen in the lothely lake of cursyd Lechery, he hath ben encombred in the golf of Glotony. Also he hath mysgouerned hym in the contre of Couetyse, and often tyme taken his rest whan tyme was best to trauayle, slepyng and slomeryng in the bed of Slouthe.

I initially misread “Enuye” as “ennui”, understanding it as sloth. But when sloth showed up at the end, I realized that it was simpler than I thought: it's just “envy”.


Thu, 16 Jan 2020

A serious proposal to exploit the loophole in the U.S. Constitution

In 2007 I described an impractical scheme to turn the U.S. into a dictatorship, or to make any other desired change to the Constitution, by having Congress admit a large number of very small states, which could then ratify any constitutional amendments deemed desirable.

An anonymous writer (probably a third-year law student) has independently discovered my scheme, and has proposed it as a way to “fix” the problems that they perceive with the current political and electoral structure. The proposal has been published in the Harvard Law Review in an article that does not appear to be an April Fools’ prank.

The article points out that admission of new states has sometimes been done as a political hack. It says:

Republicans in Congress were worried about Lincoln’s reelection chances and short the votes necessary to pass the Thirteenth Amendment. So notwithstanding the traditional population requirements for statehood, they turned the territory of Nevada — population 6,857 — into a state, adding Republican votes to Congress and the Electoral College.

Specifically, the proposal is that the new states should be allocated out of territory currently in the District of Columbia (which will help ensure that they are politically aligned in the way the author prefers), and that a suitable number of new states might be one hundred and twenty-seven.


Tue, 14 Jan 2020

More about triple border points

[ Previously ]

A couple of readers wrote to discuss tripoints, which are places where three states or other regions share a common border point.

Doug Orleans told me about the Tri-States Monument near Port Jervis, New York. This marks the approximate location of the Pennsylvania - New Jersey - New York border. (The actual tripoint, as I mentioned, is at the bottom of the river.)

I had independently been thinking about taking a drive around the entire border of Pennsylvania, and this is just one more reason to do that. (Also, I would drive through the Delaware Water Gap, which is lovely.) Looking into this I learned about the small town of North East, so-named because it's in the northeast corner of Erie County. It's also the northernmost point in Pennsylvania.

(I got onto a tangent about whether it was the northeastmost point in Pennsylvania, and I'm really not sure. It is certainly an extreme northeast point in the sense that you can't travel north, east, or northeast from it without leaving the state. But it would be a very strange choice, since Erie County is at the very western end of the state.)

My putative circumnavigation of Pennsylvania would take me as close as possible to Pennsylvania's only international boundary, with Ontario; there are Pennsylvania - Ontario tripoints with New York and with Ohio. Unfortunately, both of them are in Lake Erie. The only really accessible Pennsylvania tripoints are the ones with West Virginia and Maryland (near Morgantown) and with Maryland and Delaware (near Newark).

These points do tend to be marked, with surveyors’ markers if nothing else. Loren Spice sent me a picture of themselves standing at the tripoint of Kansas, Missouri, and Oklahoma, not too far from Joplin, Missouri.

While looking into this, I discovered the Kentucky Bend, which is an exclave of Kentucky, embedded between Tennessee and Missouri:

 Missouri is mostly north of
Tennessee, divided by the winding Mississippi River.  But the river
makes a hairpin turn, flowing north to New Madrid, MO, and then
turning sharply south again, leaving a narrow peninsula
protruding north from Tennessee… Except that the swollen northern end
of the peninsula is in Kentucky.  Its land border, to the south,
is with Tennessee, and its river borders, all around, are with Missouri.

It appears that what happened here is that the border between Kentucky and Missouri is the river, with Kentucky getting the territory on the left bank, here the south side. And the border between Kentucky and Tennessee is a straight line, following roughly the 36.5 parallel, with Kentucky getting the territory north of the line. The bubble is south of the river but north of the line.

So these three states have not one tripoint, but three, all only a few miles apart!

Closeup of the three
tripoints, all at about the same latitude, where the line crosses the
winding Mississippi river in three places.

Finally, I must mention the Lakes of Wada, which are not real lakes, but rather are three connected subsets of the unit disc which have the property that every point on their boundaries is a tripoint.


Thu, 09 Jan 2020

Three Corners

I'm a fan of geographic oddities, and a few years back when I took a road trip to circumnavigate Chesapeake Bay, I planned its official start in New Castle, DE, which is noted for being the center of the only circular state boundary in the U.S.:

Map of
Delaware, showing that its northern border (with Pennsylvania) is an
arc of a circle; an adjoining map of just New Castle County has the
city of New Castle highlighted, showing that New Castle itself is at
the center of the circle.

The red blob is New Castle. Supposedly an early treaty allotted to Delaware all points west of the river that were within twelve miles of the State House in New Castle.

I drove to New Castle, made a short visit to the State House, and then began my road trip in earnest. This is a little bit silly, because the border is completely invisible, whether you are up close or twelve miles away, and the State House is just another building, and would be exactly the same even if the border were actually a semicubic parabola with its focus at the second-tallest building in Wilmington.

Whatever, I like going places, so I went to New Castle to check it out. Perhaps it was silly, but I enjoyed going out of my way to visit a point of purely geometric significance. The continuing popularity of Four Corners as a tourist destination shows that I'm not the only one. I don't have any plans to visit Four Corners, because it's far away, kinda in the middle of nowhere, and seems like rather a tourist trap. (Not that I begrudge the Navajo Nation whatever they can get from it.)

Four Corners is famously the only point in the U.S. where four state borders coincide. But a couple of weeks ago as I was falling asleep, I had the thought that there are many triple-border points, and it might be fun to visit some. In particular, I live in southeastern Pennsylvania, so the Pennsylvania-New Jersey-Delaware triple point must be somewhere nearby. I sat up and got my phone so I could look at the map, and felt foolish:

Map of the
Pennsylvania-New Jersey-Delaware triple border, about a kilometer
offshore from Marcus Hook, PA, further described below.

As you can see, the triple point is in the middle of the Delaware River, as of course it must be; the entire border between Pennsylvania and New Jersey, all the hundreds of miles from its northernmost point (near Port Jervis) to its southernmost (shown above), runs right down the middle of the Delaware.

I briefly considered making a trip to get as close as possible, and photographing the point from land. That would not be too inconvenient. Nearby Marcus Hook is served by commuter rail. But Marcus Hook is not very attractive as a destination. Having been to Marcus Hook, it is hard for me to work up much enthusiasm for a return visit.

But I may look into this further. I usually like going places and being places, and I like being surprised when I get there, so visiting arbitrarily-chosen places has often worked out well for me. I see that the Pennsylvania-Delaware-Maryland triple border is near White Clay Creek State Park, outside of Newark, DE. That sounds nice, so perhaps I will stop by and take a look, and see if there really is white clay in the creek.

Who knows, I may even go back to Marcus Hook one day.

[ Addendum 20190114: More about nearby tripoints and related matters. ]

[Other articles in category /misc] permanent link

Wed, 08 Jan 2020

Unix bc command and its -l flag

In a recent article about Unix utilities, I wrote:

We need the -l flag on bc because otherwise it stupidly does integer arithmetic.

This is wrong, as was kindly pointed out to me by Luke Shumaker. The behavior of bc is rather more complicated than I said, and less stupid. In the application I was discussing, the input was a string like 0.25+0.37, and it's easy to verify that bc produces the correct answer even without -l:

   $ echo 0.25+0.37 | bc
   .62

In bc, each number is represented internally as !!m·10^{-s}!!, where !!m!! is in base 10 and !!s!! is called the “scale”, essentially the number of digits after the decimal point. For addition and subtraction, bc produces a result with the correct scale for an exact result. Multiplication is nearly as well behaved: when multiplying two numbers with scales a and b, the exact product has scale a + b, and bc keeps min(a + b, max(scale, a, b)) of those digits, where scale is the variable described below. The product is exact whenever that limit does not cut it short.

But for division, bc doesn't know what scale it should use for the result. The result of !!23÷7!! can't be represented exactly, regardless of the scale used. So how should bc choose how many digits to retain? It can't retain all of them, and it doesn't know how many you will want. The answer is: you tell it, by setting a special variable, called scale. If you set scale=3 then !!23÷7!! will produce the result !!3.285!!.

Unfortunately, if you don't set it — this is the stupid part — scale defaults to zero. Then bc will discard everything after the decimal point, and tell you that !!23÷7 = 3!!.

Long, long ago I was in the habit of manually entering scale=20 at the start of every bc session. I eventually learned about -l, which, among other things, sets the default scale to 20 instead of 0. And I have used -l habitually ever since, even in cases like this, where it isn't needed.

Many thanks to Luke Shumaker for pointing this out. M. Shumaker adds:

I think the mis-recollection/understanding of -l says something about your "memorized trivia" point, but I'm not quite sure what.

Yeah, same.

[Other articles in category /oops] permanent link

Tue, 07 Jan 2020

Social classes identified by letters

Looking up the letter E in the Big Dictionary, I learned that British sociologists were dividing social classes into lettered strata long before Aldous Huxley did it in Brave New World (1932). The OED quoted F. G. D’Aeth, “Present Tendencies of Class Differentiation”, The Sociological Review, vol 3 no 4, October, 1910:

The present class structure is based upon different standards of life…

A. The Loafer
B. Low-skilled labour
C. Artizan
D. Smaller Shopkeeper and clerk
E. Smaller Business Class
F. Professional and Administrative Class
G. The Rich

The OED doesn't quote further, but D’Aeth goes on to explain:

A. represents the refuse of a race; C. is a solid, independent and valuable class in society. … E. possesses the elements of refinement; provincialisms in speech are avoided, its sons are selected as clerks, etc., in good class businesses, e.g., banking, insurance.

Notice that in D’Aeth's classification, the later letters are higher classes. According to the OED this was typical; they also quote a similar classification from 1887 in which A was the lowest class. But the OED labels this sort of classification, with A at the bottom, as “obsolete”.

In Brave New World, you will recall, it is in the other direction, with the Alphas (administrators and specialists) at the top, and the Epsilons (menial workers with artificially-induced fetal alcohol syndrome) at the bottom.

The OED's later quotations, from 1950–2014, all follow Huxley in putting class A at the top and E at the bottom. They also follow Huxley in having only five classes instead of seven or eight. (One has six classes, but two of them are C1 and C2.)

I wonder how much influence Brave New World had on this sort of classification. Was anyone before Huxley dividing British society into five lettered classes with A at the top?

[ By the way, I have been informed that this paper, which I have linked above, is “Copyright © 2020 by The Sociological Review Publication Limited. All rights are reserved.” This is a bald lie. Sociological Review Publication Limited should be ashamed of themselves. ]

[Other articles in category /lang] permanent link

Fri, 03 Jan 2020

Benchmarking shell pipelines and the Unix “tools” philosophy

Sometimes I look through the HTTP referrer logs to see if anyone is talking about my blog. I use the f 11 command to extract the referrer field from the log files, count up the number of occurrences of each referring URL, then discard the ones that are internal referrers from elsewhere on my blog. It looks like this:

    f 11 access.2020-01-0* | count | grep -v plover

(I've discussed f before. The f 11 just prints the eleventh field of each line. It is essentially shorthand for awk '{print $11}' or perl -lane 'print $F[10]'. The count utility is even simpler; it counts the number of occurrences of each distinct line in its input, and emits a report sorted from least to most frequent, essentially a trivial wrapper around sort | uniq -c | sort -n. Civilization advances by extending the number of important operations which we can perform without thinking about them.)
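For readers who don't have these two utilities, here is a rough sketch of them as shell functions, built from the equivalents given above. These are hypothetical stand-ins, not the actual f and count programs, which may handle options and edge cases differently.

```shell
# Hypothetical stand-ins for the f and count utilities described above.

# f N [FILE...]: print the Nth whitespace-separated field of each line.
f() {
    n=$1
    shift
    awk -v n="$n" '{ print $n }' "$@"
}

# count: tally each distinct input line, least frequent first.
count() {
    sort | uniq -c | sort -n
}

# Four sample lines; the second field is x three times and y once.
printf '%s\n' 'a x' 'b y' 'c x' 'd x' | f 2 | count
```

The sample run reports a count of 1 for y followed by a count of 3 for x. The exact column padding comes from uniq -c and varies between systems.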

This has obvious defects, but it works well enough. But every time I used it, I wondered: is it faster to do the grep before the count, or after? I didn't ever notice a difference. But I still wanted to know.

After years of idly wondering this, I have finally looked into it. The point of this article is that the investigation produced the following pipeline, which I think is a great example of the Unix “tools” philosophy:

        for i in $(seq 20); do 
          TIME="%U+%S" time \
             sh -c 'f 11 access.2020-01-0* | grep -v plover | count > /dev/null' \
               2>&1 | bc -l ;
        done | addup

I typed this on the command line, with no backslashes or newlines, so it actually looked like this:

        for i in $(seq 20); do TIME="%U+%S" time sh -c 'f 11 access.2020-01-0* | grep -v plover |count > /dev/null' 2>&1 | bc -l ; done | addup

Okay, what's going on here? The pipeline I actually want to analyze, with f | grep | count, is there in the middle, and I've already explained it, so let's elide it:

        for i in $(seq 20); do 
          TIME="%U+%S" time \
             sh -c '¿SOMETHING? > /dev/null' 2>&1 | bc -l ;
        done | addup

Continuing to work from inside to out, we're going to use time to actually do the timings. The time command is standard. It runs a program, asks the kernel how long the program took, then prints a report.

The time command will only time a single process (plus its subprocesses, a crucial fact that is inexplicably omitted from the man page). The ¿SOMETHING? includes a pipeline, which must be set up by the shell, so we're actually timing a shell command sh -c '...' which tells time to run the shell and instruct it to run the pipeline we're interested in. We tell the shell to throw away the output of the pipeline, with > /dev/null, so that the output doesn't get mixed up with time's own report.

The default format for the report printed by time is intended for human consumption. We can supply an alternative format in the $TIME variable. The format I'm using here is %U+%S, which comes out as something like 0.25+0.37, where 0.25 is the user CPU time and 0.37 is the system CPU time. I didn't see a format specifier that would emit the sum of these directly. So instead I had it emit them with a + in between, and then piped the result through the bc command, which performs the requested arithmetic and emits the result. We need the -l flag on bc because otherwise it stupidly does integer arithmetic. The time command emits its report to standard error, so I use 2>&1 to redirect the standard error into the pipe.

[ Addendum 20200108: We don't actually need -l here; I was mistaken. ]
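The redirection part of this is easy to try in isolation. The sketch below is an illustration only: fake_time is a made-up stand-in that writes a fixed 0.25+0.37 report to standard error, so the example does not depend on the real time command being installed, and awk is swapped in for bc to do the addition.

```shell
# fake_time is a hypothetical stand-in for time: it runs a command, then
# writes a made-up "user+system" report to standard error.
fake_time() {
    "$@"
    echo '0.25+0.37' >&2
}

# The command's own output is discarded inside sh -c, so only the report,
# redirected from standard error by 2>&1, travels down the pipe.
fake_time sh -c 'echo noise > /dev/null' 2>&1 | awk -F+ '{ print $1 + $2 }'
```

This prints 0.62. Without the 2>&1 the report would go straight to the terminal and the program at the end of the pipe would receive nothing.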

Collapsing the details I just discussed, we have:

        for i in $(seq 20); do 
          (run once and emit the total CPU time)
        done | addup

seq is a utility I invented no later than 1993 which has since become standard in most Unix systems. (As with netcat, I am not claiming to be the first or only person to have invented this, only to have invented it independently.) There are many variations of seq, but the main use case is that seq 20 prints the integers from 1 through 20, one per line.

Here we don't actually care about the output (we never actually use $i) but it's a convenient way to get the for loop to run twenty times. The output of the for loop is the twenty total CPU times that were emitted by the twenty invocations of bc. (Did you know that you can pipe the output of a loop?) These twenty lines of output are passed into addup, which I wrote no later than 2011. (Why did it take me so long to do this?) It reads a list of numbers and prints the sum.
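Both ideas fit in a few lines. This is a minimal sketch in which addup is a hypothetical awk one-liner standing in for the real program, and the loop body just echoes a fixed number in place of a timing.

```shell
# Hypothetical stand-in for addup: sum the first field of every input line.
addup() {
    awk '{ total += $1 } END { print total + 0 }'
}

# The loop's combined output is a single stream, so it can be piped onward.
for i in $(seq 4); do
    echo '0.25'
done | addup
```

This prints 1, the sum of the four lines emitted by the loop.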

All together, the command runs and prints a single number like 5.17, indicating that the twenty runs of the pipeline took 5.17 CPU-seconds total. I can do this a few times for the original pipeline, with count before grep, get times between 4.77 and 5.78, and then try again with the grep before the count, producing times between 4.32 and 5.14. The difference is large enough to detect but too small to notice.

(To do this right we also need to test a null command, say

    sh -c 'sleep 0.1 < /dev/null'

because we might learn that 95% of the reported time is spent in running the shell, so the actual difference between the two pipelines is twenty times as large as we thought. I did this; it turns out that the time spent to run the shell is insignificant.)

What to learn from all this? On the one hand, Unix wins: it's supposed to be quick and easy to assemble small tools to do whatever it is you're trying to do. When time wouldn't do the arithmetic I needed it to, I sent its output to a generic arithmetic-doing utility. When I needed to count to twenty, I had a utility for doing that; if I hadn't there are any number of easy workarounds. The shell provided the I/O redirection and control flow I needed.

On the other hand, gosh, what a weird mishmash of stuff I had to remember or look up. The -l flag for bc. The fact that I needed bc at all because time won't report total CPU time. The $TIME variable that controls its report format. The bizarro 2>&1 syntax for redirecting standard error into a pipe. The sh -c trick to get time to execute a pipeline. The missing documentation of the core functionality of time.

Was it a win overall? What if Unix had less compositionality but I could use it with less memorized trivia? Would that be an improvement?

I don't know. I rather suspect that there's no way to actually reach that hypothetical universe. The bizarre mishmash of weirdness exists because so many different people invented so many tools over such a long period. And they wouldn't have done any of that inventing if the compositionality hadn't been there. I think we don't actually get to make a choice between an incoherent mess of composable paraphernalia and a coherent, well-designed but noncompositional system. Rather, we get a choice between an incoherent but useful mess and an incomplete, limited noncompositional system.

(Notes to self: (1) In connection with Parse::RecDescent, you once wrote about open versus closed systems. This is another point in that discussion. (2) Open systems tend to evolve into messes. But closed systems tend not to evolve at all, and die. (3) Closed systems are centralized and hierarchical; open systems, when they succeed, are decentralized and organic. (4) If you are looking for another example of a successful incoherent mess of composable paraphernalia, consider Git.)

[ Addendum: Add this to the list of “weird mishmash of trivia”: There are two time commands. One, which I discussed above, is a separate executable, usually in /usr/bin/time. The other is built into the shell. They are incompatible. Which was I actually using? I would have been pretty confused if I had accidentally gotten the built-in one, which ignores $TIME and uses a $TIMEFORMAT that is interpreted in a completely different way. I was fortunate, and got the one I intended to get. But it took me quite a while to understand why I had! The appearance of the TIME=… assignment at the start of the shell command disabled the shell's special builtin treatment of the keyword time, so it really did use /usr/bin/time. This computer stuff is amazingly complicated. I don't know how anyone gets anything done. ]

[ Addenda 20200104: (1) Perl's module ecosystem is another example of a successful incoherent mess of composable paraphernalia. (2) Of the seven trivia I included in my “weird mishmash”, five were related to the time command. Is this a reflection on time, or is it just because time was central to this particular example? ]

[ Addendum 20200104: And, of course, this is exactly what Richard Gabriel was thinking about in Worse is Better. Like Gabriel, I'm not sure. ]

[Other articles in category /Unix] permanent link

Thu, 02 Jan 2020

A sticky problem that evaporated

Back in early 1995, I worked on an incredibly early e-commerce site. The folks there were used to producing shopping catalogs for distribution in airplane seat-back pockets and such like, and they were going to try bringing a catalog to this World-Wide Web thing that people were all of a sudden talking about.

One of their clients was Eddie Bauer. They wanted to put up a product catalog with a page for each product, say a sweatshirt, and the page should show color swatches for each possible sweatshirt color.

“Sure, I can do that,” I said. “But you have to understand that the user may not see the color swatches exactly as you expect them to.” Nobody would need to have this explained now, but in early 1995 I wasn't sure the catalog folks would understand. When you have a physical catalog you can leaf through a few samples to make sure that the printer didn't mess up the colors.

But what if two months down the line the Eddie Bauer people were shocked by how many complaints customers had about things being not quite the right color: “Hey, I ordered mulberry but this is more like maroonish”? Having absolutely no way to solve the problem, I didn't want it to land in my lap; I wanted to be able to say I had warned them ahead of time. So I asked “Will it be okay that there will be variations in how each customer sees the color swatches?”

The catalog people were concerned. Why wouldn't the colors be the same? And I struggled to explain: the customer will see the swatches on their monitor, and we have no idea how old or crappy it might be, we have no idea how the monitor settings are adjusted, the colors could be completely off, it might be a monochrome monitor, or maybe the green part of their RGB video cable is badly seated and the monitor is displaying everything in red, blue, and purple, blah blah blah… I completely failed to get the point across in a way that the catalog people could understand.

They looked more and more puzzled, but then one of them brightened up suddenly and said “Oh, just like on TV!”

“Yes!” I cried in relief. “Just like that!”

“Oh sure, that's no problem.” Clearly, that was what I should have said in the first place, but I hadn't thought of it.

I no longer have any idea who it was that suddenly figured out what Geek Boy's actual point was, but I'm really grateful that they did.

[Other articles in category /tech] permanent link