The Universe of Discourse

Fri, 22 May 2009

A child is bitten by a dog every 0.07 seconds...
I read in the newspaper today that letter carriers were bitten by dogs 3,000 times last year. (Curiously, this is not a round number; it is exact.) The article then continued: "children ... are 900 times more likely to be bitten than letter carriers."

This is obviously nonsense, because suppose the post office employs half a million letter carriers. (The actual number is actually about half that, but we are doing a back-of-the-envelope estimate of plausibility.) Then the bite rate is six bites per thousand letter carriers per year, and if children are 900 times more likely to be bitten, they are getting bitten at a rate of 5,400 bites per thousand children per year, or 5.4 bites per child. Insert your own joke here, or use the prefabricated joke framework in the title of this article.

I wrote to the reporter, who attributed the claim to the Postal Bulletin 22258 of 7 May 2009. It does indeed appear there. I am trying to track down the ultimate source, but I suspect I will not get any farther. I have discovered that the "900 times" figure appears in the Post Office's annual announcements of Dog Bite Prevention Month as far back as 2004, but not as far back as 2002.

Meantime, what are the correct numbers?

The Centers for Disease Control and Prevention have a superb on-line database of injury data. It immediately delivers the correct numbers for dog bite rate among children:

AgeNumber of
PopulationRate per
0 2,302 4,257,020 54.08
1 7,100 4,182,171 169.77
2 10,049 4,110,458 244.47
3 10,355 4,111,354 251.86
4 9,920 4,063,122 244.15
5 7,915 4,031,709 196.32
6 8,829 4,089,126 215.91
7 6,404 3,935,663 162.72
8 8,464 3,891,755 217.48
9 8,090 3,901,375 207.36
10 7,388 3,927,298 188.11
11 6,501 4,010,171 162.11
12 7,640 4,074,587 187.49
13 5,876 4,108,962 142.99
14 4,720 4,193,291 112.56
15 5,477 4,264,883 128.42
16 4,379 4,334,265 101.03
17 4,459 4,414,523 101.01
Total 133,560 82,361,752 162.16

According to the USPS 2008 Annual Report, in 2008 the USPS employed 211,661 city delivery carriers and 68,900 full-time rural delivery carriers, a total of 280,561. Since these 280,561 carriers received 3,000 dog bites, the rate per 100,000 carriers per year is 1069.29 bites.

So the correct statistic is not that children are 900 times more likely than carriers to be bitten, but rather that carriers are 6.6 times as likely as children to be bitten, 5.6 times if you consider only children under 13. Incidentally, your toddler's chance of being bitten in the course of a year is only about a quarter of a percent, ceteris paribus.

Where did 900 come from? I have no idea.

There are 293 times as many children as there are letter carriers, and they received a total of 44.5 times as many bites. The "900" figure is all over the Internet, despite being utterly wrong. Even with extensive searching, I was not able to find this factoid in the brochures or reports of any other reputable organization, including the American Veterinary Medical Association, the American Academy of Pediatrics, the Centers for Disease Control and Prevention, or the Humane Society of the Uniited States. It appears to be the invention of the USPS.

Also in the same newspaper, the new Indian restaurant on Baltimore avenue was advertising that they "specialize in vegetarian and non-vegetarian food". It's just a cornucopia of stupidity today, isn't it?

[Other articles in category /math] permanent link

Thu, 21 May 2009

The number of crank dissertations
 dealing with the effects of sunspots
on political and economic events
 peaks every eleven years.

[Other articles in category /misc] permanent link

Wed, 20 May 2009

No flimping
Advance disclaimer: I am not a linguist, have never studied linguistics, and am sure to get some of the details wrong in this article. Caveat lector.

There is a standard example in linguistics that is attached to the word "flimp". The idea it labels is that certain grammatical operations are restricted in the way they behave, and cannot reach deeply into grammatical structures and rearrange them.

For instance, you can ask "What did you use to see the girl on the hill in the blue dress?" and I can reply "I used a telescope to see the girl on the hill in the blue dress". Here "the girl on the hill in the blue dress" is operating as a single component, which could, in principle, be arbitrarily long. ("The girl on the hill that was fought over in the war between the two countries that have been at war since the time your mother saw that monkey climb the steeple of the church...") This component can be extracted whole from one sentence and made the object of a new sentence, or the subject of some other sentence.

But certain other structures are not transportable. For example, in "Bill left all his money to Fred and someone", one can reach down as far as "Fred and someone" and ask "What did Bill leave to Fred and someone?" but one cannot reach all the way down to "someone" and ask "Who did Bill leave all his money to Fred and"?

Under certain linguistic theories of syntax, analogous constraints rule out the existence of certain words. "Flimped" is the hypothetical nonexistent word which, under these theories, cannot exist. To flimp is to kiss a girl who is allergic to. For example, to flimp coconuts is to kiss a girl who is allergic to coconuts. (The grammatical failure in the last sentence but one illustrates the syntactic problem that supposedly rules out the word "flimped".

I am not making this up; for more details (from someone who, unlike me, may know what he is talking about) See Word meaning and Montague grammar by David Dowty, p. 236. Dowty cites the earlier sources, from 1969–1973 who proposed this theory in the first place. The "flimped" example above is exactly the same as Dowty's, and I believe it is the standard one.

Dowty provides a similar, but different example: there is not, and under this theory there cannot be, a verb "to thork" which means "to lend your uncle and", so that "John thorked Harry ten dollars" would mean "John lent his uncle and Harry ten dollars".

I had these examples knocking around in my head for many years. I used to work for the University of Pennsylvania Computer and Information Sciences department, and from my frequent contacts with various cognitive-science types I acquired a lot of odds and ends of linguistic and computational folklore. Michael Niv told me this one sometime around 1992.

The "flimp" thing rattled around my head, surfacing every few months or so, until last week, when I thought of a counterexample: Wank.

The verb "to wank to" means "to rub one's genitals while considering", and so seems to provide a countexample to the theory that says that verbs of this type are illegal in English.

When I went to investigate, I found that the theory had pretty much been refuted anyway. The Dowty book (published 1979) produced another example: "to cuckold" is "to have sexual intercourse with the woman who is married to".

Some Reddit person recently complained that one of my blog posts had no point. Eat this, Reddit person.

[Other articles in category /lang] permanent link

Sun, 17 May 2009

Bipartite matching and same-sex marriage
My use of the identifiers husband and wife in Thursday's example code should not be taken as any sort of political statement against same-sex marriage. The function was written as part of a program to solve the stable bipartite matching problem. In this problem, which has historically been presented as concerning "marriage", there are two disjoint equinumerous sets, which we may call "men" and "women". Each man ranks the women in preference order, and each woman ranks the men in preference order. Men are then matched to women. A matching is "stable" if there is no man m and no woman w such that m and w both prefer each other to their current partners. A theorem of Gale and Shapley guarantees the existence of a stable matching and provides an algorithm to construct one.

However, if same-sex marriages are permitted, there may not be a stable matching, so the character of the problem changes significantly.

A minimal counterexample is:

A prefers: B C X
B prefers: C A X
C prefers: A B X
X prefers: A B C

Suppose we match AB, CX. Then since B prefers C to A, and C prefers B to X, B and C divorce their mates and marry each other, yielding BC, AX.

But now C can improve her situation further by divorcing B in favor of A, who is only too glad to dump the miserable X. The marriages are now AC, BX.

B now realizes that his first divorce was a bad idea, since he thought he was trading up from A to C, but has gotten stuck with X instead. So he reconciles with A, who regards the fickle B as superior to her current mate C. The marriages are now AB, CX, and we are back where we started, having gone through every possible matching.

This should not be taken as an argument against same-sex marriage. The model fails to generate the following obvious real-world solution: A, B, and C should all move in together and live in joyous tripartite depravity, and X should jump off a bridge.

[Other articles in category /math] permanent link

Fri, 15 May 2009

"Known to Man" and the advent of the space aliens
Last week I wrote about how I was systematically changing the phrase "known to man" to just "known" in Wikipedia articles.

Two people so far have written to warn me that I would regret this once the space aliens come, and I have to go around undoing all my changes. But even completely leaving aside Wikipedia's "Wikipedia is not a crystal ball" policy, which completely absolves me from having to worry about this eventuality, I think these people have not analyzed the situation correctly. Here is how it seems to me.

Consider these example sentences:

  • Diamond is the hardest substance known to man.
  • Diamond is the hardest substance known.

There are four possible outcomes for the future:

  • Aliens reveal superhardium, a substance harder than diamond.
  • Aliens exist, but do not know about superhardium.
  • The aliens do not turn up, but humans discover superhardium on their own.
  • No aliens and no superhardium.

In cases (1) and (3), both sentences require revision.

In case (4), neither sentence requires revision.

But in case (2), sentence (a) requires revision, while (b) does not. So my change is a potential improvement in a way I had not appreciated.

Also in last week's article, I said it would be nice to find a case where a Wikipedia article's use of "known to man" actually intended a contrast with divine or feminine knowledge, rather than being a piece of inept blather. I did eventually find such a case: the article on runic alphabet says, in part:

In the Poetic Edda poem Rígþula another origin is related of how the runic alphabet became known to man. The poem relates how Ríg, identified as Heimdall in the introduction, ...

I gratefully acknowledge the gift of Thomas Guest. Thank you!

[Other articles in category /aliens] permanent link

Thu, 14 May 2009

Product types in Java
Recently I wanted a Java function that would return two Person objects. Java functions return only a single value. I could, of course, make a class that encapsulates two Persons:

        class Persons2 {
          Person personA, personB;

          Persons2(Person a, Person b) {
            personA = a; personB = b;

          Person getPersonA() { return personA; }
Java is loathsome in its verbosity, and this sort of monkey code is Java's verbosity at its most loathsome. So I did not do this.

Haskell functions return only one value also, but this is no limitation, because Haskell has product types. And starting in Java 5, the Java type system is a sort of dented, bolted-on version of the type systems that eventually evolved into the Haskell type system. But product types are pretty simple. I can make a generic product type in Java:

        class Pair<A,B> {
          A a;  B b;

          Pair(A a, B b) { this.a = a; this.b = b; }

          A fst() { return a; }
          B snd() { return b; }
Then I can declare my function to return a Pair<Person,Person>:

        Pair<Person,Person> findMatch() {
          return new Pair(husband, wife);
Okay, that worked just fine. The boilerplate is still there, but you only have to do it once. This trick seems sufficiently useful that I can imagine that I will use it again, and that someone else reading this will want to use it too.

I've been saying for a while that up through version 1.4, Java was a throwback to the languages of the 1970s, but that with the introduction of generics in Java 5, it took a giant step forward into the 1980s. I think this is a point of evidence in favor of that claim.

I wonder why this class isn't in the standard library. I was not the first person to think of doing this; web search turns up several others, who also wonder why this class isn't in the standard library.

I wrote a long, irrelevant coda regarding my use of the identifiers husband and wife in the example, but, contrary to my usual practice, I will publish it another day.

[ Addendum 20090517: Here's the long, irrelevant coda. ]

I gratefully acknowledge the gift of Petr Kiryakov. Thank you!

[Other articles in category /prog/java] permanent link

Fri, 08 May 2009

Most annoying phrase known to man?
I have been wasting time, those precious minutes of my life that will never return, by eliminating the odious phrase "known to man" from Wikipedia articles. It is satisfying, in much the same way as doing the crossword puzzle, or popping bubble wrap.

In the past I have gone on search-and-destroy missions against certain specific phrases, for example "It should be noted that...", which can nearly always be replaced with "" with no loss of meaning. But "known to man" is more fun.

One pleasant property of this phrase is that one can sidestep the issue of whether "man" is gender-neutral. People on both sides of this argument can still agree that "known to man" is best replaced with "known". For example:

  • The only albino gorilla known to man...
  • The most reactive and electronegative substance known to man...
  • Copper and iron were known to man well before the copper age and iron age...
In examples like these, "to man" is superfluous, and one can delete it with no regret.

As a pleonasm and a cliché, "known to man" is a signpost to prose that has been written by someone who was not thinking about what they were saying, and so one often finds it amid other prose that is pleonastic and clichéd. For example:

Diamond ... is one of the hardest naturally occurring material known (another harder substance known today is the man-made substance aggregated diamond nanorods which is still not the hardest substance known to man).
Which I trimmed to say:

Diamond ... is one of the hardest naturally-occurring materials known. (Some artificial substances, such as aggregated diamond nanorods, are harder.)
Many people ridicule Strunk and White's fatuous advice to "omit needless words"—if you knew which words were needless, you wouldn't need the advice—but all editors know that beginning writers will use ten words where five will do. The passage above is a good example.

Can "known to man" always be improved by replacement with "known"? I might have said so yesterday, but I mentioned the issue to Yaakov Sloman, who pointed out that the original use was meant to suggest a contrast not with female knowledge but with divine knowledge, an important point that completely escaped my atheist self. In light of this observation, it was easy to come up with a counterexample: "His acts descended to a depth of evil previously unknown to man" partakes of the theological connotations very nicely, I think, and so loses some of its force if it is truncated to "... previously unknown". I suppose that many similar examples appear in the work of H. P. Lovecraft.

It would be nice if some of the Wikipedia examples were of this type, but so far I haven't found any. The only cases so far that I haven't changed are all direct quotations, including several from the introductory narration of The Twilight Zone, which asserts that "There is a fifth dimension beyond that which is known to man...". I like when things turn out better than I expected, but this wasn't one of those times. Instead, there was one example that was even worse than I expected. Bad writing it may be, but the wrongness of "known to man" is at least arguable in most cases. (An argument I don't want to make today, although if I did, I might suggest that "titanium dioxide is the best whitening agent known to man" be rewritten as "titanium dioxide is the best whitening agent known to persons of both sexes with at least nine and a half inches of savage, throbbing cockmeat.") But one of the examples I corrected was risibly inept, in an unusual way:

Wonder Woman's Amazon training also gave her limited telepathy, profound scientific knowledge, and the ability to speak every language known to man.
I have difficulty imagining that the training imparted to Diana, crown princess of the exclusively female population of Paradise Island, would be limited to languages known to man.

Earle Martin drew my attention to the Wikipedia article on "The hardest metal known to man". I did not dare to change this.

[ Addendum 20090515: There is a followup article. ]

[Other articles in category /lang] permanent link