The Universe of Discourse
Thu, 31 Jan 2008
Unnecessary imprecision
McCain has won all of the state's 57 delegates, and the last primary before voters in more than 20 states head to the polls next Tuesday.

Why "more than 20 states"? Why not just say "23 states", which is shorter and conveys more information?

I'm not trying to pick on CTV here. A Google News search finds 42,000 instances of "more than 20", many of which could presumably be replaced with "26" or whatever.

Well, I had originally written "most of which", but then I looked at some examples, and found that the situation is better than I thought it would be. Here are the first ten matches:
#2 may be legitimate, if the number of cases of aerial espionage is not known with certitude, or if the anonymous source really did say "more than 20". Similarly #4 is entirely off the hook since it is a quotation. #3 may be legitimate if the price of farmland is uncertain and close to 20%. #5 is probably a loser. #7 is definitely a loser: it was the headline of an article that began "Nine people were killed and at least 22 injured when...". The headline could certainly have been "9 killed, 22 injured in bus accident". #8 and #9 are losers, but they are the same example with which I began the article, so they don't count. #10 is a loser.

So I have, of eight examples (disregarding #8 and #9) three certain or near-certain failures (#5, #7, and #10), one certain non-failure (#4), and four cases to which I am willing to extend the benefit of the doubt. This is not as bad as I feared. I like when things turn out better than I thought they would.

But I really wonder what is going on with all these instances of "more than 20 states". Is it just sloppy writing? Or is there some benefit that I am failing to appreciate?
[Other articles in category /lang] permanent link
Ramanujan's congruences
Ramanujan's congruences state that:

p(5k+4) = 0 (mod 5)
p(7k+5) = 0 (mod 7)
p(11k+6) = 0 (mod 11)
Looking at this, anyone could conjecture that p(13k+7) = 0 (mod 13), but it isn't so; p(7) = 15 and p(20) = 48·13+3. But there are other such congruences. For example, according to Partition Congruences and the Andrews-Garvan-Dyson Crank:
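The arithmetic in this entry is easy to check by machine. What follows is a minimal Python sketch, not anything taken from Ramanujan or from the paper cited just above. The names partition_numbers and congruence_holds are mine, and the cutoff of 200 is an arbitrary assumption. It counts partitions with the usual coin-change style recurrence, then tests the three classical congruences (moduli 5, 7, and 11) and the tempting but false one for 13.

```python
def partition_numbers(limit):
    """Return a list p where p[n] is the number of partitions of n, 0 <= n <= limit."""
    p = [1] + [0] * limit
    # Allow parts of size 1, then 2, then 3, ...; each pass adds the
    # partitions that use at least one part of the new size.
    for part in range(1, limit + 1):
        for n in range(part, limit + 1):
            p[n] += p[n - part]
    return p


LIMIT = 200  # arbitrary cutoff, chosen only for illustration
p = partition_numbers(LIMIT)


def congruence_holds(step, offset, modulus):
    """Check p(step*k + offset) = 0 (mod modulus) for every such n up to LIMIT."""
    return all(p[n] % modulus == 0 for n in range(offset, LIMIT + 1, step))


print("p(7) =", p[7], " p(20) =", p[20], " p(20) mod 13 =", p[20] % 13)
print("mod 5: ", congruence_holds(5, 4, 5))
print("mod 7: ", congruence_holds(7, 5, 7))
print("mod 11:", congruence_holds(11, 6, 11))
print("mod 13:", congruence_holds(13, 7, 13))
```

Passing up to the cutoff is evidence for the first three, not a proof. The check for 13 fails at the very first term, n = 7.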
[Other articles in category /math] permanent link
Wed, 30 Jan 2008
More on risk
One big problem with the Reddit posting is that the guy who posted it there titled the post on risk, or why poor people might not be stupid to play the lottery. So a lot of the Reddit comments complained that I had failed to prove that poor people must not be stupid to play the lottery, or that I was wrong on that point. They argued that the dollar cost of a lottery ticket is more valuable to a poor person than to a rich one, and so on.

But I didn't say anything about poor people. People read this into the article based on the title someone else had attached to it, and they couldn't get rid of this association even after I pointed out that the article had nothing to say about poor people.

Something I do a lot, in this blog, and in life, is point out fallacious arguments. You get some argument that X is true, because of P and Q and therefore R, and then I'll come along and point out that P is false and Q is irrelevant, and anyway they don't imply R, and even if they did, you can't conclude X from R, because if you could, then you could also conclude Y and Z which are obviously false. For example, in a recent article I addressed the argument that:
You can double your workforce participation from 27% to 51% of the population, as Singapore did; you can't double it again.

The argument being that you can't double a participation of 51% because you can't possibly have 102% workforce participation. (Peter Norvig pointed out that he made the same argument in a different context back in 1999.) But the argument here fails, for reasons I won't go into again. This doesn't mean that I believe that Singapore's workforce participation will double again. Just because I point out that an argument for X is fallacious doesn't mean that I believe X is false.

The "risk" article was one of those. I wanted to refute one specific argument, which is that (a) the expected return on a lottery ticket is negative, so therefore (b) it's stupid to buy lottery tickets. My counter-argument was to point out that (a) the expected return on fire insurance is negative, but that you can't conclude that therefore (b) it's stupid to buy fire insurance. It might be stupid to buy lottery tickets, but if it is, it's not because the expected return is negative. Or at least it's not only because the expected return is negative. There must be more to it than that.

I really like that pattern of argument, and I use it a lot: A can't imply B, because if it did, then it would also imply B', and B' is false, or at least B' is a belief held only by dumbasses.

None of this addresses the question of whether or not I think it's stupid to buy lottery tickets. I have not weighed in on that matter. My only argument is that the argument from expected value is insufficient to prove the point.

People have a lot of trouble with second-order arguments like this, though. If I argue "that argument against B is no good," they are likely to hear it as an argument in favor of B. Several of the Reddit people made this mistake.
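To make the expected-value comparison above concrete, here is a minimal Python sketch. Every number in it is hypothetical and chosen only for illustration: the ticket price, the jackpot, the odds, the insurance premium, the value of the house, and the chance of a fire are all assumptions, and the function name expected_return is mine. The only point is that both purchases come out negative, so a negative expected return cannot by itself be what separates the stupid purchase from the sensible one.

```python
def expected_return(cost, payout, probability):
    """Expected net return of paying `cost` for a `probability` chance at `payout`."""
    return probability * payout - cost


# Hypothetical lottery: a $1 ticket with a 1-in-2,000,000 chance at $1,000,000.
lottery = expected_return(cost=1.00, payout=1_000_000, probability=1 / 2_000_000)

# Hypothetical fire insurance: a $1,000 premium on a $300,000 house
# with an assumed 1-in-500 chance of burning down this year.
insurance = expected_return(cost=1_000.00, payout=300_000, probability=1 / 500)

print(f"lottery ticket: {lottery:+.2f} dollars")
print(f"fire insurance: {insurance:+.2f} dollars")
```

Nothing here says whether either purchase is wise. It only shows that the sign of the expected return is the same in both cases.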
The converse mistake is to interpret "that argument against B is no good, because it can be converted into an argument against B'" as an argument against B'! Some of the Reddit people made this mistake too, and disdainfully explained to me why buying fire insurance is not stupid. Another problem with the article was that it followed my usual pattern of meandering digression. Although the main point of the article was to refute the argument from expected value, I threw in a bunch of marginally related stuff that I thought was fun and interesting: the stuff about estimating the value one ascribes to one's own life; the stuff about the surprisingly high chance of being killed by a meteor strike. Email correspondents and Reddit commenters mistook both of these for arguments about the lottery, and tried to refute them as such. Well, I have nobody to blame but myself for that. If you present a muddled, miscellaneous article, you can't complain when other people are confused by it. If I were going to do the article again, one thing I'd try to fix is the discussion of utility. I think my biggest screwup was to confuse two things that are not the same. One is the utility, which decreases for larger amounts of money; your second million dollars has less value than your first million. But another issue, which I didn't separate in my mind, was the administration cost of money. There must be a jargon term for this, but I don't know what it is. Economists like to pretend that money is perfectly fungible, and this is a reasonable simplifying assumption in most cases. But it's easy to prove that money isn't perfectly fungible. Imagine you've just won a prize. You can have one thousand dollars paid in hundred-dollar bills, or you can have a thousand and one dollars, paid in pennies. Anyone who really believes that money is perfectly fungible will take the pennies, even though they weigh six hundred pounds, because that way they get the one-dollar bonus. 
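The six-hundred-pound figure is easy to sanity-check. This is a rough Python sketch under two assumptions of mine: that the pennies are U.S. cents, and that each weighs either 2.5 grams (the current copper-plated zinc cent) or 3.11 grams (the older bronze cent). The function name weight_in_pounds is hypothetical.

```python
GRAMS_PER_POUND = 453.59237


def weight_in_pounds(dollars, grams_per_penny):
    """Weight of `dollars` paid entirely in pennies of the given mass."""
    pennies = dollars * 100
    return pennies * grams_per_penny / GRAMS_PER_POUND


for label, grams in [("2.5 g cents", 2.5), ("3.11 g cents", 3.11)]:
    print(f"$1001 in {label}: about {weight_in_pounds(1001, grams):.0f} lb")
```

Both assumed masses land within about a hundred pounds of six hundred, one on each side, so the round figure is in the right range either way.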
Money has a physical manifestation, even when it's just numerals written in a ledger somewhere, and managing the physical manifestation of money has an associated cost. The cost of managing a penny is a significant fraction of the value of the penny, to the point that many people throw away pennies or dump them in jars just to avoid the cost of dealing with them. In some circumstances, like the lottery ticket purchase, the non-fungibility of money is important. Blowing one dollar on a lottery that pays a thousand dollars is not the same as blowing a thousand dollars on a lottery that pays a million dollars, and it's not the same as blowing your whole paycheck on a big stack of lottery tickets. Partly it's the risk issue, and partly it's this other issue, that I don't know the name of, that a single dollar is worth less than one one-thousandth of a thousand dollars, because the cost to administer and manage it is proportionately higher. I didn't make this clear in the original article because it wasn't clear in my mind. Oh well, I'm not yet a perfect sage. One last point that has come up is that a couple of people have written to me to say that they would not take the Russian roulette bet for any amount of money at any odds. (Here's a blog post to that effect, for example.) One person even suggested that I only assumed he would take the bet at some odds because I'm an American, and I can't conceive of anyone refusing a big pot of money. Well, maybe that's true, but I don't think that's why I assumed that everyone would take the bet for some amount of money. I assumed it because that is what I have observed people to do. I now know there are people who say that they would not play Russian roulette at any odds for any payoff. And I think those people are fooling themselves. If you think you're one of those people, I have this question for you: Do you own a bicycle helmet? And if you do, did you buy the very top-of-the-line helmet? 
Or did you buy a mid-price model that might offer less protection? What, just to save money? I offered you a million dollars at million-to-one odds. Do you think that fifty dollars you saved on your bicycle helmet is paying you off for less risk than my million-to-one Russian roulette bet? Well, maybe you don't own a bicycle, so you think you have no need of a helmet. But if the people who wrote to me were as risk-averse as some of them said they were, the lack of a bicycle wouldn't stop them from wearing helmets all the time anyway—another reason I think they are fooling themselves. I've met some of these people, and they don't go around in helmets and padded armor all the time. Or maybe you do own the very safest helmet money can buy, since you have only one head, after all. But I bet you can find some other example? Have you ever flown in a plane? Did you refuse to fly anywhere not served by Qantas, like Raymond in Rain Man, because every other airline has had a crash? If you had a choice to pay double to fly with Qantas, would you take it? Or would you take the cheap flight and ignore the risk? One comment that replies to the blog I cited above really hits the nail on the head, I think. It says: "you don't get paid a million dollars to get in your car and drive somewhere, but what are the chances you'll be killed in an auto accident?" My Russian roulette game is a much better deal than driving your car. I'm going to end this article, as I did the last one, with an amusing anecdote about risk. My great-uncle Robert E. Machol was for a time the chief scientist of the Federal Aviation Administration. The regulations for infant travel were (and still are) that an infant may make an air trip on its parent's lap; parents do not need to buy a separate ticket and a seat for the infant. In one air disaster, an infant that was being held on its parent's lap was thrown loose, hurtled to the end of the corridor, and died. 
The FAA was considering changing the rules for infants to require that they purchase a separate ticket, entitling them to their own seat, into which would be installed an FAA-approved safety car seat. Infants in their own restraint seats would be much safer than those held on their parents' laps. Dr. Machol argued against this rule change, on the following grounds: If parents are required to buy separate tickets for their infants, air travel will be more expensive for them. As a result, some families will opt to take car trips instead of plane trips. Car trips are much more dangerous than plane trips; the fatalities per passenger per mile are something like twenty times higher. More babies can be expected to be killed in the resulting auto crashes than can be expected to be saved by the restraint seat requirement. As before, this is not intended as an argument for or against anything in particular, except perhaps that the idea of risk is complex and hard to understand. Probably people will try to interpret it as an argument about the fungibility of money, or whatever the next Reddit person decides to put in the article title. You'd think I would have learned my lesson by now, but, as I said, I'm not yet a perfect sage.
[Other articles in category ] permanent link
Nonstandard adjectives in mathematics
The property is not really attached to the adjective itself. Red emeralds are not emeralds, so "red" is nonstandard when applied to emeralds. Fake expressions of sympathy are still expressions of sympathy, however insincere. "Toy" often goes both ways: a toy fire engine is not a fire engine, but a toy ball is a ball and a toy dog is a dog. Adjectives in mathematics are rarely nonstandard. An Abelian group is a group, a second-countable topology is a topology, an odd integer is an integer, a partial derivative is a derivative, a well-founded order is an order, an open set is a set, and a limit ordinal is an ordinal. When mathematicians want to express that a certain kind of entity is similar to some other kind of entity, but is not actually some other entity, they tend to use compound words. For example, a pseudometric is not (in general) a metric. The phrase "pseudo metric" would be misleading, because a "pseudo metric" sounds like some new kind of metric. But there is no such term. But there is one glaring exception. A partial function is not (in general) a function. The containment is in the other direction: all functions are partial functions, but not all partial functions are functions. The terminology makes more sense if one imagines that "function" is shorthand for "total function", but that is not usually what people say. If I were more quixotic, I would propose that partial functions be called "partialfunctions" instead. Or perhaps "pseudofunctions". Or one could go the other way and call them "normal relations", where "normal" can be replaced by whatever adjective you prefer—ejective relations, anyone? I was about to write "any of these would be preferable to the current confusion", but actually I think it probably doesn't matter very much. [ Addendum 20080201: Another example, and more discussion of "partial". ] [ Addendum 20081205: A contravariant functor is not a functor. ] [ Addendum 20090121: A hom-set is not a set. ]
[Other articles in category /math] permanent link
Tue, 29 Jan 2008
The Census Bureau's data file
The data is available from the Census Bureau's web site. It is a CSV file. Most of the file contains actual data, like this:
20220,,"Dubuque, IA",Metropolitan Statistical Area,"92,384","91,603","91,223","90,635","89,571","89,216","89,265","89,156","89,143"

Experienced data mungers will feel a sense of foreboding as they look at the commas in those numerals. Commas are for people, and if the data file is written for people, rather than for computers, then getting the computer to read it is going to require at least a little bit of suffering.

Indeed, the rest of the data is rather dirty. There is a useless header:

table with row headers in column A and column headers in rows 3 through 4 (leading dots indicate sub-parts),,,,,,,,,,,,^M
"Table 1. Annual Estimates of the Population of Metropolitan and Micropolitan Statistical Areas: April 1, 2000 to July 1, 2006",,,,,,,,,,,,^M
CBSA Code,"Metro Division Code",Geographic area,"Legal/statistical area description",Population estimates,,,,,,,"April 1, 2000",^M
,,,,"July 1, 2006","July 1, 2005","July 1, 2004","July 1, 2003","July 1, 2002","July 1, 2001","July 1, 2000",Estimates base,Census^M
,,Metropolitan statistical areas,,,,,,,,,,^M

And there is a similarly useless footer on the bottom of the file. Any program that wants to use this data has to trim off the header and the footer, or ignore them, or the user will have to trim them off manually. (I've translated ASCII CR characters to ^M sequences so that you can see that although the lines of the file are CR-LF terminated, some of the items contain extra LFs for no particular reason.)

Well, all this is minor. My real complaint is that some of the state name abbreviations are garbled:
19740,,"Denver-Aurora, CO1",Metropolitan Statistical Area,"2,408,750","2,361,778","2,326,126","2,299,879","2,276,592","2,245,030","2,193,737","2,179,320","2,179,240"
Notice that it says CO1 rather than CO, short for
"Colorado". I was fortunate to notice this garbling. Since it
occurred on the line for Denver (among others) the result was that the
program was unable to locate the population of Denver, which is the
capital of Colorado, and a mandatory part of the program's output. So
it raised a warning. Then I went in and manually corrected the
CO1 to say CO. I also added a check to the program
to make sure that it recognized all the state abbreviations; I should
have had this in there in the first place.

Then I sent email to an acquaintance who works for the Census Bureau (identity suppressed to protect the innocent), pointing out the errors so that they could be corrected.

My contact checked with the people who produced the data, and informed me that, according to them, CO1 was not an error. Rather, the 1 was a footnote mark, directing me to a footnote at the bottom of the file:

"1Broomfield, CO was formed from parts of Adams, Boulder, Jefferson, and Weld Counties, CO on November 15, 2001 and was coextensive with Broomfield city.",,,,,,,,,,,,^M
"For purposes of presenting data for metropolitan and micropolitan statistical areas for Census 2000, Broomfield is treated as if it were a county at the time of the 2000 census.",,,,,,,,,,,,^M

A footnote.
I would like to suggest the following as a basic principle of computerized data processing:
Data files should contain data. Not metadata. Not explanations. Not little essays. And not footnotes. Just the data.

There's a larger issue here about confusing content and presentation. But "Data files should contain data" is simpler and easier to remember.

I suspect that this file was exported from a spreadsheet program, probably Excel. Spreadsheet programs desperately want you to confuse content and presentation. This is why one should not use a spreadsheet as a database.

I now recall another occasion when I had to deal with data that was exported from a spreadsheet that was pretending to be a database. It was a database of products made by a large cosmetics company. A typical record looked like this:

"Soft-Pressed Powder Blusher","618J-05","Warm, natural-looking powder colour for all skins. Wide range of shades-subtle to vibrant. With applicator brush.","Cheeks","Nudes","Chestnut Blush","All","","19951201","Yes","","14.5",""

The 618J-05 here is a product code. Bonus points if you see what's coming next.
"Water-Dissolve Cream Cleanser","6.61E+01","Creamy cleanser for drier, more sensitive skins. Dissolves even the most tenacious makeups.","Cleansers","","","","Sub I, I, II","19951201","Yes","1","14.5",""
That 6.61E+01 should have been 661E-01, but Excel decided that it was a numeral, in scientific notation, and put it into normal form.

Back to the Census Bureau, which almost screwed me by putting a footnote on a state name. What if they had decided to put footnotes on the population figures? Then I would have been really screwed, because it would have been completely undetectable.

No, wait! It's all become clear. That's why they put the commas in the numerals!

[ Addendum 20080129: My Census Bureau contact tells me that the authors of the data file have seen the wisdom of my point of view, in spite of my unconstructive and unhelpful feedback (I said "Wow, that is an incredibly terrible idea") and are planning to address the issue in the next release of the data. Hooray for happy endings! ]

[ Addendum 20080129: My Census Bureau contact tells me that they do sometimes put footnotes on the data items, so don't laugh too hard at my remark about the commas. ]
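Here is a minimal Python sketch of the two checks this entry talks about. It is not the program I used; the function names, the column positions, and the two-entry KNOWN_STATES set are assumptions made for illustration, and a real check would list every state. The first part parses the comma-laden numerals and refuses an area whose state abbreviation it does not recognize, which is how a footnote mark like the 1 in CO1 gets caught. The second part shows why a product code like 661E-01 is at risk in a spreadsheet while 618J-05 is not: one of them happens to be a valid floating-point literal.

```python
import csv

# Two data lines quoted earlier in this entry.
SAMPLE_LINES = [
    '20220,,"Dubuque, IA",Metropolitan Statistical Area,"92,384","91,603","91,223","90,635","89,571","89,216","89,265","89,156","89,143"',
    '19740,,"Denver-Aurora, CO1",Metropolitan Statistical Area,"2,408,750","2,361,778","2,326,126","2,299,879","2,276,592","2,245,030","2,193,737","2,179,320","2,179,240"',
]

KNOWN_STATES = {"CO", "IA"}  # assumption: stand-in for the full list of states


def parse_row(row):
    """Return (area name, state list, populations); raise ValueError on an unknown state."""
    area = row[2]
    name, _, state_part = area.rpartition(", ")
    states = state_part.split("-")  # some areas span states, e.g. "MO-KS"
    unknown = [s for s in states if s not in KNOWN_STATES]
    if unknown:
        raise ValueError(f"unrecognized state abbreviation(s) {unknown} in {area!r}")
    # The numerals carry thousands separators meant for people; strip them.
    populations = [int(field.replace(",", "")) for field in row[4:]]
    return name, states, populations


def looks_numeric(code):
    """True if `code` parses as a float, so a spreadsheet might rewrite it."""
    try:
        float(code)
    except ValueError:
        return False
    return True


for row in csv.reader(SAMPLE_LINES):
    try:
        print(parse_row(row))
    except ValueError as err:
        print("warning:", err)

for code in ("618J-05", "661E-01"):
    print(code, "looks numeric:", looks_numeric(code))
print(format(float("661E-01"), ".2E"))
```

The check on state abbreviations only helps when the footnote lands on a field with a small known vocabulary. A footnote digit appended to a population figure would pass straight through the int conversion.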
[Other articles in category /misc] permanent link
Wed, 23 Jan 2008
Smallest state capitals
At the other end of the scale, of course, we have state capitals like Boston, Denver, Atlanta, and Honolulu that are their state's largest cities. For these states, the population quotient is 1, its theoretical minimum. Well, James, it only took me thirty years, but here it is. I tried to resolve the question manually a few weeks ago, by browsing Wikipedia for the populations of likely candidates. Today I took a more methodical approach, downloading the U.S. Census Bureau's July 2006 estimates for populations of metropolitan areas, and writing a couple of little programs to grovel the data. I had to augment the Census Bureau's data with two items: Annapolis, MD, and Montpelier, VT are not large enough to be included in the metropolitan area data file. I used U.S. Census 2006 estimates for these cities as well. I discarded one conurbation: the Census Bureau includes a "Metropolitan Division" in New Hampshire that consists of Rockingham and Strafford counties; this was the most populous identified area in New Hampshire. It didn't seem entirely germane to the question, so I took it out. On the other hand, including it doesn't change the results much: its population is 416,000, compared with Manchester-Nashua's 402,000. The results follow.
Vermont is an interesting outlier here. It makes fourth place not because it has a large city, but because its capital, Montpelier, is so very small. I tried doing some scatter plots, to see if anything else jumped out, but they weren't very illuminating. If anything, the data is surprisingly evenly distributed. Here's an example:
[ Addendum 20080129: Some remarks about the format of the Census Bureau's data file. ] [ Addendum 20090217: A comparison of the relative sizes of each state's largest and second-largest cities. ]
[Other articles in category /misc] permanent link
Sun, 20 Jan 2008
Utterly Useless Book Reviews (#1 in a series?)
Graves was a classical scholar, and based his novel on the historical accounts available, principally The Twelve Caesars of Suetonius. Suetonius wrote his history after all the people involved were dead, and his book reads like a collection of anecdotes placed in approximately chronological order. Suetonius seems to have dug up and recorded as fact every scurrilous rumor he could find. Some of the rumors are contradictory, and some merely implausible. When Graves turned The Twelve Caesars into I, Claudius, he resolved this mass of unprocessed material into a coherent product. The puzzling trivialities are explained. The contradictions are cleared up. Sometimes the scurrilous rumors are explained as scurrilous rumors; sometimes Claudius explains the grain of truth that lies at their center. Other times the true story, as related by Claudius, is even worse than the watered-down version that came to Suetonius's ears. Suetonius mentions that, as emperor, Claudius tried to introduce three new letters into the alphabet. Huh? In Graves' novel, this is foreshadowed early, and when it finally happens, it makes sense.
There is a story that Borges tells about the miracles performed by the Buddha, who generally eschewed miracles as being too showy. But Borges tells the story that one day the Buddha had to cross a desert, and seven different gods each gave him a parasol to shade his head. The Buddha did not want to offend any of the gods, so he split himself into seven Buddhas, and each one crossed the desert using a different parasol. He performed a miracle of politeness. (The trouble with Borges's stories is that you never know which ones he read in some obscure 17th-century book, and which ones he made up himself. I spent a whole year thinking how clever Borges had been to have invented the novelist Adolfo Bioy Casares, with his alphabetical initials, and then one day I was in the bookstore and came upon the Adolfo Bioy Casares section. Oops.) Anyway, Graves lets Jesus have the miracles, and they are indeed miraculous, but they are miracles of kindness and insight, not miracles of stage magic. When Graves explains the miracles, you say "oh, of course", without then saying "is that all?" I have not yet gotten to the part where Jesus silences the storm and walks on water, but I am looking forward to it. I did get to the loaves and fishes, and it was quite satisfactory. I am not going to spoil the surprise. I recommend it. Check it out.
[ Addendum 20080201: James
Russell has read both I, Claudius and Twelve
Caesars. ]
[Other articles in category /book]
permanent link
Help, help!
Przemek Klosowski wrote to offer me physics help, and also to ask
about introspection on Perl objects. Specifically, he said that if
you called a nonexistent method on a TCL object, the error message
would include the names of all the methods that would have worked. He
wanted to know if there was a way to get Perl to do something
similar.
There isn't, precisely, because Perl has only a conventional
distinction between methods and subroutines, and you Just Have To Know
which is which, and avoid calling the subroutines as methods, because
the Perl interpreter has no idea which is which. But it does have
enough introspection features that you can get something like what you
want. This article will explain how to do that.
Here is a trivial program that invokes an undefined method on an
object:
Now consider the following program instead:
Some of the items may be intended to be called as functions, and not
as methods. Some may be functions imported from some other module. A
common offender here is Carp, which places a carp
function into another module's namespace; this function will show up
in a list like the one above, without even an "inherited from" note,
even though it is not a method and it does not make sense to call it
on an object at all.
Even when the items in the list really are methods, they may be
undocumented, internal-use-only methods, and may disappear in future
versions of the YAML module.
But even with all these warnings, Help is at least a partial
solution to the problem.
The real reason for this article is to present the code for
Help.pm, not because the module is so intrinsically useful
itself, but because it is almost a catalog of weird-but-useful Perl
module hackery techniques. A full and detailed tour of this module's
30 lines of code would probably make a decent 60- or 90-minute class
for intermediate Perl programmers who want to become wizards. (I have
given many classes on exactly that topic.)
Here's the code:
Typically, a module's import method is inherited from
Exporter, which gets control at this point and arranges to
make some of the module's functions available in the caller's
namespace. So, for example, when you invoke use YAML
'freeze' in your module, Exporter's import
method gets control and puts YAML's "freeze"
function into your module's namespace. But that is not what we are
doing here. Instead, Help has its own import
method:
@Foo::ISA is the array that is searched whenever a method call on a
Foo object fails because the method doesn't exist. Perl
will search the classes named in @Foo::ISA, in order. It
will search the Help class last. That's important, because
we don't want Help to interfere with Foo's ordinary
inheritance.
Notice the way the variable name Foo::ISA is generated
dynamically by concatenating the value of $class with the
literal string ::ISA. This is how you access a variable
whose name is not known at compile time in Perl. We will see this
technique over and over again in this module.
The backslash in @{"$class\::ISA"} is necessary, because if
we wrote @{"$class::ISA"} instead, Perl would try to
interpolate the value of the $ISA variable from the package named
class. We could get around this by writing something like
@{$class . '::ISA'}, but the backslash is easier to read.
But when method search fails, Perl doesn't give up right away.
Instead, it tries the method search a second time, this time looking
for a method named AUTOLOAD. If it finds one, it calls it.
It only throws an exception if there is no AUTOLOAD.
The Help class doesn't have a nosuchmethod method
either, but it does have AUTOLOAD. If Foo or one of
its other parent classes defines an AUTOLOAD, one of those
will be called instead. But if there's no other AUTOLOAD,
then Help's AUTOLOAD will be called as a last
resort.
This pattern match dismantles the contents of $AUTOLOAD into
a class name and a method name:
The AUTOLOAD function is now going to accumulate a table of
all the methods that could have been called on the target
object, print out a report, and throw a fatal exception.
The accumulated table will reside in the private hash
%known_method. Keys in this hash will be method names.
Values will be the classes in which the names were found.
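For readers who do not read Perl, here is a rough Python analog of the table-building traversal described in this part of the article. It is a sketch under assumptions, not a translation of Help.pm: the names known_methods, Animal, and Dog are hypothetical, the wording of the error message is invented, and Python's __getattr__ hook stands in for AUTOLOAD only because it, too, runs after the ordinary lookup has failed. The stack of classes and the table keyed by method name follow the description above; skipping names with a leading underscore is one possible filtering convention.

```python
def known_methods(cls):
    """Map each public callable name to the first class it was found in."""
    known = {}
    stack = [cls]
    seen = set()
    while stack:
        klass = stack.pop()
        if klass is object or klass in seen:
            continue
        seen.add(klass)
        # Push the base classes so they are searched next; reversed so
        # that the first-listed base is the first one popped.
        stack.extend(reversed(klass.__bases__))
        for name, value in vars(klass).items():
            if callable(value) and not name.startswith("_"):
                known.setdefault(name, klass.__name__)
    return known


class Help:
    """Mixin meant to be listed last among the base classes."""

    def __getattr__(self, name):
        # Reached only when normal attribute lookup has already failed.
        table = known_methods(type(self))
        lines = [f"  {meth} (from {owner})" for meth, owner in sorted(table.items())]
        raise AttributeError(
            f"unknown method {name!r} on a {type(self).__name__} object\n"
            "known methods:\n" + "\n".join(lines)
        )


class Animal:
    def speak(self):
        return "..."

    def eat(self):
        return "nom"


class Dog(Animal, Help):
    def speak(self):
        return "Woof"

    def fetch(self):
        return "stick"


try:
    Dog().bark()
except AttributeError as err:
    print(err)
```

The same caveats apply as in the Perl version: anything callable in a class's namespace shows up in the table, whether or not it was meant to be called as a method.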
Before the loop actually looks at the methods in the current class
it's searching, it looks to see if the class has any base classes. If
there are any, it pushes them onto the stack to be searched next:
To find out if a name denotes a subroutine, we use
defined(&{subroutine_name}) for each name in the
package symbol table. If there is a subroutine by that name, the program
inserts it and the class name into %known_method. Otherwise,
the name is a variable or filehandle name and is ignored:
If you have any clever techniques for identifying other stuff that
should be omitted from the output, this is where you would put them.
For example, many authors use the convention that functions whose
names have a leading underscore are private to the implementation, and
should not be called by outsiders. We might omit such items from the
output by
adding a line here:
The output for my example would look like this:
You can always force the help message by calling
$object->Help::help. This calls a method named
help, and it starts the inheritance search in the
Help package. Control is transferred to the following
help method:
Calling AUTOLOAD in the normal way, without goto,
would have worked also. I did it this way just to be a fusspot.
It is very common for objects to lack a DESTROY method;
usually nothing additional needs to be done when the object's lifetime
is over. But we do not want the
Help::AUTOLOAD function to be invoked automatically whenever
such an object is destroyed! So Help defines a last-resort
DESTROY method that is called instead; this prevents Perl
from trying the AUTOLOAD search when an object with no
DESTROY method is
destroyed:
Well, this code will not run with "use strict". It does a lot of
stuff on purpose that "strict" was put in specifically to keep you
from doing by accident.
At some point you have to take off the training wheels, kiddies.
Share and enjoy.
[Other articles in category /prog/perl]
permanent link
Clubbing someone to death with a loaded Uzi
This is the sort of mistake you expect from an intern. I chuckled and
corrected him. But I've seen it several times since from non-interns.
Here's another example. I
am not making this up. Whether it's more or less odious than the
intern code is up to you to decide:
It's appalling how many supposedly professional programmers see
nothing wrong here. They squint at the code, and say "I think you
need parentheses around %hash there", or they criticize the
choice of variable names.
I first used this as an interview question because the Python code
sample submitted by a job applicant contained an example of it.
"Weird," I thought, "but maybe she's outgrown that." Since she
claimed to be an expert Perl user, I asked her about it in Perl, using
code like the example above. After she made a syntactic suggestion, I
said "It's not a syntax problem, and it's not a trick question." She
criticized the syntax some more. Finally I told her the answer:
"Couldn't you just use $hash{name}++?"
"Oh, yeah, I guess so," she said.
A few minutes later we were going over her Python code sample and I
pointed out the place where she had done the
exact same thing, and asked if she was happy with that loop or
wanted to change it. No, she thought it was just fine.
"Doesn't this
look like the example I showed you on the whiteboard a little while
ago?"
"Oh, I guess it does."
We didn't hire her.
Larry Wall once said that iterating over the keys of a hash is like
clubbing someone to death with a loaded Uzi.
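For concreteness, here is a hypothetical sketch of the pattern in question. It is not the intern's or the applicant's actual code; the sub names and the sample data are invented for the illustration.

```perl
use strict;
use warnings;

# Roundabout: walk every key just to find the one we already know.
sub bump_by_scanning {
    my ($hash, $name) = @_;
    for my $key (keys %$hash) {
        if ($key eq $name) {
            $hash->{$key}++;
        }
    }
    return;
}

# Direct: ask the hash for the entry by name.
sub bump_directly {
    my ($hash, $name) = @_;
    $hash->{$name}++;
    return;
}

my %count = (apple => 1, banana => 2);    # made-up data
bump_by_scanning(\%count, 'banana');
bump_directly(\%count, 'banana');
print "banana => $count{banana}\n";       # banana => 4
```

The two are not quite identical when the key is absent: the scan does nothing, while the direct increment creates the entry. If that matters, an exists test in front of the direct version covers it.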
I had already realized that you could, in principle, commit this error
with a regular array instead of with a hash, but I had never seen an
example until today's
episode of the Daily WTF. The Daily WTF code is so awful, all the
way through, that I was afraid that people might miss this
slightly more subtle gem lurking in the middle, and that was what
motivated me to write this article in the first place. Here's the gem:
[ Addendum 20080201: A bit more. ]
[ Addendum 20090213: A counterexample. ]
[Other articles in category /prog]
permanent link
Squillions
Google book search is a good way to answer questions like that,
because if "squillion" is widely used, you will find a lot of examples
of it. And indeed it is widely used, and I did find a lot of examples
of it. So there was no need to remove it from the article.
One of the Google hits was from the Cormac Ó Cuilleanáin
translation of Giovanni Boccaccio's Decameron. The Decameron is a great classic
of Italian Renaissance literature, probably the greatest classic that
Italian has, after Dante's Divine Comedy. It was written
around 1350. In this
particular chapter (the tenth story on the sixth day, if you want to
look it up) Guccio, a priest, is trying
to seduce a hideous kitchen-maid:
The kitchen-maid, by the way, is described as having "a pair of tits
like two baskets of manure".
This was amusing, and as I had never read the Decameron, I wanted to read
more, and learn how it turned out. But the Google excerpt was
limited, so I asked the library to get me a copy of that version of
the Decameron. Of course they have many copies on the shelf, but not that
particular translation. So I asked the interlibrary loan people for
it, and they got it for me.
When it arrived, I was rather dismayed. The ILL people get the book
from the most convenient place, and that means that it often comes
from the Drexel library, up the street, or the Temple library, across
town, or the West Chester Community College library, or Lehigh
University, about an hour away in Bethlehem. (Steel
Bethlehem, of course, not Jesus Bethlehem.) The farthest I had ever
gotten a book from was an extremely obscure quilting manual that
Lorrie asked for; it eventually arrived from the Sno-Isles regional
library system of Marysville,
Washington.
But this copy of the Decameron came from the Sloman library of the
University of Essex. I was so shocked that I had to look it up online
to make sure that it was not Essex, New Jersey, or something like
that. It was not. It was East Saxony. I was upset because I felt
that the trouble and effort had been wasted. If I had known that the
nearest available copy of
Cormac Ó Cuilleanáin's translation was in Essex, I would
have been happy to take a different version that was on the shelf.
And then to top it off, I had hardly begun to read it before it came
due and had to be sent back to Essex.
So I went to the library and got another Decameron, this one translated by
Mark Musa and Peter Bondanella. Here is the corresponding passage:
And there is a footnote on "thousand hundreds" explaining "Guccio
invents this amount, as well as the previous phrase 'by procuration,'
in order to impress his lady." By the way, in this version, Nuta has
"a pair of tits that looked like two clumps of cowshit".
Anyway, I think I liked "squillions" better than "thousand hundreds",
although I suppose "thousand hundreds" is probably a more literal
translation.
Well, I can find this out. Of course, one can find the Decameron online in
Italian; the copyright expired about five hundred years ago. Here it
is in Italian, courtesy of Brown
University:
Nuta in this version has "a pair of breasts that shewed as two buckets of muck". Feh. The Italian is "con un paio di poppe che parean due ceston da letame". The operative phrase here seems to be "ceston da letame". I don't know what those words mean, but, happily, Italian Wikipedia has an article about letame, and as the picture makes clear, it is indeed manure.
Oh, did you want this article to have a point? Too bad.
I recommend the Decameron. It is funny and salacious. There are a lot of stories about women cheating on their husbands, and then getting away with it through some clever trick, and then everyone who hears the story laughs and admires the cleverness of the ladies. (The counterpoint to this is that there are a number of stories of wife-beating, in which everyone who hears the story laughs and admires the wisdom of the husbands. I don't like that so much.) There are farcical stories of bed-swapping and wife-swapping, and one story about an abbess who comes out of her cell to berate a nun for having her lover in to visit, but the abbess is wearing a pair of men's trousers on her head instead of her wimple. Oops.
This reminds me of a time in high school when I was talking to one of my friends, who had opted to study French, and this friend told me that studying French is fun, because when you get to the third year and start reading real French literature, you read that great classic of French Literature, La Vie de Gargantua et de Pantagruel. If you have not read this master treasure of French culture, I should explain that the first chapter is mainly taken up with Gargantua and Pantagruel having a discussion about what is the best sort of thing to wipe your ass with, and it goes on from there.
I took Latin, and in third-year Latin we read the orations of Cicero against Catiline. Fun stuff, but not the sort of thing that has you rushing to translate the next word.
[ Addendum 20080201: More about 'milliantanove'. ]
[Other articles in category /lang]
permanent link
Sat, 05 Jan 2008
Pepys' footballs explained
Walt found a reference in Montague Shearman's 1887 book on the history of football in England that specifically mentions this. Folks were playing football in the street, and because of this, Pepys took his coach to Sir Philip Warwicke's, rather than walking. I didn't ask, but I presume Walt found this by doing some straightforward Google search for "pepys footballs" or something of the sort. For some reason, this did not even occur to me. Once Big Dictionary failed me, I was stumped. Perhaps this marks me as a member of the pre-Internet generation. I imagined this morning that this episode would be repeated, with my daughter Iris in place of Walt. "Oh, Daddy! You're so old-fashioned. Just use a Google search." Anyway, inspired by Walt's example, or by what I imagined Walt's example to be, I did the search myself, and found the Shearman reference, as well as the following discussion in William Carew Hazlitt's Faiths and Folklore of 1905:
Mission, writing about 1690, says: "In winter foot-ball is a useful and charming exercise. It is a leather ball about as big as one's head, fill'd with wind. This is kick'd about from one to t'other in the streets, by him that can get at it, and that is all the art of it."
This book looks like it would be good reading in general.
[ Addendum 20080106: This is not the William Hazlitt, but his grandson. Thank you, Wikipedia. ]
Thanks very much, Walt.
[Other articles in category /lang]
permanent link
Fri, 04 Jan 2008
Your age as a fraction
However, these reports are not quite accurate. On January 2, 1973, exactly 3 years and 9 months from your birthday, you would be 1,371 days old, or 3 years plus 275 days. 275/365 = 0.7534. On January 1, you were only 3 + 274/365 years old, which is 3.7507 years, and so January 1 is the day on which you should have been allowed to start reporting your age as "three and three quarters". This slippage between days and months occurs in the other direction as well, so there may be kids wandering around declaring themselves as "three and a half" a full day before they actually reach that age.
Clearly this is one of the major problems facing our society, so I wanted to make up a table showing, for each number of days d from 1 to 365, what is the simplest fraction a/b such that when it is d days after your birthday, you are (some whole number and) a/b years. That is, I wanted a/b such that d/365 ≤ a/b < (d+1)/365. Then, by consulting the table each day, anyone could find out what new fraction they might have qualified for, and, if they preferred the new fraction to the old, they might start reporting their age with that fraction.
There is a well-developed branch of mathematics that deals with this problem. To find simple fractions that approximate any given rational number, or lie in any range, you first expand the bounds of the range in continued fraction form. For example, suppose it has been 208 days since your birthday. Then today your age will range from (y years and) 208/365 of a year up to (y years and) 209/365 of a year. Then we expand 208/365 and 209/365 as continued fractions:
208/365 = [0; 1, 1, 3, 12, 1, 3]
209/365 = [0; 1, 1, 2, 1, 16, 1, 2]
where [0; 1, 1, 3, 12, 1, 3] is an abbreviation for the typographically horrendous expression:
0 + 1/(1 + 1/(1 + 1/(3 + 1/(12 + 1/(1 + 1/3)))))
Then you need to find a continued fraction that lies numerically in between these two but is as short as possible. (Shortness of continued fractions corresponds directly to simplicity of the rational numbers they represent.) To do this, take the common initial segment, which is [0; 1, 1], and then apply an appropriate rule for the next place, which depends on whether the numbers in the next place differ by 1 or by more than 1, whether the first difference occurs in an even position or an odd one, mumble mumble mumble; in this case the rules say we should append 3. The result is [0; 1, 1, 3], or, in conventional notation, 4/7.
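For readers who want to check these expansions, here is a minimal sketch. It is not the continued fraction library mentioned in this article; cf_expand and cf_value are names invented here. The first runs the Euclidean algorithm to produce the terms, and the second folds a list of terms back into a numerator and denominator.

```perl
use strict;
use warnings;

# Expand num/den (positive integers) as a simple continued fraction
# by repeated division with remainder.
sub cf_expand {
    my ($num, $den) = @_;
    my @terms;
    while ($den != 0) {
        my $r = $num % $den;
        push @terms, ($num - $r) / $den;    # integer quotient
        ($num, $den) = ($den, $r);
    }
    return @terms;
}

# Fold a list of terms back into (numerator, denominator),
# working from the last term to the first.
sub cf_value {
    my @terms = @_;
    my ($num, $den) = (1, 0);
    for my $t (reverse @terms) {
        ($num, $den) = ($t * $num + $den, $num);
    }
    return ($num, $den);
}

print "208/365 = [", join(", ", cf_expand(208, 365)), "]\n";
print "209/365 = [", join(", ", cf_expand(209, 365)), "]\n";
print "[0; 1, 1, 3] = ", join("/", cf_value(0, 1, 1, 3)), "\n";
```

The two expansions share the initial segment 0, 1, 1, and the truncated fraction [0; 1, 1, 3] comes out as 4/7, which does lie between 208/365 and 209/365.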
Since I already had a library for calculating with continued fractions, I started extending it with functions to handle this problem, to apply all the fussy little rules for truncating the continued fraction in the right place, and so on. Then I came to my senses, and realized there was a better way, at least for the cases I wanted to calculate. Given d, we want to find the simplest fraction a/b such that d/365 ≤ a/b < (d+1)/365. Equivalently, we want the smallest integer b such that there is some integer a with db/365 ≤ a < (d+1)b/365. But b must be in the range (2 .. 365), so we can easily calculate this just by trying every possible value of b, from 2 on up:
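use POSIX 'ceil', 'floor';

# Return the simplest fraction a/b with n/d <= a/b < (n+1)/d,
# found by trying each denominator b in increasing order.
sub approx_frac {
    my ($n, $d) = @_;
    for my $b (1 .. $d) {
        # a/b is in the range exactly when lb <= a < ub
        my ($lb, $ub) = ($n*$b/$d, ($n+1)*$b/$d);
        if (ceil($lb) < ceil($ub) && ceil($ub) > $ub) {
            return (int($ub), $b);
        }
    }
    return ($n, $d);
}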
The fussing with ceil() in the main test is to make the
ranges open on the upper end: 2/5 is not in the range
[3/10, 4/10), but it is in the range
[4/10, 5/10). Then we can embed this in a simple report-printing
program:
my $N = shift || 365;
for my $i (1..($N-1)) {
    my ($a, $b) = approx_frac($i, $N);
    print "$i/$N: $a/$b\n";
}
For tenths, the simplest fractions are:
1/10: 1/6
2/10: 1/4
3/10: 1/3
4/10: 2/5
5/10: 1/2
6/10: 2/3
7/10: 3/4
8/10: 4/5
9/10: 9/10
This works fine, and it is a heck of a lot simpler than all the continued fraction stuff. The more so because the continued fraction library is written in C. For the application at hand, an alternative algorithm is to go through all fractions, starting with the simplest, placing each one into the appropriate d/365 slot, unless that slot is already filled by a simpler fraction:
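my $N = shift || 365;
my @simple;               # $simple[$a] is the simplest [n, d] found for slot a
my $unfilled = $N - 1;    # only slots 1 .. N-1 can ever be filled
DEN:
for my $d (2 .. $N) {
    for my $n (1 .. $d-1) {
        my $a = int($n * $N / $d);    # n/d lands in slot a: a/N <= n/d < (a+1)/N
        unless (defined $simple[$a]) {
            $simple[$a] = [$n, $d];
            last DEN if --$unfilled == 0;
        }
    }
}
for (1 .. $N-1) {
    print "$_/$N: $simple[$_][0]/$simple[$_][1]\n";
}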
A while back I wrote an article about using the sawed-off
shotgun approach instead of the subtle technique approach. This
is another case where the simple algorithm wins big. It is an
n² algorithm, whereas I think the continued fraction
one is n log n in the worst case. But unless you're
preparing enormous tables, it really doesn't matter much. And the
proportionality constant on the O() is surely a lot smaller for
the simple algorithms.
(It might also be that you could optimize the algorithms to go faster: you can skip the body of the loop in the slot-filling algorithm whenever $n and $d have a common factor, which means you are executing the body only n log n times. But testing for common factors takes time too...)
I was going to paste in a bunch of tabulations, but once again I remembered that it makes more sense to just let you run the program for yourself. The programs above take N as a command-line argument and generate the table for all the fractions 1/N .. (N-1)/N; use N=365 to generate a table of year fractions for common years, and N=366 to generate the table for leap years.
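The common-factor shortcut mentioned above might look like the following sketch. It wraps the slot-filling loop in a sub so that it can be called for any N; the names gcd and simplest_table are invented here, and the counter starts at N-1 on the assumption that only slots 1 .. N-1 can ever be filled.

```perl
use strict;
use warnings;

# Greatest common divisor by Euclid's algorithm.
sub gcd {
    my ($x, $y) = @_;
    ($x, $y) = ($y, $x % $y) while $y;
    return $x;
}

# Return an array ref whose entry $i (for 1 <= i < N) is [n, d],
# the simplest fraction with i/N <= n/d < (i+1)/N.
sub simplest_table {
    my ($N) = @_;
    my @simple;
    my $unfilled = $N - 1;    # slots 1 .. N-1
    DEN:
    for my $d (2 .. $N) {
        for my $n (1 .. $d - 1) {
            # n/d in lowest terms was already tried with a smaller d
            next if gcd($n, $d) > 1;
            my $slot = int($n * $N / $d);
            next if defined $simple[$slot];
            $simple[$slot] = [$n, $d];
            last DEN if --$unfilled == 0;
        }
    }
    return \@simple;
}

my $table = simplest_table(10);
print "$_/10: $table->[$_][0]/$table->[$_][1]\n" for 1 .. 9;
```

Skipping the unreduced fractions cannot change the table, since the reduced form of the same number has a smaller denominator and so has already claimed the slot. Whether the gcd test pays for itself is a separate question.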
[ Addendum 20070429: There is a followup to this article. ]
[Other articles in category /math]
permanent link
Footballs?
Up, and by coach to Sir Ph. Warwicke's, the streete being full of footballs, it being a great frost, and found him and Mr. Coventry walking in St. James's Parke.
"The street being full of footballs?" Huh?
I tried looking in the Big Dictionary, and it was no help at all. My best guess is that it's big chunks of frozen mud that you have to kick out of the way. Do any gentle readers know for sure?
The Diary of Samuel Pepys has a syndication feed you can subscribe to. You get a diary entry every day or so, with all the names and places linked to a glossary. It's fun reading.
[ Addendum 20080105: The answer. ]
[Other articles in category /lang]
permanent link
Iris is not a vegetarian
I went to visit Iris at school last week, and stayed for lunch. I was seated with Iris and three other little girls. As the food was served, one of the girls, Riley, made some joke about how the food cart contained guinea pigs instead. This sort of joke is very funny to preschoolers. My sense of humor is very close to a preschooler's, and I would have thought that this was funny if she had said that the food cart contained clocks, or nose hairs, or a speech in defense of the Corn Laws, or the Trans-Siberian Railroad, or fish-shaped solid waste. But she said guinea pigs, and instead of laughing, I mused aloud that I had never eaten a guinea pig.
Riley informed me that "You can't eat guinea pigs! They're animals, not food."
"Sure you can," I said. "Meat is made from animals."
Riley got this big grin on her face, the one that preschoolers get when they know that the adults are teasing them, and said "Nawww!"
"Yes," I said. "Meat comes from animals."
Riley shook her head. She knew I was joking. A general discussion ensued, with Iris taking my side, and another girl, Flora, taking Riley's. In the end, I did not convince them. "Well," I said, mostly to myself, at the end, "you girls are in for a rude awakening someday."
Now, I know that not everyone is as direct as I am. And I know that not all non-vegetarians are as concerned as I am about the ethics of eating meat. But wow. I would have thought that someone would have explained to these girls where meat came from, just as a point of interest if nothing else. Or maybe they would have made the connection between chicken-the-food and chicken-the-farm-animal. I mean, they are constantly getting all these stories set on farms. Since three-year-olds ask about a billion questions a day, they must ask around a thousand questions a day about the farms, so how is it that the subject never came up?
Iris was accidentally exposed to a movie version of Charlotte's Web on an airplane, and the plot of Charlotte's Web is that Charlotte is trying to save Wilbur from being turned into smoked ham. Left to myself I wouldn't have exposed Iris to Charlotte's Web so soon—it is too long for her, for one thing—but my point here is that the world is full of reminders of the true nature of meat, and they can be hard to avoid. So I was very surprised when it turned out that these two age-mates of Iris's were so completely unaware of it. Anyway, Iris has known from a very early age where meat comes from. Early in her meat-eating career, probably before she was two years old, I specifically explained it to her. I wanted to make sure that she understood that meat comes from animals. Because there are serious ethical issues involved when one eats animals, and I think they must be considered. We may choose to kill and destroy thinking beings to make food, but we should at least be aware that that is what is happening. I'm not sure I think it is evil, but I want to at least be aware of the possibility. I have never been a vegetarian, but I want to try to face the ethical results of that choice head on, and not pretend that they are not there. I did not want Iris growing up to identify meat with sterile packages in the supermarket. Meat was once alive, moving around with its own agenda, and I think it is important to understand this. So I made an effort to bring up the subject at home, and then one day when Iris was around twenty months old we went to a Chinese restaurant that has live fish in tanks at the front of the restaurant, and you can ask them to take one of these fish into the kitchen to be cooked for your dinner. Iris has loved to eat fish since she was a tiny baby. We ordered a striped bass, and then I took Iris to look at the fish in the tanks. 
I explained to her that these fish swimming in the tanks were for people to eat, and that when we ordered our fish for dinner, a waiter came out and caught one of the fish in a net, took it back to the kitchen, and they killed it and were cooking it for us. As I said, I had made the point before, but never so directly. We had never before seen the live animals that were turned into food for us.
I really did not know how Iris would respond to this. Some people have a very strong negative response when they first learn that meat comes from animals, so negative that they never eat meat again. But I thought Iris should know the truth and make her own decision about how to respond.
Iris's response was to point at one of the striped bass and say "I want to eat that one." Then she took me to each tank in turn, and told me which kind of fish she wanted to eat and which ones she did not want to eat. (She favored the fish-looking fish, and rejected the crabs, shrimp, and eels.) Then when the fish arrived on our table Iris asked if it had been swimming in the tank, and I said it had. "Yum yum," said Iris, and dug in.
[Other articles in category /food]
permanent link
Tue, 01 Jan 2008
Santa Claus
My vocabulary here is failing me. "Telling them the story" is not what I want, because the Santa Claus thing is deceptive, and telling stories is not normally deceptive: "fiction" and "lies" mean different things. When I tell Iris the story of the Little Red Hen, there is no presumption that there is an actual, literal Red Hen. Iris might think there is, or not, or might not think about it at all; I don't know which. Ditto Cinderella, or Olivia the Pig, or any other story I tell or read to her. But when people tell their kids about Santa Claus, they present it not as a story, but as a literal truth. They present it in a way that is calculated to make the kids believe there is actually a fat, benevolent, white-bearded immortal, manufacturing toys in a secret arctic workshop. This is no longer mere fiction; it is a lie. So what I want to say is that this lady thought she would be depriving her kids of the magic of Santa Claus by not telling them this lie. But I really don't want to use the word "lie" here, because it's so pejorative. It makes it sound as though I think badly of this good woman for telling her kids that Santa Claus was real. But I don't, at all. She is generally wise and honest and I respect her. Parents tell their kids all sorts of awful, appalling lies, which upsets me a lot, but this lie is quite benign by comparison, and bothers me not at all. Let me be perfectly clear: I have nothing, absolutely nothing, against the Santa Claus story. I have an article in progress about how much I hate the way parents routinely lie to their kids, to manipulate them, and this one isn't in the article, because it doesn't even register. It's just for fun, or nearly so. Santa Claus seems pretty harmless to me. Unlike many of the pernicious lies children are told, Santa Claus is a great story. It would be really wonderful to believe that I would get presents every year because there was a fat guy manufacturing toys at the North Pole. Delightful! 
And the only thing wrong with it is that it isn't true. Oh well. There are a lot of pretty stories that aren't true.
Anyway, at the time I had this conversation about Santa Claus, Iris was too young to have heard about Santa Claus anyway, and my co-worker asked if I was planning to tell Iris the Santa Claus story. Now that I've written this article, it occurs to me that what she meant to ask was not whether I was going to tell Iris the story, but whether I was going to tell her that it was true. Having realized that now, my reply seems a lot more obvious in retrospect than it did at the time. I hadn't thought about it before, but I said I didn't think I would.
"But what are you going to tell her?"
"The truth, I guess."
The truth, though, is pretty wonderful, although less astonishing. You don't get presents because of the fat guy in the red suit, which is a shame, because wouldn't it be fun if it were true? But you do get them anyway, and it's because your family loves you. As consolation prizes go, that one's pretty good.
So we did tell her the truth. Santa Claus is just a story. Iris will have to grow up without that piece of childhood delight. Sorry, Iris. But she'll also grow up knowing that her parents respect her enough to tell her the truth instead of a pretty lie, and maybe that will be enough of a consolation prize to make up for it.
[Other articles in category /kids]
permanent link