| The Universe of Discourse | ||||||||||||||||||||||||||||||
|
12 recent entries Archive:
In this section:
Comments disabled |
Sat, 15 Sep 2007
Families of scalars
if ($FORM{'h01'}) {$checked01 = " CHECKED "}
if ($FORM{'h02'}) {$checked02 = " CHECKED "}
if ($FORM{'h03'}) {$checked03 = " CHECKED "}
if ($FORM{'h04'}) {$checked04 = " CHECKED "}
if ($FORM{'h05'}) {$checked05 = " CHECKED "}
if ($FORM{'h06'}) {$checked06 = " CHECKED "}
(I did not make this up; I got it from here.)
The flag here is the family
$checked01,
$checked02, etc. Such code is almost always improved by
replacing the family with an array, and the repeated code with a
loop:
$checked[$_] = $FORM{"h$_"} for "01" .. "06";
Actually in this particular case a better solution was to eliminated
the checked variables entirely, but that is not what I
was planning to discuss. Rather, I planned to discuss a recent
instance in which I wrote some code with a family of variables myself,
and the fix was somewhat different.The program I was working on was a digester for the qmail logs, translating them into a semblance of human-readable format. (This is not a criticism; log files need not be human-readable; they need to be easy to translate, scan, and digest.) The program scans the log, gathering information about each message and all the attempts to deliver it to each of its recipient addresses. Each delivery can be local or remote. Normally the program prints information about each message and all its deliveries. I was adding options to the program to allow the user to specify that only local deliveries or only remote deliveries were of interest. The first thing I did was to add the option-processing code:
...
} elsif ($arg eq "--local-only" || $arg eq '-L') {
$local_only = 1;
} elsif ($arg eq "--remote-only" || $arg eq '-R') {
$remote_only = 1;
As you see, this is where I made my mistake, and introduced a
(two-member) family of variables. The conventional fix says that this
should have been something like $do_only{local} and
$do_only{remote}. But I didn't notice my mistake right away.Later on, when processing a message, I wanted to the program to scan its deliveries, and skip all processing and display of the message unless some of its deliveries were of the interesting type:
if ($local_only || $remote_only) {
...
}
I had vague misgivings at this point about the test, which seemed
redundant, but I pressed on anyway, and found myself in minor trouble.
Counting the number of local or remote deliveries was complicated:
if ($local_only || $remote_only) {
my $n_local_deliveries =
grep $msg->{del}{$_}{lr} eq "local", keys %{$msg->{del}};
my $n_remote_deliveries =
grep $msg->{del}{$_}{lr} eq "remote", keys %{$msg->{del}};
...
}
There is a duplication of code here. Also, there is a waste of CPU
time, since the program never needs to have both numbers available.
This latter waste could be avoided at the expense of complicating the
code, by using something like $n_remote_deliveries = keys(%{$msg->{del}}) -
$n_local_deliveries, but that is not a good solution.Also, the complete logic for skipping the report was excessively complicated:
if ($local_only || $remote_only) {
my $n_local_deliveries =
grep $msg->{del}{$_}{lr} eq "local", keys %{$msg->{del}};
my $n_remote_deliveries =
grep $msg->{del}{$_}{lr} eq "remote", keys %{$msg->{del}};
return if $local_only && $local_deliveries == 0
|| $remote_only && $remote_deliveries == 0;
}
I could have saved the wasted CPU time (and the repeated tests of the
flags) by rewriting the code like this:
if ($local_only) {
return unless
grep $msg->{del}{$_}{lr} eq "local", keys %{$msg->{del}};
} elsif ($remote_only) {
return unless
grep $msg->{del}{$_}{lr} eq "remote", keys %{$msg->{del}};
}
but that is not addressing the real problem, which was the family of
variables, $local_only and $remote_only, which
inevitably lead to duplicated code, as they did here.Such variables are related by a convention in the programmer's mind, and nowhere else. The language itself is as unaware of the relationship as if the variables had been named $number_of_nosehairs_on_typical_goat and $fusion_point_of_platinum. A cardinal rule of programming is to make such conventional relationships explicit, because then the programming system can give you some assistance in dealing with them. (Also because then they are apparent to the maintenance programmer, who does not have to understand the convention.) Here, the program was unable to associate $local_only with the string "local" and $remote_only with "remote", and I had to make up the lack by writing additional code. For families of variables, the remedy is often to make the relationship explicit by using an aggregate variable, such as an array or a hash, something like this:
if (%use_only) {
my ($only_these) = keys %use_only;
return unless
grep $msg->{del}{$_}{lr} eq $only_these, keys %{$msg->{del}};
}
Here the relationship is explicit because $use_only{"remote"}
indicates an interest in remote deliveries and
$use_only{"local"} indicates an interest in local deliveries,
and the program can examine the key in the hash to determine what to
look for in the {lr} data.But in this case the alternatives are disjoint, so the %use_only hash will never contain more than one element. The tipoff is the bizarre ($only_these) = keys ... line. Since the hash is really storing a single scalar, it can be replaced with a scalar variable:
} elsif ($arg eq "--local-only" || $arg eq '-L') {
$only_these = "local";
} elsif ($arg eq "--remote-only" || $arg eq '-R') {
$only_these = "remote";
Then the logic for skipping uninteresting messages becomes:
if ($only_these) {
return unless
grep $msg->{del}{$_}{lr} eq $only_these, keys %{$msg->{del}};
}
Ahh, better.A long time ago I started to suspect that flag variables themselves are a generally bad practice, and are best avoided, and I think this example is evidence in favor of that theory. I had a conversation about this yesterday with Aristotle Pagaltzis, who is very thoughtful about this sort of thing. One of our conclusions was that although the flag variable can be useful to avoid computing the same boolean value more than once, if it is worth having, it is because your program uses it repeatedly, and so it is probably testing the same boolean value more than once, and so it is likely that the program logic would be simplified if one could merge the blocks that would have been controlled by those multiple tests into one place, thus keeping related code together, and eliminating the repeated tests.
[Other articles in category /prs] permanent link
John Wilkins invents the meter
In skimming over it, I noticed that Wilkins' language contained words for units of measure: "line", "inch", "foot", "standard", "pearch", "furlong", "mile", "league", and "degree". I thought oh, this was another example of a foolish Englishman mistaking his own provincial notions for universals. Wilkins' language has words for Judaism, Christianity, Islam; everything else is under the category of paganism and false gods, and I thought that the introduction of words for inches and feet was another case like that one. But when I read the details, I realized that Wilkins had been smarter than that. Wilkins recognizes that what is needed is a truly universal measurement standard. He discusses a number of ways of doing this and rejects them. One of these is the idea of basing the standard on the circumference of the earth, but he thinks this is too difficult and inconvenient to be practical. But he settles on a method that he says was suggested by Christopher Wren, which is to base the length standard on the time standard (as is done today) and let the standard length be the length of a pendulum with a known period. Pendulums are extremely reliable time standards, and their period depends only their length and on the local effect of gravity. Gravity varies only a very little bit over the surface of the earth. So it was a reasonable thing to try. Wilkins directed that a pendulum be set up with the heaviest, densest possible spherical bob at the end of lightest, most flexible possible cord, and the the length of the cord be adjusted until the period of the pendulum was as close to one second as possible. So far so good. But here is where I am stumped. Wilkins did not simply take the standard length as the length from the fulcrum to the center of the bob. Instead:
...which being done, there are given these two Lengths, viz. of the String, and of the Radius of the Ball, to which a third Proportional must be found out; which must be as the length of the String from the point of Suspension to the Centre of the Ball is to the Radius of the Ball, so must the said Radius be to this third which being so found, let two fifths of this third Proportional be set off from the Centre downwards, and that will give the Measure desired.Wilkins is saying, effectively: let d be the distance from the point of suspension to the center of the bob, and r be the radius of the bob, and let x be such that d/r = r/x. Then d+(0.4)x is the standard unit of measurement. Huh? Why 0.4? Why does r come into it? Why not just use d? Huh? These guys weren't stupid, and there must be something going on here that I don't understand. Can any of the physics experts out there help me figure out what is going on here? Anyway, the main point of this note is to point out an extraordinary coincidence. Wilkins says that if you follow his instructions above, the standard unit of measurement "will prove to be . . . 39 Inches and a quarter". In other words, almost exactly one meter. I bet someone out there is thinking that this explains the oddity of the 0.4 and the other stuff I don't understand: Wilkins was adjusting his definition to make his standard unit come out to exactly one meter, just as we do today. (The modern meter is defined as the distance traveled by light in 1/299,792,458 of a second. Why 299,792,458? Because that's how long it happens to take light to travel one meter.) But no, that isn't it. Remember, Wilkins is writing this in 1668. The meter wasn't invented for another 110 years. [ Addendum 20070915: There is a followup article, which explains the mysterious (0.4)x in the formula for the standard length. ] Having defined the meter, which he called the "Standard", Wilkins then went on to define smaller and larger units, each differing from the standard by a factor that was a power of 10. So when Wilkins puts words for "inch" and "foot" into his universal language, he isn't putting in words for the common inch and foot, but rather the units that are respectively 1/100 and 1/10 the size of the Standard. His "inch" is actually a centimeter, and his "mile" is a kilometer, to within a fraction of a percent. Wilkins also defined units of volume and weight measure. A cubic Standard was called a "bushel", and he had a "quart" (1/100 bushel, approximately 10 liters) and a "pint" (approximately one liter). For weight he defined the "hundred" as the weight of a bushel of distilled rainwater; this almost precisely the same as the original definition of the gram. A "pound" is then 1/100 hundred, or about ten kilograms. I don't understand why Wilkins' names are all off by a factor of ten; you'd think he would have wanted to make the quart be a millibushel, which would have been very close to a common quart, and the pound be the weight of a cubic foot of water (about a kilogram) instead of ten cubic feet of water (ten kilograms). But I've read this section over several times, and I'm pretty sure I didn't misunderstand. Wilkins also based a decimal currency on his units of volume: a "talent" of gold or silver was a cubic standard. Talents were then divided by tens into hundreds, pounds, angels, shillings, pennies, and farthings. A silver penny was therefore 10-5 cubic Standard of silver. Once again, his scale seems off. A cubic Standard of silver weighs about 10.4 metric tonnes. Wilkins' silver penny is about is nearly ten cubic centimeters of metal, weighing 104 grams (about 3.5 troy ounces), and his farthing is 10.4 grams. A gold penny is about 191 grams, or more than six ounces of gold. For all its flaws, however, this is the earliest proposal I am aware of for a fully decimal system of weights and measures, predating the metric system, as I said, by about 110 years.
[Other articles in category /physics] permanent link
Girls of the SEC
This isn't the first time I've had this reaction on learning about a Playboy pictorial; last time was probably in August 2002 when I saw the "Women of Enron" cover. (I am not making this up.) I wasn't aware of The December 2002 feature, "Women of Worldcom" (I swear I'm not making this up), but I would have had the same reaction if I had been. I know that in recent years the Playboy franchise has fallen from its former heights of glory: circulation is way down, the Playboy Clubs have all closed, few people still carry Playboy keychains. But I didn't remember that they had fallen quite so far. They seem to have exhausted all the plausible topics for pictorial features, and are now well into the scraping-the-bottom-of-the-barrel stage. The June 1968 feature was "Girls of Scandinavia". July 1999, "Girls of Hawaiian Tropic". Then "Women of Enron" and now "Girls of the SEC". How many men have ever had a fantasy about sexy SEC employees, anyway? How can you even tell? Sexy flight attendants, sure; they wear recognizable uniforms. But what characterizes an SEC employee? A rumpled flannel suit? An interest in cost accounting? A tendency to talk about the new Basel II banking regulations? I tried to think of a category that would be less sexually inspiring than "SEC employees". It's difficult. My first thought was "Girls of Wal-Mart." But no, Wal-Mart employees wear uniforms. If you go too far in that direction you end up in the realm of fetish. For example, Playboy is unlikely to do a feature on "girls of the infectious disease wards". But if they did, there is someone (probably on /b/), who would be extremely interested. It is hard to imagine anyone with a similarly intense interest in SEC employees. So what's next for Playboy? Girls of the hospital gift shops? Girls of State Farm Insurance telephone customer service division? Girls of the beet canneries? Girls of Acadia University Grounds and Facilities Services? Girls of the DMV? [ Pre-publication addendum: After a little more research, I figured out that SEC refers here to "Southeastern Conference" and that Playboy has done at least two other features with the same title, most recently in October 2001. I decided to run the article anyway, since I think I wouldn't have made the mistake if I hadn't been prepared ahead of time by "Women of Enron". ] [ Addendum 20070913: This article is now on the first page of Google searches for "girls of the sec playboy". ] [ Addendum 20070915: The article has moved up from tenth to third place. Truly, Google works in mysterious ways. ]
[Other articles in category /misc] permanent link
The Wilkins pendulum mystery resolved
This article got some attention back in July, when a lot of people were Google-searching for "john wilkins metric system", because the UK Metric Association had put out a press release making the same points, this time discovered by an Australian, Pat Naughtin. For example, the BBC Video News says:
According to Pat Naughtin, the Metric System was invented in England in 1668, one hundred and twenty years before the French adopted the system. He discovered this in an ancient and rare book...Actually, though, he did not discover it in Wilkins' ancient and rare book. He discovered it by reading The Universe of Discourse, and then went to the ancient and rare book I cited, to confirm that it said what I had said it said. Remember, folks, you heard it here first. Anyway, that is not what I planned to write about. In the earlier article, I discussed Wilkins' original definition of the Standard, which was based on the length of a pendulum with a period of exactly one second. Then:
Let d be the distance from the point of suspension to the center of the bob, and r be the radius of the bob, and let x be such that d/r = r/x. Then d+(0.4)x is the standard unit of measurement.(This is my translation of Wilkins' Baroque language.) But this was a big puzzle to me:
Huh? Why 0.4? Why does r come into it? Why not just use d? Huh?Soon after the press release came out, I got email from a gentleman named Bill Hooper, a retired professor of physics of the University of Virginia's College at Wise, in which he explained this puzzle completely, and in some detail. According to Professor Hooper, you cannot just use d here, because if you do, the length will depend on the size, shape, and orientation of the bob. I did not know this; I would have supposed that you can assume that the mass of the bob is concentrated at its center of mass, but apparently you cannot. The usual Physics I calculation that derives the period of a pendulum in terms of the distance from the fulcrum to the center of the bob assumes that the bob is infinitesimal. But in real life the bob is not infinitesimal, and this makes a difference. (And Wilkins specified that one should use the most massive possible bob, for reasons that should be clear.) No, instead you have to adjust the distance d in the formula by adding I/md, where m is the mass of the bob and I is the moment of intertia of the bob, a property which depends on the shape, size, and mass of the bob. Wilkins specified a spherical bob, so we need only calculate (or look up) the formula for the moment of inertia of a sphere. It turns out that for a solid sphere, I = 2mr2/5. That is, the distance needed is not d, but d + 2r2/5d. Or, as I put it above, d + (0.4)x, where d/r = r/x. Well, that answers that question. My very grateful thanks to Professor Hooper for the explanantion. I think I might have figured it out myself eventually, but I am not willing to put a bound of less than two hundred years on how long it would have taken me to do so. One lesson to learn from all this is that those early Royal Society guys were very smart, and when they say something has a mysterious (0.4)x in it, you should assume they know what they are doing. Another lesson is that mechanics was pretty well-understood by 1668.
[Other articles in category /physics] permanent link |
|||||||||||||||||||||||||||||