The Universe of Discourse | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
12 recent entries Archive:
In this section: Comments disabled |
Thu, 31 Dec 2009
A monad for probability and provenance
Suppose a monad value represents all the possible outcomes of an event, each with a probability of occurrence. For concreteness, let's suppose all our probability distributions are discrete. Then we might have: data ProbDist p a = ProbDist [(a,p)] deriving (Eq, Show) unpd (ProbDist ps) = psEach a is an outcome, and each p is the probability of that outcome occurring. For example, biased and unbiased coins:
unbiasedCoin = ProbDist [ ("heads", 0.5), ("tails", 0.5) ]; biasedCoin = ProbDist [ ("heads", 0.6), ("tails", 0.4) ]; Or a couple of simple functions for making dice:
import Data.Ratio d sides = ProbDist [(i, 1 % sides) | i <- [1 .. sides]] die = d 6
d n is an n-sided die. The Functor instance is straightforward:
instance Functor (ProbDist p) where fmap f (ProbDist pas) = ProbDist $ map (\(a,p) -> (f a, p)) pasThe Monad instance requires return and >>=. The return function merely takes an event and turns it into a distribution where that event occurs with probability 1. I find join easier to think about than >>=. The join function takes a nested distribution, where each outcome of the outer distribution specifies an inner distribution for the actual events, and collapses it into a regular, overall distribution. For example, suppose you put a biased coin and an unbiased coin in a bag, then pull one out and flip it:
bag :: ProbDist Double (ProbDist Double String) bag = ProbDist [ (biasedCoin, 0.5), (unbiasedCoin, 0.5) ]The join operator collapses this into a single ProbDist Double String:
ProbDist [("heads",0.3), ("tails",0.2), ("heads",0.25), ("tails",0.25)]It would be nice if join could combine the duplicate heads into a single ("heads", 0.55) entry. But that would force an Eq a constraint on the event type, which isn't allowed, because (>>=) must work for all data types, not just for instances of Eq. This is a problem with Haskell, not with the monad itself. It's the same problem that prevents one from making a good set monad in Haskell, even though categorially sets are a perfectly good monad. (The return function constructs singletons, and the join function is simply set union.) Maybe in the next language. Perhaps someone else will find the >>= operator easier to understand than join? I don't know. Anyway, it's simple enough to derive once you understand join; here's the code:
instance (Num p) => Monad (ProbDist p) where return a = ProbDist [(a, 1)] (ProbDist pas) >>= f = ProbDist $ do (a, p) <- pas let (ProbDist pbs) = f a (b, q) <- pbs return (b, p*q)So now we can do some straightforward experiments:
liftM2 (+) (d 6) (d 6) ProbDist [(2,1 % 36),(3,1 % 36),(4,1 % 36),(5,1 % 36),(6,1 % 36),(7,1 % 36),(3,1 % 36),(4,1 % 36),(5,1 % 36),(6,1 % 36),(7,1 % 36),(8,1 % 36),(4,1 % 36),(5,1 % 36),(6,1 % 36),(7,1 % 36),(8,1 % 36),(9,1 % 36),(5,1 % 36),(6,1 % 36),(7,1 % 36),(8,1 % 36),(9,1 % 36),(10,1 % 36),(6,1 % 36),(7,1 % 36),(8,1 % 36),(9,1 % 36),(10,1 % 36),(11,1 % 36),(7,1 % 36),(8,1 % 36),(9,1 % 36),(10,1 % 36),(11,1 % 36),(12,1 % 36)]This is nasty-looking; we really need to merge the multiple listings of the same event. Here is a function to do that:
agglomerate :: (Num p, Eq b) => (a -> b) -> ProbDist p a -> ProbDist p b agglomerate f pd = ProbDist $ foldr insert [] (unpd (fmap f pd)) where insert (k, p) [] = [(k, p)] insert (k, p) ((k', p'):kps) | k == k' = (k, p+p'):kps | otherwise = (k', p'):(insert (k,p) kps) agg :: (Num p, Eq a) => ProbDist p a -> ProbDist p a agg = agglomerate idThen agg $ liftM2 (+) (d 6) (d 6) produces:
ProbDist [(12,1 % 36),(11,1 % 18),(10,1 % 12),(9,1 % 9), (8,5 % 36),(7,1 % 6),(6,5 % 36),(5,1 % 9), (4,1 % 12),(3,1 % 18),(2,1 % 36)]Hey, that's correct. There must be a shorter way to write insert. It really bothers me, because it looks look it should be possible to do it as a fold. But I couldn't make it look any better. You are not limited to calculating probabilities. The monad actually will count things. For example, let us throw three dice and count how many ways there are to throw various numbers of sixes:
eq6 n = if n == 6 then 1 else 0
agg $ liftM3 (\a b c -> eq6 a + eq6 b + eq6 c) die die die
ProbDist [(3,1),(2,15),(1,75),(0,125)]
There is one way to throw three sixes, 15 ways to throw two sixes, 75
ways to throw one six, and 125 ways to throw no sixes. So
ProbDist is a misnomer. It's easy to convert counts to probabilities:
probMap :: (p -> q) -> ProbDist p a -> ProbDist q a probMap f (ProbDist pds) = ProbDist $ (map (\(a,p) -> (a, f p))) pds normalize :: (Fractional p) => ProbDist p a -> ProbDist p a normalize pd@(ProbDist pas) = probMap (/ total) pd where total = sum . (map snd) $ pas normalize $ agg $ probMap toRational $ liftM3 (\a b c -> eq6 a + eq6 b + eq6 c) die die die ProbDist [(3,1 % 216),(2,5 % 72),(1,25 % 72),(0,125 % 216)]I think this is the first time I've gotten to write die die die in a computer program. The do notation is very nice. Here we calculate the distribution where we roll four dice and discard the smallest:
stat = do a <- d 6 b <- d 6 c <- d 6 d <- d 6 return (a+b+c+d - minimum [a,b,c,d]) probMap fromRational $ agg stat ProbDist [(18,1.6203703703703703e-2), (17,4.1666666666666664e-2), (16,7.253086419753087e-2), (15,0.10108024691358025), (14,0.12345679012345678), (13,0.13271604938271606), (12,0.12885802469135801), (11,0.11419753086419752), (10,9.41358024691358e-2), (9,7.021604938271606e-2), (8,4.7839506172839504e-2), (7,2.9320987654320986e-2), (6,1.6203703703703703e-2), (5,7.716049382716049e-3), (4,3.0864197530864196e-3), (3,7.716049382716049e-4)]One thing I was hoping to get didn't work out. I had this idea that I'd be able to calculate the outcome of a game of craps like this:
dice = liftM2 (+) (d 6) (d 6) point n = do roll <- dice case roll of 7 -> return "lose" _ | roll == n = "win" _ | otherwise = point n craps = do roll <- dice case roll of 2 -> return "lose" 3 -> return "lose" 4 -> point 4 5 -> point 5 6 -> point 6 7 -> return "win" 8 -> point 8 9 -> point 9 10 -> point 10 11 -> return "win" 12 -> return "lose"This doesn't work at all; point is an infinite loop because the first value of dice, namely 2, causes a recursive call. I might be able to do something about this, but I'll have to think about it more. It also occurred to me that the use of * in the definition of >>= / join could be generalized. A couple of years back I mentioned a paper of Green, Karvounarakis, and Tannen that discusses "provenance semirings". The idea is that each item in a database is annotated with some "provenance" information about why it is there, and you want to calculate the provenance for items in tables that are computed from table joins. My earlier explanation is here. One special case of provenance information is that the provenances are probabilities that the database information is correct, and then the probabilities are calculated correctly for the joins, by multiplication and addition of probabilities. But in the general case the provenances are opaque symbols, and the multiplication and addition construct regular expressions over these symbols. One could generalize ProbDist similarly, and the ProbDist monad (even more of a misnomer this time) would calculate the provenance automatically. It occurs to me now that there's probably a natural way to view a database table join as a sort of Kleisli composition, but this article has gone on too long already. Happy new year, everyone. [ Addendum 20100103: unsurprisingly, this is not a new idea. Several readers wrote in with references to previous discussion of this monad, and related monads. It turns out that the idea goes back at least to 1981. ]
My thanks to Graham Hunter for his donation.
[Other articles in category /prog/haskell] permanent link Tue, 15 Dec 2009
Monads are like burritos
At first I thought the choice of burritos was only a facetious reference to the peculiar and sometimes strained analogies these tutorials make. But then I realized that monads are like burritos. I will explain. A monad is a special kind of a functor. A functor F takes each type T and maps it to a new type FT. A burrito is like a functor: it takes a type, like meat or beans, and turns it into a new type, like beef burrito or bean burrito. A functor must also be equipped with a map function that lifts functions over the original type into functions over the new type. For example, you can add chopped jalapeños or shredded cheese to any type, like meat or beans; the lifted version of this function adds chopped jalapeños or shredded cheese to the corresponding burrito. A monad must also possess a unit function that takes a regular value, such as a particular batch of meat, and turns it into a burrito. The unit function for burritos is obviously a tortilla. Finally, a monad must possess a join function that takes a ridiculous burrito of burritos and turns them into a regular burrito. Here the obvious join function is to remove the outer tortilla, then unwrap the inner burritos and transfer their fillings into the outer tortilla, and throw away the inner wrappings. The map, join, and unit functions must satisfy certain laws. For example, if B is already a burrito, and not merely a filling for a burrito, then join(unit(B)) must be the same as B. This means that if you have a burrito, and you wrap it in a second tortilla, and then unwrap the contents into the outer tortilla, the result is the same as what you started with. This is true because tortillas are indistinguishable. I know you are going to point out that some tortillas have the face of Jesus. But those have been toasted, and so are unsuitable for burrito-making, and do not concern us here. So monads are indeed like burritos. I asked Brent if this was actually what he had in mind when he first suggested the idea of tutorials explaining monads in terms of burritos, and if everyone else had understood this right away. But he said no, I was the lone genius. [ Addendum 20120106: Chris Done has presented this theory in cartoon form. ]
[Other articles in category /prog] permanent link Mon, 14 Dec 2009
Another short explanation of Gödel's theorem
A while back I started writing up an article titled "World's shortest explanation of Gödel's theorem". But I didn't finish it... I went and had a look to see what was wrong with it, and to my surprise, there seemed to be hardly anything wrong with it. Perhaps I just forgot to post it. So if you disliked yesterday's brief explanation of Gödel's theorem—and many people did—you'll probably dislike this one even more. Enjoy!
A reader wrote to question my characterization of Gödel's theorem in the previous article. But I think I characterized it correctly; I said:
The only systems of mathematical axioms strong enough to prove all true statements of arithmetic, are those that are so strong that they also prove all the false statements of arithmetic.I'm going to explain how this works. You start by choosing some system of mathematics that has some machinery in it for making statements about things like numbers and for constructing proofs of theorems like 1+1=2. Many such systems exist. Let's call the one we have chosen M, for "mathematics". Gödel shows that if M has enough mathematical machinery in it to actually do arithmetic, then it is possible to produce a statement S whose meaning is essentially "Statement S cannot be proved in system M." It is not at all obvious that this is possible, or how it can be done, and I am not going to get into the details here. Gödel's contribution was seeing that it was possible to do this. So here's S again:
S: Statement S cannot be proved in system M. Now there are two possibilities. Either S is in fact provable in system M, or it is not. One of these must hold. If S is provable in system M, then it is false, and so it is a false statement that can be proved in system M. M therefore proves some false statements of arithmetic. If S is not provable in system M, then it is true, and so it is a true statement that cannot be proved in system M. M therefore fails to prove some true statements of arithmetic. So something goes wrong with M: either it fails to prove some true statements, or else it succeeds in proving some false statements. List of topics I deliberately omitted from this article, that mathematicians should not write to me about with corrections: Presburger arithmetic. Dialetheism. Inexhaustibility. ω-incompleteness. Non-RE sets of axioms.
If you liked this post, you may enjoy Torkel Franzén's books Godel's Theorem: An Incomplete Guide to Its Use and Abuse and Inexhaustibility: A Non-Exhaustive Treatment. If you disliked this post, you are even more likely to enjoy them.
Many thanks to Robert Bond for his contribution.
[Other articles in category /math] permanent link Sun, 13 Dec 2009
World's shortest explanation of Gödel's theorem
We have some sort of machine that prints out statements in some sort of language. It needn't be a statement-printing machine exactly; it could be some sort of technique for taking statements and deciding if they are true. But let's think of it as a machine that prints out statements. In particular, some of the statements that the machine might (or might not) print look like these:
For example, NPR*FOO means that the machine will never print FOOFOO. NP*FOOFOO means the same thing. So far, so good. Now, let's consider the statement NPR*NPR*. This statement asserts that the machine will never print NPR*NPR*. Either the machine prints NPR*NPR*, or it never prints NPR*NPR*. If the machine prints NPR*NPR*, it has printed a false statement. But if the machine never prints NPR*NPR*, then NPR*NPR* is a true statement that the machine never prints. So either the machine sometimes prints false statements, or there are true statements that it never prints. So any machine that prints only true statements must fail to print some true statements. Or conversely, any machine that prints every possible true statement must print some false statements too.
The conclusion then translates directly: any machine or method that produces statements about arithmetic either sometimes produces false statements, or else there are true statements about arithmetic that it never produces. Because if it produces something like NPR*NPR* then it is wrong, but if it fails to produce NPR*NPR*, then that is a true statement that it has failed to produce. So any machine or other method that produces only true statements about arithmetic must fail to produce some true statements. Hope this helps! (This explanation appears in Smullyan's book 5000 BC and Other Philosophical Fantasies, chapter 3, section 65, which is where I saw it. He discusses it at considerable length in Chapter 16 of The Lady or the Tiger?, "Machines that Talk About Themselves". It also appears in The Mystery of Scheherezade.)
I gratefully acknowledge Charles Colht for his generous donation to this blog.
[ Addendum 20091214: Another article on the same topic. ]
[Other articles in category /math] permanent link Sat, 12 Dec 2009
On failing open
For example, consider a login component that accepts a username and a password and then queries a remote authentication server to find out if the password is correct. If the connection to the authentication server fails, or if the authentication server is down, the login component must decide whether to grant or deny access, in the absence of correct information from the server. The correct design is almost certainly to "fail closed", and to deny access. I used to teach security classes, and I would point out that programs sometimes have bugs, and do the wrong thing. If your program has failed closed, and if this is a bug, then you have an irate user. The user might call you up and chew you out, or might chew you out to your boss, and they might even miss a crucial deadline because your software denied them access when it should have granted access. But these are relatively small problems. If your program has failed open, and if this is a bug, then the user might abscond with the entire payroll and flee to Brazil. (I was once teaching one of these classes in Lisbon, and I reached the "flee to Brazil" example without having realized ahead of time that this had greater potential to offend the Portuguese than many other people. So I apologized. But my hosts very kindly told me that they would have put it the same way, and that in fact the Mayor of Lisbon had done precisely what I described a few years before. The moral of the story is to read over the slides ahead of time before giving the talk.) But I digress. One can find many examples in the history of security that failed the wrong way. However, the issue is on my mind because I was at a job interview a few weeks ago with giant media corporation XYZ. At the interview, we spent about an hour talking about an architectural problem they were trying to solve. XYZ operates a web site where people can watch movies and TV programs online. Thy would like to extend the service so that people who subscribe to premium cable services, such as HBO, can authenticate themselves to the web site and watch HBO programs there; HBO non-subscribers should get only free TV content. The problem in this case was that the authentication data was held on an underpowered legacy system that could serve only a small fraction of the requests that came in. The solution was to cache the authentication data on a better system, and gather and merge change information from the slow legacy system as possible. I observed during the discussion that this was a striking example of the rare situation in which one wants the authentication system to fail open instead of closed. For suppose one grants access that should not be granted. Then someone on the Internet gets to watch a movie or an episode of The Sopranos for free, which is not worth getting excited about and which happens a zillion times a day anyhow. But suppose the software denies access that should have been granted. Then there is a legitimate paying customer who has paid to watch The Sopranos, and we told them no. Now they are a legitimately irate customer, which is bad, and they may call the support desk, costing XYZ Corp a significant amount of money, which is also bad. So all other things being equal, one should err on the side of lenity: when in doubt, grant access.
I would like to thank Andrew Lenards for his gift.
[Other articles in category /tech] permanent link Wed, 09 Sep 2009
You think you're All That, but you're not!
So I was delighted to discover that there was a new book out called Term Rewriting and All That. The "...and All That" suffix is a reference to the tongue-in-cheek classic of British history, 1066 and All That, and promised a casual, accessible, and possibly humorous treatment. Unfortunately the promise was not kept. The book is very good, but it is not casual or humorous. Nor is it especially accessible. It is a solid slab of term rewriting, one of those books that make me think "I would not want to drop it on my foot." That is not a bad thing; the Barendregt book is superb, an enormous superb slab of lambda calculus that you would not want to drop on your foot. But it is not titled "Lambda Calculus and All That". So why the title? I don't know. The authors are Germans, so perhaps they don't understand the joke? [ Addendum: There's exactly one review of this book on Amazon, and it says the same thing I do. It begins: "My main criticism of this book is its title." ]
[Other articles in category /book] permanent link Sat, 01 Aug 2009
Dijkstra was not insane
I wrote a response, explaining where Dijkstra was coming from, and I am very happy with how it came out, so I'm reposting it here. The list subscriber said, in part:
On a side note, I never read anything by Dijkstra that wasn't noticeably out of touch with the reality of programming, which qualifies them as screeds to me.A lot of people bring up the premature-loop-exit prohibition without understanding why Dijkstra suggested it; it wasn't just that he was a tightassed Dutchman. Dijkstra's idea was this: suppose you want to prove, mathematically, that your program does what it is supposed to do. Please, everyone, suspend your judgment of this issue for a few paragraphs, and bear with me. Let's really suppose that we want to do this. Dijkstra's idea is that the program is essentially a concatenation of blocks, each of which is trying to accomplish something or other, and each of which does not make sense to run unless some part of the program state is set up for it ahead of time. For example, the program might be to print a sorted list of links from a web page. Then the obvious blocks are:
We say that the "precondition" for C is that the array be populated with URLs, and the "postcondition" is that the array be in sorted order. What you would want to prove about C is that if the precondition holds—that is, if the array is properly populated before C begins—then the postcondition will hold too—that is, the array will be in sorted order when C completes. It occurs to me that calling this a "proof" is probably biasing everyone's thinking. Let's forget about mathematical proofs and just think about ordinary programmers trying to understand if the program is correct. If the intern in the next cubicle handed you his code for this program, and you were looking it over, you would probably think in very much this way: you would identify block C (maybe it's a subroutine, or maybe not) and then you would try to understand if C, given an array of URLs, would produce a properly sorted array by the time it was done. C itself might depend on some sub-blocks or subroutines that performed sub-parts of the task; you could try to understand them similarly. Having proved (or convinced yourself) that C will produce the postcondition "array contains sorted list of URLs", you are in an excellent position to prove (or convince yourself) that block D prints out a sorted array of URLs, which is what you want. Without that belief about C, you are building on sand; you have almost nothing to go on, and you can conclude hardly anything useful about the behavior of D. Now consider a more complex block, one of the form:
if (q) { E; } else { F; }Suppose you believe that code E, given precondition x, is guaranteed to produce postcondition y. And suppose you believe the same thing about F. Then you can conclude the same thing about the entire if-else block: if x was true before it began executing, then y will be true when it is done.^{[2]} So you can build up proofs (or beliefs) about small bits of code into proofs (or beliefs) about larger ones. We can understand while loops similarly. Suppose we know that condition p is true prior to the commencement of some loop, and that if p is true before G executes, then p will also be true when G finishes. Then what can we say about this loop?
while (q) { G; }We can conclude that if p was true before the loop began, then p will still be true, and q will be false, when the loop ends. BUT BUT BUT BUT if your language has break, then that guarantee goes out the window and you can conclude nothing. Or at the very least your conclusions will become much more difficult. You can no longer treat G atomically; you have to understand its contents in detail. So this is where Dijkstra is coming from: features like break^{[3]} tend to sabotage the benefits of structured programming, and prevent the programmer from understanding the program as a composition of independent units. The other subscriber made a seemingly disparaging reference to "Dijkstra's idea of structure", but I hope it is clear that it was not an arbitrary idea. Dijkstra's idea of structure is what will allow you to understand a large program as a collection of modules. Regardless of your opinion about formal verification methods, or correctness proofs, or the practicality of omitting break from your language, it should at least be clear that Dijkstra was not being doctrinaire just for the sake of doctrine.
Some additional notesHere are some interesting peripheral points that I left out of my main discussion because I wanted to stick to the main point, which was: "Dijkstra was not insane".
Further reading
[Other articles in category /prog] permanent link Sun, 21 Jun 2009
Gray code at the pediatrician's office
How did the bracket know exactly what height to report? This was done in a way I hadn't seen before. It had a photosensor looking at the post, which was printed with this pattern: (Click to view the other pictures I took of the post.) The pattern is binary numerals. Each numeral is a certain fraction of a centimeter high, say 1/4 centimeter. If the sensor reads the number 433, that means that the bracket is 433/4 = 108.25 cm off the ground, and so that Iris is 108.25 cm tall. The patterned strip in the left margin of this article is a straightforward translation of binary numerals to black and white boxes, with black representing 1 and white representing 0:
0000000000If you are paying attention, you will notice that although the strip at left is similar to the pattern in the doctor's office, it is not the same. That is because the numbers on the post are Gray-coded. Gray codes solve the following problem with raw binary numbers. Suppose Iris is close to 104 = 416/4 cm tall, so that the photosensor is in the following region of the post:
...But suppose that the sensor (or the post) is slightly mis-aligned, so that instead of properly reading the (416) row, it reads the first half of the (416) row and last half of the (415) row. That makes 0110111111, which is 447 = 111.75 cm, an error of almost 7.5%. (That's three inches, for my American and Burmese readers.) Or the error could go the other way: if the sensor reads the first half of the (415) and the second half of the (416) row, it will see 0110000000 = 384 = 96 cm. Gray code is a method for encoding numbers in binary so that each numeral differs from the adjacent ones in only one position:
0000000000This is the pattern from the post, which you can also see at the right of this article. Now suppose that the mis-aligned sensor reads part of the (416) line and part of the (417) line. With ordinary binary coding, this could result in an error of up to 7.75 cm. (And worse errors for children of other heights.) But with Gray coding no error results from the misreading:
...No matter what parts of 0101110000 and 0101110001 are stitched together, the result is always either 416 or 417. Converting from Gray code to standard binary is easy: take the binary expansion, and invert every bit that is immediately to the right of a 1 bit. For example, in 1111101000, each red bit is to the right of a 1, and so is inverted to obtain the Gray code 1000011100. Converting back is also easy: of the Gray code. Replace every sequence of the form 1000...01 with 1111...10; also replace 1000... with 1111... if it appears at the end of the code. For example, Gray code 1000011100 contains two such sequences, 100001 and 11, which are replaced with 111110 and 10, to give 1111101000. [ Addendum 20110525: Every so often someone asks why the stadiometer is so sophisticated. Here is the answer. ]
[Other articles in category /math] permanent link Tue, 16 Jun 2009
Haskell logo fail
Ouch.
[Other articles in category /prog/haskell] permanent link Sat, 23 May 2009
A child is bitten by a dog every 0.07 seconds...
This is obviously nonsense, because suppose the post office employs half a million letter carriers. (The actual number is actually about half that, but we are doing a back-of-the-envelope estimate of plausibility.) Then the bite rate is six bites per thousand letter carriers per year, and if children are 900 times more likely to be bitten, they are getting bitten at a rate of 5,400 bites per thousand children per year, or 5.4 bites per child. Insert your own joke here, or use the prefabricated joke framework in the title of this article. I wrote to the reporter, who attributed the claim to the Postal Bulletin 22258 of 7 May 2009. It does indeed appear there. I am trying to track down the ultimate source, but I suspect I will not get any farther. I have discovered that the "900 times" figure appears in the Post Office's annual announcements of Dog Bite Prevention Month as far back as 2004, but not as far back as 2002. Meantime, what are the correct numbers? The Centers for Disease Control and Prevention have a superb on-line database of injury data. It immediately delivers the correct numbers for dog bite rate among children:
According to the USPS 2008 Annual Report, in 2008 the USPS employed 211,661 city delivery carriers and 68,900 full-time rural delivery carriers, a total of 280,561. Since these 280,561 carriers received 3,000 dog bites, the rate per 100,000 carriers per year is 1069.29 bites. So the correct statistic is not that children are 900 times more likely than carriers to be bitten, but rather that carriers are 6.6 times as likely as children to be bitten, 5.6 times if you consider only children under 13. Incidentally, your toddler's chance of being bitten in the course of a year is only about a quarter of a percent, ceteris paribus. Where did 900 come from? I have no idea. There are 293 times as many children as there are letter carriers, and they received a total of 44.5 times as many bites. The "900" figure is all over the Internet, despite being utterly wrong. Even with extensive searching, I was not able to find this factoid in the brochures or reports of any other reputable organization, including the American Veterinary Medical Association, the American Academy of Pediatrics, the Centers for Disease Control and Prevention, or the Humane Society of the Uniited States. It appears to be the invention of the USPS. Also in the same newspaper, the new Indian restaurant on Baltimore avenue was advertising that they "specialize in vegetarian and non-vegetarian food". It's just a cornucopia of stupidity today, isn't it?
[Other articles in category /math] permanent link Thu, 21 May 2009
[Other articles in category /misc] permanent link Wed, 20 May 2009
No flimping
There is a standard example in linguistics that is attached to the word "flimp". The idea it labels is that certain grammatical operations are restricted in the way they behave, and cannot reach deeply into grammatical structures and rearrange them. For instance, you can ask "What did you use to see the girl on the hill in the blue dress?" and I can reply "I used a telescope to see the girl on the hill in the blue dress". Here "the girl on the hill in the blue dress" is operating as a single component, which could, in principle, be arbitrarily long. ("The girl on the hill that was fought over in the war between the two countries that have been at war since the time your mother saw that monkey climb the steeple of the church...") This component can be extracted whole from one sentence and made the object of a new sentence, or the subject of some other sentence. But certain other structures are not transportable. For example, in "Bill left all his money to Fred and someone", one can reach down as far as "Fred and someone" and ask "What did Bill leave to Fred and someone?" but one cannot reach all the way down to "someone" and ask "Who did Bill leave all his money to Fred and"? Under certain linguistic theories of syntax, analogous constraints rule out the existence of certain words. "Flimped" is the hypothetical nonexistent word which, under these theories, cannot exist. To flimp is to kiss a girl who is allergic to. For example, to flimp coconuts is to kiss a girl who is allergic to coconuts. (The grammatical failure in the last sentence but one illustrates the syntactic problem that supposedly rules out the word "flimped". I am not making this up; for more details (from someone who, unlike me, may know what he is talking about) See Word meaning and Montague grammar by David Dowty, p. 236. Dowty cites the earlier sources, from 1969–1973 who proposed this theory in the first place. The "flimped" example above is exactly the same as Dowty's, and I believe it is the standard one. Dowty provides a similar, but different example: there is not, and under this theory there cannot be, a verb "to thork" which means "to lend your uncle and", so that "John thorked Harry ten dollars" would mean "John lent his uncle and Harry ten dollars". I had these examples knocking around in my head for many years. I used to work for the University of Pennsylvania Computer and Information Sciences department, and from my frequent contacts with various cognitive-science types I acquired a lot of odds and ends of linguistic and computational folklore. Michael Niv told me this one sometime around 1992. The "flimp" thing rattled around my head, surfacing every few months or so, until last week, when I thought of a counterexample: Wank. The verb "to wank to" means "to rub one's genitals while considering", and so seems to provide a countexample to the theory that says that verbs of this type are illegal in English. When I went to investigate, I found that the theory had pretty much been refuted anyway. The Dowty book (published 1979) produced another example: "to cuckold" is "to have sexual intercourse with the woman who is married to". Some Reddit person recently complained that one of my blog posts had no point. Eat this, Reddit person.
[Other articles in category /lang] permanent link Sun, 17 May 2009
Bipartite matching and same-sex marriage
However, if same-sex marriages are permitted, there may not be a stable matching, so the character of the problem changes significantly. A minimal counterexample is:
Suppose we match A–B, C–X. Then since B prefers C to A, and C prefers B to X, B and C divorce their mates and marry each other, yielding B–C, A–X. But now C can improve her situation further by divorcing B in favor of A, who is only too glad to dump the miserable X. The marriages are now A–C, B–X. B now realizes that his first divorce was a bad idea, since he thought he was trading up from A to C, but has gotten stuck with X instead. So he reconciles with A, who regards the fickle B as superior to her current mate C. The marriages are now A–B, C–X, and we are back where we started, having gone through every possible matching. This should not be taken as an argument against same-sex marriage. The model fails to generate the following obvious real-world solution: A, B, and C should all move in together and live in joyous tripartite depravity, and X should jump off a bridge.
[Other articles in category /math] permanent link Fri, 15 May 2009
"Known to Man" and the advent of the space aliens
Two people so far have written to warn me that I would regret this once the space aliens come, and I have to go around undoing all my changes. But even completely leaving aside Wikipedia's "Wikipedia is not a crystal ball" policy, which completely absolves me from having to worry about this eventuality, I think these people have not analyzed the situation correctly. Here is how it seems to me. Consider these example sentences:
There are four possible outcomes for the future:
In cases (1) and (3), both sentences require revision. In case (4), neither sentence requires revision. But in case (2), sentence (a) requires revision, while (b) does not. So my change is a potential improvement in a way I had not appreciated. Also in last week's article, I said it would be nice to find a case where a Wikipedia article's use of "known to man" actually intended a contrast with divine or feminine knowledge, rather than being a piece of inept blather. I did eventually find such a case: the article on runic alphabet says, in part:
In the Poetic Edda poem Rígþula another origin is related of how the runic alphabet became known to man. The poem relates how Ríg, identified as Heimdall in the introduction, ... I gratefully acknowledge the gift of Thomas Guest. Thank you!
[Other articles in category /aliens] permanent link Thu, 14 May 2009
Product types in Java
class Persons2 { Person personA, personB; Persons2(Person a, Person b) { personA = a; personB = b; } Person getPersonA() { return personA; } ... }Java is loathsome in its verbosity, and this sort of monkey code is Java's verbosity at its most loathsome. So I did not do this. Haskell functions return only one value also, but this is no limitation, because Haskell has product types. And starting in Java 5, the Java type system is a sort of dented, bolted-on version of the type systems that eventually evolved into the Haskell type system. But product types are pretty simple. I can make a generic product type in Java:
class Pair<A,B> { A a; B b; Pair(A a, B b) { this.a = a; this.b = b; } A fst() { return a; } B snd() { return b; } }Then I can declare my function to return a Pair<Person,Person>:
Pair<Person,Person> findMatch() { ... return new PairOkay, that worked just fine. The boilerplate is still there, but you only have to do it once. This trick seems sufficiently useful that I can imagine that I will use it again, and that someone else reading this will want to use it too. I've been saying for a while that up through version 1.4, Java was a throwback to the languages of the 1970s, but that with the introduction of generics in Java 5, it took a giant step forward into the 1980s. I think this is a point of evidence in favor of that claim. I wonder why this class isn't in the standard library. I was not the first person to think of doing this; web search turns up several others, who also wonder why this class isn't in the standard library. I wrote a long, irrelevant coda regarding my use of the identifiers husband and wife in the example, but, contrary to my usual practice, I will publish it another day. [ Addendum 20090517: Here's the long, irrelevant coda. ]
I gratefully acknowledge the gift of Petr Kiryakov. Thank you!
[Other articles in category /prog/java] permanent link Fri, 08 May 2009
Most annoying phrase known to man?
In the past I have gone on search-and-destroy missions against certain specific phrases, for example "It should be noted that...", which can nearly always be replaced with "" with no loss of meaning. But "known to man" is more fun. One pleasant property of this phrase is that one can sidestep the issue of whether "man" is gender-neutral. People on both sides of this argument can still agree that "known to man" is best replaced with "known". For example:
As a pleonasm and a cliché, "known to man" is a signpost to prose that has been written by someone who was not thinking about what they were saying, and so one often finds it amid other prose that is pleonastic and clichéd. For example:
Diamond ... is one of the hardest naturally occurring material known (another harder substance known today is the man-made substance aggregated diamond nanorods which is still not the hardest substance known to man).Which I trimmed to say:
Diamond ... is one of the hardest naturally-occurring materials known. (Some artificial substances, such as aggregated diamond nanorods, are harder.)Many people ridicule Strunk and White's fatuous advice to "omit needless words"—if you knew which words were needless, you wouldn't need the advice—but all editors know that beginning writers will use ten words where five will do. The passage above is a good example. Can "known to man" always be improved by replacement with "known"? I might have said so yesterday, but I mentioned the issue to Yaakov Sloman, who pointed out that the original use was meant to suggest a contrast not with female knowledge but with divine knowledge, an important point that completely escaped my atheist self. In light of this observation, it was easy to come up with a counterexample: "His acts descended to a depth of evil previously unknown to man" partakes of the theological connotations very nicely, I think, and so loses some of its force if it is truncated to "... previously unknown". I suppose that many similar examples appear in the work of H. P. Lovecraft. It would be nice if some of the Wikipedia examples were of this type, but so far I haven't found any. The only cases so far that I haven't changed are all direct quotations, including several from the introductory narration of The Twilight Zone, which asserts that "There is a fifth dimension beyond that which is known to man...". I like when things turn out better than I expected, but this wasn't one of those times. Instead, there was one example that was even worse than I expected. Bad writing it may be, but the wrongness of "known to man" is at least arguable in most cases. (An argument I don't want to make today, although if I did, I might suggest that "titanium dioxide is the best whitening agent known to man" be rewritten as "titanium dioxide is the best whitening agent known to persons of both sexes with at least nine and a half inches of savage, throbbing cockmeat.") But one of the examples I corrected was risibly inept, in an unusual way:
Wonder Woman's Amazon training also gave her limited telepathy, profound scientific knowledge, and the ability to speak every language known to man.I have difficulty imagining that the training imparted to Diana, crown princess of the exclusively female population of Paradise Island, would be limited to languages known to man. Earle Martin drew my attention to the Wikipedia article on "The hardest metal known to man". I did not dare to change this. [ Addendum 20090515: There is a followup article. ]
[Other articles in category /lang] permanent link Sun, 22 Mar 2009
Worst error messages this month
Line 319 in XML document from class path resource [applicationContext-standalone.xml] is invalid; nested exception is org.xml.sax.SAXParseException: cvc-complex-type.2.3: Element 'beans' cannot have character [children], because the type's content type is element-only.Experienced technicians will of course want to look at line 319. Silly! If looking at line 319 were any help, this would not be this month's lucky winner. Line 319 is the last line of the document, and says, in whole, "</beans>". What this actually means is that there is a stray plus sign at the end of line 54. Well, that is the ultimate cause. The Fregean Bedeutung, as it were. What it really means (the Sinn) is that the <beans>...</beans> element is allowed to contain sub-elements, but not naked text ("content type is element-only") and the stray plus sign is naked text. The mixture of weird jargon ("cvc-complex-type.2.3") and obscure anaphora ("character [children]" for "plus sign") got this message nominated for the competition. The totally wrong line number is a bonus. But what won this message the prize is that even if you somehow understand what it means, it doesn't help you find the actual problem! You get to grovel over the 319-line XML file line-by-line, looking for the extra character. Come on, folks, it's a SAX parser, so how hard is it to complain about the plus sign as soon as it shows up? What do we have for the lucky winner, Johnny?
You'll be flown to lovely Centralia, Pennsylvania, where you'll enjoy four days and three nights of solitude in an abandoned coal mine being flogged with holly branches and CAT-5 ethernet cable by the cast of "The Hills"!Thank you, Johnny. And there is a runner-up! The badblocks utility that is distributed as part of the Linux e2fsprogs package, produces the following extremely useful error message:
% badblocks /home badblocks: invalid starting block (0): must be less than 0Apparently this is Linux-speak for "This program needs the name of a device file, and the programmer was too lazy to have it detect that you supplied the name of the mount point instead". Happy spring, everyone!
[Other articles in category /prog] permanent link Mon, 09 Mar 2009
Happy birthday
Happy birthday, girls.
[Other articles in category /math] permanent link Tue, 17 Feb 2009
Second-largest cities
It appears that the second-largest city in New York state is some place called (get this) "Buffalo". Okay, whatever. But that got me wondering if New York was the state with the greatest disparity between its largest and second-largest city. Since I had the census data lying around from a related project (and a good thing too, since the Census Bureau website moved the file) I decided to find out. The answer is no. New York state has only one major city, since its next-largest settlement is Buffalo, with 1.1 million people. (Estimated, as of 2006.) But the second-largest city in Illinois is Peoria, which is actually the punchline of jokes. (Not merely because of its small size; compare Dubuque, Iowa, a joke, with Davenport, Iowa, not a joke.) The population of Peoria is around 370,000, less than one twenty-fifth that of Chicago. But if you want to count weird exceptions, Rhode Island has everyone else beat. You cannot compare the sizes of the largest and second-largest cities in Rhode Island at all. Rhode Island is so small that it has only one city, Seriously. No, stop laughing! Rhode Island is no laughing matter. The Articles of Confederation required unanimous consent to amend, and Rhode Island kept screwing everyone else up, by withholding consent, so the rest of the states had to junk the Articles in favor of the current United States Constitution. Rhode Island refused to ratify the new Constitution, insisting to the very end that the other states had no right to secede from the Confederation, until well after all of the other twelve had done it, and they finally realized that the future of their teeny one-state Confederation as an enclave of the United States of America was rather iffy. Even then, their vote to join the United States went 34–32. But I digress. Actually, for many years I have said that you can impress a Rhode Islander by asking where they live, and then—regardless of what they say—remarking "Oh, that's near Providence, isn't it?" They are always pleased. "Yes, that's right!" The census data proves that this is guaranteed to work. (Unless they live in Providence, of course.) Here's a joke for mathematicians. Q: What is Rhode Island? A: The topological closure of Providence. Okay, I am finally done ragging on Rhode Island. Here is the complete data, ordered by size disparity. I wasn't sure whether to put Rhode Island at the top or the bottom, so I listed it twice, just like in the Senate.
Some of this data is rather odd because of the way the census bureau aggregates cities. For example, the largest city in New Jersey is Newark. But Newark is counted as part of the New York City metropolitan area, so doesn't count separately. If it did, New Jersey's quotient would be 5.86 instead of 1.35. I should probably rerun the data without the aggregation. But you get oddities that way also. I also made a scatter plot. The x-axis is the population of the largest city, and the y-axis is the population of the second-largest city. Both axes are log-scale: Nothing weird jumps out here. I probably should have plotted population against quotient. The data and programs are online if you would like to mess around with them.
I gratefully acknowledge the gift of Tim McKenzie. Thank you!
[Other articles in category /misc] permanent link Sun, 15 Feb 2009
Milo of Croton and the sometimes failure of inductive proofs
"Did it work?" I asked. "No," said Ranjit. "A newborn calf already weighs like a hundred pounds." Usually you expect the induction step to fail, but sometimes it's the base case that gets you.
[Other articles in category /misc] permanent link
Stupid crap, presented by Plato
"She is not 'your' girlfriend," said this knucklehead. "She does not belong to you."Through pure happenstance, I discovered last night that there is an account of this same bit of equivocation in Plato's Euthydemus. In this dialogue, Socrates tells of a sophist named Dionysodorus, who is so clever that he can refute any proposition, whether true or false. Here Dionysodorus demonstrates that Ctesippus's father is a dog:
You say that you have a dog.So my knuckleheaded interlocutor was not even being original.
I gratefully acknowledge the gift of Thomas Guest. Thank you very much!
[Other articles in category /lang] permanent link Sat, 14 Feb 2009
The junk heap of (blog) history
I have an idea that I might inaugurate a new section of the blog, called "junkheap", where unfinished articles would appear after aging in the cellar for three years, regardless what sort of crappy condition they are in.Some of the stuff in the cellar is in pretty good shape. Some is really trash. I don't want to publish any of it on the main blog, for various reasons; if I did, I would have already. But some of it might have some interest for someone, or I wouldn't be revisiting it at all. Here's a summary of the cellared items from February-April 2006, with the reasons why each was abandoned:
I invite your suggestions for what to do with this stuff. Mailing list? Post brief descriptions in the blog and let people request them by mail? Post them on a wiki and let people hack on them? Stop pretending that my every passing thought is so fascinating that even my failures are worth reading? The last one is the default, and I apologize for taking up your valuable time with this nonsense.
[Other articles in category /meta] permanent link Fri, 13 Feb 2009
Stupid crap
Anyway, it's been on my to-do list for about three years, so what the hell.
Around 1986, I heard it claimed that Ronald Reagan did not have practical qualifications for the presidency, because he had not been a lawyer or a general or anything like that, but rather an actor. "An actor?" said this person. "How does being an actor prepare you to be President?" I pointed out that he had also been the Governor of California. "Oh, yeah." But it doesn't even stop there. Who says some actor is qualified to govern California? Well, he had previously been president of the Screen Actors' Guild, which seems like a reasonable thing for the Governor of California to have done.
Around 1992, I was talking to a woman who claimed that the presidency was not open to the disabled, because the President was commander-in-chief of the army, he had to satisfy the army's physical criteria, and they got to disqualify him if he couldn't complete basic training, or something like that. I asked how her theory accommodated Ronald Reagan, who had been elected at the age of 68 or whatever. Then I asked how the theory accommodated Franklin Roosevelt, who could not walk, or even stand without assistance, and who traveled in a wheelchair. "Huh."
I was once harangued by someone for using the phrase "my girlfriend." "She is not 'your' girlfriend," said this knucklehead. "She does not belong to you." Sometimes you can't think of the right thing to say at the right time, but this time I did think of the right thing. "My father," I said. "My brother. My husband. My doctor. My boss. My congressman." "Oh yeah." My notes also suggest a long article about dumb theories in general. For example, I once read about someone who theorized that people were not actually smaller in the Middle Ages than they are today. We only think they were, says this theory, because we have a lot of leftover suits of armor around that are too small to fit on modern adults. But, according to the theory, the full-sized armor got chopped up in battles and fell apart, whereas the armor that's in good condition today is the armor of younger men, not full-grown, who outgrew their first suits, couldn't use them any more, and hung then on the wall as mementoes. (Or tossed them in the cellar.) I asked my dad about this, and he wanted to know how that theory applied to the low doorways and small furniture. Heh. I think Herbert Illig's theory is probably in this category, Herbert Illig, in case you missed it, believes that this is actually the year 1712, because the years 614–911 never actually occurred. Rather, they were created by an early 7th-century political conspiracy to rewrite history and tamper with the calendar. Unfortunately, most of the source material is in German, which I cannot read. But it would seem that cross-comparisons of astronomical and historical records should squash this theory pretty flat. In high school I tried to promulgate the rumor that John Entwistle and Keith Moon were so disgusted by the poor quality of the album Odds & Sods that they refused to pose for the cover photograph. The rest of the band responded by finding two approximate lookalikes to stand in for Moon and Enwistle, and by adopting a cover design calculated to obscure the impostors' faces as much as possible. This sort of thing was in some ways much harder to pull off successfully in 1985 than it is today. But if you have heard this story before, please forget it, because I made it up.
I would like to acknowledge the generous gift of Jack Kennedy. Thank you very much! This acknowledgement is not intended to be apropos of this blog post. I just decided I should start acknowledging gifts, and this happened to be the first post since I made the decision. [ Addendum 20090214: A similar equivocation of "your" is mentioned by Plato. ]
[Other articles in category /misc] permanent link Thu, 12 Feb 2009
More Uzi-clubbing: a counterexample
I ended the article by saying:
I had already realized that you could, in principle, commit this error with a regular array instead of with a hash, but I had never seen an example until...Just recently I saw another example, which I think is interesting because it seems to be a counterexample. It's part of a somewhat longer Java program. The crucial section is:
... LINE: while ( ( line = in.readLine()) != null ) { String[] fields = line.split("\t"); ... for ( int i = 0; i < fields.length; i++ ) { if ( ! isEmpty(fields[i]) ) { switch(i) { case 0: citation.setCitationType(fields[i]); break; case 1: setAuthors(citation,fields[i],personHome,false); break; case 2: citation.setPublishYear(Integer.parseInt(fields[i])); break; case 3: citation.setTitle(fields[i]); break; ... case 19: citation.setURL(fields[i]); break; case 20: citation.setDoi(fields[i]); break; default: warn("Empty field expected, found: " + fields[i] + " for line: " + line); break; } } } } ...The Perlishness of this Java code might lead you to think that I wrote it, but I did not. My temptation here was to replace the loop and the switch with code like this:
citation.setCitationType(fields[0]); setAuthors(citation,fields[1],personHome,false); citation.setPublishYear(Integer.parseInt(fields[2])); citation.setTitle(fields[3]); ... citation.setURL(fields[19]); citation.setDoi(fields[20]);We lost the warnings, but there were only 4 of those, so we can add them back explicitly:
if (! isEmpty(fields[13])) warn("Empty field expected...");This might have been an improvement, except that we also lost the isEmpty tests on the nonempty fields. To get them back we must spend at least all our gains, possibly more:
if (! isEmpty(fields[0])) citation.setCitationType(fields[0]); if (! isEmpty(fields[1])) setAuthors(citation,fields[1],personHome,false); if (! isEmpty(fields[2])) citation.setPublishYear(Integer.parseInt(fields[2])); if (! isEmpty(fields[3])) citation.setTitle(fields[3]); ... if (! isEmpty(fields[13])) warn("Empty field expected..."); ... if (! isEmpty(fields[19])) citation.setURL(fields[19]); if (! isEmpty(fields[20])) citation.setDoi(fields[20]);So at least in this case, my instinct to eliminate the loop-switch was not helpful. There are plenty of Java-esque techniques for cutting up the complexity and sweeping each little piece underneath its own little carpet ("Replace fields with an object! Or with a series of 20 objects!") but nothing that actually reduces the entia multiplicantis. There may be ways to easily improve this code, but I have not been able to think of any.
[Other articles in category /prog] permanent link Fri, 06 Feb 2009
Maybe energy is really real
I call it crackpottish, but I do think I made a reasonable case, and of the many replies I got to that article, I don't think anyone said conclusively that I was a complete jackass. (Of course, it might be that none of the people who really know wanted to argue with a crackpot.) I have thought about it a lot before and since; it continues to bother me. But I few months ago I did remember an argument that energy is a real thing. Specifically, I remembered Noether's theorem. Noether's theorem, if I understand correctly, claims that for every symmetry in the physical universe, there is a corresponding conservation law, and vice versa. For example, let's suppose that space itself is uniform. That is, let's suppose that the laws of physics are invariant under a change of position that is a translation. In this special case, Noether's theorem says that the laws must include conservation of momentum: conservation of momentum is mathematically equivalent to the claim that physics is invariant unter a translation transformation. Perhaps this is a good time to add that I do not (yet) understand Noether's theorem, that I am only parroting stuff that I have read elsewhere, and that my usual physics-related disclaimer applies: I understand just barely enough physics to spin a plausible-sounding line of bullshit. Anyway, going on with my plausible-sounding bullshit about Noether's theorem, invariance of the laws under a spatial rotation is equivalent to the law of conservation of angular momentum. Actually I think I might have remembered that one wrong. But the crucial one for me I am sure I am not remembering wrong: invariance of physical law under a translation in time rather than in space is equivalent to conservation of energy. Aha. If this is right, then perhaps there is a good basis for the concept of energy after all, because any physics that is time-invariant must have an equivalent concept of energy. Time-invariance might not be true, but I have no philosophical objection to it, nor do I claim that the notion is incoherent. So it seems that if I am to understand this properly, I need to understand Noether's theorem. Maybe I'll make that a resolution for 2009. First stop, Wikipedia, to find out what the prerequisites are.
[Other articles in category /physics] permanent link Thu, 29 Jan 2009
A simple trigonometric identity
Here I put A at (0,0), B at (4,0), and 1 at (2,2). This is clear enough, but I wished that it were more nearly equilateral. So that night as I was waiting to fall asleep, I thought about the problem of finding lattice points that are at the vertices of an equilateral triangle. This is a sort of two-dimensional variation on the problem of finding rational approximations to surds, which is a topic that has turned up here many times over the years. Or rather, I wanted to find lattice points that are almost at the vertices of an equilateral triangle, because I was pretty sure that there were no equilateral lattice triangles. But at the time I could not remember a proof. I started doing some calculations based on the law of cosines, which was a mistake, because nobody but John Von Neumann can do calculations like that in their head as they wait to fall asleep, and I am not John Von Neumann, in case you hadn't noticed. A simple proof that there are no equilateral lattice triangles has just now occurred to me, though, and I am really pleased with it, so we are about to have a digression. The area A of an equilateral triangle is s√3/2, where s is the length of the side. And s has the form √t because of the Pythagorean theorem, so A = √(3t)/2, where t is a sum of two squares, because the endpoints of the side are lattice points. By Pick's theorem, the area of any lattice triangle is a half-integer. So 3t is a perfect square, and thus there are an odd number of threes in t's prime factorization. But t is a sum of two squares, and by the sum of two squares theorem, its prime factorization must have an even number of threes. We now have a contradiction, so there was no such triangle. Wasn't that excellent? That is just the sort of thing that I could have thought up while waiting to fall asleep, so it proves even more conclusively that starting with the law of cosines was a mistake. Okay, end of digression. Back to the law of cosines. We have a triangle with sides a, b, and c, and opposite angles A, B, and C, and you no doubt recall from high school that c^{2} = a^{2} + b^{2} - 2ab cos C. We'll call this "law C". Before I fell alseep, it occurred to me that you could take the analogous law B, which is b^{2} = a^{2} + c^{2} - 2ac cos B, and substitute the right-hand side for the b^{2} term in law C. Then a bunch of stuff will cancel out and you should either get something interesting or something tautological. Von Neumann would have known right away which it was, but I needed paper. So today I got out the paper and did the thing, and came up with the very simple relation that:
c = a cos B + b cos AWhich holds in any triangle. But somehow I had never seen this before, or, if I had, I had completely forgotten it. The thing is so simple that I thought that it must be wrong, or I would have known it already. But no, it checked out for the easy cases (right triangles, equilateral triangles, trivial triangles) and the geometric proof is easy: Just drop a perpendicular from C. The foot of the perpendicular divides the base c into two segments, which, by the simplest possible trigonometry, have lengths a cos B and b cos A, respectively. QED. Perhaps that was anticlimactic. Have I mentioned that I have a sign on the door of my office that says "Penn Institute of Lower Mathematics"? This is the kind of thing I'm talking about. I will let you all know if I come up with anything about the almost-equilateral lattice triangles. Clearly, you can approximate the equilateral triangle as closely as you like by making the lattice coordinates sufficiently large, just as you can approximate √3 as closely as you like with rationals by making the numerator and denominator sufficiently large. Proof: Your computer draws equilateral-seeming triangles on the screen all the time. I note also that it is important that the lattice is two-dimensional. In three or more dimensions the triangle (1,0,0,0...), (0,1,0,0...), (0,0,1,0...) is a perfectly equilateral lattice triangle with side √2. [ Addendum 20090130: Vilhelm Sjöberg points out that the area of an equilateral triangle is s^{2}√3/4, not s√3/2. Whoops. This spoils my lovely proof, because the theorem now follows immediately from Pick's: s^{2} is an integer by Pythagoras, so the area is irrational rather than a half-integer as Pick's theorem requires. ] [ Addendum 20140403: As a practical matter, one can draw a good lattice approximation to an equilateral triangle by choosing a good rational approximation to !!\sqrt3!!, say !!\frac ab!!, and then drawing the points !!(0,0), (b,a),!! and !!(2b, 0)!!. The rational approximations to !!\sqrt3!! quickly produce triangles that are indistinguishable from equilateral. For example, the rational approximation !!\frac74!! gives the isosceles triangle with vertices !!(0,0), (4,7), (8,0)!! which has one side of length 8 and two sides of length !!\sqrt{65}\approx 8.06!!, an error of less than one percent. The next such approximation, !!\frac{26}{15}!!, gives a triangle that is correct to about 1 part in 1800. (For more about rational approximations to !!\sqrt3!!, see my article on Archimedes and the square root of 3.) ]
[Other articles in category /math] permanent link Tue, 27 Jan 2009
Amusements in Hyperspace
This evening I tried to imagine life in a 1000-dimensional universe. I didn't get too far, but what I did get seemed pretty interesting. What's it like? Well, it's very dark. Lamps wouldn't work very well, because if the illumination one foot from the source is I, then the illumination two feet from the source is I · 9.3·10^{-302}. Actually it's even worse than that; there's a double whammy. Suppose you had a cubical room ten feet across. If you thought it was hard to light up the dark corners of a big room in Boston in February, imagine how much worse it is in hyperspace where the corners are 158 feet away. There are some upsides, however. Rooms won't have to be ten feet on a side because everything will be smaller. You take up about 70,000 cubic centimeters of space; in hyperspace that is just not a lot of room, because a box barely more than a centimeter on a side takes up 70,000 hypercentimeters. In fact, a box barely more than a centimeter on a side can hold as much as you want; an 11 millimeter box already contains 2.5·10^{41} hypercentimeters. It's hard to put people in prison in hyperspace, because there are so many directions that you can go to get out. Flatland prison cells have four walls; ours have six, if you count the ceiling and the floor. Hyperspace prison cells have 2000 walls, and each one is very expensive to build. So that's hyperspace: Big, dark, and easy to get around. [ Addenda 20120510: An anonymous commenter on Colm Mulcahy's blog observed that "high dimension cubes are qualitatively more like hedgehogs than building blocks". And recently someone asked on stackexchange.math for "What are some examples of a mathematical result being counterintuitive?"; the top-scoring reply concerned the bizarre behavior of high-dimension cubes. ]
[Other articles in category /math] permanent link Sat, 24 Jan 2009
Higher-Order Perl: nonmemoizing streams
sub tail { my $s = shift; if (is_promise($s->[1])) { return $s->[1]->(); # Force promise } else { return $s->[1]; } }But this is soon replaced with a version that caches the value returned by the promise:
sub tail {
my $s = shift;
if (is_promise($s->[1])) {
$s->[1] = $s->[1]->(); # Force and save promise
}
return $s->[1];
}
The reason that I give for this in the book is a performance reason.
It's accompanied by an extremely bad explanation. But I couldn't do
any better at the time.There are much stronger reasons for the memoizing version, also much easier to explain. Why use streams at all instead of the iterators of chapter 4? The most important reason, which I omitted from the book, is that the streams are rewindable. With the chapter 4 iterators, once the data comes out, there is no easy way to get it back in. For example, suppose we want to process the next bit of data from the stream if there is a carrot coming up soon, and a different way if not. Consider:
# Chapter 4 iterators my $data = $iterator->(); if (carrot_coming_soon($iterator)) { # X } else { # Y } sub carrot_coming_soon { my $it = shift; my $soon = shift || 3; while ($soon-- > 0) { my $next = $it->(); return 1 if is_carrot($next); } return; # No carrot }Well, this probably doesn't work, because the carrot_coming_soon() function extracts and discards the upcoming data from the iterator, including the carrot itself, and now that data is lost. One can build a rewindable iterator:
sub make_rewindable { my $it = shift; my @saved; # upcoming values in LIFO order return sub { my $action = shift || "next"; if ($action eq "put back") { push @saved, @_; } elsif ($action eq "next") { if (@saved) { return pop @saved; } else { return $it->(); } } }; }But it's kind of a pain in the butt to use:
sub carrot_coming_soon { my $it = shift; my $soon = shift || 3; my @saved; my $saw_carrot; while ($soon-- > 0) { push @saved, $it->(); $saw_carrot = 1, last if is_carrot($saved[-1]); } $it->("put back", @saved); return $saw_carrot; }Because you have to explicitly restore the data you extracted. With the streams, it's all much easier:
sub carrot_coming_soon { my $s = shift; my $soon = shift || 3; while ($seen-- > 0) { return 1 if is_carrot($s->head); drop($s); } return; }The working version of carrot_coming_soon() for streams looks just like the non-working version for iterators. But this version of carrot_coming_soon() only works for memoizing streams, or for streams whose promise functions are pure. Let's consider a counterexample:
my $bad = filehandle_stream(\*DATA); sub filehandle_stream { my $fh = shift; return node(scalar <$fh>, promise { filehandle_stream($fh) }); } __DATA__ fish dog carrot goat rectumNow consider what happens if I do this:
$carrot_soon = carrot_coming_soon($bad); print "A carrot appears soon after item ", head($bad), "\n" if $carrot_soon;It says "A carrot appears soon after item fish". Fine. That's because $bad is a node whose head contains "fish". Now let's see what's after the fish:
print "After ", head($bad), " is ", head(tail($bad)), "\n";This should print After fish is dog, and for the memoizing streams I used in the book, it does. But a non-memoizing stream will print "After fish is goat rectum". Because tail($bad) invokes the promise function, which, since the next() was not saved after carrot_coming_soon() examined it, builds a new node, which reads the next item from the filehandle, which is "goat rectum". I wish I had explained the rewinding property of the streams in the book. It's one of the most significant omissions I know about. And I wish I'd appreciated sooner that the rewinding property only works if the tail() function autosaves the tail node returned from the promise.
[Other articles in category /prog/perl] permanent link Fri, 23 Jan 2009
Archimedes and the square root of 3, revisited
I pointed out that although the approximations seem to come out of thin air, a little thought reveals where they probably did come from; it's not very hard. Briefly, you tabulate a^{2} and 3b^{2}, and look for numbers from one column that are close to numbers from the other; see the previous article for details. But Dr. Chuck Lindsey, the author of a superb explanation of Archimedes' methods, and a professor at Florida Gulf Coast University, seemed mystified by the appearance of the fraction 265/153:
Throughout this proof, Archimedes uses several rational approximations to various square roots. Nowhere does he say how he got those approximations—they are simply stated without any explanation—so how he came up with some of these is anybody's guess.I left it there for a few years, but just recently I got puzzled email from a gentleman named Peter Nockolds. M. Nockolds was not puzzled by the 265/153. Rather, he wanted to know why so many noted historians of mathematics should be so puzzled by the 265/153. This was news to me. I did not know anyone else had been puzzled by the 265/153. I had assumed that nearly everyone else saw it the same way that M. Nockolds and I did. But M. Nockolds provided me with a link to an extensive discussion of the matter, which included quotations from several noted mathematicians and historians of mathematics:
It would seem...that [Archimedes] had some (at present unknown) method of extracting the square root of numbers approximately. ...the calculation [of π] starts from a greater and lesser limit to the value of √3, which Archimedes assumes without remark as known, namely 265/153 < √3 < 1351/780. How did Archimedes arrive at this particular approximation? No puzzle has exercised more fascination upon writers interested in the history of mathematics... The simplest supposition is certainly [the "Babylonian method"; see Kline below]. Another suggestion...is that the successive solutions in integers of the equations x^{2}-3y^{2}=1 and x^{2}-3y^{2}=-2 may have been found...in a similar way to...the Pythagoreans. The rest of the suggestions amount for the most part to the use of the method of continued fractions more or less disguised.Heath said "The simplest supposition is certainly ..." and then followed with the "Babylonian method", which is considerably more complicated than the extremely simple method I suggested in my earlier article. Morris Kline explains the Babylonian method:
He also obtained an excellent approximation to √3, namely 1351/780 > √3 > 265/153, but does not explain how he got this result. Among the many conjectures in the historical literature concerning its derivation the following is very plausible. Given a number A, if one writes it as a^{2} ± b where a^{2} is the rational square nearest to A, larger or smaller, and b is the remainder, then a ± b/2a > √A > a ± b/(2a±1). Several applications of this procedure do produce Archimedes' result.And finally:
Archimedes approximated √3 by the slightly smaller value 265/153... How he managed to extract his square roots with such accuracy...is one of the puzzles that this extraordinary man has bequeathed to us.Nockolds asked me "Have you had any feedback from historians of maths who explain why it wasn't so easy to arrive at 265/153 or even 1351/780? Have you any idea why they make such a big deal out of this?" No, I'm mystified. Even working with craptastic Greek numerals, it would not take Archimedes very long to tabulate kn^{2} far enough to discover that 3·780^{2} = 1351^{2} - 1. Or, if you don't like that theory, try this one: He tabulated n^{2} and 3n^{2} far enough to discover the following approximations:
I know someone wants to claim that this is nothing more than the Babylonian method. But this is missing an important point. Although this sort of numeric tinkering might well lead you to discover the Babylonian method, especially if you were Archimedes, it is not the Babylonian method, and it can be done in complete ignorance of the Babylonian method. But it yields the required approximations anyway. So I will echo Nockolds' puzzlement here. There are a lot of things that Archimedes did that were complex and puzzling, but this is not one of them. You do not need sophisticated algebraic technique to find approximations to surds. You only need to do (at most) a few hours of integer calculation. The puzzle is why people like Rouse Ball and Heath think it is puzzling. There's an explanation I'm groping for but can't quite articulate, but which goes something like this: Perhaps mathematicians of the late Victorian age lent too much weight to theory and analysis, and not enough to heuristic and simple technique. As a lifelong computer programmer, I have a great appreciation for what can be accomplished by just grinding out the numbers. See my anecdote about the square root algorithm used by the ENIAC, for example. I guessed then that perhaps computer science professors know more about mathematics than I expect, but less about computation. I can imagine the same thing of Victorian mathematicians—but not of Archimedes. One thing you often hear about pre-19th-century mathematicians is that they were great calculators. I wonder if appreciation of simple arithmetic technique might not have been sometimes lost to the mathematicans from the very end of the pre-computation age, say 1880–1940. Then again, perhaps I'm not giving them enough credit. Maybe there's something going on that I missed. I haven't checked the original sources to see what they actually say, so who knows? Perhaps Heath discusses the technique I suggested, and then rejects it for some fascinating reason that I, not being an expert in Greek mathematics, can't imagine. If I find out anything else, I will report further.
[Other articles in category /math] permanent link Wed, 21 Jan 2009
The loophole in the U.S. Constitution: recent developments
Recently Jeffrey Kegler wrote to inform me of some startling new developments on this matter. Although it previously appeared that the story was probably true, there was no firsthand evidence that it had actually occurred. The three witnesses would have been Philip Forman (the examining judge), Oskar Morgenstern and Albert Einstein. But, although Morgenstern apparently wrote up an account of the epsiode, it was lost. Until now, that is. The Institute for Advanced Study (where Gödel, Einstein, and Morgenstern were all employed) posted an account on its web site, and M. Kegler was perceptive enough to realize that this account was probably written by someone who had access to the lost Morgenstern document but did not realize its significance. M. Kegler followed up the lead, and it turned out to be correct. Kegler's Morgenstern website has a lot of additional detail, including the original document itself.
Now came an exciting development. [Gödel] rather excitedly told me that in looking at the Constitution, to his distress, he had found some inner contradictions, and he could show how in a perfectly legal manner it would be possible for somebody to become a dictator and set up a Fascist regime, never intended by those who drew up the Constitution.But before I let you get too excited about this, a warning: Morgenstern doesn't tell us what Gödel's loophole was! (Kegler's reading is that Morgenstern didn't care.) So although the truth of story has finally been proved beyond doubt, the central mystery remains. The document is worth reading anyway. It's only three pages long, and it paints a fascinating picture of both Gödel, who is exactly the sort of obsessive geek that you always imagined he was, and of Einstein, who had a cruel streak that he was careful not to show to the public. Kegler's website is also worth reading for its insightful analysis of the lost document and its story.
[Other articles in category /law] permanent link Tue, 20 Jan 2009
Triples and Closure
In topology, we have the idea of a "closure" of a set, which is essentially the union of the set with its boundary. For example, consider an open disk D, say the set of all points less than one mile from my house. The boundary of this set is a circle with radius one mile, centered at my house. The closure of D is the union of D with its boundary, and so is closed disk consisting of all points less than or equal to one mile from my house. Representing the closure of a set S as C(S), we have the obvious theorem that S ⊂ C(S), because the closure includes everything in S, plus the boundary. Another easy, but not quite obvious theorem is that C(C(S)) ⊂ C(S). This says that once you take the closure, you have included the boundary, and you do not get any more boundary by taking the closure again. The closure of a set is "closed"; the closure of a "closed" set C is just C. A third fundamental theorem about closures is that A⊂ B → C(A) ⊂ C(B). Now we turn to monads. A monad is first of all a functor, which, if you restrict your attention to programming languages, means that a monad is a type constructor M with an associated function fmap such that for any function f of type α → β, fmap f has type M α → M β. But a monad is also equipped with two other functions. There is a return function, which has type α → M α, and a join function, which has type M M α → M α. Haskell provides monads with a "bind" function, written >>=, which is interdefinable with join:
join x = x >>= id a >>= b = join (fmap b a)but we are going to forget about >>= for now. So the monad is equipped with three fundamental operations: fmap :: (a → b) → (M a → M b) join :: M M a → M a return :: a → M aThe three basic theorems about topological closures are:
(A ⊂ B) → (C(A) ⊂ C(B)If we imagine that ⊂ is a special kind of implication, the similarity with the monad laws is clear. And ⊂ is a special kind of implication, since (A ⊂ B) is just an abbreviation for (x ∈ A → x ∈ B). If we name the three closure theorems "fmap", "join", and "return", we might guess that "bind" also turns out to be a theorem. And it is, because >>= has the type M a → (a → M b) → M b. The corresponding theorem is: x ∈ C(A) → (A ⊂ C(B)) → x ∈ C(B)If the truth of this is hard to see, it is partly because the implications are in an unnatural order. The theorem is stated in the form P → Q → R, but it would be easier to understand as the equivalent Q → P → R:
A ⊂ C(B) → x ∈ C(A) → x ∈ C(B)Or more briefly:
A ⊂ C(B) → C(A) ⊂ C(B)This is quite true. We can prove it from the other three theorems as follows. Suppose A ⊂ C(B). Then by "fmap", C(A) ⊂ C(C(B)). By "join", C(C(B)) ⊂ C(B). By transitivity of ⊂, C(A) ⊂ C(B). This is what we wanted. Haskell defines a =<< operator which is the same as >>= except with the arguments forwards instead of backwards:
=<< :: (a → M b) → M a → M b a =<< b = b >>= aThe type of this function is analogous to the bind theorem, and I have seen claims in the literature that the argument order is in some ways more natural. Where the >>= function takes a value first, and then feeds it to a given function, the =<< function makes more sense as a curried function, taking a function of type a → M b and yielding the corresponding function of type M a → M b. I think it's also worth noticing that the structure of the proof of the bind theorem (invoke "fmap" and then "join") is exactly the same as the structure of the code that defines "bind". We can go the other way also, and prove the "join" theorem from the "bind" theorem. The definition of join in terms of >>= is:
join a = a >>= idFollowing the program again, id in the program code corresponds to the theorem that B ⊂ B for any B. A special case of this theorem is that C(B) ⊂ C(B) for any B. Then in the "bind" theorem: A ⊂ C(B) → C(A) ⊂ C(B)take A = C(B):
C(B) ⊂ C(B) → C(C(B)) ⊂ C(B)The left side of the implication is satisfied, so we conclude the consequent, C(C(B)) ⊂ C(B), which is what we wanted. But wait, monad operations are also required to satisfy some monad laws. For example, join (return x) = x. How does this work out in topological closure world? In programming language world, x here is required to have monad type. Monad types correspond to closed sets, so this is a theorem about closed sets. The theorem says that if X is a closed set, then the closure of X is the same as x. This is true. The identity between these two things can be found in (surprise) category theory. In category theory, a monad is a (categorial) functor equipped with two natural transformations, the "return" and "join" operations. The categorial version of a closure operator is essentially the same. Closure operations have a natural opposite. In topology, it is the "interior of" operation. The interior of a set is what you get if you discard the boundary of the set. The interior of a closed disc is an open disc; the interior of an open disc is the same open disc. Interior operations satisfy laws analogous but opposite to those enjoyed by closures:
Notice that the third theorem does not get turned around. I think this is because it comes from the functor itself, which goes the same way, not from the natural transformations, which go the other way. But I have not finished thinking abhout it carefully yet. Sooner or later I am going to program in Haskell with comonads, and it gives me a comfortable feeling to know that I am pre-equipped with a way to understand them as interior operations. I have an idea that the power of mathematics comes principally from the places where it succeeds in understanding two different things as aspects of the same thing. For example, why is group theory so useful? Because it understands transformations of objects (say, rotations of a polyhedron) and algebraic operations as essentially the same thing. If you have a hard problem about one, you can often make it into an easier problem about the other one. Similarly analytic geometry transforms numerical problems into geometric problems and back again. Most often the geometry is harder than the numerical problem, and you use it in that direction, but often you go in the other direction instead. It is quite possible that this notion is too vague to qualify as an actual theory. But category theory fits the description. Category theory lets you say that types are objects, type constructors are functors, and polymorphic functions are natural transformations. Then you can understand natural transformations as structure-preserving maps of something or other and get some insight into polymorphic functions, or vice-versa. Category theory is a large agglomeration of such identities. Lambek and Scott's book starts with several slogans about category theory. One of these is that many objects of interest to mathematicians form categories, such as the category of sets. Another is that many objects of interest to mathematicians are categories. (For example, each set is a discrete category.) So one of the reasons category theory is so extremely useful is that it sets up these multiple entities as different aspects of the same thing. I went to lunch and found more to say on the subject, but it will have to wait until another time.
[Other articles in category /math] permanent link Wed, 07 Jan 2009
Addenda to recent articles 200811
[Other articles in category /addenda] permanent link |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||