The Universe of Discourse
           
Fri, 27 Aug 2010

A dummy generator for mock objects
I am not sure how useful this actually is, but I after having used it once it was not yet obvious that it was a bad idea, so I am writing it up here.

Suppose you are debugging some method, say someMethod, which accepts as one of its arguments complicated, annoying objects $annoying that you either can't or don't want to instantiate. This might be because $annoying is very complicated, with many sub-objects to set up, or perhaps you simply don't know how to build $annoying and don't care to find out.

That is okay, because you can get someMethod to run without the full behavior of $annoying. Say for example someMethod calls $annoying->foo_manager->get_foo(...)->get_user_id. You don't understand or care about the details because for debugging someMethod it is enough to suppose that the end result is the user ID 3. You could supply a mock object, or several, that implement the various methods, but that requires some work up front.

Instead, use this canned Dummy class. Instead of instantiating a real $annoying (which is difficult) or using a bespoke mock object, use Dummy->new("annoying"):

        package Dummy;
        use Data::Dumper;
        $Data::Dumper::Terse = 1;
        our $METHOD;

        my @names = qw(bottle corncob euphonium octopus potato slide);
        my $NAME = "aaa";

        sub new {
          my ($class, $name) = @_;
          $name ||= $METHOD || shift(@names) || $NAME++;
          bless { N => $name } => $class;
        }
The call Dummy->new("annoying") will generate an ad-hoc mock object; whenever any method is called on this dummy object, the call will be caught by an AUTOLOAD that will prompt you for the return value you want it to produce:

        sub AUTOLOAD {
          my ($self, @args) = @_;
          my ($p, $m) = $AUTOLOAD =~ /(.*)::(.*)/;
          local $METHOD = $m;
          print STDERR "<< $_[0]{N}\->$m >>\n";
          print STDERR "Arguments: " . Dumper(\@args) . "\n";
          my $v;
          do {
            print STDERR "Value?  ";
            chomp($v = <STDIN>);
          } until eval "$v; 1";
          return(eval $v);
        }

        sub DESTROY { }

        1;
The prompt looks like this:

  << annoying->foo_manager >>
  Arguments: []
  Value? 
If the returned value should be a sub-object, no problem: just put in new Dummy and it will make a new Dummy object named foo_manager, and the next prompt will be:

  << foo_manager->get_foo >>
  Arguments: ...
  ...
  Value? 
Now you can put in new Dummy "(Fred's foo)" or whatever. Eventually it will ask you for a value for (Fred's foo)->id and you can have it return 4.

It's tempting to add caching, so that it won't ask you twice for the results of the same method call. But that would foreclose the option to have the call return different results twice. Better, I think, is for the user to cache the results themselves if they plan to use them again; there is nothing stopping the user from entering a value expression like $::val = ....

This may turn out to be one of those things that is mildly useful, but not useful enough to actually use; we'll see.


[Other articles in category /prog/perl] permanent link

Thu, 26 Aug 2010

Monad terminology problem
I think one problem (of many) that beginners might have with Haskell monads is the confusing terminology. The word "monad" can refer to four related but different things:

  1. The Monad typeclass itself.

  2. When a type constructor T of kind ∗ → ∗ is an instance of Monad we say that T "is a monad". For example, "Tree is a monad"; "((→) a) is a monad". This is the only usage that is strictly corrrect.

  3. Types resulting from the application of monadic type constructors (#2) are sometimes referred to as monads. For example, "[Integer] is a monad".

  4. Individual values of monadic types (#3) are often referred to as monads. For example, the "All About Monads" tutorial says "A list is also a monad".

Usage #1 is not a real problem; it does not occur that often, and is readily distinguished by context, capitalization, type font, and other markers. #2 is actually correct, so there is no problem there. #3 seems to be an uncommon colloquialism.

The most serious problem here is #4, that people refer to individual values of monadic types as "monads". Even when they don't do this, they are hampered by the lack of a good term for it. As I know no good alternative has been proposed. People often say "monadic value" (I think), which is accurate, but something of a mouthful.

One thing I have discovered in my writing life is that the clarity of a confusing document can sometimes be improved merely by replacing a polysyllabic noun phrase with a monosyllable. For example, chapter 3 of Higher-Order Perl discussed the technique of memoizing a function by generating an anonymous replacement for it that maintains a cache and calls the real function on a cache miss. Early drafts were hard to understand, and improved greatly when I replaced the phrase "anonymous replacement function" with "stub". The Perl documentation was significantly improved merely by replacing "associative array" everywhere with "hash" and "funny punctuation character" with "sigil".

I think a monosyllabic replacement for "monadic value" would be a similar boon to discussion of monads, not just for beginners but for everyone else too. The drawback, of introducing yet another jargon term, would in this case be outweighed by the benefits. Jargon can obscure, but sometimes it can clarify.

The replacement word should be euphonious, clear but not overly specific, and not easily confused with similar jargon words. It would probably be good for it to begin with the letter "m". I suggest:

mote

So return takes a value and returns a mote. The >>= function similarly lifts a function on pure values to a function on motes; when the mote is a container one may think of >>= as applying the function to the values in the container. [] is a monad, so lists are motes. The expression on the right-hand side of a var ← expr in a do-block must have mote type; it binds the mote on the right to the name on the left, using the >>= operator.

I have been using this term privately for several months, and it has been a small but noticeable success. Writing and debugging monadic programs is easier because I have a simple name for the motes that the program manipulates, which I can use when I mumble to myself: "What is the type error here? Oh, commit should be returning a mote." And then I insert return in the right place.

I'm don't want to oversell the importance of this invention. But there is clearly a gap in the current terminology, and I think it is well-filled by "mote".

(While this article was in progress I discovered that What a Monad is not uses the nonceword "mobit". I still prefer "mote".)


[Other articles in category /prog/haskell] permanent link

Sun, 03 Jan 2010

A short bibliography of probability monads
Several people helpfully wrote to me to provide references to earlier work on probability distribution monads. Here is a summary:

My thanks to Stephen Tetley, Gaal Yahas, and Luke Palmer for these.

I did not imagine that my idea was a new one. I arrived at it by thinking about List as a representation of non-deterministic computation. But if you think of it that way, the natural interpretation is that every list element represents an equally likely outcome, and so annotating the list elements with probabilities is the obvious next step. So the existence of the Erwig library was not a big surprise.

A little more surprising though, were the references in the Erwig paper. Specifically, the idea dates back to at least 1981; Erwig cites a paper that describes the probability monad in a pure-mathematics context.

Nobody responded to my taunting complaint about Haskell's failure to provide support a good monad of sets. It may be that this is because they all agree with me. (For example, the documentation of the Erwig package says "Unfortunately we cannot use a more efficient data structure because the key type must be of class Ord, but the Monad class does not allow constraints for result types.") But a number of years ago I said that the C++ macro processor blows goat dick. I would not have put it so strongly had I not naïvely believed that this was a universally-held opinion. But no, plenty of hapless C++ programmers wrote me indignant messages defending their macro system. So my being right is no guarantee that language partisans will not dispute with me, and the Haskell community's failure to do so in this case reflects well on them, I think.


[Other articles in category /prog/haskell] permanent link

Thu, 31 Dec 2009

A monad for probability and provenance
I don't quite remember how I arrived at this, but it occurred to me last week that probability distributions form a monad. This is the first time I've invented a new monad that I hadn't seen before; then I implemented it and it behaved pretty much the way I thought it would. So I feel like I've finally arrived, monadwise.

Suppose a monad value represents all the possible outcomes of an event, each with a probability of occurrence. For concreteness, let's suppose all our probability distributions are discrete. Then we might have:

	data ProbDist p a = ProbDist [(a,p)] deriving (Eq, Show)
	unpd (ProbDist ps) = ps
Each a is an outcome, and each p is the probability of that outcome occurring. For example, biased and unbiased coins:

    unbiasedCoin = ProbDist [ ("heads", 0.5),
                              ("tails", 0.5) ];

    biasedCoin   = ProbDist [ ("heads", 0.6),
                              ("tails", 0.4) ];

Or a couple of simple functions for making dice:

    import Data.Ratio

    d sides = ProbDist [(i, 1 % sides) | i <- [1 .. sides]]
    die = d 6

d n is an n-sided die.

The Functor instance is straightforward:

    instance Functor (ProbDist p) where
      fmap f (ProbDist pas) = ProbDist $ map (\(a,p) -> (f a, p)) pas
The Monad instance requires return and >>=. The return function merely takes an event and turns it into a distribution where that event occurs with probability 1. I find join easier to think about than >>=. The join function takes a nested distribution, where each outcome of the outer distribution specifies an inner distribution for the actual events, and collapses it into a regular, overall distribution. For example, suppose you put a biased coin and an unbiased coin in a bag, then pull one out and flip it:

	  bag :: ProbDist Double (ProbDist Double String)
	  bag = ProbDist [ (biasedCoin, 0.5),
                           (unbiasedCoin, 0.5) ]
The join operator collapses this into a single ProbDist Double String:

	ProbDist [("heads",0.3),
                  ("tails",0.2),
                  ("heads",0.25),
                  ("tails",0.25)]
It would be nice if join could combine the duplicate heads into a single ("heads", 0.55) entry. But that would force an Eq a constraint on the event type, which isn't allowed, because (>>=) must work for all data types, not just for instances of Eq. This is a problem with Haskell, not with the monad itself. It's the same problem that prevents one from making a good set monad in Haskell, even though categorially sets are a perfectly good monad. (The return function constructs singletons, and the join function is simply set union.) Maybe in the next language.

Perhaps someone else will find the >>= operator easier to understand than join? I don't know. Anyway, it's simple enough to derive once you understand join; here's the code:

	instance (Num p) => Monad (ProbDist p) where
	  return a = ProbDist [(a, 1)]
	  (ProbDist pas) >>= f = ProbDist $ do
				   (a, p) <- pas
				   let (ProbDist pbs) = f a
				   (b, q) <- pbs
				   return (b, p*q)
So now we can do some straightforward experiments:

	liftM2 (+) (d 6) (d 6)

	ProbDist [(2,1 % 36),(3,1 % 36),(4,1 % 36),(5,1 % 36),(6,1 %
	36),(7,1 % 36),(3,1 % 36),(4,1 % 36),(5,1 % 36),(6,1 %
	36),(7,1 % 36),(8,1 % 36),(4,1 % 36),(5,1 % 36),(6,1 %
	36),(7,1 % 36),(8,1 % 36),(9,1 % 36),(5,1 % 36),(6,1 %
	36),(7,1 % 36),(8,1 % 36),(9,1 % 36),(10,1 % 36),(6,1 %
	36),(7,1 % 36),(8,1 % 36),(9,1 % 36),(10,1 % 36),(11,1 %
	36),(7,1 % 36),(8,1 % 36),(9,1 % 36),(10,1 % 36),(11,1 %
	36),(12,1 % 36)]
This is nasty-looking; we really need to merge the multiple listings of the same event. Here is a function to do that:

        agglomerate :: (Num p, Eq b) => (a -> b) -> ProbDist p a -> ProbDist p b
        agglomerate f pd = ProbDist $ foldr insert [] (unpd (fmap f pd)) where
          insert (k, p) [] = [(k, p)]
          insert (k, p) ((k', p'):kps) | k == k' = (k, p+p'):kps
                                       | otherwise = (k', p'):(insert (k,p) kps)


        agg :: (Num p, Eq a) => ProbDist p a -> ProbDist p a
        agg = agglomerate id
Then agg $ liftM2 (+) (d 6) (d 6) produces:

        ProbDist [(12,1 % 36),(11,1 % 18),(10,1 % 12),(9,1 % 9),
                  (8,5 % 36),(7,1 % 6),(6,5 % 36),(5,1 % 9),
                  (4,1 % 12),(3,1 % 18),(2,1 % 36)]
Hey, that's correct.

There must be a shorter way to write insert. It really bothers me, because it looks look it should be possible to do it as a fold. But I couldn't make it look any better.

You are not limited to calculating probabilities. The monad actually will count things. For example, let us throw three dice and count how many ways there are to throw various numbers of sixes:

        eq6 n = if n == 6 then 1 else 0
        agg $ liftM3 (\a b c -> eq6 a + eq6 b + eq6 c) die die die

      ProbDist [(3,1),(2,15),(1,75),(0,125)]
There is one way to throw three sixes, 15 ways to throw two sixes, 75 ways to throw one six, and 125 ways to throw no sixes. So ProbDist is a misnomer.

It's easy to convert counts to probabilities:

	probMap :: (p -> q) -> ProbDist p a -> ProbDist q a
	probMap f (ProbDist pds) = ProbDist $ (map (\(a,p) -> (a, f p))) pds

	normalize :: (Fractional p) => ProbDist p a -> ProbDist p a
	normalize pd@(ProbDist pas) = probMap (/ total) pd where
	    total = sum . (map snd) $ pas

        normalize $ agg $ probMap toRational $ 
               liftM3 (\a b c -> eq6 a + eq6 b + eq6 c) die die die

      ProbDist [(3,1 % 216),(2,5 % 72),(1,25 % 72),(0,125 % 216)]
I think this is the first time I've gotten to write die die die in a computer program.

The do notation is very nice. Here we calculate the distribution where we roll four dice and discard the smallest:

        stat = do
                 a <- d 6
                 b <- d 6
                 c <- d 6
                 d <- d 6
                 return (a+b+c+d - minimum [a,b,c,d])

        probMap fromRational $ agg stat

	ProbDist [(18,1.6203703703703703e-2),
                  (17,4.1666666666666664e-2), (16,7.253086419753087e-2),
                  (15,0.10108024691358025),   (14,0.12345679012345678),
                  (13,0.13271604938271606),   (12,0.12885802469135801),
                  (11,0.11419753086419752),   (10,9.41358024691358e-2),
                   (9,7.021604938271606e-2),   (8,4.7839506172839504e-2),
                   (7,2.9320987654320986e-2),  (6,1.6203703703703703e-2),
                   (5,7.716049382716049e-3),   (4,3.0864197530864196e-3),
                   (3,7.716049382716049e-4)]

One thing I was hoping to get didn't work out. I had this idea that I'd be able to calculate the outcome of a game of craps like this:

	dice = liftM2 (+) (d 6) (d 6)

	point n = do
	  roll <- dice
	  case roll of 7 -> return "lose"
                       _ | roll == n  = "win"
                       _ | otherwise  = point n

        craps = do
          roll <- dice
          case roll of 2 -> return "lose"
                       3 -> return "lose"
                       4 -> point 4
                       5 -> point 5
                       6 -> point 6
                       7 -> return "win"
                       8 -> point 8
                       9 -> point 9
                       10 -> point 10
                       11 -> return "win"
                       12 -> return "lose"
This doesn't work at all; point is an infinite loop because the first value of dice, namely 2, causes a recursive call. I might be able to do something about this, but I'll have to think about it more.

It also occurred to me that the use of * in the definition of >>= / join could be generalized. A couple of years back I mentioned a paper of Green, Karvounarakis, and Tannen that discusses "provenance semirings". The idea is that each item in a database is annotated with some "provenance" information about why it is there, and you want to calculate the provenance for items in tables that are computed from table joins. My earlier explanation is here.

One special case of provenance information is that the provenances are probabilities that the database information is correct, and then the probabilities are calculated correctly for the joins, by multiplication and addition of probabilities. But in the general case the provenances are opaque symbols, and the multiplication and addition construct regular expressions over these symbols. One could generalize ProbDist similarly, and the ProbDist monad (even more of a misnomer this time) would calculate the provenance automatically. It occurs to me now that there's probably a natural way to view a database table join as a sort of Kleisli composition, but this article has gone on too long already.

Happy new year, everyone.

[ Addendum 20100103: unsurprisingly, this is not a new idea. Several readers wrote in with references to previous discussion of this monad, and related monads. It turns out that the idea goes back at least to 1981. ]


My thanks to Graham Hunter for his donation.


[Other articles in category /prog/haskell] permanent link

Tue, 15 Dec 2009

Monads are like burritos
A few months ago Brent Yorgey complained about a certain class of tutorials which present monads by explaining how monads are like burritos.

At first I thought the choice of burritos was only a facetious reference to the peculiar and sometimes strained analogies these tutorials make. But then I realized that monads are like burritos.

I will explain.

A monad is a special kind of a functor. A functor F takes each type T and maps it to a new type FT. A burrito is like a functor: it takes a type, like meat or beans, and turns it into a new type, like beef burrito or bean burrito.

A functor must also be equipped with a map function that lifts functions over the original type into functions over the new type. For example, you can add chopped jalapeños or shredded cheese to any type, like meat or beans; the lifted version of this function adds chopped jalapeños or shredded cheese to the corresponding burrito.

A monad must also possess a unit function that takes a regular value, such as a particular batch of meat, and turns it into a burrito. The unit function for burritos is obviously a tortilla.

Finally, a monad must possess a join function that takes a ridiculous burrito of burritos and turns them into a regular burrito. Here the obvious join function is to remove the outer tortilla, then unwrap the inner burritos and transfer their fillings into the outer tortilla, and throw away the inner wrappings.

The map, join, and unit functions must satisfy certain laws. For example, if B is already a burrito, and not merely a filling for a burrito, then join(unit(B)) must be the same as B. This means says that if you have a burrito, and you wrap it in a second tortilla, and then unwrap the contents into the outer tortilla, the result is the same as what you started with.

This is true because tortillas are indistinguishable.

I know you are going to point out that some tortillas have the face of Jesus. But those have been toasted, and so are unsuitable for burrito-making, and do not concern us here.

So monads are indeed like burritos.

I asked Brent if this was actually what he had in mind when he first suggested the idea of tutorials explaining monads in terms of burritos, and if everyone else had understood this right away.

But he said no, I was the lone genius.


[Other articles in category /prog] permanent link

Mon, 14 Dec 2009

Another short explanation of Gödel's theorem
In yesterday's article, I said:

A while back I started writing up an article titled "World's shortest explanation of Gödel's theorem". But I didn't finish it...

I went and had a look to see what was wrong with it, and to my surprise, there seemed to be hardly anything wrong with it. Perhaps I just forgot to post it. So if you disliked yesterday's brief explanation of Gödel's theorem—and many people did—you'll probably dislike this one even more. Enjoy!


A reader wrote to question my characterization of Gödel's theorem in the previous article. But I think I characterized it correctly; I said:

The only systems of mathematical axioms strong enough to prove all true statements of arithmetic, are those that are so strong that they also prove all the false statements of arithmetic.
I'm going to explain how this works.

You start by choosing some system of mathematics that has some machinery in it for making statements about things like numbers and for constructing proofs of theorems like 1+1=2. Many such systems exist. Let's call the one we have chosen M, for "mathematics".

Gödel shows that if M has enough mathematical machinery in it to actually do arithmetic, then it is possible to produce a statement S whose meaning is essentially "Statement S cannot be proved in system M."

It is not at all obvious that this is possible, or how it can be done, and I am not going to get into the details here. Gödel's contribution was seeing that it was possible to do this.

So here's S again:

S: Statement S cannot be proved in system M.

Now there are two possibilities. Either S is in fact provable in system M, or it is not. One of these must hold.

If S is provable in system M, then it is false, and so it is a false statement that can be proved in system M. M therefore proves some false statements of arithmetic.

If S is not provable in system M, then it is true, and so it is a true statement that cannot be proved in system M. M therefore fails to prove some true statements of arithmetic.

So something goes wrong with M: either it fails to prove some true statements, or else it succeeds in proving some false statements.

List of topics I deliberately omitted from this article, that mathematicians should not write to me about with corrections: Presburger arithmetic. Dialetheism. Inexhaustibility. ω-incompleteness. Non-RE sets of axioms.


Order
Godel's Theorem: An Incomplete Guide to Its Use and Abuse
Godel's Theorem: An Incomplete Guide to Its Use and Abuse
with kickback
no kickback
Order
Inexhaustibility: A Non-Exhaustive Treatment
Inexhaustibility: A Non-Exhaustive Treatment
with kickback
no kickback
Well, I see now that left out the step where I go from "M proves a false statement" to "M proves all false statements". Oh well, another topic for another post.

If you liked this post, you may enjoy Torkel Franzén's books Godel's Theorem: An Incomplete Guide to Its Use and Abuse and Inexhaustibility: A Non-Exhaustive Treatment. If you disliked this post, you are even more likely to enjoy them.



Many thanks to Robert Bond for his contribution.


[Other articles in category /math] permanent link

Sun, 13 Dec 2009

World's shortest explanation of Gödel's theorem
A while back I started writing up an article titled "World's shortest explanation of Gödel's theorem". But I didn't finish it, and later I encountered Raymond Smullyan's version, which is much shorter anyway. So here, shamelessly stolen from Smullyan, is the World's shortest explanation of Gödel's theorem.

We have some sort of machine that prints out statements in some sort of language. It needn't be a statement-printing machine exactly; it could be some sort of technique for taking statements and deciding if they are true. But let's think of it as a machine that prints out statements.

In particular, some of the statements that the machine might (or might not) print look like these:

P*x (which means that the machine will print x)
NP*x (which means that the machine will never print x)
PR*x (which means that the machine will print xx)
NPR*x (which means that the machine will never print xx)

For example, NPR*FOO means that the machine will never print FOOFOO. NP*FOOFOO means the same thing. So far, so good.

Now, let's consider the statement NPR*NPR*. This statement asserts that the machine will never print NPR*NPR*.

Either the machine prints NPR*NPR*, or it never prints NPR*NPR*.

If the machine prints NPR*NPR*, it has printed a false statement. But if the machine never prints NPR*NPR*, then NPR*NPR* is a true statement that the machine never prints.

So either the machine sometimes prints false statements, or there are true statements that it never prints.

So any machine that prints only true statements must fail to print some true statements.

Or conversely, any machine that prints every possible true statement must print some false statements too.


Order
5000 B.C. and Other Philosophical Fantasies
5000 B.C. and Other Philosophical Fantasies
with kickback
no kickback
The proof of Gödel's theorem shows that there are statements of pure arithmetic that essentially express NPR*NPR*; the trick is to find some way to express NPR*NPR* as a statement about arithmetic, and most of the technical details (and cleverness!) of Gödel's theorem are concerned with this trick. But once the trick is done, the argument can be applied to any machine or other method for producing statements about arithmetic.

The conclusion then translates directly: any machine or method that produces statements about arithmetic either sometimes produces false statements, or else there are true statements about arithmetic that it never produces. Because if it produces something like NPR*NPR* then it is wrong, but if it fails to produce NPR*NPR*, then that is a true statement that it has failed to produce.

So any machine or other method that produces only true statements about arithmetic must fail to produce some true statements.

Hope this helps!

(This explanation appears in Smullyan's books 5000 BC and Other Philosophical Fantasies, chapter 3, section 65 (which is where I saw it) and also in The Mystery of Scheherezade.)



I gratefully acknowledge Charles Colht for his generous donation to this blog.


[ Addendum 20091214: Another article on the same topic. ]


[Other articles in category /math] permanent link

Fri, 11 Dec 2009

On failing open
An axiom of security analysis is that nearly all security mechanisms must fail closed. What this means is that if there is an uncertainty about whether to grant or to deny access, the right choice is nearly always to deny access.

For example, consider a login component that accepts a username and a password and then queries a remote authentication server to find out if the password is correct. If the connection to the authentication server fails, or if the authentication server is down, the login component must decide whether to grant or deny access, in the absence of correct information from the server. The correct design is almost certainly to "fail closed", and to deny access.

I used to teach security classes, and I would point out that programs sometimes have bugs, and do the wrong thing. If your program has failed closed, and if this is a bug, then you have an irate user. The user might call you up and chew you out, or might chew you out to your boss, and they might even miss a crucial deadline because your software denied them access when it should have granted access. But these are relatively small problems. If your program has failed open, and if this is a bug, then the user might abscond with the entire payroll and flee to Brazil.

(I was once teaching one of these classes in Lisbon, and I reached the "flee to Brazil" example without having realized ahead of time that this had greater potential to offend the Portuguese than many other people. So I apologized. But my hosts very kindly told me that they would have put it the same way, and that in fact the Mayor of Lisbon had done precisely what I described a few years before. The moral of the story is to read over the slides ahead of time before giving the talk.)

But I digress. One can find many examples in the history of security that failed the wrong way.

However, the issue is on my mind because I was at a job interview a few weeks ago with giant media corporation XYZ. At the interview, we spent about an hour talking about an architectural problem they were trying to solve. XYZ operates a web site where people can watch movies and TV programs online. Thy would like to extend the service so that people who subscribe to premium cable services, such as HBO, can authenticate themselves to the web site and watch HBO programs there; HBO non-subscribers should get only free TV content. The problem in this case was that the authentication data was held on an underpowered legacy system that could serve only a small fraction of the requests that came in.

The solution was to cache the authentication data on a better system, and gather and merge change information from the slow legacy system as possible.

I observed during the discussion that this was a striking example of the rare situation in which one wants the authentication system to fail open instead of closed. For suppose one grants access that should not be granted. Then someone on the Internet gets to watch a movie or an episode of The Sopranos for free, which is not worth getting excited about and which happens a zillion times a day anyhow.

But suppose the software denies access that should have been granted. Then there is a legitimate paying customer who has paid to watch The Sopranos, and we told them no. Now they are a legitimately irate customer, which is bad, and they may call the support desk, costing XYZ Corp a significant amount of money, which is also bad. So all other things being equal, one should err on the side of lenity: when in doubt, grant access.


I would like to thank Andrew Lenards for his gift.


[Other articles in category /tech] permanent link

Wed, 09 Sep 2009

You think you're All That, but you're not!
I have long been interested in term rewriting systems, and one of my long-term goals is to implement the Knuth-Bendix completion algorithm, described by Knuth and Bendix in their famous paper "Word Problems in Universal Algebras". This paper grabbed my attention around 1988; I found it in an anthology edited by John Leech (of Leech lattice fame) that I was probably looking into because it also contained an enumeration by J.H. Conway of all knots with at most eleven crossings. I found the Knuth-Bendix paper very hard to read, but the examples at the end were extremely compelling. I still find the paper very hard to read, but fortunately better explanations are now available. (For example, this one by A.J.J. Dick.) One of the also-ran topics for Higher-Order Perl was a structured drawing system based on Wm Leler's "Bertrand" term-rewriting system.

So I was delighted to discover that there was a new book out called Term Rewriting and All That. The "...and All That" suffix is a reference to the tongue-in-cheek classic of British history, 1066 and All That, and promised a casual, accessible, and possibly humorous treatment.

Unfortunately the promise was not kept. The book is very good, but it is not casual or humorous. Nor is it especially accessible. It is a solid slab of term rewriting, one of those books that make me think "I would not want to drop it on my foot." That is not a bad thing; the Barendregt book is superb, an enormous superb slab of lambda calculus that you would not want to drop on your foot. But it is not titled "Lambda Calculus and All That".

So why the title? I don't know. The authors are Germans, so perhaps they don't understand the joke?

[ Addendum: There's exactly one review of this book on Amazon, and it says the same thing I do. It begins: "My main criticism of this book is its title." ]


[Other articles in category /book] permanent link

Fri, 31 Jul 2009

Dijkstra was not insane
Recently, a reader on the Higher-Order Perl discussion mailing list made a remark about Edsger Dijkstra and his well-known opposition to the break construction (in Perl, last) that escapes prematurely from a loop. People often use this as an example to show that Dijkstra was excessively doctrinaire, and out of touch with the reality of programming[1], but usually it's because they don't know what his argument was.

I wrote a response, explaining where Dijkstra was coming from, and I am very happy with how it came out, so I'm reposting it here.

The list subscriber said, in part:

On a side note, I never read anything by Dijkstra that wasn't noticeably out of touch with the reality of programming, which qualifies them as screeds to me.

And I say that as a former Pascal programmer, and as one who has read, and bought into, things like Kernighan's "Why Pascal is Not My Favorite Programming Language" and the valid rants about how some form of breaking out of a loop without having to proceed to the end is very useful, without destroying structure (except by Dijkstra's definition of structure)...

A lot of people bring up the premature-loop-exit prohibition without understanding why Dijkstra suggested it; it wasn't just that he was a tightassed Dutchman.

Dijkstra's idea was this: suppose you want to prove, mathematically, that your program does what it is supposed to do. Please, everyone, suspend your judgment of this issue for a few paragraphs, and bear with me. Let's really suppose that we want to do this.

Dijkstra's idea is that the program is essentially a concatenation of blocks, each of which is trying to accomplish something or other, and each of which does not make sense to run unless some part of the program state is set up for it ahead of time. For example, the program might be to print a sorted list of links from a web page. Then the obvious blocks are:

A
get the web page and store it in a variable

B
extract the links from the text in the variable into an array

C
sort the array

D
print out the array contents

Section C is trying to sort the array; if it is correct then the array will be sorted by the time step D commences. But it doesn't make sense to commence step C unless the array is populated. Garbage in, garbage out, as they used to say when I was in elementary school.

We say that the "precondition" for C is that the array be populated with URLs, and the "postcondition" is that the array be in sorted order. What you would want to prove about C is that if the precondition holds—that is, if the array is properly populated before C begins—then the postcondition will hold too—that is, the array will be in sorted order when C completes.

It occurs to me that calling this a "proof" is probably biasing everyone's thinking. Let's forget about mathematical proofs and just think about ordinary programmers trying to understand if the program is correct. If the intern in the next cubicle handed you his code for this program, and you were looking it over, you would probably think in very much this way: you would identify block C (maybe it's a subroutine, or maybe not) and then you would try to understand if C, given an array of URLs, would produce a properly sorted array by the time it was done.

C itself might depend on some sub-blocks or subroutines that performed sub-parts of the task; you could try to understand them similarly.

Having proved (or convinced yourself) that C will produce the postcondition "array contains sorted list of URLs", you are in an excellent position to prove (or convince yourself) that block D prints out a sorted array of URLs, which is what you want. Without that belief about C, you are building on sand; you have almost nothing to go on, and you can conclude hardly anything useful about the behavior of D.

Now consider a more complex block, one of the form:

        if (q) { E; }
        else { F; }
Suppose you believe that code E, given precondition x, is guaranteed to produce postcondition y. And suppose you believe the same thing about F. Then you can conclude the same thing about the entire if-else block: if x was true before it began executing, then y will be true when it is done.[2] So you can build up proofs (or beliefs) about small bits of code into proofs (or beliefs) about larger ones.

We can understand while loops similarly. Suppose we know that condition p is true prior to the commencement of some loop, and that if p is true before G executes, then p will also be true when G finishes. Then what can we say about this loop?

        while (q) { G; }
We can conclude that if p was true before the loop began, then p will still be true, and q will be false, when the loop ends.

BUT BUT BUT BUT if your language has break, then that guarantee goes out the window and you can conclude nothing. Or at the very least your conclusions will become much more difficult. You can no longer treat G atomically; you have to understand its contents in detail.

So this is where Dijkstra is coming from: features like break[3] tend to sabotage the benefits of structured programming, and prevent the programmer from understanding the program as a composition of independent units. The other subscriber made a seemingly disparaging reference to "Dijkstra's idea of structure", but I hope it is clear that it was not an arbitrary idea. Dijkstra's idea of structure is what will allow you to understand a large program as a collection of modules.

Regardless of your opinion about formal verification methods, or correctness proofs, or the practicality of omitting break from your language, it should at least be clear that Dijkstra was not being doctrinaire just for the sake of doctrine.


Some additional notes

Here are some interesting peripheral points that I left out of my main discussion because I wanted to stick to the main point, which was: "Dijkstra was not insane".

  1. I said in an earlier post that "I often find Dijkstra's innumerable screeds very tiresome in their unkind, unforgiving, and unrealistic attitudes toward programmers." But despite this, I believe he was a brilliant thinker, and almost every time he opened his mouth it was to make a carefully-considered argument. You may not like him, and you may not agree with him, but you'll be better off listening to him.

    An archive of Dijkstra's miscellaneous notes and essays (a pre-blogging blog, if you like) is maintained at the University of Texas. I recommend it.

  2. I said:

                    if (q) { E; }
                    else { F; }
    
    Suppose you believe that code E, given precondition x, is guaranteed to produce postcondition y. And suppose you believe the same thing about F. Then you can conclude the same thing about the entire if-else block.

    Actually, your job is slightly easier. Let's write this:

    [x] E [y]
    to mean that code E, given precondition x, produces postcondition y. That is, if we know that x is true when E begins execution, then we know that y is true when E finishes. Then my quoted paragraph above says that from these:

    [x] E [y]
    [x] F [y]
    we can conclude this:

    [x] if (q) {E} else {F} [y]
    But actually we can make a somewhat stronger statement. We can make the same conclusion from weaker assumptions. If we believe these:
    [x and q] E [y]
    [x and not q] F [y]
    then we can conclude this:

    [x] if (q) {E} else {F} [y]
    In fact this precisely expresses the complete semantics of the if-else construction. Why do we use if-else blocks anyway? This is the reason: we want to be able to write code to guarantee something like this:

    [x] BLAH [y]
    but we only know how to guarantee
    [x and q] FOO [y]
    and
    [x and not q] BAR [y]
    for some q. So we write two blocks of code, each of which accomplishes y under some circumstances, and use if-else to make sure that the right one is selected under the right circumstances.

  3. Similar to break (but worse), in the presence of goto you are on very shaky ground in trying to conclude anything about whether the program is correct. Suppose you know that C is correct if its precondition (an array of URLs) is satisfied. And you know that B will set up that precondition (that is, the array) if its precondition is satisfied, so it seems like you are all right. But no, because block W somewhere else might have goto C; and transfer control to C without setting up the precondition, and then C could cause winged demons to fly out of your nose.

Further reading

  • For a quick overview, see the Wikipedia article on Hoare logic. Hoare logic is the [x] E [y] notation I used above, and a set of rules saying how to reason with claims of that form. For example, one rule of Hoare logic defines the meaning of the null statement: if ; is the null statement, then [p] ; [p] for all conditions p.

    Hoare logic was invented by Tony Hoare, who also invented the Quicksort algorithm.

    Order
    A Discipline of Programming
    A Discipline of Programming
    with kickback
    no kickback
  • For further details, see Dijkstra's book "A Discipline of Programming". Dijkstra introduces a function called wp for "weakest precondition". Given a piece of code C and a desired postcondition q, wp(C, q) is the weakest precondition that is sufficient for code C to accomplish q. That is, it's the minimum prerequisite for C to accomplish q. Most of the book is about how to figure out what these weakest preconditions are, and, once you know them, how they can guide you to through the implementation of your program.

    I have an idea that the Dijkstra book might be easier to follow after having read this introduction than without it.

  • No discussion of structured programming and goto is complete without a mention of Donald Knuth's wonderful paper Stuctured Programming with go to Statements. This is my single all-time favorite computer science paper. Download it here.


    Order
    Software Tools in Pascal
    Software Tools in Pascal
    with kickback
    no kickback
  • Software Tools in Pascal is a book by Kernighan and Plauger that tries to translate the tool suite of their earlier Software Tools book into Pascal. They were repeatedly screwed by deficiencies in the Pascal language, and this was the inspiration for Kernighan's famous "Why Pascal is not my Favorite Programming Language" paper. In effect, Software Tools in Pascal is a book-length case study of the deficiencies of Pascal for practical programming tasks.


[Other articles in category /prog] permanent link

Sun, 21 Jun 2009

Gray code at the pediatrician's office
Last week we took Iris to the pediatrician for a checkup, during which they weighed, measured, and inoculated her. The measuring device, which I later learned is called a stadiometer, had a bracket on a slider that went up and down on a post. Iris stood against the post and the nurse adjusted the bracket to exactly the top of her head. Then she read off Iris's height from an attached display.

How did the bracket know exactly what height to report? This was done in a way I hadn't seen before. It had a photosensor looking at the post, which was printed with this pattern:

(Click to view the other pictures I took of the post.)

The pattern is binary numerals. Each numeral is a certain fraction of a centimeter high, say 1/4 centimeter. If the sensor reads the number 433, that means that the bracket is 433/4 = 108.25 cm off the ground, and so that Iris is 108.25 cm tall.

The patterned strip in the left margin of this article is a straightforward translation of binary numerals to black and white boxes, with black representing 1 and white representing 0:

0000000000
0000000001
0000000010
0000000011
0000000100
0000000101
0000000101
...
1111101000
1111101001
...
1111111111
If you are paying attention, you will notice that although the strip at left is similar to the pattern in the doctor's office, it is not the same. That is because the numbers on the post are Gray-coded.

Gray codes solve the following problem with raw binary numbers. Suppose Iris is close to 104 = 416/4 cm tall, so that the photosensor is in the following region of the post:

...
0110100001 (417)
0110100000 (416)
0110011111 (415)
0110011110 (414)
...
But suppose that the sensor (or the post) is slightly mis-aligned, so that instead of properly reading the (416) row, it reads the first half of the (416) row and last half of the (415) row. That makes 0110111111, which is 447 = 111.75 cm, an error of almost 7.5%. (That's three inches, for my American and Burmese readers.) Or the error could go the other way: if the sensor reads the first half of the (415) and the second half of the (416) row, it will see 0110000000 = 384 = 96 cm.

Gray code is a method for encoding numbers in binary so that each numeral differs from the adjacent ones in only one position:

0000000000
0000000001
0000000011
0000000010
0000000110
0000000111
0000000101
0000000100
0000001100
...
1000011100
1000011101
...
1000000000
This is the pattern from the post, which you can also see at the right of this article.

Now suppose that the mis-aligned sensor reads part of the (416) line and part of the (417) line. With ordinary binary coding, this could result in an error of up to 7.75 cm. (And worse errors for children of other heights.) But with Gray coding no error results from the misreading:

...
0101110000 (417)
0101010000 (416)
0101010001 (415)
0101010011 (414)
...
No matter what parts of 0101110000 and 0101110001 are stitched together, the result is always either 416 or 417.

Converting from Gray code to standard binary is easy: take the binary expansion, and invert every bit that is immediately to the right of a 1 bit. For example, in 1111101000, each red bit is to the right of a 1, and so is inverted to obtain the Gray code 1000011100.

Converting back is also easy: of the Gray code. Replace every sequence of the form 1000...01 with 1111...10; also replace 1000... with 1111... if it appears at the end of the code. For example, Gray code 1000011100 contains two such sequences, 100001 and 11, which are replaced with 111110 and 10, to give 1111101000.


[Other articles in category /math] permanent link

Tue, 16 Jun 2009

Haskell logo fail
The Haskell folks have chosen a new logo.

Ouch.


[Other articles in category /prog/haskell] permanent link