The Universe of Discourse
           
Thu, 26 Aug 2010

Monad terminology problem
I think one problem (of many) that beginners might have with Haskell monads is the confusing terminology. The word "monad" can refer to four related but different things:

  1. The Monad typeclass itself.

  2. When a type constructor T of kind ∗ → ∗ is an instance of Monad we say that T "is a monad". For example, "Tree is a monad"; "((→) a) is a monad". This is the only usage that is strictly corrrect.

  3. Types resulting from the application of monadic type constructors (#2) are sometimes referred to as monads. For example, "[Integer] is a monad".

  4. Individual values of monadic types (#3) are often referred to as monads. For example, the "All About Monads" tutorial says "A list is also a monad".

Usage #1 is not a real problem; it does not occur that often, and is readily distinguished by context, capitalization, type font, and other markers. #2 is actually correct, so there is no problem there. #3 seems to be an uncommon colloquialism.

The most serious problem here is #4, that people refer to individual values of monadic types as "monads". Even when they don't do this, they are hampered by the lack of a good term for it. As I know no good alternative has been proposed. People often say "monadic value" (I think), which is accurate, but something of a mouthful.

One thing I have discovered in my writing life is that the clarity of a confusing document can sometimes be improved merely by replacing a polysyllabic noun phrase with a monosyllable. For example, chapter 3 of Higher-Order Perl discussed the technique of memoizing a function by generating an anonymous replacement for it that maintains a cache and calls the real function on a cache miss. Early drafts were hard to understand, and improved greatly when I replaced the phrase "anonymous replacement function" with "stub". The Perl documentation was significantly improved merely by replacing "associative array" everywhere with "hash" and "funny punctuation character" with "sigil".

I think a monosyllabic replacement for "monadic value" would be a similar boon to discussion of monads, not just for beginners but for everyone else too. The drawback, of introducing yet another jargon term, would in this case be outweighed by the benefits. Jargon can obscure, but sometimes it can clarify.

The replacement word should be euphonious, clear but not overly specific, and not easily confused with similar jargon words. It would probably be good for it to begin with the letter "m". I suggest:

mote

So return takes a value and returns a mote. The >>= function similarly lifts a function on pure values to a function on motes; when the mote is a container one may think of >>= as applying the function to the values in the container. [] is a monad, so lists are motes. The expression on the right-hand side of a var ← expr in a do-block must have mote type; it binds the mote on the right to the name on the left, using the >>= operator.

I have been using this term privately for several months, and it has been a small but noticeable success. Writing and debugging monadic programs is easier because I have a simple name for the motes that the program manipulates, which I can use when I mumble to myself: "What is the type error here? Oh, commit should be returning a mote." And then I insert return in the right place.

I'm don't want to oversell the importance of this invention. But there is clearly a gap in the current terminology, and I think it is well-filled by "mote".

(While this article was in progress I discovered that What a Monad is not uses the nonceword "mobit". I still prefer "mote".)


[Other articles in category /prog/haskell] permanent link

Sun, 03 Jan 2010

A short bibliography of probability monads
Several people helpfully wrote to me to provide references to earlier work on probability distribution monads. Here is a summary:

My thanks to Stephen Tetley, Gaal Yahas, and Luke Palmer for these.

I did not imagine that my idea was a new one. I arrived at it by thinking about List as a representation of non-deterministic computation. But if you think of it that way, the natural interpretation is that every list element represents an equally likely outcome, and so annotating the list elements with probabilities is the obvious next step. So the existence of the Erwig library was not a big surprise.

A little more surprising though, were the references in the Erwig paper. Specifically, the idea dates back to at least 1981; Erwig cites a paper that describes the probability monad in a pure-mathematics context.

Nobody responded to my taunting complaint about Haskell's failure to provide support a good monad of sets. It may be that this is because they all agree with me. (For example, the documentation of the Erwig package says "Unfortunately we cannot use a more efficient data structure because the key type must be of class Ord, but the Monad class does not allow constraints for result types.") But a number of years ago I said that the C++ macro processor blows goat dick. I would not have put it so strongly had I not naïvely believed that this was a universally-held opinion. But no, plenty of hapless C++ programmers wrote me indignant messages defending their macro system. So my being right is no guarantee that language partisans will not dispute with me, and the Haskell community's failure to do so in this case reflects well on them, I think.


[Other articles in category /prog/haskell] permanent link

Thu, 31 Dec 2009

A monad for probability and provenance
I don't quite remember how I arrived at this, but it occurred to me last week that probability distributions form a monad. This is the first time I've invented a new monad that I hadn't seen before; then I implemented it and it behaved pretty much the way I thought it would. So I feel like I've finally arrived, monadwise.

Suppose a monad value represents all the possible outcomes of an event, each with a probability of occurrence. For concreteness, let's suppose all our probability distributions are discrete. Then we might have:

	data ProbDist p a = ProbDist [(a,p)] deriving (Eq, Show)
	unpd (ProbDist ps) = ps
Each a is an outcome, and each p is the probability of that outcome occurring. For example, biased and unbiased coins:

    unbiasedCoin = ProbDist [ ("heads", 0.5),
                              ("tails", 0.5) ];

    biasedCoin   = ProbDist [ ("heads", 0.6),
                              ("tails", 0.4) ];

Or a couple of simple functions for making dice:

    import Data.Ratio

    d sides = ProbDist [(i, 1 % sides) | i <- [1 .. sides]]
    die = d 6

d n is an n-sided die.

The Functor instance is straightforward:

    instance Functor (ProbDist p) where
      fmap f (ProbDist pas) = ProbDist $ map (\(a,p) -> (f a, p)) pas
The Monad instance requires return and >>=. The return function merely takes an event and turns it into a distribution where that event occurs with probability 1. I find join easier to think about than >>=. The join function takes a nested distribution, where each outcome of the outer distribution specifies an inner distribution for the actual events, and collapses it into a regular, overall distribution. For example, suppose you put a biased coin and an unbiased coin in a bag, then pull one out and flip it:

	  bag :: ProbDist Double (ProbDist Double String)
	  bag = ProbDist [ (biasedCoin, 0.5),
                           (unbiasedCoin, 0.5) ]
The join operator collapses this into a single ProbDist Double String:

	ProbDist [("heads",0.3),
                  ("tails",0.2),
                  ("heads",0.25),
                  ("tails",0.25)]
It would be nice if join could combine the duplicate heads into a single ("heads", 0.55) entry. But that would force an Eq a constraint on the event type, which isn't allowed, because (>>=) must work for all data types, not just for instances of Eq. This is a problem with Haskell, not with the monad itself. It's the same problem that prevents one from making a good set monad in Haskell, even though categorially sets are a perfectly good monad. (The return function constructs singletons, and the join function is simply set union.) Maybe in the next language.

Perhaps someone else will find the >>= operator easier to understand than join? I don't know. Anyway, it's simple enough to derive once you understand join; here's the code:

	instance (Num p) => Monad (ProbDist p) where
	  return a = ProbDist [(a, 1)]
	  (ProbDist pas) >>= f = ProbDist $ do
				   (a, p) <- pas
				   let (ProbDist pbs) = f a
				   (b, q) <- pbs
				   return (b, p*q)
So now we can do some straightforward experiments:

	liftM2 (+) (d 6) (d 6)

	ProbDist [(2,1 % 36),(3,1 % 36),(4,1 % 36),(5,1 % 36),(6,1 %
	36),(7,1 % 36),(3,1 % 36),(4,1 % 36),(5,1 % 36),(6,1 %
	36),(7,1 % 36),(8,1 % 36),(4,1 % 36),(5,1 % 36),(6,1 %
	36),(7,1 % 36),(8,1 % 36),(9,1 % 36),(5,1 % 36),(6,1 %
	36),(7,1 % 36),(8,1 % 36),(9,1 % 36),(10,1 % 36),(6,1 %
	36),(7,1 % 36),(8,1 % 36),(9,1 % 36),(10,1 % 36),(11,1 %
	36),(7,1 % 36),(8,1 % 36),(9,1 % 36),(10,1 % 36),(11,1 %
	36),(12,1 % 36)]
This is nasty-looking; we really need to merge the multiple listings of the same event. Here is a function to do that:

        agglomerate :: (Num p, Eq b) => (a -> b) -> ProbDist p a -> ProbDist p b
        agglomerate f pd = ProbDist $ foldr insert [] (unpd (fmap f pd)) where
          insert (k, p) [] = [(k, p)]
          insert (k, p) ((k', p'):kps) | k == k' = (k, p+p'):kps
                                       | otherwise = (k', p'):(insert (k,p) kps)


        agg :: (Num p, Eq a) => ProbDist p a -> ProbDist p a
        agg = agglomerate id
Then agg $ liftM2 (+) (d 6) (d 6) produces:

        ProbDist [(12,1 % 36),(11,1 % 18),(10,1 % 12),(9,1 % 9),
                  (8,5 % 36),(7,1 % 6),(6,5 % 36),(5,1 % 9),
                  (4,1 % 12),(3,1 % 18),(2,1 % 36)]
Hey, that's correct.

There must be a shorter way to write insert. It really bothers me, because it looks look it should be possible to do it as a fold. But I couldn't make it look any better.

You are not limited to calculating probabilities. The monad actually will count things. For example, let us throw three dice and count how many ways there are to throw various numbers of sixes:

        eq6 n = if n == 6 then 1 else 0
        agg $ liftM3 (\a b c -> eq6 a + eq6 b + eq6 c) die die die

      ProbDist [(3,1),(2,15),(1,75),(0,125)]
There is one way to throw three sixes, 15 ways to throw two sixes, 75 ways to throw one six, and 125 ways to throw no sixes. So ProbDist is a misnomer.

It's easy to convert counts to probabilities:

	probMap :: (p -> q) -> ProbDist p a -> ProbDist q a
	probMap f (ProbDist pds) = ProbDist $ (map (\(a,p) -> (a, f p))) pds

	normalize :: (Fractional p) => ProbDist p a -> ProbDist p a
	normalize pd@(ProbDist pas) = probMap (/ total) pd where
	    total = sum . (map snd) $ pas

        normalize $ agg $ probMap toRational $ 
               liftM3 (\a b c -> eq6 a + eq6 b + eq6 c) die die die

      ProbDist [(3,1 % 216),(2,5 % 72),(1,25 % 72),(0,125 % 216)]
I think this is the first time I've gotten to write die die die in a computer program.

The do notation is very nice. Here we calculate the distribution where we roll four dice and discard the smallest:

        stat = do
                 a <- d 6
                 b <- d 6
                 c <- d 6
                 d <- d 6
                 return (a+b+c+d - minimum [a,b,c,d])

        probMap fromRational $ agg stat

	ProbDist [(18,1.6203703703703703e-2),
                  (17,4.1666666666666664e-2), (16,7.253086419753087e-2),
                  (15,0.10108024691358025),   (14,0.12345679012345678),
                  (13,0.13271604938271606),   (12,0.12885802469135801),
                  (11,0.11419753086419752),   (10,9.41358024691358e-2),
                   (9,7.021604938271606e-2),   (8,4.7839506172839504e-2),
                   (7,2.9320987654320986e-2),  (6,1.6203703703703703e-2),
                   (5,7.716049382716049e-3),   (4,3.0864197530864196e-3),
                   (3,7.716049382716049e-4)]

One thing I was hoping to get didn't work out. I had this idea that I'd be able to calculate the outcome of a game of craps like this:

	dice = liftM2 (+) (d 6) (d 6)

	point n = do
	  roll <- dice
	  case roll of 7 -> return "lose"
                       _ | roll == n  = "win"
                       _ | otherwise  = point n

        craps = do
          roll <- dice
          case roll of 2 -> return "lose"
                       3 -> return "lose"
                       4 -> point 4
                       5 -> point 5
                       6 -> point 6
                       7 -> return "win"
                       8 -> point 8
                       9 -> point 9
                       10 -> point 10
                       11 -> return "win"
                       12 -> return "lose"
This doesn't work at all; point is an infinite loop because the first value of dice, namely 2, causes a recursive call. I might be able to do something about this, but I'll have to think about it more.

It also occurred to me that the use of * in the definition of >>= / join could be generalized. A couple of years back I mentioned a paper of Green, Karvounarakis, and Tannen that discusses "provenance semirings". The idea is that each item in a database is annotated with some "provenance" information about why it is there, and you want to calculate the provenance for items in tables that are computed from table joins. My earlier explanation is here.

One special case of provenance information is that the provenances are probabilities that the database information is correct, and then the probabilities are calculated correctly for the joins, by multiplication and addition of probabilities. But in the general case the provenances are opaque symbols, and the multiplication and addition construct regular expressions over these symbols. One could generalize ProbDist similarly, and the ProbDist monad (even more of a misnomer this time) would calculate the provenance automatically. It occurs to me now that there's probably a natural way to view a database table join as a sort of Kleisli composition, but this article has gone on too long already.

Happy new year, everyone.

[ Addendum 20100103: unsurprisingly, this is not a new idea. Several readers wrote in with references to previous discussion of this monad, and related monads. It turns out that the idea goes back at least to 1981. ]


My thanks to Graham Hunter for his donation.


[Other articles in category /prog/haskell] permanent link

Tue, 16 Jun 2009

Haskell logo fail
The Haskell folks have chosen a new logo.

Ouch.


[Other articles in category /prog/haskell] permanent link

Thu, 03 Jan 2008

Note on point-free programming style
This old comp.lang.functional article by Albert Y. C. Lai, makes the point that Unix shell pipeline programming is done in an essentially "point-free" style, using the shell example:

    grep '^X-Spam-Level' | sort | uniq | wc -l
and the analogous Haskell code:

    length . nub . sort . filter (isPrefixOf "X-Spam-Level")
Neither one explicitly mentions its argument, which is why this is "point-free". In "point-free" programming, instead of defining a function in terms of its effect on its arguments, one defines it by composing the component functions themselves, directly, with higher-order operators. For example, instead of:

  foo x y = 2 * x + y
one has, in point-free style:

  foo = (+) . (2 *)
where (2 *) is the function that doubles its argument, and (+) is the (curried) addition function. The two definitions of foo are entirely equivalent.

As the two examples should make clear, point-free style is sometimes natural, and sometimes not, and the example chosen by M. Lai was carefully selected to bias the argument in favor of point-free style.

Often, after writing a function in pointful style, I get the computer to convert it automatically to point-free style, just to see what it looks like. This is usually educational, and sometimes I use the computed point-free definition instead. As I get better at understanding point-free programming style in Haskell, I am more and more likely to write certain functions point-free in the first place. For example, I recently wrote:

        soln = int 1 (srt (add one (neg (sqr soln))))
and then scratched my head, erased it, and replaced it with the equivalent:

        soln = int 1 ((srt . (add one) . neg . sqr) soln)
I could have factored out the int 1 too:
        soln = (int 1 . srt . add one . neg . sqr) soln
I could even have removed soln from the right-hand side:

        soln = fix (int 1 . srt . add one . neg . sqr)
but I am not yet a perfect sage.

Sometimes I opt for an intermediate form, one in which some of the arguments are explicit and some are implicit. For example, as an exercise I wrote a function numOccurrences which takes a value and a list and counts the number of times the value occurs in the list. A straightforward and conventional implementation is:

        numOccurrences x []     = 0
        numOccurrences x (y:ys) = 
                if (x == y) then 1 + rest
                else                 rest
            where rest = numOccurrences x ys
but the partially point-free version I wrote was much better:

        numOccurrences x = length . filter (== x)
Once you see this, it's easy to go back to a fully pointful version:

        numOccurrences x y = length (filter (== x) y)
Or you can go the other way, to a point-free version:

        numOccurrences = (length .) . filter . (==)
which I find confusing.

Anyway, the point of this note is not to argue that the point-free style is better or worse than the pointful style. Sometimes I use the one, and sometimes the other. I just want to point out that the argument made by M. Lai is deceptive, because of the choice of examples. As an equally biased counterexample, consider:

        bar x = x*x + 2*x + 1
which the automatic converter informs me can be written in point-free style as:

        bar = (1 +) . ap ((+) . join (*)) (2 *)
Perusal of this example will reveal much to the attentive reader, including the definitions of join and ap. But I don't think many people would argue that it is an improvement on the original. (Maybe I'm wrong, and people would argue that it was an improvement. I won't know for sure until I have more experience.)

For some sort of balance, here is another example where I think the point-free version is at least as good as the pointful version: a recent comment on Reddit suggested a >>> operator that composes functions just like the . operator, but in the other order, so that:

        f >>> g = g . f
or, if you prefer:

        >>> f g x = g(f(x))
The point-free definition of >>> is:

        (>>>) = flip (.)
where the flip operator takes a function of two arguments and makes a new function that does the same thing, but with the arguments in the opposite order. Whatever your feelings about point-free style, it is undeniable that the point-free definition makes perfectly clear that >>> is nothing but . with its arguments in reverse order.


[Other articles in category /prog/haskell] permanent link