# The Universe of Discourse

Fri, 24 Apr 2015

Suppose you would like to perform an exhaustive search. Let's say for concreteness that we would like to solve this cryptarithm puzzle:

    S E N D
+   M O R E
-----------
M O N E Y


This means that we want to map the letters S, E, N, D, M, O, R, Y to distinct digits 0 through 9 to produce a five-digit and two four-digit numerals which, when added in the indicated way, produce the indicated sum.

(This is not an especially difficult example; my 10-year-old daughter Katara was able to solve it, with some assistance, in about 30 minutes.)

If I were doing this in Perl, I would write up either a recursive descent search or a solution based on a stack or queue of partial solutions which the program would progressively try to expand to a full solution, as per the techniques of chapter 5 of Higher-Order Perl. In Haskell, we can use the list monad to hide all the searching machinery under the surface. First a few utility functions:

    import Control.Monad (guard)

digits = [0..9]

to_number = foldl (\a -> \b -> a*10 + b) 0
remove rs ls = foldl remove' ls rs
where remove' ls x = filter (/= x) ls


to_number takes a list of digits like [1,4,3] and produces the number they represent, 143. remove takes two lists and returns all the things in the second list that are not in the first list. There is probably a standard library function for this but I don't remember what it is. This version is !!O(n^2)!!, but who cares.

Now the solution to the problem is:

    --     S E N D
--   + M O R E
--   ---------
--   M O N E Y

solutions = do
s <- remove [0] digits
e <- remove [s] digits
n <- remove [s,e] digits
d <- remove [s,e,n] digits
let send = to_number [s,e,n,d]
m <- remove [0,s,e,n,d] digits
o <- remove [s,e,n,d,m] digits
r <- remove [s,e,n,d,m,o] digits
let more = to_number [m,o,r,e]
y <- remove [s,e,n,d,m,o,r] digits
let money = to_number [m,o,n,e,y]
guard $send + more == money return (send, more, money)  Let's look at just the first line of this:  solutions = do s <- remove [0] digits …  The do notation is syntactic sugar for  (remove [0] digits) >>= \s -> …  where “…” is the rest of the block. To expand this further, we need to look at the overloading for >>= which is implemented differently for every type. The mote on the left of >>= is a list value, and the definition of >>= for lists is:  concat$ map (\s -> …) (remove [0] digits)


where “…” is the rest of the block.

So the variable s is bound to each of 1,2,3,4,5,6,7,8,9 in turn, the rest of the block is evaluated for each of these nine possible bindings of s, and the nine returned lists of solutions are combined (by concat) into a single list.

The next line is the same:

      e <- remove [s] digits


for each of the nine possible values for s, we loop over nine value for e (this time including 0 but not including whatever we chose for s) and evaluate the rest of the block. The nine resulting lists of solutions are concatenated into a single list and returned to the previous map call.

      n <- remove [s,e] digits
d <- remove [s,e,n] digits


This is two more nested loops.

      let send = to_number [s,e,n,d]


At this point the value of send is determined, so we compute and save it so that we don't have to repeatedly compute it each time through the following 300 loop executions.

      m <- remove [0,s,e,n,d] digits
o <- remove [s,e,n,d,m] digits
r <- remove [s,e,n,d,m,o] digits
let more = to_number [m,o,r,e]


Three more nested loops and another computation.

      y <- remove [s,e,n,d,m,o,r] digits
let money = to_number [m,o,n,e,y]


Yet another nested loop and a final computation.

      guard $send + more == money return (send, more, money)  This is the business end. I find guard a little tricky so let's look at it slowly. There is no binding (<-) in the first line, so these two lines are composed with >> instead of >>=:  (guard$ send + more == money) >> (return (send, more, money))


which is equivalent to:

      (guard $send + more == money) >>= (\_ -> return (send, more, money))  which means that the values in the list returned by guard will be discarded before the return is evaluated. If send + more == money is true, the guard expression yields [()], a list of one useless item, and then the following >>= loops over this one useless item, discards it, and returns yields a list containing the tuple (send, more, money) instead. But if send + more == money is false, the guard expression yields [], a list of zero useless items, and then the following >>= loops over these zero useless items, never runs return at all, and yields an empty list. The result is that if we have found a solution at this point, a list containing it is returned, to be concatenated into the list of all solutions that is being constructed by the nested concats. But if the sum adds up wrong, an empty list is returned and concated instead. After a few seconds, Haskell generates and tests 1.36 million choices for the eight bindings, and produces the unique solution:  [(9567,1085,10652)]  That is:  S E N D 9 5 6 7 + M O R E + 1 0 8 5 ----------- ----------- M O N E Y 1 0 6 5 2  It would be an interesting and pleasant exercise to try to implement the same underlying machinery in another language. I tried this in Perl once, and I found that although it worked perfectly well, between the lack of the do-notation's syntactic sugar and Perl's clumsy notation for lambda functions (sub { my ($s) = @_; … } instead of \s -> …) the result was completely unreadable and therefore unusable. However, I suspect it would be even worse in Python because of semantic limitations of that language. I would be interested to hear about this if anyone tries it.

[ Addendum: Thanks to Tony Finch for pointing out the η-reduction I missed while writing this at 3 AM. ]

[ Addendum: Several people so far have misunderstood the question about Python in the last paragraph. The question was not to implement an exhaustive search in Python; I had no doubt that it could be done in a simple and clean way, as it can in Perl. The question was to implement the same underlying machinery, including the list monad and its bind operator, and to find the solution using the list monad.

[ Peter De Wachter has written in with a Python solution that clearly demonstrates that the problems I was worried about will not arise, at least for this task. I hope to post his solution in the next few days. ]

[ Addendum 20150803: De Wachter's solution and one in Perl ]

Thu, 26 Aug 2010

I think one problem (of many) that beginners might have with Haskell monads is the confusing terminology. The word "monad" can refer to four related but different things:

2. When a type constructor T of kind ∗ → ∗ is an instance of Monad we say that T "is a monad". For example, "Tree is a monad"; "((→) a) is a monad". This is the only usage that is strictly corrrect.

3. Types resulting from the application of monadic type constructors (#2) are sometimes referred to as monads. For example, "[Integer] is a monad".

Usage #1 is not a real problem; it does not occur that often, and is readily distinguished by context, capitalization, type font, and other markers. #2 is actually correct, so there is no problem there. #3 seems to be an uncommon colloquialism.

The most serious problem here is #4, that people refer to individual values of monadic types as "monads". Even when they don't do this, they are hampered by the lack of a good term for it. As I know no good alternative has been proposed. People often say "monadic value" (I think), which is accurate, but something of a mouthful.

One thing I have discovered in my writing life is that the clarity of a confusing document can sometimes be improved merely by replacing a polysyllabic noun phrase with a monosyllable. For example, chapter 3 of Higher-Order Perl discussed the technique of memoizing a function by generating an anonymous replacement for it that maintains a cache and calls the real function on a cache miss. Early drafts were hard to understand, and improved greatly when I replaced the phrase "anonymous replacement function" with "stub". The Perl documentation was significantly improved merely by replacing "associative array" everywhere with "hash" and "funny punctuation character" with "sigil".

I think a monosyllabic replacement for "monadic value" would be a similar boon to discussion of monads, not just for beginners but for everyone else too. The drawback, of introducing yet another jargon term, would in this case be outweighed by the benefits. Jargon can obscure, but sometimes it can clarify.

The replacement word should be euphonious, clear but not overly specific, and not easily confused with similar jargon words. It would probably be good for it to begin with the letter "m". I suggest:

mote

So return takes a value and returns a mote. The >>= function similarly lifts a function on pure values to a function on motes; when the mote is a container one may think of >>= as applying the function to the values in the container. [] is a monad, so lists are motes. The expression on the right-hand side of a var ← expr in a do-block must have mote type; it binds the mote on the right to the name on the left, using the >>= operator.

I have been using this term privately for several months, and it has been a small but noticeable success. Writing and debugging monadic programs is easier because I have a simple name for the motes that the program manipulates, which I can use when I mumble to myself: "What is the type error here? Oh, commit should be returning a mote." And then I insert return in the right place.

I'm don't want to oversell the importance of this invention. But there is clearly a gap in the current terminology, and I think it is well-filled by "mote".

(While this article was in progress I discovered that What a Monad is not uses the nonceword "mobit". I still prefer "mote".)

Sun, 03 Jan 2010

A short bibliography of probability monads
Several people helpfully wrote to me to provide references to earlier work on probability distribution monads. Here is a summary:

My thanks to Stephen Tetley, Gaal Yahas, and Luke Palmer for these.

I did not imagine that my idea was a new one. I arrived at it by thinking about List as a representation of non-deterministic computation. But if you think of it that way, the natural interpretation is that every list element represents an equally likely outcome, and so annotating the list elements with probabilities is the obvious next step. So the existence of the Erwig library was not a big surprise.

A little more surprising though, were the references in the Erwig paper. Specifically, the idea dates back to at least 1981; Erwig cites a paper that describes the probability monad in a pure-mathematics context.

Nobody responded to my taunting complaint about Haskell's failure to provide support a good monad of sets. It may be that this is because they all agree with me. (For example, the documentation of the Erwig package says "Unfortunately we cannot use a more efficient data structure because the key type must be of class Ord, but the Monad class does not allow constraints for result types.") But a number of years ago I said that the C++ macro processor blows goat dick. I would not have put it so strongly had I not naïvely believed that this was a universally-held opinion. But no, plenty of hapless C++ programmers wrote me indignant messages defending their macro system. So my being right is no guarantee that language partisans will not dispute with me, and the Haskell community's failure to do so in this case reflects well on them, I think.

Thu, 31 Dec 2009

A monad for probability and provenance
I don't quite remember how I arrived at this, but it occurred to me last week that probability distributions form a monad. This is the first time I've invented a new monad that I hadn't seen before; then I implemented it and it behaved pretty much the way I thought it would. So I feel like I've finally arrived, monadwise.

Suppose a monad value represents all the possible outcomes of an event, each with a probability of occurrence. For concreteness, let's suppose all our probability distributions are discrete. Then we might have:

	data ProbDist p a = ProbDist [(a,p)] deriving (Eq, Show)
unpd (ProbDist ps) = ps

Each a is an outcome, and each p is the probability of that outcome occurring. For example, biased and unbiased coins:

    unbiasedCoin = ProbDist [ ("heads", 0.5),
("tails", 0.5) ];

biasedCoin   = ProbDist [ ("heads", 0.6),
("tails", 0.4) ];


Or a couple of simple functions for making dice:

    import Data.Ratio

d sides = ProbDist [(i, 1 % sides) | i <- [1 .. sides]]
die = d 6


d n is an n-sided die.

The Functor instance is straightforward:

    instance Functor (ProbDist p) where
fmap f (ProbDist pas) = ProbDist $map (\(a,p) -> (f a, p)) pas  The Monad instance requires return and >>=. The return function merely takes an event and turns it into a distribution where that event occurs with probability 1. I find join easier to think about than >>=. The join function takes a nested distribution, where each outcome of the outer distribution specifies an inner distribution for the actual events, and collapses it into a regular, overall distribution. For example, suppose you put a biased coin and an unbiased coin in a bag, then pull one out and flip it:  bag :: ProbDist Double (ProbDist Double String) bag = ProbDist [ (biasedCoin, 0.5), (unbiasedCoin, 0.5) ]  The join operator collapses this into a single ProbDist Double String:  ProbDist [("heads",0.3), ("tails",0.2), ("heads",0.25), ("tails",0.25)]  It would be nice if join could combine the duplicate heads into a single ("heads", 0.55) entry. But that would force an Eq a constraint on the event type, which isn't allowed, because (>>=) must work for all data types, not just for instances of Eq. This is a problem with Haskell, not with the monad itself. It's the same problem that prevents one from making a good set monad in Haskell, even though categorially sets are a perfectly good monad. (The return function constructs singletons, and the join function is simply set union.) Maybe in the next language. Perhaps someone else will find the >>= operator easier to understand than join? I don't know. Anyway, it's simple enough to derive once you understand join; here's the code:  instance (Num p) => Monad (ProbDist p) where return a = ProbDist [(a, 1)] (ProbDist pas) >>= f = ProbDist$ do
(a, p) <- pas
let (ProbDist pbs) = f a
(b, q) <- pbs
return (b, p*q)

So now we can do some straightforward experiments:

	liftM2 (+) (d 6) (d 6)

ProbDist [(2,1 % 36),(3,1 % 36),(4,1 % 36),(5,1 % 36),(6,1 %
36),(7,1 % 36),(3,1 % 36),(4,1 % 36),(5,1 % 36),(6,1 %
36),(7,1 % 36),(8,1 % 36),(4,1 % 36),(5,1 % 36),(6,1 %
36),(7,1 % 36),(8,1 % 36),(9,1 % 36),(5,1 % 36),(6,1 %
36),(7,1 % 36),(8,1 % 36),(9,1 % 36),(10,1 % 36),(6,1 %
36),(7,1 % 36),(8,1 % 36),(9,1 % 36),(10,1 % 36),(11,1 %
36),(7,1 % 36),(8,1 % 36),(9,1 % 36),(10,1 % 36),(11,1 %
36),(12,1 % 36)]

This is nasty-looking; we really need to merge the multiple listings of the same event. Here is a function to do that:

        agglomerate :: (Num p, Eq b) => (a -> b) -> ProbDist p a -> ProbDist p b
agglomerate f pd = ProbDist $foldr insert [] (unpd (fmap f pd)) where insert (k, p) [] = [(k, p)] insert (k, p) ((k', p'):kps) | k == k' = (k, p+p'):kps | otherwise = (k', p'):(insert (k,p) kps) agg :: (Num p, Eq a) => ProbDist p a -> ProbDist p a agg = agglomerate id  Then agg$ liftM2 (+) (d 6) (d 6) produces:

        ProbDist [(12,1 % 36),(11,1 % 18),(10,1 % 12),(9,1 % 9),
(8,5 % 36),(7,1 % 6),(6,5 % 36),(5,1 % 9),
(4,1 % 12),(3,1 % 18),(2,1 % 36)]

Hey, that's correct.

There must be a shorter way to write insert. It really bothers me, because it looks look it should be possible to do it as a fold. But I couldn't make it look any better.

You are not limited to calculating probabilities. The monad actually will count things. For example, let us throw three dice and count how many ways there are to throw various numbers of sixes:

        eq6 n = if n == 6 then 1 else 0
agg $liftM3 (\a b c -> eq6 a + eq6 b + eq6 c) die die die ProbDist [(3,1),(2,15),(1,75),(0,125)]  There is one way to throw three sixes, 15 ways to throw two sixes, 75 ways to throw one six, and 125 ways to throw no sixes. So ProbDist is a misnomer. It's easy to convert counts to probabilities:  probMap :: (p -> q) -> ProbDist p a -> ProbDist q a probMap f (ProbDist pds) = ProbDist$ (map (\(a,p) -> (a, f p))) pds

normalize :: (Fractional p) => ProbDist p a -> ProbDist p a
normalize pd@(ProbDist pas) = probMap (/ total) pd where
total = sum . (map snd) $pas normalize$ agg $probMap toRational$
liftM3 (\a b c -> eq6 a + eq6 b + eq6 c) die die die

ProbDist [(3,1 % 216),(2,5 % 72),(1,25 % 72),(0,125 % 216)]

I think this is the first time I've gotten to write die die die in a computer program.

The do notation is very nice. Here we calculate the distribution where we roll four dice and discard the smallest:

        stat = do
a <- d 6
b <- d 6
c <- d 6
d <- d 6
return (a+b+c+d - minimum [a,b,c,d])

probMap fromRational \$ agg stat

ProbDist [(18,1.6203703703703703e-2),
(17,4.1666666666666664e-2), (16,7.253086419753087e-2),
(15,0.10108024691358025),   (14,0.12345679012345678),
(13,0.13271604938271606),   (12,0.12885802469135801),
(11,0.11419753086419752),   (10,9.41358024691358e-2),
(9,7.021604938271606e-2),   (8,4.7839506172839504e-2),
(7,2.9320987654320986e-2),  (6,1.6203703703703703e-2),
(5,7.716049382716049e-3),   (4,3.0864197530864196e-3),
(3,7.716049382716049e-4)]


One thing I was hoping to get didn't work out. I had this idea that I'd be able to calculate the outcome of a game of craps like this:

	dice = liftM2 (+) (d 6) (d 6)

point n = do
roll <- dice
case roll of 7 -> return "lose"
_ | roll == n  = "win"
_ | otherwise  = point n

craps = do
roll <- dice
case roll of 2 -> return "lose"
3 -> return "lose"
4 -> point 4
5 -> point 5
6 -> point 6
7 -> return "win"
8 -> point 8
9 -> point 9
10 -> point 10
11 -> return "win"
12 -> return "lose"

This doesn't work at all; point is an infinite loop because the first value of dice, namely 2, causes a recursive call. I might be able to do something about this, but I'll have to think about it more.

It also occurred to me that the use of * in the definition of >>= / join could be generalized. A couple of years back I mentioned a paper of Green, Karvounarakis, and Tannen that discusses "provenance semirings". The idea is that each item in a database is annotated with some "provenance" information about why it is there, and you want to calculate the provenance for items in tables that are computed from table joins. My earlier explanation is here.

One special case of provenance information is that the provenances are probabilities that the database information is correct, and then the probabilities are calculated correctly for the joins, by multiplication and addition of probabilities. But in the general case the provenances are opaque symbols, and the multiplication and addition construct regular expressions over these symbols. One could generalize ProbDist similarly, and the ProbDist monad (even more of a misnomer this time) would calculate the provenance automatically. It occurs to me now that there's probably a natural way to view a database table join as a sort of Kleisli composition, but this article has gone on too long already.

Happy new year, everyone.

[ Addendum 20100103: unsurprisingly, this is not a new idea. Several readers wrote in with references to previous discussion of this monad, and related monads. It turns out that the idea goes back at least to 1981. ]

My thanks to Graham Hunter for his donation.

Tue, 16 Jun 2009

The Haskell folks have chosen a new logo.

Ouch.

Thu, 03 Jan 2008

Note on point-free programming style
This old comp.lang.functional article by Albert Y. C. Lai, makes the point that Unix shell pipeline programming is done in an essentially "point-free" style, using the shell example:

    grep '^X-Spam-Level' | sort | uniq | wc -l


    length . nub . sort . filter (isPrefixOf "X-Spam-Level")

Neither one explicitly mentions its argument, which is why this is "point-free". In "point-free" programming, instead of defining a function in terms of its effect on its arguments, one defines it by composing the component functions themselves, directly, with higher-order operators. For example, instead of:

  foo x y = 2 * x + y

one has, in point-free style:

  foo = (+) . (2 *)

where (2 *) is the function that doubles its argument, and (+) is the (curried) addition function. The two definitions of foo are entirely equivalent.

As the two examples should make clear, point-free style is sometimes natural, and sometimes not, and the example chosen by M. Lai was carefully selected to bias the argument in favor of point-free style.

Often, after writing a function in pointful style, I get the computer to convert it automatically to point-free style, just to see what it looks like. This is usually educational, and sometimes I use the computed point-free definition instead. As I get better at understanding point-free programming style in Haskell, I am more and more likely to write certain functions point-free in the first place. For example, I recently wrote:

        soln = int 1 (srt (add one (neg (sqr soln))))

and then scratched my head, erased it, and replaced it with the equivalent:

        soln = int 1 ((srt . (add one) . neg . sqr) soln)

I could have factored out the int 1 too:
        soln = (int 1 . srt . add one . neg . sqr) soln

I could even have removed soln from the right-hand side:

        soln = fix (int 1 . srt . add one . neg . sqr)

but I am not yet a perfect sage.

Sometimes I opt for an intermediate form, one in which some of the arguments are explicit and some are implicit. For example, as an exercise I wrote a function numOccurrences which takes a value and a list and counts the number of times the value occurs in the list. A straightforward and conventional implementation is:

        numOccurrences x []     = 0
numOccurrences x (y:ys) =
if (x == y) then 1 + rest
else                 rest
where rest = numOccurrences x ys

but the partially point-free version I wrote was much better:

        numOccurrences x = length . filter (== x)

Once you see this, it's easy to go back to a fully pointful version:

        numOccurrences x y = length (filter (== x) y)

Or you can go the other way, to a point-free version:

        numOccurrences = (length .) . filter . (==)

which I find confusing.

Anyway, the point of this note is not to argue that the point-free style is better or worse than the pointful style. Sometimes I use the one, and sometimes the other. I just want to point out that the argument made by M. Lai is deceptive, because of the choice of examples. As an equally biased counterexample, consider:

        bar x = x*x + 2*x + 1

which the automatic converter informs me can be written in point-free style as:

        bar = (1 +) . ap ((+) . join (*)) (2 *)

Perusal of this example will reveal much to the attentive reader, including the definitions of join and ap. But I don't think many people would argue that it is an improvement on the original. (Maybe I'm wrong, and people would argue that it was an improvement. I won't know for sure until I have more experience.)

For some sort of balance, here is another example where I think the point-free version is at least as good as the pointful version: a recent comment on Reddit suggested a >>> operator that composes functions just like the . operator, but in the other order, so that:

        f >>> g = g . f

or, if you prefer:

        >>> f g x = g(f(x))

The point-free definition of >>> is:

        (>>>) = flip (.)

where the flip operator takes a function of two arguments and makes a new function that does the same thing, but with the arguments in the opposite order. Whatever your feelings about point-free style, it is undeniable that the point-free definition makes perfectly clear that >>> is nothing but . with its arguments in reverse order.