# The Universe of Discourse

Mon, 15 Oct 2018

Reading over my recent article complaining about the environment functor I realized there's yet another terminology problem that makes the discussion unnecessarily confusing. “The” environment functor isn't unique. There is a family of environment functors, one for each possible environment type e. If g is the environment functor at type e, a value of type g t is a function e → t. But e could be anything and if g and h are environment functors at two different types e and e’ they are of course different functors.

This is even obvious from the definition:

    data Environ e t = Env (e -> t)
instance Functor (Environ e) where
fmap f (Env x) = Env $\e -> f (x e)  The functor isn't Environ, it's Environ e, and the functor instance declaration, as it says on line 2. (It seems to me that the notation is missing a universal quantifier somewhere, but I'm not going to open that issue.) We should speak of Environ e as an environment functor, not the environment functor. So for example instead of: When operating in the environment functor, fmap has the type (a -> b) -> g a -> g b I should have said: When operating in an environment functor, fmap has the type (a -> b) -> g a -> g b And instead of: A function p -> q is a q parcel in the environment functor I should have said: A function p -> q is a q parcel in an environment functor or A function p -> q is a q parcel in the environment functor at p although I'm not sure I like the way the prepositions are proliferating there. The same issue affects ⸢the⸣ reader monad, ⸢the⸣ state monad, and many others. I'm beginning to find remarkable how much basic terminology Haskell is missing or gets wrong. Mathematicians have a very keen appreciation of the importance of specific and precise terminology, and you'd think this would have filtered into the Haskell world. People are forever complaining that Haskell uses unfamiliar terms like “functor”, and the community's response is (properly, I think) that these terms are pre-existing and there is no point to inventing a new term that will be just as unfamiliar, or, worse, lure people into thinking that the know what it means when they don't. You don't want to call a functor a “container”, says the argument, because many functors (environment functors for example) are nothing at all like containers. I think this is wise. But having planted their flag on that hill, the Haskell folks don't then use their own terminology correctly. I complained years ago that the term “monad” was used interchangeably for four subtly different concepts, and here we actually have a fifth. I pointed out that in the case of Environment e t, common usage refers to both Environment e and Environment e t as monads, and only the first is correct. But when people say “the environment monad” they mean that Environment itself is a monad, which it is not. Fri, 12 Oct 2018 Is there any good terminology for a value of type f a when f is an arbitrary functor? I will try calling an f t value a “t parcel” and see how that works. The more I think about “parcel” the happier I am with it. It strongly suggests container types, of course, so that a t parcel might be a boxful of ts. But it also hints at some other possible situations: • You might open the parcel and find it empty. (Maybe t) • You might open the parcel and find, instead of the t you expected, a surprising prank snake. (Either ErrorMessage t) • You might open the parcel and find that your t has been shipped with assembly required. (env -> t) • The parcel might explode when you open it. (IO t) • And, of course, a burrito is a sort of parcel of meat and beans. I coined “parcel” thinking that one would want different terminology for values of type f t depending on whether f was a functor (“parcel”) or also a monad (“mote”). Of course every mote is a parcel, but not always vice versa. Now I'm not sure that both terms are needed. Non-monadic functors are unusual, and non-applicative functors rare, so perhaps one term will do for all three. Thu, 11 Oct 2018 Here we have the well-known fmap function:  fmap :: Functor f => (a -> b) -> f a -> f b  It takes a single function and a (collection of input values / decorated input value / something something input value) and produces a (collection of output values / decorated output value / something something output value). Yow, that's not going to work. Is there any good terminology for a value of type f a when f is an arbitrary functor? A while back I discussed a similar problem and suggested the term “mote” for a value in a monadic type. I will try calling an f t value a “t parcel and see how that works. So [t], Maybe t, and IO t are all examples of t parcels, in various functors. Starting over then. Here we have the well-known fmap function:  fmap :: Functor f => (a -> b) -> f a -> f b  It takes a single function, and an a parcel, and produces a b parcel, by applying the function independently to the a values in the parcel. Here is a sort of reversed version of fmap that I call pamf:  pamf :: Functor f => f (a -> b) -> a -> f b  It takes a parcel of functions, and a single input and produces a parcel of outputs, by applying each function in the parcel independently to the single a value. It can be defined in terms of fmap:  pamf fs a = fmap ($ a) fs


So far so good. Now I ask you to predict the type of

    pamf fmap


Certainly it should start out with

    pamf fmap :: (Functor f, Functor g) => ...


because the pamf and the fmap might be operating in two different functors, right? Indeed, if I compose the functions the other way around, fmap pamf, the type does begin this way; it is:

    (Functor f, Functor g) => f (g (a -> b)) -> f (a -> g b)


The f here is the functor in which fmap operates, and the g is the functor in which pamf is operating. In general fmap takes an arbitrary function

              a       ->      b


and lifts it to a new function that operates in the f functor:

            f a       ->    f b


Here it has taken pamf, which is a function

          g (a -> b)  ->     (a -> g b)


and lifted it to a new function that operates in the f functor:

       f (g (a -> b))  ->  f (a -> g b)


This is complicated but straightforward. Okay, that was fmap pamf. What about pamf fmap though? The computed type is

        pamf fmap :: Functor f => f a -> (a -> b) -> f b


and when I saw this I said “What. Where did g go? What happened to g?”

Then I paused and for a while and said “… I bet it's that goddamn environment thing again.” Yep, that's what it was. It's the environment functor, always turning up where I don't want it and least expect it, like that one guy we all went to college with. The environment functor, by the way, is yet another one of those things that Haskell ought to have a standard name for, but doesn't. The phrase “the reader monad” is fairly common, but here I only want the functor part of the monad. And people variously say “reader monad”, “environment monad”, and “evaluation monad” to mean the same thing. In this article, it will be the environment functor.

Here's what happened. Here are fmap and pamf again:

    fmap :: Functor f => (p -> q) -> f p -> f q

pamf :: Functor g => g (a -> b) -> a -> g b


The first argument to pamf should be a parcel in the g functor. But fmap is not a parcel, so pamf fmap will be a type error, right? Wrong! If you are committed enough, there is a way to construe any function as a parcel. A function p -> q is a q parcel in the environment functor. Say that g denotes an environment functor. In this functor, a parcel of type g t is a function which consults an “environment” of type e and yields a result of type t. That is, $$g t \equiv e \to t.$$

When operating in the environment functor, fmap has the type (a -> b) -> g a -> g b, which is shorthand for (a -> b) -> (e -> a) -> (e -> b). This instance of fmap is defined this way:

    fmap f x = \e -> f (x e)


or shorter and more mysteriously

    fmap = (.)


which follows by η-reduction, something Haskell enthusiasts never seem to get enough of.

In fmap f x, the x isn't the actual value to give to f; instead it's a parcel, as it always is with fmap. In the context of the environment functor, x is a function that consults the environment e and returns an a. The result of fmap f x is a new parcel: it uses x to consult the supplied environment for a value of type a, which it then feeds to f to get the required value of type b.

In the application pamf fmap, the left side pamf wants fmap to be a parcel. But it's not a parcel, it's a function. So, type error, right? No! Any function is a parcel if you want it to be, it's a parcel in the environment functor! And fmap is a function:

    fmap :: Functor f => (p -> q) -> f p -> f q


so it can be understood as a parcel in the environment functor, where the environment e has type p -> q. Then pamf is operating in this environment functor, so $$g t = (p \to q) \to t.$$ A g t parcel is a function that consults an “environment” of type p -> q and somehow produces a t value. (Haskell folks, who are obsessed with currying all the things, will write this as the nearly unreadable g = ((->) (p -> q)).)

We wanted pamf to have this type:

    pamf :: Functor g =>            g (a -> b)  -> a ->            g b


and since Haskell has decided that g must be the environment functor with !!g x \equiv (p \to q) \to x!!, this is an abbreviation for:

    pamf ::              ((p -> q) -> (a -> b)) -> a -> ((p -> q) -> b)


To apply this to fmap, we have to unify the type of pamf's argument, which is (p -> q) -> (a -> b), and the type of fmap, which is (p -> q) -> (f p -> f q). Then !!a\equiv f p!! and !!b \equiv f q!!, so the result of pamf fmap is

    pamf fmap :: Functor f => f p -> ((p -> q) -> f q)


Where did g go? It was specialized to mean the environment functor ((->) (p -> q)), so it's gone.

The funny thing about the type of pamf fmap is that it is exactly the type of flip fmap, which is fmap with the order of its two arguments reversed:

   (flip fmap) x f ≡ fmap f x


and indeed, by some theorem or other, because the types are identical, the functions themselves must be identical also! (There are some side conditions, all of which hold here.) The two functions pamf fmap and flip fmap are identical. Analogous to the way fmap, restricted to the environment functor, is identical to (.), pamf, when similarly restricted, is exactly flip. You can even see this from its type:

    pamf :: ((p -> q) -> (a -> b)) -> a -> ((p -> q) -> b)


Or, cleaning up some superfluous parentheses and inserting some new ones:

    pamf :: ((p -> q) ->  a -> b) -> (a ->  (p -> q) -> b)


And putting !!c = p\to q!!:

    pamf :: (c        -> a -> b) -> (a -> c        -> b)
flip :: (                 the same                 )


Honestly, I would have preferred a type error: “Hey, dummy, fmap has the wrong type to be an argument to pamf, which wants a functorial value.” Instead I got “Okay, if you want functions to be a kind of functor I can do that, also wouldn't it be simpler if the universe was two-dimensional and there were only three kinds of quarks? Here you go, no need to thank me!” Maybe someone can explain to me why this is a useful behavior, and then explain why it is so useful that it should happen automatically and implicitly instead of being triggered by some lexical marker like:

    newtype Environment e a = Environment (e -> a)
instance Functor (Environment e) where
fmap f (Environment x) = Environment $\e -> f (x e)  I mean, seriously, suppose you wrote a + b where b was accidentally a function instead of a number. What if when you did that, instead of a type error, Haskell would silently shift into some restricted domain in which it could implicitly interpret b as a number in some weird way and give you something totally bizarre? Isn't the whole point of Haskell supposed to be that it doesn't implicitly convert things that way? Sat, 08 Sep 2018 Here's something else that often goes wrong when I am writing a Haskell program. It's related to the problem in the previous article but not the same. Let's say I'm building a module for managing polynomials. Say Polynomial a is the type of (univariate) polynomials over some number-like set of coefficients a. Now clearly this is going to be a functor, so I define the Functor instance, which is totally straightforward:  instance Functor Polynomial where fmap f (Poly a) = Poly$ map f a


Then I ask myself if it is also going to be an Applicative. Certainly the pure function makes sense; it just lifts a number to be a constant polynomial:

       pure a = Poly [a]


But what about <*>? This would have the type:

    (Polynomial (a -> b)) -> Polynomial a -> Polynomial b


The first argument there is a polynomial whose coefficients are functions. This is not something we normally deal with. That ought to be the end of the matter.

But instead I pursue it just a little farther. Suppose we did have such an object. What would it mean to apply a functional polynomial and an ordinary polynomial? Do we apply the functions on the left to the coefficients on the right and then collect like terms? Say for example

\begin{align} \left((\sqrt\bullet) \cdot x + \left(\frac1\bullet\right) \cdot 1 \right) ⊛ (9x+4) & = \sqrt9 x^2 + \sqrt4 x + \frac19 x + \frac14 \\ & = 3x^2 + \frac{19}{9} x + \frac 14 \end{align}

Well, this is kinda interesting. And it would mean that the pure definition wouldn't be what I said; instead it would lift a number to a constant function:

    pure a = Poly [λ_ -> a]


Then the ⊛ can be understood to be just like polynomial multiplication, except that coefficients are combined with function composition instead of with multiplication. The operation is associative, as one would hope and expect, and even though the ⊛ operation is not commutative, it has a two-sided identity element, which is Poly [id]. Then I start to wonder if it's useful for anything, and how ⊛ interacts with ordinary multiplication, and so forth.

This is different from the failure mode of the previous article because in that example I was going down a Haskell rabbit hole of more and more unnecessary programming. This time the programming is all trivial. Instead, I've discovered a new kind of mathematical operation and I abandon the programming entirely and go off chasing a mathematical wild goose.

Mon, 03 Sep 2018

Whenever I try to program in Haskell, the same thing always goes wrong. Here is an example.

I am writing a module to operate on polynomials. The polynomial !!x^3 - 3x + 1!! is represented as

    Poly [1, -3, 0, 1]


[ Addendum 20180904: This is not an error. The !!x^3!! term is last, not first. Much easier that way. Fun fact: two separate people on Reddit both commented that I was a dummy for not doing it the easy way, which is the way I did do it. Fuckin' Reddit, man. ]

I want to add two polynomials. To do this I just add the corresponding coefficients, so it's just

    (Poly a) + (Poly b) = Poly $zipWith (+) a b  Except no, that's wrong, because it stops too soon. When the lists are different lengths, zipWith discards the extra, so for example it says that !!(x^2 + x + 1) + (2x + 2) = 3x + 3!!, because it has discarded the extra !!x^2!! term. But I want it to keep the extra, as if the short list was extended with enough zeroes. This would be a correct implementation:  (Poly a) + (Poly b) = Poly$ addup a b   where


and I can write this off the top of my head.

But do I? No, this is where things go off the rails. “I ought to be able to generalize this,” I say. “I can define a function like zipWith that is defined over any Monoid, it will combine the elements pairwise with mplus, and when one of the lists runs out, it will pretend that that one has some memptys stuck on the end.” Here I am thinking of something like ffff :: Monoid a => [a] -> [a] -> [a], and then the (+) above would just be

    (Poly a) + (Poly b) = Poly (ffff a b)


as long as there is a suitable Monoid instance for the as and bs.

I could write ffff in two minutes, but instead I spend fifteen minutes looking around in Hoogle to see if there is already an ffff, and I find mzip, and waste time being confused by mzip, until I notice that I was only confused because mzip is for Monad, not for Monoid, and is not what I wanted at all.

So do I write ffff and get on with my life? No, I'm still not done. It gets worse. “I ought to be able to generalize this,” I say. “It makes sense not just for lists, but for any Traversable… Hmm, or does it?” Then I start thinking about trees and how it should decide when to recurse and when to give up and use mempty, and then I start thinking about the Maybe version of it.

Then I open a new file and start writing

    mzip :: (Traversable f, Monoid a) => f a -> f a -> f a
mzip as bs = …


And I go father and farther down the rabbit hole and I never come back to what I was actually working on. Maybe the next step in this descent into madness is that I start thinking about how to perform unification of arbitrary algebraic data structures, I abandon mzip and open a new file for defining class Unifiable

Actually when I try to program in Haskell there a lot of things that go wrong and this is only one of them, but it seems like this one might be more amenable to a quick fix than some of the other things.

[ Addendum 20180904: A lobste.rs user points out that I don't need Monoid, but only Semigroup, since I don't need mempty. True that! I didn't know there was a Semigroup class. ]

Wed, 08 Aug 2018

[ Previously: [1] [2] ]

In my original article, I said:

I was fairly confident I had seen something like this somewhere before, and that it was not original to me.

Jeremy Yallop brought up an example that I had definitely seen before.

In 2008 Conor McBride and Ross Paterson wrote an influential paper, “Idioms: applicative programming with effects” that introduced the idea of an applicative functor, a sort of intermediate point between functors and monads. It has since made its way into standard Haskell and was deemed sufficiently important to be worth breaking backward compatibility.

McBride and Paterson used several notations for operations in an applicative functor. Their primary notation was !!\iota!! for what is now known as pure and !!\circledast!! for what has since come to be written as <*>. But the construction

$$\iota f \circledast is_1 \circledast \ldots \circledast is_n$$

came up so often they wanted a less cluttered notation for it:

We therefore find it convenient, at least within this paper, to write this form using a special notation

$$[\![ f is_1 \ldots is_n ]\!]$$

The brackets indicate a shift into an idiom where a pure function is applied to a sequence of computations. Our intention is to provide a sufficient indication that effects are present without compromising the readability of the code.

On page 5, they suggested an exercise:

… show how to replace !![\![!! and !!]\!]!! by identifiers iI and Ii whose computational behaviour delivers the above expansion.

They give a hint, intended to lead the reader to the solution, which involves a function named iI that does some legerdemain on the front end and then a singleton type data Ii = Ii that terminates the legerdemain on the back end. The upshot is that one can write

iI f x y Ii


and have it mean

(pure f) <*> x <*> y


The haskell wiki has details, written by Don Stewart when the McBride-Paterson paper was still in preprint. The wiki goes somewhat further, also defining

 data J = J


so that

iI f x y J z Ii


now does a join on the result of f x y before applying the result to z.

I have certainly read this paper more than once, and I was groping for this example while I was writing the original article, but I couldn't quite put my finger on it. Thank you, M. Yallop!

[ By the way, I am a little bit disappointed that the haskell wiki is not called “Hicki”. ]

In the previous article I described a rather odd abuse of the Haskell type system to use a singleton type as a sort of pseudo-keyword, and asked if anyone had seen this done elsewhere.

Joachim Breitner reported having seen this before. Most recently in LiquidHaskell, which defines a QED singleton type:

 data QED = QED
infixl 2 ***

(***) :: a -> QED -> Proof
_ *** _ = ()


so that they can end every proof with *** QED:

singletonP x
=   reverse [x]
==. reverse [] ++ [x]
==. [] ++ [x]
==. [x]
*** QED


This example is from Vazou et al., Functional Pearl: Theorem Proving for All, p. 3. The authors explain: “The QED argument serves a purely aesthetic purpose, allowing us to conclude proofs with *** QED.”.

Or see the examples from the bottom of the LH splash page, proving the associative law for ++.

I looked in the rest of the LiquidHaskell distribution but did not find any other uses of the singleton-type trick. I would still be interested to see more examples.

A friend asked me the other day about techniques in Haskell to pretend to make up keywords. For example, suppose we want something like a (monadic) while loop, say like this:

      while cond act =
cond >>= \b -> if b then act >> while cond act
else return ()


This uses a condition cond (which might be stateful or exception-throwing or whatever, but which must yield a boolean value) and an action act (likewise, but its value is ignored) and it repeates the action over and over until the condition is false.

Now suppose for whatever reason we don't like writing it as while condition action and we want instead to write while condition do action or something of that sort. (This is a maximally simple example, but the point should be clear even though it is silly.) My first suggestion was somewhat gross:

      while c _ a = ...


Now we can write

      while condition "do" action


and the "do" will be ignored. Unfortunately we can also write while condition "wombat" action and you know how programmers are when you give them enough rope.

But then I had a surprising idea. We can define it this way:

      data Do = Do
while c Do a = ...


Now we write

      while condition
Do action


and if we omit or misspell the Do we get a compile-time type error that is not even too obscure.

For a less trivial (but perhaps sillier) example, consider:

    data Exception a = OK a | Exception String

data Catch = Catch
data OnSuccess = OnSuccess
data AndThen = AndThen

try computation Catch handler OnSuccess success AndThen continuation =
case computation of OK a        -> success >> (OK a) >>= continuation
Exception e ->            (handler e) >>= continuation


The idea here is that we want to try a computation, and do one thing if it succeeds and another if it throws an exception. The point is not the usefulness of this particular and somewhat contrived exception handling construct, it's the syntactic sugar of the Catch, OnSuccess, and AndThen:

    try (evaluate some_expression)
Catch (\error -> case error of "Divison by zero" -> ...
... )
OnSuccess ...
AndThen ...


I was fairly confident I had seen something like this somewhere before, and that it was not original to me. But I've asked several Haskell experts and nobody has said it was familar. I thought perhaps I had seen it somewhere in Brent Yorgey's code, but he vehemently denied it.

So my question is, did I make up this technique of using a one-element type as a pretend keyword?

[ Addendum: At least one example of this trick appears in LiquidHaskell. I would be interested to hear about other places it has been used. ]

[ Addendum: Jeremy Yallop points out that a similar trick was hinted at in McBride and Paterson “Idioms: applicative programming with effects” (2008), with which I am familiar, although their trick is both more useful and more complex. So this might have been what I was thinking of. ]

Fri, 24 Apr 2015

Suppose you would like to perform an exhaustive search. Let's say for concreteness that we would like to solve this cryptarithm puzzle:

    S E N D
+   M O R E
-----------
M O N E Y


This means that we want to map the letters S, E, N, D, M, O, R, Y to distinct digits 0 through 9 to produce a five-digit and two four-digit numerals which, when added in the indicated way, produce the indicated sum.

(This is not an especially difficult example; my 10-year-old daughter Katara was able to solve it, with some assistance, in about 30 minutes.)

If I were doing this in Perl, I would write up either a recursive descent search or a solution based on a stack or queue of partial solutions which the program would progressively try to expand to a full solution, as per the techniques of chapter 5 of Higher-Order Perl. In Haskell, we can use the list monad to hide all the searching machinery under the surface. First a few utility functions:

    import Control.Monad (guard)

digits = [0..9]

to_number = foldl (\a -> \b -> a*10 + b) 0
remove rs ls = foldl remove' ls rs
where remove' ls x = filter (/= x) ls


to_number takes a list of digits like [1,4,3] and produces the number they represent, 143. remove takes two lists and returns all the things in the second list that are not in the first list. There is probably a standard library function for this but I don't remember what it is. This version is !!O(n^2)!!, but who cares.

Now the solution to the problem is:

    --     S E N D
--   + M O R E
--   ---------
--   M O N E Y

solutions = do
s <- remove [0] digits
e <- remove [s] digits
n <- remove [s,e] digits
d <- remove [s,e,n] digits
let send = to_number [s,e,n,d]
m <- remove [0,s,e,n,d] digits
o <- remove [s,e,n,d,m] digits
r <- remove [s,e,n,d,m,o] digits
let more = to_number [m,o,r,e]
y <- remove [s,e,n,d,m,o,r] digits
let money = to_number [m,o,n,e,y]
guard $send + more == money return (send, more, money)  Let's look at just the first line of this:  solutions = do s <- remove [0] digits …  The do notation is syntactic sugar for  (remove [0] digits) >>= \s -> …  where “…” is the rest of the block. To expand this further, we need to look at the overloading for >>= which is implemented differently for every type. The mote on the left of >>= is a list value, and the definition of >>= for lists is:  concat$ map (\s -> …) (remove [0] digits)


where “…” is the rest of the block.

So the variable s is bound to each of 1,2,3,4,5,6,7,8,9 in turn, the rest of the block is evaluated for each of these nine possible bindings of s, and the nine returned lists of solutions are combined (by concat) into a single list.

The next line is the same:

      e <- remove [s] digits


for each of the nine possible values for s, we loop over nine value for e (this time including 0 but not including whatever we chose for s) and evaluate the rest of the block. The nine resulting lists of solutions are concatenated into a single list and returned to the previous map call.

      n <- remove [s,e] digits
d <- remove [s,e,n] digits


This is two more nested loops.

      let send = to_number [s,e,n,d]


At this point the value of send is determined, so we compute and save it so that we don't have to repeatedly compute it each time through the following 300 loop executions.

      m <- remove [0,s,e,n,d] digits
o <- remove [s,e,n,d,m] digits
r <- remove [s,e,n,d,m,o] digits
let more = to_number [m,o,r,e]


Three more nested loops and another computation.

      y <- remove [s,e,n,d,m,o,r] digits
let money = to_number [m,o,n,e,y]


Yet another nested loop and a final computation.

      guard $send + more == money return (send, more, money)  This is the business end. I find guard a little tricky so let's look at it slowly. There is no binding (<-) in the first line, so these two lines are composed with >> instead of >>=:  (guard$ send + more == money) >> (return (send, more, money))


which is equivalent to:

      (guard $send + more == money) >>= (\_ -> return (send, more, money))  which means that the values in the list returned by guard will be discarded before the return is evaluated. If send + more == money is true, the guard expression yields [()], a list of one useless item, and then the following >>= loops over this one useless item, discards it, and returns yields a list containing the tuple (send, more, money) instead. But if send + more == money is false, the guard expression yields [], a list of zero useless items, and then the following >>= loops over these zero useless items, never runs return at all, and yields an empty list. The result is that if we have found a solution at this point, a list containing it is returned, to be concatenated into the list of all solutions that is being constructed by the nested concats. But if the sum adds up wrong, an empty list is returned and concated instead. After a few seconds, Haskell generates and tests 1.36 million choices for the eight bindings, and produces the unique solution:  [(9567,1085,10652)]  That is:  S E N D 9 5 6 7 + M O R E + 1 0 8 5 ----------- ----------- M O N E Y 1 0 6 5 2  It would be an interesting and pleasant exercise to try to implement the same underlying machinery in another language. I tried this in Perl once, and I found that although it worked perfectly well, between the lack of the do-notation's syntactic sugar and Perl's clumsy notation for lambda functions (sub { my ($s) = @_; … } instead of \s -> …) the result was completely unreadable and therefore unusable. However, I suspect it would be even worse in Python because of semantic limitations of that language. I would be interested to hear about this if anyone tries it.

[ Addendum: Thanks to Tony Finch for pointing out the η-reduction I missed while writing this at 3 AM. ]

[ Addendum: Several people so far have misunderstood the question about Python in the last paragraph. The question was not to implement an exhaustive search in Python; I had no doubt that it could be done in a simple and clean way, as it can in Perl. The question was to implement the same underlying machinery, including the list monad and its bind operator, and to find the solution using the list monad.

[ Peter De Wachter has written in with a Python solution that clearly demonstrates that the problems I was worried about will not arise, at least for this task. I hope to post his solution in the next few days. ]

[ Addendum 20150803: De Wachter's solution and one in Perl ]

Thu, 26 Aug 2010

I think one problem (of many) that beginners might have with Haskell monads is the confusing terminology. The word "monad" can refer to four related but different things:

2. When a type constructor T of kind ∗ → ∗ is an instance of Monad we say that T "is a monad". For example, "Tree is a monad"; "((→) a) is a monad". This is the only usage that is strictly corrrect.

3. Types resulting from the application of monadic type constructors (#2) are sometimes referred to as monads. For example, "[Integer] is a monad".

Usage #1 is not a real problem; it does not occur that often, and is readily distinguished by context, capitalization, type font, and other markers. #2 is actually correct, so there is no problem there. #3 seems to be an uncommon colloquialism.

The most serious problem here is #4, that people refer to individual values of monadic types as "monads". Even when they don't do this, they are hampered by the lack of a good term for it. As I know no good alternative has been proposed. People often say "monadic value" (I think), which is accurate, but something of a mouthful.

One thing I have discovered in my writing life is that the clarity of a confusing document can sometimes be improved merely by replacing a polysyllabic noun phrase with a monosyllable. For example, chapter 3 of Higher-Order Perl discussed the technique of memoizing a function by generating an anonymous replacement for it that maintains a cache and calls the real function on a cache miss. Early drafts were hard to understand, and improved greatly when I replaced the phrase "anonymous replacement function" with "stub". The Perl documentation was significantly improved merely by replacing "associative array" everywhere with "hash" and "funny punctuation character" with "sigil".

I think a monosyllabic replacement for "monadic value" would be a similar boon to discussion of monads, not just for beginners but for everyone else too. The drawback, of introducing yet another jargon term, would in this case be outweighed by the benefits. Jargon can obscure, but sometimes it can clarify.

The replacement word should be euphonious, clear but not overly specific, and not easily confused with similar jargon words. It would probably be good for it to begin with the letter "m". I suggest:

mote

So return takes a value and returns a mote. The >>= function similarly lifts a function on pure values to a function on motes; when the mote is a container one may think of >>= as applying the function to the values in the container. [] is a monad, so lists are motes. The expression on the right-hand side of a var ← expr in a do-block must have mote type; it binds the mote on the right to the name on the left, using the >>= operator.

I have been using this term privately for several months, and it has been a small but noticeable success. Writing and debugging monadic programs is easier because I have a simple name for the motes that the program manipulates, which I can use when I mumble to myself: "What is the type error here? Oh, commit should be returning a mote." And then I insert return in the right place.

I'm don't want to oversell the importance of this invention. But there is clearly a gap in the current terminology, and I think it is well-filled by "mote".

(While this article was in progress I discovered that What a Monad is not uses the nonceword "mobit". I still prefer "mote".)

Sun, 03 Jan 2010

A short bibliography of probability monads
Several people helpfully wrote to me to provide references to earlier work on probability distribution monads. Here is a summary:

My thanks to Stephen Tetley, Gaal Yahas, and Luke Palmer for these.

I did not imagine that my idea was a new one. I arrived at it by thinking about List as a representation of non-deterministic computation. But if you think of it that way, the natural interpretation is that every list element represents an equally likely outcome, and so annotating the list elements with probabilities is the obvious next step. So the existence of the Erwig library was not a big surprise.

A little more surprising though, were the references in the Erwig paper. Specifically, the idea dates back to at least 1981; Erwig cites a paper that describes the probability monad in a pure-mathematics context.

Nobody responded to my taunting complaint about Haskell's failure to provide support a good monad of sets. It may be that this is because they all agree with me. (For example, the documentation of the Erwig package says "Unfortunately we cannot use a more efficient data structure because the key type must be of class Ord, but the Monad class does not allow constraints for result types.") But a number of years ago I said that the C++ macro processor blows goat dick. I would not have put it so strongly had I not naïvely believed that this was a universally-held opinion. But no, plenty of hapless C++ programmers wrote me indignant messages defending their macro system. So my being right is no guarantee that language partisans will not dispute with me, and the Haskell community's failure to do so in this case reflects well on them, I think.

Thu, 31 Dec 2009

A monad for probability and provenance
I don't quite remember how I arrived at this, but it occurred to me last week that probability distributions form a monad. This is the first time I've invented a new monad that I hadn't seen before; then I implemented it and it behaved pretty much the way I thought it would. So I feel like I've finally arrived, monadwise.

Suppose a monad value represents all the possible outcomes of an event, each with a probability of occurrence. For concreteness, let's suppose all our probability distributions are discrete. Then we might have:

	data ProbDist p a = ProbDist [(a,p)] deriving (Eq, Show)
unpd (ProbDist ps) = ps

Each a is an outcome, and each p is the probability of that outcome occurring. For example, biased and unbiased coins:

    unbiasedCoin = ProbDist [ ("heads", 0.5),
("tails", 0.5) ];

biasedCoin   = ProbDist [ ("heads", 0.6),
("tails", 0.4) ];


Or a couple of simple functions for making dice:

    import Data.Ratio

d sides = ProbDist [(i, 1 % sides) | i <- [1 .. sides]]
die = d 6


d n is an n-sided die.

The Functor instance is straightforward:

    instance Functor (ProbDist p) where
fmap f (ProbDist pas) = ProbDist $map (\(a,p) -> (f a, p)) pas  The Monad instance requires return and >>=. The return function merely takes an event and turns it into a distribution where that event occurs with probability 1. I find join easier to think about than >>=. The join function takes a nested distribution, where each outcome of the outer distribution specifies an inner distribution for the actual events, and collapses it into a regular, overall distribution. For example, suppose you put a biased coin and an unbiased coin in a bag, then pull one out and flip it:  bag :: ProbDist Double (ProbDist Double String) bag = ProbDist [ (biasedCoin, 0.5), (unbiasedCoin, 0.5) ]  The join operator collapses this into a single ProbDist Double String:  ProbDist [("heads",0.3), ("tails",0.2), ("heads",0.25), ("tails",0.25)]  It would be nice if join could combine the duplicate heads into a single ("heads", 0.55) entry. But that would force an Eq a constraint on the event type, which isn't allowed, because (>>=) must work for all data types, not just for instances of Eq. This is a problem with Haskell, not with the monad itself. It's the same problem that prevents one from making a good set monad in Haskell, even though categorially sets are a perfectly good monad. (The return function constructs singletons, and the join function is simply set union.) Maybe in the next language. Perhaps someone else will find the >>= operator easier to understand than join? I don't know. Anyway, it's simple enough to derive once you understand join; here's the code:  instance (Num p) => Monad (ProbDist p) where return a = ProbDist [(a, 1)] (ProbDist pas) >>= f = ProbDist$ do
(a, p) <- pas
let (ProbDist pbs) = f a
(b, q) <- pbs
return (b, p*q)

So now we can do some straightforward experiments:

	liftM2 (+) (d 6) (d 6)

ProbDist [(2,1 % 36),(3,1 % 36),(4,1 % 36),(5,1 % 36),(6,1 %
36),(7,1 % 36),(3,1 % 36),(4,1 % 36),(5,1 % 36),(6,1 %
36),(7,1 % 36),(8,1 % 36),(4,1 % 36),(5,1 % 36),(6,1 %
36),(7,1 % 36),(8,1 % 36),(9,1 % 36),(5,1 % 36),(6,1 %
36),(7,1 % 36),(8,1 % 36),(9,1 % 36),(10,1 % 36),(6,1 %
36),(7,1 % 36),(8,1 % 36),(9,1 % 36),(10,1 % 36),(11,1 %
36),(7,1 % 36),(8,1 % 36),(9,1 % 36),(10,1 % 36),(11,1 %
36),(12,1 % 36)]

This is nasty-looking; we really need to merge the multiple listings of the same event. Here is a function to do that:

        agglomerate :: (Num p, Eq b) => (a -> b) -> ProbDist p a -> ProbDist p b
agglomerate f pd = ProbDist $foldr insert [] (unpd (fmap f pd)) where insert (k, p) [] = [(k, p)] insert (k, p) ((k', p'):kps) | k == k' = (k, p+p'):kps | otherwise = (k', p'):(insert (k,p) kps) agg :: (Num p, Eq a) => ProbDist p a -> ProbDist p a agg = agglomerate id  Then agg$ liftM2 (+) (d 6) (d 6) produces:

        ProbDist [(12,1 % 36),(11,1 % 18),(10,1 % 12),(9,1 % 9),
(8,5 % 36),(7,1 % 6),(6,5 % 36),(5,1 % 9),
(4,1 % 12),(3,1 % 18),(2,1 % 36)]

Hey, that's correct.

There must be a shorter way to write insert. It really bothers me, because it looks look it should be possible to do it as a fold. But I couldn't make it look any better.

You are not limited to calculating probabilities. The monad actually will count things. For example, let us throw three dice and count how many ways there are to throw various numbers of sixes:

        eq6 n = if n == 6 then 1 else 0
agg $liftM3 (\a b c -> eq6 a + eq6 b + eq6 c) die die die ProbDist [(3,1),(2,15),(1,75),(0,125)]  There is one way to throw three sixes, 15 ways to throw two sixes, 75 ways to throw one six, and 125 ways to throw no sixes. So ProbDist is a misnomer. It's easy to convert counts to probabilities:  probMap :: (p -> q) -> ProbDist p a -> ProbDist q a probMap f (ProbDist pds) = ProbDist$ (map (\(a,p) -> (a, f p))) pds

normalize :: (Fractional p) => ProbDist p a -> ProbDist p a
normalize pd@(ProbDist pas) = probMap (/ total) pd where
total = sum . (map snd) $pas normalize$ agg $probMap toRational$
liftM3 (\a b c -> eq6 a + eq6 b + eq6 c) die die die

ProbDist [(3,1 % 216),(2,5 % 72),(1,25 % 72),(0,125 % 216)]

I think this is the first time I've gotten to write die die die in a computer program.

The do notation is very nice. Here we calculate the distribution where we roll four dice and discard the smallest:

        stat = do
a <- d 6
b <- d 6
c <- d 6
d <- d 6
return (a+b+c+d - minimum [a,b,c,d])

probMap fromRational \$ agg stat

ProbDist [(18,1.6203703703703703e-2),
(17,4.1666666666666664e-2), (16,7.253086419753087e-2),
(15,0.10108024691358025),   (14,0.12345679012345678),
(13,0.13271604938271606),   (12,0.12885802469135801),
(11,0.11419753086419752),   (10,9.41358024691358e-2),
(9,7.021604938271606e-2),   (8,4.7839506172839504e-2),
(7,2.9320987654320986e-2),  (6,1.6203703703703703e-2),
(5,7.716049382716049e-3),   (4,3.0864197530864196e-3),
(3,7.716049382716049e-4)]


One thing I was hoping to get didn't work out. I had this idea that I'd be able to calculate the outcome of a game of craps like this:

	dice = liftM2 (+) (d 6) (d 6)

point n = do
roll <- dice
case roll of 7 -> return "lose"
_ | roll == n  = "win"
_ | otherwise  = point n

craps = do
roll <- dice
case roll of 2 -> return "lose"
3 -> return "lose"
4 -> point 4
5 -> point 5
6 -> point 6
7 -> return "win"
8 -> point 8
9 -> point 9
10 -> point 10
11 -> return "win"
12 -> return "lose"

This doesn't work at all; point is an infinite loop because the first value of dice, namely 2, causes a recursive call. I might be able to do something about this, but I'll have to think about it more.

It also occurred to me that the use of * in the definition of >>= / join could be generalized. A couple of years back I mentioned a paper of Green, Karvounarakis, and Tannen that discusses "provenance semirings". The idea is that each item in a database is annotated with some "provenance" information about why it is there, and you want to calculate the provenance for items in tables that are computed from table joins. My earlier explanation is here.

One special case of provenance information is that the provenances are probabilities that the database information is correct, and then the probabilities are calculated correctly for the joins, by multiplication and addition of probabilities. But in the general case the provenances are opaque symbols, and the multiplication and addition construct regular expressions over these symbols. One could generalize ProbDist similarly, and the ProbDist monad (even more of a misnomer this time) would calculate the provenance automatically. It occurs to me now that there's probably a natural way to view a database table join as a sort of Kleisli composition, but this article has gone on too long already.

Happy new year, everyone.

[ Addendum 20100103: unsurprisingly, this is not a new idea. Several readers wrote in with references to previous discussion of this monad, and related monads. It turns out that the idea goes back at least to 1981. ]

My thanks to Graham Hunter for his donation.

Tue, 16 Jun 2009

The Haskell folks have chosen a new logo.

Ouch.

Thu, 03 Jan 2008

Note on point-free programming style
This old comp.lang.functional article by Albert Y. C. Lai, makes the point that Unix shell pipeline programming is done in an essentially "point-free" style, using the shell example:

    grep '^X-Spam-Level' | sort | uniq | wc -l


    length . nub . sort . filter (isPrefixOf "X-Spam-Level")

Neither one explicitly mentions its argument, which is why this is "point-free". In "point-free" programming, instead of defining a function in terms of its effect on its arguments, one defines it by composing the component functions themselves, directly, with higher-order operators. For example, instead of:

  foo x y = 2 * x + y

one has, in point-free style:

  foo = (+) . (2 *)

where (2 *) is the function that doubles its argument, and (+) is the (curried) addition function. The two definitions of foo are entirely equivalent.

As the two examples should make clear, point-free style is sometimes natural, and sometimes not, and the example chosen by M. Lai was carefully selected to bias the argument in favor of point-free style.

Often, after writing a function in pointful style, I get the computer to convert it automatically to point-free style, just to see what it looks like. This is usually educational, and sometimes I use the computed point-free definition instead. As I get better at understanding point-free programming style in Haskell, I am more and more likely to write certain functions point-free in the first place. For example, I recently wrote:

        soln = int 1 (srt (add one (neg (sqr soln))))

and then scratched my head, erased it, and replaced it with the equivalent:

        soln = int 1 ((srt . (add one) . neg . sqr) soln)

I could have factored out the int 1 too:
        soln = (int 1 . srt . add one . neg . sqr) soln

I could even have removed soln from the right-hand side:

        soln = fix (int 1 . srt . add one . neg . sqr)

but I am not yet a perfect sage.

Sometimes I opt for an intermediate form, one in which some of the arguments are explicit and some are implicit. For example, as an exercise I wrote a function numOccurrences which takes a value and a list and counts the number of times the value occurs in the list. A straightforward and conventional implementation is:

        numOccurrences x []     = 0
numOccurrences x (y:ys) =
if (x == y) then 1 + rest
else                 rest
where rest = numOccurrences x ys

but the partially point-free version I wrote was much better:

        numOccurrences x = length . filter (== x)

Once you see this, it's easy to go back to a fully pointful version:

        numOccurrences x y = length (filter (== x) y)

Or you can go the other way, to a point-free version:

        numOccurrences = (length .) . filter . (==)

which I find confusing.

Anyway, the point of this note is not to argue that the point-free style is better or worse than the pointful style. Sometimes I use the one, and sometimes the other. I just want to point out that the argument made by M. Lai is deceptive, because of the choice of examples. As an equally biased counterexample, consider:

        bar x = x*x + 2*x + 1

which the automatic converter informs me can be written in point-free style as:

        bar = (1 +) . ap ((+) . join (*)) (2 *)

Perusal of this example will reveal much to the attentive reader, including the definitions of join and ap. But I don't think many people would argue that it is an improvement on the original. (Maybe I'm wrong, and people would argue that it was an improvement. I won't know for sure until I have more experience.)

For some sort of balance, here is another example where I think the point-free version is at least as good as the pointful version: a recent comment on Reddit suggested a >>> operator that composes functions just like the . operator, but in the other order, so that:

        f >>> g = g . f

or, if you prefer:

        >>> f g x = g(f(x))

The point-free definition of >>> is:

        (>>>) = flip (.)

where the flip operator takes a function of two arguments and makes a new function that does the same thing, but with the arguments in the opposite order. Whatever your feelings about point-free style, it is undeniable that the point-free definition makes perfectly clear that >>> is nothing but . with its arguments in reverse order.