The Universe of Discourse


Sat, 20 Oct 2018

I struggle to understand Traversable

Haskell has evolved a lot since the last time I seriously wrote any Haskell code, so much so that all my old programs broke. My Monad instances don't compile any more because I'm no longer allowed to have a monad which isn't also an instance of Applicative. Last time I used Haskell, Applicative wasn't even a thing. I had read the McBride and Paterson paper that introduced applicative functors, but that was years ago, and I didn't remember any of the details. (In fact, while writing this article, I realized that the paper I read was a preprint, and I probably read it before it was published, in 2008.) So to resuscitate my old code I had to implement a bunch of <*> functions, and since I didn't really understand what <*> was supposed to be doing, I couldn't do that. It was a very annoying experience.

Anyway I got that more or less under control (maybe I'll publish a writeup of that later) and moved on to Traversable which, I hadn't realized before, was also introduced in that same paper. (In the prepublication version, Traversable had been given the unmemorable name IFunctor.) I had casually looked into this several times in the last few years but I never found anything enlightening. A Traversable is a functor (which must also implement Foldable, but let's pass over that for now, no pun intended) that implements a traverse method with the following signature:

    traverse :: Applicative f => (a -> f b) -> t a -> f (t b)

The traversable functor itself here is t. The f thing is an appurtenance. Often one looks at the type of some function and says “Oh, that's what that does”, but I did not get any understanding from this signature.

The first thing to try here is to make it less abstract. I was thinking about Traversable this time because I thought I might want it for a certain type of tree structure I was working with. So I defined an even simpler tree structure:

    data Tree a = Con a | Add (Tree a) (Tree a)
        deriving (Eq, Show)

Defining a bunch of other cases wouldn't add anything to my understanding, and it would make it take longer to try stuff, so I really want to use the simplest possible example here. And this is it: one base case, one recursive case.

Then I tried to make this type into a Traversable instance. First we need it to be a Functor, which is totally straightforward:

    instance Functor Tree where
        fmap f (Con a) = Con (f a)
        fmap f (Add x y) = Add (fmap f x) (fmap f y)

Then we need it to be a Foldable, which means it needs to provide a version of foldr. The old-fashioned foldr was

    foldr :: (a -> b -> b) -> b -> [a] -> b

but these days the list functor in the third place has been generalized:

    foldr :: Foldable f => (a -> b -> b) -> b -> f a -> b

The idea is that foldr fn collapses a list of as into a single b value by feeding in the as one at a time. Each time, foldr takes the previous b and the current a and constructs a new b. The second argument is the initial value of b. Another way to think about it is that every list has the form

    e1 : e2 : .... : []

and foldr fn b applied to this list replaces the (:) calls with fn and the trailing [] with b, giving me

    e1 `fn` e2 `fn` .... `fn` b

The canonical examples for lists are:

    sum = foldr (+) 0

(add up the elements, starting with zero) and

    length = foldr (\_ -> (+ 1)) 0

(ignore the elements, adding 1 to the total each time, starting with zero). Also foldr (:) [] is the identity function for lists because it replaces the (:) calls with (:) and the trailing [] with [].

Anyway for Tree it looks like this:

    instance Foldable Tree where
        foldr f b (Con a) = f a b
        foldr f b (Add x y) = (foldr f) (foldr f b x) y

The Con clause says to take the constant value and combine it with the default total. The Add clause says to first fold up the left-side subtree x to a single value, then use that as the initial value for folding up the right-side subtree y, so everything gets all folded up together. (We could of course do the right subtree before the left; the results would be different but just as good.)
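
A quick sanity check (the leaves are 3 and 4, so folding with (+) from 0 ought to give 7):

    > foldr (+) 0 (Add (Con 3) (Con 4))
    7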

I didn't write this off the top of my head, I got it by following the types, like this:

  1. In the first clause

        foldr f b (Con a) = ???
    

    we have a function f that wants an a value and a b value, and we have both an a and a b, so put the tabs in the slots.

  2. In the second clause

        foldr f b (Add x y) = ???
    

    f needs an a value and none is available, so we can't use f by itself. We can only use it recursively via foldr. So forget f, we will be dealing only with foldr f, which has type b -> Tree a -> b. We need to apply this to a b value and the only one we have is b, and then we need to apply that to one of the subtrees, say x, and thus we have synthesized the foldr f b x subexpression. Then pretty much the same process gets us the rest of it: we need a b and the only one we have now is foldr f b x, and then we need another tree and the only one we haven't used is y.

It turns out it is easier and more straightforward to write foldMap instead, but I didn't know that at the time. I won't go into it further because I have already digressed enough.
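
For completeness, here's a guess at what the foldMap version would look like (instead of the foldr version above); it maps each leaf into a monoid and combines the results with <>. Consider it a sketch rather than tested code:

    instance Foldable Tree where
        foldMap f (Con a) = f a
        foldMap f (Add x y) = foldMap f x <> foldMap f y

The preliminaries are done, we can finally get on to the thing I wanted, the Traversable: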

    instance Traversable Tree where
      traverse = ....

and here I was stumped. What is this supposed to actually do? For our Tree functor it has this signature:

    traverse :: Applicative f => (a -> f b) -> Tree a -> f (Tree b) 

Okay, a function a -> f b I understand, it turns each tree leaf value into a list or something, so at each point of the tree it gets out a list of bs, and it potentially has one of those for each item in the input tree. But how the hell do I turn a tree of lists into a single list of Tree b? (The answer is that the secret sauce is in the Applicative, but I didn't understand that yet.)

I scratched my head and read a bunch of different explanations and none of them helped. All the descriptions I found were in either prose or mathematics and I still couldn't figure out what it was for. Finally I just wrote a bunch of examples and at last the light came on. I'm going to show you the examples and maybe the light will come on for you too.

We need two Traversable functors to use as examples. We don't have a Traversable implementation for Tree yet so we can't use that. When I think of functors, the first two I always think of are List and Maybe, so we'll use those.

    > traverse (\n -> [1..n]) Nothing
    [Nothing]
    > traverse (\n -> [1..n]) (Just 3)
    [Just 1,Just 2,Just 3]

Okay, I think I could have guessed that just from the types. And going the other way is not very interesting because the output, being a Maybe, does not have that much information in it.

    > let f x = if even x then Just (x `div` 2) else Nothing

If the !!x!! is even then the result is just half of !!x!!, and otherwise the division by 2 “fails” and the result is nothing. Now:

    > traverse f [ 1, 2, 3, 4 ]
    Nothing
    > traverse f [ 10, 4, 18 ]
    Just [5,2,9]

It took me a few examples to figure out what was going on here: When all the list elements are even, the result is Just a list of half of each. But if any of the elements is odd, that spoils the whole result and we get Nothing. (traverse f [] is Just [] as one would expect.)

That pretty much exhausts what can be done with lists and maybes. Now I have two choices about where to go next: I could try making both functors List, or I could use a different functor entirely. (Making both Maybe seemed like a nonstarter.) Using List twice seemed confusing, and when I tried it I could kinda see what it was doing but I didn't understand why. So I took a third choice: I worked up a Traversable instance for Tree just by following the types even though I didn't understand what it ought to be doing. I thought I'd at least see if I could get the easy clause:

    traverse :: Applicative f => (a -> f b) -> Tree a -> f (Tree b) 

    instance Traversable Tree where
      traverse fn (Con a) = ...

In the ... I have fn :: a -> f b and I have at hand a single a. I need to construct a Tree b. The only way to get a b is to apply fn to it, but this gets me an f b and I need f (Tree b). How do I get the Tree in there? Well, that's what Con is for, getting Tree in there, it turns a t into Tree t. But how do I do that inside of f? I tinkered around a little bit and eventually found

    traverse fn (Con a) = Con <$> (fn a)

which not only type checks but looks like it could even be correct. So now I have a motto for what <$> is about: if I have some function, but I want to use it inside of some applicative functor f, I can apply it with <$> instead of with $.

Which, now that I have said it myself, I realize it is exactly what everyone else was trying to tell me all along: normal function application takes an a -> b and applies it to an a giving a b. Applicative application takes an f (a -> b) and applies it to an f a giving an f b. That's what applicative functors are all about, doing stuff inside of f.

Okay, I can listen all day to an explanation of what an electric drill does, but until I hold it in my hand and drill some holes I don't really understand.

Encouraged, I tried the hard clause:

    traverse fn (Add x y) = ...

and this time I had a roadmap to follow:

    traverse fn (Add x y) = Add <$> ...

The Con clause had fn a at that point to produce an f b but that won't work here because we don't have an a, we have a whole Tree a, and we don't need an f b, we need an f (Tree b). Oh, no problem, traverse fn supposedly turns a Tree a into an f (Tree b), which is just what we want. And it makes sense to have a recursive call to traverse because this is the recursive part of the recursive data structure:

    traverse fn (Add x y) = Add <$> (traverse fn x) ...

Clearly traverse fn y is going to have to get in there somehow, and since the pattern for all the applicative functor stuff is

    f <$> ... <*> ... <*> ...

let's try that:

    traverse fn (Add x y) = Add <$> (traverse fn x) <*> (traverse fn y)

This looks plausible. It compiles, so it must be doing something. Partial victory! But what is it doing? We can run it and see, which was the whole point of the exercise: work up a Traversable instance for Tree so that I can figure out what Traversable is about.

Here are some example trees:

    t1 = Con 3                              -- 3
    t2 = Add (Con 3) (Con 4)                -- 3 + 4
    t3 = Add (Add (Con 3) (Con 4)) (Con 2)  -- (3 + 4) + 2

(I also tried Add (Con 3) (Add (Con 4) (Con 2)) but it did not contribute any new insights so I will leave it out of this article.)

First we'll try Maybe. We still have that f function from before:

    f x = if even x then Just (x `div` 2) else Nothing

but traverse f t1, traverse f t2, and traverse f t3 all produce Nothing, presumably because of the odd numbers in the trees. One odd number spoils the whole thing, just like in a list.

So try:

    traverse f (Add (Add (Con 10) (Con 4)) (Con 18))

which yields:

    Just (Add (Add (Con 5) (Con 2)) (Con 9))

It keeps the existing structure, and applies f at each value point, just like fmap, except that if f ever returns Nothing the whole computation is spoiled and we get Nothing. This is just like what traverse f was doing on lists.

But where does that spoilage behavior come from exactly? It comes from the overloaded behavior of <*> in the Applicative instance of Maybe:

    (Just f) <*> (Just x) = Just (f x)
    Nothing  <*> _        = Nothing
    _        <*> Nothing  = Nothing

Once we get a Nothing in there at any point, the Nothing takes over and we can't get rid of it again.

I think that's one way to think of traverse: it transforms each value in some container, just like fmap, except that where fmap makes all its transformations independently, and reassembles the exact same structure, with traverse the reassembly is done with the special Applicative semantics. For Maybe that means “oh, and if at any point you get Nothing, just give up”.
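
In fact, the list behavior we saw earlier comes from the same pattern. I believe the list instance is equivalent to this sketch, with (:) playing the role of Add and pure [] handling the base case:

    traverse fn []     = pure []
    traverse fn (x:xs) = (:) <$> fn x <*> traverse fn xs

The spoilage then comes from <*>, just as it does for Tree.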

Now let's try the next-simplest Applicative, which is List. Say,

    g n = [ 1 .. n ]

Now traverse g (Con 3) is [Con 1,Con 2,Con 3] which is not exactly a surprise but traverse g (Add (Con 3) (Con 4)) is something that required thinking about:

    [Add (Con 1) (Con 1),
     Add (Con 1) (Con 2),
     Add (Con 1) (Con 3),
     Add (Con 1) (Con 4),
     Add (Con 2) (Con 1),
     Add (Con 2) (Con 2),
     Add (Con 2) (Con 3),
     Add (Con 2) (Con 4),
     Add (Con 3) (Con 1),
     Add (Con 3) (Con 2),
     Add (Con 3) (Con 3),
     Add (Con 3) (Con 4)]

This is where the light finally went on for me. Instead of thinking of lists as lists, I should be thinking of them as choices. A list like [ "soup", "salad" ] means that I can choose soup or salad, but not both. A function g :: a -> [b] says, in restaurant a, what bs are on the menu.

The g function says what is on the menu at each node. If a node has the number 4, I am allowed to choose any of [1,2,3,4], but if it has the number 3 then the choice 4 is off the menu and I can choose only from [1,2,3].

Traversing g over a Tree means, at each leaf, I am handed a menu, and I make a choice for what goes at that leaf. Then the result of traverse g is a complete menu of all the possible complete trees I could construct.

Now I finally understand how the t and the f switch places in

    traverse :: Applicative f => (a -> f b) -> t a -> f (t b) 

I asked “how the hell do I turn a tree of lists into a single list of Tree b”? And that's the answer: each list is a local menu of dishes available at one leaf, and the result list is the global menu of the complete dinners available over the entire tree.

Okay! And indeed traverse g (Add (Add (Con 3) (Con 4)) (Con 2)) has 24 items, starting

      Add (Add (Con 1) (Con 1)) (Con 1)
      Add (Add (Con 1) (Con 1)) (Con 2)
      Add (Add (Con 1) (Con 2)) (Con 1)
      ...

and ending

      Add (Add (Con 3) (Con 4)) (Con 1)
      Add (Add (Con 3) (Con 4)) (Con 2)

That was traversing a list function over a Tree. What if I go the other way? I would need an Applicative instance for Tree and I didn't really understand Applicative yet so that wasn't going to happen for a while. I know I can't really understand Traversable without understanding Applicative first but I wanted to postpone the day of reckoning as long as possible.

What other functors do I know? One easy one is the functor that takes type a and turns it into type (String, a). Haskell even has a built-in Applicative instance for this, so I tried it:

     > traverse (\x -> ("foo", x)) [1..3]
     ("foofoofoo",[1,2,3])                     
     > traverse (\x -> ("foo", x*x)) [1,5,2,3]
     ("foofoofoofoo",[1,25,4,9])

Huh, I don't know what I was expecting but I think that wouldn't have been it. But I figured out what was going on: the built-in Applicative instance for the a -> (String, a) functor just concatenates the strings. In general it is defined on a -> (m, b) whenever m is a monoid, and it does fmap on the right component and uses monoid concatenation on the left component. So I can use integers instead of strings, and it will add the integers instead of concatenating the strings. Except no, it won't, because there are several ways to make integers into a monoid, but each type can have only one Monoid instance, and if one was wired in it might not be the one I want. So instead they define a bunch of types that are all integers in obvious disguises, just labels stuck on them that say “I am not an integer, I am a duck”; “I am not an integer, I am a potato”. Then they define different overloadings for “ducks” and “potatoes”. Then if I want the integers to get added up I can put duck labels on my integers and if I want them to be multiplied I can stick potato labels on instead. It looks like this:

    import Data.Monoid
    h n = (Sum 1, n*10)

Sum is the duck label. When it needs to combine two ducks, it will add the integers:

    > traverse h [5,29,83]
    (Sum {getSum = 3},[50,290,830])

But if we wanted it to multiply instead we could use the potato label, which is called Data.Monoid.Product:

    > traverse (\n -> (Data.Monoid.Product 7, 10*n)) [5,29,83]
    (Product {getProduct = 343}, [50,290,830])                                                                                        

There are three leaves, so we multiply three sevens and get 343.
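
Where does this behavior live? I believe the built-in Applicative instance for pairs is essentially this (a sketch; the one in the standard library is written a little differently):

    instance Monoid m => Applicative ((,) m) where
      pure x              = (mempty, x)
      (m1, f) <*> (m2, x) = (m1 <> m2, f x)

The right components get ordinary function application, and the left components are combined with the monoid operation; that's where the sums and products accumulate.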

Or we could do the same sort of thing on a Tree:

    > traverse (\n -> (Data.Monoid.Product n, 10*n)) (Add (Con 2) (Add (Con 3) (Con 4)))
    (Product {getProduct = 24}, Add (Con 20) (Add (Con 30) (Con 40)))               

Here instead of multiplying together a bunch of sevens we multiply together the leaf values themselves.

The McBride and Paterson paper spends a couple of pages talking about traversals over monoids, and when I saw the example above it started to make more sense to me. And their ZipList example became clearer too. Remember when we had a function that gave us a menu at every leaf of a tree, and traverse-ing that function over a tree gave us a menu of possible trees?

       > traverse (\n -> [1,n,n*n]) (Add (Con 2) (Con 3))
       [Add (Con 1) (Con 1),
        Add (Con 1) (Con 3),
        Add (Con 1) (Con 9),
        Add (Con 2) (Con 1),
        Add (Con 2) (Con 3),
        Add (Con 2) (Con 9),
        Add (Con 4) (Con 1),
        Add (Con 4) (Con 3),
        Add (Con 4) (Con 9)]

There's another useful way to traverse a list function. Instead of taking each choice at each leaf we make a single choice ahead of time about whether we'll take the first, second, or third menu item, and then we take that item every time:

    > traverse (\n -> Control.Applicative.ZipList [1,n,n*n]) (Add (Con 2) (Con 3))
    ZipList {getZipList = [Add (Con 1) (Con 1),
                           Add (Con 2) (Con 3),
                           Add (Con 4) (Con 9)]}

There's a built-in instance for Either a b also. It's a lot like Maybe. Right is like Just and Left is like Nothing. If all the sub-results are Right y then it rebuilds the structure with all the ys and gives back Right (structure). But if any of the sub-results is Left x then the computation is spoiled and it gives back the first Left x. For example:

    > traverse (\x -> if even x then Left (x `div` 2) else Right (x * 10)) [3,17,23,9]
    Right [30,170,230,90]
    > traverse (\x -> if even x then Left (x `div` 2) else Right (x * 10)) [3,17,22,9]
    Left 11

Okay, I think I got it.

Now I just have to drill some more holes.


[Other articles in category /prog/haskell] permanent link

Tue, 16 Oct 2018

I redesign the LA Times’ Hurricane Maria chart

This could have been a great chart, but I think it has a big problem:

The chart plots “monthly deaths in Puerto Rico” month by month for the year 2017 (a bold red line) and, for comparison, the historical average and range. The 2017 counts are similar to the historical counts for January through August. In August and September they increase rapidly to a high of 3,000 in October, and then fall back to normal by November 2017. December 2017 is missing. A note explains: “Hurricane Maria struck Sept. 20, 2017”.

It appears that the death toll started increasing in early August, even though the hurricane didn't hit until 20 September. According to this chart, the hurricane was shortly followed by a drastic decrease in the death rate.

What's actually going on is that the August value is a total for all of August and is typically low, the September value is a total for all of September and is atypically high, and the chart designer has drawn a straight line between the August and September values, implying a linear increase over the course of August. The data for August is at the mark “A” on the chart, which seems reasonable, except that one has to understand that “A” as marking the end of August instead of the beginning, which is the opposite of the usual convention.

I think a bar chart would have been a better choice here. The lines imply continuous streams of data, but the reality is that each line represents only twelve data points. Maybe something like this instead?

Same data, same colors, but each misleading continuous line has been replaced by twelve discrete bars.

I'm not sure the historical range bars are really adding much.

The same, but with only the historical average and the 2017 data.

If I were designing this from scratch I think I might replace the blue bars with brackets (although maybe the LA Times knows that their readership finds those confusing?). Or maybe plot the difference between the 2017 data and the historical average. But I think you get the point.


[Other articles in category /IT] permanent link

Mon, 15 Oct 2018

'The' reader monad does not exist

Reading over my recent article complaining about the environment functor I realized there's yet another terminology problem that makes the discussion unnecessarily confusing. “The” environment functor isn't unique. There is a family of environment functors, one for each possible environment type e. If g is the environment functor at type e, a value of type g t is a function e → t. But e could be anything and if g and h are environment functors at two different types e and e’ they are of course different functors.

This is even obvious from the definition:

    data Environ e t = Env (e -> t)
    instance Functor (Environ e) where
      fmap f (Env x) = Env $ \e -> f (x e)

The functor isn't Environ, it's Environ e, and the instance declaration says so, right there on line 2. (It seems to me that the notation is missing a universal quantifier somewhere, but I'm not going to open that issue.)

We should speak of Environ e as an environment functor, not the environment functor. So for example instead of:

When operating in the environment functor, fmap has the type (a -> b) -> g a -> g b

I should have said:

When operating in an environment functor, fmap has the type (a -> b) -> g a -> g b

And instead of:

A function p -> q is a q parcel in the environment functor

I should have said:

A function p -> q is a q parcel in an environment functor

or

A function p -> q is a q parcel in the environment functor at p

although I'm not sure I like the way the prepositions are proliferating there.

The same issue affects ⸢the⸣ reader monad, ⸢the⸣ state monad, and many others.

I'm beginning to find it remarkable how much basic terminology Haskell is missing or gets wrong. Mathematicians have a very keen appreciation of the importance of specific and precise terminology, and you'd think this would have filtered into the Haskell world. People are forever complaining that Haskell uses unfamiliar terms like “functor”, and the community's response is (properly, I think) that these terms are pre-existing and there is no point to inventing a new term that will be just as unfamiliar, or, worse, lure people into thinking that they know what it means when they don't. You don't want to call a functor a “container”, says the argument, because many functors (environment functors for example) are nothing at all like containers. I think this is wise.

But having planted their flag on that hill, the Haskell folks don't then use their own terminology correctly. I complained years ago that the term “monad” was used interchangeably for four subtly different concepts, and here we actually have a fifth. I pointed out that in the case of Environment e t, common usage refers to both Environment e and Environment e t as monads, and only the first is correct. But when people say “the environment monad” they mean that Environment itself is a monad, which it is not.


[Other articles in category /prog/haskell] permanent link

Fri, 12 Oct 2018

Parcels and motes

I said recently:

Is there any good terminology for a value of type f a when f is an arbitrary functor? I will try calling an f t value a “t parcel” and see how that works.

The more I think about “parcel” the happier I am with it. It strongly suggests container types, of course, so that a t parcel might be a boxful of ts. But it also hints at some other possible situations:

  • You might open the parcel and find it empty. (Maybe t)
  • You might open the parcel and find, instead of the t you expected, a surprising prank snake. (Either ErrorMessage t)
  • You might open the parcel and find that your t has been shipped with assembly required. (env -> t)
  • The parcel might explode when you open it. (IO t)
  • And, of course, a burrito is a sort of parcel of meat and beans.

I coined “parcel” thinking that one would want different terminology for values of type f t depending on whether f was a functor (“parcel”) or also a monad (“mote”). Of course every mote is a parcel, but not always vice versa. Now I'm not sure that both terms are needed. Non-monadic functors are unusual, and non-applicative functors are rare, so perhaps one term will do for all three.


[Other articles in category /prog/haskell] permanent link

Thu, 11 Oct 2018

I hate the environment functor

Here we have the well-known fmap function:

    fmap :: Functor f => (a -> b) -> f a -> f b

It takes a single function and a (collection of input values / decorated input value / something something input value) and produces a (collection of output values / decorated output value / something something output value).

Yow, that's not going to work. Is there any good terminology for a value of type f a when f is an arbitrary functor? A while back I discussed a similar problem and suggested the term “mote” for a value in a monadic type. I will try calling an f t value a “t parcel” and see how that works. So [t], Maybe t, and IO t are all examples of t parcels, in various functors.

Starting over then. Here we have the well-known fmap function:

    fmap :: Functor f => (a -> b) -> f a -> f b

It takes a single function, and an a parcel, and produces a b parcel, by applying the function independently to the a values in the parcel.

Here is a sort of reversed version of fmap that I call pamf:

    pamf :: Functor f => f (a -> b) -> a -> f b

It takes a parcel of functions, and a single input and produces a parcel of outputs, by applying each function in the parcel independently to the single a value. It can be defined in terms of fmap:

    pamf fs a = fmap ($ a) fs
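
For example, in the list functor, I'd expect:

    > pamf [(+ 1), (* 2)] 10
    [11,20]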

So far so good. Now I ask you to predict the type of

    pamf fmap

Certainly it should start out with

    pamf fmap :: (Functor f, Functor g) => ...

because the pamf and the fmap might be operating in two different functors, right? Indeed, if I apply them the other way around, fmap pamf, the type does begin this way; it is:

    (Functor f, Functor g) => f (g (a -> b)) -> f (a -> g b)

The f here is the functor in which fmap operates, and the g is the functor in which pamf is operating. In general fmap takes an arbitrary function

              a       ->      b

and lifts it to a new function that operates in the f functor:

            f a       ->    f b

Here it has taken pamf, which is a function

          g (a -> b)  ->     (a -> g b)

and lifted it to a new function that operates in the f functor:

       f (g (a -> b))  ->  f (a -> g b)

This is complicated but straightforward. Okay, that was fmap pamf. What about pamf fmap though? The computed type is

        pamf fmap :: Functor f => f a -> (a -> b) -> f b

and when I saw this I said “What. Where did g go? What happened to g?”

Then I paused for a while and said “… I bet it's that goddamn environment thing again.” Yep, that's what it was. It's the environment functor, always turning up where I don't want it and least expect it, like that one guy we all went to college with. The environment functor, by the way, is yet another one of those things that Haskell ought to have a standard name for, but doesn't. The phrase “the reader monad” is fairly common, but here I only want the functor part of the monad. And people variously say “reader monad”, “environment monad”, and “evaluation monad” to mean the same thing. In this article, it will be the environment functor.

Here's what happened. Here are fmap and pamf again:

    fmap :: Functor f => (p -> q) -> f p -> f q

    pamf :: Functor g => g (a -> b) -> a -> g b

The first argument to pamf should be a parcel in the g functor. But fmap is not a parcel, so pamf fmap will be a type error, right? Wrong! If you are committed enough, there is a way to construe any function as a parcel. A function p -> q is a q parcel in the environment functor. Say that g denotes an environment functor. In this functor, a parcel of type g t is a function which consults an “environment” of type e and yields a result of type t. That is, $$g t \equiv e \to t.$$

When operating in the environment functor, fmap has the type (a -> b) -> g a -> g b, which is shorthand for (a -> b) -> (e -> a) -> (e -> b). This instance of fmap is defined this way:

    fmap f x = \e -> f (x e)

or shorter and more mysteriously

    fmap = (.)

which follows by η-reduction, something Haskell enthusiasts never seem to get enough of.

In fmap f x, the x isn't the actual value to give to f; instead it's a parcel, as it always is with fmap. In the context of the environment functor, x is a function that consults the environment e and returns an a. The result of fmap f x is a new parcel: it uses x to consult the supplied environment for a value of type a, which it then feeds to f to get the required value of type b.
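
For example, taking the environment to be a string:

    > fmap (> 3) length $ "hello"
    True

Here length consults the environment "hello" and yields 5, which is then fed to (> 3).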

In the application pamf fmap, the left side pamf wants fmap to be a parcel. But it's not a parcel, it's a function. So, type error, right? No! Any function is a parcel if you want it to be, it's a parcel in the environment functor! And fmap is a function:

    fmap :: Functor f => (p -> q) -> f p -> f q

so it can be understood as a parcel in the environment functor, where the environment e has type p -> q. Then pamf is operating in this environment functor, so $$g t = (p \to q) \to t.$$ A g t parcel is a function that consults an “environment” of type p -> q and somehow produces a t value. (Haskell folks, who are obsessed with currying all the things, will write this as the nearly unreadable g = ((->) (p -> q)).)

We wanted pamf to have this type:

    pamf :: Functor g =>            g (a -> b)  -> a ->            g b

and since Haskell has decided that g must be the environment functor with !!g x \equiv (p \to q) \to x!!, this is an abbreviation for:

    pamf ::              ((p -> q) -> (a -> b)) -> a -> ((p -> q) -> b)

To apply this to fmap, we have to unify the type of pamf's argument, which is (p -> q) -> (a -> b), and the type of fmap, which is (p -> q) -> (f p -> f q). Then !!a\equiv f p!! and !!b \equiv f q!!, so the result of pamf fmap is

    pamf fmap :: Functor f => f p -> ((p -> q) -> f q)

Where did g go? It was specialized to mean the environment functor ((->) (p -> q)), so it's gone.

The funny thing about the type of pamf fmap is that it is exactly the type of flip fmap, which is fmap with the order of its two arguments reversed:

    (flip fmap) x f ≡ fmap f x

and indeed, by some theorem or other, because the types are identical, the functions themselves must be identical also! (There are some side conditions, all of which hold here.) The two functions pamf fmap and flip fmap are identical. Analogous to the way fmap, restricted to the environment functor, is identical to (.), pamf, when similarly restricted, is exactly flip. You can even see this from its type:

    pamf :: ((p -> q) -> (a -> b)) -> a -> ((p -> q) -> b)

Or, cleaning up some superfluous parentheses and inserting some new ones:

    pamf :: ((p -> q) ->  a -> b) -> (a ->  (p -> q) -> b)

And putting !!c = p\to q!!:

    pamf :: (c        -> a -> b) -> (a -> c        -> b)
    flip :: (                 the same                 )

Honestly, I would have preferred a type error: “Hey, dummy, fmap has the wrong type to be an argument to pamf, which wants a functorial value.” Instead I got “Okay, if you want functions to be a kind of functor I can do that, also wouldn't it be simpler if the universe was two-dimensional and there were only three kinds of quarks? Here you go, no need to thank me!” Maybe someone can explain to me why this is a useful behavior, and then explain why it is so useful that it should happen automatically and implicitly instead of being triggered by some lexical marker like:

    newtype Environment e a = Environment (e -> a)
    instance Functor (Environment e) where
      fmap f (Environment x) = Environment $ \e -> f (x e)

I mean, seriously, suppose you wrote a + b where b was accidentally a function instead of a number. What if when you did that, instead of a type error, Haskell would silently shift into some restricted domain in which it could implicitly interpret b as a number in some weird way and give you something totally bizarre? Isn't the whole point of Haskell supposed to be that it doesn't implicitly convert things that way?


[Other articles in category /prog/haskell] permanent link

Wed, 10 Oct 2018

A minecraft mapmaking utility

Minecraft makes maps in-game, and until now I would hold up the map in the game, take a screenshot, print it out, and annotate the printout.

Now I can do better. I have a utility that takes the map files directly from the Minecraft data directory and joins them together into a single PNG image that I can then do whatever with. Here's a sample, made by automatically processing the 124(!) maps that exist in my current Minecraft world.

Notice how different parts of the result are at different resolutions. That is because Minecraft maps are in five different resolutions, from !!2^{14}!! to !!2^{22}!! square meters in area, and these have been stitched together properly. My current base of operations is a little comma-shaped thingy about halfway down in the middle of the ocean, and on my largest ocean map it is completely invisible. But here it has been interpolated into the correct tiny scrap of that big blue map.

Bonus: If your in-game map is lost or destroyed, the data file still hangs around, and you can recover the map from it.

Source code is on Github. It needs some work:

  • You have to manually decompress the map files
  • It doesn't show banners or other markers
  • I got bored typing in the color table and didn't finish
  • Invoking it is kind of heinous:
       ./dumpmap $(./sortbyscale map_*) | pnmtopng > result.png

Patches welcome!


[Other articles in category /games] permanent link

Mon, 08 Oct 2018

Notes on using git-replace to get rid of giant objects

A couple of years ago someone accidentally committed a 350 megabyte file to our Git repository. Now it's baked in. I wanted to get rid of it. I thought that I might be able to work out a partial but lightweight solution using git-replace.

Summary: It didn't work.

Details

In 2016 a programmer committed a 350 megabyte file to my employer's repo, then in the following commit they removed it again. Of course it's still in there, because someone might check out the one commit where it existed. Everyone who clones the repo gets a copy of the big file. Every copy of the repo takes up an extra 350 megabytes on disk.

The usual way to fix this is onerous:

  1. Use git-filter-branch to rebuild all the repository history after the bad commit.

  2. Update all the existing refs to point to the analogous rebuilt objects.

  3. Get everyone in the company to update all the refs in their local copies of the repo.

I thought I'd tinker around with git-replace to see if there was some way around this, maybe something that someone could do locally on their own repo without requiring everyone else to go along with it.

The git-replace command annotates the Git repository to say that whenever object A is wanted, object B should be used instead. Say that the 350 MB file has an ID of ffff9999ffff9999ffff9999ffff9999ffff9999. I can create a small file that says

    This is a replacement object.  It replaces a very large file
    that was committed by mistake.  To see the commit as it really
    was, use

        git --no-replace-objects show 183a5c7e90b2d4f6183a5c7e90b2d4f6183a5c7e
        git --no-replace-objects checkout 183a5c7e90b2d4f6183a5c7e90b2d4f6183a5c7e

    or similarly.  To see the file itself, use

        git --no-replace-objects show ffff9999ffff9999ffff9999ffff9999ffff9999
I can turn this small file into an object with git-add; say the new small object has ID 1111333311113333111133331111333311113333. I then run:

    git replace ffff9999ffff9999ffff9999ffff9999ffff9999 1111333311113333111133331111333311113333

This creates .git/refs/replace/ffff9999ffff9999ffff9999ffff9999ffff9999, which contains the text 1111333311113333111133331111333311113333. Thenceforward, any Git command that tries to access the original object ffff9999 will silently behave as if it were 11113333 instead. For example, git show 183a5c7e will show the diff between that commit and the previous one, as if the user had committed my small file back in 2016 instead of their large one. And checking out that commit will check out the small file instead of the large one.

So far this doesn't help much. The checkout is smaller, but nobody was likely to have that commit checked out anyway. The large file is still in the repository, and clones and transfers still clone and transfer it.

The first thing I tried was a wan hope: will git gc discard the replaced object? No, of course not. The ref in refs/replace/ counts as a reference to it, and it will never be garbage-collected. If it had been, you would no longer be able to examine it with the --no-replace-objects commands. So much for following the rules!

Now comes the hacking part: I am going to destroy the actual object. Say for example, what if:

    cp /dev/null .git/objects/ff/ff9999ffff9999ffff9999ffff9999ffff9999

Now the repository is smaller! And maybe Git won't notice, as long as I do not use --no-replace-objects?

Indeed, much normal Git usage doesn't notice. For example, I can make new commits with no trouble, and of course any other operation that doesn't go back as far as 2016 doesn't notice the change. And git-log works just fine even past the bad commit; it only looks at the replacement object and never notices that the bad object is missing.

But some things become wonky. You get an error message when you clone the repo because an object is missing. The replacement refs are local to the repo, and don't get cloned, so clone doesn't know to use the replacement object anyway. In the clone, you can use git replace -f .... to reinstate the replacement, and then all is well unless something tries to look at the missing object. So maybe a user could apply this hack on their own local copy if they are willing to tolerate a little wonkiness…?

No. Unfortunately, there is a show-stopper: git-gc no longer works in either the parent repo or in the clone:

    fatal: unable to read ffff9999ffff9999ffff9999ffff9999ffff9999
    error: failed to run repack

and it doesn't create the pack files. It dies, and leaves behind a .git/objects/pack/tmp_pack_XxXxXx that has to be cleaned up by hand.

I think I've reached the end of this road. Oh well, it was worth a look.

[ Addendum 20181009: A lot of people have unfortunately missed the point of this article, and have suggested that I use BFG or reposurgeon. I have a small problem and a large problem. The small problem is how to remove some files from the repository. This is straightforward, and the tools mentioned will help with it. But because of the way Git works, the result is effectively a new repository. The tools will not help with the much larger problem I would have then: How to get 350 developers to migrate to the new repository at the same time. The approach I investigated in this article was an attempt to work around this second, much larger problem. ]


[Other articles in category /prog] permanent link

Sun, 07 Oct 2018

Ridge and valley

Last week we visited Toph's school and I saw a big map of Pennsylvania. I was struck by something I had never noticed before. Here's a detail of central Pennsylvania, an area roughly 100 miles (160 km) square:

[Map detail of central Pennsylvania; the description is in the text below.]

As you can see, a great many lines on this map start in the lower left, proceed in roughly parallel tracks north-by-northeast, then bend to the right, ending in the upper right corner. Many of the lines represent roads, but some represent county borders, and some represent natural features such as forests and rivers.

When something like this happens, it is almost always for some topographic reason. For example, here's a map of two adjacent communities in Los Angeles:

[Street map of Santa Monica and Pacific Palisades; described below in the main text.]

You can see just from the pattern of the streets that these two places must have different topography. The streets in Santa Monica are a nice rectangular grid. The streets in Pacific Palisades are in groups of concentric squiggles. This is because Pacific Palisades is hilly, and the roads go in level curves around the hills, whereas Santa Monica is as flat as a squid.

I had never noticed before that all the lines in central Pennsylvania go in the same direction. From looking at the map, one might guess that central Pennsylvania was folded into many long parallel wrinkles, and that the roads, forests, and rivers mostly lay in the valleys between the wrinkles. This is indeed the case. This part of Pennsylvania is part of the Ridge-and-Valley Appalachians, and the pattern of ridges and valleys extends far beyond Pennsylvania.

The wrinkling occurred over a long period, ending around 260 million years ago, in an event called the Alleghanian orogeny. North America and Africa (then part of the supercontinents Euramerica and Gondwana, respectively) collided with one another, and that whole part of North America crumpled up like a sheet of aluminum foil. People in Harrisburg and State College are living in the crumples.


[Other articles in category /misc] permanent link

Mon, 24 Sep 2018

Zipfian regression

A long time ago, I wrote up a blog article about how to derive the linear regression formulas from first principles. Then I decided it was not of general interest, so I didn't publish it. (Sometime later I posted it to math stack exchange, so the effort wasn't wasted.)

The basic idea is, you have some points !!(x_i, y_i)!!, and you assume that they can be approximated by a line !!y=mx+b!!. You let the error be a function of !!m!! and !!b!!: $$\varepsilon(m, b) = \sum (mx_i + b - y_i)^2$$ and you use basic calculus to find !!m!! and !!b!! for which !!\varepsilon!! is minimal. Bing bang boom.
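
(If you grind through the algebra, the result is the familiar closed form $$m = \frac{n\sum x_iy_i - \sum x_i\sum y_i}{n\sum x_i^2 - \left(\sum x_i\right)^2}\qquad b = \frac{\sum y_i - m\sum x_i}{n}$$ where !!n!! is the number of points.)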

I knew this for a long time but it didn't occur to me until a few months ago that you could use basically the same technique to fit any other sort of curve. For example, suppose you think your data is not a line but a parabola of the type !!y=ax^2+bx+c!!. Then let the error be a function of !!a, b, !! and !!c!!:

$$\varepsilon(a,b,c) = \sum (ax_i^2 + bx_i + c - y_i)^2$$

and again minimize !!\varepsilon!!. You can even get a closed form as you can with ordinary linear regression.

I especially wanted to try fitting hyperbolas to data that I expected to have a Zipfian distribution. For example, take the hundred most popular names for girl babies in Illinois in 2017. Is there a simple formula which, given an ordinal number like 27, tells us approximately how many girls were given the 27th most popular name that year? (“Scarlett”? Seriously?)

I first tried fitting a hyperbola of the form !!y = c + \frac ax!!. We could, of course, take !!y_i' = \frac 1{y_i}!! and then try to fit a line to the points !!\langle x_i, y_i'\rangle!! instead. But this will distort the measurement of the error. It will tolerate gross errors in the points with large !!y!!-coordinates, and it will be extremely intolerant of errors in points close to the !!x!!-axis. This may not be what we want, and it wasn't what I wanted. So I went ahead and figured out the Zipfian regression formulas:

$$ \begin{align} a & = \frac{HY-NQ}D \\ c & = \frac{HQ-JY}D \end{align} $$

Where:

$$\begin{align} H & = \sum x_i^{-1} \\ J & = \sum x_i^{-2} \\ N & = \sum 1\\ Q & = \sum y_ix_i^{-1} \\ Y & = \sum y_i \\ D & = H^2 - NJ \end{align} $$
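
(To get these, set !!\frac{\partial\varepsilon}{\partial a} = \frac{\partial\varepsilon}{\partial c} = 0!!, which yields the linear system $$aJ + cH = Q \qquad aH + cN = Y$$ and then solve for !!a!! and !!c!!.)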

When I tried to fit this to some known hyperbolic data, it worked just fine. For example, given the four points !!\langle1, 1\rangle, \langle2, 0.5\rangle, \langle3, 0.333\rangle, \langle4, 0.25\rangle!!, it produces the hyperbola $$y = \frac{1.00018461538462}{x} - 0.000179487179486797.$$ This is close enough to !!y=\frac1x!! to confirm that the formulas work; the slight error in the coefficients is because we used !!\bigl\langle3, \frac{333}{1000}\bigr\rangle!! rather than !!\bigl\langle3, \frac13\bigr\rangle!!.

Unfortunately these formulas don't work for the Illinois baby data. Or rather, the hyperbola fits very badly. The regression produces !!y = \frac{892.765272442475}{x} + 182.128894972025!!:

A graph showing the actual data (as blue dots) and the best-fit hyperbola (as a red line). They do not match well. The hyperbola drops down much too quickly, then makes a sharp turn to catch up with the right tail it has overshot. It levels off at 182.12, instead of at around 100.

I think maybe I need to be using some hyperbola with more parameters, maybe something like !!y = \frac a{x-b} + c!!.

In the meantime, here's a trivial script for fitting !!y = \frac ax + c!! hyperbolas to your data:

    while (<>) {
      chomp;
      my ($x, $y) = split;
      # if there is only one column, use the line number as the x-coordinate
      ($x, $y) = ($., $x) if not defined $y;
      # accumulate the sums H, J, N, Q, Y defined above
      $H += 1/$x;
      $J += 1/($x*$x);
      $N += 1;
      $Q += $y/$x;
      $Y += $y;
    }

    my $D = $H*$H - $J*$N;
    my $c = ($Q*$H - $J*$Y)/$D;
    my $a = ($Y*$H - $Q*$N)/$D;

    print "y = $a / x + $c\n";

[ Addendum 20180925: Shreevatsa R. asked a related question on StackOverflow and summarized the answers. The problem is more complex than it might first appear. Check it out. ]


[Other articles in category /math] permanent link

Fri, 14 Sep 2018

How not to remember the prime numbers under 1,000

A while back I said I wanted to memorize all the prime numbers under 1,000, because I am tired of getting some number like 851 or 857, or even 307, and then not knowing whether it is prime.

The straightforward way to deal with this is: just memorize the list. There are only 168 of them, and I have the first 25 or so memorized anyway.

But I had a different idea also. Say that a set of numbers from !!10n!! to !!10n+9!! is a “decade”. Each decade contains at most 4 primes, so 4 bits are enough to describe the primes in a single decade. Assign a consonant to each of the 16 possible patterns, say “b” when none of the four numbers is a prime, “d” when only !!10n+1!! is prime, “f” when only !!10n+3!! is, and so on. (For example, the decade from 20 to 29 contains the primes 23 and 29, so its pattern is “!!10n+3!! and !!10n+9!! only”, and it gets whatever consonant we assigned to that pattern.)

Now memorizing the primes in the 90 decades is reduced to memorizing 90 consonants. Inserting vowels wherever convenient, we have now turned the problem into one of memorizing around 45 words. A word like “potato” would encode the constellation of primes in three consecutive decades. 45 words is only a few sentences, so perhaps we could reduce the list of primes to a short and easily-remembered paragraph. If so, memorizing a few sentences should be much easier than memorizing the original list of primes.
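
If you want to generate the 90 constellations to play along, a sketch like the following would do it. (This is untested, and my real scripts, mentioned at the end of this article, are in Python.)

    isPrime :: Int -> Bool
    isPrime n = n > 1 && all (\d -> n `mod` d /= 0) [2 .. isqrt n]
      where isqrt = floor . sqrt . (fromIntegral :: Int -> Double)

    -- The only candidates in the decade starting at 10n are 10n+1, 10n+3,
    -- 10n+7, and 10n+9.  (The first decade is special, since it also
    -- contains 2 and 5, but everyone knows those already.)
    constellation :: Int -> [Bool]
    constellation n = [ isPrime (10*n + k) | k <- [1, 3, 7, 9] ]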

The method has several clear drawbacks. We would have to memorize the mapping from consonants to bit patterns, but this is short and maybe not too difficult.

More significant is that if we're trying to decide if, say, 637 is prime, we have to remember which consonant in which word represents the 63rd decade. This can be fixed, maybe, by selecting words and sentences of standard length. Say there are three sentences and each contains 30 consonants. Maybe we can arrange that words always appear in patterns, say four words with 1 or 2 consonants each that together total 7 consonants, followed by a single long word with three consonants. Then each sentence can contain three of these five-word groups and it will be relatively easy to locate the 23rd consonant in a sentence: it is early in the third group.

Katara and I tried this, with not much success. But I'm not ready to give up on the idea quite yet. A problem we encountered early on is that we remember consonants not by how words are spelled but by how they sound. So we don't want a word like “hammer” to represent the consonant pattern h-m-m but rather just h-m.

Another problem is that some constellations of primes are much more common than others. We initially assigned consonants to constellations in order. This assigned letter “b” to the decades that contain no primes. But this is the most common situation, so the letter “b” tended to predominate in the words we needed for our mnemonic. We need to be careful to assign the most common constellations to the best letters.

Some consonants in English like to appear in clusters, and it's not trivial to match these up with the common constellations. The mapping from prime constellations to consonants must be carefully chosen to work with English. We initially assigned “s” to the constellation “☆•☆☆” (where !!10n+1, 10n+7,!! and !!10n+9!! are prime but !!10n+3!! is not) and “t” to the constellation “☆☆••” (where !!10n+1!! and !!10n+3!! are prime but !!10n+7!! and !!10n+9!! are not) but these constellations cannot appear consecutively, since at least one of !!10n+7, 10n+9, 10n+11!! is composite. So any word with “s” and “t” with no intervening consonants was out of play. This eliminated a significant fraction of the entire dictionary!

I still think it could be made to work, maybe. If you're interested in playing around with this, the programs I wrote are available on Github. The mapping from decade constellations to consonant clusters is in select_words.py.


[Other articles in category /math] permanent link

Wed, 12 Sep 2018

Perils of hacking on mature software

Yesterday I wrote up an interesting bug in git-log --follow's handling of empty files. Afterward I thought I'd see if I could fix it.

People complain that the trouble of working on mature software like Git is understanding the way the code is structured, its conventions, the accumulated layers of cruft, and where everything is. I think this is a relatively minor difficulty. The hard part is not so much doing what you want, as knowing what you want to do.

My original idea for the fix was this: I can give git log a new option, say --follow-size-threshhold=n. This would disable all copy and rename detection for any files of size less than n bytes. If not specified or configured, n would default to 1, so that the default behavior would disable copy and rename detection of empty files but not of anything else. I was concerned that an integer option was unnecessarily delicate. It might have been sufficient to have a boolean --follow-empty-files flag. But either way the programming would be almost the same and it would be easy to simplify the option later if the Git maintainers wanted it that way.

I excavated the code and found where the change needed to go. It's not actually in git-log itself. Git has an internal system for diffing pairs of files, and git-log --follow uses this to decide when two blobs are similar enough for it to switch from following one to the other. So the flag actually needed to be added to git-diff, where I called it --rename-size-threshhold. Then git-log would set that option internally before using the Git diff system to detect renames.

But then I ran into a roadblock. Diff already has an undocumented flag called --rename-empty that tells it to report on renames of empty files in certain contexts — not the context I was interested in, unfortunately. The flag is set by default, but it is cleared internally when git-merge is resolving conflicts. The issue it addresses is this: Suppose the merge base has some empty file X. Somewhere along the line X has been removed. In one branch, an unrelated empty file Y has been created, and in the other branch a different unrelated empty file Z has been created. When merging these two branches, Git will detect a merge conflict: was file X moved to location Y or to location Z? This ⸢conflict⸣ is almost certainly spurious, and it is very unlikely that the user will thank us for demanding that they resolve it manually. So git-merge sets --no-rename-empty internally and Git resolves the ⸢conflict⸣ automatically.

(See this commit for further details.)

The roadblock is: how does --rename-empty fit together with my proposed --rename-size-threshhold flag? Should they be the same thing? Or should they be separate options? There appear to be at least three subsystems in Git that try to decide if two similar or identical files (which might have different names, or the same name in different directories) are “the same file” for various purposes. Do we want to control the behavior of these subsystems separately or in unison?

If they should be controlled in unison, should --rename-size-threshhold be demoted to a boolean, or should --rename-empty be promoted to an integer? And if they should be the same, what are the implications for backward compatibility? Should the existing --rename-empty be documented?

If we add new options, how do they interact with the existing and already non-orthogonal flags that do something like this? They include at least the following options of git-diff, git-log, and git-show:

--follow
--find-renames=n 
--find-copies
--find-copies-harder
-l 

Only git-log has --follow and my new feature was conceived as a modification of it, which is why I named it --follow-size-threshhold. But git-log wouldn't be implementing this itself, except to pass the flag into the diff system. Calling it --follow-size-threshhold in git-diff didn't make sense because git-diff doesn't have a --follow option. It needs a different name. But if I do that, then we have git-diff and git-log options with different names that nevertheless do exactly the same thing. Confusing!

Now suppose you would like to configure a default for this option in your .gitconfig. Does it make sense to have both diff.renameSizeThreshhold and log.followSizeThreshhold options? Not really. It would never be useful to set one but not the other. So eliminate log.followSizeThreshhold. But now someone like me who wants to change the behavior of git-log --follow will not know to look in the right place for the option they need.

The thing to do at this point is to come up with some reasonable-seeming proposal and send it to Jeff King, who created the undocumented --rename-empty feature, and who is also a good person to work with. But coming up with a good solution entirely on my own is unlikely.

Doing any particular thing would not be too hard. The hard part is deciding what particular thing to do.


[Other articles in category /prog] permanent link

Language fluency in speech and print

Long ago I worked among the graduate students at the University of Pennsylvania department of Computer and Information Sciences. Among other things, I did system and software support for them, and being about the same age and with many common interests, I socialized with them also.

There was one Chinese-Malaysian graduate student who I thought of as having poor English. But one day, reading one of his emailed support requests, I was struck by how clear and well-composed it was. I suddenly realized I had been wrong. His English was excellent. It was his pronunciation that was not so good. When speaking to him in person, this was all I had perceived. In email, his accent vanished and he spoke English like a well-educated native. When I next met him in person I paid more careful attention and I realized that, indeed, I had not seen past the surface: he spoke the way he wrote, but his accent had blinded me to his excellent grammar and diction.

Once I picked up on this, I started to notice better. There were many examples of the same phenomenon, and also the opposite phenomenon, where someone spoke poorly but I hadn't noticed because their pronunciation was good. But then they would send email and the veil would be lifted. This was even true of native speakers, who can get away with all sorts of mistakes because their pronunciation is so perfect. (I don't mean perfect in the sense of pronouncing things the way the book says you should; I mean in the sense of pronouncing things the way a native speaker does.) I didn't notice this unless I was making an effort to look for it.

I'm not sure I have anything else to say about this, except that it seems to me that when learning a foreign language, one ought to consider whether one will be using it primarily for speech or primarily for writing, and optimize one's study time accordingly. For speech, concentrate on good pronunciation; for writing, focus on grammar and diction.

Hmm, put that way it seems obvious. Also, the sky is sometimes blue.


[Other articles in category /lang] permanent link