The Universe of Discourse

Thu, 11 Sep 2008

Return return
Among the things I read during the past two months was the paper Functional Programming with Overloading and Higher-Order Polymorphism, by Mark P. Jones. I don't remember why I read this, but it sure was interesting. It is an introduction to the new, cool features of Haskell's type system, with many examples. It was written in 1995 when the features were new. They're no longer new, but they are still cool.

There were two different pieces of code in this paper that wowed me. When I started this article, I was planning to write about #2. I decided that I would throw in a couple of paragraphs about #1 first, just to get it out of the way. This article is that couple of paragraphs.

Suppose you have a type that represents terms over some type v of variable names. The v type is probably strings but could possibly be something else:

	data Term v = TVar v                -- Type variable
| TInt                  -- Integer type
| TString               -- String type
| Fun (Term v) (Term v) -- Function type

There's a natural way to make the Term type constructor into an instance of Monad:

	instance Monad Term where
return v          = TVar v
TVar v   >>= f = f v
TInt     >>= f = TInt
TString  >>= f = TString
Fun d r  >>= f = Fun (d >>= f) (r >>= f)

That is, the return operation just lifts a variable name to the term that consists of just that variable, and the bind operation just maps its argument function over the variable names in the term, leaving everything else alone.

Jones wants to write a function, unify, which performs a unification algorithm over these terms. Unification answers the question of whether, given two terms, there is a third term that is an instance of both. For example, consider the two terms a → Int and String → b, which are represented by Fun (TVar "a") TInt and Fun TString (TVar "b"), respectively. These terms can be unified, since the term String → Int is an instance of both; one can assign a = TString and b = TInt to turn both terms into Fun TString TInt.

The result of the unification algorithm should be a set of these bindings, in this example saying that the input terms can be unified by replacing the variable "a" with the term TString, and the variable "b" with the term TInt. This set of bindings can be represented by a function that takes a variable name and returns the term to which it should be bound. The function will have type v → Term v. For the example above, the result is a function which takes "a" and returns TString, and which takes "b" and returns TInt. What should this function do with variable names other than "a" and "b"? It should say that the variable named "c" is "replaced" by the term TVar "c", and similarly other variables. Given any other variable name x, it should say that the variable x is "replaced" by the term TVar x.

The unify function will take two terms and return one of these substitutions, where the substition is a function of type v → Term v. So the unify function has type:

    unify :: Term v → Term v → (v → Term v)

Oh, but not quite. Because unification can also fail. For example, if you try to unify the terms ab and Int, represented by Fun (TVar "a") (TVar "b") and TInt respectively, the unfication should fail, because there is no term that is an instance of both of those; one represents a function and the other represents an integer. So unify does not actually return a substitution of type v → Term v. Rather, it returns a monad value, which might contain a substitution, if the unification is successful, and otherwise contains an error value. To handle the example above, the unify function will contain a case like this:

	unify	TInt	(Fun _ _) = fail ("Cannot unify" ....)

It will fail because it is not possible to unify functions and integers.

If unification is successful, then instead of using fail, the unify function will construct a substitution and then return it with return. Let's consider the result of unifying TInt with TInt. This unification succeeds, and produces a trivial substitition with no bindings. Or more precisely, every variable x should be "replaced" by the term TVar x. So in this case the substitution returned by unify should be the trivial one, a function which takes x and returns TVar x for all variable names x.

But we already have such a function. This is just what we decided that Term's return function should do, when we were making Term into a monad. So in this case the code for unify is:

	unify	TInt	TInt	  = return return

Yep, in this case the unify function returns the return function.

Wheee!

At this point in the paper I was skimming, but when I saw return return, I boggled. I went back and read it more carefully after that, you betcha.

That's my couple of paragraphs. I was planning to get to this point and then say "But that's not what I was planning to discuss. What I really wanted to talk about was...". But I think I'll break with my usual practice and leave the other thing for tomorrow.

Happy Diada Nacional de Catalunya, everyone!