The Universe of Disco


Tue, 18 Oct 2022

Tree search in Haskell

In Perl I would often write a generic tree search function:

    # Perl
    sub search {
      my ($is_good, $children_of, $root) = @_;
      my @queue = ($root);
      return sub {
        while (1) {
          return unless @queue;
          my $node = shift @queue;
          push @queue, $children_of->($node);
          return $node if $is_good->($node);
        }
      }
    }

For example, see Higher-Order Perl, section 5.3.

To use this, we provide two callback functions. $is_good checks whether the current item has the properties we were searching for. $children_of takes an item and returns its children in the tree. The search function returns an iterator object, which, each time it is called, returns a single item satisfying the $is_good predicate, or undef if none remains. For example, this searches the space of all strings over abc for palindromic strings:

    # Perl
    my $it = search(sub { $_[0] eq reverse $_[0] },
                    sub { return map "$_[0]$_" => ("a", "b", "c") },
                    "");

    while (my $pal = $it->()) {
      print $pal, "\n";
    }

Many variations of this are possible. For example, replacing push with unshift changes the search from breadth-first to depth-first. Higher-Order Perl shows how to modify it to do heuristically-guided search.

I wanted to do this in Haskell, and my first try didn’t work at all:

    -- Haskell
    search1 :: (n -> Bool) -> (n -> [n]) -> n -> [n]
    search1 isGood childrenOf root =
      s [root]
        where
          s nodes = do
            n <- nodes
            filter isGood (s $ childrenOf n)

There are two problems with this. First, the filter is in the wrong place. It says that the search should proceed downward only from the good nodes, and stop when it reaches a not-good node. To see what's wrong with this, consider a search for palindromes. Th string ab isn't a palindrome, so the search would be cut off at ab, and never proceed downward to find aba or abccbccba. It should be up to childrenOf to decide how to continue the search. If the search should be pruned at a particular node, childrenOf should return an empty list of children. The $isGood callback has no role here.

But the larger problem is that in most cases this function will compute forever without producing any output at all, because the call to s recurses before it returns even one list element.

Here’s the palindrome example in Haskell:

    palindromes = search isPalindrome extend ""
      where
        isPalindrome s = (s == reverse s)
        extend s = map (s ++) ["a", "b", "c"]

This yields a big fat !!\huge \bot!!: it does nothing, until memory is exhausted, and then it crashes.

My next attempt looked something like this:

        search2 :: (n -> Bool) -> (n -> [n]) -> n -> [n]
        search2 isGood childrenOf root = filter isGood $ s [root]
            where
              s nodes = do
                n <- nodes
                n : (s $ childrenOf n)

The filter has moved outward, into a single final pass over the generated tree. And s now returns a list that at least has the node n on the front, before it recurses. If one doesn’t look at the nodes after n, the program doesn’t make the recursive call.

The palindromes program still isn’t right though. take 20 palindromes produces:

    ["","a","aa","aaa","aaaa","aaaaa","aaaaaa","aaaaaaa","aaaaaaaa",
     "aaaaaaaaa", "aaaaaaaaaa","aaaaaaaaaaa","aaaaaaaaaaaa",
     "aaaaaaaaaaaaa","aaaaaaaaaaaaaa", "aaaaaaaaaaaaaaa",
     "aaaaaaaaaaaaaaaa","aaaaaaaaaaaaaaaaa", "aaaaaaaaaaaaaaaaaa",
     "aaaaaaaaaaaaaaaaaaa"]

It’s doing a depth-first search, charging down the leftmost branch to infinity. That’s because the list returned from s (a:b:rest) starts with a, then has the descendants of a, before continuing with b and b's descendants. So we get all the palindromes beginning with “a” before any of the ones beginning with "b", and similarly all the ones beginning with "aa" before any of the ones beginning with "ab", and so on.

I needed to convert the search to breadth-first, which is memory-expensive but at least visits all the nodes, even when the tree is infinite:

        search3 :: (n -> Bool) -> (n -> [n]) -> n -> [n]
        search3 isGood childrenOf root = filter isGood $ s [root]
            where
              s nodes = nodes ++ (s $ concat (map childrenOf nodes))

This worked. I got a little lucky here, in that I had already had the idea to make s :: [n] -> [n] rather than the more obvious s :: n -> [n]. I had done that because I wanted to do the n <- nodes thing, which is no longer present in this version. But it’s just what we need, because we want s to return a list that has all the nodes at the current level (nodes) before it recurses to compute the nodes farther down. Now take 20 palindromes produces the answer I wanted:

    ["","a","b","c","aa","bb","cc","aaa","aba","aca","bab","bbb","bcb",
     "cac", "cbc","ccc","aaaa","abba","acca","baab"]

While I was writing this version I vaguely wondered if there was something that combines concat and map, but I didn’t follow up on it until just now. It turns out there is and it’s called concatMap. 😛

        search3' :: (n -> Bool) -> (n -> [n]) -> n -> [n]
        search3' isGood childrenOf root = filter isGood $ s [root]
            where
              s nodes = nodes ++ (s $ concatMap childrenOf nodes)

So this worked, and I was going to move on. But then a brainwave hit me: Haskell is a lazy language. I don’t have to generate and filter the tree at the same time. I can generate the entire (infinite) tree and filter it later:

    -- breadth-first tree search
    bfsTree :: (n -> [n]) -> [n] -> [n]
    bfsTree childrenOf nodes =
        nodes ++ bfsTree childrenOf (concatMap childrenOf nodes)

    search4 isGood childrenOf root =
        filter isGood $ bfsTree childrenOf [root]

This is much better because it breaks the generation and filtering into independent components, and also makes clear that searching is nothing more than filtering the list of nodes. The interesting part of this program is the breadth-first tree traversal, and the tree traversal part now has only two arguments instead of three; the filter operation afterwards is trivial. Tree search in Haskell is mostly tree, and hardly any search!

With this refactoring we might well decide to get rid of search entirely:

    palindromes4 = filter isPalindrome allStrings
      where
        isPalindrome s = (s == reverse s)
        allStrings = bfsTree (\s -> map (s ++) ["a", "b", "c"]) [""]

And then I remembered something I hadn’t thought about in a long, long time:

[Lazy evaluation] makes it practical to modularize a program as a generator that constructs a large number of possible answers, and a selector that chooses the appropriate one.

That's exactly what I was doing and what I should have been doing all along. And it ends:

Lazy evaluation is perhaps the most powerful tool for modularization … the most powerful glue functional programmers possess.

(”Why Functional Programming Matters”, John Hughes, 1990.)

I felt a little bit silly, because I wrote a book about lazy functional programming and yet somehow, it’s not the glue I reach for first when I need glue.

[ Addendum 20221023: somewhere along the way I dropped the idea of using the list monad for the list construction, instead using explicit map and concat. But it could be put back. For example:

        s nodes = (nodes ++) . s $ do
            n <- nodes
            childrenOf n

I don't think this is an improvement on just using concatMap. ]


[Other articles in category /prog/haskell] permanent link