The List Monad

You might still be confused by the concept of monads. I'm not going to provide a comprehensive monads introduction, but I'll show you a simple example of the List monad in action, and maybe it will provide some movement towards a general understanding of monads.

The idea behind the List monad is that it captures the essence of "multi-valued functions." Let's say that a problem might have several possible solutions, and you want to capture all of them, but you also want to treat the problem each step of the way as though you were only dealing with one of the answers at one time. The List monad allows this, and I'll show you how.

The problem I'll examine is: given a sorted sequence of items, what are all the different binary search trees you can put the items in?

Recall that a binary search tree is a sorted tree where each element has zero, one, or two children. My tree will be of Ints, each Node will contain an Int and two children, and children can either be Nodes or Leaf items (an "end of tree" marker). Thus the Haskell structure I will use is the following:

data Tree = Leaf | Node Int Tree Tree deriving Show
-- For example, the tree
--    4
--  2   5
-- would be constructed using the code:
testTree = Node 4 (Node 2 Leaf Leaf) (Node 5 Leaf Leaf)

OK, so now we have a Tree structure. Let's look at the problem again. What are all possible trees?

   2            2
     4             5
       5          4


    5         5             4
 2           4            2   5
  4         2

How did I arrive at this? Do it yourself and see. I selected a root node (2), then tried to figure out all the possible subtrees on the left and right of that 2. In this case there were two: 4-5 and 5-4. Same for if I selected 5. But when I selected 4 as the root, there was only one subtree possibility for each side.

As you can probably see, we can solve it with a neat recursive solution: arbitrarily select a root, partition the list into sublists on the left and right side, solve the problem recursively for the sublists, then combine the solutions in all possible ways to produce all the trees rooted at the selected root. Select the next root and continue.

An iterative programmer will look at that and see three nested for-loops. Take a look and see if you agree:

def getAllTrees(lst):
    outputs = []
    for root in lst:
        lhs <- items left of root
        rhs <- items right of root
        for lhstree in getAllTrees(lhs):
            for rhstree in getAllTrees(rhs):
                add new Node(root, lhstree, rhstree) to outputs

    return outputs

The List monad, however, allows us to do something much neater. Our getAllTrees function is fundamentally a multi-valued function, but it also makes use of some multi-valued functions: one to select a root (the function that returns each root), and the recursive calls to getAllTrees.

Let's write getAllTrees in Haskell. First, the type signature, and let's also think about what happens in the base case. When we are given a single element list, we want that element in a Node with two Leafs. That means that getAllTrees of an empty list must return a single Tree as its result, the "empty tree".

getAllTrees :: [Int] -> [Tree]
getAllTrees [] = [Leaf]

Now, the rest of the function. Let's split the tree at each possible place: the beginning, after the first item, after the second item, and so on. We always want to leave one item for the root, and we're splitting it before the root each time, so there are n ways to split the tree. The splitAt function is what we want: given an n and a list, it gives us the tuple (n-long prefix, rest of items). The root will be the first element of the 'rest', so we uncons it, and now we have lhs, root, and rhs. Look at the algorithm, and I'll explain it more below.

getAllTrees lst = do
    rootIdx <- allPossibleRoots
    let (lhs, root:rhs) = splitAt rootIdx lst
    lhstree <- getAllTrees lhs
    rhstree <- getAllTrees rhs
    return $ Node root lhstree rhstree
    where allPossibleRoots = [0..(length lst)-1]

Do you see how it works? rootIdx gets the value 0 and then the rest of the algorithm runs: splitAt is called; lhs, root, and rhs are assigned; getAllTrees is called; lhs- and rhstree get values; and a new Node is built with the results. The magic of the List monad is that this process happens once for each item returned from allPossibleRoots and getAllTrees. At the end, the algorithm calls 'return' a whole bunch of times. Isn't it neat?