The Universe of Discourse


Sun, 09 Sep 2018

APL matrix product operator

I very recently suggested a mathematical operation that does this:

$$\begin{align} \left((\sqrt\bullet) \cdot x + \left(\frac1\bullet\right) \cdot 1 \right) ⊛ (9x+4) & = \sqrt9 x^2 + \sqrt4 x + \frac19 x + \frac14 \\ & = 3x^2 + \frac{19}{9} x + \frac 14 \end{align}$$

Here the left-hand argument is like a polynomial, except that the coefficients are functions. The right-hand argument is an ordinary polynomial.

It occurs to me that the APL programming language (invented around 1966) actually has something almost like this, in its generalized matrix product.

In APL, if ? and ! are any binary operators, you can write ?.! to combine them into a matrix operator. Like ordinary matrix multiplication, the new operator combines an !!m×n!! and an !!n×r!! matrix into an !!m×r!! matrix. Ordinary matrix multiplication is defined like this:

$$c_{ij} = a_{i1} \cdot b_{1j} +
a_{i2} \cdot b_{2j} + \ldots + a_{in} \cdot b_{nj} $$

The APL ?.! operator replaces the addition with ? and the multiplication with !, so that +.× is exactly the standard matrix multiplication. Several other combined operations of this type are, if not common, at least idiomatic. For example, I have seen, and perhaps used, ∨.∧, +.∧, and ⌈.⌊. (⌈ and ⌊ are APL's two-argument maximum and minimum operators.)
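
To give one of these a concrete reading: on boolean matrices, ∨.∧ composes relations. A minimal self-contained sketch in Haskell (the language this article turns to later), not anything APL itself provides:

    import Data.List (transpose)

    -- a sketch of ∨.∧ : c_ij is True exactly when some k has a_ik ∧ b_kj,
    -- so the product composes two relations represented as boolean matrices
    bool_product :: [[Bool]] -> [[Bool]] -> [[Bool]]
    bool_product a b =
      [ [ or (zipWith (&&) row col) | col <- transpose b ] | row <- a ]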

With this feature, the ⊛ operator I proposed above would be something like +.∘, where ∘ means function composition. To make it work you need to interpret the coefficients of an ordinary polynomial as constant functions, but that is not much of a stretch. APL doesn't actually have a function composition operator.

APL does have a ∘ symbol, but it doesn't mean function composition, and also the ?.! notation is special-cased, in typical APL style, so that ∘.! does something sort of related but rather different. Observe also that if !!a!! and !!b!! are !!1×n!! and !!n×1!! matrices, respectively, then !!a +.× b!! ought to be the dot product of !!a!! and !!b!!: it is a !!1×1!! matrix whose sole entry is:

$$c_{11} = a_{11} \cdot b_{11} +
a_{12} \cdot b_{21} + \ldots + a_{1n} \cdot b_{n1} $$

and similarly if !!a!! is !!n×1!! and !!b!! is !!1×m!! then !!a +.× b!! is the outer product, the !!n×m!! matrix whose !!c_{ij} = a_i × b_j!!. But I think APL doesn't distinguish between a !!1×n!! matrix and a vector, and always considers them to be vectors, so that in such cases !!a +.× b!! always gets you the dot product, if !!a!! and !!b!! are the same length, and an error otherwise. If you want the outer product of two vectors you use a ∘.× b instead. a ∘.+ b would be the outer product matrix with !!c_{ij} = a_i + b_j!!. APL is really strange.
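
As an aside, that ∘.! outer-product table is easy to write directly in Haskell; a minimal sketch:

    -- a ∘.! b builds the table of all pairwise combinations
    outer :: (a -> b -> c) -> [a] -> [b] -> [[c]]
    outer f as bs = [ [ f a b | b <- bs ] | a <- as ]

    -- outer (*) [1,2,3] [10,20]  ==  [[10,20],[20,40],[30,60]]
    -- outer (+) [1,2,3] [10,20]  ==  [[11,21],[12,22],[13,23]]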

I applied for an APL job once; I went to a job fair (late 1980s maybe?) and some Delaware bank was looking for APL programmers to help maintain their legacy APL software. I was quite excited at the idea of programming APL professionally, but I had no professional APL experience so they passed me over. I think they made a mistake, because there are not that many people with professional APL experience anyway, and how many twenty-year-olds are there who know APL and come knocking on your door looking for a job? But whatever, it's probably better that I didn't take that route.

The +.× thing exemplifies my biggest complaint about APL semantics: it was groping toward the idea of functional programming without quite getting there, never quite general enough. You could use !/, where ! was any built-in binary operator, and this was quite like a fold. But you couldn't fold a user-defined function of two arguments! And you couldn't write a higher-order fold function either.
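
For contrast, a quick Haskell gloss: the missing generality is exactly what an ordinary fold provides, since foldr is itself a first-class higher-order function.

    -- APL's  +/ v  corresponds roughly to
    total :: Num a => [a] -> a
    total = foldr (+) 0

    -- and, unlike APL's !/, foldr happily folds a user-defined operator
    -- (addSat here is a made-up example, saturating addition at 100)
    addSat :: Int -> Int -> Int
    addSat x y = min 100 (x + y)

    sumSat :: [Int] -> Int
    sumSat = foldr addSat 0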

I was pleased to find out that Iverson had designed a successor language, J, and then quickly disappointed when I saw how little it added. For example, it has an implicit “hook” construction, which is special syntax for handling one particular pattern of function composition. In Haskell it would be:

    hook f g x = x `f` (g x)

but in J the hook itself is implicit. If you would rather use (g x) `f` x instead, you are out of luck because that is not built-in. I don't know why Iverson thought the hook was the thing to embed in the language. (J also has an implicit “fork” which is fork f g h x = (f x) `g` (h x).)
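
To make the fork concrete, here is a small Haskell sketch using the classic J example of a fork, the mean +/ % # :

    -- the J “fork”: apply f and h to the argument, then combine with g
    fork :: (a -> b) -> (b -> c -> d) -> (a -> c) -> a -> d
    fork f g h x = f x `g` h x

    -- J's (+/ % #), the mean of a list, is a fork of sum, division, and length
    mean :: Fractional a => [a] -> a
    mean = fork sum (/) (fromIntegral . length)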

[ Addendum 20180910: The explanation. ]

Meanwhile the awful APL notation has gotten much more awful in J, and you get little in return. You even lose all the fun of the little squiggles. Haskell is a much better J than J ever was. Haskell's notation can be pretty awful too ((.) . (.)?), but at least you are getting your money's worth.

I thought I'd see about implementing APL's ?.! thing in Haskell to see what it would look like. I decided to do it by implementing a regular matrix product and then generalizing. Let's do the simplest thing that could possibly work and represent a matrix as a list of rows, each of which is a list of entries.

For a regular matrix product, !!C = AB!! means that !!c_{ij}!! is the dot product of the !!i!!th row of !!A!! and the !!j!!th column of !!B!!, so I implemented a dot product function:

    dot_product :: Num b => [b] -> [b] -> b
    dot_product a b = foldr (+) 0 $ zipWith (*) a b

OK, that was straightforward.
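
For instance, a quick sanity check:

    dot_product [1,2,3] [4,5,6]    -- 1*4 + 2*5 + 3*6 = 32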

The rows of !!A!! are right there, but we also need the columns from !!B!!, so here's a function to get those:

    transpose ([]:_) = []
    transpose x = (map head x) : transpose (map tail x)

Also straightforward.
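
Again a quick check:

    transpose [[1,2,3],[4,5,6]]    -- [[1,4],[2,5],[3,6]]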

After that I toiled for a very long time over the matrix product itself. My first idea was to turn !!A!! into a list of functions, each of which would dot-product one of the rows of !!A!! by a given vector. Then I would map each of these functions over the columns of !!B!!.

Turning !!A!! into a list of functions was easy:

    map dot_product a  :: [ [x] -> x ]

and getting the columns of !!B!! I had already done:

    transpose b :: [[x]]

and now I just need to apply each of the functions in the first part to each column in the second part and collect the results:

    ??? (map dot_product a) (transpose b)

I don't know why this turned out to be so damn hard. This is the sort of thing that ought to be really, really easy in Haskell. But I had many difficulties.

First I wasted a bunch of time trying to get <*> to work, because it does do something like that. But the thing I wanted has signature

  ??? :: [a -> b] -> [a] -> [[b]]

whereas <*> flattens the result:

  <*> :: [a -> b] -> [a] -> [b]

and I needed to keep that extra structure. I tried all sorts of tinkering with <*> and <$> but never found what I wanted.
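
For instance, the flattening looks like this:

    [(+ 1), (* 2)] <*> [10, 20]    -- [11, 21, 20, 40]

whereas the desired ??? would keep the structure and produce [[11,21],[20,40]].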

Another part of the problem was I didn't know any primitive for “map a list of functions over a single argument”. Although it's not hard to write, I had some trouble thinking about it after I wrote it:

    pamf fs b = fmap ($ b) fs
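
A quick check:

    pamf [(+ 1), (* 2)] 10    -- [11, 20]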

Then the “map each function over each list of arguments” is map . pamf, so I got

     (map . pamf) (map dot_product a) (transpose b)

and this almost works, except it produces the columns of the results instead of the rows. There is an easy fix and a better fix. The easy fix is to just transpose the final result. I never did find the better fix. I thought I'd be able to replace map . pamf with pamf . map but the latter doesn't even type check.

Anyway this did work:

    matrix_product a b = 
       transpose $ (map . pamf) (map dot_product a) (transpose b)

but that transpose on the front kept bothering me and I couldn't leave it alone.

So then I went down a rabbit hole and wrote nine more versions of ???:

    fs `op` as  = do
       f <- fs
       return $ fmap f as

    fs `op2` as = fs >>= (\f -> return $ fmap f as)

    fs `op3` as = fs >>= (return . flip fmap as )
    fs `op4` as = fmap ( flip fmap as ) fs
    op5 as = fmap ( flip fmap as )
    op6 :: [a -> b] -> [a] -> [[b]]
    op6 = flip $ fmap . (flip fmap)

    fs `op7` as = map (\f -> [ f a | a <- as ]) fs
    fs `op8` as = map (\f -> (map f as)) fs
    fs `op9` as = map (flip map as) fs
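
All nine build the same table (op5 is the odd one out, taking its arguments in the other order); for instance:

    [(+ 1), (* 2)] `op` [10, 20]    -- [[11,21],[20,40]]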

I finally settled on op6, or rather on its unflipped core fmap . flip fmap, which takes the arguments in the “wrong” order, with the list of functions second and their arguments first. But I used it anyway:

    matrix_product a b =  (map . flip map) (transpose b) (map dot_product a)
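
A quick check that it behaves:

    matrix_product [[1,2],[3,4]] [[5,6],[7,8]]    -- [[19,22],[43,50]]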

The result was okay, but it took me so long to get there.

Now I have matrix_product and I can generalize it to use two arbitrary operations instead of addition and multiplication. And hey, I don't have to touch matrix_product! I only need to change dot_product because that's where the arithmetic is. Instead of

    dot_product a b = foldr (+) 0 $ zipWith (*) a b

just use:

    inner_product u v = foldr add 0 $ zipWith mul u v

Except uh oh, that 0 is wrong. It might not be the identity for whatever weird operation add is; it might be min, say, and then we need the 0 to be plus infinity.

I tinkered a bit with requiring a Monoid instance for the matrix entries, which seemed interesting at least, but to do that I would need to switch monoids in the middle of the computation and I didn't want to think about how to do that. So instead I wrote a version of foldr that doesn't need an identity element:

    foldr' f (a:as) = foldr f a as

This fails on empty lists, which is just fine, since I wasn't planning on multiplying any empty matrices.

Then I have the final answer:

    general_matrix_product add mul a b =
      (map . flip map) (transpose b) (map inner_product a) where
        inner_product u v = foldr' add $ zipWith mul u v
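
A couple of checks: with (+) and (*) it agrees with the ordinary product, and with max and min it is APL's ⌈.⌊ :

    general_matrix_product (+) (*) [[1,2],[3,4]] [[5,6],[7,8]]
        -- [[19,22],[43,50]]
    general_matrix_product max min [[1,2],[3,4]] [[5,6],[7,8]]
        -- [[2,2],[4,4]]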

It's nice and short, but on the other hand it has that mysterious map . flip map in there. If I hadn't written that myself I would see it and ask what on earth it was doing. In fact I did write it myself, and although I do know what it is doing, I don't really understand why.

As for the shortness, let's see what it looks like in a more conventional language:

    def transpose(m):
      return list(zip(*m))

Wow, that was amazingly easy.

    def matrix_product(a, b):
      def dot_product(u, v):
        total = 0
        for pair in zip(u, v):
          total += pair[0] * pair[1]
        return total

      bT = transpose(b)
      c = []
      for i in range(len(a)):
        c.append([])
        for j in range(len(bT)):
          c[-1].append(None)
          c[i][j] = dot_product(a[i], bT[j])
      return c

Okay, that was kind of a mess. The dot_product should be shorter, because Python has a nice built-in sum function, but how do I build the list of products I want to sum? It doesn't really have map because it doesn't really have lambdas. I know, I know, someone is going to insist that Python has lambdas. It does, sort of, but they suck.

I think the standard Python answer to this is that you don't need map because you're supposed to use list comprehension instead:

      def dot_product(u, v):
        return sum([ x*y for (x, y) in zip(u, v) ])

I don't know how I feel about that argument in general but in this case the result was lovely. I have no complaints.

While I was writing the Python program I got a weird bug that turned out to be related to mutability: I had initialized c with

    c = [[None] * len(bT)] * len(a)

But this makes the rows of c the same mutable object, and then installing values in each row overwrites the entries we stored in the previous rows. So definitely score one point for Haskell there.

A lot of the mess in the code is because Python is so obstinate about extending lists when you need them extended; you have to say pretty please every time. Maybe I can get rid of that by using more list comprehensions?

    def matrix_product2(a, b):
      def dot_product(u, v):
        return sum([ x*y for (x, y) in zip(u, v) ])

      return [ [ dot_product(u, v) for v in transpose(b) ] for u in a ]

Python's list comprehensions usually make me long for Haskell's, which are so much nicer, but this time they were fine. Python totally wins here. No wait, that's not fair: maybe I should have been using list comprehensions in Haskell also?

    matrix_product a b = [ [ dot_product row col | col <- transpose b ] | row <- a ]

Yeah, okay. All that map . flip map stuff was for the birds. Guido thinks that map is a bad idea, and I thought he was being silly, but maybe he has a point. If I did want the ??? thing that applies a list of functions to a list of arguments, the list comprehension solves that too:

    [ [ f x | x <- xs ] | f <- fs ]

Well, lesson learned.

I really wish I could write Haskell faster. In the mid-1990s I wrote thousands of lines of SML code and despite (or perhaps because of) SML's limitations I was usually able to get my programs to do what I wanted. But when I try to write programs in Haskell it takes me a really long time to get anywhere.

Apropos of nothing, today is the 77th birthday of Dennis M. Ritchie.

[ Addendum: It took me until now to realize that, after all that, the operation I wanted for polynomials is not matrix multiplication. Not at all! It is actually a convolution:

$$ c_k = \sum_{i+j=k} a_ib_j $$

or, for my weird functional version, replace the multiplication !!a_ib_j!! with function composition !!a_i ∘ b_j!!. I may implement this later, for practice. And it's also tempting to try to do it in APL, even though that would most likely be a terrible waste of time… ]
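
Here is one rough way to write that convolution in Haskell, a minimal sketch rather than a finished implementation, parameterized by an “addition” and a “multiplication” in the same way as general_matrix_product above. Coefficients are listed highest degree first, and both inputs are assumed non-empty:

    convolve :: (c -> c -> c) -> (a -> b -> c) -> [a] -> [b] -> [c]
    convolve add mul as bs =
      [ foldr1 add [ (as !! i) `mul` (bs !! (k - i))
                   | i <- [0 .. k], i < length as, k - i < length bs ]
      | k <- [0 .. length as + length bs - 2] ]

    -- convolve (+) (*) [1,2] [3,4]          ==  [3,10,8]   -- (x+2)(3x+4)
    -- convolve (+) ($) [sqrt, recip] [9,4]  ==  [3.0, 2.111..., 0.25]
    --   Here ($) is the “multiplication”: each function coefficient is
    --   applied to each numeric coefficient, matching the example at the
    --   top of the article, 3x² + 19/9 x + 1/4.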

[ Addendum 20180909: Vaibhav Sagar points out that my foldr' is the standard Prelude function foldr1. But as I said in the previous article, one of the problems I have is that faced with a need for something like foldr1, instead of taking one minute to write it, I will waste fifteen minutes looking for it in Hoogle. This time I opted to not do that. In hindsight it was a mistake, perhaps, but I don't regret the choice. It is not easy to predict what is worth looking for. To see the downside risk, consider pamf. A Hoogle search for pamf produces nothing like what I want, and, indeed, it doesn't seem to exist. ]

