The Universe of Discourse

Sun, 05 Mar 2017

Solving twenty-four puzzles

Back in July, I wrote:

Lately my kids have been interested in puzzles of this type: You are given a sequence of four digits, say 1,2,3,4, and your job is to combine them with ordinary arithmetic operations (+, -, ×, and ÷) in any order to make a target number, typically 24. For example, with 1,2,3,4, you can go with $$((1+2)+3)×4 = 24$$ or with $$4×((2×3)×1) = 24.$$

I said I had found an unusually difficult puzzle of this type, which is to make 2,5,6,6 total to 17. This is rather difficult. (I will reveal the solution later in this article.) Several people independently wrote to advise me that it is even more difficult to make 3,3,8,8 total to 24. They were right; it is amazingly difficult. After a couple of weeks I finally gave up and asked the computer, and when I saw the answer I didn't feel bad that I hadn't gotten it myself. (The solution is here if you want to give up without writing a program.)

From now on I will abbreviate the two puzzles of the previous paragraph as «2 5 6 6 ⇒ 17» and «3 3 8 8 ⇒ 24», and others similarly.

The article also inspired a number of people to write their own solvers and send them to me, and comparing them was interesting. My solver followed the tree search technique that I described in chapter 5 of Higher-Order Perl, and which has become so familiar to me that by now I can implement it without thinking about it very hard:

  1. Invent a data structure that represents the state of a possibly-incomplete search. This is just a list of the stuff one needs to keep track of while searching. (Let's call this a node.)

  2. Build a function which recognizes when a node represents a successful search.

  3. Build a function which takes a node, computes all the ways the search could proceed from that point, and returns a list of nodes for those slightly-more-advanced searches.

  4. Initialize a queue with a node representing a search that has just begun.

  5. Do this:

      until ( queue.is_empty() ) {
        current_node = queue.get_next()
        if ( is_successful( current_node ) ) { print the solution }
        queue.push( slightly_more_complete_searches( current_node ) )

This is precisely a breadth-first search. To make it into depth-first search, replace the queue with a stack. To make a heuristically directed search, replace get_next with a function that looks at the queue and chooses the best-looking node from which to proceed. Many other variations are possible, which is the advantage of this synthetic approach over letting the search arise organically from a recursive searcher. (Higher-Order Perl says “Recursive functions naturally perform depth-first searches.” (page 203)) In Python or Ruby one would be able to use yield and would not have to manage the queue explicitly, but in this case the queue management is trivial.

In my solver, each node contains a list of available expressions, annotated with its numerical value. Initially, the expressions are single numbers and the values are the same, say

    [ [ "2" => 2 ], [ "3" => 3 ], [ "4" => 4 ], [ "6" => 6 ] ]

Whether you represent expressions as strings or as something more structured depends on what you need to do with them at the end. If you just need to print them out, strings are good enough and are easy to handle.

A node represents a successful search if it contains only a single expression and if the expression's value is the target sum, say 24:

    [ [ "(((6÷2)+3)×4)" => 24 ] ]

From a node, the search should proceed by selecting two of the expressions, removing them from the node, selecting a legal operation, combining the two expressions into a single expression, and inserting the result back into the node. For example, from the initial node shown above, the search might continue by subtracting the fourth expression from the second:

    [ [ "2" => 2 ], [ "4" => 4 ], [ "(3-6)" => -3 ] ]

or by multiplying the second and the third:

    [ [ "2" => 2 ], [ "(3×4)" => 12 ], [ "6" => 6 ] ]

When the program encounters that first node it will construct both of these, and many others, and put them all into the queue to be investigated later.


    [ [ "2" => 2 ], [ "(3×4)" => 12 ], [ "6" => 6 ] ]

the search might proceed by dividing the first expression by the third:

    [ [ "(3×4)" => 12 ], [ "(2÷6)" => 1/3 ] ]

Then perhaps by subtracting the first from the second:

    [ [ "((2÷6)-(3×4))" => -35/3 ] ]

From here there is no way to proceed, so when this node is removed from the queue, nothing is added to replace it. Had it been a winner, it would have been printed out, but since !!-\frac{35}3!! is not the target value of 24, it is silently discarded.

To solve a puzzle of the «a b c d ⇒ t» sort requires examining a few thousand nodes. On modern hardware this takes approximately zero seconds.

The actual code for my solver is a lot of Perl gobbledygook that may not be of general interest so I will provide a link for people who are interested in deciphering it. It also represents my second attempt: I lost the code that I described in the earlier article and had to rewrite it. It is rather bigger than I would have liked.

My puzzle solver in Perl.

Stuff goes wrong

People showed me a lot of programs to solve this, and many didn't work. There are a few hard cases that several of them get wrong.


Some puzzles require that some subexpressions have fractional values. Many of the programs people showed me used integer arithmetic (sometimes implicitly and unintentionally) and failed to solve those puzzles. We can detect this by asking for a solution to «2 5 6 6 ⇒ 17», which requires a fraction. The solution is !!6×(2+(5÷6))!!. A program using integer arithmetic will calculate !!5÷6 = 0!! and fail to recognize the solution.

Several people on Twitter made this mistake and then mistakenly claimed that there was no solution at all. Usually it was possible to correct their programs by changing

        inputs = [ 2, 2, 5, 6 ]


        inputs = [ 2.0, 2.0, 5.0, 6.0 ]

or something like that.

Some people also surprised me by claiming that I had lied when I stated that the puzzle could be solved without any “underhanded tricks”, and that the use of intermediate fractions was itself an underhanded trick. Your Honor, I plead not guilty. I originally described the puzzle this way:

You are given a sequence of four digits, say 1,2,3,4, and your job is to combine them with ordinary arithmetic operations (+, -, ×, and ÷) in any order to make a target number, typically 24.

The objectors are implicitly claiming that when you combine 5 and 6 with the “ordinary arithmetic operation” of division, you get something other than !!\frac56!!. This is an indefensible claim.

I wasn't even trying to be tricky! It never occurred to me that fractions were something that some people would consider underhanded, and now that it has been suggested, I reject the suggestion. Folks, the result of division can be a fraction. Fractions are not some sort of obscure mathematical pettifoggery. They have been with us for at least 3,500 years now, so it is time everyone got used to them.

Floating-point error

Some programs used floating-point arithmetic to deal with the fractions and then fell foul of floating-point error. I will defer discussion of this to a future article.

I've complained about floating-point numbers on this blog before. ( 1 2 3 4 5 ) God, how I loathe them.

[ Addendum 20170825: Looking back on our old discussion from July 2016, I see that Lindsey Kuper said to me:

One nice thing about using Racket or Scheme is that it handles the numeric stuff so nicely. If you weren't careful, I could imagine in Python a solution failing because it evaluated to 16.99999999999999997 or something.

Good call, Dr. Kuper! ]

Expression construction

A more subtle error that several programs made was to assume that all expressions can be constructed by combining a previous expression with a single input number. For example, to solve «2 3 5 7 ⇒ 24», you multiply 3 by 7 to get 21, then add 5 to get 26, then subtract 2 to get 24.

But not every puzzle can be solved this way. Consider «2 3 5 7 ⇒ 41». You start by multiplying 2 by 3 to get 6, but if you try to combine the 6 with either 5 or 7 at this point you will lose. The only solution is to put the 6 aside and multiply 5 by 7 to get 35. Then add the 6 and the 35 to get 41.

Another way to put this is that an unordered binary tree with 4 leaves can take two different shapes. (Imagine filling the green circles with numbers and the pink squares with operators.)

The right-hand type of structure is sometimes necessary, as with «2 3 5 7 ⇒ 41». But several of the proposed solutions produced only expressions with structures like that on the left.

Here's Sebastian Fischer's otherwise very elegant Haskell solution, in its entirety:

    import Data.List ( permutations )

    solution = head
      [ (a,x,(b,y,(c,z,d)))
        | [a,b,c,d] <- permutations [2,5,6,6],
           ops <- permutations [((+),'+'),((-),'-'),((*),'*'),((/),'/')],
           let [u,v,w] = map fst $ take 3 ops,
           let [x,y,z] = map snd $ take 3 ops,
           (a `u` (b `v` (c `w` d))) == 17

You can see the problem in the last line. a, b, c, and d are numbers, and u, v, and w are operators. The program evaluates an expression to see if it has the value 17, but the expression always has the left-hand shape. (The program has another limitation: it never uses the same operator twice in the expression. That second permutations should be (sequence . take 3 . repeat) or something. It can still solve «2 5 6 6 ⇒ 17», however.)

Often the way these programs worked was to generate every possible permutation of the inputs and then apply the operators to the input lists stackwise: pop the first two values, combine them, push the result, and repeat. Here's a relevant excerpt from a program by Tim Dierks, this time in Python:

  for ordered_values in permutations(values):
    for operations in product(ops, repeat=len(values)-1):
      result, formula = calc_result(ordered_values, operations)

Here the expression structure is implicit, but the current result is always made by combining one of the input numbers with the old result.

I have seen many people get caught by this and similar traps in the past. I once posed the problem of enumerating all the strings of balanced parentheses of a given length, and several people assumed that all such strings have the form ()S, S(), or (S), where S is a shorter string of the same type. This seems plausible, and it works up to length 6, but (())(()) does not have that form.

Division by zero

A less common error exhibited by some programs was a failure to properly deal with division by zero. «2 5 6 6 ⇒ 17» has a solution, and if a program dies while checking !!2+(5÷(6-6))!! and doesn't find the solution, that's a bug.

Programs that worked

Ingo Blechschmidt (Haskell)

Ingo Blechschmidt showed me a solution in Haskell. The code is quite short. M. Blechschmidt's program defines a synthetic expression type and an evaluator for it. It defines a function arb which transforms an ordered list of numbers into a list of all possible expressions over those numbers. Reordering the list is taken care of earlier, by Data.List.permutations.

By “synthetic expression type” I mean this:

    data Exp a
        = Lit  a
        | Sum  (Exp a) (Exp a)
        | Diff (Exp a) (Exp a)
        | Prod (Exp a) (Exp a)
        | Quot (Exp a) (Exp a)
        deriving (Eq, Show)

Probably 80% of the Haskell programs ever written have something like this in them somewhere. This approach has a lot of boilerplate. For example, M. Blechschmidt's program then continues:

    eval :: (Fractional a) => Exp a -> a
    eval (Lit x) = x
    eval (Sum  a b) = eval a + eval b
    eval (Diff a b) = eval a - eval b
    eval (Prod a b) = eval a * eval b
    eval (Quot a b) = eval a / eval b

Having made up our own synonyms for the arithmetic operators (Sum for !!+!!, etc.) we now have to explain to Haskell what they mean. (“Not expressions, but an incredible simulation!”)

I spent a while trying to shorten the code by using a less artificial expression type:

    data Exp a
        = Lit  a
        | Op ((a -> a -> a), String) (Exp a) (Exp a)

but I was disappointed; I was only able to cut it down by 18%, from 34 lines to 28. I hope to discuss this in a future article. By the way, “Blechschmidt” is German for “tinsmith”.

Shreevatsa R. (Python)

Shreevatsa R. showed me a solution in Python. It generates every possible expression and prints it out with its value. If you want to filter the voluminous output for a particular target value, you do that later. Shreevatsa wrote up an extensive blog article about this which also includes a discussion about eliminating duplicate expressions from the output. This is a very interesting topic, and I have a lot to say about it, so I will discuss it in a future article.

Jeff Fowler (Ruby)

Jeff Fowler of the Recurse Center wrote a compact solution in Ruby that he described as “hot garbage”. Did I say something earlier about Perl gobbledygook? It's nice that Ruby is able to match Perl's level of gobbledygookitude. This one seems to get everything right, but it fails mysteriously if I replace the floating-point constants with integer constants. He did provide a version that was not “egregiously minified” but I don't have it handy.

Lindsey Kuper (Scheme)

Lindsey Kuper wrote a series of solutions in the Racket dialect of Scheme, and discussed them on her blog along with some other people’s work.

M. Kuper's first draft was 92 lines long (counting whitespace) and when I saw it I said “Gosh, that is way too much code” and tried writing my own in Scheme. It was about the same size. (My Perl solution is also not significantly smaller.)

Martin Janecke (PHP)

I saved the best for last. Martin Janecke showed me an almost flawless solution in PHP that uses a completely different approach than anyone else's program. Instead of writing a lot of code for generating permutations of the input, M. Janecke just hardcoded them:

    $zahlen = [
      [2, 5, 6, 6],
      [2, 6, 5, 6],
      [2, 6, 6, 5],
      [5, 2, 6, 6],
      [5, 6, 2, 6],
      [5, 6, 6, 2],
      [6, 2, 5, 6],
      [6, 2, 6, 5],
      [6, 5, 2, 6],
      [6, 5, 6, 2],
      [6, 6, 2, 5],
      [6, 6, 5, 2]

Then three nested loops generate the selections of operators:

 $operatoren = [];
 foreach (['+', '-', '*', '/'] as $x) {
   foreach (['+', '-', '*', '/'] as $y) {
     foreach (['+', '-', '*', '/'] as $z) {
       $operatoren[] = [$x, $y, $z];

Expressions are constructed from templates:

        $klammern = [
          '%d %s %d %s %d %s %d',
          '(%d %s %d) %s %d %s %d',
          '%d %s (%d %s %d) %s %d',
          '%d %s %d %s (%d %s %d)',
          '(%d %s %d) %s (%d %s %d)',
          '(%d %s %d %s %d) %s %d',
          '%d %s (%d %s %d %s %d)',
          '((%d %s %d) %s %d) %s %d',
          '(%d %s (%d %s %d)) %s %d',
          '%d %s ((%d %s %d) %s %d)',
          '%d %s (%d %s (%d %s %d))'

(I don't think those templates are all necessary, but hey, whatever.) Finally, another set of nested loops matches each ordering of the input numbers with each selection of operators, uses sprintf to plug the numbers and operators into each possible expression template, and uses @eval to evaluate the resulting expression to see if it has the right value:

   foreach ($zahlen as list ($a, $b, $c, $d)) {
     foreach ($operatoren as list ($x, $y, $z)) {
       foreach ($klammern as $vorlage) {
         $term = sprintf ($vorlage, $a, $x, $b, $y, $c, $z, $d);
         if (17 == @eval ("return $term;")) {
           print ("$term = 17\n");

If loving this is wrong, I don't want to be right. It certainly satisfies Larry Wall's criterion of solving the problem before your boss fires you. The same approach is possible in most reasonable languages, and some unreasonable ones, but not in Haskell, which was specifically constructed to make this approach as difficult as possible.

M. Janecke wrote up a blog article about this, in German. He says “It's not an elegant program and PHP is probably not an obvious choice for arithmetic puzzles, but I think it works.” Indeed it does. Note that the use of @eval traps the division-by-zero exceptions, but unfortunately falls foul of floating-point roundoff errors.


Thanks to everyone who discussed this with me. In addition to the people above, thanks to Stephen Tu, Smylers, Michael Malis, Kyle Littler, Jesse Chen, Darius Bacon, Michael Robert Arntzenius, and anyone else I forgot. (If I forgot you and you want me to add you to this list, please drop me a note.)

Coming up

I have enough material for at least three or four more articles about this that I hope to publish here in the coming weeks.

But the previous article on this subject ended similarly, saying

I hope to write a longer article about solvers in the next week or so.

and that was in July 2016, so don't hold your breath.

[ Addendum 20170820: the next article is ready. I hope you weren't holding your breath!  ]

[ Addendum 20170828: yet more about this ]

[Other articles in category /math] permanent link