The Universe of Discourse

Fri, 21 Aug 2015

This is page 6 of the Cosmic Call message. An explanation follows.

The 10 digits again:

 0 1 2 3 4 5 6 7 8 9

Page 6 discusses fundamental particles of matter, the structure of the hydrogen and helium atoms, and defines glyphs for the most important chemical elements.

Depicted at top left is the hydrogen atom, with a proton in the center and an electron circulating around the outside. This diagram is equated to the glyph for hydrogen.

The diagram for helium is similar but has two electrons, and its nucleus has two protons and also two neutrons.

 Proton Neutron Electron

The illustrations may puzzle the aliens, depending on how they think of atoms. (Feynman once said that this idea of atoms as little solar systems, with the electrons traveling around the nucleus like planets, was a hundred years old and out of date.) But the accompanying mass and charge data should help clear things up. The first formula says

!!M_p = 1836\cdot M_e!!

the mass of the proton is 1836 times the mass of the electron, and that 1836, independent of the units used and believed to be a universal and fundamental constant, ought to be a dead giveaway about what is being discussed here.

If you want to communicate fundamental constants, you have a bit of a problem. You can't tell the aliens that the speed of light is !!1.8\cdot10^{12}!! furlongs per fortnight without first explaining furlongs and fortnights (as is actually done on a later page). But the proton-electron mass ratio is dimensionless; it's 1836 in every system of units. (Although the value is actually known to be 1836.15267; I don't know why a more accurate value wasn't given.)

This is the first use of subscripts in the document. It also takes care of introducing the symbol for mass. The following formula does the same for charge : !!Q_p = -Q_e!!.

The next two formulas, accompanying the illustration of the helium atom, describe the mass (1.00138 protons) and charge (zero) of the neutron. I wonder why the authors went for the number 1.00138 here instead of writing the neutron-electron mass ratio of 1838 for consistency with the previous ratio. I also worry that this won't be enough for the aliens to be sure about the meaning of . The 1836 is as clear as anything can be, but the 0 and -1 of the corresponding charge ratios could in principle be a lot of other things. Will the context be enough to make clear what is being discussed? I suppose it has to; charge, unlike mass, comes in discrete units and there is nothing like the 1836.

The second half of the page reiterates the symbols for hydrogen and helium and defines symbols for eight other chemical elements. Some of these appear in organic compounds that will be discussed later; others are important constituents of the Earth. It also introduces symbol for “union” or “and”: . For example, sodium is described as having 11 protons and 12 neutrons.

 Hydrogen Helium Carbon Nitrogen Oxygen Aluminium Silicon Iron Sodium Chlorine

Most of these new glyphs are not especially mnemonic, except for hydrogen—and aluminium, which is spectacular.

The blog is going on hiatus until early September. When it returns, the next article will discuss page 7, shown at right. It has three errors. Can you find them? (Click to enlarge.)

Wed, 19 Aug 2015

This is page 5 of the Cosmic Call message. An explanation follows.

The 10 digits again:

 0 1 2 3 4 5 6 7 8 9

Page 5 discusses two basic notions of geometry. The top half concerns circles and introduces !!\pi!!. There is a large circle with its radius labeled :

The outer diameter is then which is !!2\cdot r!!.

The perimeter is twice times the radius , and the area is times the square of the radius . What is ? It's !!\pi!! of course, as the next line explains, giving !!\pi = 3.1415926545697932…365698614212904!!, which gives enough digits on the front to make clear what is being communicated. The trailing digits are around the 51 billionth places and communicate part of the state of our knowledge of !!\pi!!. I almost wish the authors had included a sequence of fifteen random digits at this point, just to keep the aliens wondering.

The bottom half of the page is about the pythagorean theorem. Here there's a rather strange feature. Instead of using the three variables from the previous page, , the authors changed the second one and used instead. This new glyph does not appear anywhere else. A mistake, or did they do it on purpose?

In any case, the pythagorean formula is repeated twice, once with exponents and once without, as both !!z^2=x^2+b^2!! and !!z\cdot z = x\cdot x + b\cdot b!!. I think they threw this in just in case the exponentiation on the previous pages wasn't sufficiently clear. I don't know why the authors chose to use an isosceles right triangle; why not a 3–4–5 or some other scalene triangle, for maximum generality? (What if the aliens think we think the pythagorean theorem applies only for isosceles triangles?) But perhaps they were worried about accurately representing any funny angles on their pixel grid. I wanted to see if it would fit, and it does. You have to make the diagram smaller, but I think it's still clear:

(I made it smaller than it needed to be and then didn't want to redo it.)

I hope this section will be sufficiently unmistakable that the aliens will see past the oddities.

The next article will discuss page 6, shown at right. (Click to enlarge.) Try to figure it out before then.

Mon, 17 Aug 2015

This is page 4 of the Cosmic Call message. An explanation follows.

Reminder: page 1 explained the ten digits:

 0 1 2 3 4 5 6 7 8 9

And the equal sign . Page 2 explained the four basic arithmetic operations and some associated notions:

 addition subtraction multiplication division negation ellipsis (…) decimalpoint indeterminate

This page, headed with the glyph for “mathematics” , describes the solution of simple algebraic equations and defines glyphs for three variables, which we may as well call !!x,y,!! and !!z!!:

 x y z

Each equation is introduced by the locution which means “solve for !!x!!”. This somewhat peculiar “solve” glyph will not appear again until page 23.

For example the second equation is !!x+4=10!!:

Solve for !!x!!: !!x+4=10!!

The solution, 6, is given over on the right:

!!x=6!!

After the fourth line, the equations to be solved change from simple numerical equations in one variable to more abstract algebraic relations between three variables. For example, if

Solve for !!x!!: !!x\cdot y=z!!

then

!!x=z\div y!!.

The next-to-last line uses a decimal fraction in the exponent, !!0.5!!: . On the previous page, the rational fraction !!1\div 2!! was used. Had the same style been followed, it would have looked like this: .

Finally, the last line defines !!x=y^3!! and then, instead of an algebraic solution, gives a graph of the resulting relation, with axes labeled. The scale on the axes is not the same; the !!x!!-coordinate increases from 0 to 20 pixels, but the !!y!!-coordinate increases from 0 to 8000 pixels because !!20^3 = 8000!!. If axes were to the same scale, the curve would go up by 8,000 pixels. Notice that the curve does not peek above the !!x!!-axis until around !!x=8, y=512!! or so. The authors could have stated that this was the graph of !!y=x^3\div 400!!, but chose not to.

I also wonder what the aliens will make of the arrows on the axes. I think the authors want to show that our coordinates increase going up and to the left, but this seems like a strange and opaque way to do that. A better choice would have been to use a function with an asymmetric graph, such as !!y=2^x!!.

(After I wrote that I learned that similar concerns were voiced about the use of a directional arrow in the Pioneer plaque.

(Wikipedia says: “An article in Scientific American criticized the use of an arrow because arrows are an artifact of hunter-gatherer societies like those on Earth; finders with a different cultural heritage may find the arrow symbol meaningless.”)

The next article will discuss page 5, shown at right. (Click to enlarge.) Try to figure it out before then.

Sun, 16 Aug 2015

My overall SE posting volume was down this month, and not only did I post relatively few interesting items, I've already written a whole article about the most interesting one. So this will be a short report.

• I already wrote up Building a box from smaller boxes on the blog here. But maybe I have a couple of extra remarks. First, the other guy's proposed solution is awful. It's long and complicated, which is forgivable if it had answered the question, but it doesn't. And the key point is “blah blah blah therefore code a solver which visits all configurations of the search space”. Well heck, if this post had just been one sentence that ended with “code a solver which visits all configurations of the search space” I would not have any complaints about that.

As an undergraduate I once gave a talk on this topic. One of my examples was the problem of packing 31 dominoes into a chessboard from which two squares have been deleted. There is a simple combinatorial argument why this is impossible if the two deleted squares are the same color, say if they are opposite corners: each domino must cover one square of each color. But if you don't take time to think about the combinatorial argument you could waste a lot of time on computer search learning that there is no solution in that case, and completely miss the deeper understanding that it brings you. So this has been on my mind for a long time.

• I wrote a few posts this month where I thought I gave good hints. In How to scale an unit vector !!u!! in such way that !!a u\cdot u=1!! where !!a!! is a scalar I think I did a good job identifying the original author's confusion; he was conflating his original unit vector !!u!! and the scaled, leading him to write !!au\cdot u=1!!. This is sure to lead to confusion. So I led him to the point of writing !!a(bv)\cdot(bv)=1!! and let him take it from there. The other proposed solution is much more rote and mechanical. (“Divide this by that…”)

In Find numbers !!\overline{abcd}!! so that !!\overline{abcd}+\overline{bcd}+\overline{cd}+d+1=\overline{dcba}!! the OP got stuck partway through and I specifically addressed the stuckness; other people solved the problem from the beginning. I think that's the way to go, if the original proposal was never going to work, especially if you stop and say why it was never going to work, but this time OP's original suggestion was perfectly good and she just didn't know how to get to the next step. By the way, the notation !!\overline{abcd}!! here means the number !!1000a+100b+10c+d!!.

In Help finding the limit of this series !!\frac{1}{4} + \frac{1}{8} + \frac{1}{16} + \frac{1}{32} + \cdots!! it would have been really easy to say “use the formula” or to analyze the series de novo, but I think I almost hit the nail on the head here: it's just like !!1+\frac12 + \frac{1}{4} + \frac{1}{8} + \frac{1}{16} + \frac{1}{32} + \cdots!!, which I bet OP already knows, except a little different. But I pointed out the wrong difference: I observed that the first sequence is one-fourth the second one (which it is) but it would have been simpler to observe that it's just the second one without the !!1+\frac12!!. I had to review it just now to give the simpler explanation, but I sure wish I'd thought of it at the time. Nobody else pointed it out either. Best of all, would have been to mention both methods. If you can notice both of them you can solve the problem without the advance knowledge of the value of !!1+\frac12+\frac14+\ldots!!, because you have !!4S = 1+\frac12 + S!! and then solve for !!S!!.

In Visualization of Rhombus made of Radii and Chords it seemed that OP just needed to see a diagram (“I really really don't see how two circles can form a rhombus?”), so I drew one.

Fri, 14 Aug 2015

Earlier articles: Introduction Common features Page 1 (numerals) Page 2 (arithmetic)

This is page 3 of the Cosmic Call message. An explanation follows.

Reminder: page 1 explained the ten digits:

 0 1 2 3 4 5 6 7 8 9

And the equal sign . Page 2 explained the four basic arithmetic operations and some associated notions:

 addition subtraction multiplication division negation ellipsis (…) decimalpoint indeterminate

This page, headed with the glyph for “mathematics” , explains notations for exponentiation and scientific notation. (This notation was first used on page 1 in the mersenne prime .)

Exponentiation could be represented by an operator, but instead the authors have chosen to represent it by a superscripted position on the page, as is done in conventional mathematical notation. This saves space.

The top section of the page has small examples of exponentiation, including for example !!5^3=125!!:

!!5^3=125!!

There is a section that follows with powers of 10: !!10^1=10, 10^2=100, 10^3=1000, !! and more interestingly !!10^{{}^-2} = 0.01!!:

!!10^{{}^-2} = 0.01!!

This is a lead-in to the next section, which expresses various quantities in scientific notation, which will recur frequently later on. For example, !!0.045!! can be written as !!45\times 10^{{}^-2}!!:

!!45\times10^{{}^-2} = 0.45!!

Finally, there is an offhand remark about the approximate value of the square root of 2:

!!2^{1\div 2} = 1.41421356…!!

The next article will discuss page 4, shown at right. (Click to enlarge.) Try to figure it out before then.

Thu, 13 Aug 2015

On Tuesday I discussed an interesting solution to the problem of turning this:

  no X              X on

A --------------- C


into this:

  no X     X off    X on

A ------ B ------ C


Dave Du Cros has suggested an alternative solution: Make the changes required to turn off feature X, and commit them as B, as in my solution:

  no X     X on     X off

A ------ C ------ B


Then use git-revert to revert the changes, making a new C commit in the right place:

  no X     X on     X off     X on

A ------ C ------ B ------ C'


C' and C have identical trees.

Then use git-rebase to squash together C and B:

  no X              X off     X on

A --------------- B ------ C'


This has the benefit of not requiring anything strange. I think my solution is more general, but it's also weird, and it's not clear that the increased generality is useful.

However, what if there were a git-reorder-commits command? Then my solution would seem much less weird. It would look like this: create B, as before, and do:

    git reorder-commits 0 1


This last command would mean that the previous two commits, normally HEAD~1 and HEAD~0, should switch places. This might be a useful standard tool. Or similarly to turn

    B -- 3 -- 2 -- 1 -- 0


into

    B -- 2 -- 0 -- 3 -- 1


one would use

    git reorder-commits 2 0 3 1


I think git-reorder-commits would be easy to implement, as a loop atop git-commit-tree, as in the previous article.

Wed, 12 Aug 2015

Earlier articles: Introduction Common features Page 1 (numerals)

This is page 2 of the Cosmic Call message. An explanation follows.

Reminder: the previous page explained the ten digits:

 0 1 2 3 4 5 6 7 8 9

The page is in five sections, three on top and two below.
The first four sections explain addition , subtraction , multiplication , and division . Each is explained with a series of five typical arithmetic equalities. For example, !!4\times 3= 12!!:

The subtraction sign actually appeared back on page 1 in the Mersenne prime !!2^{3021377}-1!! .

The negative sign is introduced in connection with subtraction, since !!1-2={}^-1!!:

Note that the negative-number sign is not the same as the subtraction sign.

The decimal point is introduced in connection with division. For example, !!3\div 2 = 1.5!!:

There is also an attempt to divide by zero:

It's not clear what the authors mean by this; the mysterious glyph does not appear anywhere else in the document. What did they think it meant? Infinity? Indeterminate? Well, I found out later they published a cheat sheet, which assigns the meaning “undetermined” to this glyph. Not a great choice, in my opinion, because !!1÷0!! is not numerically equal to anything.

For some reason, perhaps because of space limitations, the authors have stuck the equation !!0-1 = {}^-1!! at the bottom of the division section.

The fifth section, at lower right, displays some nonterminating decimal fractions and introduces the ellipsis or ‘…’ symbol. For example, !!1\div 9 = 0.1111\ldots!!:

I would have put !!2÷27 = 0.0740…!! here instead of !!2\div 3!!, which I think is too similar to the other examples.

The next article, to appear 2015-08-14, will discuss page 3, shown at right. (Click to enlarge.) Try to figure it out before then.

Earlier articles: Introduction Common features

This is page 1 of the Cosmic Call message. An explanation follows.

This page, headed with the glyph for “mathematics” , explains the numeral symbols that will be used throughout the rest of the document. I should warn you that these first few pages are a little dull, establishing basic mathematical notions. The good stuff comes a little later.

The page is in three sections. The first section explains the individual digit symbols. A typical portion looks like this:

•••• ••• = 0111 = 7

Here the number 7 is written in three ways: first, as seven dots, probably unmistakable. Second, as a 4-bit binary number, using the same bit symbols that are used in the page numbers. The three forms are separated by the glyph , which means “equals”. The ten digits, in order from 0 to 9, are represented by the glyphs

 0 1 2 3 4 5 6 7 8 9

The authors did a great job selecting glyphs that resemble the numerals they represent. All have some resemblance except for 4, which has 4 horizontal strokes. Watch out for 4; it's easy to confuse with 3.

The second section serves two purposes. It confirms the meaning of the ten digits, and it also informs the aliens that the rest of the message will write numerals in base ten. For example, the number 14:

••••• ••••• •••• = 14

Again, there are 14 dots, an equal sign, and the numeral 14, this time written with the two glyphs (1) and (4). The base-2 version is omitted this time, to save space. The aliens know from this that we are using base 10; had it been, say, base 8, the glyphs would have been .

People often ask why the numbers are written in base 10, rather than say in base 2. One good answer is: why not? We write numbers in base 10; is there a reason to hide that from the aliens? The whole point of the message is to tell the aliens a little bit about ourselves, so why disguise the fact that we use base-10 numerals? Another reason is that base-10 numbers are easier to proofread for the humans sending the message.

The third section of the page is a list of prime numbers from 2 to 89:

67, 71, 73, 79, 83

and finally the number !!2^{3021377}-1!!

,
!!2^{3021377}-1!!

which was the largest prime number known to humans at the time. (The minus sign and exponentiation notation are explained on later pages.) Why? Again. to tell the aliens about ourselves: here's a glimpse of the limits of our mathematical knowledge.

I often wonder what the aliens will think of the !!2^{3021377}-1!!. Will they laugh at how cute we are, boasting about the sweet little prime number we found? Or will they be astounded and wonder why we think we know that such a big number is prime?

The next article, to appear 2015-08-12, will discuss page 2, shown at right. (Click to enlarge.) Try to figure it out before then.

Tue, 11 Aug 2015

I know, you want to say “Why didn't you just use git-rebase?” Because git-rebase wouldn't work here, that's why. Let me back up.

Say I have commit A, in which feature X does not exist yet. Then in commit C, I implement feature X.

But I realize what I really wanted was to have A, then B, in which feature X was implemented but disabled, and then C in which feature X was enabled. The C I want is just like the C that I have, but I don't have the intervening B.

I have:

  no X              X on

A --------------- C


I want:

  no X     X off    X on

A ------ B ------ C


One way to do this is to use git-rebase in edit mode to split C into B and C. To do this I would pause while rebasing C, edit C to disable feature X, commit the result, which is B, then undo the previous edits to re-enable X, and continue the rebase, creating C. That's two sets of edits. I could backup the files before the first edit and then copy them back for the second edit, but that's the SVN way, so I'm not going to do that.

Now someone wants me to use git-rebase to “reorder the commits”. Their idea is: I have C. Edit C to disable feature X and commit the result as B':

  no X     X on     X off

A ------ C ------ B'


Now use interactive git-rebase to reorder B and C. But this will not work. git-rebase will construct a patch for turning C into B' and will try to apply it to A. This will fail completely, because a patch for turning C into B' is a patch for turning off feature X once it is implemented. Feature X is not in A and you can't turn something off that isn't there. So the rebase will fail to apply the patch.

What I did instead was rather bizarre, using a plumbing command, but worked well. I wrote the code to disable X, and committed it as B, obtaining this:

  no X     X on     X off

A ------ C ------ B


Now B and C have the files I want in them, but their parents are wrong. That is, the history is in the wrong order, but if the parent of C was B and the parent of B was A, eveything would be perfect.

But we can't just change the parents; we have to create a new commit, say B', which has the same files as B but whose parent is A instead of C, and we have to create a new commit C' which has the same files as C but whose parent is B' instead of A.

This is what git-commit-tree does. You give it a tree object containing the files you want, a list of parents, and a commit message, and it creates the commit you asked for and prints its SHA1.

When we use git-commit, it first turns the index into a tree, with git-write-tree, then creates the commit, with git-commit-tree, and then moves the current head ref up to the new commit. Here we will use git-commit-tree directly.

So I did:

       % git checkout -b XX A
Switched to a new branch 'XX'
% git commit-tree -p HEAD B^{tree}
% git commit-tree -p HEAD C^{tree}
ce46beb90d4aa4e2c9fe0e2e3d22eea256edceac
% git reset --hard ce46beb90d4aa4e2c9fe0e2e3d22eea256edceac


The first git-commit-tree

   % git commit-tree -p HEAD B^{tree}


says to make a commit whose tree is the same as B's, and whose parent is the current HEAD, which is A. (B^{tree} is a special notation that means to get the tree from commit B.) Git pauses here to read the commit message from standard input (not shown), and prints the SHA of the new commit on the terminal. I then use git-reset to move the current head ref, XX, up to the new commit. Normally git-commit would do this for us, but we're not using git-commit today.

Then I do the same thing with C:

   % git commit-tree -p HEAD C^{tree}


makes a new commit whose tree is the same as C's, and whose parent is the current head, which looks just like B. Again it reads a commit message from standard input, and prints the SHA of the new commit on the terminal, and again I use git-reset to move XX up to the new commit.

Now I have what I want and I only had to edit the files once. To complete the task I just reset the head of my working branch to wherever XX is now, discarding the old A-C-B branch in favor of the new A-B-C branch. If there's an easier way to do this, I don't know it.

It seems to me that there have been a number of times in the past when I wanted to do something like reordering commits, and git-rebase did not do what I wanted because it reorders patches and not commits. I should keep my eyes open, and see if this comes up again, and if it is worth automating.

[ Thanks to Jeremy Leader for suggesting I write this up and to Jeremy Leader and Rik Signes for advance editing. ]

[ Adendum 20150813: a followup article ]

Sun, 09 Aug 2015

Earlier articles: Introduction

(At left is page 1 of the Cosmic Call message. For an enlarged version of the image, click it.)

First, some notes about the general format of each page. The Cosmic Call message was structured as 23 pages, each a 127×127 bitmap. The entire message was therefore 127×127×23 bits, and this would hopefully be suggestive to the aliens: they could try decoding it as 127 pages of 127×23-bit images, which would produce garbage, or as 23 pages of 127×127-bit images, which is correct. Or they might try decoding it as a single 127×2921-bit image, which would also work. But the choices are quite limited and it shouldn't take long to figure out which one makes sense.

To assist in the framing, each page of the message is surrounded by a border of black pixels and then a second smaller border of white pixels. If the recipient misinterpreted the framing of the bit sequence, say by supposing that the message was made of lines of 23 pixels, it would be immediately apparent that something was wrong, as at right. At the very least the regular appearance of the black border pixels every 127 positions, and the fact that the message began with 128 black pixels, would suggest that there was something significant about that number. If the aliens fourier-transform the message, there should be a nice big spike at the 127 mark.

Most of the message is encoded as a series of 5×7-pixel glyphs. The glyphs were generated at random and then subject to later filtering: candidate glyphs were discarded if they don't differ from previous glyphs in enough bit positions. This is to help the recipients reconstruct the glyphs if some of the bits are corrupted in transmission, as is likely.

The experimenters then eyeballed the glyphs and tried to match glyphs with their meanings in a way that would be easy for humans to remember, to aid in proofreading. For example, the glyph they chose to represent the digit 7 was .

People frequently ask why the message uses strange glyphs instead of standard hindu-arabic numerals. This is explained by the need to have the glyphs be as different as possible. Communication with other stars is very lossy. Imagine trying to see a tiny flickering light against the background of a star at a distance of several light years. In between you and the flickering light are variable dust and gas clouds. Many of the pixels are likely to be corrupted in transmission. The message needs to survive this corruption. So glyphs are 35 bits each. Each one differs from the other glyphs in many positions, whereas a change of only a few pixels could change a standard 6 into an 8 or vice versa. A glyph is spread across multiple lines of the image, which makes it more resistant to burst errors: even if an entire line of pixels is destroyed in transit, no entire glyph will be lost.

At the top left and top right of each page are page numbers. For example, page number 1: The page numbers are written in binary, most significant bit first, with representing a 1 bit and representing a 0 bit. These bit shapes were chosen to be resistant to data corruption; you can change any 4 of the 9 pixels in either shape and the recipient can still recover the entire bit unambiguously. There is an ambiguity about whether the numerals are written right to left or left to right—is the number 1 or the number 16?—but the aliens should be able to figure it out by comparing page numbers on consecutive pages; this in turn will help them when time comes for them to figure out the digit symbols.

Every page has a topic header, in this case , which roughly means “mathematics”. The topics of the following pages are something like:

• 1–5 Mathematics
• 6–11,21 Physics
• 12–14,19–20 The Earth
• 15–18 Human anatomy and biochemistry
• 22 Cosmology
• 23 Questions

In the next article I'll explain the contents of page 1. Each following article will appear two or three days later and will explain another page.

Zip file of all 23 pages

Fri, 07 Aug 2015

A message to the aliens (introduction)
In 1999, two Canadian astrophysicists, Stéphane Dumas and Yvan Dutil, composed and sent a message into space. The message was composed of twenty-three pages of bitmapped data, and was sent from the RT-70 radio telescope in Yevpatoria, Ukraine, as part of a set of messages called Cosmic Call.

The message images themselves are extremely compelling. I saw the first one in the book Beyond Contact by Brian McConnell:

I didn't think much of the rest of the book, but the image was arresting. After staring at it for a while, and convincing myself I understood the basic idea, I found the full set of images on Mike Matessa's web site, printed them out, and spent a happy couple of hours at the kitchen table deciphering them.

Sometimes when I gave conference talks, I would put this image on the screen during break, to give people something to think about before the class started up again. I like to say that it's fun to see if you're as smart as an alien, or at least if as smart as the Canadian astrophysicists thought the aliens would be.

I invite you to try to understand what is going on in the first image, above. In a day or two I will post a full explanation, along with the second image. Over the next few weeks I hope to write a series of blog articles about the 23 pages, explaining the details of each.

If you can't wait that long for the full set, you can browse them here, or download a zip file.

Addendum 20151223: The series is now complete. The full set of articles is:

Tue, 04 Aug 2015

A few months ago I wrote an article about using Haskell's list monad to do exhaustive search, with the running example of solving this cryptarithm puzzle:

    S E N D
+   M O R E
-----------
M O N E Y


(This means that we want to map the letters S, E, N, D, M, O, R, Y to distinct digits 0 through 9 to produce a five-digit and two four-digit numerals which, when added in the indicated way, produce the indicated sum.)

At the end, I said:

It would be an interesting and pleasant exercise to try to implement the same underlying machinery in another language. I tried this in Perl once, and I found that although it worked perfectly well, between the lack of the do-notation's syntactic sugar and Perl's clumsy notation for lambda functions (sub { my ($s) = @_; … } instead of \s -> …) the result was completely unreadable and therefore unusable. However, I suspect it would be even worse in Python because of semantic limitations of that language. I would be interested to hear about this if anyone tries it. I was specifically worried about Python's peculiar local variable binding. But I did receive the following quite clear solution from Peter De Wachter, who has kindly allowed me to reprint it: digits = set(range(10)) def to_number(*digits): n = 0 for d in digits: n = n * 10 + d return n def let(x, f): return f(x) def unit(x): return [x] def bind(xs, f): ys = [] for x in xs: ys += f(x) return ys def guard(b, f): return f() if b else []  after which the complete solution looks like: def solutions(): return bind(digits - {0}, lambda s: bind(digits - {s}, lambda e: bind(digits - {s,e}, lambda n: bind(digits - {s,e,n}, lambda d: let(to_number(s,e,n,d), lambda send: bind(digits - {0,s,e,n,d}, lambda m: bind(digits - {s,e,n,d,m}, lambda o: bind(digits - {s,e,n,d,m,o}, lambda r: let(to_number(m,o,r,e), lambda more: bind(digits - {s,e,n,d,m,o,r}, lambda y: let(to_number(m,o,n,e,y), lambda money: guard(send + more == money, lambda: unit((send, more, money)))))))))))))) print(solutions())  I think this shows that my fears were unfounded. This code produces the correct answer in about 1.8 seconds on my laptop. Thus inspired, I tried doing it again in Perl, and it was not as bad as I remembered: sub bd { my ($ls, $f) = @_; [ map @{$f->($_)}, @$ls ]      # Yow
}
sub guard { $_[0] ? [undef] : [] }  I opted to omit unit/return since an idiomatic solution doesn't really need it. We can't name the bind function bind because that is reserved for a built-in function; I named it bd instead. We could use Perl's operator overloading to represent binding with the >> operator, but that would require turning all the lists into objects, and it didn't seem worth doing. We don't need to_number, because Perl does it implicitly, but we do need a set subtraction function, because Perl has no built-in set operators: sub remove { my ($b, $a) = @_; my %h = map {$_ => 1 } @$a; delete$h{$_} for @$b;
return [ keys %h ];
}


After which the solution, although cluttered by Perl's verbose notation for lambda functions, is not too bad:

my $digits = [0..9]; my$solutions =
bd remove([0],        $digits) => sub { my ($s) = @_;
bd remove([$s],$digits) => sub { my ($e) = @_; bd remove([$s,$e],$digits) => sub { my ($n) = @_; bd remove([$s,$e,$n], $digits) => sub { my ($d) = @_;
my $send = "$s$e$n$d"; bd remove([0,$s,$e,$n,$d],$digits) => sub { my ($m) = @_; bd remove([$s,$e,$n,$d,$m],    $digits) => sub { my ($o) = @_;
bd remove([$s,$e,$n,$d,$m,$o], $digits) => sub { my ($r) = @_;
my $more = "$m$o$r$e"; bd remove([$s,$e,$n,$d,$m,$o,$r], $digits) => sub { my ($y) = @_;
my $money = "$m$o$n$e$y";
bd guard($send +$more == $money) => sub { [[$send, $more,$money]] }}}}}}}}};

for my $s (@$solutions) {
print "@\$s\n";
}


This runs in about 5.5 seconds on my laptop. I guess, but am not sure, that remove is mainly at fault for this poor performance.

An earlier version of this article claimed, incorrectly, that the Python version had lazy semantics. It does not; it is strict.

[ Addendum: Aaron Crane has done some benchmarking of the Perl version. A better implementation of remove (using an array instead of a hash) does speed up the calculation somewhat, but contrary to my guess, the largest part of the run time is bd itself, apparently becuse Perl function calls are relatively slow.

HN user masklinn tried a translation of the Python code into a version that returns a lazy iterator; I gather the changes were minor. ]