The Universe of Disco


Sun, 12 Nov 2017

No, it is not a compiler error. It is never a compiler error.

When I used to hang out in the comp.lang.c Usenet group, back when there was a comp.lang.c Usenet group, people would show up fairly often with some program they had written that didn't work, and ask if their compiler had a bug. The compiler did not have a bug. The compiler never had a bug. The bug was always in the programmer's code and usually in their understanding of the language.

When I worked at the University of Pennsylvania, a grad student posted to one of the internal bulletin boards looking for help with a program that didn't work. Another graduate student, a super-annoying know-it-all, said confidently that it was certainly a compiler bug. It was not a compiler bug. It was caused by a misunderstanding of the way arguments to unprototyped functions were automatically promoted.

This is actually a subtle point, obscure and easily misunderstood. Most examples I have seen of people blaming the compiler are much sillier. I used to be on the mailing list for discussing the development of Perl 5, and people would show up from time to time to ask if Perl's if statement was broken. This is a little mind-boggling, that someone could think this. Perl was first released in 1987. (How time flies!) The if statement is not exactly an obscure or little-used feature. If there had been a bug in if it would have been discovered and fixed by 1988. Again, the bug was always in the programmer's code and usually in their understanding of the language.

Here's something I wrote in October 2000, which I think makes the case very clearly, this time concerning a claimed bug in the stat() function, another feature that first appeared in Perl 1.000:

On the one hand, there's a chance that the compiler has a broken stat and is subtracting 6 or something. Maybe that sounds likely to you but it sounds really weird to me. I cannot imagine how such a thing could possibly occur. Why 6? It all seems very unlikely.

Well, in the absence of an alternative hypothesis, we have to take what we can get. But in this case, there is an alternative hypothesis! The alternative hypothesis is that [this person's] program has a bug.

Now, which seems more likely to you?

  • Weird, inexplicable compiler bug that nobody has ever seen before

or

  • Programmer fucked up

Hmmm. Let me think.

I'll take Door #2, Monty.

Presumably I had to learn this myself at some point. A programmer can waste a lot of time looking for the bug in the compiler instead of looking for the bug in their program. I have a file of (obnoxious) Good Advice for Programmers that I wrote about twenty years ago, and one of these items is:

Looking for a compiler bug is the strategy of LAST resort. LAST resort.

Anyway, I will get to the point. As I mentioned a few months ago, I built a simple phone app that Toph and I can use to find solutions to “twenty-four puzzles”. In these puzzles, you are given four single-digit numbers and you have to combine them arithmetically to total 24. Pennsylvania license plates have four digits, so as we drive around we play the game with the license plate numbers we see. Sometimes we can't solve a puzzle, and then we wonder: is it because there is no solution, or because we just couldn't find one? Then we ask the phone app.

The other day we saw the puzzle «5 4 5 1», which is very easy, but I asked the phone app, to find out if there were any other solutions that we missed. And it announced “No solutions.” Which is wrong. So my program had a bug, as my programs often do.

The app has a pre-populated dictionary containing all possible solutions to all the puzzles that have solutions, which I generated ahead of time and embedded into the app. My first guess was that bug had been in the process that generated this dictionary, and that it had somehow missed the solutions of «5 4 5 1». These would be indexed under the key 1455, which is the same puzzle, because each list of solutions is associated with the four input numbers in ascending order. Happily I still had the original file containing the dictionary data, but when I looked in it under 1455 I saw exactly the two solutions that I expected to see.

So then I looked into the app itself to see where the bug was. Code Studio's underlying language is Javascript, and Code Studio has a nice debugger. I ran the app under the debugger, and stopped in the relevant code, which was:

    var x = [getNumber("a"), getNumber("b"), getNumber("c"), getNumber("d")].sort().join("");

This constructs a hash key (x) that is used to index into the canned dictionary of solutions. The getNumber() calls were retrieving the four numbers from the app's menus, and I verified that the four numbers were «5 4 5 1» as they ought to be. But what I saw next astounded me: x was not being set to 1455 as it should have been. It was set to 4155, which was not in the dictionary. And it was set to 4155 because

the built-in sort() function

was sorting the numbers

into

the

wrong

order.

For a while I could not believe my eyes. But after another fifteen or thirty minutes of tinkering, I sent off a bug report… no, I did not. I still didn't believe it. I asked the front-end programmers at my company what my mistake had been. Nobody had any suggestions.

Then I sent off a bug report that began:

I think that Array.prototype.sort() returned a wrongly-sorted result when passed a list of four numbers. This seems impossible, but …

I was about 70% expecting to get a reply back explaining what I had misunderstood about the behavior of Javascript's sort().

But to my astonishment, the reply came back only an hour later:

Wow! You're absolutely right. We'll investigate this right away.

In case you're curious, the bug was as follows: The sort() function was using a bubble sort. (This is of course a bad choice, and I think the maintainers plan to replace it.) The bubble sort makes several passes through the input, swapping items that are out of order. It keeps a count of the number of swaps in each pass, and if the number of swaps is zero, the array is already ordered and the sort can stop early and skip the remaining passes. The test for this was:

    if (changes <= 1) break;

but it should have been:

    if (changes == 0) break;

Ouch.

The Code Studio folks handled this very creditably, and did indeed fix it the same day. (The support system ticket is available for your perusal, as is the Github pull request with the fix, in case you are interested.)

I still can't quite believe it. I feel as though I have accidentally spotted the Loch Ness Monster, or Bigfoot, or something like that, a strange and legendary monster that until now I thought most likely didn't exist.

A bug in the sort() function. O day and night, but this is wondrous strange!

[ Addendum 20171113: Thanks to Reddit user spotter for pointing me to a related 2008 blog post of Jeff Atwood's, “The First Rule of Programming: It's Always Your Fault”. ]

[ Addendum 20171113: Yes, yes, I know sort() is in the library, not in the compiler. I am using “compiler error” as a synecdoche for “system software error”. ]

[ Addendum 20171116: I remembered examples of two other fundamental system software errors I have discovered, including one honest-to-goodness compiler bug. ]

[ Addendum 20200929: Russell O'Connor on a horrifying GCC bug ]


[Other articles in category /prog] permanent link