The Universe of Discourse


Wed, 03 Oct 2007

Van der Waerden's problem: program 1
In this series of articles I'm going to analyze four versions of a program that I wrote around 1988, and then another program that does the same thing that I wrote last month without referring to the 1988 code.

If you don't remember what the program does, here's an explanation.

Program 1

I'm going to discuss the program a bit at a time. The complete program is here.

This program does an unpruned exhaustive search of the string space. Since for V(3, 3) the string space contains 327 = 7,625,597,484,987 strings, it takes a pretty long time to finish. I quickly realized that I was wasting my time with this program.

The program is invoked with a length argument and an optional colors argument, which defaults to 2. It then looks for good strings of the specified length, printing those it finds. If there are none, one then knows that V(3, colors) > length. Otherwise, one knows that V(3, colors) ≤ length, and has witness strings to prove it.

I don't want to spend a lot of time on it because there are plenty of C programming style guides you can read if you care for that. But already on lines 4–5 we have something I wouldn't write today:

        #define NO	0
        #define YES	!NO
Oh well.

The program wants to iterate through all Cn strings. How does it know when it's done? It's not easy to make a program as slow as this one even slower, but I found a way to do it.

        last = STRING(length);
        stuff(last,'A' - 1 + colors);

        for (i=0; i<colors; i++)
          last[i] = 'A' + i;

        for (; strcmp(seq,last); strinc(seq))
          ...
It manufactures the string ABCDDDDDDDDD....D and compares the current string to that one every time through the loop. A much simpler method is to detect completion while incrementing the target string. The function that does the increment looks like this:

        void
        strinc(s)
        char *s;
        {
        int i;

        for (i= length - 1; i>=0; i--)
          {
          if (s[i] != 'A' - 1 + colors)
            {
            s[i]++;
            return;
            }
          s[i] = 'A';
          }
        return;
        }
Had I been writing it today, it would have looked more like this:

        unsigned strinc(char *s) 
        {
          char *p = strchr(s, '\0') - 1;
          while (p >= s && *p == 'A' + colors - 1) *p-- = 'A';
          if (p < s) return 0;
          (*p)++;
          return 1;
        }
(This code is in a pink box to show that it is not actually part of the program I am discussing in this article.)

The function returns true on success and false on failure. A false return can be taken by the caller as the signal to terminate the program.

This replacement function invokes undefined behavior, because there is no guarantee that p is allowed to run off the beginning of the string in the way that it does. But there is no need to check the strings in lexicographic order. Instead of scanning the strings in the order AAA, AAB, ABA, ABB, BAA, etc., one can scan them in reverse lexicographic order: AAA, BAA, ABA, BBA, AAB, etc. Then instead of running off the beginning of the string, p runs off the end, which is allowed. This fixes the undefined behavior problem and also eliminates the call to strchr that finds the end of the string. This is likely to produce a significant speedup:

        unsigned strinc(char *s) 
        {
          while (*s == 'A' + colors - 1) *s++ = 'A';
          if (!*s) return 0;
          (*s)++;
          return 1;
        }
Here we're depending on the optimizer to avoid recomputing the value of 'A' + colors - 1 every time through the loop.

The heart of the program is the apchk() function, which checks whether a string q contains an arithmetic progression of length 3:

        int
        apchk(q)
        char *q;
        {
        int f, s, t;

        for (f=0; f <= length - 3; f++)
          for (s=f+1; s <= length - 2; s++)
            {
            t = s+s-f;
            if (t >= length) break;
            if ((q[f] == q[s]) && (q[s] == q[t])) return YES;
            }
        return NO;
        }
I hesitate to say that this is the biggest waste of time in the whole program, since after all it is a program whose job is to examine 7,625,597,484,987 strings. But look. 2/3 of the calls to this function are asking it to check a string that differs from the previous string in the final character only. Nevertheless, it still checks all 49 possible arithmetic progressions, even the ones that didn't change.

The t ≥ length test is superfluous, or if it isn't, it should be.

Also notice that I wasn't sure of the precendence in the final test.

It didn't take me long to figure out that this program was not going to finish in time. I wrote a series of others, which I hope to post here in coming days. The next one sucks too, but in a completely different way.

[ Addendum 20071005: Here is part 2. ]

[ Addendum 20071014: Here is part 3. ]


[Other articles in category /prog] permanent link