The Universe of Discourse


Sun, 14 Oct 2007

Van der Waerden's problem: programs 3 and 4
In this series of articles I'm analyzing five versions of a program that I wrote around 1988, and then another program that does the same thing that I wrote last month without referring to the 1988 code. (I said before that it was four versions, but apparently I'm not so good at counting to five.)

If you don't remember what the program does, here's an explanation.

Here is program 1, which was an earlier attempt to do the same thing. Here's program 2.

Program 3

Complete source code for this version.

I said of the previous program:

The problem is all in the implementation. You see, this program actually constructs the entire tree in memory.

Somewhere along the line it dawned on me that constructing the tree was unnecessary, so I took that machinery out, and the result was version 3.

Consequently, this program is easy to explain once you have seen the previous version: almost all I have to do is list the stuff that I took out.

Since this program does not construct a tree of node structures, it omits the definition of the node structure and the macro for manufacturing nodes. Since it gets rid of the node allocation, it also gets rid of the memory leak of the previous version, and so omits the customized memory allocation functions Malloc and Free that performed memory tracking.

The previous program had a compiled-in limit on the number of colors it would handle, because at the time I didn't know how to do a dynamic array. In this program, I got rid of the node structures, so there was no array of node structures, so no need for a limit on the number of node structures in the array. And all the code that enforced the limit is gone.

The apchk function, which checks to see if a string is good, remains unchanged from the previous version.

The makenodes function, which was the principal function in the previous program, remains, but has lost a lot of code. It is simpler to call, too; the node argument is gone:

        makenodes(maxlen,"");
I got rid of the silly !howfar test in favor of a more easily-understood howfar == 0 test. There are lots of times when ! is appropriate, but testing whether a non-negative integer has reached zero is not one of them. I was going to comment earlier about what a novice error this is, and I'm glad to see that I fixed it.

The main use of apchk in the previous program had if (!apchk(...)) { ... }. That was okay, because apchk returns a Boolean result. But the negation is annoying. It suggests that apchk's return value is backward. (Instead of returning true for a bad string, it should return true for a good string.) This is not very much a big deal, and I only brought it up so that I could diffidently confess that these days I would probably have done:

        #define unless(c)       if(!(c))
        ...
        unless (is_bad(...)) {
        }
There are a lot of stories of doofus Pascal programmers who do:

        #define begin {
        #define end }
and Fortran programmers who do:

        #define GT >
        #define GE >=
        #define LT <
        #define LE <=
and I find, to my shame, that I have become one of them. Anyone seeing #define unless(c) if(!(c)) would snort and say "Oh, this was obviously written by a Perl programmer."

But at least I was a C programmer first.

Actually I was a Fortran programmer first. But I was never a big enough doofus to #define GE >=.

The big flaw in the current program is the string argument to makenodes. Each call to makenodes copies this string so that it can append a character to the end. I discussed this at some length in the previous article, so I don't want to make too much of it now; I'll just say that a better technique would have reused the string buffer from call to call. This obviously saves a little memory, and since most of the contents of the string doesn't change, it also saves a lot of time.

This might be worth seeing, since it seems to me now to be a marvel of wasted code:

    ls = strlen(s);
    newarg = STRING(ls + 1);
    if (!newarg) 
      {
      fprintf(stderr,"Couldn't get %d bytes for newarg in makenodes\n",ls+2);
      fprintf(stderr,"Total get was %d.\n",gotten);
      fprintf(stderr,"P\n L\n  O\n   P\n    !\n");
      abort();
      }
    strcpy(newarg,s);
    newarg[ls+1] = '\0';
    newarg[ls] = 'A' + i;
    makenodes(howfar-1,newarg);
    free(newarg);
The repeated strlen, for example, when ls could be calculated as maxlen - howfar. The excessively verbose failure message, which should be inside the STRING macro anyway. (The code that maintains gotten has gone away with the debugging allocation routines, so the second fprintf is superfluous.) And why did I think abort was the right thing to call on an out-of-memory condition?

Oh well, you live and learn.

Program 4

Complete source code for this version.

The fourth version of the program is even more trimmed-down. In this version of the program I did get the idea to reuse the string buffer instead of copying the string on every recursive call. But I also got an even better idea, and eliminated the recursive call. The makenodes function is now down to one argument, which tells it how deep a tree to search.

        void
        makenodes(maxdepth)
        int maxdepth;
        {
        int apchk(), depth = 0;
        char curlet, *curstring = STRING(maxdepth);

        curstring[0] = '\0';
        curlet = 'A';

        while (depth >= 0)
          {
          while (curlet <= 'A' - 1 + colors)
            {
        #ifdef DIAG
            printf("%s makenoding with string %s%c, depth %d.\n",
                TABS+12-depth,curstring,curlet,depth);
        #endif
            if (apchk(curstring,curlet))
              curlet++;
            else
              if (depth < maxdepth)
                {
                curstring[depth] = curlet;
                curstring[depth+1] = '\0';
                depth += 1;
                curlet = 'A';
                }
              else
                {
                printf("%s%c\n",curstring,curlet);
                curlet++;
                }
            }
          depth -= 1;
          curlet = curstring[depth] + 1;
          curstring[depth] = '\0';
          }
        }
This is a better job all around, and not very different from what I wrote last month to do the same thing. I was going to title this series of articles "I have become a better programmer!", and now that I see this version, I'm glad I didn't, because there's no evidence here that I am much better. This version of the program gets a solid A from my older self.

The value depth scans forward in the string when the search is going well, and is decremented again when the search needs to backtrack. If depth == maxdepth, a witness of the desired length has been found, and is printed out.

The curlet ("current letter") variable tracks which branch of the current tree node we are "recursing" down. After the function recurses down, by incrementing depth, curlet is set to 'A' to visit the first sub-node of the new current node. The curstring buffer tracks the path through the tree to the current node. When the function needs to backtrack, it restores the state of curlet from the last character in the buffer and then trims that character off the end of the path.

I'd only want to make two changes to this code. One would be to make depth a pointer into the curstring buffer instead of an index into it. Then again, the compiler may well have optimized it into one anyway. But it would also allow me to eliminate curlet in favor of just using *depth everywhere.

The other change would address a more serious defect: the contents of curstring are kept properly zero-terminated at all times, whenever depth is advanced or retracted. This zero-termination is unnecessary, since curstring is never used as a string except when depth == maxdepth. When printfing curstring, I could have used something like:

        printf("%.*s%c\n",curstring,maxlen,curlet);
which prints exactly maxlen characters from the buffer, regardless of whether it is zero-terminated.

It would, however, have required that I know about %.*s, which I'm sure I did not. Was %.*s even available in 1988? I forget, and my copy of K&R First Edition is in a box somewhere since my recent move. Anyway, if %.*s was unavailable for whatever reason, the code could have had a single curstring[maxdepth] = 0 up front, which would have been quite sufficient for the one printf it needed to do.

Coming next: one very different program to solve the same problem, and a comparison with last month's effort.


[Other articles in category /prog] permanent link