The Universe of Discourse


Mon, 30 Jan 2006

Rotten code in a ProFTPD plugin module
One of my work colleagues asked me to look at a piece of C source code today. He was tracking down a bug in the FTP server. He thought he had traced it to this spot, and wanted to know if I concurred and if I agreed with his suggested change.

Here's the (exceptionally putrid) (relevant portion of the) code:

static int gss_netio_write_cb(pr_netio_stream_t *nstrm, char *buf,size_t buflen) {

    int     count=0;
    int     total_count=0;        
    char    *p;

    OM_uint32   maj_stat, min_stat;
    OM_uint32   max_buf_size;

    ...
    /* max_buf_size = maximal input buffer size */
    p=buf;
    while ( buflen > total_count ) { 
        /* */ 
        if ( buflen - total_count > max_buf_size ) {
            if ((count = gss_write(nstrm,p,max_buf_size)) != max_buf_size )
                return -1;
        } else {
            if ((count = gss_write(nstrm,p,buflen-total_count)) != buflen-total_count )
                return -1;
        }       
        total_count = buflen - total_count > max_buf_size ? total_count + max_buf_size : buflen;
        p=p+total_count;
    }

    return buflen;  
}
(You know there's something wrong when the comment says "maximal input buffer size", but the buffer is for performing output. I have not looked at any of the other code in this module, which is 2,800 lines long, so I do not know if this chunk is typical.) Mr. Colleague suggested that p=p+total_count was wrong, and should be replaced with p=p+max_buf_size. I agreed that it was wrong, and that his change would fix the problem, although I suggested that p += count would be a better change. Mr. Colleague's change, although it would no longer manifest the bug, was still "wrong" in the sense that it would leave p pointing to a garbage location (and incidentally invokes behavior not defined by the C language standard) whereas my change would leave p pointing to the end of the buffer, as one would expect.

Since this is a maintenance programming task, I recommended that we not touch anything not directly related to fixing the bug at hand. But I couldn't stop myself from pointing out that the code here is remarkably badly written. Did I say "exceptionally putrid" yet? Oh, I did.

Good. It stinks like a week-old fish.

The first thing to notice is that the expression buflen - total_count appears four times in only nine lines of code—five if you count the buflen > total_count comparison. This strongly suggests that the algorithm would be more clearly expressed in terms of whatever buflen - total_count really is. Since buflen is the total number of characters to be written, and total_count is the number of characters that have been written, buflen - total_count is just the number of characters remaining. Rather than computing the same expression four times, we should rewrite the loop in terms of the number of characters remaining.

    size_t left_to_write = buflen;
    while ( left_to_write > 0 ) { 
        /* */ 
        if ( left_to_write > max_buf_size ) {
            if ((count = gss_write(nstrm,p,max_buf_size)) != max_buf_size )
                return -1;
        } else {
            if ((count = gss_write(nstrm,p,left_to_write)) != left_to_write )
                return -1;
        }       
        total_count = left_to_write > max_buf_size ? total_count + max_buf_size : buflen;
        p=p+total_count;
        left_to_write -= count;
    }
Now we should notice that the two calls to gss_write are almost exactly the same. Duplicated code like this can almost always be eliminated, and eliminating it almost always produces a favorable result. In this case, it's just a matter of introducing an auxiliary variable to record the amount that should be written:

    size_t left_to_write = buflen, write_size;
    while ( left_to_write > 0 ) { 
        write_size = left_to_write > max_buf_size ? max_buf_size : left_to_write;
        if ((count = gss_write(nstrm,p,write_size)) != write_size )
                return -1;
        total_count = left_to_write > max_buf_size ? total_count + max_buf_size : buflen;
        p=p+total_count;
        left_to_write -= count;
    }
At this point we can see that write_size is going to be max_buf_size for every write except possibly the last one, so we can simplify the logic the maintains it:

    size_t left_to_write = buflen, write_size = max_buf_size;
    while ( left_to_write > 0 ) { 
        if (left_to_write < max_buf_size) 
            write_size = left_to_write;
        if ((count = gss_write(nstrm,p,write_size)) != write_size )
                return -1;
        total_count = left_to_write > max_buf_size ? total_count + max_buf_size : buflen;
        p=p+total_count;
        left_to_write -= count;
    }
Even if we weren't here to fix a bug, we might notice something fishy: left_to_write is being decremented by count, but p, the buffer position, is being incremented by total_count instead. In fact, this is exactly the bug that was discovered by Mr. Colleague. Let's fix it:

    size_t left_to_write = buflen, write_size = max_buf_size;
    while ( left_to_write > 0 ) { 
        if (left_to_write < max_buf_size) 
            write_size = left_to_write;
        if ((count = gss_write(nstrm,p,write_size)) != write_size )
                return -1;
        total_count = left_to_write > max_buf_size ? total_count + max_buf_size : buflen;
        p += count;
        left_to_write -= count;
    }
We could fix up the line the maintains the total_count variable so that it would be correct, but since total_count isn't used anywhere else, let's just delete it.

    size_t left_to_write = buflen, write_size = max_buf_size;
    while ( left_to_write > 0 ) { 
        if (left_to_write < max_buf_size) 
            write_size = left_to_write;
        if ((count = gss_write(nstrm,p,write_size)) != write_size )
                return -1;
        p += count;
        left_to_write -= count;
    }
Finally, if we change the != write_size test to < 0, the function will correctly handle partial writes, should gss_write be modified in the future to perform them:
    size_t left_to_write = buflen, write_size = max_buf_size;
    while ( left_to_write > 0 ) { 
        if (left_to_write < max_buf_size) 
            write_size = left_to_write;
        if ((count = gss_write(nstrm,p,write_size)) < 0 )
                return -1;
        p += count;
        left_to_write -= count;
    }
We could trim one more line of code and one more state change by eliminating the modification of p:

    size_t left_to_write = buflen, write_size = max_buf_size;
    while ( left_to_write > 0 ) { 
        if (left_to_write < max_buf_size) 
            write_size = left_to_write;
        if ((count = gss_write(nstrm,p+buflen-left_to_write,write_size)) < 0 )
                return -1;
        left_to_write -= count;
    }
I'm not sure I think that is an improvement. (My idea is that if we do this, it would be better to create a p_end variable up front, set to p+buflen, and then use p_end - left_to_write in place of p+buflen-left_to_write. But that adds back another variable, although it's a constant one, and the backward logic in the calculation might be more confusing than the thing we were replacing. Like I said, I'm not sure. What do you think?)

Anyway, I am sure that the final code is a big improvement on the original in every way. It has fewer bugs, both active and latent. It has the same number of variables. It has six lines of logic instead of eight, and they are simpler lines. I suspect that it will be a bit more efficient, since it's doing the same thing in the same way but without the redundant computations, although you never know what the compiler will be able to optimize away.

Right now I'm engaged in writing a book about this sort of cleanup and renovation for Perl programs. I've long suspected that the same sort of processes could be applied to C programs, but this is the first time I've actually done it.

The funny thing about this code is that it's performing a task that I thought every C programmer would already have known how to do: block-writing of a bufferfull of data. Examples of the right way to do this are all over the place. I first saw it done in Marc J. Rochkind's superb book Advanced Unix Programming around 1989. (I learned from the first edition, but the link to the right is for the much-expanded second edition that came out in 2004.) I'm sure it must pop up all over the Stevens books.

But the really exciting thing I've learned about code like this is that it doesn't matter if you don't already know how to do it right, because you can turn the wrong code into the right code, as we did here, by noticing a few common problems, like duplicate tests and repeated subexpressions, and applying a few simple refactorizations to get rid of them. That's what my book will be about.

(I am also very pleased that it has taken me 37 blog entries to work around to discussing any programming-related matters.)


[Other articles in category /prog] permanent link