Jackson and Gregg on optimization
Today Brendan Gregg's blog has an article Evaluating the Evaluation: Benchmarking Checklist that begins:
A co-worker introduced me to Craig Hanson and Pat Crain's performance
mantras, which neatly summarize much of what we do in performance
analysis and tuning. They are:
Performance mantras
- Don't do it
- Do it, but don't do it again
- Do it less
- Do it later
- Do it when they're not looking
- Do it concurrently
- Do it cheaper
I found this striking because I took it to be an obvious reference to
Michael A. Jackson's advice in his
brilliant 1975 book Principles of Program Design. Jackson said:
We follow two rules in the matter of optimization:
Rule 1: Don't do it.
Rule 2 (for experts only). Don't do it yet.
The intent of the two passages is completely different. Hanson
and Crain are offering advice about what to optimize. “Don't do it”
means that to make a program run faster, eliminate some of the things
it does. “Do it, but don't do it again” means that to make a program
run faster, have it avoid repeating work it has already done, say by
caching results. And so on.
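To make that concrete, "Do it, but don't do it again" is essentially memoization. Here is a minimal Perl sketch; the function and its fake cost are invented for illustration and come from neither article:

use strict;
use warnings;
use feature 'say';

# Remember each answer the first time it is computed, and reuse it on
# later calls with the same argument: do it, but don't do it again.
my %cache;
sub slow_square {
    my ($n) = @_;
    return $cache{$n} //= do {
        sleep 1;            # stands in for some genuinely expensive work
        $n * $n;
    };
}

say slow_square(12);        # takes a second the first time
say slow_square(12);        # instant: the answer comes straight from %cache
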
Jackson's advice is of a very different nature. It is only indirectly
about improving the program's behavior. Instead it is addressing the
programmer's behavior: stop trying to optimize all the damn time!
It is not about what to optimize but whether, and Jackson says
that to a first approximation, the answer is no.
Here are Jackson's rules with more complete context. The quotation is
from the preface (page vii) and is discussing the style of the
examples in his book:
Above all, optimization is avoided. We follow two rules in the matter
of optimization:
Rule 1. Don't do it.
Rule 2 (for experts only). Don't do it yet — that is, not until you have a perfectly clear and unoptimized solution.
Most programmers do too much optimization, and virtually all
do it too early. This book tries to act as an antidote. Of course,
there are systems which must be highly optimized if they are to be
economically useful, and Chapter 12 discusses some relevant
techniques. But two points should always be remembered: first,
optimization makes a system less reliable and harder to maintain, and
therefore more expensive to build and operate; second, because
optimization obscures structure it is
difficult to improve the efficiency of a system which is already partly optimized.
Here's some code I dealt with this month:
my $emailclass = $args->{emailclass};
if (!$emailclass && $args->{emailclass_name} ) {
    # do some caching so if we're called on the same object over and over we don't have to do another find.
    my $last_emailclass = $self->{__LAST_EMAILCLASS__};
    if ( $last_emailclass && $last_emailclass->{name} eq $args->{emailclass_name} ) {
        $emailclass = $last_emailclass->{emailclass};
    } else {
        $emailclass = $self->schema->resultset('EmailClass')
            ->find_by_name($args->{emailclass_name});
        $self->{__LAST_EMAILCLASS__} = {
            name => $args->{emailclass_name},
            emailclass => $emailclass,
        };
    }
}
Holy cow, this is wrong in so many ways. Eight lines of this mess, for
what? To cache a single database lookup (the ->find_by_name call),
in a single object, if it happens to be looking for the same name as
last time. If caching were actually wanted, it should have been
addressed in the ->find_by_name call, which could do the caching
more generally, and which has some hope of knowing something about
when the cache entries should be expired. Even stipulating that
caching was wanted and for some reason should have been put here, why
such an elaborate mechanism, all to cache just the last lookup? It
could have been:
$emailclass = $self->emailclass_for_name($args->{emailclass_name});
...
sub emailclass_for_name {
    my ($self, $name) = @_;
    $self->{emailclass}{$name} //=
        $self->schema->resultset('EmailClass')->find_by_name($name);
    return $self->{emailclass}{$name};
}
I was able to do a bit better than this, and replaced the code with:
$emailclass = $self->schema->resultset('EmailClass')
    ->find_by_name($args->{emailclass_name});
My first thought was that the original caching code had been written
by a very inexperienced programmer, someone who with more maturity
might learn to do their job with less wasted effort. I was wrong; it
had been written by a senior developer, someone who with more maturity
might learn to do their job with less wasted effort.
The tragedy did not end there. Two years after the original code was
written a more junior programmer duplicated the same unnecessary code
elsewhere in the same module, saying:
I figured they must have had a reason to do it that way…
Thus is the iniquity of the fathers visited on the children.
In a nearby piece of code, an object A, on the first call to a
certain method, constructed object B and cached it:
B->new(
    base_path => ...,
    schema => $self->schema,
    retry => ...,
);
Then on
subsequent calls, it reused B from the cache.
But the cache was shared among many instances of A, not all of which
had the same ->schema member. So some of those instances of A
would ask B a question and get the answer from the wrong database.
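The shape of the bug, in a stripped-down sketch (class names, structure, and schema values are hypothetical reconstructions from the description above, not the actual code):

use strict;
use warnings;

package A;
sub new { my ($class, %args) = @_; bless { schema => $args{schema} }, $class }

my $cached_b;    # package-level, so the cache is shared by every A instance
sub helper {
    my ($self) = @_;
    # The first caller's schema wins; later callers whose schema differs
    # silently get a B that talks to the wrong database.
    $cached_b //= B->new(schema => $self->{schema});
    return $cached_b;
}

package B;
sub new { my ($class, %args) = @_; bless { schema => $args{schema} }, $class }
sub schema { $_[0]{schema} }

package main;
my $a1 = A->new(schema => 'db_main');
my $a2 = A->new(schema => 'db_reporting');
print $a1->helper->schema, "\n";    # db_main
print $a2->helper->schema, "\n";    # db_main again, not db_reporting

However the real cache was stored, the effect is the same: the cached B remembers whichever schema it was first built with.
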
A co-worker spent hours and hours in the middle of the night last
month tracking this down. Again, the cache was not only broken but
completely unnecessary. What was being saved? A single object
construction, probably a few hundred bytes and a few hundred
microseconds at most. And again, the code was perpetrated by a senior
developer who should have known better. My co-worker replaced 13
lines of broken code with four that worked.
Brendan Gregg is unusually clever, and an exceptional case. Most
programmers are not Brendan Gregg, and should take Jackson's advice
and stop trying to be so clever all the time.