The Universe of Discourse


Thu, 15 Jun 2006

Just put it in the damn object!
The Perl Program Repair Shop book, despite outward appearances, is coming along. A few months ago in Pittsburgh I gave a talk about someone's module that I didn't think was very well-written, for various reasons. Up to that point, the book had been about a lot of small-scale stuff: repeated code, unnecessary variables, making two passes over a data structure when only one was necessary, C-style for loops, and such. But unlike the previous examples, this one had been written by someone who was actually competent. So the problems I found were competent-programmer examples, rather than incompetent-programmer examples. I need some more chapters of that kind of examples, or the book will not be of much interest to competent programmers.

What kind of mistakes do competent programmers make? They make a lot of errors of object-oriented design.

This module's purpose was to emulate some Java library that lets you register an "observer" of an object. Let's say that the "observer" object is a Guard, and the object it is observing is an "Alarm". The idea is that the Alarm object registers the Guard as being an observer of the Alarm, and then whenever the Alarm calls a notify_observers method, the notify method in the Guard object is called back. Actually the Java people didn't make the names that sensible; instead of notify_observers calling the notify method in the observers, it calls the update method. Why wasn't notify_observers called update_observers, then? I dunno. It's Java. You want me to explain Java?

Okay, so for each Alarm A you need somewhere to store the list of observers of A. Where do you put that?

The author of the module put it in a global hash. I don't think it's immediately clear that this was a mistake. But I do think it was a mistake. Big problems arise; the module had a lot of bugs, mostly related to garbage collection. As a result of putting the Guard objects into this hash, they are never garbage collected. Well, not quite. The author used a weak reference in the hash, so the objects there are garbage collected.

Weak references are one of those technical solutions that fits really well into the formula "A programmer had a problem. So he used weak references. Then he had two problems." (Non-greedy regexes are another example. Some people say Perl itself is an example.) Weak references do solve a couple of very specific problems, mostly having to do with caching. For anything else, they turn out to be a bigger problem than the thing you were trying to solve. In this case, the weak references cause this delightful problem:

  
        my $alarm = Alarm->new();
        $alarm->add_observer(Guard->new);
        $alarm->notify_observers("I like pie!");


        Failed to send observation from 'Alarm=HASH(0x8113f74)' to '':
        Can't call method "update" on an undefined value at
        lib/Class/Observable.pm line 95.

See, the Guard that is observing the Alarm is immediately garbage-collected. We could prevent the fatal error here, but that wouldn't solve the problem, which is that that Guard should not be garbage collected as long as the Alarm is still extant, because it is supposed to be watching the Alarm.

This was only one of several problems caused by this design, to store the observers in a global hash. Some of these could possibly be avoided if Perl were a better language, but others I think not.

My suggestion here was the following advice: Whenever you're trying to store information about an object, there is a right place to store it and a wrong place to store it. The right place: inside the object. The wrong place: anywhere else.

Rationale: What is an object? It's a place where you store all the pertinent information about some entity. So here we have some pertinent information about this entity, the alarm, namely the list of guards who are watching it. Where should we store this information? Well, we have this data structure, the Alarm object, which is supposed to store all the pertinent information about the alarm. The list of guards is part of this, so that's where it should go. Entia non sunt multiplicanda praeter necessitatem.

Now, this choice may be obvious, but it has a fair share of problems too, which is why I think the other programmer might not have made it. The add_observers method in the base class of Alarm is called upon to add the Guard to the list of observers of the Alarm. If the list of observers is stored in the Alarm object, add_observersmust make some assumptions about how the Alarm is implemented. But it turns out that once you start thinking about it, the problems turn out to have fairly simple solutions. For example, the base class needs to know what list to append the Guard to. It could assume that the list will be stored in $alarm->{Observers}, but that's iffy: it assumes that the Alarm will be made from a hash, it assumes that that hash key is not used for something else, and so on. So instead, you give Alarm a method that returns a reference to the array in which the Guard is supposed to place itself. Normally, that method is inherited from the base class itself, and returns a reference to $self->{Observers}, but the Alarm class can override that if it needs to. It can even get back the original global-variable implementation by overriding the method to return a reference to an element in a global hash.

To summarize, both designs appeared to have significant technical problems, but the problems in the design I suggested turned out to be a lot easier to solve well then the problems that arose from the design that the author chose. At least in this example, the advice that object data should be stored in the object turned out pretty well.

(Cynical advice to people wishing to become famous experts: phrase your advice so that it sounds inevitable. Anyone wishing to argue against advice like object data should be stored in the object will have an uphill battle, and risks looking like an idiot.)

Now today I was in the shower, thinking about this other piece of software I wrote recently, in which I was having, surprise, a garbage collection problem. The software is a module that is intended to provide lightweight access to flat-file databases, which is something that has been inexplicably overlooked by CPAN.

You use the module like this:

        my $db = FlatFile->new(FILE => ..., FIELDS => [...], ...);

        my @red = $db->lookup(color => "red");

        $red[0]->set_color("blue");  # paint it blue

        $red[1]->delete;
The elements of @red are Record objects, and the Record methods like set_color and delete need to call back to the FlatFile object that represents the database, to inform it that records have been modified or deleted. So each Record object needs a pointer back to the database to which it belongs.

The way I implemented this was to have a class variable in the Record class, $Record::DB, which contained the database object. But then the database object is never garbage-collected, which means it is never finalized, which means that changes to the database are not automatically written to the disk when the database object becomes inaccessible, which means that you have to call $db->flush to make sure the changes are written out, which opens the possibility that you will forget, and the module will bugger your database with a badger.

What I realized in the shower was that I had better take my own advice, that object data should be stored in the object. If the source database is a pertinent piece of information about a record, then I had better store the source database as a piece of member data in the record object, and not try this stupid thing involving a global variable.

If the advice is good, this will be a better design. I haven't done it yet, so I'm not sure. But it certainly looks like a better design.

I want to say I don't know why I used the global variable in the first place, but I do know: my original design for the record object was that it had nothing in it but the data from the record; no metadata at all. The data members were named after the fields in the database, so there was no namespace left over for metadata. That didn't work out too well, and although I resisted putting metadata into the record objects long enough to screw myself regarding where to store the owning database and some other stuff, I eventually got backed into a corner and had to redesign the object to have space for metadata. But the effects of original misdesign persisted, because I was still storing a bunch of the object's metadata in stupid global variables, until I took that shower.

That's two successes so far for my advice, which is two good signs. Now all I need is a good counterexample. Every piece of advice in this book is going to have a counterexample. That's the big problem with computer programming advice books: no counterexamples. But that's another article for another day.


[Other articles in category /prs] permanent link