The Universe of Disco


Fri, 11 Jan 2008

Help, help!
(Readers of Planet Haskell may want to avert their eyes from this compendium of Perl introspection techniques. Moreover, a very naughty four-letter word appears, a word that begins with "g" and ends with "o". Let's just leave it at that.)

Przemek Klosowski wrote to offer me physics help, and also to ask about introspection on Perl objects. Specifically, he said that if you called a nonexistent method on a TCL object, the error message would include the names of all the methods that would have worked. He wanted to know if there was a way to get Perl to do something similar.

There isn't, precisely, because Perl has only a conventional distinction between methods and subroutines, and you Just Have To Know which is which, and avoid calling the subroutines as methods, because the Perl interpreter has no idea which is which. But it does have enough introspection features that you can get something like what you want. This article will explain how to do that.

Here is a trivial program that invokes an undefined method on an object:

        use YAML;

        my $obj = YAML->new;
        $obj->nosuchmethod;
When run, this produces the fatal error:

        Can't locate object method "nosuchmethod" via package "YAML" at test.pl line 4.
(YAML in this article is just an example; you don't have to know what it does. In fact, I don't know what it does.)

Now consider the following program instead:

        use YAML;
        use Help 'YAML';
        
        my $obj = YAML->new;
        $obj->nosuchmethod;
Now any failed method calls to YAML objects, or objects of YAML's subclasses, will produce a more detailed error message:

        Unknown method 'nosuchmethod' called on object of class YAML
        Perhaps try:
          Bless 
          Blessed 
          Dump 
          DumpFile 
          Load 
          LoadFile 
          VALUE 
          XXX 
          as_heavy (inherited from Exporter)
          die (inherited from YAML::Base)
          dumper_class 
          dumper_object 
          export (inherited from Exporter)
          export_fail (inherited from Exporter)
          export_ok_tags (inherited from Exporter)
          export_tags (inherited from Exporter)
          export_to_level (inherited from Exporter)
          field 
          freeze 
          global_object 
          import (inherited from Exporter)
          init_action_object 
          loader_class 
          loader_object 
          new (inherited from YAML::Base)
          node_info (inherited from YAML::Base)
          require_version (inherited from Exporter)
          thaw 
          warn (inherited from YAML::Base)
          ynode 
        Aborting at test.pl line 5
Some of the methods in this list are bogus. For example, the stuff inherited from Exporter should almost certainly not be called on a YAML object.

Some of the items may be intended to be called as functions, and not as methods. Some may be functions imported from some other module. A common offender here is Carp, which places a carp function into another module's namespace; this function will show up in a list like the one above, without even an "inherited from" note, even though it is not a method and it does not make sense to call it on an object at all.

Even when the items in the list really are methods, they may be undocumented, internal-use-only methods, and may disappear in future versions of the YAML module.

But even with all these warnings, Help is at least a partial solution to the problem.

The real reason for this article is to present the code for Help.pm, not because the module is so intrinsically useful itself, but because it is almost a catalog of weird-but-useful Perl module hackery techniques. A full and detailed tour of this module's 30 lines of code would probably make a decent 60- or 90-minute class for intermediate Perl programmers who want to become wizards. (I have given many classes on exactly that topic.)

Here's the code:

        package Help;

        use Carp 'croak';

        sub import {
          my ($selfclass, @classes) = @_;
          for my $class (@classes) {
            push @{"$class\::ISA"}, $selfclass;
          }
        }

        sub AUTOLOAD {
          my ($bottom_class, $method) = $AUTOLOAD =~ /(.*)::(.*)/;
          my %known_method;

          my @classes = ($bottom_class);
          while (@classes) {
            my $class = shift @classes;
            next if $class eq __PACKAGE__;
            unshift @classes, @{"$class\::ISA"};
            for my $name (keys %{"$class\::"}) {
              next unless defined &{"$class\::$name"};
              $known_method{$name} ||= $class;
            }
          }

          warn "Unknown method '$method' called on object of class $bottom_class\n";
          warn "Perhaps try:\n";
          for my $name (sort keys %known_method) {
            warn "  $name " . 
              ($known_method{$name} eq $bottom_class 
               ? "" 
               : "(inherited from $known_method{$name})") . 
                "\n";
          }
          croak "Aborting";
        }

        sub help {
          $AUTOLOAD = ref($_[0]) . '::(none)';
          goto &AUTOLOAD;
        }

        sub DESTROY {} 

        1;

use Help 'Foo'

When any part of the program invokes use Help 'Foo', this does two things. First, it locates Help.pm, loads it in, and compiles it, if that has not been done already. And then it immediately calls Help->import('Foo').

Typically, a module's import method is inherited from Exporter, which gets control at this point and arranges to make some of the module's functions available in the caller's namespace. So, for example, when you invoke use YAML 'freeze' in your module, Exporter's import method gets control and puts YAML's "freeze" function into your module's namespace. But that is not what we are doing here. Instead, Help has its own import method:

        sub import {
          my ($selfclass, @classes) = @_;
          for my $class (@classes) {
            push @{"$class\::ISA"}, $selfclass;
          }
        }
The $selfclass variable becomes Help and @classes becomes ('Foo'). Then the module does its first tricky thing. It puts itself into the @ISA list of another class. The push line adds Help to @Foo::ISA.

@Foo::ISA is the array that is searched whenever a method call on a Foo objects fails because the method doesn't exist. Perl will search the classes named in @Foo::ISA, in order. It will search the Help class last. That's important, because we don't want Help to interfere with Foo's ordinary inheritance.

Notice the way the variable name Foo::ISA is generated dynamically by concatenating the value of $class with the literal string ::ISA. This is how you access a variable whose name is not known at compile time in Perl. We will see this technique over and over again in this module.

The backslash in @{"$class\::ISA"} is necessary, because if we wrote @{"$class::ISA"} instead, Perl would try to interpolate the value of $ISA variable from the package named class. We could get around this by writing something like @{$class . '::ISA'}, but the backslash is easier to read.

AUTOLOAD

So what happens when the program calls $foo->nosuchmethod? If one of Foo's base classes includes a method with that name, it will be called as usual.

But when method search fails, Perl doesn't give up right away. Instead, it tries the method search a second time, this time looking for a method named AUTOLOAD. If it finds one, it calls it. It only throws an exception of there is no AUTOLOAD.

The Help class doesn't have a nosuchmethod method either, but it does have AUTOLOAD. If Foo or one of its other parent classes defines an AUTOLOAD, one of those will be called instead. But if there's no other AUTOLOAD, then Help's AUTOLOAD will be called as a last resort.

$AUTOLOAD

When Perl calls an AUTOLOAD function, it sets the value of $AUTOLOAD to include the full name of the method it was trying to call, the one that didn't exist. In our example, $AUTOLOAD is set to "Foo::nosuchmethod".

This pattern match dismantles the contents of $AUTOLOAD into a class name and a method name:

        sub AUTOLOAD {
          my ($bottom_class, $method) = $AUTOLOAD =~ /(.*)::(.*)/;
The $bottom_class variable contains Foo, and the $method variable contains nosuchmethod.

The AUTOLOAD function is now going to accumulate a table of all the methods that could have been called on the target object, print out a report, and throw a fatal exception.

The accumulated table will reside in the private hash %known_method. Keys in this hash will be method names. Values will be the classes in which the names were found.

Accumulating the table of method names

The AUTOLOAD function accumulates this hash by doing a depth-first search on the @ISA tree, just like Perl's method resolution does internally. The @classes variable is a stack of classes that need to be searched for methods but that have not yet been searched. Initially, it includes only the class on which the method was actually called, Foo in this case:

          my @classes = ($bottom_class);
As long as some class remains unsearched, this loop will continue to look for more methods. It begins by grabbing the next class off the stack:
          while (@classes) {
            my $class = shift @classes;
Foo inherits from Help too, but we don't want our error message to mention that, so the search skips Help:

            next if $class eq __PACKAGE__;
(__PACKAGE__ expands at compile time to the name of the current package.)

Before the loop actually looks at the methods in the current class it's searching, it looks to see if the class has any base classes. If there are any, it pushes them onto the stack to be searched next:

            unshift @classes, @{"$class\::ISA"};
Now the real meat of the loop: there is a class name in $class, say Foo, and we want the program to find all the methods in that class. Perl makes the symbol table for the Foo package available in the hash %Foo::. Keys in this hash are variable, subroutine, and filehandle names.

To find out if a name denotes a subroutine, we use defined(&{subroutine_name}) for each name in the package symbol table. If there is a subroutine by that name, the program inserts it and the class name into %known_method. Otherwise, the name is a variable or filehandle name and is ignored:

            for my $name (keys %{"$class\::"}) {
              next unless defined &{"$class\::$name"};
              $known_method{$name} ||= $class;
            }
          }
The ||= sets a new value for $name in the hash only if there was not one already. If a method name appears in more than one class, it is recorded as being in the first one found in the search. Since the search is proceeding in the same order that Perl uses, the one recorded is the one that Perl will actually find. For example, if Foo inherits from Bar, and both classes define a this method, the search will find Foo::this before Bar::this, and that is what will be recorded in the hash. This is correct, because Foo's this method overrides Bar's.

If you have any clever techniques for identifying other stuff that should be omitted from the output, this is where you would put them. For example, many authors use the convention that functions whose names have a leading underscore are private to the implementation, and should not be called by outsiders. We might omit such items from the output by adding a line here:

              next if $name =~ /^_/;
After the loop finishes searching all the base classes, the %known_method hash looks something like this:

    (
        this => Foo,
        that => Foo,
        new => Base,
        blookus => Mixin::Blookus,
        other => Foo
    )
This means that methods this, that, and other were defined in Foo itself, but that new is inherited from Base and that blookus was inherited from Mixin::Blookus.

Printing the report

The AUTOLOAD function then prints out some error messages:

          warn "Unknown method '$method' called on object of class $bottom_class\n";
          warn "Perhaps try:\n";
And at last the payoff: It prints out the list of methods that the programmer could have called:

          for my $name (sort keys %known_method) {
            warn "  $name " . 
              ($known_method{$name} eq $bottom_class 
               ? "" 
               : "(inherited from $known_method{$name})") . 
                "\n";
          }
          croak "Aborting";
        }
Each method name is printed. If the class in which the method was found is not the bottom class, the name is annotated with the message (inherited from wherever).

The output for my example would look like this:

        Unknown method 'nosuchmethod' called on object of class Foo:
        Perhaps try:
          blookus (inherited from Mixin::Blookus)
          new (inherited from Base)
          other
          that
          this
        Aborting at YourErroneousModule.pm line 679
Finally the function throws a fatal exception. If we had used die here, the fatal error message would look like Aborting at Help.pm line 34, which is extremely unhelpful. Using croak instead of die makes the message look like Aborting at test.pl line 5 instead. That is, it reports the error as coming from the place where the erroneous method was actually called.

Synthetic calls

Suppose you want to force the help message to come out. One way is to call $object->fgsfds, since probably the object does not provide a fgsfds method. But this is ugly, and it might not work, because the object might provide a fgsfds method. So Help.pm provides another way.

You can always force the help message by calling $object->Help::help. This calls a method named help, and it starts the inheritance search in the Help package. Control is transferred to the following help method:

        sub help {
          $AUTOLOAD = ref($_[0]) . '::(none)';
          goto &AUTOLOAD;
        }
The Help::help method sets up a fake $AUTOLOAD value and then uses "magic goto" to transfer control to the real AUTOLOAD function. "Magic goto" is not the evil bad goto that is Considered Harmful. It is more like a function call. But unlike a regular function call, it erases the calling function (help) from the control stack, so that to subsequently executed code it appears that AUTOLOAD was called directly in the first place.

Calling AUTOLOAD in the normal way, without goto, would have worked also. I did it this way just to be a fusspot.

DESTROY

Whenever a Perl object is destroyed, its DESTROY method is called, if it has one. If not, method search looks for an AUTOLOAD method, if there is one, as usual. If this lookup fails, no fatal exception is thrown; the object is sliently destroyed and execution continues.

It is very common for objects to lack a DESTROY method; usually nothing additional needs to be done when the object's lifetime is over. But we do not want the Help::AUTOLOAD function to be invoked automatically whenever such an object is destroyed! So Help defines a last-resort DESTROY method that is called instead; this prevents Perl from trying the AUTOLOAD search when an object with no DESTROY method is destroyed:

        sub DESTROY {} 
This DESTROY method restores the default behavior, which is to do nothing.

Living dangerously

Perl has a special package, called UNIVERSAL. Every class inherits from UNIVERSAL. If you want to apply Help to every class at once, you can try:

        use Help 'UNIVERSAL';
but don't blame me if something weird happens.

About use strict

Whenever I present code like this, I always get questions (or are they complaints?) from readers about why I omitted "use strict". "Always use strict!" they say.

Well, this code will not run with "use strict". It does a lot of stuff on purpose that "strict" was put in specifically to keep you from doing by accident.

At some point you have to take off the training wheels, kiddies.

License

Code in this article is hereby placed in the public domain.

Share and enjoy.


[Other articles in category /prog/perl] permanent link