The Universe of Discourse


Sat, 24 Jan 2009

Higher-Order Perl: nonmemoizing streams
The first version of tail() in the streams chapter looks like this:

        sub tail {
          my $s = shift;
          if (is_promise($s->[1])) {
            return $s->[1]->();  # Force promise
          } else {
            return $s->[1];
          }
        }
But this is soon replaced with a version that caches the value returned by the promise:

        sub tail {
          my $s = shift;
          if (is_promise($s->[1])) {
            $s->[1] = $s->[1]->();  # Force and save promise
          }
          return $s->[1];
        }
The reason that I give for this in the book is a performance reason. It's accompanied by an extremely bad explanation. But I couldn't do any better at the time.

There are much stronger reasons for the memoizing version, also much easier to explain.

Why use streams at all instead of the iterators of chapter 4? The most important reason, which I omitted from the book, is that the streams are rewindable. With the chapter 4 iterators, once the data comes out, there is no easy way to get it back in. For example, suppose we want to process the next bit of data from the stream if there is a carrot coming up soon, and a different way if not. Consider:

        # Chapter 4 iterators
        my $data = $iterator->();
        if (carrot_coming_soon($iterator)) {
          # X 
        } else {
          # Y
        }

        sub carrot_coming_soon {
          my $it = shift;
          my $soon = shift || 3;
          while ($soon-- > 0) {
            my $next = $it->();
            return 1 if is_carrot($next);
          }
          return;   # No carrot
        }
Well, this probably doesn't work, because the carrot_coming_soon() function extracts and discards the upcoming data from the iterator, including the carrot itself, and now that data is lost.

One can build a rewindable iterator:

        sub make_rewindable {
          my $it = shift;
          my @saved;  # upcoming values in LIFO order
          return sub {
            my $action = shift || "next";
            if ($action eq "put back") {
              push @saved, @_;
            } elsif ($action eq "next") {
              if (@saved) { return pop @saved; }
              else { return $it->(); }
            }
          };
        }
But it's kind of a pain in the butt to use:

        sub carrot_coming_soon {
          my $it = shift;
          my $soon = shift || 3;
          my @saved;
          my $saw_carrot;
          while ($soon-- > 0) {
            push @saved, $it->();
            $saw_carrot = 1, last if is_carrot($saved[-1]);
          }
          $it->("put back", @saved);
          return $saw_carrot;
        }
Because you have to explicitly restore the data you extracted.

With the streams, it's all much easier:

        sub carrot_coming_soon {
          my $s = shift;
          my $soon = shift || 3;
          while ($seen-- > 0) {
            return 1 if is_carrot($s->head);
            drop($s);
          }
          return;
        }
The working version of carrot_coming_soon() for streams looks just like the non-working version for iterators.

But this version of carrot_coming_soon() only works for memoizing streams, or for streams whose promise functions are pure. Let's consider a counterexample:

        my $bad = filehandle_stream(\*DATA);

        sub filehandle_stream {
          my $fh = shift;
          return node(scalar <$fh>, 
                      promise { filehandle_stream($fh) });
        }

        __DATA__
        fish
        dog
        carrot
        goat rectum
Now consider what happens if I do this:

        $carrot_soon = carrot_coming_soon($bad);
        print "A carrot appears soon after item ", head($bad), "\n"
          if $carrot_soon;
It says "A carrot appears soon after item fish". Fine. That's because $bad is a node whose head contains "fish". Now let's see what's after the fish:

        print "After ", head($bad), " is ", head(tail($bad)), "\n";
This should print After fish is dog, and for the memoizing streams I used in the book, it does. But a non-memoizing stream will print "After fish is goat rectum". Because tail($bad) invokes the promise function, which, since the next() was not saved after carrot_coming_soon() examined it, builds a new node, which reads the next item from the filehandle, which is "goat rectum".

I wish I had explained the rewinding property of the streams in the book. It's one of the most significant omissions I know about. And I wish I'd appreciated sooner that the rewinding property only works if the tail() function autosaves the tail node returned from the promise.


[Other articles in category /prog/perl] permanent link