The Universe of Discourse


Wed, 25 Sep 2013

In which I revisit the pastimes of my misspent youth
Last weekend I was at a flea market and saw an HP-15C calculator for $10. The HP-15C was the last pocket calculator I owned, some time before pocket calculators became ridiculous. It was a really nice calculator when I got it in 1986, one of my most prized possessions.

I lost my original one somewhere along the way, and also the spare I had bought from a friend against the day when I lost the original, and I was glad to get another one, even though I didn't have any idea what I was going to do with it. My phone has a perfectly serviceable scientific calculator in it, a very HP-ish one called RealCalc. (It's nice, you should check it out.) The 15C was sufficiently popular that someone actually brought it back a couple of years ago, in a new and improved version, with the same interface but 21st-century technology, and I thought hard about getting one, but decided I couldn't justify spending that much money on something so useless, even if it was charming. Finding a cheap replacement was a delightful surprise.

Then on Friday night I was sitting around thinking about which numbers n are such that !!10n^2+9!! a perfect square, and I couldn't think of any examples except for 0, 2, and 4. Normally I would just run and ask the computer, which would take about two minutes to write the program and one second to run it. But I was out in the courtyard, it was a really nice evening, my favorite time of the year, the fading light was beautiful, and I wasn't going to squander it by going inside to brute-force some number problem.

But I did have the HP-15C in my pocket, and the HP-15C is programmable, by mid-1980s programmable calculator standards. That is to say, it is just barely programmable, but just barely is all you need to implement linear search for solutions of !!10n^2+9 = m^2!!. So I wrote the program and discovered, to my surprise, that I still remember many of the fussy details of how to program an HP-15C. For example, the SST button single-steps through the listing, in program mode, but single-steps the execution in run mode. And instead of using the special test 5 to see if the x and y registers are equal you might as well subtract them and use the x=0 test; it uses the same amount of program memory and you won't have to flip the calculator over to remember what test 5 is. And the x2 and INT() operations are on the blue shift key.

Here's the program:

        001 - 42,21,11      Label A:  (subroutine)
        002 -    43 11        x²
        003 -        1
        004 -        0        10
        005 -       20        multiply
        006 -        9        9
        007 -       40        add
        008 -       36        enter (dup)
        009 -       11        √
        010 -       36        enter (dup)
        011 -    43 44        x ← int(x) 
        012 -       30        subtract
        013 -    43 20        unless x=0:
        014 -       31          STOP
        015 -    43 32        return from subroutine
        016 - 42,21,12      Label B:
        017 -       40        +
        018 -    45  0        load register 0
        019 -    32 11        call A
        020 -        2        2
        021 - 44,40, 0        add to register 0
        022 -    22 12        goto B
I see now that when I tested !!\sqrt{10n^2+9}!! for integrality, I did it the wrong way. My method used four steps:
        010 -       36   -- enter (dup)
        011 -    43 44   -- x ← INT(x)
        012 -       30   -- subtract
        013 -    43 20   -- unless x=0: …
but it would have been better to just test the fractional part of the value for zeroness:
                 42 44   -- x ← FRAC(x)
                 43 20   -- unless x=0: …
Saving two instructions might not seem like a big deal, but it takes the calculator a significant amount of time to execute two instructions. The original program takes 55.2 seconds to find n=80; with the shorter code, it takes only 49.2 seconds, a 10% improvement. And when your debugging tool can only display a single line of numeric operation codes, you really want to keep the program as simple as you can.

Besides, stuff should be done right. That's why it's called "right".

But I kind of wish I had that part of my brain back. Who knows what useful thing I would be able to remember if I wasn't wasting my precious few brain cells remembering that the back-step key ("BST") is on the blue shift, and that "42,21,12" is the code for "subroutine B starts here".

Anyway, the program worked, once I had debugged it, and in short order (by 1986 standards) produced the solutions n=18, 80, 154, which was enough to get my phone to search the OEIS and find the rest of the sequence. The OEIS entry mentioned that the solutions have the generating function

$$\frac{2x^2(1+2x+9x^2+2x^3+x^4)}{1-38x^3+x^6}$$

and when I saw that !!38x^3!! in the denominator, I laughed, really loudly. My new neighbor was in her back yard, which adjoins the courtyard, and heard me, and said that if I was going to laugh like that I had to explain what was so funny. I said “Do you really want to know?” and she said yes, but I think she was mistaken.


[Other articles in category /brain] permanent link

Tue, 24 Sep 2013

The shittiest project I ever worked on
Sometimes in job interviews I've been asked to describe a project I worked on that failed. This is the one I always think of first.

In 1995 I quit my regular job as senior web engineer for Time-Warner and became a consultant developing interactive content for the World-Wide Web, which was still a pretty new thing at the time. Time-Warner taught me many things. One was that many large companies are not single organizations; they are much more like a bunch of related medium-sized companies that all share a building and a steam plant. (Another was that I didn't like being part of a large company.)

One of my early clients was Prudential, which is a large life insurance, real estate, and financial services conglomerate based in Newark, New Jersey—another fine example of a large company that actually turned out to be a bunch of medium-sized companies sharing a single building. I did a number of projects for them, one of which was to produce an online directory of Prudential-affiliated real estate brokers. I'm sure everyone is familiar with this sort of thing by now. The idea was that you would visit a form on their web site, put in your zip code or town name, and it would extract the nearby brokers from a database and present them to you on a web page, ordered by distance.

The project really sucked, partly because Prudential was disorganized and bureaucratic, and partly because I didn't know what I was doing. I quoted a flat fee for the job, assuming that it would be straightforward and that I had a good idea of what was required. But I hadn't counted on bureaucratic pettifoggery and the need for every layer of the management hierarchy to stir the soup a little. They tweaked and re-tweaked every little thing. The data set they delivered was very dirty, much of it garbled or incomplete, and they kept having to fix their exporting process, which they did incompletely, several times. They also changed their minds at least once about which affiliated real estate agencies should be in the results, and had to re-send a new data set with the new correct subset of affiliates, and then the new data would be garbled or incomplete. So I received replacement data six or seven times. This would not have been a problem, except that each time they presented me with a file in a somewhat different format, probably exported from some loser's constantly-evolving Excel spreadsheet. So I had to write seven or eight different versions of the program that validated and loaded the data. These days I would handle this easily; after the first or second iteration I would explain the situation: I had based my estimate on certain expectations of how much work would be required; I had not expected to clean up dirty data in eight different formats; they had the choice of delivering clean data in the same format as before, renegotiating the fee, or finding someone else to do the project. But in 1995 I was too green to do this, and I did the extra work for free.

Similarly, they tweaked the output format of the program repeatedly over weeks: first the affiliates should be listed in distance order, but no, they should be listed alphabetically if they are in the same town and then after that the ones from other towns, grouped by town; no, the Prudential Preferred affiliates must be listed first regardless of distance, which necessitated a redelivery of the data which up until then hadn't distinguished between ordinary and Preferred affiliates; no wait, that doesn't make sense, it puts a far-off Preferred affiliate ahead of a nearby regular affiliate... again, this is something that many clients do, but I wasn't expecting it and it took a lot of time I hadn't budgeted for. Also these people had, I now know, an unusually bad case of it.

Anyway, we finally got it just right, and it had been approved by multiple layers of management and given a gold star by the Compliance Department, and my clients took it to the Prudential Real Estate people for a demonstration.

You may recall that Prudential is actually a bunch of medium-sized companies that share a building in Newark. The people I was working with were part of one of these medium-sized companies. The real estate business people were in a different company. The report I got about the demo was that the real estate people loved it, it was just what they wanted.

“But,” they said, “how do we collect the referral fees?”

Prudential Real Estate is a franchise operation. Prudential does not actually broker any real estate. Instead, a local franchisee pays a fee for the use of the name and logo and other services. One of the other services is that Prudential runs a national toll-free number; you can call this up and they will refer you to a nearby affiliate who will help you buy or sell real estate. And for each such referral, the affiliate pays Prudential a referral fee.

We had put together a real estate affiliate locator application which let you locate a nearby Prudential-affiliated franchisee and contact them directly, bypassing the referral and eliminating Prudential's opportunity to collect a referral fee.

So I was told to make one final change to the affiliate locator. It now worked like this: The user would enter their town or zip code; the application would consult the database and find the contact information for the nearby affiliates, it would order them in the special order dictated by the Compliance Department, and then it would display a web page with the addresses and phone numbers of the affiliates carefully suppressed. Instead, the name of each affiliate would be followed by the Prudential national toll-free number AND NOTHING ELSE. Even the names were suspect. For a while Prudential considered replacing each affiliate's name with a canned string, something like "Prudential Real Estate Affiliate", because what if the web user decided to look up the affiliate in the Yellow Pages and call them directly? It was eventually decided that the presence of the toll-free number directly underneath rendered this risk acceptably small, so the names stayed. But everything else was gone.

Prudential didn't need an affiliate locator application. They needed a static HTML page that told people to call the number. All the work I had put into importing the data, into formatting the output, into displaying the realtors in precisely the right order, had been a complete waste of time.

[ Addendum 20131018: This article is available in Chinese. ]


[Other articles in category /tech] permanent link

Tue, 17 Sep 2013

Overlapping intervals
Our database stores, among other things, "budgets", which have a lifetime with a start and end time. A business rule is that no two budgets may be in force at the same time. I wanted to build a method which, given a proposed start and end time for a new budget, decided whether there was already a budget in force during any part of the proposed period.

The method signature is:

   sub find_overlapping_budgets {
     my ($self, $start, $end) = @_;
     ...
   }
and I want to search the contents of $self->budgets for any budgets that overlap the time interval from $start to $end. Budgets have a start_date and an end_date property.

My first thought was that for each existing budget, it's enough to check to see if its start_date or its end_date lies in the interval of interest, so I wrote it like this:

   sub find_overlapping_budgets {
     my ($self, $start, $end) = @_;

     return $self->budgets->search({
       [ { start_date => { ">=" , $start },
           start_date => { "<=" , $end },
         },
         { end_date => { ">=" , $start },
           end_date => { "<=" , $end },
         },
       ]
     });
   }
People ridicule Lisp for having too many parentheses, and code like this, a two-line function which ends with },},]});}, should demonstrate that that is nothing but xenophobia. I'm not gonna explain the ridiculous proliferation of braces and brackets here, except to say that this is expressing the following condition:

$$ \begin{array}{} ( start_A \le & start_B & & \wedge & \\ & start_B & \le end_A & & ) \vee \\ ( start_A \le & end_B & & \wedge & \\ & end_B & \le end_A & & ) \\ \end{array} $$

which we can abbreviate as:

$$ start_A \le start_B \le end_A \vee \\ start_A \le end_B \le end_A \\ $$

And if this condition holds, then the intervals overlap. Anyway, this seemed reasonable at the time, but is totally wrong, and happily, the automated tests I wrote for the method caught the error. Say that we ask whether we can create a budget that runs from June 1 to June 10. Say there is a budget that already exists, running from June 6 to June 7. Then the query asks :

$$ \text{June 5} \le \text{June 1} \le \text{June 6} \vee \\ \text{June 5} \le \text{June 10} \le \text{June 6} \\ $$

Both of the disjuncts are false, so the method reports that there is no overlap. My implementation was just completely wrong. it's not enough to check to see if either endpoint of the proposed interval lies within an existing interval; you also have to check to see if any of the endpoints of the existing intervals lie within the proposed interval. (Alert readers will have noticed that although the condition "Intervals A and B overlap" is symmetric in A and B, the condition as I wrote it is not symmetric, and this should raise your suspicions.)

This was yet another time when I felt slightly foolish as I wrote the automated tests, assuming that the time and effort I spent on testing this trivial function would would be time and effort thrown away on nothing—and then they detected a real fault. Someday perhaps I'll stop feeling foolish writing tests for functions like this one; until then, many cases just like this one will help me remember that I must write the tests even though I feel foolish doing it.

Okay, how to get this right? I tried a bunch of things, mostly involving writing out a conjunction of every required condition and then using boolean algebra to simplify the resulting expression:

$$ start_A \le start_B \le end_A \vee \\ start_A \le end_B \le end_A \vee \\ start_B \le start_A \le end_B \vee \\ start_B \le end_A \le end_B \\ $$

This didn't work well, partly because I was doing it at two in the morning, partly because there are many conditions, all very similar, and I kept getting them mixed up, and partly because, for implementation reasons, the final expression must be a query on interval A, even though it is most naturally expressed symmetrically between the two intervals.

But then I had a happy idea: For some reason it seemed much simpler to express the opposite condition, that the two intervals do not conflict. If they don't conflict, then interval A must be entirely to the left of interval B, so that $$end_A \lt start_B,$$ or vice-versa, so that $$end_B\lt start_A.$$ Then the intervals do not overlap if either of these is true:

$$ end_A \lt start_B \vee end_B \lt start_A $$

and the condition that we want, that the two intervals do overlap, is simply its negation:

$$ end_A \ge start_B \wedge end_B \ge start_A $$

This is correct, or at least all the tests now pass, and it is even simpler than the incorrect condition I wrote in the first place. The code looks like this:

   sub find_overlapping_budgets {
     my ($self, $start, $end) = @_;

     return $self->budgets->search({
         end_date   =>   { '>=', $start },
         start_date =>   { '<=', $end   },
     });
   }
Usually I like to draw some larger lesson from this sort of thing. What comes to mind now (other than “Just write the tests, fool!”) is this: The end result is quite clever. Often I see the final version of the code and say "Oh, I wonder why I didn't see that right off?" Not this time. I want to say I couldn't have found it by myself, except that I did find it by myself, not by just pulling it magically out of my head, but by applying technique.

Instead of "not by magically pulling it out of my head" I was about to write "not by just thinking", but that is not quite right. I did solve it by "just thinking", but it was a different sort of thinking. Sometimes I consider a problem, and a solution leaps to mind, as it did in this case, except that it was wrong. That is what I call "just thinking". But applying carefully-learned and practiced technique is also thinking.

The techniques I applied in this problem included: noticing and analyzing symmetries of the original problem, and application of laws of boolean algebra, both in the unsuccessful and the successful attempt. Higher-level strategies included trying more than one approach, and working backwards. Learning and correctly applying technique made me effectively a better thinker, not just in general, but in this particular case.

[ Addendum 20130917: Dfan Schmidt remarks: "I'm astonished you didn't know the interval-overlap trick already." I was a little surprised, also, when I tried to pull the answer out of my head and didn't find one there already, either from having read it somewhere before, or from having solved the problem before. ]


[Other articles in category /prog] permanent link