The Universe of Discourse
           
Tue, 17 Sep 2013

Overlapping intervals
Our database stores, among other things, "budgets", which have a lifetime with a start and end time. A business rule is that no two budgets may be in force at the same time. I wanted to build a method which, given a proposed start and end time for a new budget, decided whether there was already a budget in force during any part of the proposed period.

The method signature is:

   sub find_overlapping_budgets {
     my ($self, $start, $end) = @_;
     ...
   }
and I want to search the contents of $self->budgets for any budgets that overlap the time interval from $start to $end. Budgets have a start_date and an end_date property.

My first thought was that for each existing budget, it's enough to check to see if its start_date or its end_date lies in the interval of interest, so I wrote it like this:

   sub find_overlapping_budgets {
     my ($self, $start, $end) = @_;

     return $self->budgets->search({
       [ { start_date => { ">=" , $start },
           start_date => { "<=" , $end },
         },
         { end_date => { ">=" , $start },
           end_date => { "<=" , $end },
         },
       ]
     });
   }
People ridicule Lisp for having too many parentheses, and code like this, a two-line function which ends with },},]});}, should demonstrate that that is nothing but xenophobia. I'm not gonna explain the ridiculous proliferation of braces and brackets here, except to say that this is expressing the following condition:

$$ \begin{array}{} ( start_A \le & start_B & & \wedge & \\ & start_B & \le end_A & & ) \vee \\ ( start_A \le & end_B & & \wedge & \\ & end_B & \le end_A & & ) \\ \end{array} $$

which we can abbreviate as:

$$ start_A \le start_B \le end_A \vee \\ start_A \le end_B \le end_A \\ $$

And if this condition holds, then the intervals overlap. Anyway, this seemed reasonable at the time, but is totally wrong, and happily, the automated tests I wrote for the method caught the error. Say that we ask whether we can create a budget that runs from June 1 to June 10. Say there is a budget that already exists, running from June 6 to June 7. Then the query asks :

$$ \text{June 5} \le \text{June 1} \le \text{June 6} \vee \\ \text{June 5} \le \text{June 10} \le \text{June 6} \\ $$

Both of the disjuncts are false, so the method reports that there is no overlap. My implementation was just completely wrong. it's not enough to check to see if either endpoint of the proposed interval lies within an existing interval; you also have to check to see if any of the endspoints of the existing intervals lie within the proposed interval. (Alert readers will have noticed that although the condition "Intervals A and B overlap" is symmetric in A and B, the condition as I wrote it is not symmetric, and this should raise your suspicions.)

This was yet another time when I felt slightly foolish as I wrote the automated tests, assuming that the time and effort I spent on testing this trivial function would would be time and effort thrown away on nothing—and then they detected a real fault. Someday perhaps I'll stop feeling foolish writing tests for functions like this one; until then, many cases just like this one will help me remember that I must write the tests even though I feel foolish doing it.

Okay, how to get this right? I tried a bunch of things, mostly involving writing out a conjunction of every required condition and then using boolean algebra to simplify the resulting expression:

$$ start_A \le start_B \le end_A \vee \\ start_A \le end_B \le end_A \vee \\ start_B \le start_A \le end_B \vee \\ start_B \le end_A \le end_B \\ $$

This didn't work well, partly because I was doing it at two in the morning, partly because there are many conditions, all very similar, and I kept getting them mixed up, and partly because, for implementation reasons, the final expression must be a query on interval A, even though it is most naturally expressed symmetrically between the two intervals.

But then I had a happy idea: For some reason it seemed much simpler to express the opposite condition, that the two intervals do not conflict. If they don't conflict, then interval A must be entirely to the left of interval B, so that $$end_A \lt start_B,$$ or vice-versa, so that $$end_B\lt start_A.$$ Then the intervals do not overlap if either of these is true:

$$ end_A \lt start_B \vee end_B \lt start_A $$

and the condition that we want, that the two intervals do overlap, is simply its negation:

$$ end_A \ge start_B \wedge end_B \ge start_A $$

This is correct, or at least all the tests now pass, and it is even simpler than the incorrect condition I wrote in the first place. The code looks like this:

   sub find_overlapping_budgets {
     my ($self, $start, $end) = @_;

     return $self->budgets->search({
         end_date   =>   { '>=', $start },
         start_date =>   { '<=', $end   },
     });
   }
Usually I like to draw some larger lesson from this sort of thing. What comes to mind now (other than “Just write the tests, fool!”) is this: The end result is quite clever. Often I see the final version of the code and say "Oh, I wonder why I didn't see that right off?" Not this time. I want to say I couldn't have found it by myself, except that I did find it by myself, not by just pulling it magically out of my head, but by applying technique.

Instead of "not by magically pulling it out of my head" I was about to write "not by just thinking", but that is not quite right. I did solve it by "just thinking", but it was a different sort of thinking. Sometimes I consider a problem, and a solution leaps to mind, as it did in this case, except that it was wrong. That is what I call "just thinking". But applying carefully-learned and practiced technique is also thinking.

The techniques I applied in this problem included: noticing and analyzing symmetries of the original problem, and application of laws of boolean algebra, both in the unsuccessful and the successful attempt. Higher-level strategies included trying more than one approach, and working backwards. Learning and correctly applying technique made me effectively a better thinker, not just in general, but in this particular case.

[ Addendum 20130917: Dfan Schmidt remarks: "I'm astonished you didn't know the interval-overlap trick already." I was a little surprised, also, when I tried to pull the answer out of my head and didn't find one there already, either from having read it somewhere before, or from having solved the problem before. ]


[Other articles in category /prog] permanent link