Archive:
In this section:
Subtopics:
Comments disabled |
Fri, 17 Feb 2012
It came from... the HOLD SPACE
I recently posted about my work on Zach Holman's
One example of awful crap we had to deal with before Perl etc. were invented was that some people used to write 'sed scripts', although I am really not sure how they did it. I tried once, without much success, and then for this talk I tried again, and again did not have much success. "The hold space" is ased-ism. The basic model of sed is that it
reads the next line of data into the 'pattern space', then applies a
bunch of transformations to it, and then prints it out. If you need
to save this line for later examination, or for emitting later on
instead, you can hold it in the 'hold space'. Use of the hold space
is what distinguishes sed experts from mere sed nobodies like me. So
I planned to talk about the hold space, and then I got the happy idea
to analogize the Hold Space to the Twilight Zone, or maybe the Phantom
Zone, a place where you stick naughty data when you don't want it to
escape. I never feel like audiences appreciate the work I put into
this sort of thing; when I'm giving the talk it always sounds too much
like a private joke. Explaining it just feels like everyone is sitting
through my explanation of a private joke.
The little guy to the right is known
as
(Speaking of teletypewriters, ours had an attachment for punching paper tape, which you can see on the left side of the picture. The punched chads fell into a plastic chad box (which is missing in the picture), and when I was about three I spilled the chad box. Chad was everywhere, and it was nearly impossible to pick up. There were still chads stuck in the cracks in the floorboards when we moved out three years later. That's why, when the contested election of 2000 came around, I was one of the few people in North America who was not bemused to learn that there was a name for the little punched-out bits.)
Anyway, back to
I really wanted to present a tour de force of
On slide
28 I called
If you are interested in the details of the
Several people have asked me to clarify my claim to have invented
My own current version of the [ Addendum 20170325: I have revised this talk a couple of times since this blog article was written. Links to particular slides go to the 2011 versions, but the current version is from 2017. There are only minor changes. For example, I removed `awk` from the list of “monsters”. ]
Insane calculations in bash
A while back, Zach Holman (who I hadn't heard of before, but who is
apparently a bigwig at GitHub) implemented a kind of cute little
hack, called "spark".
It's a little shell utility,
spark 2,4,6,8will print out something like:
▃▄▆▇To do this in one of the 'P' languages (Perl, Python, PHP, Puby, or maybe Pickle) takes something like four lines of code. But M. Holman decided to implement it in bash for maximum
portability, so it took 72 lines, not counting comments, whitespace,
etc.Let's begin by discussing the (very simple) mathematics that underlies drawing bar graphs. Suppose you want to generate a set of bars for the numbers $1, $9, $20. And suppose you can actually generate bars of integer heights only, say integers from 0–7: 0 1 ▁ 2 ▂ 3 ▃ 4 ▄ 5 ▅ 6 ▆ 7 ▇(M. Holman 's original program did this, even though a height-8 bar █ is available. But the mathematics is the same either way.)
## Absolute scalingThe first step is to scale the input numbers onto the range of the bars. To do this, we find a scale factorf that maps dollars
onto bar heights, say that f bar units = $1.A reasonable thing to try is to say that since your largest number is $20, we will set 7 bar units = $20. Then 0.35 bar units = $1, and 3.45 bar units = $9. We'll call these the "natural heights" for the bars. Unfortunately we can't render the bars at their natural heights; we can only render them at integer heights, so we have to round off. 0.35 bar units rounds off to 0, so we will represent $1 as no bar at all. 3.45 bar units rounds off, badly, to 3, but that's the way it goes; if you try to squeeze the numbers from 1 to 20 into the range 0 to 7, something has to give. Anyway, this gives
(1,9,20) → ( ▃▇)The formula is: Let max be the largest input number (here,
20) and let n be the size of the largest possible bar (here, 7).
Then an input number x becomes a bar of size
n·x / max:
$$x\rightarrow {n\cdot x \over max } $$ Note that this mapsmax itself to n, and 0 to 0.I'll call this method "absolute scaling", because big numbers turn into big bars. (It fails for negative numbers, but we'll assume that the numbers are non-negative.)
(0…20) → ( ▁▁▁▂▂▂▃▃▄▄▄▅▅▅▆▆▆▇▇)There are a couple of variations we might want to apply. First, maybe we don't like that $1 mapped to no bar at all; it's too hard to see, depending on the context. Perhaps we would like to guarantee that only 0 maps to 0. One way to ensure that is to round everything up, instead of rounding to the nearest integer:
(0…20) → ( ▁▁▂▂▂▃▃▃▄▄▄▅▅▅▆▆▆▇▇▇) (1,9,20) → (▁▄▇)Another benefit of always rounding up is that it uses the bars equally. Suppose we're mapping numbers in the range 1–100 to bars of heights 1–7. If we round off to the nearest integer, each bar
represents 14 or 15 different numbers, except that the tallest bar
only represents the 8 numbers 93–100. This is a typical situation.
If we always round up, each bar corresponds to a nearly equal range of
numbers. (Another way to adjust this is to replace n with
n+½ in the formula.)
## Relative scalingNow consider the numbers $18, $19, $20. Under the absolute scaling method, we get:
(18,19,20) → (▆▇▇)or, if you're rounding up,
(18,19,20) → (▇▇▇)which obscures the difference between the numbers. There's only an 11% difference between the tallest and shortest bar, and that doesn't show up at this resolution. Depending on your application, this might be what you want, but we might also want to avail ourselves of the old trick of adjusting the baseline. Instead of the bottom of the bar being 0, we can say it represents 17. This effectively reduces every bar by 17 before scaling it, so that the number x is now
represented by a bar with natural height
n·(x−17) / (max−17).
Then we get these bars:
(18,19,20) → (▃▅▇)Whether this "relative scaling" is a better representation than ▇▇▇ depends on the application. It emphasizes different properties of the data.
In general, if we put the baseline at $$x\rightarrow {n\cdot (x-b) \over (max-b) } $$ That is the same formula as before, except that everything has been shifted down byb.
A reasonable choice of
## The shell sucksBut anyway, what I really wanted to talk about was how to fix this program, because I think my solution was fun and interesting. There is a tricky problem, which is that you need to calculate values like (n-b)/(x-b), which so you might like to do
some division, but as I wrote earlier, bash has no
facilities for doing fractional arithmetic. The original program
used $((…)) everywhere, which throws away fractions. You
can work around that, because you don't actually the fractional part
of (n-b)/(x-b); you only need the greatest
integer part. But the inputs to the program might themselves be
fractional numbers, like say 3.5, and $((…)) barfs if you
try to operate on such a number:
$ x=3.5; echo $((x + 1)) bash: 3.5: syntax error: invalid arithmetic operator (error token is ".5")and you seemingly cannot work around that.
My first response to this was to replace all the uses of So supposing that you take this complaint seriously, what can you do? Are you just doomed? No, I found a solution to the problem that solves all the problems. It is portable, efficient, and correct. It is also slightly insane.
## Portable fractions in
We cannot use decimal numbers: |