The Universe of Discourse

Mark Dominus (陶敏修)
mjd@pobox.com

12 recent entries

A puzzle about balancing test tubes in a centrifuge
Proof by insufficient information
Willie Singletary will you please go now?
How our toy octopuses got revenge on a Philadelphia traffic court judge
Does someone really have to do the dirty jobs?
The mathematical past is a foreign country
Baseball on the Moon
Hangeul sign-engraving machine
Claude and Merle Miller let me down
Reflector grids
Jonathan Chait
Claude chokes on graph theory

Archive:

2025: JF M A M
2024: JF M A M J
J ASOND
2023: JF M A M J
J A S O N D
2022: J F M A M J
JAS O N D
2021: J F M AMJ
J A S O N D
2020: J F M A M J
J A S O N D
2019: JFM A M J
J A S O N D
2018: J F M A M J
J A S O N D
2017: J F M A M J
J A S O N D
2016: JF M A M J
JASON D
2015: JFM A M J
J A S O N D
2014: J F M AMJ
JASON D
2013: JFMAMJ
JAS OND
2012: J F MAMJ
JASOND
2011: JFMAM J
JASOND
2010: JFMAMJ
JA S O ND
2009: J F MAM J
JASOND
2008: J F M A M J
JAS O ND
2007: J F M A M J
J A S O N D
2006: J F M A M J
JAS O N D
2005: O N D

In this section:

Notes on a system for abbreviating SQL queries
Notes on card games played by aliens
Notes on rarely-seen game mechanics
Twingler, a generic data structure munger

Subtopics:

Mathematics 245

Programming 99

Language 95

Miscellaneous 75

Book 50

Tech 49

Etymology 35

Haskell 33

Oops 30

Unix 27

Cosmic Call 25

Math SE 25

Law 22

Physics 21

Perl 17

Biology 15

Brain 15

Calendar 15

Food 15

Comments disabled

Wed, 03 May 2023

Notes on rarely-seen game mechanics

A while back I posted some miscellaneous notes on card games played by aliens. Dave Turner has written a response in the same mode, titled “Rarely seen game mechanics”. If you like my blog, you will probably enjoy this article of Dave's.

Dave's article inspired me to write him an email reply, a version of which follows.

[ Content warning: discussion of self-mutilation. ]

Perception

Dave says:

Some time ago, I invented a game where you try to determine physical quantities. Like, you’re given a box full of weights, and you have to determine how many grams it is, just by hefting it.

So far I can only think of one common game that is even a little bit similar to this: guess the number of jelly beans in the jar. As Dave points out, weighing the jar, measuring it, or even picking up the jar are forbidden.

Montessori education focuses on developing a child's perception, and has many activities of this type, not played as competitive games, but as single-person training games. One example is the “sound boxes” or “sound cylinders”. This is a family of twelve opaque containers, usually in six pairs.

Two wooden boxes, each
containing a set of six wooden cylinders. One set has red caps on the
ends, the other has blue caps.

Two containers (call them R1 and B1) contain a small amount of sand, two (R2 and B2) contain dried peas, R3 and B3 contain small pasta, R4 and B4 contain pebbles, and so on. The child is first presented with three cylinders, say R1, R3 and R6. They shake the cylinders close to their ears and listen to the sounds. Then they are invited to listen to R1, R3, and R6 and to match them with their mates B1, B3, and B6. As the child gets better at this, more pairs of cylinders cnn be introduced. Later games involve taking a single set of cylinders R1–R6 and ordering them from quietest to loudest.

Other Montessori equipment of this type includes “baric tablets”, which are same-sized tablets made of different woods, to be distinguished by weight, and “cylinder blocks” which are sets of cylinders in different sizes, to be sorted by size and then put back in their corresponding sockets. None of this is hard stuff for adults, but it is interesting and challenging for a three-year-old. Montessori was a brilliant woman.

My piano teacher would sometimes play two notes and ask me to say whether it was a major third or a perfect fourth or whatever.

Dave mentions a microgame played by Ethiopian girls to choose who gets to go first:

they would instead hide three stones in one hand and four in another, and ask another player to figure out which hand had more stones. Of course, this is also a deception game.

This made me think of Odds and Evens, which was the standard method in New York where I grew up. One player (doesn't matter which) is designated “odds” and the other “evens”. Then on a count of three, the two players each reveal either one or two fingers. If the total number of fingers is even, the evens player wins. I always assumed that this was a game of pure randomness, but now I'm not sure. Maybe other people play it at a higher level, using psychology or trickery to gain an advantage.

Short-term memory

At math camp we used to play a short-term memory game that went like this: The first player would say "I went on a picnic, and I brought an abominable antelope.” Then the second player would say “I went on a picnic and I brought an abominable antelope and a basket of bananas.” Then “I went on a picnic and I brought and abominable antelope, a basket of bananas, and a copy of Charlie and the Chocolate Factory.” If you miss one, you have to drop out. Have-to-drop-out is often a bad game mechanic, but since this is a low-stakes social game, it can be fun to hang around and watch the end even after you're out. And since this was usually played in a larger gathering, it was easy enough to wander away and talk to someone else.

Similarly at one point I worked with an instructor who, on the first day of class, would have the kids do this with their names: kid #7 would have to recite the names of the six kids who had gone before. Unfair to the kids in the back, of course, but that's what you get for sitting in the back. And the instructor would promise ahead of time that he would have go last and the TAs would have to go next-to-last.

Pain tolerance

Dave said:

Dominus suggests a version of chess where you can make your opponent take a move back by chopping off one of your fingers. This is rather permanent (at least for humans). But what about temporary pain?

I had originally imagined this variation played by aliens with many regenerating tentacles, where cutting one off is painful, embarrassing, and inconvenient, but not crippling. But then I thought the idea of playing it as a human was much funnier and more compelling, and it ran away with me.

You enter a chess tournament and sit down. Try to imagine your thought process when you see that your opponent is missing a finger. Or that your opponent is missing three fingers.

There would be stories about how in the 2008 Olympiad Hoekstra was mounting a devastating attack, and thought he was safe because his opponent Berenin was not a finger-chopper. But then Berenin completely foiled the attack by unexpectedly chopping his finger, at just the right moment. (Interviewer: “You have never cut off a finger before. Was it a sudden inspiration, Grandmaster Berenin?” Berenin: “No, I saw what Hoekstra was doing, so I had been planning since move 13 to interfere with his rook defense in this way.”)

Or there might be the legendary game in which Berenin made the devastating move ♘fe6, and when Hoekstra, perhaps panicking, cut off a finger, Berenin merely shrugged and immediately made the equally devastating and nearly identical move ♘de6.

What's the record for one player cutting off fingers in a single game? Is it a legendarily bad game by a reckless dumbass? Or is it a story about how GM Basanian would stop at nothing to win the 1972 world championship? What's the record for two players cutting off fingers in a single game?

Okay, let's try a more plausible variation. Chess, but if your opponent makes a move you don't like, you can force them to take it back and play another by taking a shot of schnapps. Intriguing!

Dave continues, discussing a P.K. Dick story in which the characters take turns holding their fingers in a cigarette lighter. In the story, they're not burned, because Dick, but you could imagine playing this as a brutal game in our non-Dick universe. In fact I thought I might have heard of people playing exactly this game, but I'm not sure.

For a game of this type that I'm certain of, consider Episode 13 of Survivor: Borneo:

Final immunity challenge: Each tribe member held on to the immunity idol while standing on a small log. The person who lasted the longest would win immunity. After two hours of holding on the idol, Jeff tempted the three with oranges. After 2½ hours, Richard gave a speech, said he wouldn't be able to outlast Kelly, and stepped down voluntarily. … After three hours, the two left switched positions while keeping their hand on the idol and were to do so every half-hour. … After 4 hours, 11 minutes, Rudy [inadvertently] took his hand off the idol while switching spots, and Kelly won immunity yet again.

People compete in eating contests, which also has an element of seeing whose body can take the most abuse. And there's that game where two players take turns hitting each other in the face until one gives up or is too battered to continue — I don't know what it's called.

There are also games like Chicken and Russian Roulette (and possibly follow-the-leader) that are about who is willing to tolerate the greatest amount of unnecessary risk.

Conclusion

If you haven't read Dave's article yet, at least check out the thing about the Sichuan peppercorns.

[ Addendum 20230517: More notes about games with pain mechanics. ]

[Other articles in category /notes] permanent link

Fri, 24 Mar 2023

Notes on card games played by aliens

A few years back I mentioned, in an article on a quite different topic:

This article, believe it or not, started out as a discussion of whether the space aliens, when they arrive, will already know how to play chess. (Well, obviously not. But it is less obvious that they will not already know how to play go. I will write that article sooner or later.)

Apparently the answer to that is ‘later’. Much later.

Last month Eric Erbach wrote to ask:

Will we ever hear about why the aliens might already know how to play Go?

I'm not sure. This is not that article.

But M. Erbach inspired me to think about this some more and I excitedly sent him several emails about the alien relationship to poker and other card games. Then today I remembered I have a whole section of the blog for ‘notes’ and even though it has only two articles in it it was always intended as a place where I could dump interesting ideas that might never go anywhere. So here we are!

I didn't actually talk about chess or go in my message, but I went off in a different direction.

I think it's also likely the aliens play something like poker. Not with the same cards or hands, but something along these lines:

Multiple players are assigned a position or status

Most or all of the information the players' positions is hidden

Multiple rounds in which players may commit to their changing estimations of how good their hidden positions are

A final showdown in which the hidden information is made public

Crucial point: you can win either by having the best actual position, or by bluffing the other players into giving up.

The essence of poker is the secret information, which enables bluffing: does that person across the table from me have a winning position, or are they just pretending to have a winning position? Or, conversely: fate has dealt me a losing hand. How do I make the best of a bad situation?

This is, I believe, a crucial sort of problem that will come up over and over for all sentient beings. Everyone at some point has to make the best of a bad situation. You have to know when to hold 'em, and when to fold 'em. Games like poker are stripped-down practice versions of these sorts of fundamental problems.

(Games like chess and go are stripped-down practice versions of different sorts of fundamental problems, problems of strategic planning and tactical execution.)

Thinking about human card games, I said:

I wonder if they will also have trick-taking games?

Humans invented trick-taking card games over and over. Consider not just European games like bridge, pinochle, euchre, skat, hearts, spades, and canasta, which may have developed out of simpler games invented a thousand years ago in China, but also the West African game of agram. So it is tempting to think that the aliens would surely do the same. But maybe not! Turning it around the other way, I think auctions are also something very likely to be shared by all sentient beings, and yet human card games seem include very few auction-themed games. If the humans can neglect what must be a huge class of sealed-bid auction games, perhaps the aliens will somehow forget to have trick-taking games.

Trick-taking games like contract bridge and auction bridge have things called auctions, but they're not actually very auction-like. For an example of what I mean, here's goofspiel, a game that is pure auction:

One player receives the thirteen spade cards, the other the thirteen club cards. If there is a third player, they receive the thirteen heart cards.

The diamonds are shuffled, stacked face down, and the top one is turned over. Each player now offers a sealed bid to buy the face-up diamond card. They do this by selecting one of the cards from their hand and playing it face-down on the table.

The sealed bids are revealed simultaneously and the highest bid placed wins the auction. scoring points for the winning player: the A♢ is worth one point and K♢ worth 13. The bid cards are discarded, the next diamond is revealed, and the game continues with each player offering a bid from their now-depleted hand.

After 13 rounds, the player with the most points out of 91 is the winner.

For a more complex auction bidding game, see Sid Sackson's All My Diamonds.

Now, my point is that as far as I know there are very few popular human auction games. But perhaps, instead of bluffing games like poker or trick-taking games like whist, the aliens love to play auction games with cards? Many interesting variations are possible. In email with M. Erback I suggested a completely made-up auction game of a sort I've never seen among humans:

A bundle of cards is dealt face-down to the middle of the table. Each player, in turn, examines one card and then offers a bid for the whole bundle, which must be higher than the previous bid. (Or alternatively, drops out.) After going round the table a few times, the bundle is sold to the highest bidder. At that point it is revealed, evaluated according to some standard schedule, and the finances of the winner are settled.

I think there might be interesting strategy here. Suppose you are going second. Which card of the bundle will you look at? You can look at the same one that the first player looked at, so now you know what they know, but they also know what you know and that you know what they know. Or you can look at a different card, and learn something that nobody else knows yet.

After that you have to make a bid, which communicates to the other players something about what you know, and in the early stages of the game you can be tricky and make your bid a bluff.

Maybe players are allowed to pass without dropping out. Or maybe it wouldn't work at all; you never know until you playtest it. What if we played Texas Hold'em in this style? Instead of exposing all five community cards, some were kept partly secret?

Another thing we can do to get an alien game is to take a common human game and make it into something completely different by changing the emphasis:

In human contract bridge, communication between partners is highly formalized and strictly limited. Signaling the contents of your hand to your partner, other than by bidding, is considered cheating. What about an alien version of bridge in which the signaling is the whole point? You and your partner are expected to communicate the contents of your respective hands, and coordinate your play, and also to steal your opponents' signs if possible.

Many old games can be spiced up in this way. For example chess, but if your opponent plays a move you don't like, you can force them to take it back and play a different move, by chopping off one of your fingers. Totally different strategy!

Finally, I remembered a funny moment from the Larry Niven story “There is a Tide”. Niven's character Louis Wu has discovered a valuable lost artifact. Unfortunately, a representative of a previously-uncontacted alien species has discovered the same artifact. Rather than fighting, Wu proposes that they gamble for it.

“There is a mathematics game,” said the alien.… “The game involves a screee—" Some word that the autopilot couldn't translate. The alien raised a three-clawed hand, holding a lens-shaped object. The alien's mutually opposed fingers turned it so that Louis could see the different markings on each side. "This is a screee. You and I will throw it upward six times each. I will choose one of the symbols, you will choose the other. If my symbol falls looking upward more often than yours, the artifact is mine. The risks are even."

"Agreed," said Louis. He was a bit disappointed in the simplicity of the game.

Pretty funny! Louis is imagining something fun and interesting, but the alien proposes the opposite of this. In my opinion this is a good plan, as it will tend to prevent arguments. Although something needs to be said about the 22% chance that Louis and the alien will tie. In any case, Louis is kind of a dilettante, and it turns out that the alien is actually playing the game that is one level up.

[ Addendum: Thanks to John Wiersba for providing me with the name of goofspiel. ]

Addendum 20230523: Dave Turner has a sort-of response to this article titled “Rarely seen game mechanics”. ]

[Other articles in category /notes] permanent link

Thu, 30 Jan 2014

Twingler, a generic data structure munger

(Like everything else in this section, these are notes for a project that was never completed.)

Introduction

These incomplete notes from 1997-2001 are grappling with the problem of transforming data structures in a language like Perl, Python, Java, Javascript, or even Haskell. A typical problem is to take an input of this type:

  [
    [Chi, Ill],
    [NY, NY],
    [Alb, NY],
    [Spr, Ill],
    [Tr, NJ],
    [Ev, Ill],
  ]

and to transform it to an output of this type:

  { Ill => [Chi, Ev, Spr],
    NY  => [Alb, NY],
    NJ  => [Tr],
  }

One frequently writes code of this sort, and it should be possible to specify the transformation with some sort of high-level declarative syntax that is easier to read and write than the following gibberish:

   my $out;
   for my $pair (@$in) {
       push @{$out->{$pair->[0]}}, $pair->[1];
   }
   for my $k (keys %$out) {
       @{$out->{$k}} = sort @{$out->{$k}};
   }

This is especially horrible in Perl, but it is bad in any language. Here it is in a hypothetical language with a much less crusty syntax:

 for pair (in.items) :
   out[pair[0]].append(pair[1])
 for list (out.values) :
   list.sort

You still can't see what it really going on without executing the code in your head. It is hard for a beginner to write, and hard to anyone to understand.

Original undated notes from around 1997–1998

Consider this data structure DS1:

  [
    [Chi, Ill],
    [NY, NY],
    [Alb, NY],
    [Spr, Ill],                DS1
    [Tr, NJ],
    [Ev, Ill],
  ]

This could be transformed several ways:

  {
    Chi => Ill,
    NY => NY, 
    Alb => NY,
    Spr => Ill,                DS2
    Tr => NJ,
    Ev => Ill,
  }

  { Ill => [Chi, Spr, Ev],
    NY  => [NY, Alb],        DS3
    NJ  => Tr,
  }
  { Ill => 3,
    NY  => 2,
    NJ  => 1,
  }

  [ Chi, Ill, NY, NY, Alb, NY, Spr, Ill, Tr, NJ, Ev, Ill]    DS4

Basic idea: Transform original structure of nesting depth N into an N-dimensional table. If Nth nest is a hash, index table ranks by hash keys; if an array, index by numbers. So for example, DS1 becomes

        1        2
    1   Chi Ill
    2   NY  NY
    3   Alb NY
    4   Spr Ill
    5   Tr  NJ
    6   Ev  Ill

Or maybe hashes should be handled a little differently? The original basic idea was more about DS2 and transformed it into

            Ill        NY        NJ
    Chi     X
    NY              X
    Alb             X
    Spr     X
    Tr                      X
    Ev      X

Maybe the rule is: For hashes, use a boolean table indexed by keys and values; for arrays, use a string table index by integers.

Notation idea: Assign names to the dimensions of the table, say X and Y. Then denote transformations by:

    [([X, Y])]        (DS1)
    {(X => Y)}        (DS2)
    {X => [Y]}        (DS3)
    [(X, Y)]        (DS4)

The (...) are supposed to incdicate a chaining of elements within the larger structure. But maybe this isn't right.

At the bottom: How do we say whether

    X=>Y, X=>Z

turns into

    [ X => Y, X => Z ]                (chaining)

or [ X => [Y, Z] ] (accumulation)

Consider

      A B C
    D . . .
    E . .
    F .   .

<...> means ITERATE over the thing inside and make a list of the results. It's basically `map'.

Note that:

    <X,Y>     |=   (D,A,D,B,D,C,E,A,E,B,F,A,F,C)
    <X,[<Y>]> |=   (D,[A,B,C],E,[A,B],F,[A,C])

Brackets and braces just mean brackets and braces. Variables at the same level of nesting imply a loop over the cartesian join. Variables subnested imply a nested loop. So:

    <X,Y> means

    for x in X
      for y in Y
        push @result, (x,y) if present(x,y);

But

    <X,<Y>> means
    for x in X
      for y in Y
        push @yresult, (y) if present(x,y);
      push @result, @yresult

Hmmm. Maybe there's a better syntax for this.

Well, with this plan:

    DS1: [ <[X,Y]> ]
    DS2: { <X=>Y>  }
    DS3: { <X => [<Y>]> }
    DS4: [ <X, Y> ]

It seems pretty flexible. You could just as easily write

    { <X => max(<Y>) }

and you'd get

    { D => C, E => B, F => C }

If there's a `count' function, you can get

    { D => 3, E => 2, F => 2 }

or maybe we'll just overload scalar' to meancount'.

Question: How to invert this process? That's important so that you can ask it to convert one data structure to another. Also, then you could write something like

    [ <city, state> ]  |= { <state => [<city>] > }

and omit the X's and Y's.

Real example: From proddir. Given

    ID / NAME / SHADE / PALETTE / DESC

For example:

    A / AAA / red     / pink   / Aaa
    B / BBB / yellow  / tawny  / Bbb
    A / AAA / green   / nude   / Aaa
    B / BBB / blue    / violet / Bbb
    C / CCC / black   / nude   / Ccc

Turn this into

    { A => [ AAA, [ [red, pink], [green, nude] ], Aaa],
      B => [ BBB, [ [yellow, tawny], [blue, violet] ], Bbb],
      C => [ CCC, [ [black, nude] ], CCC]
    }


    { < ID => [
                 name,
                 [ <[shade, palette]> ]
                 desc
              ]>
    }

Something interesting happened here. Suppose we have

    [ [A, B]
      [A, B]
    ]

And we ask for <A, B>. Do we get (A, B, A, B), or just (A, B)? Does it remove duplicate items for us or not? We might want either.

In the example above, why didn't we get

    { A => [ AAA, [ [red, pink], [green, nude] ], Aaa],
      A => [ AAA, [ [red, pink], [green, nude] ], Aaa],
      B => [ BBB, [ [yellow, tawny], [blue, violet] ], Bbb],
      B => [ BBB, [ [yellow, tawny], [blue, violet] ], Bbb],
      C => [ CCC, [ [black, nude] ], CCC]
    }

If the outer iteration was supposed to be over all id-name-desc triples? Maybe we need

    <...>                all triples
    <!...!>                unique triples only

Then you could say

    <X>  |= <!X!>

to indicate that you want to uniq a list.

But maybe the old notation already allowed this:

    <X> |= keys %{< X => 1 >}

It's still unclear how to write the example above, which has unique key-triples. But it's in a hash, so it gets uniqed on ID anyway; maybe that's all we need.

1999-10-23

Rather than defining some bizarre metalanguage to describe the transformation, it might be easier all around if the user just enters a sample input, a sample desired output, and lets the twingler figure out what to do. Certainly the parser and internal representation will be simpler.

For example:

    [ [ A, B ],
      [ C, B ],
      [ D, E ] ]
    --------------
    { B => [A, C],
      E => [D],
    }

should be enough for it to figure out that the code is:

    for my $a1 (@$input) {
      my ($e1, $e2) = @$a1;
      push @{$output{$e2}}, $e1;
    }

Advantage: After generating the code, it can run it on the sample input to make sure that the output is correct; otherwise it has a bug.

Input grammar:

    %token ELEMENT
    expr: array | hash ;
    array: '[' csl ']' ;
    csl: ELEMENT | ELEMENT ',' csl | /* empty */ ;
    hash: '{' cspl '}' ;
    cspl: pair | pair ',' cspl | /* empty */ ;
    pair: ELEMENT '=>' ELEMENT;

Simple enough. Note that (...) lines are not allowed. They are only useful at the top level. A later version can allow them. It can replace the outer (...) with [...] or {...] as appropirate when it sees the first top-level separator. (If there is a => at the top level, it is a hash, otherwise an array.)

Idea for code generation: Generate pseudocode first. Then translate to Perl. Then you can insert a peephole optimizer later. For example

    foreachkey k (somehash) {
      push somearray, $somehash{k}
    }

could be optimized to

    somearray = values somehash;

add into hash: as key, add into value, replace value add into array: at end only

How do we analyze something like:

    [ [ A, B ],
      [ C, B ],
      [ D, E ] ]
    --------------
    { B => [A, C],
      E => [D],
    }

Idea: Analyze structure of input. Analyze structure of output and figure out an expression to deposit each kind of output item. Iterate over input items. Collect all input items into variables. Deposit items into output in appropriate places.

For an input array, tag the items with index numbers. See where the indices go in the output. Try to discern a pattern. The above example:

    Try #1:
    A: 1
    B: 2
    C: 1
    B: 2  -- consistent with B above
    D: 1
    E: 2

    Output:  2 => [1, 1]
             2 => [1]

OK—2s are keys, 1s are array elements.

A different try fails:

    A: 1
    B: 1
    C: 2
    B: 2 -- inconsistent, give up on this.

Now consider:

    [ [ A, B ],
      [ C, B ],
      [ D, E ] ]
    --------------
    { A => B,
      C => B,
      D => E,
    }

A,C,D get 1; B,E get 2. this works again. 1s are keys, 2s are values.

I need a way of describing an element of a nested data structure as a simple descriptor so that I can figure out the mappings between descriptors. For arrays and nested arrays, it's pretty easy: Use the sequence of numeric indices. What about hashes? Just K/V? Or does V need to be qualified with the key perhaps?

Example above:

    IN:        A:11  B:12 22 C:21 D:31 E:32
    OUT:    A:K   B:V     C:K  D:K  E:V

Now try to find a mapping from the top set of labels to the bottom. x1 => K, x2 => V works.

Problem with this:

    [ [ A, B ],
      [ B, C ],
    ]
    ------------
    { A => B,
      B => C,
    }

is unresolvable. Still, maybe this works well enough in most common cases.

Let's consider:

    [[ A , AAA , red     , pink   , Aaa], 
     [ B , BBB , yellow  , tawny  , Bbb],
     [ A , AAA , green   , nude   , Aaa],
     [ B , BBB , blue    , violet , Bbb],
     [ C , CCC , black   , nude   , Ccc],
    ]
    -------------------------------------------------------------
    { A => [ AAA, [ [red, pink], [green, nude] ], Aaa],
      B => [ BBB, [ [yellow, tawny], [blue, violet] ], Bbb],
      C => [ CCC, [ [black, nude] ], CCC]
    }

    A:        00,20 => K
    AAA:        01,21 => V0
    red:        02    => V100
    pink:        03    => V101
    Aaa:    04    => V2
    B:      10,30 => K
    C:      40    => K

etc.

Conclusion: x0 => K; x1 => V0; x2 => V100; x3 => V101; x4 => V2

How to reverse?

Simpler reverse example:

    { A => [ B, C ],
      E => [ D ],
    }
    ---------------------
    [ [ A, B ],
      [ A, C ],
      [ E, D ],
    ]

    A: K        => 00, 10
    B: V0   => 01
    C: V1   => 11
    D: V0   => 21
    E: K    => 20

Conclusion: K => x0; V => x1 This isn't enough information. Really, V => k1, where k is whatever the key was!

What if V items have the associated key too?

    A: K        => 00, 10
    B: V{A}0=> 01
    C: V{A}1=> 11
    D: V{E}0=> 21
    E: K    => 20

Now there's enough information to realize that B and C stay with the A, if we're smart enough to figure out how to use it.

2001-07-28

Sent to Nyk Cowham

2001-08-24

Sent to Timur Shtatland

2001-10-28

Here's a great example. The output from HTML::LinkExtor is a list like

    ([img, src, URL1, losrc, URL2],
     [a, href, URL3],
     ...
    )

we want to transform this into

    (URL1 => undef, 
     URL2 => undef, 
     URL3 => undef, 
     ...
    )

[Other articles in category /notes] permanent link

Sun, 19 Jan 2014

Notes on a system for abbreviating SQL queries

(This post inaugurates a new section on my blog, for incomplete notes. It often happens that I have some idea, usually for software, and I write up a bunch of miscellaneous notes about it, and then never work on it. I'll use this section to post some of those notes, mainly just because I think they might be interesting, but also in the faint hope that someone might get interested and do something with it.)

Why are simple SQL queries so verbose?

For example:

    UPDATE batches b
      join products p using (product_id)
      join clients c using (client_id)
    SET b.scheduled_date = NOW()
    WHERE b.scheduled_date > NOW()
      and b.batch_processor = 'batchco'
      and c.login_name = 'mjd' ;

(This is 208 characters.)

I guess about two-thirds of this is unavoidable, but those join-using clauses ought to be omittable, or inferrable, or abbreviatable, or something.

b.batch_processor should be abbreviated to at least batch_processsor, since that's the only field in those three tables with that name. In fact it could probably be inferred from b_p. Similarly c.login_name -> login_name -> log or l_n.

   update batches set sch_d = NOW()
   where sch_d > NOW()
   and bp = 'batchco'
   and cl.ln = 'mjd'

(Only 94 characters.)

cl.ln is inferrable: Only two tables begin with cl. None of the field names in the client_transaction_view table look like ln. So cl.ln unambiguously means client.login_name.

Then the question arises of how to join the batches to the clients. This is the only really interesting part of this project, and the basic rule is that it shouldn't do anything really clever. There is a graph, which the program can figure out from looking at the foreign key constraints. And the graph should clearly have a short path from batches through products to clients.

bp might be globally ambiguous, but it can be disambiguated by assuming it's in one of the three tables involved.

If something is truly ambiguous, we can issue an intelligent request for clarification:

"bp" is ambiguous. Did you mean:
  1. batches.batch_processor
  2. batches.bun_predictor
  0. None of the above
which? _

Overview

Debreviate table names
Figure out required joins and join fields
Debreviate field names

Can 1 and 2 really be separated? They can in the example above, but maybe not in general.

I think separating 3 and putting it at the end is a good idea: don't try to use field name abbreviations to disambiguate and debreviate table names. Only go the other way. But this means that we can't debreviate cl, since it might be client_transaction_view.

What if something like cl were left as ambiguous after stage 1, and disambiguated only in stage 3? Then information would be unavailable to the join resolution, which is the part that I really want to work.

About abbreviations

Abbreviations for batch_processor:

          bp
          b_p
          ba_pr
          batch_p

There is a tradeoff here: the more different kinds of abbreviations you accept, the more likely there are to be ambiguities.

About table inference

There could also be a preferences file that lists precedences for tables and fields: if it lists clients, then anything that could debreviate to clients or to client_transaction_view automatically debreviates to clients. The first iteration could just be a list of table names.

About join inference

Short join paths are preferred to long join paths.

If it takes a long time to generate the join graph, cache it. Build it automatically on the first run, and then rebuild it on request later on.

More examples

(this section blank)

Implementation notes

Maybe convert the input to a SQL::Abstract first, then walk the resulting structure, first debreviating names, then inserting joins, then debreviating the rest of the names. Then you can output the text version of the result if you want to.

Note that this requires that the input be valid SQL. Your original idea for the abbreviated SQL began with

   update set batches.sch_d = NOW()

rather than

   update batches set sch_d = NOW()

but the original version would probably be ruled out by this implementation. In this case that is not a big deal, but this choice of implementation might rule out more desirable abbreviations in the future.

Correcting dumb mistakes in the SQL language design might be in Quirky's purview. For example, suppose you do

     select * from table where (something)

Application notes

RJBS said he would be reluctant to use the abbreviated version of a query in a program. I agree: it would be very foolish to do so, because adding a table or a field might change the meaning of an abbreviated SQL query that was written into a program ten years ago and has worked ever since. This project was never intended to abbreviate queries in program source code.

Quirky is mainly intended for one-off queries. I picture it going into an improved replacement for the MySQL command-line client. It might also find use in throwaway programs. I also picture a command-line utility that reads your abbreviated query and prints the debreviated version for inserting into your program.

Miscellaneous notes

(In the original document this section was blank. I have added here some notes I made in pen on a printout of the foregoing, on an unknown date.)

Maybe also abbreviate update => u, where => w, and => &. This cuts the abbreviated query from 94 to 75 characters.

Since debreviation is easier [than join inference] do it first!

Another idea: "id" always means the main table's primary key field.

[Other articles in category /notes] permanent link