The Universe of Discourse

Mon, 31 Jul 2023

Can you identify this language?

Rummaging around in the Internet Archive recently, I found a book in a language I couldn't recognize. Can you identify it? Here's a sample page:

The page is hard to read, but as far as I can tell, it begins: “plac'het iaouank a ioa ouz ho gortoz, ho chleuzeuriou var elum, a ieas gantho d ai zal a eured; ha goudeze e oue serret an or Ar plac'het iaouank all a erruas ive d'ar fin, ha setu hi da c'hervel ar goaz nevez en eur lavaret; …”

I regret that IA's scan is so poor.

Answer: Breton.


[ Addendum 20230731: Bernhard Schmalhofer informs me that HathiTrust has a more legible scan. ]

[Other articles in category /lang] permanent link

Sun, 30 Jul 2023

The shell and its crappy handling of whitespace

I'm about thirty-five years into Unix shell programming now, and I continue to despise it. The shell's treatment of whitespace is a constant problem. The fact that

    for i in *.jpg; do
      cp $i /tmp
    done

doesn't work is a perpetual pain. The problem here is that if one of the filenames is bite me.jpg, then the cp command will turn into

    cp bite me.jpg /tmp

and fail, saying

    cp: cannot stat 'bite': No such file or directory
    cp: cannot stat 'me.jpg': No such file or directory

or, worse, if there is a file named bite, it is copied even though you did not want to copy it, perhaps overwriting a /tmp/bite that you wanted to keep.

To make it work properly you have to say

    for i in *.jpg; do
      cp "$i" /tmp
    done

with the quotes around the $i.
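Here is a small demonstration of the difference, run in throwaway directories so nothing real is touched. It assumes a system with mktemp -d (GNU and BSD both provide it, although POSIX does not require it), and the file names are made up for illustration.

```sh
#!/bin/sh
# Throwaway source and destination directories.
src=$(mktemp -d) || exit 1
dst=$(mktemp -d) || exit 1
cd "$src" || exit 1
touch 'bite me.jpg' 'bite'     # 'bite' is an innocent bystander

for i in *.jpg; do
  cp "$i" "$dst"               # quoted: cp gets the single argument 'bite me.jpg'
done

ls "$dst"                      # only 'bite me.jpg' was copied
```

Dropping the quotes around $i would make cp look for bite and me.jpg separately, copying the bystander and complaining about the missing me.jpg.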

Now suppose I have a command that strips off the suffix from a filename. For example,

    suf foo.html

simply prints foo to standard output. Suppose I want to change the names of all the .jpeg files to the corresponding names with .jpg instead. I can do it like this:

    for i in *.jpeg; do
      mv $i $(suf $i).jpg
    done

Ha ha, no, some of the files might have spaces in their names. I have to write:

    for i in *.jpeg; do
      mv "$i" $(suf "$i").jpg    # two sets of quotes
    done

Ha ha, no, fooled you, the output of suf will also have spaces. I have to write:

    for i in *.jpeg; do
      mv "$i" "$(suf "$i")".jpg  # three sets of quotes
    done

At this point it's almost worth breaking out a real language and using something like this:

    ls *.jpeg | perl -nle '($z = $_) =~ s/\.jpeg$/.jpg/; rename $_ => $z'
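For this particular rename there is also a shell-only option: POSIX parameter expansion ${i%.jpeg} strips the suffix inside the shell, so the command substitution (and one level of nested quoting) disappears. This is a sketch, not a replacement for suf in general. It again works in a throwaway directory and assumes mktemp -d is available.

```sh
#!/bin/sh
# Work in a throwaway directory so no real files are renamed.
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1
touch 'bite me.jpeg' 'plain.jpeg'

for i in *.jpeg; do
  # ${i%.jpeg} removes the trailing ".jpeg" inside the shell itself,
  # so there is no $(...) to quote and the quotes are not nested.
  # The -- stops mv from reading a name like "-x.jpeg" as an option.
  mv -- "$i" "${i%.jpeg}.jpg"
done

ls    # bite me.jpg and plain.jpg
```

The quotes are still mandatory; the parameter expansion only removes the nesting.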

I think what bugs me most about this problem in the shell is that it's so uncharacteristic of the Bell Labs people to have made such an unforced error. They got so many things right, why not this?

It's not even a hard choice! 99% of the time you don't want your strings implicitly split on spaces; why would you? For example, they got the behavior of for i in *.jpeg right: if one of those files is bite me.jpeg, the loop still runs only once for that file. And the shell doesn't re-parse any other sort of special character this way. If you have a file named foo|bar and a variable z='foo|bar', then ls $z doesn't try to pipe the output of ls foo into the bar command; it just tries to list the file foo|bar like you wanted. But if z='foo bar', then ls $z wants to list files foo and bar. How did the Bell Labs wizards get everything right except the spaces?
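The contrast is easy to see with a tiny helper, show (a name made up here), that prints each argument it receives between angle brackets. The last part uses IFS, the shell variable listing the characters that field splitting happens on. Setting it to the empty string turns splitting off, but doing that globally affects every other unquoted expansion in a script. Treat this as a demonstration, not a recommended fix.

```sh
#!/bin/sh
# Print each argument received, one per line, in <angle brackets>.
show() { printf '<%s>\n' "$@"; }

z='foo|bar'
show $z      # <foo|bar>: the | gets no special treatment after expansion

z='foo bar'
show $z      # <foo>, then <bar>: the space triggers field splitting

# Field splitting is controlled by IFS (default: space, tab, newline).
# With IFS set to the empty string, no splitting is done at all.
IFS=''
show $z      # <foo bar>
unset IFS    # an unset IFS behaves like the default value
```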

Even if it was a simple or reasonable choice to make in the beginning, at some point around 1979 Steve Bourne had a clear opportunity to realize he had made a mistake. He introduced $* and must shortly thereafter have discovered that it wasn't useful. This should have gotten him thinking.

$* is literally useless. It is the variable that is supposed to contain the arguments to the current shell. So you can write a shell script:

    # “yell”
    echo "I am about to run '$*' now!!1!"
    exec $*

and then run it:

    $ yell date
    I am about to run 'date' now!!1!
    Wed Apr  2 15:10:54 EST 1980

except that doesn't work because $* is useless:

    $ ls *.jpg
    bite me.jpg

    $ yell ls *.jpg
    I am about to run 'ls bite me.jpg' now!!1!
    ls: cannot access 'bite': No such file or directory
    ls: cannot access 'me.jpg': No such file or directory

Oh, I see what went wrong: it thinks it got three arguments instead of two, because the elements of $* got auto-split. I needed to use quotes around $*. Let's fix it:

    echo "I am about to run '$*' now!!1!"
    exec "$*"

    $ yell ls *.jpg
    I am about to run 'ls bite me.jpg' now!!1!
    yell: 3: exec: ls bite me.jpg: not found

No: the quotes disabled all the splitting, so now I got one argument that happens to contain two spaces.

This cannot be made to work. You have to fix the shell itself.

Having realized that $* is useless, Bourne added a workaround to the shell: a unique special case with its own handling. He added a $@ variable, which is identical to $* in all ways but one: how it behaves inside double quotes. Whereas $* expands to

    $1 $2 $3 $4 …

and "$*" expands to

    "$1 $2 $3 $4 …"

"$@" expands to

    "$1" "$2" "$3" "$4" …

so that inside of yell ls *.jpg, an exec "$@" will turn into exec "ls" "bite me.jpg" and do what you wanted exec $* to do in the first place.
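The same kind of made-up show helper makes the three expansions easy to compare side by side. Here set -- stands in for the arguments yell would receive from yell ls *.jpg when the only match is bite me.jpg.

```sh
#!/bin/sh
# Print each argument received, one per line, in <angle brackets>.
show() { printf '<%s>\n' "$@"; }

# Simulate the arguments yell receives from: yell ls *.jpg
set -- ls 'bite me.jpg'

echo 'unquoted $*:';  show $*      # three arguments: <ls> <bite> <me.jpg>
echo 'quoted "$*":';  show "$*"    # one argument: <ls bite me.jpg>
echo 'quoted "$@":';  show "$@"    # two arguments: <ls> <bite me.jpg>
```

Only "$@" reproduces the original argument list exactly, which is why exec "$@" is the form that works in yell.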

I deeply regret that, at the moment that Steve Bourne coded up this weird special case, he didn't instead stop and think that maybe something deeper was wrong. But he didn't and here we are. Larry Wall once said something about how too many programmers have a problem, think of a simple solution, and implement the solution, and what they really need to be doing is thinking of three solutions and then choosing the best one. I sure wish that had happened here.

Anyway, having to use quotes everywhere is a pain, but usually it works around the whitespace problems, and it is not much worse than a million other things we have to do to make our programs work in this programming language hell of our own making. But sometimes this isn't an adequate solution.

One of my favorite trivial programs is called lastdl. All it does is produce the name of the file most recently written in $HOME/Downloads, something like this:

    cd $HOME/Downloads
    echo $HOME/Downloads/"$(ls -t | head -1)"

Many programs stick files into that directory, often copied from the web or from my phone, and often with long and difficult names like e15c0366ecececa5770e6b798807c5cc.jpg or 2023_3_20230310_120000_PARTIALPAYMENT_3028707_01226.PDF or gov.uscourts.nysd.590045.212.0.pdf that I do not want to type or even autocomplete. No problem, I just do

    rm $(lastdl)
    okular $(lastdl)
    mv $(lastdl) /tmp/receipt.pdf

except ha ha, no, I don't, because none of those works reliably: they all fail if the difficult filename happens to contain spaces, as it often does. Instead I need to type

    rm "$(lastdl)"
    okular "$(lastdl)"
    mv "$(lastdl)" /tmp/receipt.pdf

which in a command so short and throwaway is a noticeable cost, a cost extorted by the shell in return for nothing. And every time I do it I am angry with Steve Bourne all over again.

There is really no good way out in general. For lastdl there is a decent workaround, but it is somewhat fishy. After my lastdl command finds the filename, it renames it to a version with no spaces and then prints the new filename:

    # This is not the real code
    # and I did not test it
    cd "$HOME/Downloads" || exit 1
    fns="$(ls -t | head -1)"                          # those stupid quotes again
    fnd="$(printf '%s' "$fns" | tr ' \t\n' '___')"    # two sets of stupid quotes this time
    [ "$fns" = "$fnd" ] || mv -- "$fns" "$fnd"        # and again
    echo "$HOME/Downloads/$fnd"

The actual script is somewhat more reliable, and is written in Python, because shell programming sucks.

[ Addendum 20230731: Drew DeVault has written a reply article about how the rc shell does not have these problems. rc was designed in the late 1980s by Tom Duff of Bell Labs, and I was a satisfied user (of the Byron Rakitzis clone) for many years. Definitely give it a look. ]

[ Addendum 20230806: Chris Siebenmann also discusses rc. ]

[Other articles in category /Unix] permanent link

Sat, 29 Jul 2023

Tiny life hack: paint your mouse dongles

I got a small but easy win last month. I have many wireless mice, and many of them are nearly impossible to tell apart.

Formerly, I would take my laptop somewhere, leaving the mouse behind but accidentally taking the dongle with me. Then I had a mouse with no dongle, and once the dongle came home, no way to tell which of the many dongle-less mice it belonged to.

At best I could remember to put the dongles on a shelf at home, the mice on an adjacent shelf, and periodically attempt to match them up. This is a little more troublesome than it sounds at first, because a mouse that seems not to match any of the dongles might just be out of power. So I have to change the batteries in all the mice also.

Anyway, last month I borrowed Toph's paint markers and color-coded each mouse and dongle pair. Each mouse has a different color scribbled on its underside, and each dongle has a matching scribble. Now when I find a mystery dongle in one of my laptops, it's easy to figure out which mouse it belongs with.

(Photo: a Logitech mouse lying on its back, with its dongle. The head of the dongle and the underside of the mouse have been scribbled on with sky-blue paint.)

The blue paint is coming off the dongle here, but there's still enough to recognize it by. I can repaint it before the color goes completely.

I had previously tried Sharpie marker, which was hard to see and wore off too quickly. I had also tried scribing a pattern of scratches into each mouse and its dongle, but this too was hard to see, and there isn't enough space on a mouse dongle to scribe very much legibly. The paint markers worked better.

I used Uni Posca markers. You can get a set of eight fat-tipped markers for $20 and probably find more uses for them. Metallic colors might be more visible than the ones I used.

[ Addendum 20230730: A reader reports good results using nail polish, saying “It's cheap, lots of colors available and if you don't use gel variants it's pretty durable.” Thanks nup! ]

[Other articles in category /tech] permanent link