The shell and its crappy handling of whitespace
I'm about thirty-five years into Unix shell programming now, and I
continue to despise it. The shell's treatment of whitespace is
a constant problem. The fact that
for i in *.jpg; do
cp $i /tmp
done
doesn't work is a constant pain. The problem here is that if one of
the filenames is bite me.jpg then the cp command will turn into
cp bite me.jpg /tmp
and fail, saying
cp: cannot stat 'bite': No such file or directory
cp: cannot stat 'me.jpg': No such file or directory
or, worse, there is a file named bite that is copied even though
you did not want to copy it, maybe overwriting a /tmp/bite that you
wanted to keep.
To make it work properly you have to say
for i in *.jpg; do
cp "$i" /tmp
done
with the quotes around the $i.
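A quick way to watch the splitting happen is to count the arguments a function receives. This is a sketch assuming a POSIX sh; nargs is a hypothetical throwaway helper, not a standard command.

```sh
#!/bin/sh
# Hypothetical helper: print how many arguments this function received.
nargs() {
    echo "$#"
}

i='bite me.jpg'
nargs $i      # unquoted: field splitting on the space yields 2 arguments
nargs "$i"    # quoted: 1 argument, space included
```

Run under sh, this prints 2 and then 1: the same value becomes two arguments or one depending only on the quotes.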
Now suppose I have a command
that strips off the suffix from a filename. For example,
suf foo.html
simply prints foo to standard output. Suppose I want to change the
names of all the .jpeg files to the corresponding names with .jpg
instead. I can do it like this:
for i in *.jpeg; do
mv $i $(suf $i).jpg
done
Ha ha, no, some of the files might have spaces in their names.
I have to write:
for i in *.jpeg; do
mv "$i" $(suf "$i").jpg # two sets of quotes
done
Ha ha, no, fooled you, the output of suf will also have spaces. I
have to write:
for i in *.jpeg; do
mv "$i" "$(suf "$i")".jpg # three sets of quotes
done
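suf is a hypothetical command here, so the sketch below assumes a minimal stand-in written as a shell function. For this particular rename, the POSIX ${var%pattern} expansion can strip the suffix inside the shell itself, which removes the command substitution and one layer of quotes; "$i" still needs its own. The name jpeg_to_jpg is also made up for illustration.

```sh
#!/bin/sh
# Hypothetical stand-in for suf: print the name with its last ".suffix"
# removed, e.g. foo.html -> foo, "bite me.jpeg" -> "bite me".
suf() {
    printf '%s\n' "${1%.*}"
}

# Hypothetical helper: rename every *.jpeg in directory $1 to *.jpg.
jpeg_to_jpg() {
    for i in "$1"/*.jpeg; do
        # If nothing matches, the pattern is left as a literal word; skip it.
        [ -e "$i" ] || continue
        # ${i%.jpeg} drops the suffix without running a subprocess.
        mv -- "$i" "${i%.jpeg}.jpg"
    done
}
```

Usage would be jpeg_to_jpg . in the directory of interest. The quotes around "$i" and "${i%.jpeg}.jpg" are still mandatory; the expansion only saves the extra pair that the $(suf ...) version needed.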
At this point it's almost worth breaking out a real language and using
something like this:
ls *.jpeg | perl -nle '($z = $_) =~ s/\.jpeg$/.jpg/; rename $_ => $z'
I think what bugs me most about this problem in the shell is that it's
so uncharacteristic of the Bell Labs people to have made such an
unforced error. They got so many things right; why not this?
It's not even a hard choice! 99% of the time you don't want your
strings implicitly split on spaces; why would you?
For example, they got the behavior of for i in *.jpeg right: if one of those
files is bite me.jpeg, the loop still runs only once for that file.
And the shell
doesn't have this behavior for any other sort of special character.
If you have a file named foo|bar and a variable z='foo|bar', then
ls $z doesn't try to pipe the output of ls foo into the bar
command; it just tries to list the file foo|bar like you wanted. But
if z='foo bar', then ls $z wants to list files foo and bar.
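The asymmetry comes from how unquoted expansions are processed: the value of $z is never re-parsed as shell syntax, but it does undergo field splitting on the characters in $IFS (space, tab and newline by default) and then pathname expansion. A sketch assuming a POSIX sh, with nfields as a hypothetical helper:

```sh
#!/bin/sh
# Hypothetical helper: report how many fields an expansion produced.
nfields() {
    echo "$#"
}

z='foo|bar'
nfields $z               # 1: "|" is an ordinary character after expansion, not a pipe
z='foo bar'
nfields $z               # 2: space is in the default $IFS, so the value is split
( IFS=''; nfields $z )   # 1: with IFS set to the empty string, no field splitting happens
```

The last line shows that space is special only because it sits in $IFS by default; emptying IFS turns splitting off, though globally, which brings its own surprises.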
How did the Bell Labs wizards get everything right except the
spaces?
Even if it was a simple or reasonable choice to make in the beginning,
at some point around 1979 Steve Bourne had a clear opportunity to
realize he had made a mistake. He introduced $* and must shortly
thereafter have discovered that it wasn't useful. This should have
gotten him thinking.
$* is literally useless. It is the variable that is supposed to
contain the arguments to the current shell. So you can write a shell
script:
#!/bin/sh
# “yell”
echo "I am about to run '$*' now!!1!"
exec $*
and then run it:
$ yell date
I am about to run 'date' now!!1!
Wed Apr 2 15:10:54 EST 1980
except that doesn't work because $* is useless:
$ ls *.jpg
bite me.jpg
$ yell ls *.jpg
I am about to run 'ls bite me.jpg' now!!1!
ls: cannot access 'bite': No such file or directory
ls: cannot access 'me.jpg': No such file or directory
Oh, I see what went wrong, it thinks it got three arguments, instead
of two, because the elements of $* got auto-split. I needed to use
quotes around $*. Let's fix it:
#!/bin/sh
echo "I am about to run '$*' now!!1!"
exec "$*"
$ yell ls *.jpg
yell: 3: exec: ls bite me.jpg: not found
No, the quotes disabled all the splitting so that now I got one
argument that happens to contain two spaces.
This cannot be made to work. You have to fix the shell itself.
Having realized that $* is useless, Bourne added a workaround to the
shell, a unique special case with special handling. He added a $@ variable which
is identical to $* in all ways but one: when it is in
double-quotes. Whereas $* expands to
$1 $2 $3 $4 …
and "$*" expands to
"$1 $2 $3 $4 …"
"$@" expands to
"$1" "$2" "$3" "$4" …
so that inside of yell ls *.jpg, an exec "$@" will turn into
exec "ls" "bite me.jpg" and do what you wanted exec $* to do in
the first place.
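The three expansions can be compared directly by setting the positional parameters to what yell ls *.jpg would receive and counting the resulting arguments. This is a sketch assuming a POSIX sh. Because it defines yell as a function rather than a script, it runs "$@" without exec (exec inside a function would replace the calling shell); in the script version the last line is exec "$@". nargs is again a hypothetical helper.

```sh
#!/bin/sh
# Hypothetical helper: print how many arguments this function received.
nargs() {
    echo "$#"
}

# Simulate the arguments yell gets from: yell ls *.jpg
set -- ls 'bite me.jpg'

nargs $*      # 3: ls, bite, me.jpg (split again after expansion)
nargs "$*"    # 1: "ls bite me.jpg" (joined with the first char of $IFS)
nargs "$@"    # 2: "ls" and "bite me.jpg" (each parameter kept whole)

# The repaired yell as a function: "$*" is fine inside the message,
# "$@" is what the command itself needs.
yell() {
    echo "I am about to run '$*' now!!1!"
    "$@"
}
```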
I deeply regret that, at the moment that Steve Bourne coded up this
weird special case, he didn't instead stop and think that maybe
something deeper was wrong. But he didn't and here we are. Larry
Wall once said something about how too many programmers have a
problem, think of a simple solution, and implement the solution, and
what they really need to be doing is thinking of three solutions and
then choosing the best one. I sure wish that had happened here.
Anyway, having to use quotes everywhere is a pain, but usually it
works around the whitespace problems, and it is not much worse than a
million other things we have to do to make our programs work in this
programming language hell of our own making. But sometimes this
isn't an adequate solution.
One of my favorite trivial programs is called lastdl. All it does
is produce the name of the file most recently written in
$HOME/Downloads, something like this:
#!/bin/sh
cd "$HOME/Downloads" || exit 1
echo "$HOME/Downloads/$(ls -t | head -1)"
Many programs stick files into that directory, often copied from the
web or from my phone, and often with long and difficult names like
e15c0366ecececa5770e6b798807c5cc.jpg or
2023_3_20230310_120000_PARTIALPAYMENT_3028707_01226.PDF or
gov.uscourts.nysd.590045.212.0.pdf that I do not want to type or
even autocomplete. No problem, I just do
rm $(lastdl)
or
okular $(lastdl)
or
mv $(lastdl) /tmp/receipt.pdf
except ha ha, no I don't, because none of those works reliably: they
all fail if the difficult filename happens to contain spaces, as it
often does. Instead I need to type
rm "$(lastdl)"
okular "$(lastdl)"
mv "$(lastdl)" /tmp/receipt.pdf
which in a command so short and throwaway is a noticeable cost,
a cost extorted by the shell in return for nothing.
And every time I do it I am angry with Steve Bourne all over again.
There is really no good way out in general. For lastdl there is a decent
workaround, but it is somewhat fishy. After my lastdl command finds the
filename, it renames it to a version with no spaces and then prints
the new filename:
#!/bin/sh
# This is not the real code
# and I did not test it
cd "$HOME/Downloads" || exit 1
fns="$(ls -t | head -1)" # those stupid quotes again
fnd="$(printf '%s' "$fns" | tr ' \t\n' '___')" # two sets of stupid quotes this time
[ "$fns" = "$fnd" ] || mv "$fns" "$fnd" # and again
echo "$HOME/Downloads/$fnd"
The actual script
is somewhat more reliable, and is written in Python, because shell
programming sucks.
[ Addendum 20230731: Drew DeVault has written a reply article about
how the rc shell does not have these problems.
rc was designed in the late 1980s by Tom Duff of Bell Labs, and I
was a satisfied user (of the Byron Rakitzis clone) for many years. Definitely give it a look. ]
[ Addendum 20230806: Chris Siebenmann also discusses rc. ]