Proposal for turning off standard I/O buffering
Some Unix commands, such as grep, will have a command-line flag to
say that you want to turn off the buffering that is normally done in
the standard I/O library. Some just try to guess what you probably
want. Every command is a little different and if the command you want
doesn't have the flag you need, you are basically out of luck.
Maybe I should explain the putative use case here. You have some
command (or pipeline) X that will produce dribbles of data at
uncertain intervals. If you run it at the terminal, you see each
dribble timely, as it appears. But if you put X into a pipeline,
say with
X | tee ...
or
X | grep ...
then the dribbles are buffered and only come out of X when an entire
block is ready to be written, and the dribbles could be very old
before the downstream part of the pipeline, including yourself, sees
them. Because this is happening in user space inside of X, there is
not a damn thing anyone farther downstream can do about it. The only
escape is if X has some mode in which it turns off standard I/O
buffering. Since standard I/O buffering is on by default, there is a
good chance that the author of X did not think to affirmatively add
this feature.
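The delay is easy to observe without any external tools. The following C sketch writes one short line through a stdio stream attached to a pipe, then uses a non-blocking read() to check how much the reading end can see before and after an explicit fflush(). The function names dribble_demo and bytes_waiting are made up for illustration. Because a pipe is not a terminal, the stream should be block-buffered, so on typical implementations such as glibc nothing arrives downstream until the flush.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: non-blocking read that reports how many bytes
   are waiting right now (0 if the pipe is empty). */
static ssize_t bytes_waiting(int fd, char *buf, size_t len)
{
    ssize_t n = read(fd, buf, len);
    return n < 0 ? 0 : n;   /* EAGAIN means nothing has arrived yet */
}

/* Write one "dribble" through a stdio stream on a pipe and record what
   the downstream end can see before and after fflush(). */
int dribble_demo(ssize_t *before, ssize_t *after)
{
    int fds[2];
    char buf[256];
    FILE *out = NULL;

    if (pipe(fds) != 0)
        return -1;
    /* Non-blocking reading end, so an empty pipe reads as "0 bytes". */
    if (fcntl(fds[0], F_SETFL, O_NONBLOCK) < 0 ||
        (out = fdopen(fds[1], "w")) == NULL) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    fprintf(out, "dribble 1\n");   /* not a tty: sits in the stdio buffer */
    *before = bytes_waiting(fds[0], buf, sizeof buf);

    fflush(out);                   /* only now does it reach the pipe */
    *after = bytes_waiting(fds[0], buf, sizeof buf);

    fclose(out);                   /* also closes fds[1] */
    close(fds[0]);
    return 0;
}
```

On glibc this reports 0 bytes visible before the flush and all 10 bytes of "dribble 1\n" after it: the dribble was complete, but nobody downstream could see it.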
Note that adding the --line-buffered flag to the downstream grep does
not solve the problem; grep will produce its own output timely, but
it's still getting its input from X after a long delay.
One could imagine a program which would interpose a pseudo-tty, and
make X think it is writing to a terminal, and then the standard I/O
library would stay in line-buffered mode by default. Instead of
running
X | tee some-file | ...
or whatever, one would do
pseudo-tty-pipe -c X | tee some-file | ...
which allocates a pseudo-tty device, attaches standard output to it,
and forks. The child runs X , which dribbles timely into the
pseudo-tty while the parent runs a read loop to remove dribbles from
the master end of the TTY and copy them timely into the pipe. This
would work, although tee itself also has no --unbuffered flag,
so you might even have to write:
pseudo-tty-pipe -c X | pseudo-tty-pipe -c 'tee some-file' | ...
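Here is a minimal sketch of what the hypothetical pseudo-tty-pipe might do inside, assuming a POSIX system that provides posix_openpt(), grantpt(), unlockpt() and ptsname(). The function name pty_pipe_run and its interface are invented for illustration. Only the child's standard output is put on the pseudo-tty, and the bytes that reach out_fd will carry the terminal's usual newline translation (\n becomes \r\n), which a real tool would probably want to switch off.

```c
#define _XOPEN_SOURCE 600
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Copy n bytes to fd, retrying on short writes and EINTR. */
static int write_all(int fd, const char *p, size_t n)
{
    while (n > 0) {
        ssize_t w = write(fd, p, n);
        if (w < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += w;
        n -= (size_t)w;
    }
    return 0;
}

/* Hypothetical core of pseudo-tty-pipe: run cmd under /bin/sh with its
   standard output on the slave side of a new pseudo-terminal, and copy
   whatever appears on the master side to out_fd as soon as it arrives.
   Returns the child's exit status, or -1 on failure. */
int pty_pipe_run(const char *cmd, int out_fd)
{
    int master = posix_openpt(O_RDWR | O_NOCTTY);
    if (master < 0)
        return -1;
    if (grantpt(master) != 0 || unlockpt(master) != 0) {
        close(master);
        return -1;
    }
    const char *slave_name = ptsname(master);
    int slave = slave_name ? open(slave_name, O_RDWR | O_NOCTTY) : -1;
    if (slave < 0) {
        close(master);
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        close(master);
        close(slave);
        return -1;
    }
    if (pid == 0) {
        /* Child: stdout is now a terminal, so isatty(1) is true and the
           stdio library inside cmd picks line buffering by itself. */
        close(master);
        dup2(slave, STDOUT_FILENO);
        if (slave != STDOUT_FILENO)
            close(slave);
        execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
        _exit(127);
    }

    /* Parent: drop our copy of the slave so reads on the master end
       once the child (and anything it started) has closed it. */
    close(slave);

    char buf[4096];
    for (;;) {
        ssize_t n = read(master, buf, sizeof buf);
        if (n > 0) {
            if (write_all(out_fd, buf, (size_t)n) != 0)
                break;
        } else if (n < 0 && errno == EINTR) {
            continue;
        } else {
            break;   /* 0 or EIO: the slave side has been closed */
        }
    }
    close(master);

    int status = 0;
    while (waitpid(pid, &status, 0) < 0)
        if (errno != EINTR)
            return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A driver for the "-c X" form would simply call pty_pipe_run(X, STDOUT_FILENO) and exit with the returned status.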
I don't think such a program exists, and anyway, this is all
ridiculous, a ridiculous abuse of the standard I/O library's buffering
behavior: we want line buffering, the library will only give it to us
if the process is attached to a TTY device, so we fake up a TTY just
to fool stdio into giving us what we want. And why? Simply because
stdio has no way to explicitly say what we want.
But it could easily expose this behavior as a controllable
feature. Currently there is a branch in the library that says how to
set up a buffering mode when a stream is opened for the first time:
if the stream is for writing, and is attached to descriptor 2,
it should be unbuffered; otherwise …
if the stream is for writing, and is attached to descriptor 1, which
is a terminal device, it should be line-buffered; otherwise …
if the moon is waxing …
…
otherwise, the stream should be block-buffered
To this, I propose a simple change, to be inserted right at the beginning:
If the environment variable STDIO_BUF is set to "line", streams
default to line buffering. If it's set to "none", streams default
to no buffering. If it's set to "block", streams default to block
buffering. If it's anything else, or unset, it is ignored.
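To make the ordering concrete, here is a small C model of the decision with the proposed check in front of the existing branches (leaving out the lunar one). It is a hypothetical sketch, not code from any real C library: the name proposed_default_mode is invented, and it returns the same _IONBF, _IOLBF and _IOFBF constants that setvbuf() accepts.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical model, not real libc code: decide the default buffering
   mode for a stream on descriptor fd the first time it is used. */
int proposed_default_mode(int fd, int for_writing)
{
    /* Proposed addition: an explicit request wins over any guessing. */
    const char *req = getenv("STDIO_BUF");
    if (req != NULL) {
        if (strcmp(req, "line") == 0)
            return _IOLBF;
        if (strcmp(req, "none") == 0)
            return _IONBF;
        if (strcmp(req, "block") == 0)
            return _IOFBF;
        /* Any other value is ignored, as if the variable were unset. */
    }

    /* Existing guesses, simplified from the list above. */
    if (for_writing && fd == STDERR_FILENO)
        return _IONBF;
    if (for_writing && fd == STDOUT_FILENO && isatty(fd))
        return _IOLBF;
    return _IOFBF;
}
```

With STDIO_BUF=line in the environment, a library built this way would line-buffer a stream even when it is a pipe, which is exactly the X | tee case; with the variable unset, behavior is unchanged.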
Now instead of this:
pseudo-tty-pipe -c X | tee some-file | ...
you write this:
STDIO_BUF=line X | tee some-file | ...
Problem solved.
Or maybe you would like to do this:
export STDIO_BUF=line
which then affects every program in every pipeline for the rest of
the session:
X | tee some-file | ...
Control is global if you want it, and per-process if you want it.
This feature would cost around 20 lines of C code in the standard I/O
library and would impose only an insignificant run-time cost. It would
effectively add an --unbuffered flag to every program in the
universe, retroactively, and the flag would be the same for every
program. You would not have to remember that in mysql the magic
option is -n and that in GNU grep it is --line-buffered and that
for jq it is --unbuffered and that Python scripts can be
unbuffered by supplying the -u flag and that in tee you are just
SOL, etc. Setting STDIO_BUF=line would Just Work.
Programming languages would all get this for free also. Python
already has PYTHONUNBUFFERED, but in other languages
you have to do something or other; in Perl you use some
horrible Perl-4-ism like
{ my $ofh = select OUTPUT; $|++; select $ofh }
This proposal would fix every programming language everywhere. The
Perl code would become:
$ENV{STDIO_BUF} = 'line';
and every other language would be similarly simple:
/* In C */
putenv("STDIO_BUF=line");
[ Addendum 20180521: Mariusz Ceier corrects me,
pointing out that this will not work for the process’ own standard
streams, as they are pre-opened before the process gets a chance to
set the variable. ]
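Since a process's own standard streams are already set up by the time its code runs, the closest a single program can come today is to honor the variable for itself, calling setvbuf() before its first output on the stream. A sketch, with a hypothetical helper named apply_stdio_buf; it only helps programs whose authors opt in, which is the very limitation the proposal is meant to remove.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical opt-in for one program: honor STDIO_BUF for one stream
   by calling setvbuf(), which must happen before any other operation
   on that stream. Returns 0 if a mode was applied, 1 if the variable
   was unset or unrecognized, -1 if setvbuf() failed. */
int apply_stdio_buf(FILE *fp)
{
    const char *req = getenv("STDIO_BUF");
    int mode;

    if (req == NULL)
        return 1;
    if (strcmp(req, "line") == 0)
        mode = _IOLBF;
    else if (strcmp(req, "none") == 0)
        mode = _IONBF;
    else if (strcmp(req, "block") == 0)
        mode = _IOFBF;
    else
        return 1;

    /* NULL buffer: let the library allocate one of its usual size. */
    return setvbuf(fp, NULL, mode, BUFSIZ) == 0 ? 0 : -1;
}
```

A program would call apply_stdio_buf(stdout) as the first statement of main(), before anything is printed.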
It's easy to think of elaborations on this: STDIO_BUF=1:line might
mean that only standard output gets line-buffering by default;
everything else is up to the library.
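One way such an elaboration might be parsed is sketched below. The general form "N:mode" (a rule that applies only to descriptor N) is an assumption extrapolated from the single "1:line" example, and the function name stdio_buf_mode_for is hypothetical. A plain "mode" still applies to every stream, and anything malformed is ignored, as in the basic proposal.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parser: "mode" applies to every stream, "N:mode" only to
   descriptor N. Returns the setvbuf() mode to use for descriptor fd,
   or -1 to leave the library's usual default alone. */
int stdio_buf_mode_for(const char *spec, int fd)
{
    if (spec == NULL)
        return -1;

    const char *mode = spec;
    const char *colon = strchr(spec, ':');
    if (colon != NULL) {
        char *end;
        long target = strtol(spec, &end, 10);
        if (end == spec || end != colon)
            return -1;          /* malformed descriptor number: ignore */
        if (target != fd)
            return -1;          /* rule is for some other descriptor */
        mode = colon + 1;
    }

    if (strcmp(mode, "line") == 0)
        return _IOLBF;
    if (strcmp(mode, "none") == 0)
        return _IONBF;
    if (strcmp(mode, "block") == 0)
        return _IOFBF;
    return -1;                  /* unrecognized mode: ignore */
}
```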
This is an easy thing to do. I have wanted this for twenty years.
How is it possible that it hasn't been in the GNU/Linux standard
library for that long?
[ Addendum 20180521: it turns out there is quite a lot to say about
the state of the art here. In
particular, NetBSD has the feature very much as I described it. ]