The Universe of Discourse
Wed, 21 Feb 2007
A bug in HTML generation
But then I started to see requests in the HTTP error log for URLs like this:
/pictures/blog/tex/total-die-rolls.gif$${6/choose%20k}k!{N!/over%20/prod%20{i!}^{n_i}{n_i}!}/qquad%20/hbox{/rm%20where%20$k%20=%20/sum%20n_i$}$$.gif
Someone must be referring people to these incorrect URLs, and it is
presumably me. The HTML version of the blog looked okay, so I checked
the RSS and Atom files, and found that, indeed, they were malformed.
Instead of <img src="foo.gif" alt="$TeX$">, they
contained code for <img src="foo.gif$TeX$">.

I tracked down and fixed the problem. Usually when I get a bug like this, I ask myself what I could learn from it. This one is unusual. I can't think of much. Here's the bug.

The <img> element is generated by a function called imglink. The arguments to imglink are the name of the file that contains the image (for use in the SRC attribute) and the text for the ALT attribute. The ALT text is optional. If it is omitted, the function tries to locate the TeX source code and fetch it. If this attempt fails, it continues anyway, and omits the ALT attribute. Then it generates and returns the HTML:
sub imglink {
    my $file = shift;
    ...
    my $alt = shift || fetch_tex($file);
    ...
    $alt = qq{alt="$alt"} if $alt;
    qq{<img $alt border=0 src="$url">};
}
This function is called from several places in the plugin. Sometimes
the TeX source code is available at the place from which the call
comes, and the code has return imglink($file, $tex);
sometimes it isn't and the code has
return imglink($file) and hopes that the imglink
function can retrieve the TeX.

One such place is the branch that handles generation of tags for every type of output except HTML. When generating the HTML output, the plugin actually tries to run TeX and generate the resulting image file. For other types of output, it assumes that the image file is already prepared, and just calls imglink to refer to an image that it presumes already exists:
return imglink($file, $tex) unless $blosxom::flavour eq "html";
The bug was that I had written this instead:
return imglink($file. $tex) unless $blosxom::flavour eq "html";
The . here is a string concatenation operator.

It's a bit surprising that I don't make more errors like this than I do. I am a very inaccurate typist.

Stronger type checking would not have saved me here. Both arguments are strings, concatenation of strings is perfectly well-defined, and the imglink function was designed and implemented to accept either one or two arguments.

The function did note the omission of the $tex argument, attempted to locate the TeX source code for the bizarrely-named file, and failed, but I had opted to have it recover and continue silently. I still think that was the right design. But I need to think about that some more.

The only lesson I have been able to extract from this so far is that I need a way of previewing the RSS and Atom outputs before publishing them. I do preview the HTML output, but in this case it was perfectly correct.
Tue, 20 Feb 2007
ssh-agent, revisited
Environmental manipulations
The nohup command basically does signal(SIGHUP, SIG_IGN) before calling execvp(COMMAND, ARGV) to execute the command. Similarly, there is a chroot command, run as chroot new-root-directory command args..., which runs the specified command with its default root inode set to somewhere else. And there is a nice command, run as nice nice-value-adjustment command args..., which runs the specified command with its "nice" value changed. And there is an env environment-settings command args... which runs the specified command with new variables installed into the environment. The standard sudo command could also be considered to be of this type.

I have also found it useful to write trivial commands called indir, which runs a command after chdir-ing to a new directory, and stopafter, which runs a command after setting the alarm timer to a specified amount, and, just today, with-umask, which runs a command after setting the umask to a particular value.

I could probably have avoided indir and with-umask. Instead of indir DIR COMMAND, I could use sh -c 'cd DIR; exec COMMAND', for example. But indir avoids an extra layer of horrible shell quotes, which can be convenient.

Today it occurred to me to wonder if this proliferation of commands was really the best way to solve the problem. The sh -c '...' method solves it partly, for those parts of the process user area that have corresponding shell builtin commands. This includes the working directory, umask, and environment variables, but not the signal table, the alarm timer, or the root directory.

There is no standardized interface to all of these things at any level. At the system call level, the working directory is changed by the chdir system call, the root directory by chroot, the alarm timer by alarm, the signal table by a bunch of OS-dependent nonsense like signal or sigaction, the nice value by setpriority, environment variables by a potentially complex bunch of memory manipulation and pointer banging, and so on.
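The commands described above all share one shape: adjust one piece of process state, then exec the target command. Here is a minimal C sketch of that shape. The names run_adjusted, adjust_cwd, and adjust_ignore_hup are hypothetical and invented for this illustration. Unlike the real nohup or indir, which exec directly, this sketch forks first so that the caller can observe the command's exit status.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* An adjustment changes one piece of process state; 0 means success. */
typedef int (*adjust_fn)(const char *arg);

/* What a hypothetical indir would do before exec. */
int adjust_cwd(const char *arg)
{
    return chdir(arg);
}

/* What nohup does before exec: ignore SIGHUP. The argument is unused. */
int adjust_ignore_hup(const char *arg)
{
    (void)arg;
    return signal(SIGHUP, SIG_IGN) == SIG_ERR ? -1 : 0;
}

/* Fork, apply the adjustment in the child, then exec argv there.
   Returns the command's exit status, 126 if the adjustment failed,
   127 if the exec failed, or -1 if the child did not exit normally. */
int run_adjusted(adjust_fn adjust, const char *arg, char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        if (adjust(arg) != 0)
            _exit(126);
        execvp(argv[0], argv);
        _exit(127);             /* reached only if execvp failed */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Each new kind of adjustment needs only one more small function here, which is roughly the observation behind asking whether a separate command per adjustment is really necessary.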
Since there's no single interface for controlling all these things, we might get a win by making an abstraction layer for dealing with them. One place to put this abstraction layer is at the system level, where it might look something like this:
/* declares USERAREA_* constants,
int userarea_set(int, ...)
and void *userarea_get(int)
*/
#include <sys/userarea.h>
userarea_set(USERAREA_NICE, 12);
userarea_set(USERAREA_CWD, "/tmp");
userarea_set(USERAREA_SIGNAL, SIGHUP, SIG_IGN);
userarea_set(USERAREA_UMASK, 0022);
...
This has several drawbacks. One is that it requires kernel hacking.
A subitem of this is that it will never become widespread, and that if
you can't (or don't want to) replace your kernel, it cannot be made to
work for you. Another is that it does not work for the environment
variables, which are not really administered by the kernel. Another
is that it does not fully solve the original problem, which is to
obviate the plethora of nice, nohup, sudo,
and env commands. You would still have to write a command
to replace them. I had thought of another drawback, but forgot it while I
was writing the last two sentences.

You can also put the abstraction layer at the C library level. This has fewer drawbacks. It no longer requires kernel hacking, and can provide a method for modifying the environment. But you still need to write the command that uses the library.

We may as well put the abstraction layer at the Unix command level. This means writing a command in some language, like Perl or C, which offers a shell-level interface to manipulating the process environment, perhaps something like this:
newenv nice=12 cwd=/tmp signal=HUP:IGNORE umask=0022 -- command args...
Then newenv has a giant dispatch table inside it to process the settings accordingly:
...
nice => sub { setpriority(PRIO_PROCESS, $$, $_) },
cwd => sub { chdir($_) },
signal => sub {
    my ($name, $result) = split /:/;
    $SIG{$name} = $result;
},
umask => sub { umask(oct($_)) },
...
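Since C is mentioned as a possible implementation language, here is a rough C sketch of the same dispatch idea. Everything in it is an assumption made for illustration: the function apply_setting, the table layout, and the small set of recognized signal names (HUP, INT, TERM) and dispositions (IGNORE, DEFAULT) are all hypothetical. A real newenv would apply each key=value argument this way and then call execvp on whatever follows the -- separator.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/resource.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

static int set_nice(const char *v)
{
    /* who == 0 means the calling process */
    return setpriority(PRIO_PROCESS, 0, atoi(v));
}

static int set_cwd(const char *v)
{
    return chdir(v);
}

static int set_umask(const char *v)
{
    umask((mode_t)strtol(v, NULL, 8));      /* value is octal, like 0022 */
    return 0;
}

/* Accepts NAME:IGNORE or NAME:DEFAULT for a few hypothetical names. */
static int set_signal(const char *v)
{
    static const struct { const char *name; int signo; } sigs[] = {
        { "HUP", SIGHUP }, { "INT", SIGINT }, { "TERM", SIGTERM },
    };
    const char *colon = strchr(v, ':');
    void (*disposition)(int);

    if (colon == NULL)
        return -1;
    if (strcmp(colon + 1, "IGNORE") == 0)
        disposition = SIG_IGN;
    else if (strcmp(colon + 1, "DEFAULT") == 0)
        disposition = SIG_DFL;
    else
        return -1;

    size_t len = (size_t)(colon - v);
    for (size_t i = 0; i < sizeof sigs / sizeof sigs[0]; i++) {
        if (strlen(sigs[i].name) == len && strncmp(sigs[i].name, v, len) == 0)
            return signal(sigs[i].signo, disposition) == SIG_ERR ? -1 : 0;
    }
    return -1;
}

/* The dispatch table: one handler per setting name. */
static const struct setting {
    const char *key;
    int (*apply)(const char *value);
} settings[] = {
    { "nice", set_nice },
    { "cwd", set_cwd },
    { "signal", set_signal },
    { "umask", set_umask },
};

/* Apply one "key=value" argument; returns 0 on success, -1 otherwise. */
int apply_setting(const char *spec)
{
    const char *eq = strchr(spec, '=');
    if (eq == NULL)
        return -1;
    size_t klen = (size_t)(eq - spec);
    for (size_t i = 0; i < sizeof settings / sizeof settings[0]; i++) {
        if (strlen(settings[i].key) == klen &&
            strncmp(settings[i].key, spec, klen) == 0)
            return settings[i].apply(eq + 1);
    }
    return -1;
}
```

In this sketch an unknown key is an error rather than being passed through silently; whether that is the right choice for a real tool is an open design question.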
One question to ask is whether something like this already exists.
Another is, if not, whether it's because there's some reason why it's
a bad idea, or because there's a simpler solution, or just because
nobody has done it yet.
Elimination of "f" system calls
Michael C. Toren: Oho, I hadn't seen that before. The chroot() in step 2 is required to avoid the special case in the kernel that checks to see if you are doing ".." in the current root directory. But because you have chroot()ed yourself somewhere else, the special case isn't exercised.

Older systems don't have fchdir(), which is a fairly recent addition.

With the proliferation of "f" calls in recent years (fchdir, fchmod, fchown, fstat, fsync, etc.) I wonder what would be the result if the Unix system interface were redesigned to eliminate the non-"f" versions of the calls entirely. Instead, there would be a generic function, which we might call "iname", which transforms a path name to an "inode" structure:
struct inode * iname (const char *path);
Unix kernels already contain a function that does this job; traditionally it is called namei. The system calls that formerly accepted path names are changed to require an inode structure. So instead of
fd = open("dir/file", ...)
one now has
fd = open(iname("dir/file"), ...)
(There are some minor language and usability issues here: what if iname() returns NULL? Ignore those; I want to discuss OS issues, not language issues.) There would be a function, analogous to iname(), that also returned an inode structure, but which took an open file descriptor instead of a path name:
struct inode * inode(int fd);
This is essentially equivalent to the fstat() function we have now. chown() and fchown() would merge to become a single call that accepted an inode structure; instead of:
chown("dir/file", owner)
fchown(fd, owner)
one would have:
chown(iname("dir/file"), owner)
chown(inode(fd), owner)
Similarly, instead of:
chdir(path);
fchdir(fd);
one would have:
chdir(iname(path));
chdir(inode(fd));
stat() and fstat() would not only merge but would disappear entirely; the struct inode can do everything that the struct stat can do. This code:
stat("dir/file", &statbuf);
fstat(fd, &statbuf);
turns into this:
statbuf = iname("dir/file");
statbuf = inode(fd);
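The idea that iname() and inode() are two routes to the same object can be tried out in user space today, on top of the existing stat() and fstat() calls. The following is only a sketch under that assumption: struct file_id, same_file, and roundtrip are hypothetical names, and this iname() and inode() return a small (device, i-number) pair by value rather than the kernel-issued struct inode that the proposal imagines.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* User-space stand-in for the proposed struct inode. */
struct file_id {
    dev_t dev;
    ino_t ino;
    int valid;      /* 0 if the lookup failed */
};

static struct file_id from_stat(int rc, const struct stat *st)
{
    struct file_id id = { 0, 0, 0 };
    if (rc == 0) {
        id.dev = st->st_dev;
        id.ino = st->st_ino;
        id.valid = 1;
    }
    return id;
}

/* Path name to file identity, emulated with stat(). */
struct file_id iname(const char *path)
{
    struct stat st;
    return from_stat(stat(path, &st), &st);
}

/* Open descriptor to file identity, emulated with fstat(). */
struct file_id inode(int fd)
{
    struct stat st;
    return from_stat(fstat(fd, &st), &st);
}

/* Two identities name the same file if device and i-number agree. */
int same_file(struct file_id a, struct file_id b)
{
    return a.valid && b.valid && a.dev == b.dev && a.ino == b.ino;
}

/* Opens path and checks that both routes reach the same file.
   Returns 1 if they agree, 0 if not, -1 if the open failed. */
int roundtrip(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    int same = same_file(iname(path), inode(fd));
    close(fd);
    return same;
}
```

Because the emulated structure is just a pair of numbers that any process can fill in, it offers none of the protection against counterfeiting that a real implementation would need.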
There are some security implications to this idea. There needs to be protection against counterfeiting an inode structure. For example, consider a world-readable file in a secret, nonsearchable directory. Suppose the file happens to have i-number 123456. If it's possible to do this, then security has failed:
struct inode I;
I.inumber = 123456;
fd = open(I, O_RDWR);
It should be impossible for anyone to manufacture the struct inode that represents the secret file without actually using iname() somewhere along the line. A simple way to arrange this would be to have the kernel cryptographically sign each struct inode. This can be done inexpensively.

This still has some access implications. Consider a world-readable file in a world-searchable directory. Process A iname()s the file, obtaining its struct inode. The search permissions on the directory are then removed. Process A can still open the file. This is analogous to a similar situation in standard Unix in which process A opens the file before the permissions are changed, and can still read and write it afterwards. So that's not a big change.

What might be a big change is that A can dump the struct inode to a file and then a different process can read it back again, evading the increased access protections on the directory. The cryptographic signature technique can fix this problem by restricting struct inodes to be used by a single process.

Whether this is worth doing I don't know. My main idea in thinking it up was to avoid the increasing duplication of system calls. Does Unix need an "fsymlink" call? Does it need three different ones?
symlink(oldpath, newpath);
fsymlink1(fd, newpath);
fsymlink2(oldpath, fd);
fsymlink3(oldfile_fd, newdir_fd);
Perhaps not this week, but who knows what the future holds? With the iname() / inode() style, these are all a single call:
symlink(iname(oldpath), iname(newpath));
symlink(inode(fd), iname(newpath));
symlink(iname(oldpath), inode(fd));
symlink(inode(oldfile_fd), inode(newdir_fd));
This also fixes some of the proliferation in the system call interface between calls that work on symlinks and calls that work through symlinks. For example, stat() and lstat(), and chown() and lchown(). On normal files, each pair is the same. But on a symlink, stat() stats the pointed-to file while lstat() stats the symlink itself; similarly chown() changes the owner of the pointed-to file while lchown() changes the owner of the symlink itself. But where's lchmod()? What about llink()? There's no way to make a hard link to a symbolic link! With the inode() / iname() technique above, you only need one extra call to handle all possible operations on a symbolic link:
lstat(path);
lchown(path, owner);
llink(path, newpath);
becomes:
stat(liname(path));
chown(liname(path), owner);
link(liname(path), iname(newpath));
where liname() is just like iname(), except that if the resulting file is a symbolic link, its inode is returned immediately; iname() would have read the target of the symbolic link and called itself recursively to resolve the target.

It also seems to me that this interface might make it easier to communicate open files from one process to another. Some Unix systems offer experimental features for passing file descriptors around; this system only requires that the struct inode be communicated directly to the receiving process.
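The difference between iname() and liname() corresponds to the difference between the existing stat() and lstat() calls, so it can be sketched in user space. As before, this is an emulation under stated assumptions, not the proposed kernel interface: struct file_id, same_file, and demo are hypothetical names, and the demo works in a scratch directory created under /tmp.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* User-space stand-in for the proposed struct inode. */
struct file_id { dev_t dev; ino_t ino; int valid; };

static struct file_id from_stat(int rc, const struct stat *st)
{
    struct file_id id = { 0, 0, 0 };
    if (rc == 0) {
        id.dev = st->st_dev;
        id.ino = st->st_ino;
        id.valid = 1;
    }
    return id;
}

/* Follows symbolic links, as stat() does. */
struct file_id iname(const char *path)
{
    struct stat st;
    return from_stat(stat(path, &st), &st);
}

/* Stops at a symbolic link, as lstat() does. */
struct file_id liname(const char *path)
{
    struct stat st;
    return from_stat(lstat(path, &st), &st);
}

int same_file(struct file_id a, struct file_id b)
{
    return a.valid && b.valid && a.dev == b.dev && a.ino == b.ino;
}

/* Creates a file and a symlink to it in a scratch directory, then checks
   that iname(link) reaches the target while liname(link) does not.
   Returns 1 if both hold, 0 if not, -1 if the setup failed. */
int demo(void)
{
    char dir[] = "/tmp/liname_demo_XXXXXX";
    char target[64], link[64];

    if (mkdtemp(dir) == NULL)
        return -1;
    snprintf(target, sizeof target, "%s/target", dir);
    snprintf(link, sizeof link, "%s/link", dir);

    int fd = open(target, O_CREAT | O_WRONLY, 0600);
    if (fd < 0) {
        rmdir(dir);
        return -1;
    }
    close(fd);

    int ok = -1;
    if (symlink(target, link) == 0) {
        ok = same_file(iname(link), iname(target)) &&
             !same_file(liname(link), iname(target));
        unlink(link);
    }
    unlink(target);
    rmdir(dir);
    return ok;
}
```

With these two lookups, any operation that accepts a file identity gets an "l" variant with no extra work, which is the saving the proposal is after.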