The Universe of Discourse

Fri, 09 Dec 2005

Elimination of "f" system calls

Michael C. Toren:
> 1) Open a file descriptor pointing to the current working directory.
> 2) Create a temporary directory within the jail, and chroot() to it.
> 3) Using fchdir(), change the working directory to the file descriptor
> saved from step 1.

Oho, I hadn't seen that before. The chroot() in step 2 is required to avoid the special case in the Kernel that checks to see if you are doing ".." in the current root directory. But because you chrooted() yourself somewhere else, the special case isn't exercised.

Older systems don't have fchdir(), which is a fairly recent addition.

With the proliferation of "f" calls in recent years (fchdir, fchmod, fchown, fstat, fsync, etc.) I wonder what would be the result if the Unix system interface were redesigned to eliminate the non-"f" versions of the calls entirely. Instead, there would be a generic function, which we might call "iname", which transforms a path name to an "inode" structure:

        struct inode * iname (const char *path);

Unix kernels already contain a function with this name that does this job.

The system calls that formerly accepted path names are changed to require an inode structure. So instead of

        fd = open("dir/file", ...)

one now has

        fd = open(iname("dir/file"), ...)

(There are some minor language and usability issues here: what if iname() returns NULL? Ignore those; I want to discuss OS issues, not language issues.)

There would be a function, analogous to iname(), that also returned an inode structure, but which took an open file descriptor instead of a path name:

        struct inode * inode(int fd);

This is essentially equivalent to the fstat() function we have now.

chown() and fchown() would merge to become a single call that accepted an inode structure; instead of:

        chown("dir/file", owner) 
        fchown(fd, owner)

one would have:

        chown(iname("dir/file"), owner)
        chown(inode(fd), owner)

Similarly, instead of:


one would have:


stat() and fstat() would not only merge but would disappear entirely; the struct inode can do everything that the struct stat can do. This code:

        stat(&statbuf, "dir/file");
        fstat(&statbuf, fd);

turns into this:

        statbuf = iname("dir/file"));
        statbuf = inode(fd);

There are some security implications to this idea. There needs to be protection against counterfeiting an inode structure. For example, consider a world-readable file in a secret, nonsearchable directory. Suppose the file happens to have i-number 123456. If it's possible to do this, then security has failed:

        struct inode I;
        I.inumber = 123456;
        fd = open(I, O_RDWR);

It should be impossible for anyone to manufacture the struct inode that represents the secret file without actually using iname() somewhere along the line. A simple way to arrange this would be to have the kernel cryptographically sign each struct inode. This can be done inexpensively.

This still has some access implications. Consider a world-readable file in a world-searchable directory. Process A iname()s the file, obtaining its struct inode. The search permissions on the directory are then removed. Process A can still open the file. This is analogous to a similar situation in standard Unix in which process A opens the file before the permissions are changed, and can still read and write it afterwards. So that's not a big change. What might be a big change is that A can dump the struct inode to a file and the a different process can read it back again, evading the increased access protections on the directory. The cryptographic signature technique can fix this problem by restricting struct inodes to be used by a single process.

Whether this is worth doing I don't know. My main idea in thinking it up was to avoid the increasing duplication of system calls. Does Unix need an "fsymlink" call? Does it need three different ones?

        symlink(oldpath, newpath);
        fsymlink1(fd, newpath);
        fsymlink2(oldpath, fd);
        fsymlink3(oldfile_fd, newdir_fd);

Perhaps not this week, but who knows what the future holds? With the iname() / inode() style, these are all a single call:

        symlink(iname(oldpath), iname(newpath));
        symlink(inode(fd), iname(newpath));
        symlink(iname(oldpath), inode(fd));
        symlink(inode(oldfile_fd), inode(newdir_fd));

This also fixes some of the proliferation in the system call interface between calls that work on symlinks and calls that work through symlinks. For example, stat() and lstat(), and chown() and lchown(). On normal files, each pair is the same. But on a symlink, stat() stats the pointed-to file while lstat() stats the symlink itself; similarly chown() changes the owner of the pointed-to file while lchown() changes the owner of the symlink itself. But where's lchmod()? What about llink()? There's no way to make a hard link to a symbolic link! With the inode() / iname() technique above, you only need one extra call to handle all possible operations on a symbolic link:

        lchown(path, owner);
        llink(path, newpath);


        chown(liname(path), owner);
        link(liname(path), iname(newpath));

where liname() is just like iname(), except that if the resulting file is a symbolic link, its inode is returned immediately; iname() would have read the target of the symbolic link and called itself recursively to resolve the target.

It also seems to me that this interface might make it easier to communicate open files from one process to another. Some unix systems offer a experimental features for passing file descriptors around; this system only requires that the struct inode be communicated directly to the receiving process.

[Other articles in category /Unix] permanent link