The Universe of Discourse


Sat, 27 Jan 2007

Software archaeology
For appropriate values of "everyone", everyone knows that Unix files do not record any sort of "creation time". A fairly frequently asked question in Unix programming forums, and other related forums, such as Perl programming forums, is how to get the creation date of a file; the answer is that you cannot do that because it is not there.

This lack is exacerbated by several unfortunate facts: creation times are available on Windows systems; the Unix inode contains three timestamps, one of which is called the "ctime", and the "c" is suggestive of the wrong thing; Perl's built-in stat function overloads the return value to return the Windows creation time in the same position (on Windows) as it returns the ctime (on Unix).

So we see questions like this one, which appeared this week on the Philadelphia Linux Users' Group mailing list:

How does one check and change ctime?
And when questioned as to why he or she wanted to do this, this person replied:

We are looking to change the creation time. From what I understand, ctime is the closest thing to creation time.
There is something about this reply that irritates me, but I'm not quite sure what it is. Several responses come to mind: "Close" is not sufficient in system programming; the ctime is not "close" to a creation time, in any sense; before you go trying to change the thing, you ought to do a minimal amount of research to find out what it is. It is a perfect example of the Wrong Question, on the same order as that poor slob all those years ago who wanted to know how to tell if a file was a hard link or a soft link.

But anyway, that got me thinking about ctimes in general, and I did some research into the history and semantics of the thing, and made some rather surprising discoveries.

One good reference for the broad outlines of early Unix is the paper that Dennis Ritchie and Ken Thompson published in Communications of the ACM in 1974. This was updated in 1978, but the part I'm quoting wasn't revised and is current to 1974. Here is what it has to say about the relevant parts of the inode structure:

IV. IMPLEMENTATION OF THE FILE SYSTEM

... The entry found thereby (the file's i-node) contains the description of the file:
... time of creation, last use, and last modification
An error? I don't think so. Here is corroborating evidence, the stat man page from the first edition of Unix, from 1971:

NAME        stat -- get file status
SYNOPSIS    sys      stat; name; buf / stat = 18.
DESCRIPTION name points to a null-terminated string naming a file; buf is the
            address of a 34(10) byte buffer into which information is placed
            concerning the file. It is unnecessary to have any permissions at all
            with respect to the file, but all directories leading to the file
            must be readable.
            After stat, buf has the following format:
            buf, +1             i-number
            +2, +3              flags (see below)
            +4                  number of links
            +5                     user ID of owner size in bytes
            +6,+7            size in bytes
            +8,+9            first indirect block or contents block
            ...
            +22,+23             eighth indirect block or contents block
            +24,+25,+26,+27 creation time
            +28,+29, +30,+31 modification time
                +32,+33         unused
(Dennis Ritchie provides the Unix first edition manual; the stat page is in section 2.1.)

Now how about that?

When did the ctime change from being called a "creation time" to a "change time"? Did the semantics change too, or was the "creation time" description a misnomer? If I can't find out, I might write to Ritchie to ask. But this is, of course, a last resort.

In the meantime, I do have the source code for the fifth edition kernel, but it appears that, around that time (1975 or so), there was no creation time. At least, I can't find one.

The inode operations inside the kernel are defined to operate on struct inodes:

	struct inode {
		char    i_flag;
		char    i_count;
		int     i_dev;
		int     i_number;
		int     i_mode;
		char    i_nlink;
		char    i_uid;
		char    i_gid;
		char    i_size0;
		char    *i_size1;
		int     i_addr[8];
		int     i_lastr;
	} inode[NINODE];
The i_lastr field is what we would now call the atime. (I suppose it stands for "last read".) The mtime and ctime are not there, because they are not stored in the in-memory copy of the inode. They are fetched directly from the disk when needed.

We can see an example of this in the stat1 function, which is the backend for the stat and fstat system calls:

     1	stat1(ip, ub)
     2	int *ip;
     3	{
     4	        register i, *bp, *cp;
     5	
     6	        iupdat(ip, time);
     7	        bp = bread(ip->i_dev, ldiv(ip->i_number+31, 16));
     8	        cp = bp->b_addr + 32*lrem(ip->i_number+31, 16) + 24;
     9	        ip = &(ip->i_dev);
    10	        for(i=0; i<14; i++) {
    11	                suword(ub, *ip++);
    12	                ub =+ 2;
    13	        }
    14	        for(i=0; i<4; i++) {
    15	                suword(ub, *cp++);
    16	                ub =+ 2;
    17	        }
    18	        brelse(bp);
    19	}
ub is the user buffer into which the stat data will be deposited. ip is the inode structure from which most of this data will be copied. The suword utility copies a two-byte unsigned integer ("short unsigned word") from source to destination. This is done starting at the i_dev field (line 9), which effectively skips the two earlier fields, i_flag and i_count, which are internal kernel matters that are none of the user's business.

14 words are copied from the inode structure starting from this position, including the device and i-number fields, the mode, the link count, and so on, up through the addresses of the data or indirect blocks. (In modern Unixes, the stat call omits these addresses.) Then four words are copied out of the cp buffer, which has been read from the inode actually on the disk; these eight bytes are at position 24 in the inode, and ought to contain the mtime and the ctime. The question is, which is which? This simple question turns out to have a surprisingly complicated answer.

When an inode is modified, the IUPD flag is set in the i_flag member. For example, here is chmod, which modifies the inode but not the underlying data. On a modern unix system, we would expect this to update the ctime, but not the mtime. Let's see what it does in version 5:

     1	chmod()
     2	{
     3	        register *ip;
     4	
     5	        if ((ip = owner()) == NULL)
     6	                return;
     7	        ip->i_mode =& ~07777;
     8	        if (u.u_uid)
     9	                u.u_arg[1] =& ~ISVTX;
    10	        ip->i_mode =| u.u_arg[1]&07777;
    11	        ip->i_flag =| IUPD;
    12	        iput(ip);
    13	}
Line 10 is the important one; it sets the mode on the in-memory copy of the inode to the argument supplied by the user. Then line 11 sets the IUPD flag to indicate that the inode has been modified. Line 12 calls iput, whose principal job is to maintain the kernel's internal reference count of the number of file descriptors that are attached to this inode. When this number reaches zero, the inode is written back to disk, and discarded from the kernel's open file table. The iupdat function, called from iput, is the one that actually writes the modified inode back to the disk:

     1	iupdat(p, tm)
     2	int *p;
     3	int *tm;
     4	{
     5		register *ip1, *ip2, *rp;
     6		int *bp, i;
     7	
     8		rp = p;
     9		if((rp->i_flag&(IUPD|IACC)) != 0) {
    10			if(getfs(rp->i_dev)->s_ronly)
    11				return;
    12			i = rp->i_number+31;
    13			bp = bread(rp->i_dev, ldiv(i,16));
    14			ip1 = bp->b_addr + 32*lrem(i, 16);
    15			ip2 = &rp->i_mode;
    16			while(ip2 < &rp->i_addr[8])
    17				*ip1++ = *ip2++;
    18			if(rp->i_flag&IACC) {
    19				*ip1++ = time[0];
    20				*ip1++ = time[1];
    21			} else
    22				ip1 =+ 2;
    23			if(rp->i_flag&IUPD) {
    24				*ip1++ = *tm++;
    25				*ip1++ = *tm;
    26			}
    27			bwrite(bp);
    28		}
    29	}
What is going on here? p is the in-memory copy of the inode we want to update. It is immediately copied into a register, and called by the alias rp thereafter. tm is the time that the kernel should write into the mtime field of the inode. Usually this is the current time, but the smdate system call ("set modified date") supplies it from the user instead.

Lines 16–17 copy the mode, link count, uid, gid, "size", and "addr" fields from the in-memory copy of the inode into the block buffer that will be written back to the disk. Lines 18–22 update the atime if the IACC flag is set, or skip it if not. Then, if the IUPD flag is set, lines 24–25 write the tm value into the next slot in the buffer, where the mtime is stored. The bwrite call on line 27 commits the data to the disk; this results in a call into the appropriate device driver code.

There is no sign of updating the ctime field, but recall that we started this search by looking at what the chmod call does; it sets IUPD, which eventually results in the updating of the mtime field. So the mtime field is not really an mtime field as we now know it; it is doing the job that is now done by the ctime field. And in fact, the dump command predicates its decision about whether to dump a file on the contents of the mtime field. Which is really the ctime field. So functionally, dump is doing the same thing it does now.

It's possible that I missed it, but I cannot find the advertised creation time anywhere. The logical place to look is in the maknode function, which allocates new inodes. The maknode function calls ialloc to get an unused inode from the device, and this initializes its mode (as specified by the user), its link count (to 1), and its uid and gid (to the current process's uid and gid). It does not set a creation time. The ialloc function is fairly complicated, but as far as I can tell it is not setting any creation time either.

Working it from the other end, asking who might look at the ctime field, we have the find command, which has a -mtime option, but no -ctime option. The dump command, as noted before, uses the mtime. Several commands perform stat calls and declare structs to hold the result. For example, pr, which prints files with nice pagination, declares a struct inode, which is the inode as returned by stat, as opposed to the inode as used internally by the kernel—what we would call a struct stat now. There was no /usr/include in the fifth edition, so the pr command contains its own declaration of the struct inode. It looks like this:

struct inode {
        int dev;
	...
        int atime[2];
        int mtime[2];
};
No sign of the ctime, which would have been after the mtime field. (Of course, it could be there anyway, unmentioned in the declaration, since it is last.) And similarly, the ls command has:

struct ibuf {
	int	idev;
	int	inum;
        ...
	char	*iatime[2];
	char	*imtime[2];
};
A couple of commands have extremely misleading declarations. Here's the struct inode from the prof command, which prints profiling reports:

struct inode {
        int     idev;
        ...
        int ctime[2];
        int mtime[2];
        int fill;
};
The atime field has erroneously been called ctime here, but it seems that since prof does not use the atime, nobody noticed the bug. And there's a mystery fill field at the end, as if prof is expecting one more field, but doesn't know what it will be for. The declaration of ibuf in the ln command has similar oddities.

So the creation time advertised by the CACM paper (1974) and the version 1 manual (1971) seems to have disappeared by the time of version 5 (1975), if indeed it ever existed.

But there was some schizophrenia in the version 5 system about whether there was a third date in addition to the atime and the mtime. The stat call copied it into the stat buffer, and some commands assumed that it would be there, although they weren't sure what it would be for, and none of them seem look at it. It's quite possible that there was at one time a creation date, which had been eliminated by the time of the fifth edition, leaving behind the vestigial remains we saw in commands like ln and prof and in the code of the stat1 function.

Functionally, the version 5 mtime is actually what we would now call the ctime: it is updated by operations like chmod that in modern Unix will update the ctime but not the mtime. A quick scan of the Lions Book suggests that it was the same way in version 6 as well. I imagine that the ctime-mtime distinction arose in version 7, because that was the last version before the BSD/AT&T fork, and nearly everything common to those two great branches of the Unix tree was in version 7.

Oh, what the hell; I have the version 7 source code; I may as well look at it. Yes, by this time the /usr/include/sys/stat.h file had been invented, and does indeed include all three times in the struct stat. So the mtime (as we now know it) appears to have been introduced in v7.

One sometimes hears that early Unix had atime and mtime, and that ctime was introduced later. But actually, it appears that early Unix had atime and ctime, and it was the mtime that was introduced later. The confusion arises because in those days the ctime was called "mtime".

Addendum: It occurs to me now that the version 5 mtime is not precisely like the modern ctime, because it can be set via the smdate call, which is analogous to the modern utime call. The modern ctime cannot be set at all.


(Minor trivium: line 22 of iupdat is ip1 =+ 2. In modern C, we would write ip1 += 2. The =+ and =- operators had turned out to be a mistake, because people would write i=-1, intending i = -1, but the compiler would understand it as i =- 1, producing subtle bugs. The spellings of the operators were changed to avoid these bugs. The change from =+ to += was complete by the time K&R first edition was published in 1978: K&R mentions the old-style operators and says that the are obsolete. In spite of this, the Sun compiler I used in 1987 would still produce a warning for i=-1, despite interpreting it as i = -1. I believe this was because it was PCC-derived, and all PCC compilers emitted this warning. In the fifth edition code, we can see the obsolete form still in use.)

(Totally peripheral addendum: Google search for dmr puts Dennis M. Ritchie in fourth position, not the first. Is this grave insult to our community to be tolerated? I think not! It must be avenged! With fire and steel!)

[ Addendum 20070127: Unix source code prior to the fifth edition is lost. The manuals for the third and fourth editions are available from the Unix Heritage Society. The manual for the third edition (February 1973) mentions the creation time, but by the fourth edition (November 1973) the stat(2) man page no longer mentions a creation time. In v4, the two dates in the stat structure are called actime (modern atime) and modtime (modern mtime/ctime). ]



[Other articles in category /Unix] permanent link