The Universe of Discourse


Sun, 26 Nov 2023

A Qmail example of dealing with unavoidable race conditions

[ I recently posted about a race condition bug reported by Joe Armstrong and said “this sort of thing is now in the water we swim in, but it wasn't yet [in those days of olde].” This is more about that. ]

I learned a lot by reading everything Dan Bernstein wrote about the design of qmail. A good deal of it is about dealing with potential issues just like Armstrong's. The mail server might crash at any moment, perhaps because someone unplugged the server. In DJB world, it is unacceptable for mail to be lost, ever, and also for the mail queue structures to be corrupted if there was a crash. That sounds obvious, right? Apparently it wasn't; sendmail would do those things.

(I know someone wants to ask what about Postfix? At the time Qmail was released, Postfix was still called ‘VMailer’. The ‘V’ supposedly stood for “Venema” but the joke was that the ‘V’ was actually for “vaporware” because that's what it was.)

A few weeks ago I was explaining one of Qmail's data structures to a junior programmer. Suppose a local user queues an outgoing message that needs to be delivered to 10,000 recipients in different places. Some of the deliveries may succeed immediately. Others will need to be retried, perhaps repeatedly. Eventually (by default, seven days) delivery will time out and a bounce message will be delivered back to the sender, listing the recipients who did not receive the delivery. How does Qmail keep track of this information?

2023 junior programmer wanted to store a JSON structure or something. That is not what Qmail does. If the server crashes halfway through writing a JSON file, it will be corrupt and unreadable. JSON data can be written to a temporary file and the original can be replaced atomically, but suppose you succeed in delivering the message to 9,999 of the 10,000 recipients and the system crashes before you can atomically update the file? Now the deliveries will be re-attempted for those 9,999 recipients and they will get duplicate copies.
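For concreteness, here is a minimal sketch of the write-to-a-temporary-file-and-rename approach just described. It is not anything from Qmail; the function name `replace_atomically` and the file names in the usage note are made up for illustration. It treats a short write as a failure and does not sync the containing directory, which a careful implementation would probably also want to do.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper, not Qmail code: replaces the contents of `path`
   with `data` by writing `tmp_path` first and then renaming it over
   `path`. Returns 0 on success, -1 on failure. A reader sees either the
   old contents or the new contents, never a half-written file. */
int replace_atomically(const char *path, const char *tmp_path, const char *data)
{
    assert(path != NULL && tmp_path != NULL && data != NULL);
    size_t len = strlen(data);

    int fd = open(tmp_path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd == -1) return -1;

    /* Write everything and push it to disk before the rename. */
    if (write(fd, data, len) != (ssize_t)len || fsync(fd) != 0) {
        close(fd);
        unlink(tmp_path);
        return -1;
    }
    if (close(fd) != 0) { unlink(tmp_path); return -1; }

    /* rename() replaces the old file in a single step. */
    if (rename(tmp_path, path) != 0) { unlink(tmp_path); return -1; }
    return 0;
}
```

A call such as `replace_atomically("state.json", "state.json.tmp", text)` protects the file itself, but it does nothing about the window between finishing a delivery and calling the function, which is the problem described above.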

Here's what Qmail does instead. The file in the queue directory is in the following format:

    Trecip1@host1■Trecip2@host2■…Trecip10000@host10000■

where ■ represents a zero byte. To 2023 eyes this is strange and uncouth, but to a 20th-century system programmer, it is comfortingly simple.
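To show how little machinery this format needs, here is a small sketch that scans such a buffer for one address. It is my own illustration, not Qmail's code: the name `find_record` is hypothetical, and it assumes the whole file has already been read into memory.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not Qmail code. Each record in buf[0..len) is one
   status byte, then the address, then a zero byte. Returns the offset of
   the status byte of the record for `addr`, or -1 if there is no such
   record or the buffer ends with an unterminated record. */
long find_record(const char *buf, size_t len, const char *addr)
{
    assert(buf != NULL && addr != NULL);
    size_t pos = 0;
    while (pos < len) {
        /* Find the zero byte that ends the record starting at pos. */
        const char *end = memchr(buf + pos, 0, len - pos);
        if (end == NULL) return -1;
        /* buf[pos] is the status byte; the address starts right after it. */
        if (end > buf + pos && strcmp(buf + pos + 1, addr) == 0)
            return (long)pos;
        pos = (size_t)(end - buf) + 1;
    }
    return -1;
}
```

Given the 15 bytes of `"Ta@x\0Db@y\0Tc@z"` (counting the final zero byte), looking up `b@y` returns offset 5, and the byte at that offset is the status of that recipient.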

When Qmail wants to attempt a delivery to recip1346@host1346, it locates that address in the file and sees that it has a T (“to-do”) on the front. If it had been a D (“done”), Qmail would know that delivery to that address had already succeeded, and it would not attempt it again.

If delivery does succeed, Qmail updates the T to a D:

    if (write(fd,"D",1) != 1) { close(fd); break; }
    /* further errors -> double delivery without us knowing about it, oh well */
    close(fd);
    return;

The update of a single byte will be done all at once or not at all. Even writing two bytes is riskier: if the two bytes span a disk block boundary, the power might fail after only one of the modified blocks has been written out. With a single byte nothing like that can happen. Absent a catastrophic hardware failure, the data structure on the disk cannot become corrupted.
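Here is a self-contained sketch of the same idea, for anyone who wants to try it. It is not Qmail's code. The excerpt above shows only the write, not how the file offset gets positioned, so this sketch assumes the caller already knows the offset of the status byte and uses `pwrite` to put the one byte there. The name `mark_done` and its parameters are hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, not Qmail code: overwrites the single status byte
   at offset `pos` in the file `path` with 'D'. Returns 0 on success and
   -1 on failure. On failure the byte is still 'T', so the worst outcome
   is that the delivery is attempted again. */
int mark_done(const char *path, off_t pos)
{
    assert(path != NULL && pos >= 0);
    int fd = open(path, O_WRONLY);
    if (fd == -1) return -1;
    /* Exactly one byte changes; nothing else in the file is touched. */
    int ok = (pwrite(fd, "D", 1, pos) == 1);
    close(fd);
    return ok ? 0 : -1;
}
```

The caller does not need to do anything clever when this returns -1. Logging a warning and moving on is enough, because the file is still valid and still says “to-do” for that recipient.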

Mail can never be lost. The only thing that can go wrong here is if the local system crashes in between the successful delivery and the updating of the byte; in this case the delivery will be attempted again, to that one user.

Addenda

  1. I think the data structure could even be updated concurrently by more than one process, although I don't think Qmail actually does this. Can you run multiple instances of qmail-send that share a queue directory? (Even if you could, I can't think of any reason it would be a good idea.)

  2. I had thought the update was performed by qmail-remote, but it appears to be done by qmail-send, probably for security partitioning reasons. qmail-local runs as a variable local user, so it mustn't have permission to modify the queue file, or local users would be able to steal email. qmail-remote doesn't have this issue, but it would be foolish to implement the same functionality in two places without a really good reason.

