As Bad atime as Any: Every Read a Write
Against POSIX atime
I recently had the displeasure of being reminded that atime exists while working on updating this website. What is atime? As per POSIX.1-2024 (square brackets and emphasis mine):
Each file has three distinct associated timestamps: the time of last data access [atime], the time of last data modification [mtime], and the time the file status last changed [ctime]. These values are returned in the file characteristics structure struct stat, as described in <sys/stat.h>.
Each function or utility in POSIX.1-2024 that reads or writes data (even if the data does not change) or performs an operation to change file status (even if the file status does not change) indicates which of the appropriate timestamps shall be marked for update.
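For the curious, here is a minimal sketch of reading those three timestamps from Python, assuming a Unix-like system where os.stat wraps stat(2). The helper name file_times is my own, not anything from POSIX.

```python
import os
import tempfile
from datetime import datetime, timezone


def file_times(path):
    """Return the atime, mtime, and ctime of `path` as UTC datetimes."""
    st = os.stat(path)  # fills in the struct stat fields for `path`
    return {
        # time of last data access
        "atime": datetime.fromtimestamp(st.st_atime, tz=timezone.utc),
        # time of last data modification
        "mtime": datetime.fromtimestamp(st.st_mtime, tz=timezone.utc),
        # time of last status change (on Unix; not a creation time)
        "ctime": datetime.fromtimestamp(st.st_ctime, tz=timezone.utc),
    }


if __name__ == "__main__":
    # Print the timestamps of a freshly created temporary file.
    with tempfile.NamedTemporaryFile() as f:
        for name, when in file_times(f.name).items():
            print(name, when.isoformat())
```

Calling stat on a file does not count as a data access, so looking at atime this way does not itself change it.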
Since the atime needs to be updated every time the file is accessed, this turns every file read into a file write. It gets worse: this definition of 'file' actually includes directories too[1], so every access to a directory also becomes a write. Suffice it to say, this is not good for performance. To solve this, Linux introduced relatime and enabled it by default:
relatime
Update inode access times relative to modify or change time. Access time is only updated if the previous access time was earlier than or equal to the current modify or change time. (Similar to noatime, but it doesn't break mutt(1) or other applications that need to know if a file has been read since the last time it was modified.)
Since Linux 2.6.30, the kernel defaults to the behavior provided by this option (unless noatime was specified), and the strictatime option is required to obtain traditional semantics. In addition, since Linux 2.6.30, the file's last access time is always updated if it is more than 1 day old.
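Spelled out as code, the rule quoted above looks roughly like this. It's a sketch of the man page wording, not the kernel's actual implementation: the function name and the plain seconds-since-epoch arguments are my own, and the kernel's exact handling of the one-day boundary may differ.

```python
DAY_SECONDS = 24 * 60 * 60


def relatime_should_update(atime, mtime, ctime, now):
    """Decide whether a read at time `now` should update atime under relatime.

    All arguments are seconds since the epoch.
    """
    # The file was modified or had its status changed since the last
    # recorded access, so the stale atime must be refreshed.
    if atime <= mtime or atime <= ctime:
        return True
    # The recorded access time is more than one day old.
    if now - atime > DAY_SECONDS:
        return True
    # Otherwise skip the write entirely.
    return False
```

The first branch is what keeps "has this been read since it was last modified?" answerable. The second is the once-a-day refresh. Everything else is a read that stays a read.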
Why keep atime at all? Some programs rely on it, but most sources are unclear on which ones. After a lot of digging, I've found a few uses. Programs like tmpreaper and systemd-tmpfiles delete temporary files that haven't been accessed in a while. Some disk usage analyzers use atime to identify unused files that could potentially be deleted. Some old email programs like Mutt rely on atime in certain configurations to identify new mail. Debian's popularity-contest uses atime to identify which packages are actually used. Some sources claim it's used by backup systems, but these all seem to trace back to a single post on the kernel mailing list where the only example given was hierarchical storage management (HSM) software (which I wouldn't consider backup software). Some commands like find or ls support filtering/sorting based on atime.
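To make the "delete files nobody has touched in a while" use case concrete, here is a rough sketch of the selection step. This is not how tmpreaper or systemd-tmpfiles are actually implemented; stale_files and its parameters are hypothetical names of mine, and the sketch only lists candidates rather than deleting anything.

```python
import os
import time


def stale_files(root, max_age_days, now=None):
    """Yield paths under `root` whose atime is older than `max_age_days`."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 24 * 60 * 60
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat: do not follow symlinks
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if st.st_atime < cutoff:
                yield path
```

Something like list(stale_files("/tmp", 30)) would give the files under /tmp whose recorded access time is more than 30 days old. With noatime that list grows regardless of actual use, which is exactly why these tools care.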
relatime supports all of these use cases without the constant writes, which is why it's the kernel default. However, I don't think any of these niche use cases justify having atime at all for the vast majority of users. popularity-contest is an opt-in package for Debian and could potentially be rewritten to use inotify. atime can be enabled on a tmpfs without affecting other file systems. I don't think knowing when a file was last accessed would be of much help when trying to free up disk space. Using atime to identify new mail was always fairly sketchy to begin with.[2] Personally, I've gone ahead and disabled atime on my ZFS pool.
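Since the atime policy is set per mount, it can be handy to check what each file system is actually using. Below is a small sketch that classifies mount option strings in the format Linux exposes through /proc/self/mounts. The function names are mine, and treating "neither noatime nor relatime listed" as strict atime is an assumption about how the kernel reports options, so take the third case with a grain of salt.

```python
def atime_mode(mount_options):
    """Classify a comma-separated mount option string by its atime policy."""
    opts = set(mount_options.split(","))
    if "noatime" in opts:
        return "noatime"
    if "relatime" in opts:
        return "relatime"
    # Assumption: neither flag listed means strict (POSIX) atime updates.
    return "strictatime"


def atime_modes(mounts_text):
    """Map mount point -> atime policy for text in /proc/self/mounts format."""
    modes = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # not a well-formed mount entry
        _device, mount_point, _fstype, options = fields[:4]
        modes[mount_point] = atime_mode(options)
    return modes
```

On a Linux box, atime_modes(open("/proc/self/mounts").read()) should give a quick overview of which mounts still pay for atime writes.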
atime lore dump
It appears the once-a-day atime updates under relatime were added so "utilities like tmpreaper (which delete files based on last access time)" would continue working.
"Fixing Races for Fun and Profit: How to abuse atime" uses atime to defeat a probabilistic defense against time-of-check to time-of-use (TOCTOU) attacks (the defense repeats the race many times to make it harder to exploit). The paper notes that other side channels can be used instead of atime.
A StackExchange answer lists only mail clients and tmpwatch (another program that deletes files based on atime) as users. A different answer claims it's used for incoming queues somehow (inotify at home?).
The 2008 paper "Rootkit-Resistant Disks" claims "the number of programs affected by the lack of atime is small". The 2015 BetrFS paper claims "maintaining the correct atime behavior induces a heavy microwrite load, but some applications require accurate atime values", but doesn't give any examples. Linux 2.6.30 (which enabled relatime by default) came out in 2009 and is older than the kernel version they used, so I'm assuming their 'accurate' means relatime.
agedu and dust use atime to identify unused files taking up space. Debian's popularity-contest uses atime to identify packages which are installed but not used.
A few papers (which I've lost the links to) mentioned potential use in digital forensics, but atime is not that useful there without knowing who accessed a file or why.
CVE-2014-5207 mentions "backups and auditing on systems" as uses of atime that could be harmed by turning off atime. A NetworkWorld article also claims some backup tools use atime. These claims basically all seem to originate from Alan Cox's post on LKML, but the only example he gives is hierarchical storage management (HSM) software, which I don't think qualifies as backup. I found various reports that actual backup programs either trample atime or need to explicitly work around it.
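As an illustration of what "working around it" can look like, here is a minimal sketch that reads a file and then puts the timestamps back. I'm not claiming any particular backup tool does it this way; read_preserving_atime is a hypothetical helper of mine.

```python
import os


def read_preserving_atime(path):
    """Read `path`, then restore its atime and mtime to their prior values."""
    before = os.stat(path)
    with open(path, "rb") as f:
        data = f.read()  # this read may bump atime, depending on mount options
    # Restore both timestamps with nanosecond precision.
    # Note: setting timestamps is a status change, so ctime still moves.
    os.utime(path, ns=(before.st_atime_ns, before.st_mtime_ns))
    return data
```

The catch is visible in the comment: undoing the atime update costs yet another metadata write and disturbs ctime, and anyone else reading the file between the stat and the utime gets their access erased. On Linux, opening the file with the O_NOATIME flag avoids the update in the first place, but it is only permitted for the file's owner or a privileged process.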
A few papers noted other concerns. The 1991 paper "File system measurements and their application to the design of efficient operation logging algorithms" notes that atime presents a challenge to fault-tolerant techniques since it turns reads into state updates, and proposes changing Unix semantics to update atime on close. The YAFFS2 file system doesn't store atime on NAND storage because of wear concerns. "Powering Down" (2007) notes that atime can trigger writes even if a read comes from cache, potentially preventing hard disks from spinning down. The 2005 paper "Comparison-based File Server Verification" found that while Linux was updating atime every time a symlink was followed, FreeBSD was only updating it when the symlink was accessed directly.
DragonFly BSD disables atime by default. Wikipedia's stat() article used to contain a "Criticism of atime" section, but it was sadly commented out. "The case for (no) atime on Linux" argues that the write pattern from atime is particularly bad for CoW file systems. "Btrfs requires noatime" makes a similar argument.
Nix used to support atime in the garbage collector, but it was removed because it was "not very useful in practice". paccache has an option to remove packages from cache based on atime. Proxmox Backup Server uses atime as part of a backup garbage collection system.
[1] "File types include regular file, character special file, block special file, FIFO special file, symbolic link, socket, and directory."
[2] Mutt's documentation mentions that "utilities like biff or frm or any other program which accesses the mailbox might cause Mutt to never detect new mail for that mailbox if they do not properly reset the access time". Also, according to the documentation, atime is used by Mutt for the Mbox and Mmdf formats, but not Maildir.