ben.enterprises

Hash-Based Incremental Builds with Ninja

Published:

I've moved this website from being written purely in Java to a mix of Java, Python, and Ninja. Using Ninja makes rebuilds faster, since the expensive steps (currently Open Graph image generation and font processing) only run when their inputs change. I could of course write caching for these steps in Java, but Ninja also gives me parallelism for free (plus, the font processing code was already in Ninja + Python anyway, since it was originally written for a different codebase).
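For context, Ninja decides whether a step is out of date by comparing timestamps, not contents. Below is a simplified Python model of that rule, not Ninja's actual implementation: an output counts as dirty if it is missing or older than its newest input. Real Ninja also reruns a step when its command line changes and tracks extra dependencies from depfiles. The `is_dirty` helper and the file names are hypothetical.

```python
import os
import tempfile


def is_dirty(inputs, output):
    """Simplified mtime rule (a model, not Ninja's real code): rebuild if
    the output is missing or any input is newer than it."""
    if not os.path.exists(output):
        return True
    out_mtime = os.stat(output).st_mtime_ns
    return any(os.stat(i).st_mtime_ns > out_mtime for i in inputs)


# Demo with explicit timestamps, so the result doesn't depend on how
# quickly the files were written or on filesystem timestamp resolution.
with tempfile.TemporaryDirectory() as d:
    page = os.path.join(d, "post.html")   # hypothetical input page
    image = os.path.join(d, "post.png")   # hypothetical generated image
    for p in (page, image):
        with open(p, "w") as f:
            f.write("x")
    os.utime(page, (100, 100))   # input last modified at t=100
    os.utime(image, (200, 200))  # output generated at t=200
    print(is_dirty([page], image))  # False: output is newer than its input
    os.utime(page, (300, 300))   # input touched again, even with same bytes
    print(is_dirty([page], image))  # True: the image would be regenerated
```

Because only timestamps are compared, touching an input (even without changing its bytes) is enough to make everything downstream of it rebuild.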

One challenge is that the Java code currently outputs all of the blog pages at once, so any change to the Java code means re-outputting every blog page. When that happens, the files get a newer mtime (modification time), so Ninja rebuilds their fonts and images even if the posts themselves haven't changed. To solve this, Ninja would need to consider a file's content instead of its mtime. Unfortunately, Ninja doesn't support this and probably won't any time soon. As a workaround, I wrote a quick Python function that saves file mtimes and restores them if the file contents haven't changed.
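To make the problem concrete, here's a minimal sketch showing that re-outputting a page with byte-identical content bumps its mtime while its BLAKE2b content hash stays the same. The mtime is what Ninja sees; the hash is what a content-based check would use. The `post.html` file and the `blake2b_of` helper are hypothetical, and the helper uses `hashlib.blake2b` directly, which gives the same digest as `hashlib.file_digest(f, "blake2b")` but also works before Python 3.11.

```python
import hashlib
import os
import tempfile


def blake2b_of(path):
    # Hex BLAKE2b digest of the file's bytes (default 64-byte digest).
    with open(path, "rb") as f:
        return hashlib.blake2b(f.read()).hexdigest()


with tempfile.TemporaryDirectory() as d:
    page = os.path.join(d, "post.html")  # hypothetical blog page
    with open(page, "w") as f:
        f.write("<p>Hello</p>")
    # Pretend the page was generated long ago.
    os.utime(page, (100, 100))
    old_mtime, old_hash = os.stat(page).st_mtime_ns, blake2b_of(page)

    # The generator re-outputs the page with byte-identical content.
    with open(page, "w") as f:
        f.write("<p>Hello</p>")
    new_mtime, new_hash = os.stat(page).st_mtime_ns, blake2b_of(page)

    print(new_mtime > old_mtime)  # True: Ninja would see a "changed" input
    print(new_hash == old_hash)   # True: the content is actually the same
```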

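```python
import os
import os.path as op
import hashlib
import json
from tempfile import NamedTemporaryFile


def restore_mtime(cache_f, path):
    # Load the previous run's {path: (hash, mtime_ns)} cache, if any.
    cache = {}
    if op.isfile(cache_f):
        with open(cache_f, 'r') as f:
            cache = json.load(f)

    ncache = {}
    for root, dirs, files in os.walk(path):
        for filename in files:
            p = op.join(root, filename)
            # Hash the file contents (hashlib.file_digest needs Python 3.11+).
            with open(p, "rb") as f:
                h = hashlib.file_digest(f, "blake2b").hexdigest()
            # If the contents are unchanged since the last run, put back the
            # old mtime so Ninja treats the file as up to date.
            if p in cache:
                ch, ct = cache[p]
                if ch == h:
                    os.utime(p, ns=(ct, ct))
            # Record the current hash and (possibly restored) mtime.
            t = os.stat(p).st_mtime_ns
            ncache[p] = (h, t)

    # Write the new cache to a temp file in the same directory, then swap it
    # into place so a crash never leaves a half-written cache behind.
    tf = NamedTemporaryFile(dir=op.dirname(op.abspath(cache_f)), delete=False,
                            suffix=".tmp.mtime.json", mode='w')
    json.dump(ncache, tf)
    tf.close()
    # os.replace (unlike os.rename) also overwrites an existing file on Windows.
    os.replace(tf.name, cache_f)
```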