Cache: Support Copy on Write (cp --reflink) #12071
Description
Desired Behavior
There are currently two cache storage modes: Hardlink and Copy. Hard links do not work across mount points, so Copy is the only option when the cache sits on a separate volume. Copy unconditionally creates a full copy of each file, so dune always ends up duplicating the entire content of _build.
File systems such as btrfs (and possibly recent versions of zfs and xfs) support copy on write across subvolumes and mount points of the same file system. If the Copy storage mode were implemented through the appropriate APIs that perform copy on write where possible and fall back to a regular copy otherwise, dune could transparently save precious space without impacting user experience at all. In fact, it would be much faster than actually copying the file contents. This solution would be my preferred way to support copy on write file systems.
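To illustrate the "copy on write if possible" behavior, here is a minimal sketch in Python rather than dune's actual OCaml code. It assumes Linux, where the `FICLONE` ioctl asks the file system to share extents between two files. The function name `copy_reflink_auto` is hypothetical and not part of dune.

```python
import fcntl
import shutil

# Linux ioctl request number for FICLONE, as defined in <linux/fs.h>.
# Assumption: this sketch targets Linux only.
FICLONE = 0x40049409


def copy_reflink_auto(src, dst):
    """Clone src to dst via reflink if the file system allows it.

    Falls back to a plain byte copy otherwise.
    Returns True if the file was reflinked, False if it was copied.
    """
    with open(src, "rb") as fsrc, open(dst, "wb") as fdst:
        try:
            # Ask the kernel to make dst share src's data blocks.
            fcntl.ioctl(fdst.fileno(), FICLONE, fsrc.fileno())
            return True
        except OSError:
            # Not supported here (different file system, tmpfs, ...):
            # dst is still empty, so do an ordinary copy instead.
            shutil.copyfileobj(fsrc, fdst)
            return False
```

This mirrors what `cp --reflink=auto` does from the command line: the caller gets a correct copy either way and only the disk usage and speed differ.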
Alternatively (and perhaps additionally), dune could offer a third storage mode, perhaps called CopyOnWrite or Reflink, that unconditionally uses copy on write and fails if it is not available. This option is probably less useful, since people can build dune projects anywhere on their systems, even on removable media and RAM disks that are unrelated to the file system on which the cache resides. Always demanding copy on write will likely end up being a footgun in cases like these.
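For comparison, a strict variant could look like the following sketch, again in Python and assuming Linux's `FICLONE` ioctl. The names `copy_reflink_strict` and `ReflinkUnavailable` are hypothetical. It behaves like `cp --reflink=always`: it never falls back to a full copy.

```python
import fcntl
import os

# Linux ioctl request number for FICLONE, as defined in <linux/fs.h>.
# Assumption: this sketch targets Linux only.
FICLONE = 0x40049409


class ReflinkUnavailable(Exception):
    """Raised when the file system refuses to clone the file."""


def copy_reflink_strict(src, dst):
    """Clone src to dst via reflink, or fail without copying any data."""
    error = None
    with open(src, "rb") as fsrc, open(dst, "wb") as fdst:
        try:
            fcntl.ioctl(fdst.fileno(), FICLONE, fsrc.fileno())
        except OSError as exc:
            error = exc
    if error is not None:
        # Do not leave an empty destination file behind.
        os.unlink(dst)
        raise ReflinkUnavailable(f"cannot reflink {src} to {dst}") from error
```

A build directory on a RAM disk or removable drive would hit the error path every time, which is the footgun described above.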
Example
My own system is set up in such a way that I exclude ~/.cache/dune from my automatic file system snapshots. For this, the directory has to be its own subvolume (btrfs terminology; think "partition") and, thus, its own mount point. This forces me to use the Copy mode for dune's cache. The result is that after almost 2 weeks on this new system I have over 20GB of duplicate files between ~/.cache/dune and my various _build folders. (Reported by fclones, which unfortunately cannot yet deduplicate read-only files.)
To replicate this, any dune setup that uses the Copy mode should be sufficient. Then build a project and either measure disk space, if your file system reports that accurately, or use a tool such as fclones to find duplicates between dune's cache and the project's _build.
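If fclones is not at hand, a rough estimate can be obtained with a short script. The sketch below is a hypothetical helper, not part of dune or fclones. It hashes every regular file in the cache and adds up the sizes of the files under _build whose content also appears there.

```python
import hashlib
import os


def iter_files(root):
    """Yield paths of regular files (not symlinks) below root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                yield path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def duplicated_bytes(cache_dir, build_dir):
    """Sum the sizes of files in build_dir whose content is in cache_dir."""
    cached = {file_digest(p) for p in iter_files(cache_dir)}
    total = 0
    for path in iter_files(build_dir):
        if file_digest(path) in cached:
            total += os.path.getsize(path)
    return total
```

It could be called as `duplicated_bytes(os.path.expanduser("~/.cache/dune"), "_build")`. Note that it compares content only, so reflinked or hard-linked copies that share storage would still be counted as duplicates.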