
Cache: Support Copy on Write (cp --reflink) #12071

@Janno

Desired Behavior

There are currently two cache storage modes: Hardlink and Copy. Hardlink does not work across mount points, so Copy is the only option when the cache sits on a separate volume. Copy unconditionally creates a full copy of each file, so dune always ends up duplicating the entire content of _build.

File systems such as btrfs (and possibly recent versions of zfs and xfs) support copy on write across subvolumes/partitions/mount points. If the Copy storage mode were implemented through the appropriate APIs that perform copy on write where possible, dune could transparently save precious space without impacting user experience at all. In fact, it would be much faster than actually copying the file contents. This solution would be my preferred way to support copy-on-write file systems.
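To illustrate the "copy on write if possible" behavior, here is a minimal Python sketch, not dune's implementation. It assumes Linux and uses the FICLONE ioctl, which is the mechanism behind cp --reflink. The function name copy_with_reflink is made up for this example, and the hard-coded request number is the value on common architectures such as x86-64 and arm64; it may differ elsewhere.

```python
import fcntl
import shutil

# FICLONE from <linux/fs.h>, i.e. _IOW(0x94, 9, int).
# Assumption: Linux on a common architecture (x86-64, arm64).
FICLONE = 0x40049409


def copy_with_reflink(src: str, dst: str) -> bool:
    """Copy src to dst, sharing data blocks when the file system allows it.

    Returns True if the copy is a reflink, False if it fell back to a
    regular byte-for-byte copy.
    """
    with open(src, "rb") as fsrc, open(dst, "wb") as fdst:
        try:
            # Ask the kernel to make dst share all of src's extents.
            fcntl.ioctl(fdst.fileno(), FICLONE, fsrc.fileno())
            cloned = True
        except OSError:
            # Reflink not available here (unsupported file system,
            # different file systems, ...). Fall back to a plain copy;
            # genuine I/O errors will surface from that copy instead.
            cloned = False
        if not cloned:
            shutil.copyfileobj(fsrc, fdst)
    # Carry over the permission bits, as a regular copy would.
    shutil.copymode(src, dst)
    return cloned
```

The caller does not need to know which path was taken: on btrfs the call returns almost immediately and uses no extra space, while on a file system without reflink support it behaves like today's Copy mode.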

Alternatively (and perhaps additionally), dune could offer a third storage mode, perhaps called CopyOnWrite or Reflink, that unconditionally uses copy on write and fails if it is not available. This option is probably less useful since people can build dune projects anywhere on their systems, even on removable media and RAM disks that are not related to the file system on which the cache resides. Always demanding copy on write will likely end up being a footgun in cases like these.

Example

My own system is set up in such a way that I exclude ~/.cache/dune from my automatic file system snapshots. For this, the directory has to be its own subvolume (btrfs terminology; think "partition") and, thus, its own mount point. This forces me to use the Copy mode for dune's cache. The result is that after almost 2 weeks on this new system I have over 20GB of duplicate files between ~/.cache/dune and my various _build folders. (Reported by fclones, which unfortunately cannot yet deduplicate read-only files.)

To replicate this, any dune setup that uses the Copy mode should be sufficient. Then build a project and either measure disk space, if your file system reports that accurately, or use a tool such as fclones to find duplicates between dune's cache and the project's _build.
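If fclones is not at hand, the duplicate measurement can be roughly approximated with a short Python sketch. The helper names below are made up for this example. It compares file contents by SHA-256 and sums the sizes of files under the build directory whose content also exists in the cache. It only compares contents, so hardlinked or reflinked files would be counted as well; the number is therefore only meaningful for the Copy mode on a setup without block sharing.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def regular_files(root):
    """Yield every regular, non-symlink file below root."""
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                yield path


def duplicated_bytes(cache_root, build_root):
    """Total size of files in build_root whose content is also in cache_root."""
    cache_digests = {file_digest(p) for p in regular_files(cache_root)}
    return sum(
        os.path.getsize(p)
        for p in regular_files(build_root)
        if file_digest(p) in cache_digests
    )
```

For the setup described above, a call might look like `duplicated_bytes(os.path.expanduser("~/.cache/dune"), "_build")`, run from the project root.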

Metadata

Labels

feature-request (User wanted features), proposal (RFCs that are awaiting discussion to be accepted or rejected), shared-cache (Shared artefacts cache)
