state, prune: locally delete old commitment history snapshot files #21199

@JkLondon

Description

Motivation

Users running with --prune.include-commitment-history want to bound the on-disk size of commitment-history snapshots without resyncing. The download-side half (skipping old commitment-history files at sync time) is tracked in a companion issue and is straightforward. This issue covers the harder half: physically deleting commitment-history .kv and inverted-index files that are already on disk.

Why this is hard

Commitment-history files are not block/receipt snapshots — they are managed by the state aggregator (db/state/aggregator.go) and have a reader-owned lifecycle:

  • .kv files are mmaped and may be held open by live aggregator readers (RPC, txpool, sync stages).
  • Files transition through dirtyFiles → frozen visible files; the canDelete flag is the only safe way to retire them.
  • Physical os.Remove must happen on the reader-cleanup path once the last reference drops, otherwise live mmaps are invalidated and readers crash/produce garbage.
  • Existing aggregator code opts most domains out of frozen-file deletion; commitment history needs an explicit opt-in (e.g. a deleteFrozen flag on the domain config).
  • The prune cycle is staged-sync–driven, so a new pruneCommitmentHistorySnapshots step has to coordinate with the aggregator rather than touch files directly.
  • Restart behaviour matters: when the retention is tightened between runs, the next prune cycle has to clean up the files that are now out of range; widening it must be rejected (or trigger a resync) because the previously pruned files no longer exist.
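
To make the lifecycle constraint above concrete, here is a minimal Go sketch of deferred deletion: a file marked canDelete is only unlinked once the last reader releases it. This is not the aggregator's actual code. The trackedFile type, its methods, and runLifecycleDemo are hypothetical names invented for illustration, and a plain mutex stands in for whatever synchronisation the aggregator really uses.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
	"sync"
)

// trackedFile is a hypothetical stand-in for one aggregator-managed file.
type trackedFile struct {
	mu        sync.Mutex
	path      string
	readers   int  // live readers currently pinning the file
	canDelete bool // set once the file falls behind the retention boundary
	removed   bool // true after the file has been unlinked
}

// acquire pins the file for a reader; retired files cannot be pinned again.
func (f *trackedFile) acquire() bool {
	f.mu.Lock()
	defer f.mu.Unlock()
	if f.canDelete {
		return false
	}
	f.readers++
	return true
}

// release drops one pin and unlinks the file if it was the last one.
func (f *trackedFile) release() error {
	f.mu.Lock()
	defer f.mu.Unlock()
	f.readers--
	return f.removeIfUnusedLocked()
}

// markCanDelete retires the file; removal is deferred while readers remain.
func (f *trackedFile) markCanDelete() error {
	f.mu.Lock()
	defer f.mu.Unlock()
	f.canDelete = true
	return f.removeIfUnusedLocked()
}

// removeIfUnusedLocked must be called with f.mu held.
func (f *trackedFile) removeIfUnusedLocked() error {
	if !f.canDelete || f.readers > 0 || f.removed {
		return nil
	}
	if err := os.Remove(f.path); err != nil && !errors.Is(err, os.ErrNotExist) {
		return err
	}
	f.removed = true
	return nil
}

// exists reports whether a file is still present on disk.
func exists(path string) bool {
	_, err := os.Stat(path)
	return err == nil
}

// runLifecycleDemo reports whether the file survives while pinned and
// whether it is still present after the last reader releases it.
func runLifecycleDemo() (pinned, afterRelease bool, err error) {
	dir, err := os.MkdirTemp("", "commitment-history")
	if err != nil {
		return false, false, err
	}
	defer os.RemoveAll(dir)

	f := &trackedFile{path: filepath.Join(dir, "example.kv")}
	if err = os.WriteFile(f.path, []byte("data"), 0o644); err != nil {
		return false, false, err
	}
	if !f.acquire() {
		return false, false, errors.New("could not pin a live file")
	}
	if err = f.markCanDelete(); err != nil {
		return false, false, err
	}
	pinned = exists(f.path) // still on disk: a reader holds it
	if err = f.release(); err != nil {
		return false, false, err
	}
	return pinned, exists(f.path), nil
}

func main() {
	pinned, afterRelease, err := runLifecycleDemo()
	fmt.Println(pinned, afterRelease, err)
}
```

In this sketch the prune step only ever calls markCanDelete; os.Remove lives on the release path, which is the division of labour the bullets above ask for.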

Proposed scope

  • Add an aggregator path that marks commitment-history files older than the retention boundary as canDelete, removes them from dirtyFiles, and lets the reader-cleanup path physically remove them.
  • New staged-sync step (e.g. pruneCommitmentHistorySnapshots) that runs alongside block/receipt pruning but defers to the aggregator's file lifecycle.
  • Idempotent: repeated invocations on an already-pruned datadir must no-op.
  • Tests: downloader delete notification, no-op on repeat prune, immediate cleanup of unpinned dirty files, deferred cleanup while frozen files have active readers.
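
The selection and idempotency requirements can be sketched roughly as follows. This is an illustration under assumptions, not the proposed implementation: fileRange, retentionBoundary, selectPrunable and validateRetention are hypothetical names, retention is modelled as "keep the last N units behind the head", and the unit (steps, blocks, or transactions) is deliberately left abstract.

```go
package main

import (
	"errors"
	"fmt"
)

// fileRange is a hypothetical descriptor for a file covering [from, to).
type fileRange struct{ from, to uint64 }

// errRetentionWidened signals that the requested retention would need
// history that an earlier prune already deleted.
var errRetentionWidened = errors.New("retention widened: pruned files no longer exist")

// retentionBoundary returns the lowest position that must be kept.
func retentionBoundary(head, keep uint64) uint64 {
	if keep >= head {
		return 0
	}
	return head - keep
}

// selectPrunable splits files into those entirely behind the boundary
// and those that must stay. It never mutates its input.
func selectPrunable(files []fileRange, boundary uint64) (prune, keep []fileRange) {
	for _, f := range files {
		if f.to <= boundary {
			prune = append(prune, f)
		} else {
			keep = append(keep, f)
		}
	}
	return prune, keep
}

// validateRetention compares the persisted retention of the previous run
// with the new one. Tightening (a smaller value) is allowed, widening is not.
// Call it only when a previous value was actually persisted.
func validateRetention(prevKeep, nextKeep uint64) error {
	if nextKeep > prevKeep {
		return errRetentionWidened
	}
	return nil
}

func main() {
	files := []fileRange{{0, 10}, {10, 20}, {20, 30}, {30, 40}}
	head := uint64(40)

	// First pass with keep=15: everything ending at or before 25 is prunable.
	prune, rest := selectPrunable(files, retentionBoundary(head, 15))
	fmt.Println("first pass prunes:", prune)

	// Second pass on the survivors with the same settings finds nothing.
	again, _ := selectPrunable(rest, retentionBoundary(head, 15))
	fmt.Println("repeat pass prunes:", len(again))

	// Tightening to keep=10 is accepted and picks up the leftover file.
	fmt.Println("tighten:", validateRetention(15, 10))
	leftover, _ := selectPrunable(rest, retentionBoundary(head, 10))
	fmt.Println("after tightening prunes:", leftover)

	// Widening back to keep=15 is rejected.
	fmt.Println("widen:", validateRetention(10, 15))
}
```

Because selection is a pure function of the file list and the boundary, running it again on an already-pruned datadir yields an empty prune set, which is the no-op behaviour the scope asks for. In a real step the prune set would be handed to the aggregator to mark as canDelete rather than removed directly.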

Dependencies / ordering

Implementing this without the download-side filter is fine in principle, but the user value is much higher once they're combined — the download filter prevents the disk hit on fresh syncs; this issue handles the existing-datadir case.

Prior art

PR #21021 attempted this together with the download filter. The pruning portion is the most complex and risk-sensitive part of that change. Reviving it should probably happen after the download-filter half lands, so the moving parts can be evaluated independently.
