Skip to content

db/state: restore empty tombstones on unwind and honor them in getLatestFromDb#20627

Merged
yperbasis merged 1 commit into
mainfrom
fix/domain-stale-read-after-unwind
Apr 18, 2026
Merged

db/state: restore empty tombstones on unwind and honor them in getLatestFromDb#20627
yperbasis merged 1 commit into
mainfrom
fix/domain-stale-read-after-unwind

Conversation

@mh0lt

@mh0lt mh0lt commented Apr 17, 2026

Copy link
Copy Markdown
Contributor

Summary

Re-land the narrow core of #20483 (reverted by #20509), addressing the DB-layer siblings of the post-unwind stale-read bug. Complements #20625 which addressed the in-memory overlay side.

Two linked bugs in the DB-level domain unwind and read paths caused stale data resurrection after an unwind that reverted a first-time write or a deletion inside the unwound range.

Symptom observed on mainnet

Post-Fusaka mainnet catch-up sync with a chaindata/ wipe (snapshots/ retained). On the first re-executed block after a forkchoice-driven unwind, execution returned less gas than the header — diffs observed of -34200 (block 24,898,955), -73829 (block 24,899,403), -118872 (block 24,899,594). The diffs break down into multiples of SSTORE_SET - SSTORE_RESET = 17100 plus cold-access flips of 2600.

Previous PR #20625 cleared the first two by pruning the in-memory overlay on Unwind. Block 24,899,594 still failed because the overlay was already flushed to DB at Unwind time — the stale-read path now was purely DB-layer, addressed here.

Fixes

1. unwind() must restore empty tombstones — db/state/domain.go:1317 (both DupSort and LargeValues paths)

DomainEntryDiff.Value has three shapes, documented in db/kv/helpers.go:247:

  • nil — "different step": prev value lives at another step, skip restore (legacy V0 changeset shape)
  • []byte{} — "no previous value": key was absent before this step; write an empty tombstone so the key appears absent again after the unwind completes
  • non-empty — restore the actual previous value

The old guard if len(value) > 0 skipped both nil and []byte{}, leaving no tombstone after unwinding a first-time write. Corrected to if value != nil.

2. getLatestFromDb must treat empty values as authoritative — db/state/domain.go:1665

Empty-value entries are deletion tombstones. The step-age guard previously discarded them when their step fell within the frozen file range, causing the caller to fall through to getLatestFromFiles. Frozen files encode deletions as absence, so the file returns the pre-deletion value — the exact resurrection the deletion was meant to prevent. Empty entries are now returned as found=true regardless of step age; the step-age guard still applies to non-empty entries.

Relationship to #20483 / #20509

This is a deliberate re-land of the narrow core of #20483. Key differences:

Tests

  • TestDomain_UnwindRestoresDeletionMarker (DupSort + LargeValues subtests) — writes a key, deletes it, re-writes within the same step, builds files, then unwinds the re-write. Fails on pre-fix code (getLatest returns the stale post-unwind value2 from the frozen file); passes with the fix. Exercises the write-side bug directly.

  • TestDomain_DeletedKeyNotResurrectedByFiles (DupSort + LargeValues subtests) — documents the read-side contract by writing a key and deleting it at a step that falls within file range. Passes on current main even without the fix (the file-build + prune semantics evolved since db/state: fix stale slot resurrection after unwind and deletion #20483 and no longer hit the specific stale-read in this exact test scenario), but retained as a forward regression guard and as documentation of the invariant.

Test plan

  • go test -short ./db/state/... — all pass
  • make lint — 0 issues
  • make erigon — builds clean
  • Manual repro of the production symptom (mainnet sync from snapshots-only) in combination with db/state: prune TemporalMemBatch overlay entries past unwindToTxNum #20625 — sync progresses past the catch-up / first-forkchoice-unwind window without a gas mismatch. (Re-verification run in progress alongside this PR.)

🤖 Generated with Claude Code

@sudeepdino008

Copy link
Copy Markdown
Member

Frozen files encode deletions as absence, so the file returns the pre-deletion value — the exact resurrection the deletion was meant to prevent

files do encode empty byte as marker for deletion. this is different from absence

@yperbasis

Copy link
Copy Markdown
Member

Fix #1 relies on the nil vs []byte{} distinction in DomainEntryDiff.Value:

  • nil → skip Put (different-step skip, legacy V0)
  • []byte{} → Put empty tombstone
  • non-empty → Put the actual value

Both empty cases have len() == 0, so the new if value != nil guard is the only place in this path where the distinction is load-bearing. Nothing upstream enforces it — DomainDiff.DomainUpdate at db/kv/helpers.go:288 even rewrites nil[]byte{} in the creation path, and a casual refactor that collapses empty slices to nil (a common style lint) would silently regress this.

TestDomain_UnwindRestoresDeletionMarker currently covers the []byte{} case. Worth adding a sibling subtest that explicitly uses Value: nil and asserts no entry is written at unwindStep:

diffs := []kv.DomainEntryDiff{{Key: string(k) + string(stepBytes), Value: nil}}
require.NoError(dt.unwind(ctx, tx, step, txNumUnwindTo, currentFilesEndStep, diffs))
// Expect: no (k, unwindStep, ...) entry — nil means "different step, skip"

That pins the asymmetry into the test suite so a future refactor can't silently flip the semantics back.

Fix the DB-level unwind path so first-time-write reversals leave a proper
empty-value tombstone in the domain values table, rather than deleting the
current-step entry and silently falling back to files (which have no
concept of deletions and therefore return stale pre-unwind data).

`DomainEntryDiff.Value` carries three distinct shapes that must be handled
differently during unwind (per the comment in db/kv/helpers.go):

- nil         — "different step": prev value lives at another step, skip
                  restore (only produced by legacy V0 changesets where
                  valueLen==0 deserializes as nil)
- []byte{}    — "no previous value": key was absent before this step; write
                  an empty tombstone at the unwind step so subsequent reads
                  see the key as absent again
- non-empty   — restore the actual previous value

The old guard `if len(value) > 0` conflated `nil` and `[]byte{}` and
skipped both, leaving no tombstone after unwinding a first-time write.
Corrected to `if value != nil`, matching the diff's documented contract.

The written tombstone lands at max(step, currentFilesEndStep), which by
construction is beyond the frozen file range, so the existing step-age
guard in getLatestFromDb already accepts it without needing to change the
read path.

This is a deliberately narrow re-land of #20483 (reverted by #20509). The
PR's other two pieces — the `len(v)==0` empty-authority bypass in
getLatestFromDb and the LargeValues cross-check in getLatest — are NOT
included here. Regressions surfaced in kurtosis/assertoor_glamsterdam
(which runs with --experimental.bal and the storagerefundtx spammer,
generating a deluge of tombstones) suggest the empty-authority bypass
has downstream interactions with the BAL / post-Amsterdam paths that
need separate narrowing. The interrupted-prune LargeValues case is the
original regression suspect from #20509 and remains out-of-scope.

Tests in db/state/domain_test.go:

- TestDomain_UnwindRestoresDeletionMarker (DupSort + LargeValues) — write
  a key, delete it, re-write within the same step, build files, then unwind
  the re-write. Fails on pre-fix code (getLatest returns the stale post-
  unwind value from the frozen file); passes with this fix. Directly
  exercises the write-side bug.

- TestDomain_DeletedKeyNotResurrectedByFiles (DupSort + LargeValues) —
  retained as a forward regression guard for the read-side contract,
  even though it passes on both pre-fix and post-fix code on current main
  (file-build + prune semantics have evolved since #20483 such that the
  original scenario no longer triggers the read-side bug on its own).

Tested: go test -short ./db/state/... all pass; make lint 0 issues;
make erigon builds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@mh0lt mh0lt force-pushed the fix/domain-stale-read-after-unwind branch from 760387c to c447697 Compare April 17, 2026 17:37
@mh0lt

mh0lt commented Apr 17, 2026

Copy link
Copy Markdown
Contributor Author

Force-pushed a narrowed version of this PR to address the recurring kurtosis/assertoor_glamsterdam CI failure.

Summary of the narrowing:

  • Kept: the unwind() write-side fix that restores empty tombstones for first-time writes (fix Pull from go-ethereum up to 26aea73 (7 Feb 2019) #1 in the original).
  • Dropped: the getLatestFromDb empty-value authority bypass (the if len(v) == 0 { return ...true } at line ~1665 of domain.go).

Why:

The glamsterdam suite is the one kurtosis configuration that runs with --experimental.bal plus the storagerefundtx + setcodetx spammers — a workload that generates a continuous deluge of empty-value tombstones in the storage domain. With the empty-authority bypass active, many more reads return (ok=true, v=[]byte{}) where they previously returned (ok=false, v=nil); some downstream consumer in the BAL / post-Amsterdam codepath is not robust to this semantic flip and the test hangs (runtime jumped from ~12 min on PR #20624/#20625 to 1h07m / 1h38m on two consecutive runs of this PR's previous version).

The unwind fix alone is sufficient for the originally-reported bug (stale reads after unwind on post-Fusaka mainnet), because the tombstone is written at max(step, currentFilesEndStep) — already beyond the file range, so the existing step-age guard accepts it without any read-path change.

TestDomain_UnwindRestoresDeletionMarker still fails on pre-fix code and passes with this narrower fix, so the regression guard is preserved.

Out of scope, tracked for follow-up:

CI will re-run on the force-push — expecting glamsterdam to return to ~12min passing.

@mh0lt

mh0lt commented Apr 17, 2026

Copy link
Copy Markdown
Contributor Author

Confirmed — the narrowing works.

Local reproduction (kurtosis 1.18.1, ethereum-package 6.1.0, same glamsterdam.io config as CI):

[18:01:12] block=44 (Δ=44) stability=success proposal=success
GLAMSTERDAM_PASS

Total wall time from enclave RUNNING to both assertoor tests green: ~5 min (local hardware is more generous than the GH runner).

CI also green on the amended commit: kurtosis / assertoor_glamsterdam_test: pass (12m54s) — back to the normal 12-min envelope (versus 1h07m / 1h38m on the previous version of this PR).

So the getLatestFromDb empty-authority bypass (fix #2 in the original #20483) was indeed the trigger. Glamsterdam's storagerefundtx spammer generates thousands of empty-value storage tombstones per block via DomainDel (rw_v3.go:805). With the bypass active, every subsequent read of those slots returned (ok=true, v=[]byte{}) where previously it returned (ok=false, v=nil). Some consumer in the BAL path under --experimental.bal isn't robust to that semantic flip. The narrowing keeps only the write-side tombstone-on-unwind, which doesn't change any read semantics and is sufficient for the original mainnet stale-read bug since tombstones are written at max(step, currentFilesEndStep) — already beyond the file range.

Follow-up work captured in the commit body:

  • the read-side empty-authority bypass can be re-landed with a narrower gate if a future bug reproduces the original stale-read scenario (needs the bypass), but should exclude the post-Amsterdam / --experimental.bal path
  • the LargeValues interrupted-prune cross-check from the original db/state: fix stale slot resurrection after unwind and deletion #20483 remains deferred

AskAlexSharov added a commit that referenced this pull request Apr 18, 2026
…elease/3.4) (#20636)

Cherry-pick of #20627 to release/3.4.

---------

Co-authored-by: mh0lt <mark.holt@erigon.tech>
@AskAlexSharov AskAlexSharov added this pull request to the merge queue Apr 18, 2026
@AskAlexSharov AskAlexSharov removed this pull request from the merge queue due to a manual request Apr 18, 2026
@AskAlexSharov AskAlexSharov added this pull request to the merge queue Apr 18, 2026
@AskAlexSharov

Copy link
Copy Markdown
Collaborator

@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Apr 18, 2026
@yperbasis yperbasis added this pull request to the merge queue Apr 18, 2026
Merged via the queue into main with commit d593fd3 Apr 18, 2026
36 checks passed
@yperbasis yperbasis deleted the fix/domain-stale-read-after-unwind branch April 18, 2026 08:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants