node/app/event: fix data race on eventBus.prevQueueSize#21551
Conversation
prevQueueSize gates a debug-logging heuristic for execpool overflow, but was read and written without synchronization by concurrent async Publish goroutines. Make it an atomic.Int64. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
This PR removes a data race in node/app/event by making eventBus.prevQueueSize atomic, since it is read/written concurrently by multiple async Publish goroutines as part of a debug logging heuristic.
Changes:
- Change
eventBus.prevQueueSizefrominttoatomic.Int64to make concurrent reads/writes race-free. - Update the queue-size comparison and assignment logic to use
Load()/Store(). - Rely on the zero value of
atomic.Int64inNewEventBus(remove explicit initialization).
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Both comparisons now see one consistent snapshot and it is a single atomic load instead of two. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Giulio2002
left a comment
There was a problem hiding this comment.
LGTM — straightforward data-race fix by making prevQueueSize atomic for execpool overflow logging.
Giulio2002
left a comment
There was a problem hiding this comment.
LGTM — straightforward atomic fix for prevQueueSize, which removes the data race without changing the surrounding behavior.
Giulio2002
left a comment
There was a problem hiding this comment.
LGTM — straightforward atomic.Int64 fix for a clear data race on prevQueueSize.
Giulio2002
left a comment
There was a problem hiding this comment.
LGTM — small, obvious atomic fix for prevQueueSize race.
Giulio2002
left a comment
There was a problem hiding this comment.
LGTM — straightforward data-race fix: prevQueueSize becomes atomic and the follow-up reads one consistent snapshot per publish.
Giulio2002
left a comment
There was a problem hiding this comment.
LGTM — small, obvious atomic fix for prevQueueSize race accounting.
Giulio2002
left a comment
There was a problem hiding this comment.
LGTM — straightforward atomic fix for prevQueueSize data race.
## Summary A sweep of the repo's **unconditionally-disabled tests** — `t.Skip(...)` not gated on `testing.Short()`. This removes **24** such skips, either by fixing the underlying behavior or by converting permanent skips into clean precondition gates (live node, snapshot file, env var, RAM) that run when the resource is present and skip cleanly when it isn't. What's intentionally left skipped is listed below. (Out of scope: `testing.Short()` skips, Windows/race-guarded skips, runtime precondition guards, and commented-out skips.) ## t8n / `cmd/evm` (the meaty part) Un-skipping `TestT8n` surfaced **5 stacked bugs**, all fixed here: 1. `getTransaction` dropped the JSON `v/r/s`, so signed txs got a zero signature → `ErrInvalidSig`. Now preserves V/R/S (new `TestGetTransactionPreservesSignature`). 2. **E3 state plumbing** — `MakePreState` read base state from `tx` while writing to `SharedDomains` (execution saw balance 0), and `CalculateStateRoot` used a fresh `SharedDomains` (→ empty-trie root). Now reads/commits on the same `sd` that executed, then flushes and records the block→txNum so the alloc dumper sees the post-state. 3. `EphemeralExecResult.Difficulty` now serializes as hex (`*math.HexOrDecimal256`), matching the spec. (Only non-test change; the field is marshaled solely by t8n.) 4. A blockhash requested but missing from the env now yields exit code 4 instead of a silent tx rejection. 5. Empty receipts serialize as `null`, matching the reference output. ## Deliberately left skipped - **`node/app/component` gomock tests** (`TestComponentLifecycle`, `TestDomainLifecycle`, `TestAddRemoveDeps`) — left skipped; this PR doesn't touch them. The whole package is disabled by a pre-existing `TestMain` `os.Exit(0)`; re-enabling it exposes a data race (fixed in erigontech#21551) plus a flaky test and a goroutine-leak deadlock, tracked in erigontech#21552. - **HexPatriciaHashed traps** — `Test_ModeUpdate_SiblingConsistency` (erigontech#20961) and `Test_HexPatriciaHashed_StateRestoreAndContinue` (concurrent map write): consensus-critical, left as regression markers. - **Feature-gated** — `execmodule` background-commit and `component/fuzz_test` wait on features not yet built. - **`node.TestRegisterProtocols`** — empty `// TODO` stub for p2p registration that moved to sentry; recommend deleting. - **Likely real bugs, left for owner judgment** — `execution/state.TestCreateOnExistingStorage` (Erigon keeps pre-existing storage on CREATE-over; go-ethereum clears it) and `db/test.Test_AggregatorV3_RestartOnDatadir_{WithoutDB,WithoutAnything}` (restart-from-files-without-DB fails `commitmentdb` freshness). ## Testing `make lint` clean, `make erigon integration` builds, and every touched test passes (resource-gated ones skip cleanly without their resources). Latest `main` is merged in, which brings erigontech#21505's removal of the legacy `headerdownload` / `bodydownload` / `dataflow` packages. 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: info@weblogix.biz <admin@10gbps.weblogix.it>
Summary
eventBus.prevQueueSizegates a debug-logging heuristic (the "Execpool overflowing / recovering" messages), but it was read (eventbus.go:185/192) and written (eventbus.go:201) without synchronization while multiple asyncPublishgoroutines run concurrently. The race detector flags this. It's harmless to correctness (logging only), but a genuine data race — fixed by making the field anatomic.Int64.Context
Surfaced while investigating the disabled
node/app/componenttest package (whoseTestMaindoesos.Exit(0), so none of its tests run). Re-enabling that package under-raceflagged this race first. It's split out here as a focused, independently-reviewable fix.The remaining problems that keep the component package disabled — cross-test event-subscription leakage and ~400 leaked actor goroutines that deadlock the suite — are not addressed here and are tracked in #21552.
Testing
go build ./node/app/...— cleannode/app/eventpassesgo test -race -count=5make lint— clean🤖 Generated with Claude Code