Skip to content

node/app/event: fix data race on eventBus.prevQueueSize#21551

Merged
taratorio merged 2 commits into
mainfrom
yperbasis/eventbus-prevqueuesize-race
Jun 3, 2026
Merged

node/app/event: fix data race on eventBus.prevQueueSize#21551
taratorio merged 2 commits into
mainfrom
yperbasis/eventbus-prevqueuesize-race

Conversation

@yperbasis

@yperbasis yperbasis commented Jun 1, 2026

Copy link
Copy Markdown
Member

Summary

eventBus.prevQueueSize gates a debug-logging heuristic (the "Execpool overflowing / recovering" messages), but it was read (eventbus.go:185/192) and written (eventbus.go:201) without synchronization while multiple async Publish goroutines run concurrently. The race detector flags this. It's harmless to correctness (logging only), but a genuine data race — fixed by making the field an atomic.Int64.

Context

Surfaced while investigating the disabled node/app/component test package (whose TestMain does os.Exit(0), so none of its tests run). Re-enabling that package under -race flagged this race first. It's split out here as a focused, independently-reviewable fix.

The remaining problems that keep the component package disabled — cross-test event-subscription leakage and ~400 leaked actor goroutines that deadlock the suite — are not addressed here and are tracked in #21552.

Testing

  • go build ./node/app/... — clean
  • node/app/event passes go test -race -count=5
  • make lint — clean

🤖 Generated with Claude Code

prevQueueSize gates a debug-logging heuristic for execpool overflow, but was read and written without synchronization by concurrent async Publish goroutines. Make it an atomic.Int64.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR removes a data race in node/app/event by making eventBus.prevQueueSize atomic, since it is read/written concurrently by multiple async Publish goroutines as part of a debug logging heuristic.

Changes:

  • Change eventBus.prevQueueSize from int to atomic.Int64 to make concurrent reads/writes race-free.
  • Update the queue-size comparison and assignment logic to use Load()/Store().
  • Rely on the zero value of atomic.Int64 in NewEventBus (remove explicit initialization).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread node/app/event/eventbus.go Outdated
Both comparisons now see one consistent snapshot and it is a single
atomic load instead of two.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@Giulio2002 Giulio2002 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM — straightforward data-race fix by making prevQueueSize atomic for execpool overflow logging.

@Giulio2002 Giulio2002 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM — straightforward atomic fix for prevQueueSize, which removes the data race without changing the surrounding behavior.

@Giulio2002 Giulio2002 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM — straightforward atomic.Int64 fix for a clear data race on prevQueueSize.

@Giulio2002 Giulio2002 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM — small, obvious atomic fix for prevQueueSize race.

@Giulio2002 Giulio2002 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM — straightforward data-race fix: prevQueueSize becomes atomic and the follow-up reads one consistent snapshot per publish.

@Giulio2002 Giulio2002 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM — small, obvious atomic fix for prevQueueSize race accounting.

@Giulio2002 Giulio2002 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM — straightforward atomic fix for prevQueueSize data race.

@taratorio taratorio added this pull request to the merge queue Jun 3, 2026
Merged via the queue into main with commit e7f75c9 Jun 3, 2026
99 checks passed
@taratorio taratorio deleted the yperbasis/eventbus-prevqueuesize-race branch June 3, 2026 03:19
chris-mercer pushed a commit to white-b0x/erigon that referenced this pull request Jun 3, 2026
## Summary

A sweep of the repo's **unconditionally-disabled tests** — `t.Skip(...)`
not gated on `testing.Short()`. This removes **24** such skips, either
by fixing the underlying behavior or by converting permanent skips into
clean precondition gates (live node, snapshot file, env var, RAM) that
run when the resource is present and skip cleanly when it isn't. What's
intentionally left skipped is listed below.

(Out of scope: `testing.Short()` skips, Windows/race-guarded skips,
runtime precondition guards, and commented-out skips.)

## t8n / `cmd/evm` (the meaty part)

Un-skipping `TestT8n` surfaced **5 stacked bugs**, all fixed here:

1. `getTransaction` dropped the JSON `v/r/s`, so signed txs got a zero
signature → `ErrInvalidSig`. Now preserves V/R/S (new
`TestGetTransactionPreservesSignature`).
2. **E3 state plumbing** — `MakePreState` read base state from `tx`
while writing to `SharedDomains` (execution saw balance 0), and
`CalculateStateRoot` used a fresh `SharedDomains` (→ empty-trie root).
Now reads/commits on the same `sd` that executed, then flushes and
records the block→txNum so the alloc dumper sees the post-state.
3. `EphemeralExecResult.Difficulty` now serializes as hex
(`*math.HexOrDecimal256`), matching the spec. (Only non-test change; the
field is marshaled solely by t8n.)
4. A blockhash requested but missing from the env now yields exit code 4
instead of a silent tx rejection.
5. Empty receipts serialize as `null`, matching the reference output.

## Deliberately left skipped

- **`node/app/component` gomock tests** (`TestComponentLifecycle`,
`TestDomainLifecycle`, `TestAddRemoveDeps`) — left skipped; this PR
doesn't touch them. The whole package is disabled by a pre-existing
`TestMain` `os.Exit(0)`; re-enabling it exposes a data race (fixed in
erigontech#21551) plus a flaky test and a goroutine-leak deadlock, tracked in
erigontech#21552.
- **HexPatriciaHashed traps** — `Test_ModeUpdate_SiblingConsistency`
(erigontech#20961) and `Test_HexPatriciaHashed_StateRestoreAndContinue`
(concurrent map write): consensus-critical, left as regression markers.
- **Feature-gated** — `execmodule` background-commit and
`component/fuzz_test` wait on features not yet built.
- **`node.TestRegisterProtocols`** — empty `// TODO` stub for p2p
registration that moved to sentry; recommend deleting.
- **Likely real bugs, left for owner judgment** —
`execution/state.TestCreateOnExistingStorage` (Erigon keeps pre-existing
storage on CREATE-over; go-ethereum clears it) and
`db/test.Test_AggregatorV3_RestartOnDatadir_{WithoutDB,WithoutAnything}`
(restart-from-files-without-DB fails `commitmentdb` freshness).

## Testing

`make lint` clean, `make erigon integration` builds, and every touched
test passes (resource-gated ones skip cleanly without their resources).
Latest `main` is merged in, which brings erigontech#21505's removal of the legacy
`headerdownload` / `bodydownload` / `dataflow` packages.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: info@weblogix.biz <admin@10gbps.weblogix.it>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants