Skip to content

refactor(beacon-chain): potential race to write callbacks#3

Merged
mattevans merged 4 commits intomasterfrom
refactor/negate-potential-race
Apr 23, 2025
Merged

refactor(beacon-chain): potential race to write callbacks#3
mattevans merged 4 commits intomasterfrom
refactor/negate-potential-race

Conversation

@mattevans
Copy link
Copy Markdown
Member

@mattevans mattevans commented Feb 7, 2025

tldr; running go test with -race is picking-up that this insn't thread-safe.


I'm writing some unrelated e2e tests for contributoor and was getting some unusual failures stemming from this pkg when running tests with -race.

This is not a problem for production right now, given how we typically use this pkg during initialization

Initially I thought this was related to our multi-beacon-node changes in contributoor:

  • Each MetadataService creates its own ethwallclock instance
  • Each instance has its own independent callback slices
  • There's no sharing of callbacks between instances
  • The race detector was catching races within individual instances, not between them. Isn't smart enough to tell the difference.

If you run go test -v -race ./... on master of this repo, you'll see the race's/failures in question.

I'd love to keep using -race across contributoor and other pkgs, so hence this PR.

@mattevans mattevans self-assigned this Feb 7, 2025
@mattevans mattevans marked this pull request as ready for review February 7, 2025 04:44
test(beacon_chain): add tests for concurrent stop and nil wallclock scenarios

Adds a `Stop` method to the `EthereumBeaconChain` to allow for graceful shutdown of the background goroutines. This prevents goroutine leaks when the beacon chain is no longer needed.

Adds tests to ensure that calling `Stop` concurrently with callback registration does not cause a race condition or panic.

Adds a test to specifically address a production issue where calling `OnEpochChanged` on a nil `EthereumBeaconChain` receiver caused a panic. This test simulates the scenario where the wallclock becomes nil and verifies that the expected panic occurs, highlighting the need for nil checks before calling methods on the wallclock.
@mattevans mattevans merged commit b65c869 into master Apr 23, 2025
1 check passed
@mattevans mattevans deleted the refactor/negate-potential-race branch April 23, 2025 05:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants