txpool: fix goroutine leak in Fetch on shutdown #20006
Merged
ConnectCore and ConnectSentries spawn goroutines that are not waited on when TxPool.Run returns. After context cancellation, Run exits via the errgroup but the fetch goroutines keep running: they may still be in a retry sleep or blocking on a stream when the DB and other resources are closed.

Two fixes:
1. Track all goroutines spawned by ConnectCore/ConnectSentries with a WaitGroup. TxPool.Run defers Wait() so it blocks until they exit.
2. Replace bare time.Sleep calls in retry loops with context-aware selects so the goroutines exit promptly on cancellation instead of sleeping through a 3-second backoff.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Giulio2002
approved these changes
Mar 19, 2026
Giulio2002
left a comment
Contributor
LGTM — clean goroutine leak fix: context-aware sleeps + WaitGroup tracking for ConnectCore/ConnectSentries goroutines
Verifies that goroutines spawned by ConnectCore/ConnectSentries exit promptly after context cancellation, rather than sleeping through a 3-second retry backoff. Fails if any context-aware select is reverted to bare time.Sleep. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
- Track all goroutines spawned by ConnectCore/ConnectSentries with a WaitGroup; TxPool.Run defers Wait() so it blocks until they all exit
- Replace bare time.Sleep calls in retry loops with select on ctx.Done() so goroutines exit promptly on cancellation instead of sleeping through a 3-second backoff

These goroutines were previously fire-and-forget: after context cancellation, Run() would return via the errgroup while the fetch goroutines were still in retry sleeps or blocking on streams. Downstream cleanup (DB.Close(), etc.) could then race with them.

Found while investigating flaky TestCaplinBlockProductionWithWithdrawalRequest in #19981.

Test plan
- go test -race ./txnprovider/txpool/ passes
- go test -race -count=3 ./cl/beacon/handler/ -run TestCaplinBlockProductionWithWithdrawalRequest passes without goroutine leak

🤖 Generated with Claude Code