txpool: fix goroutine leak in Fetch on shutdown#20006

Merged
yperbasis merged 5 commits into main from txpool-fetch-goroutine-leak on Mar 20, 2026

@yperbasis

Member

Summary

  • Track goroutines spawned by ConnectCore/ConnectSentries with a WaitGroup; TxPool.Run defers Wait() so it blocks until they all exit
  • Replace bare time.Sleep calls in retry loops with select on ctx.Done() so goroutines exit promptly on cancellation instead of sleeping through a 3-second backoff

These goroutines were previously fire-and-forget: after context cancellation, Run() would return via the errgroup while the fetch goroutines were still in retry sleeps or blocking on streams. Downstream cleanup (DB.Close(), etc.) could then race with them.

Found while investigating flaky TestCaplinBlockProductionWithWithdrawalRequest in #19981.

Test plan

  • go test -race ./txnprovider/txpool/ passes
  • go test -race -count=3 ./cl/beacon/handler/ -run TestCaplinBlockProductionWithWithdrawalRequest passes without goroutine leak

🤖 Generated with Claude Code

ConnectCore and ConnectSentries spawn goroutines that are not waited
on when TxPool.Run returns. After context cancellation, Run exits
via the errgroup but the fetch goroutines keep running — they may
still be in a retry sleep or blocking on a stream when the DB and
other resources are closed.

Two fixes:

1. Track all goroutines spawned by ConnectCore/ConnectSentries with a
   WaitGroup. TxPool.Run defers Wait() so it blocks until they exit.

2. Replace bare time.Sleep calls in retry loops with context-aware
   selects so the goroutines exit promptly on cancellation instead
   of sleeping through a 3-second backoff.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@yperbasis yperbasis requested a review from taratorio as a code owner March 19, 2026 11:10
@yperbasis yperbasis requested a review from mh0lt March 19, 2026 11:28

@Giulio2002 Giulio2002 left a comment

Contributor

LGTM — clean goroutine leak fix: context-aware sleeps + WaitGroup tracking for ConnectCore/ConnectSentries goroutines

info@weblogix.biz and others added 4 commits March 19, 2026 15:03
Verifies that goroutines spawned by ConnectCore/ConnectSentries exit
promptly after context cancellation, rather than sleeping through a
3-second retry backoff. Fails if any context-aware select is reverted
to bare time.Sleep.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@yperbasis yperbasis enabled auto-merge March 20, 2026 12:09
@yperbasis yperbasis added this pull request to the merge queue Mar 20, 2026
Merged via the queue into main with commit 06f4942 Mar 20, 2026
34 checks passed
@yperbasis yperbasis deleted the txpool-fetch-goroutine-leak branch March 20, 2026 13:37
2 participants