-
Notifications
You must be signed in to change notification settings - Fork 6
fix(ci): race conditions in tests #119
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
the test was incorrectly using require.Eventually on already-read results. also, parallelRouter doesn't deduplicate, so peers[0] can appear twice (once with addresses from mr1, once without from mr2). fixed by searching for the record with addresses before verifying filtration.
manyIter.Close() would deadlock when goroutines were blocked on unbuffered channel sends. fixed with synchronous drain that matches http streaming semantics from routing v1 spec: - idempotent close (safe to call multiple times) - drain channel to unblock senders (expected for early termination) - proper cleanup of child iterators also added missing defer it.Close() in tests.
dispatchFindPeer goroutines blocked forever trying to send to unbuffered channel when iterator closed early. fixed with buffered channel + non-blocking sends that drop results if nobody is reading (acceptable for background lookups). also added context cancellation in Close() to stop goroutines.
added defer it.Close() to all test iterators to prevent resource leaks that could cause intermittent hangs.
replaced recursion with loop to avoid stack overflow. maintains original logic: after dispatching lookup for peer without addresses, continues trying source iterator or waiting for lookup results. also fixed deadlock when ongoingLookups reaches 0 between check and channel read by using periodic timeout.
- removed unnecessary handleRecord closure - cache ongoingLookups value to avoid race in logging - use defer timer.Stop() for proper cleanup
- fix iterator leak in dispatchFindPeer by adding defer Close() - remove redundant goroutine in manyIter.Close() that could cause double-close panic. the channel is already closed by the goroutine in newManyIter.
Contributor
|
Triage:
|
gammazero
reviewed
Sep 23, 2025
gammazero
approved these changes
Sep 23, 2025
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
The main branch fails very very often, especially on Windows.
This PR should stabilize CI by removing some races / deadlocks when tests are run with things like
go test -shuffle=on -failfast -race -count=10 ./....I don't think this will have impact on real world queries but maybe there are better ways of dealing with those?
fix: race condition in
TestFindProvidersthe test was incorrectly using require.Eventually on already-read results. also, parallelRouter doesn't deduplicate, so peers[0] can appear twice (once with addresses from mr1, once without from mr2). fixed by searching for the record with addresses before verifying filtration.
fix: deadlock in
manyIter.Close()manyIter.Close() would deadlock when goroutines were blocked on unbuffered channel sends, which occured sometimes on CI:
fixed with synchronous drain that seems to be most compatible with http streaming semantics from routing v1 spec (clients ending as soon they got enough providers, drain channel to unblock senders)
fix: deadlock in
cacheFallbackIterdispatchFindPeer goroutines blocked forever trying to send to unbuffered channel when iterator closed early.
unsure if this is the best way of dealing with this, but fixed with buffered channel + non-blocking sends that drop results if nobody is reading (acceptable for background lookups). also added context cancellation in Close() to correctly stop goroutines.
If we dispatch >100 concurrent lookups and the consumer is slow, we lose results silently. However, this is actually acceptable because these are lookups for peers which had no addresses anyway, and if we hit 100, means CID is extremely popular and dropping some of providers does not matter at that point
fix: infinite recursion in
cacheFallbackIter.Next()replaced recursion with loop to avoid stack overflow.
maintains original logic: after dispatching lookup for peer without addresses, continues trying source iterator or waiting for lookup results.
also fixed deadlock when ongoingLookups reaches 0 between check and channel read by using periodic timeout.