Skip to content

tests: avoid relying on socket reads/writes for synchronization#9033

Merged
arjan-bal merged 2 commits into
grpc:masterfrom
arjan-bal:fix-data-races
Apr 8, 2026
Merged

tests: avoid relying on socket reads/writes for synchronization#9033
arjan-bal merged 2 commits into
grpc:masterfrom
arjan-bal:fix-data-races

Conversation

@arjan-bal

Copy link
Copy Markdown
Contributor

Background

In #9032, we will transition from standard net.Conn.Read methods to syscall UNIX APIs to enable non-memory-pinning reads. Due to this change, the Go race detector beings failing on tests that share state between client and server goroutines without standard synchronization primitives (mutexes, channels, etc.).

Because these tests rely on the network request itself as a memory barrier, they are logically safe but technically racy from the Go runtime's perspective. The standard net.Conn leverages Go's internal network poller, which inadvertently provides the "happens-before" edges the race detector looks for. Dropping down to raw syscalls bypasses this, causing the detector to flag the accesses. (Minimal repro: https://go.dev/play/p/yvEtBmLTOJ2)

Solution

Introduce explicit synchronization to the affected tests. Tests now properly coordinate shared state access between clients and servers without relying on socket I/O timing.

RELEASE NOTES: N/A

@arjan-bal arjan-bal added this to the 1.81 Release milestone Apr 2, 2026
@codecov

codecov Bot commented Apr 2, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 83.09%. Comparing base (c754bf9) to head (aa9e4cd).
⚠️ Report is 10 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #9033      +/-   ##
==========================================
+ Coverage   82.99%   83.09%   +0.09%     
==========================================
  Files         413      413              
  Lines       33054    33269     +215     
==========================================
+ Hits        27433    27644     +211     
- Misses       4210     4211       +1     
- Partials     1411     1414       +3     

see 41 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@arjan-bal arjan-bal requested a review from mbissa April 2, 2026 08:02
@arjan-bal arjan-bal changed the title tests: Avoid relying on socket reads/writes for synchronization tests: avoid relying on socket reads/writes for synchronization Apr 2, 2026
@arjan-bal arjan-bal requested a review from easwars April 2, 2026 12:47

@easwars easwars left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, modulo some minor nits

Comment thread encoding/encoding_test.go
// Create a channel to the above server.
cc, err := grpc.NewClient(backend.Address, grpc.WithTransportCredentials(insecure.NewCredentials()))
if err != nil {
t.Fatalf("Failed to dial test backend at %q: %v", backend.Address, err)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use stubserver.Start() instead? That starts both the server and the client.

Same applies to other occurrences in this test.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to use stubserver.Start(). I tried that initially, however it results in a couple of extra lines of code because StartTestService sets a no-op EmptyCall handler which we need to set manually now.

)

func (s) TestClientSideRetry(t *testing.T) {
serverMu := sync.Mutex{}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Please add a note here that the mutex protects both ctr and errs.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment thread test/retry_test.go Outdated
Comment on lines 84 to 85

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While you are here, could you change this to wantCode and wantCount. This test is quite hard to read without that.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment thread test/retry_test.go Outdated
Comment on lines 151 to 152

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here too. Thanks.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment thread test/end2end_test.go Outdated
Comment on lines +4501 to +4503
// mu guards access to the unaryCall field.
mu sync.Mutex
unaryCall func(ctx context.Context, in *testpb.SimpleRequest) (*testpb.SimpleResponse, error)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only test that updates this field is TestLargeTimeout. Should we instead change it to create a new funcServer for each of its timeouts? That way we would never have a situation where this field is written to after the funcServer is initialized.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that is a cleaner approach. Changed and removed the mutex.

@easwars easwars assigned arjan-bal and unassigned easwars and mbissa Apr 7, 2026
@arjan-bal arjan-bal merged commit 1c132b9 into grpc:master Apr 8, 2026
14 checks passed
@arjan-bal arjan-bal deleted the fix-data-races branch April 8, 2026 06:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants