tests: avoid relying on socket reads/writes for synchronization#9033
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #9033 +/- ##
==========================================
+ Coverage 82.99% 83.09% +0.09%
==========================================
Files 413 413
Lines 33054 33269 +215
==========================================
+ Hits 27433 27644 +211
- Misses 4210 4211 +1
- Partials 1411 1414 +3 🚀 New features to boost your workflow:
|
easwars
left a comment
There was a problem hiding this comment.
LGTM, modulo some minor nits
| // Create a channel to the above server. | ||
| cc, err := grpc.NewClient(backend.Address, grpc.WithTransportCredentials(insecure.NewCredentials())) | ||
| if err != nil { | ||
| t.Fatalf("Failed to dial test backend at %q: %v", backend.Address, err) |
There was a problem hiding this comment.
Can we use stubserver.Start() instead? That starts both the server and the client.
Same applies to other occurrences in this test.
There was a problem hiding this comment.
Updated to use stubserver.Start(). I tried that initially, however it results in a couple of extra lines of code because StartTestService sets a no-op EmptyCall handler which we need to set manually now.
| ) | ||
|
|
||
| func (s) TestClientSideRetry(t *testing.T) { | ||
| serverMu := sync.Mutex{} |
There was a problem hiding this comment.
Nit: Please add a note here that the mutex protects both ctr and errs.
There was a problem hiding this comment.
While you are here, could you change this to wantCode and wantCount. This test is quite hard to read without that.
| // mu guards access to the unaryCall field. | ||
| mu sync.Mutex | ||
| unaryCall func(ctx context.Context, in *testpb.SimpleRequest) (*testpb.SimpleResponse, error) |
There was a problem hiding this comment.
The only test that updates this field is TestLargeTimeout. Should we instead change it to create a new funcServer for each of its timeouts? That way we would never have a situation where this field is written to after the funcServer is initialized.
There was a problem hiding this comment.
Yes, that is a cleaner approach. Changed and removed the mutex.
Background
In #9032, we will transition from standard
net.Conn.Readmethods tosyscallUNIX APIs to enable non-memory-pinning reads. Due to this change, the Go race detector beings failing on tests that share state between client and server goroutines without standard synchronization primitives (mutexes, channels, etc.).Because these tests rely on the network request itself as a memory barrier, they are logically safe but technically racy from the Go runtime's perspective. The standard
net.Connleverages Go's internal network poller, which inadvertently provides the "happens-before" edges the race detector looks for. Dropping down to raw syscalls bypasses this, causing the detector to flag the accesses. (Minimal repro: https://go.dev/play/p/yvEtBmLTOJ2)Solution
Introduce explicit synchronization to the affected tests. Tests now properly coordinate shared state access between clients and servers without relying on socket I/O timing.
RELEASE NOTES: N/A