interop/orcalb: Delegate SubConn management to pick_first#8914
Conversation
|
/gemini review |
There was a problem hiding this comment.
Code Review
The pull request successfully transitions the orcalb interop balancer to delegate SubConn management to endpointsharding and pick_first, resolving the dualstack support issue (#8809) and adhering to gRFC A61. The implementation of the wrapper balancer and the picker logic for ORCA load report handling is solid, and the new tests provide good coverage for failover and OOB reporting scenarios. I identified one potential nil pointer dereference during balancer shutdown that should be addressed.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #8914 +/- ##
==========================================
- Coverage 83.04% 80.60% -2.45%
==========================================
Files 411 413 +2
Lines 32892 33543 +651
==========================================
- Hits 27316 27038 -278
- Misses 4181 4300 +119
- Partials 1395 2205 +810 🚀 New features to boost your workflow:
|
Pranjali-2501
left a comment
There was a problem hiding this comment.
Hi @zarinn3pal, thankyou for your contribution.
I have some comments, could you please take a look at that.
65d6901 to
0d4bf8b
Compare
|
Hi @Pranjali-2501 , |
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request is a significant architectural improvement for the interop ORCA test load balancer. It refactors the balancer to delegate SubConn management to endpointsharding and pick_first, aligning with gRFC A61 and enabling proper dual-stack support. The implementation correctly follows the parent/child balancer pattern, intercepting calls to inject ORCA-specific logic for OOB listeners and picker wrapping. The changes are robust and well-structured. Furthermore, the addition of a comprehensive test suite in orcalb_test.go is excellent, covering various scenarios including multiple addresses, multiple endpoints, and resolver updates. The code quality is high, and I did not find any issues of medium or higher severity.
Pranjali-2501
left a comment
There was a problem hiding this comment.
Added few more comments, PTAL.
0d4bf8b to
5b26eb2
Compare
I think I've addressed the two questions that I saw. Please take care of the other comments. |
|
This PR is labeled as requiring an update from the reporter, and no update has been received after 6 days. If no update is provided in the next 7 days, this issue will be automatically closed. |
|
@easwars , I have addressed the remaining comments. And added why using async would not be a good idea. |
|
But the test is still failing on 32 bit architecture. Can you please take a look? |
The issue was because of use of
|
easwars
left a comment
There was a problem hiding this comment.
LGTM, modulo minor comments
| testgrpc "google.golang.org/grpc/interop/grpc_testing" | ||
| testpb "google.golang.org/grpc/interop/grpc_testing" |
There was a problem hiding this comment.
Nit: Please group proto imports as a separate group.
| grpctest.RunSubTests(t, s{}) | ||
| } | ||
|
|
||
| const defaultTestTimeout = 30 * time.Second |
There was a problem hiding this comment.
Does this really require a 30s timeout? Most of our test timeouts are 5 or 10s. And in the rare cases where we have anything more than 10s, we clearly document why those tests need to run for longer.
| srvB.metricsRecorder.SetCPUUtilization(0.9) | ||
|
|
||
| r := manual.NewBuilderWithScheme("test") | ||
| cc, err := grpc.Dial(r.Scheme()+":///test", |
There was a problem hiding this comment.
Here and below, does this have to grpc.Dial? FWIW, grpc.Dial is deprecated in favor of grpc.NewClient. The latter starts the channel off in idle mode though, but the first RPC should move it out of idle into connecting.
| wantA := &v3orcapb.OrcaLoadReport{CpuUtilization: 0.1} | ||
| wantB := &v3orcapb.OrcaLoadReport{CpuUtilization: 0.9} | ||
| seenA, seenB := false, false | ||
| for ; ctx.Err() == nil && (!seenA || !seenB); <-time.After(100 * time.Millisecond) { |
There was a problem hiding this comment.
Nit: You could probably try every 10ms to get the test to complete faster.
|
Moving to @mbissa for second review |
|
Checking. |
This PR switches the interop ORCA test load balancer to delegate SubConn management to endpointsharding + pick_first instead of creating and handling SubConns itself As per gRFC A61 .
Fixes #8809
RELEASE NOTES: N/A