Skip to content

interop/orcalb: Delegate SubConn management to pick_first#8914

Merged
mbissa merged 6 commits into
grpc:masterfrom
zarinn3pal:feat/interop-orca-to-pf
Apr 28, 2026
Merged

interop/orcalb: Delegate SubConn management to pick_first#8914
mbissa merged 6 commits into
grpc:masterfrom
zarinn3pal:feat/interop-orca-to-pf

Conversation

@zarinn3pal

@zarinn3pal zarinn3pal commented Feb 19, 2026

Copy link
Copy Markdown
Contributor

This PR switches the interop ORCA test load balancer to delegate SubConn management to endpointsharding + pick_first instead of creating and handling SubConns itself As per gRFC A61 .

Fixes #8809

RELEASE NOTES: N/A

@zarinn3pal

Copy link
Copy Markdown
Contributor Author

/gemini review

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

The pull request successfully transitions the orcalb interop balancer to delegate SubConn management to endpointsharding and pick_first, resolving the dualstack support issue (#8809) and adhering to gRFC A61. The implementation of the wrapper balancer and the picker logic for ORCA load report handling is solid, and the new tests provide good coverage for failover and OOB reporting scenarios. I identified one potential nil pointer dereference during balancer shutdown that should be addressed.

Comment thread interop/orcalb.go
@codecov

codecov Bot commented Feb 19, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 80.60%. Comparing base (12e91dd) to head (69540c5).
⚠️ Report is 53 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #8914      +/-   ##
==========================================
- Coverage   83.04%   80.60%   -2.45%     
==========================================
  Files         411      413       +2     
  Lines       32892    33543     +651     
==========================================
- Hits        27316    27038     -278     
- Misses       4181     4300     +119     
- Partials     1395     2205     +810     

see 67 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@easwars easwars added the Type: Internal Cleanup Refactors, etc label Feb 19, 2026
@easwars easwars added this to the 1.80 Release milestone Feb 19, 2026

@Pranjali-2501 Pranjali-2501 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @zarinn3pal, thankyou for your contribution.

I have some comments, could you please take a look at that.

Comment thread interop/orcalb.go Outdated
Comment thread interop/orcalb.go Outdated
Comment thread interop/orcalb.go Outdated
Comment thread interop/orcalb.go Outdated
Comment thread interop/orcalb_test.go Outdated
Comment thread interop/orcalb_test.go Outdated
Comment thread interop/orcalb_test.go Outdated
@zarinn3pal

Copy link
Copy Markdown
Contributor Author

Hi @Pranjali-2501 ,
Thank you for the review. I have addressed the concerns. Can you please have a look into it again?
Thanks

@zarinn3pal

Copy link
Copy Markdown
Contributor Author

/gemini review

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request is a significant architectural improvement for the interop ORCA test load balancer. It refactors the balancer to delegate SubConn management to endpointsharding and pick_first, aligning with gRFC A61 and enabling proper dual-stack support. The implementation correctly follows the parent/child balancer pattern, intercepting calls to inject ORCA-specific logic for OOB listeners and picker wrapping. The changes are robust and well-structured. Furthermore, the addition of a comprehensive test suite in orcalb_test.go is excellent, covering various scenarios including multiple addresses, multiple endpoints, and resolver updates. The code quality is high, and I did not find any issues of medium or higher severity.

Comment thread interop/orcalb_test.go
Comment thread interop/orcalb_test.go
Comment thread interop/orcalb_test.go Outdated
Comment thread interop/orcalb_test.go Outdated

@Pranjali-2501 Pranjali-2501 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added few more comments, PTAL.

@easwars

easwars commented Apr 7, 2026

Copy link
Copy Markdown
Contributor

Hi @easwars , I have some questions regarding the suggestions that were made.

I think I've addressed the two questions that I saw. Please take care of the other comments.

@github-actions

Copy link
Copy Markdown

This PR is labeled as requiring an update from the reporter, and no update has been received after 6 days. If no update is provided in the next 7 days, this issue will be automatically closed.

@zarinn3pal

Copy link
Copy Markdown
Contributor Author

@easwars , I have addressed the remaining comments. And added why using async would not be a good idea.
What are your thoughts on that?

@zarinn3pal zarinn3pal removed their assignment Apr 15, 2026
@easwars

easwars commented Apr 16, 2026

Copy link
Copy Markdown
Contributor

But the test is still failing on 32 bit architecture. Can you please take a look?

@zarinn3pal

Copy link
Copy Markdown
Contributor Author

But the test is still failing on 32 bit architecture. Can you please take a look?

The issue was because of use of defer mentioned in the review below:
https://github.com/grpc/grpc-go/pull/8914/changes#r2998955808

defer delayed stopping the old listener until after the new one was registered, causing both to be active briefly. The race condition appeared between unlocking the mutex and the deferred stop call, allowing both old and new listeners to be active and receive reports. The fix was to stop the old listener immediately before registering the new one, eliminating this overlap.

@zarinn3pal zarinn3pal removed their assignment Apr 20, 2026

@easwars easwars left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, modulo minor comments

Comment thread interop/orcalb_test.go
Comment on lines +32 to +33
testgrpc "google.golang.org/grpc/interop/grpc_testing"
testpb "google.golang.org/grpc/interop/grpc_testing"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Please group proto imports as a separate group.

Comment thread interop/orcalb_test.go
grpctest.RunSubTests(t, s{})
}

const defaultTestTimeout = 30 * time.Second

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this really require a 30s timeout? Most of our test timeouts are 5 or 10s. And in the rare cases where we have anything more than 10s, we clearly document why those tests need to run for longer.

Comment thread interop/orcalb_test.go
srvB.metricsRecorder.SetCPUUtilization(0.9)

r := manual.NewBuilderWithScheme("test")
cc, err := grpc.Dial(r.Scheme()+":///test",

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here and below, does this have to grpc.Dial? FWIW, grpc.Dial is deprecated in favor of grpc.NewClient. The latter starts the channel off in idle mode though, but the first RPC should move it out of idle into connecting.

Comment thread interop/orcalb_test.go
wantA := &v3orcapb.OrcaLoadReport{CpuUtilization: 0.1}
wantB := &v3orcapb.OrcaLoadReport{CpuUtilization: 0.9}
seenA, seenB := false, false
for ; ctx.Err() == nil && (!seenA || !seenB); <-time.After(100 * time.Millisecond) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: You could probably try every 10ms to get the test to complete faster.

@easwars easwars requested review from mbissa and removed request for Pranjali-2501 April 21, 2026 03:58
@easwars easwars assigned mbissa and unassigned easwars Apr 21, 2026
@easwars

easwars commented Apr 21, 2026

Copy link
Copy Markdown
Contributor

Moving to @mbissa for second review

@mbissa

mbissa commented Apr 21, 2026

Copy link
Copy Markdown
Contributor

Checking.

@mbissa mbissa merged commit bdf4571 into grpc:master Apr 28, 2026
15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

interop/orcalb: Delegate SubConn management to pickfirst

5 participants