pass health check service name through LB policies via a channel arg (#26441)
Merged: markdroth merged 13 commits into grpc:master on Oct 7, 2021.

Conversation
This issue/PR has been automatically marked as stale because it has not had any update (including commits, comments, labels, milestones, etc.) for 30 days. It will be closed automatically if no further update occurs in 7 days. Thank you for your contributions!
donnadionne approved these changes on Oct 6, 2021.
markdroth (Member, Author): The test failures all look like infrastructure issues.
copybara-service bot pushed a commit that referenced this pull request on Nov 7, 2025:
Currently, the client channel maintains two different data structures for tracking subchannels:

- `subchannel_refcount_map_`, which tracks the number of subchannel wrappers per subchannel. This is used to determine when to create and remove channelz node linkage.
- `subchannel_wrappers_`, which tracks the set of all subchannel wrappers across all subchannels. This was originally introduced back in #20039 as part of propagating the health check service name to subchannels, but we switched to using a different approach for the health check service name in #26441. However, in between those two, we started using this data structure for handling keepalive time in #23313 -- which is actually somewhat inefficient, since we may wind up setting the keepalive time on the underlying subchannel more than once in the case where there is more than one subchannel wrapper per subchannel (which happens frequently during updates).

This PR combines these two data structures into one: a map from subchannel to a set of subchannel wrappers for that subchannel. This is used both for channelz node updates and keepalive propagation -- and it's more efficient for the latter, because we can now update each subchannel exactly once. This also paves the way for a subsequent change that will be needed as part of the MAX_CONCURRENT_STREAMS design.

Closes #40880

COPYBARA_INTEGRATE_REVIEW=#40880 from markdroth:client_channel_combine_subchannel_maps f289c20
PiperOrigin-RevId: 829602334
yuanweiz pushed the same commit to yuanweiz/grpc on Nov 12, 2025, and sreenithi pushed it to sreenithi/grpc on Nov 26, 2025.
Instead of storing the health check service name in the client channel code and updating all of the existing subchannel wrappers whenever it changes, we pass the health check service name through the LB policies via a channel arg. This avoids the need for synchronization in the client channel code.
Note that if the health check service name changes and existing subchannels report that the new name is unhealthy, we want the LB policies to stop sending them traffic. Previously, this happened automatically, because the channel updated the health check service name in the existing subchannel wrappers, so they would quickly start reporting TRANSIENT_FAILURE. However, now that the health check service name change is propagating through the LB policies via an update, this means that the new subchannel list will never report READY, and the current code would therefore never swap it in; it would just continue using the old subchannel list indefinitely, which is not what we want.

So in order to make this work, I have changed both the pick_first and round_robin LB policies to swap in the new subchannel list if all of the subchannels are considered to be in TRANSIENT_FAILURE.

(This change is something we were planning to do eventually anyway, to be consistent with our policy of always honoring what the control plane tells us to do, even if that causes the channel to go into TRANSIENT_FAILURE. That is also why I've changed the pick_first policy and not just the round_robin policy; the pick_first change is not actually directly related to the health check service name change, since pick_first does not support client-side health checking.)