
pass health check service name through LB policies via a channel arg#26441

Merged
markdroth merged 13 commits into grpc:master from
markdroth:client_channel_health_check_service_name_in_channel_args
Oct 7, 2021

Conversation

@markdroth
Member

@markdroth markdroth commented Jun 7, 2021

Instead of storing the health check service name in the client channel code and updating all of the existing subchannel wrappers whenever it changes, we pass the health check service name through the LB policies via a channel arg. This avoids the need for synchronization in the client channel code.
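A minimal sketch of the channel-arg approach described above. The type and key names here are illustrative only (they are not the real gRPC C-core API): the point is that the LB policy forwards its channel args unchanged to the subchannels it creates, so the subchannel reads the health check service name from its own args and no synchronized field in the client channel is needed.

```cpp
#include <map>
#include <string>

// Illustrative stand-in for gRPC's channel-args container.
using ChannelArgs = std::map<std::string, std::string>;

// Hypothetical key name, for illustration only.
constexpr char kHealthCheckServiceName[] =
    "grpc.internal.health_check_service_name";

// The LB policy passes its args through unchanged when creating a
// subchannel; the subchannel then looks up the health check service
// name in the args it was created with.
ChannelArgs MakeSubchannelArgs(const ChannelArgs& lb_policy_args) {
  return lb_policy_args;
}

std::string GetHealthCheckServiceName(const ChannelArgs& subchannel_args) {
  auto it = subchannel_args.find(kHealthCheckServiceName);
  return it == subchannel_args.end() ? "" : it->second;
}
```

When the service config changes the name, the new value simply flows down in the next LB policy update as part of the args, instead of being pushed into existing subchannel wrappers under a lock.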

Note that if the health check service name changes and existing subchannels report that the new name is unhealthy, we want the LB policies to stop sending them traffic. Previously, this happened automatically, because the channel updated the health check service name in the existing subchannel wrappers, so they would quickly start reporting TRANSIENT_FAILURE. However, now that the health check service name change is propagating through the LB policies via an update, this means that the new subchannel list will never report READY, and the current code would therefore never swap it in; it would just continue using the old subchannel list indefinitely, which is not what we want. So in order to make this work, I have changed both the pick_first and round_robin LB policies to swap in the new subchannel list if all of the subchannels are considered to be in TRANSIENT_FAILURE. (This change is something we were planning to do eventually anyway, to be consistent with our policy of always honoring what the control plane tells us to do, even if that causes the channel to go into TRANSIENT_FAILURE. That is also why I've changed the pick_first policy and not just the round_robin policy; the pick_first change is not actually directly related to the health check service name change, since pick_first does not support client-side health checking.)
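The swap-in rule described above can be sketched as follows. This is a simplified illustration, not the actual pick_first/round_robin code: previously a pending subchannel list was promoted only once some subchannel reported READY; with this change, a list whose members are all in TRANSIENT_FAILURE is also promoted, so the policy always honors the latest control-plane update even when doing so makes the channel unhealthy.

```cpp
#include <vector>

enum class ConnectivityState { IDLE, CONNECTING, READY, TRANSIENT_FAILURE };

// Decide whether the pending (new) subchannel list should replace the
// current one.  Swap it in if any subchannel is READY (the old rule), or
// if every subchannel is in TRANSIENT_FAILURE (the new rule), so that a
// permanently unhealthy new list still replaces the stale old list.
bool ShouldSwapInPendingList(const std::vector<ConnectivityState>& pending) {
  bool any_ready = false;
  bool all_failed = !pending.empty();
  for (ConnectivityState s : pending) {
    if (s == ConnectivityState::READY) any_ready = true;
    if (s != ConnectivityState::TRANSIENT_FAILURE) all_failed = false;
  }
  return any_ready || all_failed;
}
```

Without the "all TRANSIENT_FAILURE" clause, a health-check-induced failure on every new subchannel would leave the policy pinned to the old list forever, which is exactly the bug the paragraph above describes.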

@markdroth markdroth added the release notes: no Indicates if PR should not be in release notes label Jun 7, 2021
@stale

stale bot commented Sep 6, 2021

This issue/PR has been automatically marked as stale because it has not had any update (including commits, comments, labels, milestones, etc) for 30 days. It will be closed automatically if no further update occurs in 7 days. Thank you for your contributions!

@markdroth markdroth requested a review from donnadionne October 6, 2021 19:00
@markdroth markdroth marked this pull request as ready for review October 6, 2021 19:00
@markdroth
Member Author

The test failures all look like infrastructure issues.

@markdroth markdroth merged commit cb8dafa into grpc:master Oct 7, 2021
@markdroth markdroth deleted the client_channel_health_check_service_name_in_channel_args branch October 7, 2021 17:15
@copybara-service copybara-service bot added the imported Specifies if the PR has been imported to the internal repository label Oct 8, 2021
copybara-service bot pushed a commit that referenced this pull request Nov 7, 2025
)

Currently, the client channel maintains two different data structures for tracking subchannels:
- `subchannel_refcount_map_`, which tracks the number of subchannel wrappers per subchannel.  This is used to determine when to create and remove channelz node linkage.
- `subchannel_wrappers_`, which tracks the set of all subchannel wrappers across all subchannels.  This was originally introduced back in #20039 as part of propagating the health check service name to subchannels, but we switched to using a different approach for the health check service name in #26441.  However, in between those two, we started using this data structure for handling keepalive time in #23313 -- which is actually somewhat inefficient, since we may wind up setting the keepalive time on the underlying subchannel more than once in the case where there is more than one subchannel wrapper per subchannel (which happens frequently during updates).

This PR combines these two data structures into one: a map from subchannel to a set of subchannel wrappers for that subchannel.  This is used both for channelz node updates and keepalive propagation -- and it's more efficient for the latter, because we can now update each subchannel exactly once.
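A rough sketch of the combined data structure, with illustrative types only (the real gRPC classes differ). One map replaces both `subchannel_refcount_map_` and `subchannel_wrappers_`: the per-subchannel wrapper count is just `map[sc].size()`, and iterating over keys touches each underlying subchannel exactly once when propagating keepalive time.

```cpp
#include <map>
#include <set>

// Illustrative stand-ins for the real gRPC classes.
struct Subchannel { int keepalive_time_ms = 0; };
struct SubchannelWrapper {};

// Map from subchannel to the set of wrappers referencing it.  The key set
// drives channelz linkage (link on first wrapper, unlink when the set
// empties), replacing the old refcount map.
using WrapperMap = std::map<Subchannel*, std::set<SubchannelWrapper*>>;

// Keepalive propagation: each underlying subchannel is updated exactly
// once, no matter how many wrappers currently reference it.
void SetKeepaliveTime(WrapperMap& wrappers, int ms) {
  for (auto& [subchannel, wrapper_set] : wrappers) {
    subchannel->keepalive_time_ms = ms;
  }
}
```

With the old flat set of wrappers, a subchannel with N wrappers (common during updates) would have its keepalive time set N times; here it is set once per subchannel.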

This also paves the way for a subsequent change that will be needed as part of the MAX_CONCURRENT_STREAMS design.

Closes #40880

COPYBARA_INTEGRATE_REVIEW=#40880 from markdroth:client_channel_combine_subchannel_maps f289c20
PiperOrigin-RevId: 829602334
yuanweiz pushed a commit to yuanweiz/grpc that referenced this pull request Nov 12, 2025
sreenithi pushed a commit to sreenithi/grpc that referenced this pull request Nov 26, 2025