[round robin] delegate to pick_first instead of creating subchannels directly#32692
Conversation
…nt_data_watcher_new
…nt_data_watcher_new
…nt_data_watcher_new
…nt_data_watcher_new
…nt_data_watcher_new
… LB policies (#32709) This paves the way for making pick_first the universal leaf policy (see #32692), which will be needed for the dualstack design. That change will require changing pick_first to see both the raw connectivity state and the health-checking connectivity state of a subchannel, so that we can enable health checking when pick_first is used underneath round_robin without actually changing the pick_first connectivity logic (currently, pick_first always disables health checking). To make it possible to do that, this PR moves the health checking code out of the subchannel and into a separate API using the same data-watcher mechanism that was added for ORCA OOB calls.
…_ejection_via_health_watch
| LoadBalancingPolicy::UpdateArgs update_args; | ||
| update_args.addresses.emplace().emplace_back(address); | ||
| update_args.args = child_args; | ||
| // TODO(roth): If the child reports a non-OK status with the update, |
There was a problem hiding this comment.
Should it at least be logged?
There was a problem hiding this comment.
In this situation, there's no point in logging it, because a non-OK status isn't actually a real failure; it's really just an indication of whether the resolver should consider the update a failure. We actually have a broader pattern of this issue that occurs in many places, as introduced by #30809. When we eventually find a good solution for this, we'll fix it everywhere it occurs.
In this particular case, it's actually currently impossible for this to return non-OK anyway, because we are creating one endpoint for each address, and the only case where pick_first will return non-OK is if the address list is empty. However, that will change when we add support for multiple addresses per endpoint, so I've added a TODO in round_robin.cc to make sure that we properly handle that case.
| // To use this, a petiole policy must define its own subclass of both | ||
| // EndpointList and EndpointList::Endpoint, like so: | ||
| /* | ||
| class MyEndpointList : public EndpointList { |
There was a problem hiding this comment.
nit: Commented out code (I believe linter will point it out too)
There was a problem hiding this comment.
This commented-out code is intentional; it's showing an example of how to use this API. I'm not aware of any linter that would consider this a problem.
…directly (grpc#32692) More work on the dualstack backend design: - Change round_robin to delegate to pick_first instead of creating subchannels directly. - Change pick_first such that when it is the child of a petiole policy, it will unconditionally start a health watch. - Change the client-side health checking code such that if client-side health checking is not enabled, it will return the subchannel's raw connectivity state. - As part of this, we introduce a new endpoint_list library to be used by petiole policies, which is intended to replace the existing subchannel_list library. The only policy that will still directly interact with subchannels is pick_first, so the relevant parts of the subchannel_list functionality have been copied directly into that policy. The subchannel_list library will be removed after all petiole policies are updated to delegate to pick_first.
…hannels directly (grpc#32692)" This reverts commit 27a778f.
This reverts the following PRs: grpc#32692 grpc#33087 grpc#33093 grpc#33427 grpc#33568 These changes seem to have introduced some flaky crashes. Reverting while I investigate.
This rolls forward only the pick_first changes from #32692, which were rolled back in #33718. Specifically: - Changes PF to use its own subchannel list implementation instead of using the subchannel_list library, since the latter will be going away with the dualstack changes. - As a result of no longer using the subchannel_list library, PF no longer needs to set the `GRPC_ARG_INHIBIT_HEALTH_CHECKING` channel arg. - Adds an option to start a health watch on the chosen subchannel, to be used in the future when pick_first is the child of a petiole policy. (Currently, this code is not actually called anywhere.)
…34222) Rolls forward part of the dualstack changes, mostly from #33427 and a little bit from #32692, both of which were reverted in #33718. Specifically: - For petiole policies, unconditionally start health watch on subchannels, even if client side health checking is not enabled; in this case, the health watch will report the subchannel's raw connectivity state. - Fix edge cases in health check reporting that occur when a watcher is started before the initial state is reported. - When client-side health checking fails, add the subchannel's address to the RPC failure status message. - Outlier detection now works only via the health checking watch, not via the raw connectivity state watch. - Remove now-unnecessary hack to ensure that outlier detection does not work for pick_first.
More work on the dualstack backend design: