Enable load balancing policy extensions#17400
Conversation
Enables the `LOAD_BALANCING_POLICY_CONFIG` enum value in `LbPolicy` and supports typed load balancers specified in `load_balancing_policy`. Continues work done by Charlie Getzen <charliegetzenlc@gmail.com> in PR envoyproxy#15827.

Custom load balancers specified by `load_balancing_policy` are created as implementations of `ThreadAwareLoadBalancer`. Thread-local load balancers can be implemented as thread-aware load balancers that contain no logic at the thread-aware level, i.e. the purpose of the thread-aware LB is solely to contain the factory used to instantiate the thread-local LBs. (In the future it might be appropriate to provide a construct that abstracts away the thread-aware aspects of `ThreadAwareLoadBalancer` for LBs that don't need to be thread-aware.)

A cluster that uses `LOAD_BALANCING_POLICY_CONFIG` may not also set a subset LB configuration. If the load balancer type makes use of subsetting, it should include a subset configuration in its own configuration message. Future work on load balancing extensions should include moving the subset LB to use load balancing extensions.

Similarly, a cluster that uses `LOAD_BALANCING_POLICY_CONFIG` may not set the `CommonLbConfig`, and it is not passed into load balancer creation (mostly owing to its dubious applicability as a top-level configuration message to a hierarchical load balancing policy). If the load balancer type makes use of the `CommonLbConfig`, it should include a `CommonLbConfig` in the configuration message for the load balancing policy.

Considerations for migration of existing load balancers:

- Pieces of the `ThreadAwareLoadBalancerBase` implementation are specific to the built-in hashing load balancers and should be moved into a base class specifically for hashing load balancers. As it stands, custom load balancing policies are required to implement a `createLoadBalancer()` method even if the architecture of the LB policy does not require a hashing load balancer. I think we would also benefit from disentangling `ThreadAwareLoadBalancerBase` from `LoadBalancerBase`, as the former never actually does any host picking.
- As we convert existing thread-local load balancers to thread-aware load balancers, new local LBs will be re-created upon membership changes. We should provide a mechanism allowing load balancers to control whether this rebuild should occur, e.g. a callback that calls `create()` for thread-aware LBs by default, which can be overridden to do nothing for thread-local LBs.

Signed-off-by: Eugene Chan <eugenechan@google.com>
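The pattern described above — a thread-aware LB that contains no logic at the thread-aware level and exists only to carry the factory for thread-local LBs — can be sketched in isolation. This is a simplified illustration with hypothetical type names, not Envoy's actual interfaces:

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for Envoy's interfaces; names and
// shapes are illustrative only, not the real ThreadAwareLoadBalancer API.
struct Host {
  std::string address;
};

class LoadBalancer {
public:
  virtual ~LoadBalancer() = default;
  virtual const Host* chooseHost(const std::vector<Host>& hosts) = 0;
};

class LoadBalancerFactory {
public:
  virtual ~LoadBalancerFactory() = default;
  virtual std::unique_ptr<LoadBalancer> create() = 0;
};

class ThreadAwareLoadBalancer {
public:
  virtual ~ThreadAwareLoadBalancer() = default;
  virtual LoadBalancerFactory& factory() = 0;
  virtual void initialize() = 0;
};

// An example thread-local LB (trivial round-robin).
class RoundRobinLb : public LoadBalancer {
public:
  const Host* chooseHost(const std::vector<Host>& hosts) override {
    if (hosts.empty()) {
      return nullptr;
    }
    return &hosts[index_++ % hosts.size()];
  }

private:
  size_t index_{0};
};

class RoundRobinFactory : public LoadBalancerFactory {
public:
  std::unique_ptr<LoadBalancer> create() override {
    return std::make_unique<RoundRobinLb>();
  }
};

// The "no-op" wrapper: nothing happens at the thread-aware level; it
// exists solely to hand each worker thread the factory for its local LB.
class NoOpThreadAwareLb : public ThreadAwareLoadBalancer {
public:
  LoadBalancerFactory& factory() override { return factory_; }
  void initialize() override {} // intentionally empty

private:
  RoundRobinFactory factory_;
};
```

In Envoy each worker thread would call `factory().create()` to build its own local LB instance; here that is simulated by creating one instance directly.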
/assign markdroth

Signed-off-by: Eugene Chan <eugenechan@google.com>

I'll let those who are more familiar with the Envoy code review this in detail, but at a high level, the functionality described here looks great, and the API changes look good as well. Thanks for doing this!

Probably makes sense for @lizan as a non-Googler, for multi-org policy reasons.

@snowp may want to review as well.

OK, I will unassign myself. Let me know if you want me to look at anything in particular.
const auto& lb_policy =
    std::find_if(config.load_balancing_policy().policies().begin(),
                 config.load_balancing_policy().policies().end(),
                 [](envoy::config::cluster::v3::LoadBalancingPolicy_Policy policy) {
                   return Registry::FactoryRegistry<TypedLoadBalancerFactory>::getFactory(
                              policy.name()) != nullptr;
                 });

if (lb_policy == config.load_balancing_policy().policies().end()) {
  throw EnvoyException(fmt::format(
      "Didn't find a registered load balancer factory implementation for cluster: '{}'",
      name_));
}
This might not be needed when we use getAndCheckFactoryByName appropriately.
I don't think the cluster manager should be responsible for picking the LB policy out of load_balancing_policies. IMO it's better if the cluster info does this instead - then it knows which policy was picked, and doesn't have to carry around config for any policies that weren't picked.
Why do we want to a) check whether the LB policy factory is registered, and b) create the LB policy from the factory, in two different places? That doesn't make sense either.
You can pick the LB policy in ClusterManagerImpl, store a reference to the LB policy factory in a member of ClusterInfoImpl, and just use that in the cluster manager.
I think I'm missing an important detail somewhere... I'm under the impression that we can't store the factory we've picked from the ClusterManagerImpl, since info() returns a const pointer. By the time the cluster manager picks the LB policy it's too late to write what we picked into the ClusterInfoImpl. That's why I was hoping to pick the policy in the ClusterInfoImpl constructor instead.
This is mostly a result of wanting to be able to write a unit test that verifies that the set of policies { 1. policy with no registered factory, 2. policy with a registered factory } will result in the known policy being picked. It seems to me this requires that the ClusterInfoImpl to know which policy was picked, and hence to do the picking itself. I suppose this problem disappears if we don't bother with that bit of the test but it seems to me like a reasonably important behaviour to ensure is correct here.
Arguably a) is not so much about checking whether the factory is registered as it is about actually picking the policy - which happens to guarantee as a postcondition that the chosen factory is registered, by definition. So I'm not sure how problematic it is to have policy picking in the ClusterInfoImpl constructor and the actual LB instantiation in the cluster manager instead. What do you think?
> I think I'm missing an important detail somewhere... I'm under the impression that we can't store the factory we've picked from the ClusterManagerImpl, since info() returns a const pointer. By the time the cluster manager picks the LB policy it's too late to write what we picked into the ClusterInfoImpl. That's why I was hoping to pick the policy in the ClusterInfoImpl constructor instead.

I meant you can implement lb_policy_factory() in ClusterInfoImpl to return a reference to the factory. My point here is to reduce the manual config checks and thrown EnvoyExceptions, to make the error messages more consistent and the extension API easier to evolve in general; that's why I pointed to just using TypedExtensionConfig in the proto part as well.
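For reference, the selection behaviour under discussion — walk the client's preference list, skip policies with no registered factory, and fail only if none match — can be sketched independently of where it lives (ClusterInfoImpl vs. ClusterManagerImpl). Hypothetical stand-in types below, not Envoy code:

```cpp
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in; the real registry is
// Registry::FactoryRegistry<TypedLoadBalancerFactory>.
struct TypedLoadBalancerFactory {
  std::string name;
};

std::map<std::string, TypedLoadBalancerFactory>& factoryRegistry() {
  static std::map<std::string, TypedLoadBalancerFactory> registry;
  return registry;
}

// Walk the policies in client preference order and pick the first one
// with a registered factory; unknown policies are skipped.
std::optional<std::string> pickPolicy(const std::vector<std::string>& policy_names) {
  for (const auto& name : policy_names) {
    if (factoryRegistry().count(name) != 0) {
      return name;
    }
  }
  // The real code throws an EnvoyException here.
  return std::nullopt;
}
```

This is exactly the case the unit test mentioned above wants to pin down: given { policy with no registered factory, policy with a registered factory }, the known policy must win.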
Signed-off-by: Eugene Chan <eugenechan@google.com>
Removes LoadBalancingPolicy.Policy type from v4alpha. Changes the type of an unimplemented field. Signed-off-by: Eugene Chan <eugenechan@google.com>
Signed-off-by: Eugene Chan <eugenechan@google.com>
// Deprecated and replaced by TypedExtensionConfig.
message Policy {
  option deprecated = true;
Is this referred to anywhere? If not, just delete?

I was having trouble getting the presubmits to pass when I just deleted it, but I didn't realize I needed to deprecate the unused v2 message as well. Done (for the parts of Policy that are removed).
    config.load_balancing_policy().policies().end(),
    [](const envoy::config::core::v3::TypedExtensionConfig& policy) {
      return Registry::FactoryRegistry<TypedLoadBalancerFactory>::getFactory(
                 policy.name()) != nullptr;
We no longer use the name in the extension config to identify which factory should be used; it is determined by the type_url in typed_config. So you should use Config::Utility::getAndCheckFactory() and pass the whole message instead.
Done, thanks.
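A rough standalone illustration of what "lookup by type_url instead of name" means (hypothetical helper names and data structures; the real code should go through Config::Utility::getAndCheckFactory(), which also handles errors):

```cpp
#include <map>
#include <string>

// Hypothetical sketch: factories are keyed by the message type packed in
// the policy's typed_config (its Any type_url), not by the name field.
struct Factory {
  std::string category;
};

// "type.googleapis.com/foo.v3.Bar" -> "foo.v3.Bar"
std::string typeUrlToMessageName(const std::string& type_url) {
  const auto pos = type_url.rfind('/');
  return pos == std::string::npos ? type_url : type_url.substr(pos + 1);
}

const Factory* findFactoryByConfigType(const std::map<std::string, Factory>& factories,
                                       const std::string& any_type_url) {
  const auto it = factories.find(typeUrlToMessageName(any_type_url));
  return it == factories.end() ? nullptr : &it->second;
}
```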
message LoadBalancingPolicy {
  option (udpa.annotations.versioning).previous_message_type = "envoy.api.v2.LoadBalancingPolicy";

  // Deprecated and replaced by TypedExtensionConfig.
Instead of deprecating this whole message, let's deprecate the name and typed_config fields and add the new TypedExtensionConfig field inside of this message. That way, if we need to add a new field on a per-policy basis in the future, we'll have somewhere to put it.
That seems to be an anti-pattern; what field do we expect? In general, a per-policy field should be added inside the message packed into the Any.
If it's a field only for one individual policy, that would make sense. But if we discover the need to add another field that we want to be set for all policies, then this wrapper would provide a place to put that.
Could we put that into CommonLbConfig then, and include the message in the policy's message? Or is the idea that policies would need to set this field whether they use the common LB config or not?
The verbosity of load_balancing_policy.policy.typed_extension_config.typed_config.foo seems a bit excessive. Is it something we could change if/when we actually have such a common field?
Not all policies will include CommonLbConfig in their config, and the idea here is to have a way to add the field for any policy -- e.g., if we wanted to add something to express the control flow of how the client picks which policy to use.
We could make this change later, but it would be a breaking change at that point. The idea here is to future-proof a bit.
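The outcome of this thread is a per-policy wrapper message. A sketch of the resulting shape (illustrative only; the field numbers and comments here are assumptions, not copied from the final v3 API):

```proto
// Sketch of the wrapper discussed above; field numbers are hypothetical.
message LoadBalancingPolicy {
  message Policy {
    // The factory is selected by the type_url of
    // typed_extension_config.typed_config, not by a name field.
    core.v3.TypedExtensionConfig typed_extension_config = 4;

    // Room here for future fields that should apply to every policy
    // (e.g. control flow for how the client picks a policy) without
    // a breaking change.
  }

  // Ordered by client preference; unsupported policies are skipped.
  repeated Policy policies = 1;
}
```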
Signed-off-by: Eugene Chan <eugenechan@google.com>
PTAL, thanks!
api/envoy/api/v2/cluster.proto
  message Policy {
    // Required. The name of the LB policy.
-   string name = 1;
+   string name = 1 [deprecated = true];
I don't think this needs to be deprecated in v2. It doesn't really make sense to do this unless we're also adding the new field to v2, and I don't see any need for that.
Instead, these fields should be kept in v3 and marked as deprecated there.
Thanks! The API changes look great here, but I'll let Lizan approve.
Signed-off-by: Eugene Chan <eugenechan@google.com>
lizan
left a comment
Thanks, this is on the right track.
  string name = 1 [deprecated = true, (envoy.annotations.deprecated_at_minor_version) = "3.0"];

- google.protobuf.Any typed_config = 3;
+ google.protobuf.Any typed_config = 3
+     [deprecated = true, (envoy.annotations.deprecated_at_minor_version) = "3.0"];
I thought we could delete them instead of deprecating, since the whole message was not-implemented-hide before and the implementation doesn't support this either.
I'd be okay with fully removing these fields too, unless that causes problems. But I think there's very little cost to leaving them here and marking them deprecated.
Hmm. I tried to remove the whole Policy message from just v3 earlier, and it failed the presubmit because it was expecting some v3 message to correspond to the v2 version of the same message. (That's why I ended up deprecating v2 fields in an earlier commit, which I've since undone.)
Removing these fields seems to be fine, though, so I've done that.
Thanks. While there's little cost to leaving them here, the docs would be confusing since we never use those fields in the implementation.
Signed-off-by: Eugene Chan <eugenechan@google.com>
Retrying Azure Pipelines:
Signed-off-by: Eugene Chan <eugenechan@google.com>

Commit Message: Enable load balancing policy extensions

Additional Description:

Enables the `LOAD_BALANCING_POLICY_CONFIG` enum value in `LbPolicy` and supports typed load balancers specified in `load_balancing_policy`. Continues work done by Charlie Getzen <charliegetzenlc@gmail.com> in #15827.

Custom load balancers specified by `load_balancing_policy` are created as implementations of `ThreadAwareLoadBalancer`. Thread-local load balancers can be implemented as thread-aware load balancers that contain no logic at the thread-aware level, i.e. the purpose of the thread-aware LB is solely to contain the factory used to instantiate the thread-local LBs. (In the future it might be appropriate to provide a construct that abstracts away the thread-aware aspects of `ThreadAwareLoadBalancer` for LBs that don't need to be thread-aware.)

A cluster that uses `LOAD_BALANCING_POLICY_CONFIG` may not also set a subset LB configuration. If the load balancer type makes use of subsetting, it should include a subset configuration in its own configuration message. Future work on load balancing extensions should include moving the subset LB to use load balancing extensions.

Similarly, a cluster that uses `LOAD_BALANCING_POLICY_CONFIG` may not set the `CommonLbConfig`, and it is not passed into load balancer creation (mostly owing to its dubious applicability as a top-level configuration message to a hierarchical load balancing policy). If the load balancer type makes use of the `CommonLbConfig`, it should include a `CommonLbConfig` in the configuration message for the load balancing policy.

Considerations for migration of existing load balancers:

- Pieces of the `ThreadAwareLoadBalancerBase` implementation are specific to the built-in hashing load balancers and should be moved into a base class specifically for hashing load balancers. As it stands, custom load balancing policies are required to implement a `createLoadBalancer()` method even if the architecture of the LB policy does not require a hashing load balancer. I think we would also benefit from disentangling `ThreadAwareLoadBalancerBase` from `LoadBalancerBase`, as the former never actually does any host picking.
- As we convert existing thread-local load balancers to thread-aware load balancers, new local LBs will be re-created upon membership changes. We should provide a mechanism allowing load balancers to control whether this rebuild should occur, e.g. a callback that calls `create()` for thread-aware LBs by default, which can be overridden to do nothing for thread-local LBs.

Risk Level: low
Testing: brought up a cluster with a custom load balancer specified by `load_balancing_policy`; new unit tests included
Docs Changes: n/a
Release Notes: Enable load balancing policy extensions
Platform Specific Features: n/a

Fixes #5598