extend load balancer context interface to provide override host#17848
extend load balancer context interface to provide override host#17848snowp merged 8 commits intoenvoyproxy:mainfrom
Conversation
Signed-off-by: wbpcode <wbphub@live.com>
Signed-off-by: wbpcode <wbphub@live.com>
mattklein123
left a comment
There was a problem hiding this comment.
Thanks, flushing out some initial comments.
/wait
envoy/upstream/load_balancer.h
Outdated
| using ExpectedHost = std::pair<std::string, ExpectedHostStatus>; | ||
|
|
||
| /** | ||
| * Returns the host the load balancer shouled slected directly. If the expected host exists and |
There was a problem hiding this comment.
| * Returns the host the load balancer shouled slected directly. If the expected host exists and | |
| * Returns the host the load balancer should select directly. If the expected host exists and |
envoy/upstream/load_balancer.h
Outdated
| */ | ||
| virtual Network::TransportSocketOptionsConstSharedPtr upstreamTransportSocketOptions() const PURE; | ||
|
|
||
| using ExpectedHostStatus = uint32_t; |
There was a problem hiding this comment.
Why does this need to be uin32_t? Can't you just use the health status enum directly? I think I see that you did this because you want to allow multiple status as a logical OR? Is that right?
There was a problem hiding this comment.
Yes. For the logical OR. 👍
envoy/upstream/load_balancer.h
Outdated
| virtual Network::TransportSocketOptionsConstSharedPtr upstreamTransportSocketOptions() const PURE; | ||
|
|
||
| using ExpectedHostStatus = uint32_t; | ||
| using ExpectedHost = std::pair<std::string, ExpectedHostStatus>; |
There was a problem hiding this comment.
I would call this OverrideHost
There was a problem hiding this comment.
Ok 👌 . It will make the code more uniform.
| // ThreadAwareLoadBalancerBase inherits LoadBalancerBase, but in fact the `chooseHost` method | ||
| // should never be called. | ||
| HostConstSharedPtr chooseHost(LoadBalancerContext*) override { NOT_IMPLEMENTED_GCOVR_EXCL_LINE; } |
There was a problem hiding this comment.
I think we have gone wrong somewhere with the interfaces if we have to do this. Can you figure out a different way of expressing this hierarchy? I'm also confused why this is now needed given no interface changes that I can see?
There was a problem hiding this comment.
This is code is not actually not directly related to what the PR hopes to do.
I just happened to find that ThreadAwareLoadBalancerBase inherited LoadBalancerBase and its chooseHost method. But in fact, chooseHost of ThreadAwareLoadBalancerBase should never be used.
So this additional override method was added to ensure that no one would abuse ThreadAwareLoadBalancerBase.
There is no problem at all to remove this code, because chooseHostOnce can also achieve similar constraints. 🤔
| @@ -1,5 +1,6 @@ | |||
| #include "source/common/upstream/thread_aware_lb_impl.h" | |||
|
|
|||
| #include <atomic> | |||
There was a problem hiding this comment.
Thanks. This is added by the clangd. Maybe it's because I tried atomic_store before.
|
|
||
| HostConstSharedPtr host; | ||
| host = LoadBalancerBase::selectOverrideHost(cross_priority_host_map_.get(), context); | ||
| if (host != nullptr && !context->shouldSelectAnotherHost(*host)) { |
There was a problem hiding this comment.
I don't think mixing the override and the "should select another host" concept is worth the complexity. If someone sets an override host I would just use it.
There was a problem hiding this comment.
This design was originally designed to correctly select a new host when a request is retried. 🤔 But I think you are right. When retrying, LoadBalancerContext should set the override host to null.
source/common/upstream/subset_lb.cc
Outdated
| HostConstSharedPtr SubsetLoadBalancer::chooseHost(LoadBalancerContext* context) { | ||
| HostConstSharedPtr override_host = | ||
| LoadBalancerBase::selectOverrideHost(cross_priority_host_map_.get(), context); | ||
| if (override_host != nullptr && !context->shouldSelectAnotherHost(*override_host)) { |
There was a problem hiding this comment.
Same comment about shouldSelectAnotherHost
| return false; | ||
| } | ||
|
|
||
| LoadBalancerContext::ExpectedHostStatus LoadBalancerContextBase::createExpectedHostStatus( |
There was a problem hiding this comment.
If this is only used in test code, please move it to test code.
There was a problem hiding this comment.
Yep this is only be used in the test code for now. But it would be a useful util function in the next PR.
We can jut move it to test code temporary and move it back when we actually use it.
| HostConstSharedPtr host; | ||
|
|
||
| host = selectOverrideHost(cross_priority_host_map_.get(), context); | ||
| if (host != nullptr && !context->shouldSelectAnotherHost(*host)) { |
There was a problem hiding this comment.
Same comment about shouldSelectAnotherHost
| // This leads to the possibility of simultaneous reading and writing of cross_priority_host_map_ | ||
| // in different threads. For this reason, an additional mutex is necessary to guard | ||
| // cross_priority_host_map_. | ||
| absl::Mutex cross_priority_host_map_mutex_; |
There was a problem hiding this comment.
I'm not crazy about having this mutex be away from the underlying variable so we can't use thread annotations. One option here would be to not have cross_priority_host_map_ in the load balancer base and have it in all derived classes? What do you think? Another option I suppose would be to just move the mutex to the base class and have it always be accessed there. Not sure how often that would create a perf issue or not (maybe not really ever if using read/write mutex).
There was a problem hiding this comment.
Thanks. I will re-thinking about it. I currently prefer to place the cross_priority_host_map_ in derived classes. Because in addition to ThreadAwareLoadBalancerBase, no one else has to pay any performance overhead (or even a simple R/W Lock).
Signed-off-by: wbpcode <wbphub@live.com>
Signed-off-by: wbpcode <wbphub@live.com>
mattklein123
left a comment
There was a problem hiding this comment.
Thanks, this LGTM with a small doc comment. Nice work. Please double check that you have test coverage on all of the new functionality across all of the different LB types. (You can check the coverage report in CI.)
@snowp can you take a pass also?
/wait
| */ | ||
| virtual Network::TransportSocketOptionsConstSharedPtr upstreamTransportSocketOptions() const PURE; | ||
|
|
||
| using OverrideHostStatus = uint32_t; |
There was a problem hiding this comment.
This still needs documentation on how the bitfield works. It's not clear from reading this.
Signed-off-by: wbpcode <wbphub@live.com>
|
/retest |
|
Retrying Azure Pipelines: |
Signed-off-by: wbpcode <wbphub@live.com>
mattklein123
left a comment
There was a problem hiding this comment.
Thanks LGTM pending @snowp review.
snowp
left a comment
There was a problem hiding this comment.
Thanks this looks pretty good to me, just a few comments
| } | ||
|
|
||
| auto host_iter = host_map->find(override_host.value().first); | ||
| HostConstSharedPtr host = host_iter != host_map->end() ? host_iter->second : nullptr; |
There was a problem hiding this comment.
I think doing an early return here and breaking up L549 into two if checks would make it a bit easier to follow
| if (host_map == nullptr) { | ||
| return nullptr; | ||
| } | ||
|
|
There was a problem hiding this comment.
When would this be nullptr? Don't we always have a host_map?
There was a problem hiding this comment.
- Before the first priority update, it is nullptr.
- The host maps of all sub-LBs of subset lb are nullptr.
So I think keeping this check will make the code safer.
There was a problem hiding this comment.
Could we document this via a comment somewhere? Would be good to let people reading the code understand the need for a null host_map, thanks!
| choosePriority(uint64_t hash, const HealthyLoad& healthy_per_priority_load, | ||
| const DegradedLoad& degraded_per_priority_load); | ||
|
|
||
| HostConstSharedPtr chooseHost(LoadBalancerContext* context) override; |
There was a problem hiding this comment.
Just so I understand, why did you move this down in the class hierarchy? Thanks
There was a problem hiding this comment.
This question is a bit more complicated.
Both ThreadAwareLoadBalancerBase and ZoneAwareLoadBalancerBase need a cross priority host map. But the use of the two is completely different. The former needs to ensure that the host map is thread-safe, while the latter does not need to consider this issue at all.
This led me to finally decide to add a member of cross_priority_host_map_ to ThreadAwareLoadBalancerBase and ZoneAwareLoadBalancerBase separately, instead of adding a member of cross_priority_host_map_ to the common base class LoadBalancerBase of the two.
But this brings a new problem, that is, the chooseHost method in LoadBalancerBase needs to access cross_priority_host_map_ to search for override host.
Although I know that the base class can access the members of the derived class through virtual functions, but there is no need to be so complicated.
The chooseHost method in LoadBalancerBase is only meaningful to ZoneAwareLoadBalancerBase and ZoneAwareLoadBalancerBase's derived classes, so there is no need to implement it in LoadBalancerBase.
I finally moved chooseHost to ZoneAwareLoadBalancerBase, so that it will not affect the function, but also can directly access cross_priority_host_map_ in chooseHost, and it is simpler and cleaner.
There was a problem hiding this comment.
Thanks for explaining! This makes sense to me
| HostConstSharedPtr host; | ||
| if (host = LoadBalancerContextBase::selectOverrideHost(cross_priority_host_map_.get(), context); |
There was a problem hiding this comment.
I would either initialize host outside the if-block, or move the declaration inside. Right now I'm not seeing much benefit to this pattern
| static HostConstSharedPtr selectOverrideHost(const HostMap* host_map, | ||
| LoadBalancerContext* context); |
There was a problem hiding this comment.
I wonder if it makes sense to have this be on the LoadBalancerBase, it just seems a tad bit odd to have a static function on a class that accepts a pointer to said class. Not a big deal if you prefer it here
There was a problem hiding this comment.
I haven't found out a better position to place these util functions. I have no particular insistence on this, I think we can move it as needed in the future.
| priority_set_.cross_priority_host_map_ = host_map; | ||
| host_set_.runCallbacks({}, {}); | ||
|
|
||
| // Expected host is not exist in the host map and then `chooseHostOnce` will be called. |
There was a problem hiding this comment.
I'd do this as // The expected host does not exist in the host map, therefore ...
| priority_set_.cross_priority_host_map_ = host_map; | ||
| host_set_.runCallbacks({}, {}); | ||
|
|
||
| // Host status does not match the expected host status and then `chooseHostOnce` will be called. |
|
|
||
| TEST(LoadBalancerContextBaseTest, LoadBalancerContextBaseTest) { | ||
| { | ||
| auto context = LoadBalancerContextBase(); |
There was a problem hiding this comment.
LoadBalancerContextBase context; would be more idiomatic here
| static constexpr uint32_t DegradedStatus = 1u << static_cast<size_t>(Host::Health::Degraded); | ||
| static constexpr uint32_t HealthyStatus = 1u << static_cast<size_t>(Host::Health::Healthy); | ||
|
|
||
| LoadBalancerContext::OverrideHostStatus createOverrideHostStatus( |
There was a problem hiding this comment.
Maybe I'm missing something but this doesn't seem to be used?
There was a problem hiding this comment.
It is used by some new added test case.
{
Protobuf::RepeatedPtrField<envoy::config::core::v3::HealthStatus> statuses;
statuses.Add(envoy::config::core::v3::HealthStatus::UNKNOWN);
statuses.Add(envoy::config::core::v3::HealthStatus::HEALTHY);
EXPECT_EQ(createOverrideHostStatus(statuses), HealthyStatus);
}
There was a problem hiding this comment.
Those tests seem to be testing createOverrideHostStatus itself, can we just remove it nothing is actually using it?
Signed-off-by: wbpcode <wbphub@live.com>
|
/retest |
|
Retrying Azure Pipelines: |
Signed-off-by: wbpcode <wbphub@live.com>
…host (envoyproxy#17848) Signed-off-by: wbpcode <wbphub@live.com> Signed-off-by: code <wbpcode@users.noreply.github.com>
Introduce a new stateful session extension. This is a change after #17848 #17290. In #17290, we added a new cross-priority host map for fast host searching. In #17848, we extend `LoadBalancerContext` interface to provide an override host and to select the upstream host by the override host. Finally, in this change, we expand a new API to allow users to extract the state of the session from the request and change the result of load balancing by setting the override host value. Related doc: https://docs.google.com/document/d/1IU4b76AgOXijNa4sew1gfBfSiOMbZNiEt5Dhis8QpYg/edit?usp=sharing. Risk Level: Medium Testing: Added Docs Changes: Added Release Notes: Added Platform-Specific Features: N/A. Signed-off-by: wbpcode <wbphub@live.com>
Introduce a new stateful session extension. This is a change after envoyproxy#17848 envoyproxy#17290. In envoyproxy#17290, we added a new cross-priority host map for fast host searching. In envoyproxy#17848, we extend `LoadBalancerContext` interface to provide an override host and to select the upstream host by the override host. Finally, in this change, we expand a new API to allow users to extract the state of the session from the request and change the result of load balancing by setting the override host value. Related doc: https://docs.google.com/document/d/1IU4b76AgOXijNa4sew1gfBfSiOMbZNiEt5Dhis8QpYg/edit?usp=sharing. Risk Level: Medium Testing: Added Docs Changes: Added Release Notes: Added Platform-Specific Features: N/A. Signed-off-by: wbpcode <wbphub@live.com> Signed-off-by: Josh Perry <josh.perry@mx.com>
Signed-off-by: wbpcode wbphub@live.com
Commit Message: extend load balancer context interface to provide override host
Additional Description:
Continuation of #17290. Part of #16698. In the #17290, we implement a cross priority host map for fast host searching and add it to all types of Cluster. This PR extends load balancer context interface to provider override host. Then the load balancer can select host based on this override host directly and bypass algorithm selection.
Get more detailed design of stateful session sticky by ref https://docs.google.com/document/d/1IU4b76AgOXijNa4sew1gfBfSiOMbZNiEt5Dhis8QpYg/edit?usp=sharing
Risk Level: Mid.
Testing: Add.
Docs Changes: N/A.
Release Notes: N/A.
Platform Specific Features: N/A.