Skip to content

extend load balancer context interface to provide primary host#17242

Closed
wbpcode wants to merge 3 commits intoenvoyproxy:mainfrom
wbpcode:stateful-session-sticky
Closed

extend load balancer context interface to provide primary host#17242
wbpcode wants to merge 3 commits intoenvoyproxy:mainfrom
wbpcode:stateful-session-sticky

Conversation

@wbpcode
Copy link
Copy Markdown
Member

@wbpcode wbpcode commented Jul 6, 2021

Commit Message: extend load balancer context interface to provide primary host

Additional Description:

Extend load balancer context interface to provide primary host and allow the load balancer to skip the load balancing calculation and directly select the eligible primary host.

This is one of the most critical tasks related to #16698. After completing the work, we can complete #16698 by extending the specific implementation of LocalBalancerContext.

Risk Level: Medium.
Testing: Added.
Docs Changes: N/A.
Release Notes: N/A.

Signed-off-by: wbpcode <wbphub@live.com>
@mattklein123 mattklein123 self-assigned this Jul 6, 2021
Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution. This is much too large of a change to regression prone components to not have more information attached to it. Please provide more information about the overall design, optimally in the form of a short google doc. Thank you!

/wait

@wbpcode
Copy link
Copy Markdown
Member Author

wbpcode commented Jul 7, 2021

Thanks. @mattklein123
Here is a document about the background of the entire issue and the implementation design.

https://docs.google.com/document/d/1IU4b76AgOXijNa4sew1gfBfSiOMbZNiEt5Dhis8QpYg/edit?usp=sharing

In fact, this document already exists in #16698. This time I just added some specific implementation designs. I'm not a native speaker. So if the documentation is unclear, please feel free to point it.

Signed-off-by: wbpcode <wbphub@live.com>
@wbpcode
Copy link
Copy Markdown
Member Author

wbpcode commented Jul 8, 2021

/retest

@repokitteh-read-only
Copy link
Copy Markdown

Retrying Azure Pipelines:
Retried failed jobs in: envoy-presubmit

🐱

Caused by: a #17242 (comment) was created by @wbpcode.

see: more, trace.

@rgs1
Copy link
Copy Markdown
Member

rgs1 commented Jul 8, 2021

Sorry to jump late into this discussion, but it looks to me that the use case described in #16698 can be achieved with the Subset LB and the header-to-metadata filter (that's pretty much how we do something very similar). Happy to discuss this more via slack or in #16698.

@wbpcode
Copy link
Copy Markdown
Member Author

wbpcode commented Jul 8, 2021

Sorry to jump late into this discussion, but it looks to me that the use case described in #16698 can be achieved with the Subset LB and the header-to-metadata filter (that's pretty much how we do something very similar). Happy to discuss this more via slack or in #16698.

Thanks. If there is a suitable plan to achieve this demand, I will be happy to accept it. I think we can talk in detail on slack.

Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this. At a high level I agree with the intention, but I would like to simplify and unify the implementation.

Let's start by doing a PR to only add the all_host_map into the central priority set and get it propagating. After that we can discuss the various LB changes in other PRs.

/wait

* health status of the host matches the expectation, the load balancer can bypass the load
* balancing algorithm and return the corresponding host directly.
*/
virtual absl::optional<ExpectedHost> primaryHostShouldSelected() const PURE;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would call this function overrideHostToSelect(). I think that will be more clear. "primary" host I don't think is very helpful. Can we update this?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

of course. 👌

Comment on lines +505 to +506
* string. If the PrioritySet does not provide the fast searching of host, this method returns
* nullptr.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it makes sense to only make this work for EDS. We should make this work for all cluster types if we are going to do this. See my other comment below.

std::vector<LocalityWeightsMap> locality_weights_map_;
HostMap all_hosts_;

HostMapSharedPtr all_host_map_;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of having this be in the EDS cluster only, I would recommend moving this into the cluster base class, or directly in the priority set. This way it will be used and populated by all cluster types (per my other comment). Additionally, if we do this, I think we can remove all of the special case code you added in cluster manager impl. It's just more data that should be updated and posted on updates. I don't fully see the value of all that new code.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mattklein123
I added the postThreadLocalHostMapUpdate method to the clusterManager to ensure that all_host_map_ in the EDS Cluster can be synchronized to each worker thread after the update. When batchUpdate is used, postThreadLocalClusterUpdate cannot do this.
If you maintain all_host_map_ directly in PrioritySet, it is indeed possible to remove these codes. But it may bring some frequent HostMap copy problems (I'm not entirely sure).
But I agree that we can implement a general all_host_map first, and then continue. So we can continue to discuss this issue in the new PR.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO it won't be too costly to maintain a map in the priority set and then update it when we update everything else. Let's try that first and optimize later if needed?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I will prepare a new PR as soon as possible this week. 👌

Copy link
Copy Markdown
Member Author

@wbpcode wbpcode Jul 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mattklein123 Here is a new sub PR #17290

@github-actions
Copy link
Copy Markdown

This pull request has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

@github-actions github-actions bot added the stale stalebot believes this issue/PR has not been touched recently label Aug 11, 2021
@github-actions
Copy link
Copy Markdown

This pull request has been automatically closed because it has not had activity in the last 37 days. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

@github-actions github-actions bot closed this Aug 18, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

stale stalebot believes this issue/PR has not been touched recently waiting

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants