Skip to content

Maglev Shuffle Shard#14804

Closed
cgetzen wants to merge 3 commits intoenvoyproxy:mainfrom
cgetzen-forks:feature-shuffleshard-maglev
Closed

Maglev Shuffle Shard#14804
cgetzen wants to merge 3 commits intoenvoyproxy:mainfrom
cgetzen-forks:feature-shuffleshard-maglev

Conversation

@cgetzen
Copy link
Copy Markdown

@cgetzen cgetzen commented Jan 23, 2021

Draft

Issue: #14663

This implements option 2 of the google doc.

Some choices that were made:

  • Shard generation could be done with [hash, hash+1, ...], or seeding random with (hash % table_size) and using [hash, hash+rand(), ...] or seeding random with hash and using [hash, hash+rand(), ...]. This implementation uses the third option.
  • Turned off priority_set callbacks when shuffle sharding is in place. Without turning this off, a client can make their shard unhealthy and slowly consume all backends. Are there other implications to turning this off?
  • No shard state. This limits the inner LB method to random or least requests, and limits host weight to only the maglev table.

@repokitteh-read-only
Copy link
Copy Markdown

Hi @cgetzen, welcome and thank you for your contribution.

We will try to review your Pull Request as quickly as possible.

In the meantime, please take a look at the contribution guidelines if you have not done so already.

🐱

Caused by: #14804 was opened by cgetzen.

see: more, trace.

@repokitteh-read-only
Copy link
Copy Markdown

CC @envoyproxy/api-shepherds: Your approval is needed for changes made to api/envoy/.
API shepherd assignee is @markdroth
CC @envoyproxy/api-watchers: FYI only for changes made to api/envoy/.

🐱

Caused by: #14804 was opened by cgetzen.

see: more, trace.

Signed-off-by: Charlie Getzen <charliegetzenlc@gmail.com>
Signed-off-by: Charlie Getzen <charliegetzenlc@gmail.com>
@cgetzen cgetzen force-pushed the feature-shuffleshard-maglev branch 4 times, most recently from 161727d to 1bbde02 Compare January 25, 2021 23:48
Signed-off-by: Charlie Getzen <charliegetzenlc@gmail.com>
@cgetzen cgetzen force-pushed the feature-shuffleshard-maglev branch from 1bbde02 to 0e2e99c Compare January 26, 2021 01:13
@cgetzen
Copy link
Copy Markdown
Author

cgetzen commented Jan 26, 2021

Hi @mattklein123 , want to make you aware of this one. If it gets a tentative 👍 I will work on tests and cleanup.

@mattklein123 mattklein123 self-assigned this Jan 26, 2021
@mattklein123 mattklein123 added the no stalebot Disables stalebot from closing an issue label Jan 26, 2021
@mattklein123
Copy link
Copy Markdown
Member

FYI I'm on paternity leave. I will look but it will be delayed. Cc @wgallagher who might be able to look.

Copy link
Copy Markdown
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this. A few comments to get started.

/wait

option (udpa.annotations.versioning).previous_message_type =
"envoy.api.v2.Cluster.CommonLbConfig.ConsistentHashingLbConfig";

enum LbPolicy {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Starting out with some API comments.

  1. We should implement an entirely new load balancer for this as an extension. This will mean actually implementing this enum type which is not yet implemented:

    // [#not-implemented-hide:] Use the new :ref:`load_balancing_policy
    // <envoy_api_field_config.cluster.v3.Cluster.load_balancing_policy>` field to determine the LB policy.
    // [#next-major-version: In the v3 API, we should consider deprecating the lb_policy field
    // and instead using the new load_balancing_policy field as the one and only mechanism for
    // configuring this.]
    LOAD_BALANCING_POLICY_CONFIG = 7;
    (This will mean also resolving Load balancer extensibility #5598)

  2. Once we have a new extension LB for shuffle sharding, we can have an independent configuration for that load balancer. To implement I recommend sharing the maglev code as a base class in both load balances, but not substantially altering the existing maglev/ring hash code.

@cgetzen
Copy link
Copy Markdown
Author

cgetzen commented Feb 14, 2021

Thanks for the feedback :)
After talking with Bill, we're thinking that including dimensions (e.g. ability to balance over availability zones) will be harder with this implementation, because we'd need a maglev table for each dimension.

@mattklein123 mattklein123 removed the no stalebot Disables stalebot from closing an issue label Mar 14, 2021
@github-actions
Copy link
Copy Markdown

This pull request has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

@github-actions github-actions bot added the stale stalebot believes this issue/PR has not been touched recently label Apr 13, 2021
@github-actions
Copy link
Copy Markdown

This pull request has been automatically closed because it has not had activity in the last 37 days. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

@github-actions github-actions bot closed this Apr 20, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

api stale stalebot believes this issue/PR has not been touched recently waiting

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants