
Feature shuffleshard balancer #14655

Closed

cgetzen wants to merge 9 commits into envoyproxy:master from cgetzen-forks:feature-shuffleshard-balancer


Conversation

@cgetzen commented Jan 11, 2021

Title: Shuffle Sharding to combinatorially isolate clients

Description:

Please read the initial post https://groups.google.com/g/envoy-dev/c/d6LyDJyqF58

Hi @mattklein123, thank you for the response! There's a lot to unpack there.
On the shuffle-shard math: You're right -- using shard_size = total_size/2 yields the worst-case scenario. However, choosing more reasonable shard sizes still results in many shards (which is kind of the point: isolation!). I think the following are more reasonable shard sizes:
30 choose 5 = 142,506 shards
50 choose 3 = 19,600 shards
Should envoy impose a limit to how many hosts a shard can contain?
If having ~10^4 - 10^5 shards in memory is reasonable, I will turn off cache eviction and potentially pre-calculate all shards.

How will shards be configured?
They are currently configured dynamically on chooseHost(..) when the shard doesn't exist in the cache.
The cache key is a combinatoric index (0 to nCk-1). The hosts in each shard are also derived from this index through a "combo(i, n, k)" method, which finds the ith combination of n choose k. Maybe there is a more performant method?

How will shards be chosen?
Currently, the cache index is calculated with route.RouteAction.HashPolicy, by taking context->computeHashKey() % num_shards.

So it seems like a 2 layer approach with a mechanism to create shards, a mechanism to hash to shards, and then a 2nd tier LB in each shard would work out nicely?
In full agreement.

Some other thoughts that may spark better ideas:

  • Currently, when a host goes unhealthy, callbacks are run on all shards. If there is some way to get the list of hosts with changed statuses, a prefix-tree may help reduce the number of callbacks.
  • When the total number of upstreams increases or decreases, I believe we need to invalidate all shards.
  • Each shard currently holds a vector<uint32_t>* which is the output of "combo(..)". This is because the predicate isn't able to hold onto iterator memory from instantiation to "callback(..)". This is bad and I'd like to make it better.

Signed-off-by: Charlie Getzen <charliegetzenlc@gmail.com>
@repokitteh-read-only

Hi @cgetzen, welcome and thank you for your contribution.

We will try to review your Pull Request as quickly as possible.

In the meantime, please take a look at the contribution guidelines if you have not done so already.

🐱

Caused by: #14655 was opened by cgetzen.

see: more, trace.

@repokitteh-read-only bot added the api and deps (Approval required for changes to Envoy's external dependencies) labels on Jan 11, 2021
@repokitteh-read-only

CC @envoyproxy/api-shepherds: Your approval is needed for changes made to api/envoy/.
API shepherd assignee is @markdroth
CC @envoyproxy/api-watchers: FYI only for changes made to api/envoy/.
CC @envoyproxy/dependency-shepherds: Your approval is needed for changes made to (bazel/.*repos.*\.bzl)|(bazel/dependency_imports\.bzl)|(api/bazel/.*\.bzl)|(.*/requirements\.txt)|(.*\.patch).

🐱

Caused by: #14655 was opened by cgetzen.

see: more, trace.

@mattklein123 (Member) commented Jan 12, 2021

@cgetzen can we please move this to an issue with a proposed design that we can iterate on before we go to code? From there it might be more efficient to also iterate in a gdoc as part of the issue (up to you where you want to start). A few quick comments:

If having ~10^4 - 10^5 shards in memory is reasonable, I will turn off cache eviction and potentially pre-calculate all shards.

I don't see a huge issue here (though as always it depends on the memory <-> CPU/locking/complexity trade-off), and I would definitely do pre-computation if possible in the MVP, so I would probably make the max shards / hosts per shard configurable so the user can clearly understand what the memory implications are. Also, if you implement this as a thread-aware load balancer, I think you can do the pre-computation once on the main thread and then share the LB on all the other threads like Maglev/ring do.

How will shards be configured? Maybe there is a more performant method?

My preference would be to do away with caching, etc. and do pre-computation, unless we show this is unreasonable. Also, I have general questions on whether the choosing has to be perfect or it could be probabilistically close enough. Imagine for example the following theoretical implementations:

  1. If we don't need stable consistent hashing properties, I think everything could be implemented inline on the fly: a) compute shard hash, b) derive shard ID, c) use shard ID as a pseudo-random seed to generate the picks for members, d) use real random to pick the host index inside the virtual shard.
  2. If we need more consistency, use a giant Maglev table to map hash into shard ID. Then proceed to b) above. This would provide consistency but random LB within a shard. Edit: For shard member consistency I think you probably then want a 2nd Maglev of just the hosts, and then use the shard ID as a seed into a hash function that would pick hosts, dealing with duplicates.
  3. Maglev + inner LBs that track state. This would give better LB but use more memory.
  4. Some type of caching, on the fly implementation as you listed out above.

All of ^ have trade-offs. cc @tonya11en also who is into this kind of thing. Anyway, I would suggest we move to an issue and writeup a doc to look at this in more detail. Interesting stuff!

@mattklein123 (Member)

FWIW I just looked at the AWS reference code and I think https://github.com/awslabs/route53-infima/blob/master/src/main/java/com/amazonaws/services/route53/infima/SimpleSignatureShuffleSharder.java is basically my (1) above in spirit. But like I said there are a lot of trade-offs here depending on requirements.

@mattklein123 (Member)

Going to close this for now in favor of the issue and design doc.
