
Feature shuffleshard balancer #14655

Closed

cgetzen wants to merge 9 commits into envoyproxy:master from cgetzen-forks:feature-shuffleshard-balancer


Conversation

@cgetzen commented Jan 11, 2021

Title: Shuffle Sharding to combinatorially isolate clients

Description:

Please read the initial post https://groups.google.com/g/envoy-dev/c/d6LyDJyqF58

Hi @mattklein123, thank you for the response! There's a lot to unpack there.
On the shuffle-shard math: You're right -- using shard_size = total_size/2 yields the worst-case scenario. However, choosing more reasonable shard sizes still results in many shards (which is kind of the point: isolation!). I think the following are more reasonable shard sizes:
30 choose 5 = 142,506 shards
50 choose 3 = 19,600 shards
Should envoy impose a limit to how many hosts a shard can contain?
If having ~10^4 - 10^5 shards in memory is reasonable, I will turn off cache eviction and potentially pre-calculate all shards.

How will shards be configured?
They are currently configured dynamically on chooseHost(..) when the shard doesn't exist in the cache.
The cache key is a combinatoric index (0 to nCk-1). The hosts in each shard are also derived from this index through a "combo(i, n, k)" method, which finds the ith combination of n choose k. Maybe there is a more performant method?

How will shards be chosen?
Currently, the cache index is calculated with route.RouteAction.HashPolicy, by taking context->computeHashKey() % num_shards.

So it seems like a 2 layer approach with a mechanism to create shards, a mechanism to hash to shards, and then a 2nd tier LB in each shard would work out nicely?
In full agreement.

Some other thoughts that may spark better ideas:

  • Currently, when a host goes unhealthy, callbacks are run on all shards. If there is some way to get the list of hosts with changed statuses, a prefix-tree may help reduce the number of callbacks.
  • When the total number of upstreams increases or decreases, I believe we need to invalidate all shards.
  • Each shard currently holds a vector<uint32_t>* which is the output of "combo(..)". This is because the predicate isn't able to hold onto iterator memory from instantiation to "callback(..)". This is bad and I'd like to make it better.

Signed-off-by: Charlie Getzen <charliegetzenlc@gmail.com>
@repokitteh-read-only

Hi @cgetzen, welcome and thank you for your contribution.

We will try to review your Pull Request as quickly as possible.

In the meantime, please take a look at the contribution guidelines if you have not done so already.

🐱

Caused by: #14655 was opened by cgetzen.

see: more, trace.

@repokitteh-read-only bot added the api and deps (Approval required for changes to Envoy's external dependencies) labels on Jan 11, 2021
@repokitteh-read-only

CC @envoyproxy/api-shepherds: Your approval is needed for changes made to api/envoy/.
API shepherd assignee is @markdroth
CC @envoyproxy/api-watchers: FYI only for changes made to api/envoy/.
CC @envoyproxy/dependency-shepherds: Your approval is needed for changes made to (bazel/.*repos.*\.bzl)|(bazel/dependency_imports\.bzl)|(api/bazel/.*\.bzl)|(.*/requirements\.txt)|(.*\.patch).

🐱

Caused by: #14655 was opened by cgetzen.

see: more, trace.

@mattklein123 (Member) commented Jan 12, 2021

@cgetzen can we please move this to an issue with a proposed design that we can iterate on before we go to code? From there it might be more efficient to also iterate in a gdoc as part of the issue (up to you where you want to start). A few quick comments:

If having ~10^4 - 10^5 shards in memory is reasonable, I will turn off cache eviction and potentially pre-calculate all shards.

I don't see a huge issue here (though as always it depends on the memory <-> CPU/locking/complexity trade-off), and I would definitely do pre-computation if possible in the MVP, so I would probably make the max shards / hosts per shard configurable so the user can clearly understand what the memory implications are. Also, if you implement this as a thread-aware load balancer, I think you can do the pre-computation once on the main thread and then share the LB on all the other threads like Maglev/ring do.

How will shards be configured? Maybe there is a more performant method?

My preference would be to do away with caching, etc. and do pre-computation, unless we show this is unreasonable. Also, I have general questions on whether the choosing has to be perfect or it could be probabilistically close enough. Imagine for example the following theoretical implementations:

  1. If we don't need stable consistent hashing properties, I think everything could be implemented inline on the fly: a) compute shard hash, b) derive shard ID, c) use shard ID as a pseudo-random seed to generate the picks for members, d) use real random to pick the host index inside the virtual shard.
  2. If we need more consistency, use a giant Maglev table to map hash into shard ID. Then proceed to b) above. This would provide consistency but random LB within a shard. Edit: For shard member consistency I think you probably then want a 2nd Maglev of just the hosts, and then use the shard ID as a seed into a hash function that would pick hosts, dealing with duplicates.
  3. Maglev + inner LBs that track state. This would give better LB but use more memory.
  4. Some type of caching, on the fly implementation as you listed out above.

All of ^ have trade-offs. cc @tonya11en also who is into this kind of thing. Anyway, I would suggest we move to an issue and writeup a doc to look at this in more detail. Interesting stuff!

@mattklein123 (Member)

FWIW I just looked at the AWS reference code and I think https://github.com/awslabs/route53-infima/blob/master/src/main/java/com/amazonaws/services/route53/infima/SimpleSignatureShuffleSharder.java is basically my (1) above in spirit. But like I said there are a lot of trade-offs here depending on requirements.

@mattklein123 (Member)

Going to close this for now in favor of the issue and design doc.
