Feature Request: configurable number of digits in generated shard ranges #15744
Feature Description
key.GenerateShardRanges uses 2 hex digits in shard ranges when there are 256 or fewer shards.
Lines 369 to 384 in f11de06
    func GenerateShardRanges(shards int) ([]string, error) {
    	var format string
    	var maxShards int
    	switch {
    	case shards <= 0:
    		return nil, errors.New("shards must be greater than zero")
    	case shards <= 256:
    		format = "%02x"
    		maxShards = 256
    	case shards <= 65536:
    		format = "%04x"
    		maxShards = 65536
    	default:
    		return nil, errors.New("this function does not support more than 65336 shards in a single keyspace")
    	}
Add a new parameter to this function, or a new function, that allows users to configure the number of hex digits used in generated shard ranges.
If this FR is accepted, a follow-up FR will propose plumbing this configurability into planetscale/vitess-operator.
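To make the request concrete, here is a minimal sketch of what such a function could look like. `GenerateShardRangesWithDigits` and its `digits` parameter are hypothetical names, not part of Vitess, and the sketch assumes whole-byte widths of 2, 4, 6, or 8 hex digits. It uses integer floor division to place each boundary, which mirrors the idea of re-truncating the "real" end on each iteration but is not the existing implementation.

```go
package main

import (
	"errors"
	"fmt"
)

// GenerateShardRangesWithDigits is a hypothetical variant of
// key.GenerateShardRanges that takes the number of hex digits as input.
// It is an illustration only and does not exist in Vitess.
func GenerateShardRangesWithDigits(shards, digits int) ([]string, error) {
	if shards <= 0 {
		return nil, errors.New("shards must be greater than zero")
	}
	// Assumption: only whole bytes, up to 4 bytes (8 hex digits).
	if digits < 2 || digits > 8 || digits%2 != 0 {
		return nil, errors.New("digits must be 2, 4, 6, or 8")
	}
	maxShards := uint64(1) << (4 * uint(digits)) // 16^digits
	if uint64(shards) > maxShards {
		return nil, fmt.Errorf("%d hex digits support at most %d shards", digits, maxShards)
	}

	ranges := make([]string, 0, shards)
	start := "" // the first shard has an open lower bound
	for i := 1; i < shards; i++ {
		// Floor of the ideal boundary i*maxShards/shards, zero-padded to `digits`.
		end := fmt.Sprintf("%0*x", digits, uint64(i)*maxShards/uint64(shards))
		ranges = append(ranges, start+"-"+end)
		start = end
	}
	// The last shard has an open upper bound.
	return append(ranges, start+"-"), nil
}

func main() {
	for _, digits := range []int{2, 4} {
		ranges, err := GenerateShardRangesWithDigits(5, digits)
		if err != nil {
			panic(err)
		}
		fmt.Println(digits, ranges)
	}
}
```

For 5 shards this yields the 2-digit ranges `-33 33-66 66-99 99-cc cc-` and the 4-digit ranges `-3333 3333-6666 6666-9999 9999-cccc cccc-`. A separate function, rather than a new parameter on the existing one, would leave current callers untouched.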
Use Case(s)
When the number of shards is not a power of 2, there is a degree of "lossiness": the last shard is larger than it should be, and the rest are smaller than they should be.
Lines 406 to 420 in f11de06
    // If shards does not divide evenly into maxShards, then there is some lossiness,
    // where each shard is smaller than it should technically be (if, for example, size == 25.6).
    // If we choose to keep everything in ints, then we have two choices:
    // 	- Have every shard in #numshards be a uniform size, tack on an additional shard
    //	  at the end of the range to account for the loss. This is bad because if you ask for
    //	  7 shards, you'll actually get 7 uniform shards with 1 small shard, for 8 total shards.
    //	  It's also bad because one shard will have much different data distribution than the rest.
    //	- Expand the final shard to include whatever is left in the keyrange. This will give the
    //	  correct number of shards, which is good, but depending on how lossy each individual shard is,
    //	  you could end with that final shard being significantly larger than the rest of the shards,
    //	  so this doesn't solve the data distribution problem.
    //
    // By tracking the "real" end (both in the real number sense, and in the truthfulness of the value sense),
    // we can re-truncate the integer end on each iteration, which spreads the lossiness more
    // evenly across the shards.
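Letting users configure the number of hex digits in their shard ranges lets them control the degree of lossiness.
For example, generating a 5-shard cluster with 2-digit shard ranges results in: -33 33-66 66-99 99-cc cc-. The first four shards each cover 0x33 = 51 of the 256 possible values, while the last covers 52, about 2% more.
If we increase the shard digits for the same 5-shard cluster, we end up with: -3333 3333-6666 6666-9999 9999-cccc cccc-, which is less lossy: the last shard covers 13108 of 65536 values against 13107 for the others, a difference of under 0.01%.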
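The effect of the digit count on lossiness can be checked with a few lines of arithmetic. The sketch below is illustrative only: `shardWidths` and `imbalance` are hypothetical helpers, boundaries are assumed to be the floor of `i * 16^digits / shards`, and inputs are not validated (it assumes `shards > 0` and at most 8 digits).

```go
package main

import "fmt"

// shardWidths returns how many values of the 16^digits key space each shard
// covers when boundary i is floor(i * 16^digits / shards).
// Hypothetical helper; assumes shards > 0 and digits <= 8.
func shardWidths(shards, digits int) []uint64 {
	keyspace := uint64(1) << (4 * uint(digits))
	widths := make([]uint64, 0, shards)
	var start uint64
	for i := 1; i <= shards; i++ {
		end := uint64(i) * keyspace / uint64(shards)
		widths = append(widths, end-start)
		start = end
	}
	return widths
}

// imbalance returns (largest width - smallest width) divided by the ideal
// width, as a rough measure of how uneven the shards are.
func imbalance(shards, digits int) float64 {
	widths := shardWidths(shards, digits)
	lo, hi := widths[0], widths[0]
	for _, w := range widths[1:] {
		if w < lo {
			lo = w
		}
		if w > hi {
			hi = w
		}
	}
	ideal := float64(uint64(1)<<(4*uint(digits))) / float64(shards)
	return float64(hi-lo) / ideal
}

func main() {
	for _, digits := range []int{2, 4} {
		fmt.Printf("digits=%d widths=%v imbalance=%.4f%%\n",
			digits, shardWidths(5, digits), 100*imbalance(5, digits))
	}
}
```

With 5 shards, the widths are 51, 51, 51, 51, 52 for 2 digits and 13107, 13107, 13107, 13107, 13108 for 4 digits, so the relative imbalance shrinks as digits are added.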