Feature Request: configurable number of digits in generated shard ranges #15744

@maxenglander

Feature Description

key.GenerateShardRanges uses 2 hex digits in shard ranges when there are 256 or fewer shards.

vitess/go/vt/key/key.go

Lines 369 to 384 in f11de06

func GenerateShardRanges(shards int) ([]string, error) {
	var format string
	var maxShards int
	switch {
	case shards <= 0:
		return nil, errors.New("shards must be greater than zero")
	case shards <= 256:
		format = "%02x"
		maxShards = 256
	case shards <= 65536:
		format = "%04x"
		maxShards = 65536
	default:
		return nil, errors.New("this function does not support more than 65336 shards in a single keyspace")
	}

Add a new parameter to this function, or add a new function, that lets users configure the number of hex digits used in generated shard ranges.

If this FR is accepted, a follow-up FR will plumb this configurability into planetscale/vitess-operator.

Use Case(s)

When the number of shards is not a power of 2, the keyspace cannot be divided evenly, so there is a degree of "lossiness": the last shard is larger than it should be, and the rest are smaller than they should be.

vitess/go/vt/key/key.go

Lines 406 to 420 in f11de06

// If shards does not divide evenly into maxShards, then there is some lossiness,
// where each shard is smaller than it should technically be (if, for example, size == 25.6).
// If we choose to keep everything in ints, then we have two choices:
// - Have every shard in #numshards be a uniform size, tack on an additional shard
// at the end of the range to account for the loss. This is bad because if you ask for
// 7 shards, you'll actually get 7 uniform shards with 1 small shard, for 8 total shards.
// It's also bad because one shard will have much different data distribution than the rest.
// - Expand the final shard to include whatever is left in the keyrange. This will give the
// correct number of shards, which is good, but depending on how lossy each individual shard is,
// you could end with that final shard being significantly larger than the rest of the shards,
// so this doesn't solve the data distribution problem.
//
// By tracking the "real" end (both in the real number sense, and in the truthfulness of the value sense),
// we can re-truncate the integer end on each iteration, which spreads the lossiness more
// evenly across the shards.

Allowing users to configure the number of hex digits in their shard ranges would let them control the degree of lossiness: more digits give finer-grained boundaries, so each shard's size lands closer to the ideal.

For example, generating a 5-shard cluster with 2-digit shard ranges results in: -33 33-66 66-99 99-cc cc-.

If we increase the range width to 4 digits for the same 5-shard cluster, we end up with: -3333 3333-6666 6666-9999 9999-cccc cccc-, which is less lossy.
