
storage: splits don't seem to take into account range size properly #21689

@jordanlewis

Description

I noticed this while playing with the tpcc 1000 warehouse dataset, but it's easy to reproduce with just 10 warehouses.

Under the tpcc data generation load, ranges get queued for splitting due to their size despite not actually being over the range size limit.

To observe this, simply run a 1-node cockroach cluster locally and run ./tpcc -load -warehouses=10. After a short while, you should notice that the cluster is performing a ton of splits on many different tables.

For example, after loading 10 warehouses, the stock table has 232 ranges. The first one is pretty large, containing more than a single 100,000-row warehouse. The second is of similar size. The third is fairly small, containing just under 18,000 rows, and the rest are very small, containing only a few thousand rows each. Here's a snippet of the ranges:

root@:26257/tpcc> show testing_ranges from table stock;
+-----------+-----------+----------+----------+--------------+
| Start Key |  End Key  | Range ID | Replicas | Lease Holder |
+-----------+-----------+----------+----------+--------------+
| NULL      | /1/699    |       59 | {1}      |            1 |
| /1/699    | /2/1384   |       61 | {1}      |            1 |
| /2/1384   | /2/18000  |       62 | {1}      |            1 |
| /2/18000  | /2/20000  |       63 | {1}      |            1 |
| /2/20000  | /2/23000  |       64 | {1}      |            1 |
| /2/23000  | /2/27000  |       65 | {1}      |            1 |
| /2/27000  | /2/30000  |       66 | {1}      |            1 |
| /2/30000  | /2/34000  |       67 | {1}      |            1 |
| /2/34000  | /2/36000  |       68 | {1}      |            1 |
| /2/36000  | /2/41000  |       69 | {1}      |            1 |
| /2/41000  | /2/43000  |       70 | {1}      |            1 |
<snip>
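As a back-of-the-envelope sanity check (my numbers, not taken from the cluster): if one of those ~3,000-row ranges genuinely exceeded the default 64 MiB max range size, each row would have to average over 20 KB, which is far larger than a tpcc stock row:

```go
package main

import "fmt"

// impliedBytesPerRow computes how large each row would have to be, on
// average, for a range with the given row count to exceed maxBytes.
func impliedBytesPerRow(maxBytes, rows int64) int64 {
	return maxBytes / rows
}

func main() {
	const maxRangeBytes = 64 << 20 // default max range size: 64 MiB
	const rowsInSmallRange = 3000  // typical row count of the tiny ranges above
	fmt.Println(impliedBytesPerRow(maxRangeBytes, rowsInSmallRange)) // 22369
}
```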

As a total split queue novice, I poked around and added some debug output:

diff --git a/pkg/storage/split_queue.go b/pkg/storage/split_queue.go
index 8d9e5f9be..e26eea045 100644
--- a/pkg/storage/split_queue.go
+++ b/pkg/storage/split_queue.go
@@ -20,6 +20,8 @@ import (

        "github.com/pkg/errors"

+       "fmt"
+
        "github.com/cockroachdb/cockroach/pkg/config"
        "github.com/cockroachdb/cockroach/pkg/gossip"
        "github.com/cockroachdb/cockroach/pkg/internal/client"
@@ -83,6 +85,7 @@ func (sq *splitQueue) shouldQueue(
        // Add priority based on the size of range compared to the max
        // size for the zone it's in.
        if ratio := float64(repl.GetMVCCStats().Total()) / float64(repl.GetMaxBytes()); ratio > 1 {
+               fmt.Println("Ratio: ", ratio, desc.RangeID, desc.StartKey, desc.EndKey, repl.GetMVCCStats().Total(), repl.GetMaxBytes())
                priority += ratio
                shouldQ = true
        }

This log line fires many times for a given range when a split happens, claiming that the result of Total() is in fact greater than 64 megabytes. This is empirically false: the rows in these small ranges are no larger, byte-wise, than those in the large ranges.
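For reference, the size check above boils down to a simple overshoot ratio. Here is a minimal standalone sketch of that decision, with plain int64 arguments standing in for the replica and stats types (my simplification, not the actual signature):

```go
package main

import "fmt"

// shouldQueueForSize mirrors the size check in splitQueue.shouldQueue: a
// range is queued for a split when its total MVCC byte count exceeds the
// zone's max range size, with the overshoot ratio used as the priority.
func shouldQueueForSize(totalBytes, maxBytes int64) (priority float64, shouldQ bool) {
	if ratio := float64(totalBytes) / float64(maxBytes); ratio > 1 {
		return ratio, true
	}
	return 0, false
}

func main() {
	// A 96 MiB range against the 64 MiB default queues with priority 1.5.
	p, q := shouldQueueForSize(96<<20, 64<<20)
	fmt.Println(p, q) // 1.5 true
}
```

So whenever these small ranges queue, Total() must be reporting more bytes than the data they actually hold.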

So my hypothesis is that something is overcounting range size. Since this behavior takes a while to kick in (a couple of warehouses), I'd guess there's an issue during range splits themselves that causes the size of the new range to be overcounted.
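To make that hypothesis concrete, here's a toy illustration (not CockroachDB code) of how a stats bookkeeping bug during a split could inflate totals: if the bytes attributed to the right-hand side aren't deducted from the left-hand side, the post-split halves claim more data than actually exists, and each subsequent split compounds the error:

```go
package main

import "fmt"

// rangeStats is a toy stand-in for MVCC stats: just a byte total.
type rangeStats struct{ totalBytes int64 }

// buggySplit models a split that computes the right-hand side's stats but
// forgets to deduct them from the left-hand side, double-counting rhsBytes.
func buggySplit(lhs rangeStats, rhsBytes int64) (rangeStats, rangeStats) {
	rhs := rangeStats{totalBytes: rhsBytes}
	// HYPOTHETICAL BUG: should be lhs.totalBytes -= rhsBytes here.
	return lhs, rhs
}

func main() {
	orig := rangeStats{totalBytes: 128 << 20} // 128 MiB of real data
	lhs, rhs := buggySplit(orig, 64<<20)      // split off 64 MiB
	// The two halves now claim 128 + 64 = 192 MiB between them.
	fmt.Println(lhs.totalBytes+rhs.totalBytes, orig.totalBytes)
}
```

An error of this shape would make freshly split ranges look oversized immediately, re-triggering the split queue on data that is actually tiny, which matches the cascade of small ranges above.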

cc @petermattis @tschottdorf as likely candidates for people who know about MVCC stats.
