storage: Delayed update of per-store write stats can cause rebalance thrashing #17970

@a-robinson

After a rebalance, the LogicalBytes and WritesPerSecond stats change for the store that the replica was added to or removed from. If LogicalBytes changes by a large amount, the store re-gossips its StoreCapacity ahead of schedule. That isn't guaranteed to propagate in time, but in practice it usually spreads the updated information through the cluster quickly enough that a bunch more rebalance operations aren't based on outdated information.
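A minimal sketch of the asymmetry described above, assuming a simple fractional-change threshold. `StoreCapacity` here is a pared-down stand-in, and `regossipFraction`, `changedSignificantly`, and `shouldRegossipEarly` are hypothetical names with a made-up threshold, not the actual store code. The point is only that LogicalBytes can trigger an early gossip while a WritesPerSecond jump on its own waits for the next periodic one.

```go
package main

import "math"

// StoreCapacity is a hypothetical, pared-down stand-in for the capacity
// fields a store gossips; the real descriptor carries many more fields.
type StoreCapacity struct {
	LogicalBytes    float64
	WritesPerSecond float64
}

// regossipFraction is a hypothetical threshold: a stat that has moved by more
// than this fraction of its last-gossiped value counts as a large change.
const regossipFraction = 0.05

// changedSignificantly reports whether cur differs from last by more than
// regossipFraction of last. A zero baseline counts as changed if cur != 0.
func changedSignificantly(last, cur float64) bool {
	if last == 0 {
		return cur != 0
	}
	return math.Abs(cur-last)/math.Abs(last) > regossipFraction
}

// shouldRegossipEarly models the behavior described in this issue: only a
// large LogicalBytes change triggers an out-of-schedule gossip, so a large
// WritesPerSecond change alone is not propagated early.
func shouldRegossipEarly(lastGossiped, current StoreCapacity) bool {
	return changedSignificantly(lastGossiped.LogicalBytes, current.LogicalBytes)
}
```

For example, going from 1000 to 1200 logical bytes triggers an early gossip, while a tenfold jump in WritesPerSecond with flat LogicalBytes does not.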

The same is not true for WritesPerSecond, though. We don't start counting a replica's writes toward WritesPerSecond until it has been on its new store for 5 seconds, so many more rebalancing decisions can be made without accounting for the additional writes on that store. In some circumstances (as seen on indigo), this causes rebalance thrashing: we add a replica to a node, then decide that the replica isn't a good fit for it. If the store's WritesPerSecond stat had been updated, we wouldn't have moved the replica.

As seen on indigo, this was closely intertwined with #17971.

@BramGruneir has previously suggested passing WritesPerSecond stats along as part of a rebalance to combat this. I'm still not sure about that, but the leaseholder that made the rebalance decision should be able to update its own local copy of the affected store's descriptor, so that it has a more accurate view when deciding which replica to remove as part of the rebalance.
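A sketch of the local-update idea, assuming the leaseholder keeps a map of the store descriptors it last received via gossip. `StoreDesc`, `LocalStoreView`, `RecordAdd`, and `PickReplicaToRemove` are hypothetical, and the "remove the busiest store" rule is a deliberately simplified stand-in for the real allocator's scoring. It only illustrates crediting the target store with the range's load immediately instead of waiting for that store to re-gossip.

```go
package main

// StoreDesc is a hypothetical, pared-down copy of a gossiped store descriptor.
type StoreDesc struct {
	LogicalBytes    float64
	WritesPerSecond float64
}

// LocalStoreView is a hypothetical leaseholder-local cache of the store
// descriptors it last received via gossip, keyed by store ID.
type LocalStoreView map[int]StoreDesc

// RecordAdd optimistically credits the range's bytes and writes to the store
// that just received a replica, without waiting for that store to re-gossip.
func (v LocalStoreView) RecordAdd(storeID int, rangeBytes, rangeWPS float64) {
	d := v[storeID] // zero value if the store is not in the view yet
	d.LogicalBytes += rangeBytes
	d.WritesPerSecond += rangeWPS
	v[storeID] = d
}

// PickReplicaToRemove returns the store, among those holding a replica, with
// the highest WritesPerSecond in the local view, or -1 if replicaStores is
// empty. A real allocator weighs many more signals; this simplified rule only
// shows why the view needs to be current.
func (v LocalStoreView) PickReplicaToRemove(replicaStores []int) int {
	best := -1
	bestWPS := -1.0
	for _, id := range replicaStores {
		if w := v[id].WritesPerSecond; w > bestWPS {
			best, bestWPS = id, w
		}
	}
	return best
}
```

If store 3 just received a replica of a range doing 50 writes/sec, the stale view still shows it at 10 writes/sec and it looks like the least-loaded holder. After `RecordAdd(3, ...)` it shows 60 and becomes the removal candidate, which is the outcome the stale stats were hiding.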
