stability: real-world testing of proposal quota #8659

@petermattis

From discussion spawned out of #8639: we need to introduce a flow control mechanism for admitting write operations to replicas. If writes are applied to a replica fast enough, the Raft log might grow faster than we can generate and apply a snapshot. If that happens, we get a loop of continuous snapshot generation and application, which is a drain on the system (and, in effect, throttles all writes). Adjusting the Raft log truncation heuristics (again) is not sufficient, as applying a sufficiently large chunk of Raft log entries is slower than using a snapshot.

One idea for a flow control mechanism is to throttle incoming write operations based on the size of the Raft log. A small Raft log indicates that the replicas are all keeping up. As the Raft log grows closer to its target max size (currently the replica size), we would want to throttle writes. I haven't thought of a specific heuristic to use, but am thinking we'd want something that incorporates the excess "Raft log capacity" (the delta between the current Raft log size and its target max size).
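
To make the "excess capacity" idea concrete, here is a minimal sketch of one possible heuristic. It is an illustration only, not a proposal for the actual implementation: the function name `writeDelay`, the half-capacity threshold, the linear ramp, and the sizes used in `main` are all assumptions made for this example.

```go
package main

import (
	"fmt"
	"time"
)

// writeDelay is a hypothetical heuristic: it returns how long to delay an
// incoming write, given the current Raft log size and its target max size
// (both in bytes). Writes are not delayed while at least half of the target
// size is still free; below that, the delay ramps linearly up to maxDelay
// as the excess capacity shrinks to zero.
func writeDelay(logSize, targetMax int64, maxDelay time.Duration) time.Duration {
	if targetMax <= 0 {
		return 0 // no target configured, so nothing to throttle against
	}
	// Excess "Raft log capacity": delta between target max and current size.
	capacity := targetMax - logSize
	threshold := targetMax / 2 // assumed point at which throttling starts
	if capacity >= threshold {
		return 0 // replicas appear to be keeping up
	}
	if capacity <= 0 {
		return maxDelay // log is at or past its target size
	}
	used := threshold - capacity
	return time.Duration(float64(maxDelay) * float64(used) / float64(threshold))
}

func main() {
	const targetMax = int64(64 << 20) // illustrative target size only
	for _, logSize := range []int64{0, 32 << 20, 48 << 20, 64 << 20} {
		d := writeDelay(logSize, targetMax, 100*time.Millisecond)
		fmt.Printf("log=%d bytes delay=%v\n", logSize, d)
	}
}
```

A linear ramp is just the simplest shape to reason about. Any function that is zero while the log is small and grows as the capacity approaches zero would fit the description above equally well.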

Cc @cockroachdb/stability
