
[dnm] kvserver: subject MsgApp to admission control#81289

Closed
tbg wants to merge 7 commits into cockroachdb:master from tbg:poc-follower-write-admission-control

Conversation

@tbg
Member

@tbg tbg commented May 16, 2022

Context: #79215 (comment)

  • block on raft receive queue (first commit)
  • block in handleRaftReady (second commit)
  • treat followers with high r-amp as probing (TBD)
  • delay ready handling until admission control passed (TBD)

Release note: None

@cockroach-teamcity
Member

This change is Reviewable

@tbg tbg force-pushed the poc-follower-write-admission-control branch 2 times, most recently from 1db3037 to 1f46dd0 Compare May 16, 2022 10:27
@tbg
Member Author

tbg commented May 16, 2022

First experiment: Naively backpressuring on the incoming raft stream via local admission control in https://github.com/cockroachdb/cockroach/blob/1f46dd08d8ad3fa82038abb6852a5920f0f93305/pkg/kv/kvserver/store_raft.go#L299-L301. This wreaked havoc immediately (including unavailable liveness, I think) since the liveness leaseholder was probably on n3, and n3 was backpressuring on each incoming MsgApp regardless of which range it was for. "Most" quota pools in the system were depleted:

[image: quota pool metrics]

It would've been interesting to see the metrics in that state but alas, metrics timeseries ranges were also all unavailable.

@tbg tbg force-pushed the poc-follower-write-admission-control branch from 9bb0516 to f5d8594 Compare May 16, 2022 11:30
tbg added 5 commits May 16, 2022 16:06
They would previously be `store1` since we mostly run single-store
deployments. It's often helpful to target a particular node, for
example for lease preferences.

Before:

```
 select node_id, attrs from crdb_internal.kv_store_status;
  node_id |               attrs
----------+-------------------------------------
        1 | ["store1"]
```

After:

```
  node_id |               attrs
        1 | ["node1", "node1store1", "store1"]
```

Release note: None
Custom charts were previously difficult to read because the graph didn't
include any indication of what was being graphed. This is now rectified
by printing the metrics name. This isn't the most user-friendly thing
(ideally we would also include the actual title, and make the help text
available), but with my limited TSX knowledge this solves 90% of the
problem with a 61-character change.

Related to cockroachdb#81035.

Release note: None
Wait in handleRaftReady, which isn't great either, but at least allows
us to target individual ranges (mod Go scheduler concurrency).

Release note: None
@tbg tbg force-pushed the poc-follower-write-admission-control branch from f5d8594 to cfc7c74 Compare May 16, 2022 14:08
Sloppy for now

Release note: None
@tbg
Member Author

tbg commented May 16, 2022

For a second experiment, I moved the backpressure to handleRaftReadyRaftMuLocked. This has the advantage that it allows filtering by RangeID and that it doesn't block the entire receive queue to the store indiscriminately. The disadvantage is that this is below-raft throttling, so we're now holding raftMu for extended periods of time, which balloons the raft receive queue (and might still stall it, if we consume all raft scheduler goroutines):

[image: raft receive queue metrics]

I also pre-split 1000 ranges for this workload. Unfortunately, n1 (which has all leases pinned to it) was starting to get high read-amp as well, but we can see in the graphs that for quite some time before that, n3 was struggling with high read-amp while staying bounded, meaning admission control on the raft follower path was doing its job. This is most obvious in the bottom graph, which specifically measures time spent in admission control in handleRaftReady (red is n3).
The end-to-end latencies are consequently pretty bad:

[image: end-to-end latency graphs]

In the experiment, n1 ended up with high r-amp and started
backpressuring below raft as well. This isn't the intention,
since those writes will be doubly backpressured. We ended up
in a regime where n1 is constantly overloaded while n3 is
actually doing ok, perhaps a metastable state.

Release note: None
@tbg
Member Author

tbg commented May 17, 2022

There's an internal thread about whether this experiment is flawed, in the sense that the cluster cannot possibly sustain 6mb/s: I'm running it with the stock binary now (leases on n1 and n2), and n1 and n2 are the ones that develop high read-amp, despite being provisioned at 250mb/s (double the default).

tbg added a commit to tbg/cockroach that referenced this pull request Jun 9, 2022
This is a less ad-hoc version of the experiment in cockroachdb#81289, where I
messed with the EBS configuration. This can't be done programmatically,
and so here we use an IO nemesis on n3 instead.

Release note: None
tbg added a commit to tbg/cockroach that referenced this pull request Jun 23, 2022
This is a less ad-hoc version of the experiment in cockroachdb#81289, where I
messed with the EBS configuration. This can't be done programmatically,
and so here we use an IO nemesis on n3 instead.

Release note: None
@tbg
Member Author

tbg commented Jul 12, 2022

This is being productionized in #83851.

@tbg tbg closed this Jul 12, 2022
tbg added a commit to tbg/cockroach that referenced this pull request Aug 11, 2022
This is a less ad-hoc version of the experiment in cockroachdb#81289, where I
messed with the EBS configuration. This can't be done programmatically,
and so here we use an IO nemesis on n3 instead.

Part of cockroachdb#79215.
Closes cockroachdb#81834.

Release note: None
tbg added a commit to tbg/cockroach that referenced this pull request Aug 18, 2022
This is a less ad-hoc version of the experiment in cockroachdb#81289, where I
messed with the EBS configuration. This can't be done programmatically,
and so here we use an IO nemesis on n3 instead.

Part of cockroachdb#79215.
Closes cockroachdb#81834.

Release note: None