storage: make queue timeouts dynamic and add cluster setting for snapshots#42686

Merged
craig[bot] merged 2 commits into cockroachdb:master from ajwerner:ajwerner/independent-timeouts-per-queue
Dec 13, 2019

Conversation

@ajwerner
Contributor

The background storage queues each carry a timeout. This timeout seems like a
good idea to unstick a potentially stuck queue. I'm not going to speculate as
to why these queues might find themselves getting stuck but let's just say maybe
it happens and maybe having a timeout is a good idea. Unfortunately sometimes
these timeouts can come to bite us, especially when things are unexpectedly slow
or data sizes are unexpectedly large. Failure to make progress before a timeout
expires in queue processing is a common cause of long-term outages. In those
cases it's good to have an escape hatch.

Another concern on the horizon is the desire to have larger range sizes. Today
our default queue processing timeout is 1m. The raft snapshot queue does not
override this. The default snapshot rate is 8 MB/s.

```
 (512MB / 8MB/s) = 64s
                 > 1m
```

This unfortunate fact means that ranges larger than 512 MB can never
successfully receive a snapshot from the raft snapshot queue. The next
commit addresses this by adding a cluster setting to control the
timeout of the raft snapshot queue.

The second commit adds a hidden setting to control the timeout of raftSnapshotQueue
processing. It was added as an escape hatch to deal with snapshots for large
ranges. At the default send rate of 8 MB/s, a range must stay smaller than 480 MB
to be sent successfully before the default 1m timeout. When this limit has been
hit, it has traditionally been mitigated by increasing the send rate, which may
not always be desirable. Additionally, this commit and the PR more generally pave
the way toward a higher timeout for the snapshot queue. A later PR will
adjust the default value in anticipation of larger default ranges.

@ajwerner ajwerner requested a review from andreimatei November 22, 2019 00:37
@cockroach-teamcity
Member

This change is Reviewable

@ajwerner ajwerner changed the title Ajwerner/independent timeouts per queue storage: make queue timeouts dynamic and add cluster setting for snapshots Nov 22, 2019
@ajwerner ajwerner force-pushed the ajwerner/independent-timeouts-per-queue branch 2 times, most recently from 62ca322 to 107e037 Compare November 25, 2019 13:51
Contributor

@andreimatei andreimatei left a comment


LGTM

But why not start using the flexible timeout machinery for calibrating snapshot timeouts to range sizes?

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei)

@ajwerner
Contributor Author

But why not start using the flexible timeout machinery for calibrating snapshot timeouts to range sizes?

Yes, I want to do that. To do that I'll need to pass the replica being processed into the timeout func. Want that now or in a follow-up?

@andreimatei
Contributor

andreimatei commented Nov 25, 2019 via email

@ajwerner ajwerner force-pushed the ajwerner/independent-timeouts-per-queue branch 3 times, most recently from 5224291 to 1023103 Compare December 4, 2019 15:20
@ajwerner
Contributor Author

ajwerner commented Dec 4, 2019

@andreimatei I made the timeout a function of the range size and the send rate. I maintained the same minimum and additionally added a cluster setting to control that minimum. PTAL

The background storage queues each carry a timeout. This timeout seems like a
good idea to unstick a potentially stuck queue. I'm not going to speculate as
to why these queues might find themselves getting stuck but, let's just say,
maybe it happens and maybe having a timeout is a good idea. Unfortunately
sometimes these timeouts can come to bite us, especially when things are
unexpectedly slow or data sizes are unexpectedly large. Failure to make
progress before a timeout expires in queue processing is a common cause of
long-term outages. In those cases it's good to have an escape hatch.

Another concern on the horizon is the desire to have larger range sizes. Today
our default queue processing timeout is 1m. The raft snapshot queue does not
override this. The default snapshot rate is 8 MB/s.

```
 (512MB / 8MB/s) = 64s
                 > 1m
```

This unfortunate fact means that ranges larger than 512 MB can never
successfully receive a snapshot from the raft snapshot queue. The next
commit addresses this by adding a cluster setting to control the
timeout of the raft snapshot queue.

This commit changes the constant per-queue timeout to a function which can
consult both cluster settings and the Replica which is about to be processed.

Release note: None.
@ajwerner ajwerner force-pushed the ajwerner/independent-timeouts-per-queue branch 2 times, most recently from c58e2d4 to b803d62 Compare December 13, 2019 17:58
Contributor

@andreimatei andreimatei left a comment


LGTM 💯

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner)


pkg/storage/raft_snapshot_queue.go, line 44 at r2 (raw file):

```
	// permit `minimum_timeout` but rather asks for it to be `.minimum.timeout`.
	"kv.raft_snapshot_queue.process.timeout_minimum",
	"minimum duration after which the processing of the raft snapshot queue will time out",
```

can you mention kv.snapshot_recovery.max_rate in this description too somehow pls? In fact, I'd move most of the top comment into the description.

…nd size

This commit adds a hidden setting to control the minimum timeout of
raftSnapshotQueue processing. It is an escape hatch to deal with snapshots for
large ranges. At the default send rate of 8 MB/s, a range must stay smaller than
480 MB to be sent successfully before the default 1m timeout. When this limit has
been hit, it has traditionally been mitigated by increasing the send rate, which
may not always be desirable.

In addition to the minimum timeout, the change also computes the timeout
on a per-Replica basis, using the current snapshot rate limit and the size of
the snapshot being sent. This should prevent large ranges with slow send rates
from timing out.

Maybe there should be a release note but because the setting is hidden I opted
not to add it.

Release note: None.
@ajwerner ajwerner force-pushed the ajwerner/independent-timeouts-per-queue branch from b803d62 to 494bfc0 Compare December 13, 2019 19:05
Contributor

@andreimatei andreimatei left a comment


LGTM

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner)

@ajwerner
Contributor Author

TFTR!

bors r=andreimatei

craig bot pushed a commit that referenced this pull request Dec 13, 2019
42686: storage: make queue timeouts dynamic and add cluster setting for snapshots r=andreimatei a=ajwerner



Co-authored-by: Andrew Werner <ajwerner@cockroachlabs.com>
@craig
Contributor

craig bot commented Dec 13, 2019

Build succeeded

@craig craig bot merged commit 494bfc0 into cockroachdb:master Dec 13, 2019
ajwerner pushed a commit to ajwerner/cockroach that referenced this pull request Feb 6, 2020
…namic

In cockroachdb#42686 we made the raft snapshot queue timeout dynamic and based on the
size of the snapshot being sent. We also added an escape hatch to control
the timeout of processing of that queue. This change generalizes that
cluster setting to apply to all of the queues.

It so happens that the replicate queue and the merge queue also sometimes
need to send snapshots. This PR gives them similar treatment to the
raft snapshot queue.

The previous cluster setting was never released and is reserved so it does not
need a release note.

Release note: None
craig bot pushed a commit that referenced this pull request Feb 10, 2020
44809: storage: make queue timeouts controllable, snapshot sending queues dynamic r=knz a=ajwerner

In #42686 we made the raft snapshot queue timeout dynamic and based on the
size of the snapshot being sent. We also added an escape hatch to control
the timeout of processing of that queue. This change generalizes that
cluster setting to apply to all of the queues.

It so happens that the replicate queue and the merge queue also sometimes
need to send snapshots. This PR gives them similar treatment to the
raft snapshot queue.

The previous cluster setting was never released and is reserved so it does not
need a release note.

Release note: None

Co-authored-by: Andrew Werner <ajwerner@cockroachlabs.com>
ajwerner pushed a commit to ajwerner/cockroach that referenced this pull request Mar 4, 2020
…namic

In cockroachdb#42686 we made the raft snapshot queue timeout dynamic and based on the
size of the snapshot being sent. We also added an escape hatch to control
the timeout of processing of that queue. This change generalizes that
cluster setting to apply to all of the queues.

It so happens that the replicate queue and the merge queue also sometimes
need to send snapshots. This PR gives them similar treatment to the
raft snapshot queue.

The previous cluster setting was never released and is reserved so it does not
need a release note.

Release note (bug fix): Fixed a bug where large ranges with slow send rates
would hit the timeout in several storage system queues by making the timeout
dynamic based on the current rate limit and the size of the data being
sent. This affects several storage system queues: the Raft snapshot queue,
the replication queue, and the merge queue.
