storage: make queue timeouts dynamic and add cluster setting for snapshots#42686

Merged
craig[bot] merged 2 commits into cockroachdb:master from ajwerner:ajwerner/independent-timeouts-per-queue
Dec 13, 2019

Conversation

@ajwerner
Contributor

The background storage queues each carry a timeout. This timeout seems like a
good idea to unstick a potentially stuck queue. I'm not going to speculate as
to why these queues might find themselves getting stuck but let's just say maybe
it happens and maybe having a timeout is a good idea. Unfortunately sometimes
these timeouts can come to bite us, especially when things are unexpectedly slow
or data sizes are unexpectedly large. Failure to make progress before a timeout
expires in queue processing is a common cause of long-term outages. In those
cases it's good to have an escape hatch.

Another concern on the horizon is the desire to have larger range sizes. Today
our default queue processing timeout is 1m. The raft snapshot queue does not
override this. The default snapshot rate is 8 MB/s.

```
 (512MB / 8MB/s) = 64s
                 > 1m
```

This unfortunate fact means that ranges larger than 512 MB can never
successfully receive a snapshot from the raft snapshot queue. The next
commit addresses this by adding a cluster setting to control the
timeout of the raft snapshot queue.

The second commit adds a hidden setting to control the timeout of raftSnapshotQueue
processing. It was added as an escape hatch to deal with snapshots for large
ranges. At the default send rate of 8 MB/s, a range must stay smaller than 480 MB
to be sent successfully before the default 1m timeout. When this limit has been
hit, it has traditionally been mitigated by increasing the send rate, which may
not always be desirable. Additionally, this commit and the PR more generally pave
the way toward a higher timeout for the snapshot queue. A later PR will
adjust the default value in anticipation of larger default ranges.

@ajwerner ajwerner requested a review from andreimatei November 22, 2019 00:37
@cockroach-teamcity
Member

This change is Reviewable

@ajwerner ajwerner changed the title Ajwerner/independent timeouts per queue storage: make queue timeouts dynamic and add cluster setting for snapshots Nov 22, 2019
@ajwerner ajwerner force-pushed the ajwerner/independent-timeouts-per-queue branch 2 times, most recently from 62ca322 to 107e037 Compare November 25, 2019 13:51
Contributor

@andreimatei andreimatei left a comment


LGTM

But why not start using the flexible timeout machinery for calibrating snapshot timeouts to range sizes?

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei)

@ajwerner
Contributor Author

But why not start using the flexible timeout machinery for calibrating snapshot timeouts to range sizes?

Yes, I want to do that. To do that I'll need to pass the replica being processed into the timeout func. Want that now or in a follow-up?

@andreimatei
Contributor

andreimatei commented Nov 25, 2019 via email

@ajwerner ajwerner force-pushed the ajwerner/independent-timeouts-per-queue branch 3 times, most recently from 5224291 to 1023103 Compare December 4, 2019 15:20
@ajwerner
Contributor Author

ajwerner commented Dec 4, 2019

@andreimatei I made the timeout a function of the range size and the send rate. I maintained the same minimum and additionally added a cluster setting to control that minimum. PTAL

The background storage queues each carry a timeout. This timeout seems like a
good idea to unstick a potentially stuck queue. I'm not going to speculate as
to why these queues might find themselves getting stuck but, let's just say,
maybe it happens and maybe having a timeout is a good idea. Unfortunately
sometimes these timeouts can come to bite us, especially when things are
unexpectedly slow or data sizes are unexpectedly large. Failure to make
progress before a timeout expires in queue processing is a common cause of
long-term outages. In those cases it's good to have an escape hatch.

Another concern on the horizon is the desire to have larger range sizes. Today
our default queue processing timeout is 1m. The raft snapshot queue does not
override this. The default snapshot rate is 8 MB/s.

```
 (512MB / 8MB/s) = 64s
                 > 1m
```

This unfortunate fact means that ranges larger than 512 MB can never
successfully receive a snapshot from the raft snapshot queue. The next
commit addresses this by adding a cluster setting to control the
timeout of the raft snapshot queue.

This commit changes the constant per-queue timeout to a function which can
consult both cluster settings and the Replica which is about to be processed.

Release note: None.
@ajwerner ajwerner force-pushed the ajwerner/independent-timeouts-per-queue branch 2 times, most recently from c58e2d4 to b803d62 Compare December 13, 2019 17:58
Contributor

@andreimatei andreimatei left a comment


LGTM 💯

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner)


pkg/storage/raft_snapshot_queue.go, line 44 at r2 (raw file):

```
	// permit `minimum_timeout` but rather asks for it to be `.minimum.timeout`.
	"kv.raft_snapshot_queue.process.timeout_minimum",
	"minimum duration after which the processing of the raft snapshot queue will time out",
```

can you mention kv.snapshot_recovery.max_rate in this description too somehow pls? In fact, I'd move most of the top comment into the description.

…nd size

This commit adds a hidden setting to control the minimum timeout of
raftSnapshotQueue processing. It is an escape hatch to deal with snapshots for
large ranges. At the default send rate of 8 MB/s, a range must stay smaller than
480 MB to be sent successfully before the default 1m timeout. When this limit has
been hit, it has traditionally been mitigated by increasing the send rate, which
may not always be desirable.

In addition to the minimum timeout, the change also computes the timeout
on a per-Replica basis, using the current snapshot rate limit and the size of
the snapshot being sent. This should prevent large ranges with slow send rates
from timing out.

Maybe there should be a release note but because the setting is hidden I opted
not to add it.

Release note: None.
@ajwerner ajwerner force-pushed the ajwerner/independent-timeouts-per-queue branch from b803d62 to 494bfc0 Compare December 13, 2019 19:05
Contributor

@andreimatei andreimatei left a comment


LGTM

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner)

@ajwerner
Contributor Author

TFTR!

bors r=andreimatei

craig bot pushed a commit that referenced this pull request Dec 13, 2019
42686: storage: make queue timeouts dynamic and add cluster setting for snapshots r=andreimatei a=ajwerner



Co-authored-by: Andrew Werner <ajwerner@cockroachlabs.com>
@craig
Contributor

craig bot commented Dec 13, 2019

Build succeeded

@craig craig bot merged commit 494bfc0 into cockroachdb:master Dec 13, 2019
ajwerner pushed a commit to ajwerner/cockroach that referenced this pull request Feb 6, 2020
…namic

In cockroachdb#42686 we made the raft snapshot queue timeout dynamic and based on the
size of the snapshot being sent. We also added an escape hatch to control
the timeout of processing of that queue. This change generalizes that
cluster setting to apply to all of the queues.

It so happens that the replicate queue and the merge queue also sometimes
need to send snapshots. This PR gives them similar treatment to the
raft snapshot queue.

The previous cluster setting was never released and is reserved so it does not
need a release note.

Release note: None
craig bot pushed a commit that referenced this pull request Feb 10, 2020
44809: storage: make queue timeouts controllable, snapshot sending queues dynamic r=knz a=ajwerner

In #42686 we made the raft snapshot queue timeout dynamic and based on the
size of the snapshot being sent. We also added an escape hatch to control
the timeout of processing of that queue. This change generalizes that
cluster setting to apply to all of the queues.

It so happens that the replicate queue and the merge queue also sometimes
need to send snapshots. This PR gives them similar treatment to the
raft snapshot queue.

The previous cluster setting was never released and is reserved so it does not
need a release note.

Release note: None

Co-authored-by: Andrew Werner <ajwerner@cockroachlabs.com>
ajwerner pushed a commit to ajwerner/cockroach that referenced this pull request Mar 4, 2020
…namic

In cockroachdb#42686 we made the raft snapshot queue timeout dynamic and based on the
size of the snapshot being sent. We also added an escape hatch to control
the timeout of processing of that queue. This change generalizes that
cluster setting to apply to all of the queues.

It so happens that the replicate queue and the merge queue also sometimes
need to send snapshots. This PR gives them similar treatment to the
raft snapshot queue.

The previous cluster setting was never released and is reserved so it does not
need a release note.

Release note (bug fix): Fixed a bug where large ranges with slow send rates
would hit the timeout in several storage system queues by making the timeout
dynamic based on the current rate limit and the size of the data being
sent. This affects several storage system queues: the Raft snapshot queue,
the replication queue, and the merge queue.
