storage: make queue timeouts dynamic and add cluster setting for snapshots #42686
62ca322 to 107e037
andreimatei left a comment
LGTM
But why not start using the flexible timeout machinery for calibrating snapshot timeouts to range sizes?
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei)
Yes, I want to do that. To do that I'll need to pass the replica being processed into the timeout func. Want that now or in a follow-up?
Either way. Feel free to merge
5224291 to 1023103
@andreimatei I made the timeout a function of the range size and the send rate. I maintained the same minimum and additionally added a cluster setting to control that minimum. PTAL
The background storage queues each carry a timeout. This timeout seems like a
good idea to unstick a potentially stuck queue. I'm not going to speculate as
to why these queues might find themselves getting stuck but, let's just say,
maybe it happens and maybe having a timeout is a good idea. Unfortunately
sometimes these timeouts can come to bite us, especially when things are
unexpectedly slow or data sizes are unexpectedly large. Failure to make
progress before a timeout expires in queue processing is a common cause of
long-term outages. In those cases it's good to have an escape hatch.
Another concern on the horizon is the desire to have larger range sizes. Today
our default queue processing timeout is 1m. The raft snapshot queue does not
override this. The default snapshot rate is 8 MB/s.
```
512 MB / 8 MB/s = 64 s > 1 m
```
This unfortunate fact means that ranges larger than 8 MB/s × 60 s = 480 MB can
never successfully receive a snapshot from the raft snapshot queue. The next
commit addresses this by adding a cluster setting to control the timeout of
the raft snapshot queue.
This commit changes the constant per-queue timeout to a function which can
consult both cluster settings and the Replica which is about to be processed.
Release note: None.
c58e2d4 to b803d62
andreimatei left a comment
LGTM 💯
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner)
pkg/storage/raft_snapshot_queue.go, line 44 at r2 (raw file):
```
// permit `minimum_timeout` but rather asks for it to be `.minimum.timeout`.
"kv.raft_snapshot_queue.process.timeout_minimum",
"minimum duration after which the processing of the raft snapshot queue will time out",
```
can you mention kv.snapshot_recovery.max_rate in this description too somehow pls? In fact, I'd move most of the top comment into the description.
…nd size

This commit adds a hidden setting to control the minimum timeout of
raftSnapshotQueue processing. It is an escape hatch to deal with snapshots for
large ranges. At the default send rate of 8 MB/s a range must stay under
480 MB to be successfully sent before the default 1m timeout. When this has
been hit, it has traditionally been mitigated by increasing the send rate.
This may not always be desirable.

In addition to the minimum timeout, the change also now computes the timeout
on a per-Replica basis from the current snapshot rate limit and the size of
the snapshot being sent. This should prevent large ranges with slow send rates
from timing out.

Maybe there should be a release note but because the setting is hidden I opted
not to add it.

Release note: None.
b803d62 to 494bfc0
andreimatei left a comment
LGTM
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner)
TFTR! bors r=andreimatei
42686: storage: make queue timeouts dynamic and add cluster setting for snapshots r=andreimatei a=ajwerner
The background storage queues each carry a timeout. This timeout seems like a
good idea to unstick a potentially stuck queue. I'm not going to speculate as
to why these queues might find themselves getting stuck but let's just say maybe
it happens and maybe having a timeout is a good idea. Unfortunately sometimes
these timeouts can come to bite us, especially when things are unexpectedly slow
or data sizes are unexpectedly large. Failure to make progress before a timeout
expires in queue processing is a common cause of long-term outages. In those
cases it's good to have an escape hatch.
Another concern on the horizon is the desire to have larger range sizes. Today
our default queue processing timeout is 1m. The raft snapshot queue does not
override this. The default snapshot rate is 8 MB/s.
```
512 MB / 8 MB/s = 64 s > 1 m
```
This unfortunate fact means that ranges larger than 8 MB/s × 60 s = 480 MB can
never successfully receive a snapshot from the raft snapshot queue. The next
commit addresses this by adding a cluster setting to control the timeout of
the raft snapshot queue.
The second commit adds a hidden setting to control the timeout of raftSnapshotQueue
processing. It was added as an escape hatch to deal with snapshots for large
ranges. At the default send rate of 8 MB/s a range must stay under 480 MB
to be successfully sent before the default 1m timeout. When this has been hit,
it has traditionally been mitigated by increasing the send rate. This may
not always be desirable. Additionally, this commit and the PR more generally pave
the way to move to a higher timeout for the snapshot queue. A later PR will
adjust the default value in anticipation of larger default ranges.
Co-authored-by: Andrew Werner <ajwerner@cockroachlabs.com>
Build succeeded
…namic

In cockroachdb#42686 we made the raft snapshot queue timeout dynamic, basing it on the size of the snapshot being sent. We also added an escape hatch to control the timeout of processing of that queue.

This change generalizes that cluster setting to apply to all of the queues. It so happens that the replicate queue and the merge queue also sometimes need to send snapshots. This PR gives them similar treatment to the raft snapshot queue.

The previous cluster setting was never released and is reserved so it does not need a release note.

Release note: None
44809: storage: make queue timeouts controllable, snapshot sending queues dynamic r=knz a=ajwerner

In #42686 we made the raft snapshot queue timeout dynamic, basing it on the size of the snapshot being sent. We also added an escape hatch to control the timeout of processing of that queue. This change generalizes that cluster setting to apply to all of the queues. It so happens that the replicate queue and the merge queue also sometimes need to send snapshots. This PR gives them similar treatment to the raft snapshot queue.

The previous cluster setting was never released and is reserved so it does not need a release note.

Release note: None

Co-authored-by: Andrew Werner <ajwerner@cockroachlabs.com>
…namic

In cockroachdb#42686 we made the raft snapshot queue timeout dynamic, basing it on the size of the snapshot being sent. We also added an escape hatch to control the timeout of processing of that queue. This change generalizes that cluster setting to apply to all of the queues. It so happens that the replicate queue and the merge queue also sometimes need to send snapshots. This PR gives them similar treatment to the raft snapshot queue. The previous cluster setting was never released and is reserved so it does not need a release note.

Release note (bug fix): Fixed a bug where large ranges with slow send rates would hit the timeout in several storage system queues by making the timeout dynamic based on the current rate limit and the size of the data being sent. This affects several storage system queues: the Raft snapshot queue, the replication queue, and the merge queue.