backupccl: ensure restore on success is run once#43933
backupccl: ensure restore on success is run once#43933craig[bot] merged 1 commit intocockroachdb:masterfrom
Conversation
d8c50c5 to
56c0c35
Compare
pkg/ccl/backupccl/restore.go
Outdated
| latestStats []*stats.TableStatisticProto | ||
| execCfg *sql.ExecutorConfig | ||
| // A collection of stages that should only be run once. | ||
| statsInserted bool |
There was a problem hiding this comment.
don't you need to persist these, in the job detail or progress?
There was a problem hiding this comment.
yep. looks like this was the wrong branch. re-pushed.
There was a problem hiding this comment.
👍
I think you might need to actually update the job too (UpdateProgress or UpdatePayload depending on where you put it) to save your change to the in-mem version you're updating here. You could do it in the same txn ideally I think?
There was a problem hiding this comment.
Here's how IMPORT does something similar (ensuring that all the descs get their state changed): https://github.com/cockroachdb/cockroach/blob/master/pkg/ccl/importccl/import_stmt.go#L900
56c0c35 to
9dc99cf
Compare
spaskob
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @pbardea and @spaskob)
pkg/ccl/backupccl/restore.go, line 1787 at r2 (raw file):
return nil } defer func() { details.statsInserted = true }()
I am confused why this is deferred. Should you call it only in case InsertNewStats succeeds? And also where do you actually persist the job details? Don't you need a call to job.update(...)
|
here's an example of how to use update
https://github.com/cockroachdb/cockroach/blob/master/pkg/jobs/jobs.go#L181
…On Mon, Jan 13, 2020 at 4:49 PM David Taylor ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In pkg/ccl/backupccl/restore.go
<#43933 (comment)>
:
> @@ -1579,6 +1579,9 @@ type restoreResumer struct {
tables []*sqlbase.TableDescriptor
latestStats []*stats.TableStatisticProto
execCfg *sql.ExecutorConfig
+ // A collection of stages that should only be run once.
+ statsInserted bool
👍
I think you might need to actually update the job too to save your change
to the in-mem version you're updating here. You could do it in the same txn
ideally I think?
—
You are receiving this because your review was requested.
Reply to this email directly, view it on GitHub
<#43933?email_source=notifications&email_token=AANL634SGZYEICIKO63LTLDQ5TOX7A5CNFSM4KGIG5EKYY3PNVWWK3TUL52HS4DFWFIHK3DMKJSXC5LFON2FEZLWNFSXPKTDN5WW2ZLOORPWSZGOCRSP4WA#discussion_r366047052>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AANL633HMO6SILEBKAUTE6DQ5TOX7ANCNFSM4KGIG5EA>
.
|
dc5ff1b to
7f928ef
Compare
|
RFAL |
dt
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @pbardea)
pkg/ccl/backupccl/restore.go, line 1786 at r3 (raw file):
} txn := r.execCfg.DB.NewTxn(ctx, "insert-stats")
Is there a reason to use this over the .Txn(..., fn) that takes the retry-able closure and does the error handling?
It seems that jobs today do not ensure that the OnSuccess callback is called exactly once. This PR moves the cleanup stages of RESTORE, formerly located in the OnSuccess callback to be the final steps of Resume. This should help ensure that these stages are run once and only once. Release note (bug fix): Ensure that RESTORE cleanup is run exactly once.
7f928ef to
62a2cde
Compare
pbardea
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @dt and @spaskob)
pkg/ccl/backupccl/restore.go, line 1787 at r2 (raw file):
Previously, spaskob (Spas Bojanov) wrote…
I am confused why this is deferred. Should you call it only in case InsertNewStats succeeds? And also where do you actually persist the job details? Don't you need a call to job.update(...)
Yep, as per our offline discussion I wasn't sure if returning an error from resume would ensure that it would not be run again (and therefore continuously retry to re-insert stats). But it should be safe to just return an error and mark this stage as complete in the same txn that performs the insertion. Same applies for publishing tables below.
pkg/ccl/backupccl/restore.go, line 1786 at r3 (raw file):
Previously, dt (David Taylor) wrote…
Is there a reason to use this over the
.Txn(..., fn)that takes the retry-able closure and does the error handling?
Nope - updated to use the retry-able closure.
dt
left a comment
There was a problem hiding this comment.
Reviewed 1 of 1 files at r4.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @dt and @spaskob)
spaskob
left a comment
There was a problem hiding this comment.
LGTM
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @dt and @spaskob)
|
TFTRs! |
Build failed |
|
|
|
yes I have seen this test flake
…On Wed, Jan 15, 2020 at 10:15 AM Paul Bardea ***@***.***> wrote:
TestTruncateWhileColumnBackfill failed and TeamCity says it looks flaky.
Trying again.
bors r+
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#43933?email_source=notifications&email_token=AANL632NXY3633ER62F5EO3Q54SBFA5CNFSM4KGIG5EKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEJAU6MQ#issuecomment-574705458>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AANL636TRFCMYZD5AGCIWPTQ54SBFANCNFSM4KGIG5EA>
.
|
Build failed |
|
Third time's the charm? |
Build failed (retrying...) |
43720: coldata: fix behavior of Vec.Append in some cases when NULLs are present r=yuzefovich a=yuzefovich We would always Get and then Set a value while Append'ing without paying attention to whether the value is actually NULL. This can lead to problems in case of flat bytes if the necessary invariant is unmaintained. Now this is fixed by explicitly enforcing the invariant. Additionally, this commit ensures that the destination slice has the desired capacity before appending one value at a time (in case of a present selection vector). I tried approach with paying attention to whether the value is NULL before appending it and saw a significant performance hit, so I think this approach is the least evil. Fixes: #42774. Release note: None 43933: backupccl: ensure restore on success is run once r=pbardea a=pbardea It seems that jobs today do not ensure that the OnSuccess callback is called exactly once. This PR moves the cleanup stages of RESTORE, formerly located in the OnSuccess callback to be the final steps of Resume. This should help ensure that these stages are run once and only once. Release note (bug fix): Ensure that RESTORE cleanup is run exactly once. 44013: roachtest: skip acceptance/version-upgrade because flaky r=andreimatei a=andreimatei Very flaky, apparently because of some problem with a recent migration. Touches #43957, #44005 Release note: None Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com> Co-authored-by: Paul Bardea <pbardea@gmail.com> Co-authored-by: Andrei Matei <andrei@cockroachlabs.com>
|
bors r+ |
|
Already running a review |
Build succeeded |
It seems that jobs today do not ensure that the OnSuccess callback is
called exactly once. This PR moves the cleanup stages of RESTORE,
formerly located in the OnSuccess callback to be the final steps of
Resume. This should help ensure that these stages are run once and only
once.
Release note (bug fix): Ensure that RESTORE cleanup is run exactly once.