sql: fix two bugs with TRUNCATE#28721
Conversation
knz
left a comment
There was a problem hiding this comment.
Last commit: on principle, I like the idea to cancel ongoing jobs. However I recommend to actually stop the jobs with an error message that says "table was dropped" so that the reason can be inspected with SHOW JOBS.
Reviewed 1 of 1 files at r1, 6 of 6 files at r2, 4 of 4 files at r3.
Reviewable status:complete! 1 of 0 LGTMs obtained
pkg/sql/drop_table.go, line 359 at r3 (raw file):
} // Mark all jobs scheduled for schema changes as successful.
Just for clarity: can you extend the commit message and the PR description to explain this new behavior, and outline at a high level:
- what happens if the DROP is aborted (e.g. its txn is rolled back) -- what happens with these jobs?
- what happens if the DROP is retried -- what happens with the jobs?
- what happens if a user uses AS OF SYSTEM TIME or a restore in-between the time the schema change jobs were issued, and the time where DROP was issued.
pkg/sql/drop_table.go, line 372 at r3 (raw file):
return err } if err := job.WithTxn(p.txn).Succeeded(ctx, jobs.NoopFn); err != nil {
Maybe instead fail the job with an error that says "table has been dropped". This way the reason why the job was stopped becomes clear in the event log / SHOW JOBS.
066917e to
95fa285
Compare
vivekmenezes
left a comment
There was a problem hiding this comment.
Thanks for review this.
The code path can be executed via TRUNCATE and DROP. In the case of TRUNCATE the schema changes were short circuited and indeed finished. In the case of DROP, the table has been dropped so one argue those schema changes were short circuited to and completed. I suppose one can ask, is there an action the user can take if there saw the extra message, or would they find it uncomfortable to see the schema change complete?
Reviewable status:
complete! 1 of 0 LGTMs obtained
pkg/sql/drop_table.go, line 359 at r3 (raw file):
Previously, knz (kena) wrote…
Just for clarity: can you extend the commit message and the PR description to explain this new behavior, and outline at a high level:
- what happens if the DROP is aborted (e.g. its txn is rolled back) -- what happens with these jobs?
- what happens if the DROP is retried -- what happens with the jobs?
- what happens if a user uses AS OF SYSTEM TIME or a restore in-between the time the schema change jobs were issued, and the time where DROP was issued.
Added a comment to the PR saying that the change to jobs table is done in the same txn. The AS OF SYSTEM TIME question seems to be irrelevant to this PR.
vivekmenezes
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 1 of 0 LGTMs obtained
pkg/sql/drop_table.go, line 372 at r3 (raw file):
Previously, knz (kena) wrote…
Maybe instead fail the job with an error that says "table has been dropped". This way the reason why the job was stopped becomes clear in the event log / SHOW JOBS.
Done.
knz
left a comment
There was a problem hiding this comment.
Yes my point in my previous comment is that if the job gets aborted because the table is dropped, this is not a normal schema change completion.
If a schema change is marked to have completed successfully, then there exists a point in time which the user can use with AS OF SYSTEM TIME to go and observe the data just after the schema change has completed.
If you make the job table lie about the completion (by marking the job as completed whereas the change did not, in fact, complete), then the user who uses AS OF SYSTEM TIME to go to the end of the job will see incorrect/invalid/incomplete data in their database.
This is why I think it is important to mark the job as failed and not succeeded.
Reviewed 6 of 6 files at r5, 4 of 4 files at r6.
Reviewable status:complete! 1 of 0 LGTMs obtained
pkg/sql/drop_table.go, line 359 at r3 (raw file):
Previously, vivekmenezes wrote…
Added a comment to the PR saying that the change to jobs table is done in the same txn. The AS OF SYSTEM TIME question seems to be irrelevant to this PR.
Where is the comment?
I was thinking to paste it at the top of the PR (in the first description) because this is the text that gets copied to the git log. Comments below that do not go to the git log.
pkg/sql/drop_table.go, line 372 at r3 (raw file):
Previously, vivekmenezes wrote…
Done.
I don't see a change?
vivekmenezes
left a comment
There was a problem hiding this comment.
Thanks for taking the time to review this and I appreciate your comments.
Well what I'm saying is that the job is marked successful when the schema change has completed. The job has a timestamp at which it was marked successful. If you use AS OF SYSTEM TIME before that time you should not expect the schema change to have been completed. If you use a timestamp after that you can expect 1. The schema change complete on a TRUNCATE table, or 2. The schema change complete on a DROP table (which has no schema by definition).
If there is some confusion in this matter it's worthy to consider 1. that this set of events happens rarely, and 2. when it does happen is there really going to be confusion.
Thanks
Reviewable status:
complete! 1 of 0 LGTMs obtained
pkg/sql/drop_table.go, line 359 at r3 (raw file):
Previously, knz (kena) wrote…
Where is the comment?
I was thinking to paste it at the top of the PR (in the first description) because this is the text that gets copied to the git log. Comments below that do not go to the git log.
Done.
pkg/sql/drop_table.go, line 372 at r3 (raw file):
Previously, knz (kena) wrote…
I don't see a change?
I'm arguing against this change.
95fa285 to
f2305c6
Compare
|
@knz do you have any thoughts on this PR? Thanks! |
knz
left a comment
There was a problem hiding this comment.
You still have me on record that if a schema change is aborted by a DROP, then it should be marked as aborted not completed.
But in the end this is a subjective argument from the perspective of UX, not a correctness argument. So your guess is probably as good as mine.
I have approved the PR already so feel free to move forward.
Reviewed 5 of 6 files at r8, 2 of 4 files at r9.
Reviewable status:complete! 1 of 0 LGTMs obtained
pkg/sql/drop_table.go, line 372 at r3 (raw file):
Previously, vivekmenezes wrote…
I'm arguing against this change.
I was confused by the word "Done".
related to cockroachdb#27687 Release note (sql change): fixes table descriptor corruption when TRUNCATE TABLE is run while DROP COLUMN is being processed.
Release note: None
The job is modified in the same transaction as the user's DROP/TRUNCATE transaction. Release note (sql change): Fixes problem where when a DROP/TRUNCATE is run while a schema change like CREATE INDEX is running, the corresponding job would stay running indefinitely.
f2305c6 to
e6c904f
Compare
|
Thank you for the review! |
|
bors r+ |
Build failed (retrying...) |
28721: sql: fix two bugs with TRUNCATE r=vivekmenezes a=vivekmenezes sql: correctly resolve schema mutations during TRUNCATE related to #27687 sql: mark schema change job as successful when table DROP/TRUNCATE The job is modified in the same transaction as the user's DROP/TRUNCATE transaction. 29134: rangefeed: small perf-related changes r=nvanbenschoten a=nvanbenschoten This PR includes two small perf tweak that came up which running cdc/w=100/nodes=3/init=true. Release note: None Co-authored-by: Vivek Menezes <vivek@cockroachlabs.com> Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
Build succeeded |
29262: backport-2.1: sql: fix two bugs with TRUNCATE r=vivekmenezes a=vivekmenezes Backport 3/3 commits from #28721. /cc @cockroachdb/release --- sql: correctly resolve schema mutations during TRUNCATE related to #27687 sql: mark schema change job as successful when table DROP/TRUNCATE The job is modified in the same transaction as the user's DROP/TRUNCATE transaction. Co-authored-by: Vivek Menezes <vivek@cockroachlabs.com>
sql: correctly resolve schema mutations during TRUNCATE
related to #27687
sql: mark schema change job as successful when table DROP/TRUNCATE
The job is modified in the same transaction as the user's DROP/TRUNCATE transaction.