sql: disallow multiple modification subqueries of same table#70976
sql: disallow multiple modification subqueries of same table#70976craig[bot] merged 1 commit intocockroachdb:masterfrom
Conversation
knz
left a comment
There was a problem hiding this comment.
I think this check is too strict. The bug that this PR attempts to work around only affects certain combinations of mutations, not all.
For example:
- Two subsequent INSERTs, even on the same table, are fine.
- Two subsequent DELETEs are fine.
What is generally problematic, is a statement in a CTE that reads and writes after another CTE that writes. For example:
- UPDATE after another mutation
- UPSERT after another mutation
- DELETE with a join clause.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @andy-kimball and @jordanlewis)
9e53ab4 to
87eb442
Compare
michae2
left a comment
There was a problem hiding this comment.
Good points!
- Two subsequent INSERTs, even on the same table, are fine.
I've changed the check to allow multiple INSERTs to the same table.
- Two subsequent DELETEs are fine.
It does seem like KV del requests are idempotent (they appear to only leave a single intent per row) but based on your comments on RETURNING I did not make an allowance for multiple DELETEs of the same table. (INSERT RETURNING doesn't have the same problem because the inserted rows will always be disjoint, so the order of execution doesn't affect RETURNING.)
I think this is RFAL.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @andy-kimball and @jordanlewis)
knz
left a comment
There was a problem hiding this comment.
will also want a review by someone from the queries team.
Reviewed 10 of 10 files at r2.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @andy-kimball, @jordanlewis, @michae2, and @RaduBerinde)
-- commits, line 37 at r2:
"must not" is maybe a strong word. Maybe you can explain that if they do change values in indexed columns they may get index corruption.
I think that mutations to non-indexed columns are unaffected.
pkg/sql/logictest/testdata/logic_test/with, line 0 at r2 (raw file):
can you change the file type of with so that github doesn't consider binary and we get a reviewable diff?
pkg/sql/opt/optbuilder/testdata/delete, line 242 at r2 (raw file):
# Non-referenced CTE with mutation. build WITH cte AS (SELECT y FROM [DELETE FROM xyz WHERE z > 0 RETURNING *]) DELETE FROM abcde WHERE a=b
why did you change this?
pkg/sql/opt/optbuilder/testdata/update, line 517 at r2 (raw file):
# Non-referenced CTE with mutation. build WITH cte AS (SELECT y FROM [UPDATE xyz SET y = y + 1 RETURNING *]) UPDATE abcde SET a=b
same question
michae2
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @andy-kimball, @jordanlewis, @knz, @michae2, and @RaduBerinde)
pkg/sql/logictest/testdata/logic_test/with, line at r2 (raw file):
Previously, knz (kena) wrote…
can you change the file type of
withso that github doesn't consider binary and we get a reviewable diff?
git diff and github both treat it as text for me. It appears to be specific to reviewable. I added a pattern to .gitattributes, let's see if reviewable looks at that...
pkg/sql/opt/optbuilder/testdata/delete, line 242 at r2 (raw file):
Previously, knz (kena) wrote…
why did you change this?
This query modifies the same table twice, and hits the new error. This testcase is checking optbuilder correctness, not actually executing the query, so I thought it would be better for the query to still produce output. Maybe @RaduBerinde can comment on whether this makes sense.
rytaft
left a comment
There was a problem hiding this comment.
Reviewed 10 of 10 files at r2, 1 of 1 files at r4, all commit messages.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @andy-kimball, @jordanlewis, @knz, @michae2, and @RaduBerinde)
.gitattributes, line 5 at r4 (raw file):
pkg/BUILD.bazel -diff pkg/sql/logictest/testdata/** diff pkg/sql/opt/optbuilder/testdata/** diff
Did you mean to modify this file?
Edit: I see the reason below. Doesn't look like it's working.
pkg/sql/logictest/testdata/logic_test/with, line at r2 (raw file):
Previously, michae2 (Michael Erickson) wrote…
git diffand github both treat it as text for me. It appears to be specific to reviewable. I added a pattern to .gitattributes, let's see if reviewable looks at that...
Doesn't look like the .gitattibutes thing worked.
I've seen this problem before with this file. I think it has to do with a particular test case that has a lot of binary-looking data:
Anyway, I can't comment on this file directly, but you've added a bunch of test cases that look useful, but it's hard to say exactly what they are testing. Could you add some more comments to the new test cases to explain what they are showing?
pkg/sql/opt/optbuilder/builder.go, line 145 at r4 (raw file):
// true when all mutations of the table are inserts (without ON CONFLICT) and // false otherwise. tablesMutated map[cat.StableID]bool
nit: is there any way to make the name of this variable more self-documenting and mention something about inserts?
pkg/sql/opt/optbuilder/util.go, line 487 at r4 (raw file):
b.tablesMutated = make(map[cat.StableID]bool) } prevInserts, prevMutations := b.tablesMutated[tab.ID()]
nit: if prevMutations is false, I would just return early. I think it's confusing that you continue to check prevMutations below.
pkg/sql/opt/optbuilder/util.go, line 488 at r4 (raw file):
} prevInserts, prevMutations := b.tablesMutated[tab.ID()] multiMutationsOk := multipleModificationsOfTableEnabled.Get(&b.evalCtx.Settings.SV)
nit: we don't need to check this unless prevMutations && (!prevInserts || !insert) is true.
pkg/sql/opt/optbuilder/testdata/delete, line 242 at r2 (raw file):
Previously, michae2 (Michael Erickson) wrote…
This query modifies the same table twice, and hits the new error. This testcase is checking optbuilder correctness, not actually executing the query, so I thought it would be better for the query to still produce output. Maybe @RaduBerinde can comment on whether this makes sense.
lgtm
pkg/sql/opt/optbuilder/testdata/update, line 517 at r2 (raw file):
Previously, knz (kena) wrote…
same question
lgtm
michae2
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @andy-kimball, @jordanlewis, @knz, @RaduBerinde, and @rytaft)
Previously, knz (kena) wrote…
"must not" is maybe a strong word. Maybe you can explain that if they do change values in indexed columns they may get index corruption.
I think that mutations to non-indexed columns are unaffected.
Commit message has been modified a bit.
pkg/sql/logictest/testdata/logic_test/with, line at r2 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
Doesn't look like the .gitattibutes thing worked.
I've seen this problem before with this file. I think it has to do with a particular test case that has a lot of binary-looking data:
Anyway, I can't comment on this file directly, but you've added a bunch of test cases that look useful, but it's hard to say exactly what they are testing. Could you add some more comments to the new test cases to explain what they are showing?
Added some comments and undid the .gitattributes change.
pkg/sql/opt/optbuilder/builder.go, line 145 at r4 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
nit: is there any way to make the name of this variable more self-documenting and mention something about inserts?
Done.
pkg/sql/opt/optbuilder/util.go, line 487 at r4 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
nit: if
prevMutationsis false, I would just return early. I think it's confusing that you continue to checkprevMutationsbelow.
Done.
pkg/sql/opt/optbuilder/util.go, line 488 at r4 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
nit: we don't need to check this unless
prevMutations && (!prevInserts || !insert)is true.
Done.
.gitattributes, line 5 at r4 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
Did you mean to modify this file?
Edit: I see the reason below. Doesn't look like it's working.
Undid the change.
knz
left a comment
There was a problem hiding this comment.
no further comment from me
Reviewed all commit messages.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @andy-kimball, @jordanlewis, @knz, @RaduBerinde, and @rytaft)
rytaft
left a comment
There was a problem hiding this comment.
Reviewed 7 of 9 files at r5, 1 of 1 files at r6, all commit messages.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @andy-kimball, @jordanlewis, @knz, @RaduBerinde, and @rytaft)
|
TFTR! bors r=rytaft |
|
Build failed (retrying...): |
|
This PR was included in a batch that was canceled, it will be automatically retried |
|
Build failed (retrying...): |
|
Hmm, looks like there's a problem with the test. bors r- |
|
Canceled. |
Addresses: cockroachdb#70731 Disallow multiple INSERT ON CONFLICT, UPSERT, UPDATE, or DELETE subqueries on the same table until we are better able to detect multiple modifications of the same row by a single statement. (Note that multiple INSERTs without ON CONFLICT do not suffer from this issue and are always allowed.) Release note (sql change): Statements containing multiple INSERT ON CONFLICT, UPSERT, UPDATE, or DELETE subqueries can cause data corruption if they modify the same row multiple times. For example, the following SELECT 1 statement will cause corruption of table t: ```sql CREATE TABLE t (i INT, j INT, PRIMARY KEY (i), INDEX (j)); INSERT INTO t VALUES (0, 0); WITH cte1 AS (UPDATE t SET j = 1 WHERE i = 0 RETURNING *), cte2 AS (UPDATE t SET j = 2 WHERE i = 0 RETURNING *) SELECT 1; ``` Until this is fixed, this change disallows statements with multiple subqueries which modify the same table. Applications can work around this by rewriting problematic statements. For example, the query above can be rewritten as an explicit multi-statement transaction: ```sql BEGIN; UPDATE t SET j = 1 WHERE i = 0; UPDATE t SET j = 2 WHERE i = 0; SELECT 1; COMMIT; ``` or, if it doesn't matter which update "wins", as multiple non-mutating CTEs on an UPDATE statement: ```sql WITH cte1 AS (SELECT 1), cte2 AS (SELECT 2) UPDATE t SET j = x.j FROM (SELECT * FROM cte1 UNION ALL SELECT * FROM cte2) AS x (j) WHERE i = 0 RETURNING 1; ``` which in this case could be written more simply as: ```sql UPDATE t SET j = x.j FROM (VALUES (1), (2)) AS x (j) WHERE i = 0 RETURNING 1; ``` (Note that in these last two rewrites the first update will win, rather than the last.) None of these rewrites suffer from the corruption problem. To override this change and allow these statements in spite of the risk of corruption, applications can: ```sql SET CLUSTER SETTING sql.multiple_modifications_of_table.enabled = true ``` But be warned that with this enabled there is nothing to prevent this type of corruption from occuring if the same row is modified multiple times by a single statment. To check for corruption, use the EXPERIMENTAL SCRUB command: ```sql EXPERIMENTAL SCRUB TABLE t WITH OPTIONS INDEX ALL; ```
|
bors r=rytaft |
|
Build succeeded: |
|
Encountered an error creating backports. Some common things that can go wrong:
You might need to create your backport manually using the backport tool. error creating merge commit from 630d142 to blathers/backport-release-21.1-70976: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict [] you may need to manually resolve merge conflicts with the backport tool. Backport to branch 21.1.x failed. See errors above. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan. |
Addresses: #70731
Disallow multiple INSERT ON CONFLICT, UPSERT, UPDATE, or DELETE
subqueries on the same table until we are better able to detect multiple
modifications of the same row by a single statement. (Note that multiple
INSERTs without ON CONFLICT do not suffer from this issue and are always
allowed.)
Release note (sql change): Statements containing multiple INSERT ON
CONFLICT, UPSERT, UPDATE, or DELETE subqueries can cause data corruption
if they modify the same row multiple times. For example, the following
SELECT 1 statement will cause corruption of table t:
Until this is fixed, this change disallows statements with multiple
subqueries which modify the same table. Applications can work around
this by rewriting problematic statements. For example, the query above
can be rewritten as an explicit multi-statement transaction:
or, if it doesn't matter which update "wins", as multiple non-mutating
CTEs on an UPDATE statement:
which in this case could be written more simply as:
(Note that in these last two rewrites the first update will win, rather
than the last.) None of these rewrites suffer from the corruption problem.
To override this change and allow these statements in spite of the risk of
corruption, applications can:
But be warned that with this enabled there is nothing to prevent this
type of corruption from occuring if the same row is modified multiple
times by a single statment.
To check for corruption, use the EXPERIMENTAL SCRUB command: