opt: remove conflicting input rows in ON CONFLICT DO NOTHING case#45443
opt: remove conflicting input rows in ON CONFLICT DO NOTHING case#45443craig[bot] merged 1 commit intocockroachdb:masterfrom
Conversation
2dcb3ce to
5bb6021
Compare
RaduBerinde
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @andy-kimball, @RaduBerinde, and @rytaft)
pkg/sql/opt/ops/relational.opt, line 533 at r1 (raw file):
# ErrorOnDup, if true, triggers an error if any aggregation group contains # more than one row. This can only be true for the UpsertDistinctOn operator. ErrorOnDup bool
I think this code needs to be updated to check the new field instead: https://github.com/cockroachdb/cockroach/blob/master/pkg/sql/opt/memo/statistics_builder.go#L1500
rytaft
left a comment
There was a problem hiding this comment.
(once you address Radu's point)
Reviewed 17 of 17 files at r1.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @andy-kimball)
pkg/sql/opt/memo/testdata/logprops/upsert, line 222 at r1 (raw file):
├── key: (1) ├── fd: (1)-->(2-4), (2)-->(3), (4)~~>(1-3), (2,3)~~>(1,4) └── upsert-distinct-on
should ErrOnDup be printed here?
pkg/sql/opt/norm/groupby.go, line 219 at r1 (raw file):
} case *memo.UpsertDistinctOnExpr:
Can you also include regular DistinctOn here?
pkg/sql/opt/optbuilder/insert.go, line 741 at r1 (raw file):
// Add an UpsertDistinctOn operator to ensure there are no duplicate input // rows for this unique index. Duplicate rows trigger conflict errors,
Might help to specify that these conflict errors happen at execution time
andy-kimball
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @RaduBerinde and @rytaft)
pkg/sql/opt/memo/testdata/logprops/upsert, line 222 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
should ErrOnDup be printed here?
Done.
pkg/sql/opt/norm/groupby.go, line 219 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
Can you also include regular
DistinctOnhere?
I think it could in theory be included, but I think the case would be so rare as to be not worth the extra complication to test and maintain. I want to keep the cases constrained to what we might see in upsert queries (I added DistinctOn at the top-level mostly to make it easier to test). I thought about using a Rule property so that we don't descend the tree on every distinct operator, but figured it was OK as-is, since we only descend down a single path here, and not the entire subtree.
pkg/sql/opt/ops/relational.opt, line 533 at r1 (raw file):
Previously, RaduBerinde wrote…
I think this code needs to be updated to check the new field instead: https://github.com/cockroachdb/cockroach/blob/master/pkg/sql/opt/memo/statistics_builder.go#L1500
Yes, I missed that. Fixed.
pkg/sql/opt/optbuilder/insert.go, line 741 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
Might help to specify that these conflict errors happen at execution time
Done.
5bb6021 to
1a2c1b1
Compare
andy-kimball
left a comment
There was a problem hiding this comment.
Note that I also fixed the problem that Radu found in the previous review, regarding incorrect removal of upsert-distinct-on when it wraps left-join.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @RaduBerinde and @rytaft)
RaduBerinde
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andy-kimball, @RaduBerinde, and @rytaft)
pkg/sql/opt/norm/groupby.go, line 238 at r2 (raw file):
} } if !keyCols.SubsetOf(eqCols) {
Don't we want to use ColsAreLaxKey? This won't work well if we have multiple keys.
1a2c1b1 to
bf2f9c8
Compare
andy-kimball
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @RaduBerinde and @rytaft)
pkg/sql/opt/norm/groupby.go, line 238 at r2 (raw file):
Previously, RaduBerinde wrote…
Don't we want to use
ColsAreLaxKey? This won't work well if we have multiple keys.
Yes, ColsAreLaxKey is better. Switched to that.
The current code was able to work OK b/c we always left-join with an index scan having a key that matches the upsert-distinct-on grouping cols.
andy-kimball
left a comment
There was a problem hiding this comment.
Another possibility I've thought about is whether we should instead have an insert-distinct-on operator rather than adding ErrorOnDup to upsert-distinct-on.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @RaduBerinde and @rytaft)
rytaft
left a comment
There was a problem hiding this comment.
I like the idea of having a separate operator so you don't need the extra field
Reviewed 9 of 12 files at r2, 4 of 4 files at r3.
Reviewable status:complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andy-kimball and @RaduBerinde)
pkg/sql/opt/memo/extract.go, line 227 at r3 (raw file):
// ExtractJoinEquality returns true if the given condition is a simple equality // condition with two variables (e.g. a=b), where one of the variables (returned // as "left") is in the set of leftCols and the other (returned as "right") in
[nit] (returned as "right") in -> (returned as "right") is in
pkg/sql/opt/memo/statistics_builder.go, line 1554 at r3 (raw file):
// will raise an error, so in non-error cases its distinct count is the // same as its row count. colStat.DistinctCount = s.RowCount
But this only applies if the columns in colSet match the grouping columns, right? I think the code above will already do what you want in that case.
andy-kimball
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @RaduBerinde and @rytaft)
pkg/sql/opt/memo/extract.go, line 227 at r3 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
[nit] (returned as "right") in -> (returned as "right") is in
Done.
pkg/sql/opt/memo/statistics_builder.go, line 1554 at r3 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
But this only applies if the columns in colSet match the grouping columns, right? I think the code above will already do what you want in that case.
How so? The RowCount of the distinct operator in the error case is equal to the RowCount of its input. But it's DistinctCount should also be equal to the RowCount of its input, which the code above would not do.
Maybe I should put this new code only in the 2nd branch of that code (when cols are grouping cols). So DistinctCount would equal RowCount when cols are grouping cols, but DistinctCount would be equal to 1st branch if not.
4d12d4d to
e7051d0
Compare
andy-kimball
left a comment
There was a problem hiding this comment.
Re: the separate operator, I think I'll just keep as-is for now. There are pros and cons to each approach, and I don't think we gain much by switching. We can always change later if we decide it's not working well.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @RaduBerinde and @rytaft)
rytaft
left a comment
There was a problem hiding this comment.
Reviewed 5 of 5 files at r4.
Reviewable status:complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andy-kimball and @RaduBerinde)
pkg/sql/opt/memo/statistics_builder.go, line 1554 at r3 (raw file):
Previously, andy-kimball (Andy Kimball) wrote…
How so? The
RowCountof the distinct operator in the error case is equal to theRowCountof its input. But it'sDistinctCountshould also be equal to theRowCountof its input, which the code above would not do.Maybe I should put this new code only in the 2nd branch of that code (when cols are grouping cols). So
DistinctCountwould equalRowCountwhen cols are grouping cols, butDistinctCountwould be equal to 1st branch if not.
But you'd need to be sure that the colSet is equal to groupingColSet. If there are two grouping columns and colSet only contains one of them, then this logic is incorrect. The GroupBy and DistinctOn operators don't change the distinct counts of grouping columns, regardless of whether ErrOnDup is set or not. (ErrOnDup does indicate something about the row count, though, which you've already handled above in buildGroupBy).
So the distinct count for colSet here should be the same as the distinct count of colSet in the input, which is what line 1546 is already doing (copying from input).
andy-kimball
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @RaduBerinde and @rytaft)
pkg/sql/opt/memo/statistics_builder.go, line 1554 at r3 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
But you'd need to be sure that the
colSetis equal togroupingColSet. If there are two grouping columns andcolSetonly contains one of them, then this logic is incorrect. TheGroupByandDistinctOnoperators don't change the distinct counts of grouping columns, regardless of whetherErrOnDupis set or not. (ErrOnDupdoes indicate something about the row count, though, which you've already handled above inbuildGroupBy).So the distinct count for
colSethere should be the same as the distinct count ofcolSetin the input, which is what line 1546 is already doing (copying from input).
Done.
rytaft
left a comment
There was a problem hiding this comment.
Reviewed 2 of 2 files at r5.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @RaduBerinde)
andy-kimball
left a comment
There was a problem hiding this comment.
bors r+
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @RaduBerinde)
Build failed (retrying...) |
|
bors r- optbuilder testfiles need a rewrite: https://teamcity.cockroachdb.com/viewLog.html?buildId=1785780&buildTypeId=Cockroach_UnitTests |
Canceled |
Previously, the INSERT..ON CONFLICT DO NOTHING statement raised an error if its input contained rows which conflicted with one another. For example: CREATE TABLE t (x INT PRIMARY KEY) INSERT INTO t (x) VALUES (1), (1) ON CONFLICT DO NOTHING This raised a unique key conflict error rather than doing nothing. This commit fixes that by wrapping the input in one or more UpsertDistinctOn operators - one for each unique key in the target table. This ensures that any duplicates which might cause a conflict are removed from the input. Fixes cockroachdb#37880 Fixes cockroachdb#44434 Release note (sql change): Duplicate rows in the input to an INSERT..ON CONFLICT DO NOTHING statement will now be ignored rather than triggering an error.
|
bors r+ |
Build failed |
|
bors r+ |
Build failed |
|
bors r+ |
Build succeeded |
Previously, the INSERT..ON CONFLICT DO NOTHING statement raised an error
if its input contained rows which conflicted with one another. For example:
CREATE TABLE t (x INT PRIMARY KEY)
INSERT INTO t (x) VALUES (1), (1) ON CONFLICT DO NOTHING
This raised a unique key conflict error rather than doing nothing.
This commit fixes that by wrapping the input in one or more UpsertDistinctOn
operators - one for each unique key in the target table. This ensures that
any duplicates which might cause a conflict are removed from the input.
Fixes #37880
Fixes #44434
Release note (sql change): Duplicate rows in the input to an INSERT..ON
CONFLICT DO NOTHING statement will now be ignored rather than triggering
an error.