opt: lock all column families for new SELECT FOR UPDATE#116170
opt: lock all column families for new SELECT FOR UPDATE#116170craig[bot] merged 2 commits intocockroachdb:masterfrom
Conversation
DrewKimball
left a comment
There was a problem hiding this comment.
Reviewed 1 of 1 files at r1, 3 of 3 files at r2, all commit messages.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @michae2 and @nvanbenschoten)
pkg/sql/opt/exec/execbuilder/mutation.go line 1150 at r2 (raw file):
// In some simple cases we can push the locking down into the input operation // instead of adding what would be a redundant lookup join.
Does this optimization need to change to ensure that the input is scanning the same columns as are required by the lock?
pkg/sql/opt/exec/execbuilder/mutation.go line 1160 at r2 (raw file):
switch input := lock.Input.(type) { case *memo.ScanExpr: if md.Table(input.Table) == md.Table(lock.Table) && input.Index == cat.PrimaryIndex {
[nit] I think this check is correct, but occasionally I've seen the catalog tables be different pointers even for the same table. I think it would be best to use md.Table(input.Table).ID() == md.Table(lock.Table).ID()
pkg/sql/opt/optbuilder/locking.go line 398 at r2 (raw file):
oldColID := lb.table.ColumnID(i) newColID := newTabID.ColumnID(i) colMap.Set(int(oldColID), int(newColID))
[nit] ColMaps can have poor performance for queries with a lot of columns. Can we follow the example of DuplicateColumnIDs instead? Or better yet, call into a shared method on the factory that returns the new table and column IDs?
DrewKimball
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @michae2 and @nvanbenschoten)
pkg/sql/logictest/testdata/logic_test/select_for_update_read_committed line 403 at r2 (raw file):
statement ok CREATE TABLE abcde (a INT NOT NULL PRIMARY KEY, b INT NOT NULL, c INT NOT NULL, d INT NOT NULL, e INT NOT NULL, FAMILY (a, b), FAMILY (c), FAMILY (d), FAMILY (e))
A case with a nullable column might be interesting. I think attempting to lock (only) families with id > 0 and all nullable columns will also lock family 0.
DrewKimball
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @michae2 and @nvanbenschoten)
pkg/sql/opt/exec/execbuilder/mutation.go line 1150 at r2 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
Does this optimization need to change to ensure that the input is scanning the same columns as are required by the lock?
Or at least, should we check that the input is scanning at least the columns that need to be locked?
|
Previously, DrewKimball (Drew Kimball) wrote…
|
|
Previously, mgartner (Marcus Gartner) wrote…
I suppose that's what we want here, since this code was comparing the |
mgartner
left a comment
There was a problem hiding this comment.
Reviewed 1 of 1 files at r1.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @DrewKimball, @michae2, and @nvanbenschoten)
pkg/sql/opt/optbuilder/locking.go line 409 at r2 (raw file):
oldColID := inScope.cols[i].id if newColID, ok := colMap.Get(int(oldColID)); ok { cols.Add(opt.ColumnID(newColID))
Is cols supposed to include columns from both inScope and the new table?
michae2
left a comment
There was a problem hiding this comment.
TFTRs!
Drew's questions brought up a number of problems that I have been working on. Hold off on reviewing for a bit, will push today with fixes.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @DrewKimball and @nvanbenschoten)
pkg/sql/opt/exec/execbuilder/mutation.go line 1150 at r2 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
Or at least, should we check that the input is scanning at least the columns that need to be locked?
Yes, great point, this optimization does now need to check columns.
pkg/sql/opt/exec/execbuilder/mutation.go line 1160 at r2 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
I suppose that's what we want here, since this code was comparing the
opt.TableIDs before?
Yes, exactly, a query that joins a table to itself now has a problem, which I am currently fixing.
|
I have realized there's a problem with this approach: statements like |
michae2
left a comment
There was a problem hiding this comment.
Actually, I changed my mind, because locking individual column families seems really useful. I plan to make SELECT 1 FROM foo FOR UPDATE lock the first column family, as it does now.
I think this new version will work, but hold off on reviewing until I add more tests.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball, @mgartner, and @nvanbenschoten)
41b362d to
c025af6
Compare
michae2
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball, @mgartner, and @nvanbenschoten)
pkg/sql/opt/optbuilder/locking.go line 405 at r4 (raw file):
// empty. In this case we will only lock the first column family. var lockCols opt.ColSet for _, col := range inScope.cols {
Hmm, ran into another quandary. This method of picking the columns to lock from the projection scope means that a statement like:
SELECT a + b FROM foo FOR UPDATE;won't actually lock the column families of a and b, but will instead only lock the first column family, just as I'm doing for SELECT 1 FROM foo FOR UPDATE. Maybe even scarier is a statement like:
SELECT a::int FROM foo FOR UPDATE;which really looks like it should lock column family a, but does not.
I think this is too subtle for people to work with safely. I'm going to go back to locking all column families.
michae2
left a comment
There was a problem hiding this comment.
Changed it to lock all column families.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball, @mgartner, and @nvanbenschoten)
pkg/sql/opt/exec/execbuilder/mutation.go line 1150 at r2 (raw file):
Previously, michae2 (Michael Erickson) wrote…
Yes, great point, this optimization does now need to check columns.
I made the optimization only apply to single-column-family tables. This saves us from having to compare the columns produced by the input to the columns locked, because there's only one column family in play.
pkg/sql/opt/exec/execbuilder/mutation.go line 1160 at r2 (raw file):
Previously, michae2 (Michael Erickson) wrote…
Yes, exactly, a query that joins a table to itself now has a problem, which I am currently fixing.
I switched back to comparing the exact opt.TableID (by saving it in the KeySource field of the LockPrivate).
pkg/sql/opt/optbuilder/locking.go line 398 at r2 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
[nit] ColMaps can have poor performance for queries with a lot of columns. Can we follow the example of
DuplicateColumnIDsinstead? Or better yet, call into a shared method on the factory that returns the new table and column IDs?
I ended up putting every column of the duplicated table into the set, instead of using the scope to pick, so the ColMap went away. (This is done to ensure we lock all column families.)
pkg/sql/opt/optbuilder/locking.go line 409 at r2 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
Is
colssupposed to include columns from bothinScopeand the new table?
I added the LockCols field to make this more clear: Cols is essentially passthrough columns from inScope and LockCols has columns from the new table.
pkg/sql/opt/optbuilder/locking.go line 405 at r4 (raw file):
Previously, michae2 (Michael Erickson) wrote…
Hmm, ran into another quandary. This method of picking the columns to lock from the projection scope means that a statement like:
SELECT a + b FROM foo FOR UPDATE;won't actually lock the column families of a and b, but will instead only lock the first column family, just as I'm doing for
SELECT 1 FROM foo FOR UPDATE. Maybe even scarier is a statement like:SELECT a::int FROM foo FOR UPDATE;which really looks like it should lock column family a, but does not.
I think this is too subtle for people to work with safely. I'm going to go back to locking all column families.
I think we should introduce a syntax something like SELECT ... FOR UPDATE OF foo (a, b) to make it unambiguous which column families are being locked, instead of trying to rely on the select clause.
DrewKimball
left a comment
There was a problem hiding this comment.
Great job figuring all this out!
I think there's one last problem - when the primary key is fully specified (e.g. not looking up with a prefix), and a column family is fully contained in the key column set, we can skip scanning that family and perform get requests on the others. Example:
CREATE TABLE abc (a INT NOT NULL, b INT NOT NULL, c INT NOT NULL, PRIMARY KEY (a, b), FAMILY foo (a), FAMILY bar (b), FAMILY baz (c));
EXPLAIN ANALYZE (DEBUG) SELECT * FROM abc WHERE a = 5 AND b = 6;
--From plan.txt:
• scan
columns: (a int, b int, c int)
nodes: n1
actual row count: 0
vectorized batch count: 0
KV time: 1ms
KV contention time: 0µs
KV rows decoded: 0
KV pairs read: 0
KV bytes read: 0 B
KV gRPC calls: 1
estimated max memory allocated: 20 KiB
sql cpu time: 35µs
MVCC step count (ext/int): 0/0
MVCC seek count (ext/int): 1/1
estimated row count: 1 (missing stats)
table: abc@abc_pkey
spans: /5/6/2/1
Note from the spans that only the baz column family was scanned. The span-splitting logic is called from DistSQLPlanner.createPlanForLookupJoin. We could potentially make that a separate fix - whatever's easiest.
Reviewed 1 of 6 files at r3, 4 of 6 files at r4, 3 of 3 files at r5, all commit messages.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @mgartner, @michae2, and @nvanbenschoten)
pkg/sql/opt/exec/execbuilder/mutation.go line 1160 at r2 (raw file):
I switched back to comparing the exact
opt.TableID(by saving it in theKeySourcefield of theLockPrivate).
SGTM. Seems best to limit the scope of the optimization for now.
pkg/sql/opt/optbuilder/locking.go line 405 at r4 (raw file):
I think we should introduce a syntax something like
SELECT ... FOR UPDATE OF foo (a, b)to make it unambiguous which column families are being locked, instead of trying to rely on the select clause.
I like this idea.
pkg/sql/logictest/testdata/logic_test/select_for_update_read_committed line 403 at r2 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
A case with a nullable column might be interesting. I think attempting to lock (only) families with id > 0 and all nullable columns will also lock family 0.
A nullable case might still be interesting, but seems less important now.
mgartner
left a comment
There was a problem hiding this comment.
Reviewed 4 of 6 files at r4, 3 of 3 files at r5, all commit messages.
Reviewable status:complete! 2 of 0 LGTMs obtained (waiting on @DrewKimball, @michae2, and @nvanbenschoten)
pkg/sql/opt/ops/mutation.opt line 384 at r5 (raw file):
# KeySource identifies the source of the key columns. KeySource TableID
In a follow-up, consider adding some assertions about the interesting invariants of this expression in check_expr.go, like:
- Assert that all columns in KeyCols are from the KeySource table.
- Assert that all columns in LockCols are from Table.
- Assert that Cols contains KeyCols.
- Assert that Cols does not intersect LockCols.
- Assert the order of KeyCols.
pkg/sql/logictest/testdata/logic_test/select_for_update_read_committed line 400 at r5 (raw file):
statement ok CREATE TABLE abcde (a INT NOT NULL PRIMARY KEY, b INT NOT NULL, c INT NOT NULL, d INT NOT NULL, e INT NOT NULL, FAMILY (a, b), FAMILY (c), FAMILY (d), FAMILY (e))
nit: The table would be easier to read if split on multiple lines.
michae2
left a comment
There was a problem hiding this comment.
I think there's one last problem - when the primary key is fully specified (e.g. not looking up with a prefix), and a column family is fully contained in the key column set, we can skip scanning that family and perform
getrequests on the others.
Wow, great catch!
I pushed a second commit fixing that behavior.
Interestingly, I was not able to come up with a scenario where this lack of a lock on family bar would cause a problem, because any change to b causes a delete and a put, which touch all column families. And we always have at least the first column family locked. But maybe someone else can come up with a scenario?
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @DrewKimball, @mgartner, and @nvanbenschoten)
pkg/sql/logictest/testdata/logic_test/select_for_update_read_committed line 403 at r2 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
A nullable case might still be interesting, but seems less important now.
Done. (column d)
pkg/sql/logictest/testdata/logic_test/select_for_update_read_committed line 400 at r5 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
nit: The table would be easier to read if split on multiple lines.
Done.
michae2
left a comment
There was a problem hiding this comment.
My fix was a little too aggressive, and broke a test added by #42572 for FK checks on multi-column-family tables. Now I'm only modifying the behavior for lookup joins under read committed isolation.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @DrewKimball, @mgartner, and @nvanbenschoten)
DrewKimball
left a comment
There was a problem hiding this comment.
Had some suggestions/comments, but I'm happy with this as-is if we want to merge.
Interestingly, I was not able to come up with a scenario where this lack of a lock on family
barwould cause a problem, because any change tobcauses a delete and a put, which touch all column families. And we always have at least the first column family locked. But maybe someone else can come up with a scenario?
That's a good point. Since the problem case requires the column to be part of the primary key, any mutations to that column involve all families automatically. So, maybe the only problematic case is two locking reads.
Reviewed 2 of 2 files at r6, 4 of 7 files at r7, 3 of 3 files at r8, all commit messages.
Reviewable status:complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @mgartner, @michae2, and @nvanbenschoten)
pkg/sql/distsql_physical_planner.go line 3120 at r8 (raw file):
if joinReaderSpec.LockingStrength != descpb.ScanLockingStrength_FOR_NONE && planCtx.ExtendedEvalCtx.TxnIsoLevel != isolation.Serializable { splitter = span.MakeSplitterForSideEffect(n.table.desc, n.table.index, fetchOrdinals)
I'm happy with the current implementation, so please feel free to ignore, but one thought: we could just check that this condition is false before calling MakeSplitter, since a nil SplitFamilyIDs indicates that no splitting will be performed and all column families should be scanned. Then, we could even just get rid of the entire first commit, since we would be guaranteed to scan all column families, no need for the LockCols field or TableID duplication.
pkg/sql/logictest/testdata/logic_test/select_for_update_read_committed line 484 at r6 (raw file):
SELECT d FROM abcde WHERE a = 45 AND b = 46 FOR UPDATE ---- NULL
BTW, one of the things that makes this case interesting is that AFAIK we don't actually write a KV entry for entirely NULL column families other than family 0. So, if and when we decide to do something other than locking all families, we'll want to make sure the persistent locks interact well with that.
rytaft
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball, @mgartner, @michae2, and @nvanbenschoten)
pkg/sql/opt/ops/mutation.opt line 378 at r8 (raw file):
[Private] define LockPrivate {
nit: This is all a bit confusing for someone without the context here. Can you add an example, maybe using the query above the Lock definition, and specify what Table, KeySource, KeyCols etc are in this case?
The new implementation of SELECT FOR UPDATE used for Read Committed isolation starts as a lock operator built from a table scan, which becomes a lookup join during execbuild. This lock operator was re-using the same table and column references as the original table scan, which caused the resulting lookup join to only lock the first column family of the table. Instead, we need to use fresh table and column references for the lock operator, so that the resulting lookup join is able to control which column families it locks. For now we choose to lock all column families, even if some column family would not normally be read by the select statement. We do this to more closely match Postgres behavior for statements like: ``` SELECT 1 FROM foo FOR UPDATE ``` Fixes: cockroachdb#115014 Release note (sql change): The new SELECT FOR UPDATE implementation used under Read Committed isolation (and under Serializable isolation when optimizer_use_lock_op_for_serializable is true) now locks all column families instead of only the first column family.
When determining which column families to read from an index for a set of requested columns, the span splitter has an optimization: it skips over the families for requested columns that are completely encoded in the index key, because these columns can be decoded from the key of *any* column family. For example, for the following query we do not need to read column families foo and bar: ``` CREATE TABLE abc ( a INT NOT NULL, b INT NOT NULL, c INT NOT NULL, PRIMARY KEY (a, b), FAMILY foo (a), FAMILY bar (b), FAMILY baz (c) ); SELECT * FROM abc WHERE a = 5 AND b = 6; ``` When the read has a side-effect, however (e.g. SELECT FOR UPDATE) we need to lock all of the requested column families. To do so, this commit adds a variant of `span.MakeSplitter` for locked reads, called `span.MakeSplitterForSideEffect`, which does not use this optimization. Fixes: cockroachdb#115014 Release note (sql change): Fix a bug where SELECT FOR UPDATE under Read Committed isolation on multi-column-family tables was not locking column families containing only key columns.
michae2
left a comment
There was a problem hiding this comment.
TFTRs! (Looks like @mgartner has drafts so I will wait.)
Also had a conversation with Nathan in slack about locking all column families.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @DrewKimball, @mgartner, and @nvanbenschoten)
pkg/sql/distsql_physical_planner.go line 3120 at r8 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
I'm happy with the current implementation, so please feel free to ignore, but one thought: we could just check that this condition is false before calling
MakeSplitter, since a nilSplitFamilyIDsindicates that no splitting will be performed and all column families should be scanned. Then, we could even just get rid of the entire first commit, since we would be guaranteed to scan all column families, no need for theLockColsfield orTableIDduplication.
Interesting. This is a really good point, and not something I thought of.
I'm a little scared to do this, because by the time we're doing physical planning we don't know if the locking lookup join came from a SELECT FOR UPDATE statement or a FK check. I believe FK checks are still only locking column family 0, and at least back in the time of #42572 we were trying to preserve this behavior. I think using a nil SplitFamilyIDs would cause all locking lookup joins, including those used for FK checks, to lock all column families.
pkg/sql/opt/ops/mutation.opt line 378 at r8 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
nit: This is all a bit confusing for someone without the context here. Can you add an example, maybe using the query above the
Lockdefinition, and specify whatTable,KeySource,KeyColsetc are in this case?
Done, lmk if that is helpful.
mgartner
left a comment
There was a problem hiding this comment.
Reviewed 1 of 6 files at r3, 2 of 2 files at r6, 4 of 7 files at r7, 3 of 3 files at r8, 7 of 7 files at r9, 6 of 6 files at r10, all commit messages.
Reviewable status:complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball, @michae2, and @nvanbenschoten)
|
Thanks! bors r=DrewKimball,mgartner |
|
Build succeeded: |
|
Encountered an error creating backports. Some common things that can go wrong:
You might need to create your backport manually using the backport tool. error getting backport branch release-23.1.12-rc.FROZEN: unexpected status code: 404 Not Found Backport to branch 23.1.12-rc.FROZEN failed. See errors above. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
michae2
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 1 of 0 LGTMs obtained (and 1 stale)
pkg/sql/opt/optbuilder/locking.go line 405 at r4 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
I think we should introduce a syntax something like
SELECT ... FOR UPDATE OF foo (a, b)to make it unambiguous which column families are being locked, instead of trying to rely on the select clause.I like this idea.
Filed #116838 about this.
pkg/sql/logictest/testdata/logic_test/select_for_update_read_committed line 484 at r6 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
BTW, one of the things that makes this case interesting is that AFAIK we don't actually write a KV entry for entirely NULL column families other than family 0. So, if and when we decide to do something other than locking all families, we'll want to make sure the persistent locks interact well with that.
Yes, and as it turns out this is still an issue. Filed #116836.
The span splitter has an optimization to skip reading column families whose columns are contained entirely in the index key. In cockroachdb#116170 we added a flag to the span splitter to ignore this optimization when the read has a side-effect (i.e. it's a locking read). The intention of cockroachdb#116170 was to ensure that SELECT FOR UPDATE statements under Read Committed isolation lock all requested column families. But the span splitter flag also applies to the implicit for-update locking added to mutation statements. This was not intentional. The implicit for-update locking added to mutation statements is not needed for correctness, but is supposed to be a lightweight optimization to prevent retries of racing write-write conflicts. The implicit for-update locking is already not durable. It's fine if it does not lock every requested column family. This commit changes the decision about whether to use the span splitter optimization to depend on the locking durability, rather than the isolation level. This extends the meaning of locking durability slightly, but I think it fits within the semantics of best effort = the requested lock might not actually be held. This should fix the YCSB regression under Read Committed isolation. Fixes: cockroachdb#143138 Release note: None
In cockroachdb#114401 we added some simple optimizations to push locking down into input operations for SELECT FOR UPDATE under Read Committed isolation. In cockroachdb#116170 we disabled these optimizations for multi-column-family tables because we weren't sure that all necessary column families would be locked. This commit enables the optimizations for multi-column-family tables, after confirming that all families needing to be locked are fetched by the input operation. Informs: cockroachdb#114566, cockroachdb#116838 Release note: None
In cockroachdb#116170 we added span.MakeSplitterForSideEffect which does not perform a column-family-skipping optimization. We were using this for locking lookup joins of multi-column-family tables, but forgot to use it in locking scans, locking index joins, and other places that could be performing a locked read of a multi-column-family table. This did not really matter until the previous commit, which made it more likely that we would use one of these other operations with locking on a multi-column-family table. Informs: cockroachdb#116838 Release note: None
The span splitter has an optimization to skip reading column families whose columns are contained entirely in the index key. In cockroachdb#116170 we added a flag to the span splitter to ignore this optimization when the read has a side-effect (i.e. it's a locking read). The intention of cockroachdb#116170 was to ensure that SELECT FOR UPDATE statements under Read Committed isolation lock all requested column families. But the span splitter flag also applies to the implicit for-update locking added to mutation statements. This was not intentional. The implicit for-update locking added to mutation statements is not needed for correctness, but is supposed to be a lightweight optimization to prevent retries of racing write-write conflicts. The implicit for-update locking is already not durable. It's fine if it does not lock every requested column family. This commit changes the decision about whether to use the span splitter optimization to depend on the locking durability, rather than the isolation level. This extends the meaning of locking durability slightly, but I think it fits within the semantics of best effort = the requested lock might not actually be held. This should fix the YCSB regression under Read Committed isolation. Fixes: cockroachdb#143138 Release note: None
In cockroachdb#116170 we added span.MakeSplitterForSideEffect which does not perform a column-family-skipping optimization. We were using this for locking lookup joins of multi-column-family tables, but forgot to use it in locking scans, locking index joins, and other places that could be performing a locked read of a multi-column-family table. This did not really matter until the previous commit, which made it more likely that we would use one of these other operations with locking on a multi-column-family table. Informs: cockroachdb#116838 Release note: None
The span splitter has an optimization to skip reading column families whose columns are contained entirely in the index key. In #116170 we added a flag to the span splitter to ignore this optimization when the read has a side-effect (i.e. it's a locking read). The intention of #116170 was to ensure that SELECT FOR UPDATE statements under Read Committed isolation lock all requested column families. But the span splitter flag also applies to the implicit for-update locking added to mutation statements. This was not intentional. The implicit for-update locking added to mutation statements is not needed for correctness, but is supposed to be a lightweight optimization to prevent retries of racing write-write conflicts. The implicit for-update locking is already not durable. It's fine if it does not lock every requested column family. This commit changes the decision about whether to use the span splitter optimization to depend on the locking durability, rather than the isolation level. This extends the meaning of locking durability slightly, but I think it fits within the semantics of best effort = the requested lock might not actually be held. This should fix the YCSB regression under Read Committed isolation. Fixes: #143138 Release note: None
In #114401 we added some simple optimizations to push locking down into input operations for SELECT FOR UPDATE under Read Committed isolation. In #116170 we disabled these optimizations for multi-column-family tables because we weren't sure that all necessary column families would be locked. This commit enables the optimizations for multi-column-family tables, after confirming that all families needing to be locked are fetched by the input operation. Informs: #114566, #116838 Release note: None
In #116170 we added span.MakeSplitterForSideEffect which does not perform a column-family-skipping optimization. We were using this for locking lookup joins of multi-column-family tables, but forgot to use it in locking scans, locking index joins, and other places that could be performing a locked read of a multi-column-family table. This did not really matter until the previous commit, which made it more likely that we would use one of these other operations with locking on a multi-column-family table. Informs: #116838 Release note: None
opt: lock all column families for new SELECT FOR UPDATE
The new implementation of SELECT FOR UPDATE used for Read Committed isolation starts as a lock operator built from a table scan, which becomes a lookup join during execbuild. This lock operator was re-using the same table and column references as the original table scan, which caused the resulting lookup join to only lock the first column family of the table.
Instead, we need to use fresh table and column references for the lock operator, so that the resulting lookup join is able to control which column families it locks.
For now we choose to lock all column families, even if some column family would not normally be read by the select statement. We do this to more closely match Postgres behavior for statements like:
Fixes: #115014
Release note (sql change): The new SELECT FOR UPDATE implementation used under Read Committed isolation (and under Serializable isolation when optimizer_use_lock_op_for_serializable is true) now locks all column families instead of only the first column family.
sql: do not skip key col families when performing a locked read
When determining which column families to read from an index for a set of requested columns, the span splitter has an optimization: it skips over the families for requested columns that are completely encoded in the index key, because these columns can be decoded from the key of any column family. For example, for the following query we do not need to read column families foo and bar:
When the read has a side-effect, however (e.g. SELECT FOR UPDATE) we need to lock all of the requested column families. To do so, this commit adds a variant of
span.MakeSplitterfor locked reads, calledspan.MakeSplitterForSideEffect, which does not use this optimization.Fixes: #115014
Release note (sql change): Fix a bug where SELECT FOR UPDATE under Read Committed isolation on multi-column-family tables was not locking column families containing only key columns.