opt: lock all column families for new SELECT FOR UPDATE by michae2 · Pull Request #116170 · cockroachdb/cockroach

michae2 · 2023-12-12T07:43:52Z

opt: lock all column families for new SELECT FOR UPDATE

The new implementation of SELECT FOR UPDATE used for Read Committed isolation starts as a lock operator built from a table scan, which becomes a lookup join during execbuild. This lock operator was re-using the same table and column references as the original table scan, which caused the resulting lookup join to only lock the first column family of the table.

Instead, we need to use fresh table and column references for the lock operator, so that the resulting lookup join is able to control which column families it locks.

For now we choose to lock all column families, even if some column family would not normally be read by the select statement. We do this to more closely match Postgres behavior for statements like:

SELECT 1 FROM foo FOR UPDATE

Fixes: #115014

Release note (sql change): The new SELECT FOR UPDATE implementation used under Read Committed isolation (and under Serializable isolation when optimizer_use_lock_op_for_serializable is true) now locks all column families instead of only the first column family.

sql: do not skip key col families when performing a locked read

When determining which column families to read from an index for a set of requested columns, the span splitter has an optimization: it skips over the families for requested columns that are completely encoded in the index key, because these columns can be decoded from the key of any column family. For example, for the following query we do not need to read column families foo and bar:

CREATE TABLE abc (
  a INT NOT NULL,
  b INT NOT NULL,
  c INT NOT NULL,
  PRIMARY KEY (a, b),
  FAMILY foo (a),
  FAMILY bar (b),
  FAMILY baz (c)
);
SELECT * FROM abc WHERE a = 5 AND b = 6;

When the read has a side-effect, however (e.g. SELECT FOR UPDATE) we need to lock all of the requested column families. To do so, this commit adds a variant of span.MakeSplitter for locked reads, called span.MakeSplitterForSideEffect, which does not use this optimization.

Fixes: #115014

Release note (sql change): Fix a bug where SELECT FOR UPDATE under Read Committed isolation on multi-column-family tables was not locking column families containing only key columns.

cockroach-teamcity · 2023-12-12T07:44:02Z

This change is

DrewKimball

Nice fix!

Reviewed 1 of 1 files at r1, 3 of 3 files at r2, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @michae2 and @nvanbenschoten)

pkg/sql/opt/exec/execbuilder/mutation.go line 1150 at r2 (raw file):

	// In some simple cases we can push the locking down into the input operation
	// instead of adding what would be a redundant lookup join.

Does this optimization need to change to ensure that the input is scanning the same columns as are required by the lock?

pkg/sql/opt/exec/execbuilder/mutation.go line 1160 at r2 (raw file):

	switch input := lock.Input.(type) {
	case *memo.ScanExpr:
		if md.Table(input.Table) == md.Table(lock.Table) && input.Index == cat.PrimaryIndex {

[nit] I think this check is correct, but occasionally I've seen the catalog tables be different pointers even for the same table. I think it would be best to use md.Table(input.Table).ID() == md.Table(lock.Table).ID()

pkg/sql/opt/optbuilder/locking.go line 398 at r2 (raw file):

		oldColID := lb.table.ColumnID(i)
		newColID := newTabID.ColumnID(i)
		colMap.Set(int(oldColID), int(newColID))

[nit] ColMaps can have poor performance for queries with a lot of columns. Can we follow the example of DuplicateColumnIDs instead? Or better yet, call into a shared method on the factory that returns the new table and column IDs?

DrewKimball

Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @michae2 and @nvanbenschoten)

pkg/sql/logictest/testdata/logic_test/select_for_update_read_committed line 403 at r2 (raw file):


statement ok
CREATE TABLE abcde (a INT NOT NULL PRIMARY KEY, b INT NOT NULL, c INT NOT NULL, d INT NOT NULL, e INT NOT NULL, FAMILY (a, b), FAMILY (c), FAMILY (d), FAMILY (e))

A case with a nullable column might be interesting. I think attempting to lock (only) families with id > 0 and all nullable columns will also lock family 0.

DrewKimball

Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @michae2 and @nvanbenschoten)

pkg/sql/opt/exec/execbuilder/mutation.go line 1150 at r2 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

Does this optimization need to change to ensure that the input is scanning the same columns as are required by the lock?

Or at least, should we check that the input is scanning at least the columns that need to be locked?

mgartner · 2023-12-14T18:07:55Z

pkg/sql/opt/exec/execbuilder/mutation.go line 1160 at r2 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

[nit] I think this check is correct, but occasionally I've seen the catalog tables be different pointers even for the same table. I think it would be best to use md.Table(input.Table).ID() == md.Table(lock.Table).ID()

cat.Table.ID() returns the stable ID of the underlying table in the schema, not the logic table in the plan. So in a query that joins a table to itself, could that cause problems here?

mgartner · 2023-12-14T18:09:58Z

pkg/sql/opt/exec/execbuilder/mutation.go line 1160 at r2 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

cat.Table.ID() returns the stable ID of the underlying table in the schema, not the logic table in the plan. So in a query that joins a table to itself, could that cause problems here?

I suppose that's what we want here, since this code was comparing the opt.TableIDs before?

mgartner

Reviewed 1 of 1 files at r1.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @DrewKimball, @michae2, and @nvanbenschoten)

pkg/sql/opt/optbuilder/locking.go line 409 at r2 (raw file):

		oldColID := inScope.cols[i].id
		if newColID, ok := colMap.Get(int(oldColID)); ok {
			cols.Add(opt.ColumnID(newColID))

Is cols supposed to include columns from both inScope and the new table?

michae2

TFTRs!

Drew's questions brought up a number of problems that I have been working on. Hold off on reviewing for a bit, will push today with fixes.

Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @DrewKimball and @nvanbenschoten)

pkg/sql/opt/exec/execbuilder/mutation.go line 1150 at r2 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

Or at least, should we check that the input is scanning at least the columns that need to be locked?

Yes, great point, this optimization does now need to check columns.

pkg/sql/opt/exec/execbuilder/mutation.go line 1160 at r2 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

I suppose that's what we want here, since this code was comparing the opt.TableIDs before?

Yes, exactly, a query that joins a table to itself now has a problem, which I am currently fixing.

michae2 · 2023-12-15T00:55:02Z

I have realized there's a problem with this approach: statements like SELECT 1 FROM foo FOR UPDATE won't lock any column families. For 23.2 I am going to switch to locking all column families, which will simplify this a lot, and maybe we can revisit this in 24.1.

michae2

Actually, I changed my mind, because locking individual column families seems really useful. I plan to make SELECT 1 FROM foo FOR UPDATE lock the first column family, as it does now.

I think this new version will work, but hold off on reviewing until I add more tests.

Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball, @mgartner, and @nvanbenschoten)

michae2

Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball, @mgartner, and @nvanbenschoten)

pkg/sql/opt/optbuilder/locking.go line 405 at r4 (raw file):

	// empty. In this case we will only lock the first column family.
	var lockCols opt.ColSet
	for _, col := range inScope.cols {

Hmm, ran into another quandary. This method of picking the columns to lock from the projection scope means that a statement like:

SELECT a + b FROM foo FOR UPDATE;

won't actually lock the column families of a and b, but will instead only lock the first column family, just as I'm doing for SELECT 1 FROM foo FOR UPDATE. Maybe even scarier is a statement like:

SELECT a::int FROM foo FOR UPDATE;

which really looks like it should lock column family a, but does not.

I think this is too subtle for people to work with safely. I'm going to go back to locking all column families.

michae2

Changed it to lock all column families.

Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball, @mgartner, and @nvanbenschoten)

pkg/sql/opt/exec/execbuilder/mutation.go line 1150 at r2 (raw file):

Previously, michae2 (Michael Erickson) wrote…

Yes, great point, this optimization does now need to check columns.

I made the optimization only apply to single-column-family tables. This saves us from having to compare the columns produced by the input to the columns locked, because there's only one column family in play.

pkg/sql/opt/exec/execbuilder/mutation.go line 1160 at r2 (raw file):

Previously, michae2 (Michael Erickson) wrote…

Yes, exactly, a query that joins a table to itself now has a problem, which I am currently fixing.

I switched back to comparing the exact opt.TableID (by saving it in the KeySource field of the LockPrivate).

pkg/sql/opt/optbuilder/locking.go line 398 at r2 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

[nit] ColMaps can have poor performance for queries with a lot of columns. Can we follow the example of DuplicateColumnIDs instead? Or better yet, call into a shared method on the factory that returns the new table and column IDs?

I ended up putting every column of the duplicated table into the set, instead of using the scope to pick, so the ColMap went away. (This is done to ensure we lock all column families.)

pkg/sql/opt/optbuilder/locking.go line 409 at r2 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

Is cols supposed to include columns from both inScope and the new table?

I added the LockCols field to make this more clear: Cols is essentially passthrough columns from inScope and LockCols has columns from the new table.

pkg/sql/opt/optbuilder/locking.go line 405 at r4 (raw file):

Previously, michae2 (Michael Erickson) wrote…

Hmm, ran into another quandary. This method of picking the columns to lock from the projection scope means that a statement like:
SELECT a + b FROM foo FOR UPDATE;
won't actually lock the column families of a and b, but will instead only lock the first column family, just as I'm doing for SELECT 1 FROM foo FOR UPDATE. Maybe even scarier is a statement like:
SELECT a::int FROM foo FOR UPDATE;
which really looks like it should lock column family a, but does not.

I think this is too subtle for people to work with safely. I'm going to go back to locking all column families.

I think we should introduce a syntax something like SELECT ... FOR UPDATE OF foo (a, b) to make it unambiguous which column families are being locked, instead of trying to rely on the select clause.

DrewKimball

Great job figuring all this out!

I think there's one last problem - when the primary key is fully specified (e.g. not looking up with a prefix), and a column family is fully contained in the key column set, we can skip scanning that family and perform get requests on the others. Example:

CREATE TABLE abc (a INT NOT NULL, b INT NOT NULL, c INT NOT NULL, PRIMARY KEY (a, b), FAMILY foo (a), FAMILY bar (b), FAMILY baz (c));
EXPLAIN ANALYZE (DEBUG) SELECT * FROM abc WHERE a = 5 AND b = 6;
--From plan.txt:
• scan
  columns: (a int, b int, c int)
  nodes: n1
  actual row count: 0
  vectorized batch count: 0
  KV time: 1ms
  KV contention time: 0µs
  KV rows decoded: 0
  KV pairs read: 0
  KV bytes read: 0 B
  KV gRPC calls: 1
  estimated max memory allocated: 20 KiB
  sql cpu time: 35µs
  MVCC step count (ext/int): 0/0
  MVCC seek count (ext/int): 1/1
  estimated row count: 1 (missing stats)
  table: abc@abc_pkey
  spans: /5/6/2/1

Note from the spans that only the baz column family was scanned. The span-splitting logic is called from DistSQLPlanner.createPlanForLookupJoin. We could potentially make that a separate fix - whatever's easiest.

Reviewed 1 of 6 files at r3, 4 of 6 files at r4, 3 of 3 files at r5, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @mgartner, @michae2, and @nvanbenschoten)

pkg/sql/opt/exec/execbuilder/mutation.go line 1160 at r2 (raw file):

I switched back to comparing the exact opt.TableID (by saving it in the KeySource field of the LockPrivate).

SGTM. Seems best to limit the scope of the optimization for now.

pkg/sql/opt/optbuilder/locking.go line 405 at r4 (raw file):

I think we should introduce a syntax something like SELECT ... FOR UPDATE OF foo (a, b) to make it unambiguous which column families are being locked, instead of trying to rely on the select clause.

I like this idea.

pkg/sql/logictest/testdata/logic_test/select_for_update_read_committed line 403 at r2 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

A case with a nullable column might be interesting. I think attempting to lock (only) families with id > 0 and all nullable columns will also lock family 0.

A nullable case might still be interesting, but seems less important now.

mgartner

modulo Drew's comment

Reviewed 4 of 6 files at r4, 3 of 3 files at r5, all commit messages.
Reviewable status: complete! 2 of 0 LGTMs obtained (waiting on @DrewKimball, @michae2, and @nvanbenschoten)

pkg/sql/opt/ops/mutation.opt line 384 at r5 (raw file):


    # KeySource identifies the source of the key columns.
    KeySource TableID

In a follow-up, consider adding some assertions about the interesting invariants of this expression in check_expr.go, like:

Assert that all columns in KeyCols are from the KeySource table.
Assert that all columns in LockCols are from Table.
Assert that Cols contains KeyCols.
Assert that Cols does not intersect LockCols.
Assert the order of KeyCols.

pkg/sql/logictest/testdata/logic_test/select_for_update_read_committed line 400 at r5 (raw file):


statement ok
CREATE TABLE abcde (a INT NOT NULL PRIMARY KEY, b INT NOT NULL, c INT NOT NULL, d INT NOT NULL, e INT NOT NULL, FAMILY (a, b), FAMILY (c), FAMILY (d), FAMILY (e))

nit: The table would be easier to read if split on multiple lines.

michae2

I think there's one last problem - when the primary key is fully specified (e.g. not looking up with a prefix), and a column family is fully contained in the key column set, we can skip scanning that family and perform get requests on the others.

Wow, great catch!

I pushed a second commit fixing that behavior.

Interestingly, I was not able to come up with a scenario where this lack of a lock on family bar would cause a problem, because any change to b causes a delete and a put, which touch all column families. And we always have at least the first column family locked. But maybe someone else can come up with a scenario?

Reviewable status: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @DrewKimball, @mgartner, and @nvanbenschoten)

pkg/sql/logictest/testdata/logic_test/select_for_update_read_committed line 403 at r2 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

A nullable case might still be interesting, but seems less important now.

Done. (column d)

pkg/sql/logictest/testdata/logic_test/select_for_update_read_committed line 400 at r5 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

nit: The table would be easier to read if split on multiple lines.

Done.

michae2

My fix was a little too aggressive, and broke a test added by #42572 for FK checks on multi-column-family tables. Now I'm only modifying the behavior for lookup joins under read committed isolation.

Reviewable status: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @DrewKimball, @mgartner, and @nvanbenschoten)

DrewKimball

Had some suggestions/comments, but I'm happy with this as-is if we want to merge.

Interestingly, I was not able to come up with a scenario where this lack of a lock on family bar would cause a problem, because any change to b causes a delete and a put, which touch all column families. And we always have at least the first column family locked. But maybe someone else can come up with a scenario?

That's a good point. Since the problem case requires the column to be part of the primary key, any mutations to that column involve all families automatically. So, maybe the only problematic case is two locking reads.

Reviewed 2 of 2 files at r6, 4 of 7 files at r7, 3 of 3 files at r8, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @mgartner, @michae2, and @nvanbenschoten)

pkg/sql/distsql_physical_planner.go line 3120 at r8 (raw file):

	if joinReaderSpec.LockingStrength != descpb.ScanLockingStrength_FOR_NONE &&
		planCtx.ExtendedEvalCtx.TxnIsoLevel != isolation.Serializable {
		splitter = span.MakeSplitterForSideEffect(n.table.desc, n.table.index, fetchOrdinals)

I'm happy with the current implementation, so please feel free to ignore, but one thought: we could just check that this condition is false before calling MakeSplitter, since a nil SplitFamilyIDs indicates that no splitting will be performed and all column families should be scanned. Then, we could even just get rid of the entire first commit, since we would be guaranteed to scan all column families, no need for the LockCols field or TableID duplication.

pkg/sql/logictest/testdata/logic_test/select_for_update_read_committed line 484 at r6 (raw file):

SELECT d FROM abcde WHERE a = 45 AND b = 46 FOR UPDATE
----
NULL

BTW, one of the things that makes this case interesting is that AFAIK we don't actually write a KV entry for entirely NULL column families other than family 0. So, if and when we decide to do something other than locking all families, we'll want to make sure the persistent locks interact well with that.

rytaft

Reviewable status: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball, @mgartner, @michae2, and @nvanbenschoten)

pkg/sql/opt/ops/mutation.opt line 378 at r8 (raw file):


[Private]
define LockPrivate {

nit: This is all a bit confusing for someone without the context here. Can you add an example, maybe using the query above the Lock definition, and specify what Table, KeySource, KeyCols etc are in this case?

The new implementation of SELECT FOR UPDATE used for Read Committed isolation starts as a lock operator built from a table scan, which becomes a lookup join during execbuild. This lock operator was re-using the same table and column references as the original table scan, which caused the resulting lookup join to only lock the first column family of the table. Instead, we need to use fresh table and column references for the lock operator, so that the resulting lookup join is able to control which column families it locks. For now we choose to lock all column families, even if some column family would not normally be read by the select statement. We do this to more closely match Postgres behavior for statements like: ``` SELECT 1 FROM foo FOR UPDATE ``` Fixes: cockroachdb#115014 Release note (sql change): The new SELECT FOR UPDATE implementation used under Read Committed isolation (and under Serializable isolation when optimizer_use_lock_op_for_serializable is true) now locks all column families instead of only the first column family.

When determining which column families to read from an index for a set of requested columns, the span splitter has an optimization: it skips over the families for requested columns that are completely encoded in the index key, because these columns can be decoded from the key of *any* column family. For example, for the following query we do not need to read column families foo and bar: ``` CREATE TABLE abc ( a INT NOT NULL, b INT NOT NULL, c INT NOT NULL, PRIMARY KEY (a, b), FAMILY foo (a), FAMILY bar (b), FAMILY baz (c) ); SELECT * FROM abc WHERE a = 5 AND b = 6; ``` When the read has a side-effect, however (e.g. SELECT FOR UPDATE) we need to lock all of the requested column families. To do so, this commit adds a variant of `span.MakeSplitter` for locked reads, called `span.MakeSplitterForSideEffect`, which does not use this optimization. Fixes: cockroachdb#115014 Release note (sql change): Fix a bug where SELECT FOR UPDATE under Read Committed isolation on multi-column-family tables was not locking column families containing only key columns.

michae2

TFTRs! (Looks like @mgartner has drafts so I will wait.)

Also had a conversation with Nathan in slack about locking all column families.

Reviewable status: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @DrewKimball, @mgartner, and @nvanbenschoten)

pkg/sql/distsql_physical_planner.go line 3120 at r8 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

I'm happy with the current implementation, so please feel free to ignore, but one thought: we could just check that this condition is false before calling MakeSplitter, since a nil SplitFamilyIDs indicates that no splitting will be performed and all column families should be scanned. Then, we could even just get rid of the entire first commit, since we would be guaranteed to scan all column families, no need for the LockCols field or TableID duplication.

Interesting. This is a really good point, and not something I thought of.

I'm a little scared to do this, because by the time we're doing physical planning we don't know if the locking lookup join came from a SELECT FOR UPDATE statement or a FK check. I believe FK checks are still only locking column family 0, and at least back in the time of #42572 we were trying to preserve this behavior. I think using a nil SplitFamilyIDs would cause all locking lookup joins, including those used for FK checks, to lock all column families.

pkg/sql/opt/ops/mutation.opt line 378 at r8 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

nit: This is all a bit confusing for someone without the context here. Can you add an example, maybe using the query above the Lock definition, and specify what Table, KeySource, KeyCols etc are in this case?

Done, lmk if that is helpful.

mgartner

Reviewed 1 of 6 files at r3, 2 of 2 files at r6, 4 of 7 files at r7, 3 of 3 files at r8, 7 of 7 files at r9, 6 of 6 files at r10, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball, @michae2, and @nvanbenschoten)

michae2 · 2023-12-19T21:11:14Z

Thanks!

bors r=DrewKimball,mgartner

craig · 2023-12-19T22:24:14Z

Build succeeded:

Bazel Essential CI (Cockroach)

blathers-crl · 2023-12-19T22:24:36Z

Encountered an error creating backports. Some common things that can go wrong:

The backport branch might have already existed.
There was a merge conflict.
The backport branch contained merge commits.

You might need to create your backport manually using the backport tool.

error getting backport branch release-23.1.12-rc.FROZEN: unexpected status code: 404 Not Found

Backport to branch 23.1.12-rc.FROZEN failed. See errors above.

_{🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.}

michae2

Reviewable status: complete! 1 of 0 LGTMs obtained (and 1 stale)

pkg/sql/opt/optbuilder/locking.go line 405 at r4 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

I think we should introduce a syntax something like SELECT ... FOR UPDATE OF foo (a, b) to make it unambiguous which column families are being locked, instead of trying to rely on the select clause.

I like this idea.

Filed #116838 about this.

pkg/sql/logictest/testdata/logic_test/select_for_update_read_committed line 484 at r6 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

BTW, one of the things that makes this case interesting is that AFAIK we don't actually write a KV entry for entirely NULL column families other than family 0. So, if and when we decide to do something other than locking all families, we'll want to make sure the persistent locks interact well with that.

Yes, and as it turns out this is still an issue. Filed #116836.

The span splitter has an optimization to skip reading column families whose columns are contained entirely in the index key. In cockroachdb#116170 we added a flag to the span splitter to ignore this optimization when the read has a side-effect (i.e. it's a locking read). The intention of cockroachdb#116170 was to ensure that SELECT FOR UPDATE statements under Read Committed isolation lock all requested column families. But the span splitter flag also applies to the implicit for-update locking added to mutation statements. This was not intentional. The implicit for-update locking added to mutation statements is not needed for correctness, but is supposed to be a lightweight optimization to prevent retries of racing write-write conflicts. The implicit for-update locking is already not durable. It's fine if it does not lock every requested column family. This commit changes the decision about whether to use the span splitter optimization to depend on the locking durability, rather than the isolation level. This extends the meaning of locking durability slightly, but I think it fits within the semantics of best effort = the requested lock might not actually be held. This should fix the YCSB regression under Read Committed isolation. Fixes: cockroachdb#143138 Release note: None

In cockroachdb#114401 we added some simple optimizations to push locking down into input operations for SELECT FOR UPDATE under Read Committed isolation. In cockroachdb#116170 we disabled these optimizations for multi-column-family tables because we weren't sure that all necessary column families would be locked. This commit enables the optimizations for multi-column-family tables, after confirming that all families needing to be locked are fetched by the input operation. Informs: cockroachdb#114566, cockroachdb#116838 Release note: None

In cockroachdb#116170 we added span.MakeSplitterForSideEffect which does not perform a column-family-skipping optimization. We were using this for locking lookup joins of multi-column-family tables, but forgot to use it in locking scans, locking index joins, and other places that could be performing a locked read of a multi-column-family table. This did not really matter until the previous commit, which made it more likely that we would use one of these other operations with locking on a multi-column-family table. Informs: cockroachdb#116838 Release note: None

The span splitter has an optimization to skip reading column families whose columns are contained entirely in the index key. In cockroachdb#116170 we added a flag to the span splitter to ignore this optimization when the read has a side-effect (i.e. it's a locking read). The intention of cockroachdb#116170 was to ensure that SELECT FOR UPDATE statements under Read Committed isolation lock all requested column families. But the span splitter flag also applies to the implicit for-update locking added to mutation statements. This was not intentional. The implicit for-update locking added to mutation statements is not needed for correctness, but is supposed to be a lightweight optimization to prevent retries of racing write-write conflicts. The implicit for-update locking is already not durable. It's fine if it does not lock every requested column family. This commit changes the decision about whether to use the span splitter optimization to depend on the locking durability, rather than the isolation level. This extends the meaning of locking durability slightly, but I think it fits within the semantics of best effort = the requested lock might not actually be held. This should fix the YCSB regression under Read Committed isolation. Fixes: cockroachdb#143138 Release note: None

In cockroachdb#116170 we added span.MakeSplitterForSideEffect which does not perform a column-family-skipping optimization. We were using this for locking lookup joins of multi-column-family tables, but forgot to use it in locking scans, locking index joins, and other places that could be performing a locked read of a multi-column-family table. This did not really matter until the previous commit, which made it more likely that we would use one of these other operations with locking on a multi-column-family table. Informs: cockroachdb#116838 Release note: None

The span splitter has an optimization to skip reading column families whose columns are contained entirely in the index key. In #116170 we added a flag to the span splitter to ignore this optimization when the read has a side-effect (i.e. it's a locking read). The intention of #116170 was to ensure that SELECT FOR UPDATE statements under Read Committed isolation lock all requested column families. But the span splitter flag also applies to the implicit for-update locking added to mutation statements. This was not intentional. The implicit for-update locking added to mutation statements is not needed for correctness, but is supposed to be a lightweight optimization to prevent retries of racing write-write conflicts. The implicit for-update locking is already not durable. It's fine if it does not lock every requested column family. This commit changes the decision about whether to use the span splitter optimization to depend on the locking durability, rather than the isolation level. This extends the meaning of locking durability slightly, but I think it fits within the semantics of best effort = the requested lock might not actually be held. This should fix the YCSB regression under Read Committed isolation. Fixes: #143138 Release note: None

In #114401 we added some simple optimizations to push locking down into input operations for SELECT FOR UPDATE under Read Committed isolation. In #116170 we disabled these optimizations for multi-column-family tables because we weren't sure that all necessary column families would be locked. This commit enables the optimizations for multi-column-family tables, after confirming that all families needing to be locked are fetched by the input operation. Informs: #114566, #116838 Release note: None

In #116170 we added span.MakeSplitterForSideEffect which does not perform a column-family-skipping optimization. We were using this for locking lookup joins of multi-column-family tables, but forgot to use it in locking scans, locking index joins, and other places that could be performing a locked read of a multi-column-family table. This did not really matter until the previous commit, which made it more likely that we would use one of these other operations with locking on a multi-column-family table. Informs: #116838 Release note: None

michae2 requested review from DrewKimball and nvb December 12, 2023 07:43

michae2 requested review from a team as code owners December 12, 2023 07:43

DrewKimball approved these changes Dec 12, 2023

View reviewed changes

DrewKimball reviewed Dec 12, 2023

View reviewed changes

mgartner requested a review from DrewKimball December 14, 2023 18:15

mgartner reviewed Dec 14, 2023

View reviewed changes

michae2 commented Dec 14, 2023

View reviewed changes

michae2 force-pushed the lockcolfam branch from e80aca6 to ab4b7fc Compare December 15, 2023 07:06

michae2 commented Dec 15, 2023

View reviewed changes

michae2 force-pushed the lockcolfam branch 2 times, most recently from 41b362d to c025af6 Compare December 16, 2023 07:06

michae2 commented Dec 16, 2023

View reviewed changes

michae2 force-pushed the lockcolfam branch from c025af6 to 29380c9 Compare December 18, 2023 06:46

michae2 changed the title ~~opt: lock correct column families for new SELECT FOR UPDATE~~ opt: lock all column families for new SELECT FOR UPDATE Dec 18, 2023

michae2 commented Dec 18, 2023

View reviewed changes

DrewKimball requested a review from mgartner December 18, 2023 09:17

DrewKimball approved these changes Dec 18, 2023

View reviewed changes

mgartner reviewed Dec 18, 2023

View reviewed changes

michae2 force-pushed the lockcolfam branch from 29380c9 to f215488 Compare December 19, 2023 01:30

michae2 added backport-23.2.x PAST MAINTENANCE SUPPORT: 23.2 patch releases via ER request only backport-23.1.12-rc.FROZEN labels Dec 19, 2023

michae2 commented Dec 19, 2023

View reviewed changes

DrewKimball requested a review from mgartner December 19, 2023 10:26

DrewKimball approved these changes Dec 19, 2023

View reviewed changes

rytaft requested a review from DrewKimball December 19, 2023 15:03

rytaft reviewed Dec 19, 2023

View reviewed changes

michae2 added 2 commits December 19, 2023 10:24

michae2 force-pushed the lockcolfam branch from 40d497a to 2ed7abe Compare December 19, 2023 18:25

michae2 commented Dec 19, 2023

View reviewed changes

mgartner approved these changes Dec 19, 2023

View reviewed changes

craig bot merged commit 06f0cf2 into cockroachdb:master Dec 19, 2023

This was referenced Dec 19, 2023

release-23.2: opt: lock all column families for new SELECT FOR UPDATE #116826

Merged

release-23.2.0-rc: opt: lock all column families for new SELECT FOR UPDATE #116828

Merged

michae2 deleted the lockcolfam branch December 19, 2023 22:45

michae2 removed the backport-23.1.12-rc.FROZEN label Dec 19, 2023

This was referenced Dec 20, 2023

sql: SELECT FOR UPDATE on multi-column-family tables under read committed does not lock completely-NULL column families #116836

Open

sql: lock only necessary column families from SELECT FOR UPDATE #116838

Open

michae2 commented Dec 20, 2023

View reviewed changes

michae2 mentioned this pull request Apr 16, 2025

release-25.2: sql: only use side-effect splitter if locking durability = guaranteed #144551

Merged

Conversation

michae2 commented Dec 12, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

cockroach-teamcity commented Dec 12, 2023

Uh oh!

DrewKimball left a comment

Choose a reason for hiding this comment

Uh oh!

DrewKimball left a comment

Choose a reason for hiding this comment

Uh oh!

DrewKimball left a comment

Choose a reason for hiding this comment

Uh oh!

mgartner commented Dec 14, 2023

Uh oh!

mgartner commented Dec 14, 2023

Uh oh!

mgartner left a comment

Choose a reason for hiding this comment

Uh oh!

michae2 left a comment

Choose a reason for hiding this comment

Uh oh!

michae2 commented Dec 15, 2023

Uh oh!

michae2 left a comment

Choose a reason for hiding this comment

Uh oh!

michae2 left a comment

Choose a reason for hiding this comment

Uh oh!

michae2 left a comment

Choose a reason for hiding this comment

Uh oh!

DrewKimball left a comment

Choose a reason for hiding this comment

Uh oh!

mgartner left a comment

Choose a reason for hiding this comment

Uh oh!

michae2 left a comment

Choose a reason for hiding this comment

Uh oh!

michae2 left a comment

Choose a reason for hiding this comment

Uh oh!

DrewKimball left a comment

Choose a reason for hiding this comment

Uh oh!

rytaft left a comment

Choose a reason for hiding this comment

Uh oh!

michae2 left a comment

Choose a reason for hiding this comment

Uh oh!

mgartner left a comment

Choose a reason for hiding this comment

Uh oh!

michae2 commented Dec 19, 2023

Uh oh!

craig bot commented Dec 19, 2023

Uh oh!

blathers-crl bot commented Dec 19, 2023

Uh oh!

michae2 left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

michae2 commented Dec 12, 2023 •

edited

Loading