Skip to content

opt: lock all column families for new SELECT FOR UPDATE#116170

Merged
craig[bot] merged 2 commits intocockroachdb:masterfrom
michae2:lockcolfam
Dec 19, 2023
Merged

opt: lock all column families for new SELECT FOR UPDATE#116170
craig[bot] merged 2 commits intocockroachdb:masterfrom
michae2:lockcolfam

Conversation

@michae2
Copy link
Copy Markdown
Collaborator

@michae2 michae2 commented Dec 12, 2023

opt: lock all column families for new SELECT FOR UPDATE

The new implementation of SELECT FOR UPDATE used for Read Committed isolation starts as a lock operator built from a table scan, which becomes a lookup join during execbuild. This lock operator was re-using the same table and column references as the original table scan, which caused the resulting lookup join to only lock the first column family of the table.

Instead, we need to use fresh table and column references for the lock operator, so that the resulting lookup join is able to control which column families it locks.

For now we choose to lock all column families, even if some column family would not normally be read by the select statement. We do this to more closely match Postgres behavior for statements like:

SELECT 1 FROM foo FOR UPDATE

Fixes: #115014

Release note (sql change): The new SELECT FOR UPDATE implementation used under Read Committed isolation (and under Serializable isolation when optimizer_use_lock_op_for_serializable is true) now locks all column families instead of only the first column family.


sql: do not skip key col families when performing a locked read

When determining which column families to read from an index for a set of requested columns, the span splitter has an optimization: it skips over the families for requested columns that are completely encoded in the index key, because these columns can be decoded from the key of any column family. For example, for the following query we do not need to read column families foo and bar:

CREATE TABLE abc (
  a INT NOT NULL,
  b INT NOT NULL,
  c INT NOT NULL,
  PRIMARY KEY (a, b),
  FAMILY foo (a),
  FAMILY bar (b),
  FAMILY baz (c)
);
SELECT * FROM abc WHERE a = 5 AND b = 6;

When the read has a side-effect, however (e.g. SELECT FOR UPDATE) we need to lock all of the requested column families. To do so, this commit adds a variant of span.MakeSplitter for locked reads, called span.MakeSplitterForSideEffect, which does not use this optimization.

Fixes: #115014

Release note (sql change): Fix a bug where SELECT FOR UPDATE under Read Committed isolation on multi-column-family tables was not locking column families containing only key columns.

@michae2 michae2 requested review from DrewKimball and nvb December 12, 2023 07:43
@michae2 michae2 requested review from a team as code owners December 12, 2023 07:43
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

Copy link
Copy Markdown
Collaborator

@DrewKimball DrewKimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: Nice fix!

Reviewed 1 of 1 files at r1, 3 of 3 files at r2, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @michae2 and @nvanbenschoten)


pkg/sql/opt/exec/execbuilder/mutation.go line 1150 at r2 (raw file):

	// In some simple cases we can push the locking down into the input operation
	// instead of adding what would be a redundant lookup join.

Does this optimization need to change to ensure that the input is scanning the same columns as are required by the lock?


pkg/sql/opt/exec/execbuilder/mutation.go line 1160 at r2 (raw file):

	switch input := lock.Input.(type) {
	case *memo.ScanExpr:
		if md.Table(input.Table) == md.Table(lock.Table) && input.Index == cat.PrimaryIndex {

[nit] I think this check is correct, but occasionally I've seen the catalog tables be different pointers even for the same table. I think it would be best to use md.Table(input.Table).ID() == md.Table(lock.Table).ID()


pkg/sql/opt/optbuilder/locking.go line 398 at r2 (raw file):

		oldColID := lb.table.ColumnID(i)
		newColID := newTabID.ColumnID(i)
		colMap.Set(int(oldColID), int(newColID))

[nit] ColMaps can have poor performance for queries with a lot of columns. Can we follow the example of DuplicateColumnIDs instead? Or better yet, call into a shared method on the factory that returns the new table and column IDs?

Copy link
Copy Markdown
Collaborator

@DrewKimball DrewKimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @michae2 and @nvanbenschoten)


pkg/sql/logictest/testdata/logic_test/select_for_update_read_committed line 403 at r2 (raw file):


statement ok
CREATE TABLE abcde (a INT NOT NULL PRIMARY KEY, b INT NOT NULL, c INT NOT NULL, d INT NOT NULL, e INT NOT NULL, FAMILY (a, b), FAMILY (c), FAMILY (d), FAMILY (e))

A case with a nullable column might be interesting. I think attempting to lock (only) families with id > 0 and all nullable columns will also lock family 0.

Copy link
Copy Markdown
Collaborator

@DrewKimball DrewKimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @michae2 and @nvanbenschoten)


pkg/sql/opt/exec/execbuilder/mutation.go line 1150 at r2 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

Does this optimization need to change to ensure that the input is scanning the same columns as are required by the lock?

Or at least, should we check that the input is scanning at least the columns that need to be locked?

@mgartner
Copy link
Copy Markdown
Contributor

pkg/sql/opt/exec/execbuilder/mutation.go line 1160 at r2 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

[nit] I think this check is correct, but occasionally I've seen the catalog tables be different pointers even for the same table. I think it would be best to use md.Table(input.Table).ID() == md.Table(lock.Table).ID()

cat.Table.ID() returns the stable ID of the underlying table in the schema, not the logic table in the plan. So in a query that joins a table to itself, could that cause problems here?

@mgartner
Copy link
Copy Markdown
Contributor

pkg/sql/opt/exec/execbuilder/mutation.go line 1160 at r2 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

cat.Table.ID() returns the stable ID of the underlying table in the schema, not the logic table in the plan. So in a query that joins a table to itself, could that cause problems here?

I suppose that's what we want here, since this code was comparing the opt.TableIDs before?

@mgartner mgartner requested a review from DrewKimball December 14, 2023 18:15
Copy link
Copy Markdown
Contributor

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 1 files at r1.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @DrewKimball, @michae2, and @nvanbenschoten)


pkg/sql/opt/optbuilder/locking.go line 409 at r2 (raw file):

		oldColID := inScope.cols[i].id
		if newColID, ok := colMap.Get(int(oldColID)); ok {
			cols.Add(opt.ColumnID(newColID))

Is cols supposed to include columns from both inScope and the new table?

Copy link
Copy Markdown
Collaborator Author

@michae2 michae2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTRs!

Drew's questions brought up a number of problems that I have been working on. Hold off on reviewing for a bit, will push today with fixes.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @DrewKimball and @nvanbenschoten)


pkg/sql/opt/exec/execbuilder/mutation.go line 1150 at r2 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

Or at least, should we check that the input is scanning at least the columns that need to be locked?

Yes, great point, this optimization does now need to check columns.


pkg/sql/opt/exec/execbuilder/mutation.go line 1160 at r2 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

I suppose that's what we want here, since this code was comparing the opt.TableIDs before?

Yes, exactly, a query that joins a table to itself now has a problem, which I am currently fixing.

@michae2
Copy link
Copy Markdown
Collaborator Author

michae2 commented Dec 15, 2023

I have realized there's a problem with this approach: statements like SELECT 1 FROM foo FOR UPDATE won't lock any column families. For 23.2 I am going to switch to locking all column families, which will simplify this a lot, and maybe we can revisit this in 24.1.

Copy link
Copy Markdown
Collaborator Author

@michae2 michae2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I changed my mind, because locking individual column families seems really useful. I plan to make SELECT 1 FROM foo FOR UPDATE lock the first column family, as it does now.

I think this new version will work, but hold off on reviewing until I add more tests.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball, @mgartner, and @nvanbenschoten)

@michae2 michae2 force-pushed the lockcolfam branch 2 times, most recently from 41b362d to c025af6 Compare December 16, 2023 07:06
Copy link
Copy Markdown
Collaborator Author

@michae2 michae2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball, @mgartner, and @nvanbenschoten)


pkg/sql/opt/optbuilder/locking.go line 405 at r4 (raw file):

	// empty. In this case we will only lock the first column family.
	var lockCols opt.ColSet
	for _, col := range inScope.cols {

Hmm, ran into another quandary. This method of picking the columns to lock from the projection scope means that a statement like:

SELECT a + b FROM foo FOR UPDATE;

won't actually lock the column families of a and b, but will instead only lock the first column family, just as I'm doing for SELECT 1 FROM foo FOR UPDATE. Maybe even scarier is a statement like:

SELECT a::int FROM foo FOR UPDATE;

which really looks like it should lock column family a, but does not.

I think this is too subtle for people to work with safely. I'm going to go back to locking all column families.

@michae2 michae2 changed the title opt: lock correct column families for new SELECT FOR UPDATE opt: lock all column families for new SELECT FOR UPDATE Dec 18, 2023
Copy link
Copy Markdown
Collaborator Author

@michae2 michae2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed it to lock all column families.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball, @mgartner, and @nvanbenschoten)


pkg/sql/opt/exec/execbuilder/mutation.go line 1150 at r2 (raw file):

Previously, michae2 (Michael Erickson) wrote…

Yes, great point, this optimization does now need to check columns.

I made the optimization only apply to single-column-family tables. This saves us from having to compare the columns produced by the input to the columns locked, because there's only one column family in play.


pkg/sql/opt/exec/execbuilder/mutation.go line 1160 at r2 (raw file):

Previously, michae2 (Michael Erickson) wrote…

Yes, exactly, a query that joins a table to itself now has a problem, which I am currently fixing.

I switched back to comparing the exact opt.TableID (by saving it in the KeySource field of the LockPrivate).


pkg/sql/opt/optbuilder/locking.go line 398 at r2 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

[nit] ColMaps can have poor performance for queries with a lot of columns. Can we follow the example of DuplicateColumnIDs instead? Or better yet, call into a shared method on the factory that returns the new table and column IDs?

I ended up putting every column of the duplicated table into the set, instead of using the scope to pick, so the ColMap went away. (This is done to ensure we lock all column families.)


pkg/sql/opt/optbuilder/locking.go line 409 at r2 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

Is cols supposed to include columns from both inScope and the new table?

I added the LockCols field to make this more clear: Cols is essentially passthrough columns from inScope and LockCols has columns from the new table.


pkg/sql/opt/optbuilder/locking.go line 405 at r4 (raw file):

Previously, michae2 (Michael Erickson) wrote…

Hmm, ran into another quandary. This method of picking the columns to lock from the projection scope means that a statement like:

SELECT a + b FROM foo FOR UPDATE;

won't actually lock the column families of a and b, but will instead only lock the first column family, just as I'm doing for SELECT 1 FROM foo FOR UPDATE. Maybe even scarier is a statement like:

SELECT a::int FROM foo FOR UPDATE;

which really looks like it should lock column family a, but does not.

I think this is too subtle for people to work with safely. I'm going to go back to locking all column families.

I think we should introduce a syntax something like SELECT ... FOR UPDATE OF foo (a, b) to make it unambiguous which column families are being locked, instead of trying to rely on the select clause.

@DrewKimball DrewKimball requested a review from mgartner December 18, 2023 09:17
Copy link
Copy Markdown
Collaborator

@DrewKimball DrewKimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: Great job figuring all this out!

I think there's one last problem - when the primary key is fully specified (e.g. not looking up with a prefix), and a column family is fully contained in the key column set, we can skip scanning that family and perform get requests on the others. Example:

CREATE TABLE abc (a INT NOT NULL, b INT NOT NULL, c INT NOT NULL, PRIMARY KEY (a, b), FAMILY foo (a), FAMILY bar (b), FAMILY baz (c));
EXPLAIN ANALYZE (DEBUG) SELECT * FROM abc WHERE a = 5 AND b = 6;
--From plan.txt:
• scan
  columns: (a int, b int, c int)
  nodes: n1
  actual row count: 0
  vectorized batch count: 0
  KV time: 1ms
  KV contention time: 0µs
  KV rows decoded: 0
  KV pairs read: 0
  KV bytes read: 0 B
  KV gRPC calls: 1
  estimated max memory allocated: 20 KiB
  sql cpu time: 35µs
  MVCC step count (ext/int): 0/0
  MVCC seek count (ext/int): 1/1
  estimated row count: 1 (missing stats)
  table: abc@abc_pkey
  spans: /5/6/2/1

Note from the spans that only the baz column family was scanned. The span-splitting logic is called from DistSQLPlanner.createPlanForLookupJoin. We could potentially make that a separate fix - whatever's easiest.

Reviewed 1 of 6 files at r3, 4 of 6 files at r4, 3 of 3 files at r5, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @mgartner, @michae2, and @nvanbenschoten)


pkg/sql/opt/exec/execbuilder/mutation.go line 1160 at r2 (raw file):

I switched back to comparing the exact opt.TableID (by saving it in the KeySource field of the LockPrivate).

SGTM. Seems best to limit the scope of the optimization for now.


pkg/sql/opt/optbuilder/locking.go line 405 at r4 (raw file):

I think we should introduce a syntax something like SELECT ... FOR UPDATE OF foo (a, b) to make it unambiguous which column families are being locked, instead of trying to rely on the select clause.

I like this idea.


pkg/sql/logictest/testdata/logic_test/select_for_update_read_committed line 403 at r2 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

A case with a nullable column might be interesting. I think attempting to lock (only) families with id > 0 and all nullable columns will also lock family 0.

A nullable case might still be interesting, but seems less important now.

Copy link
Copy Markdown
Contributor

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: modulo Drew's comment

Reviewed 4 of 6 files at r4, 3 of 3 files at r5, all commit messages.
Reviewable status: :shipit: complete! 2 of 0 LGTMs obtained (waiting on @DrewKimball, @michae2, and @nvanbenschoten)


pkg/sql/opt/ops/mutation.opt line 384 at r5 (raw file):


    # KeySource identifies the source of the key columns.
    KeySource TableID

In a follow-up, consider adding some assertions about the interesting invariants of this expression in check_expr.go, like:

  • Assert that all columns in KeyCols are from the KeySource table.
  • Assert that all columns in LockCols are from Table.
  • Assert that Cols contains KeyCols.
  • Assert that Cols does not intersect LockCols.
  • Assert the order of KeyCols.

pkg/sql/logictest/testdata/logic_test/select_for_update_read_committed line 400 at r5 (raw file):


statement ok
CREATE TABLE abcde (a INT NOT NULL PRIMARY KEY, b INT NOT NULL, c INT NOT NULL, d INT NOT NULL, e INT NOT NULL, FAMILY (a, b), FAMILY (c), FAMILY (d), FAMILY (e))

nit: The table would be easier to read if split on multiple lines.

@michae2 michae2 added backport-23.2.x PAST MAINTENANCE SUPPORT: 23.2 patch releases via ER request only backport-23.1.12-rc.FROZEN labels Dec 19, 2023
Copy link
Copy Markdown
Collaborator Author

@michae2 michae2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there's one last problem - when the primary key is fully specified (e.g. not looking up with a prefix), and a column family is fully contained in the key column set, we can skip scanning that family and perform get requests on the others.

Wow, great catch!

I pushed a second commit fixing that behavior.

Interestingly, I was not able to come up with a scenario where this lack of a lock on family bar would cause a problem, because any change to b causes a delete and a put, which touch all column families. And we always have at least the first column family locked. But maybe someone else can come up with a scenario?

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @DrewKimball, @mgartner, and @nvanbenschoten)


pkg/sql/logictest/testdata/logic_test/select_for_update_read_committed line 403 at r2 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

A nullable case might still be interesting, but seems less important now.

Done. (column d)


pkg/sql/logictest/testdata/logic_test/select_for_update_read_committed line 400 at r5 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

nit: The table would be easier to read if split on multiple lines.

Done.

Copy link
Copy Markdown
Collaborator Author

@michae2 michae2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My fix was a little too aggressive, and broke a test added by #42572 for FK checks on multi-column-family tables. Now I'm only modifying the behavior for lookup joins under read committed isolation.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @DrewKimball, @mgartner, and @nvanbenschoten)

@DrewKimball DrewKimball requested a review from mgartner December 19, 2023 10:26
Copy link
Copy Markdown
Collaborator

@DrewKimball DrewKimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: Had some suggestions/comments, but I'm happy with this as-is if we want to merge.

Interestingly, I was not able to come up with a scenario where this lack of a lock on family bar would cause a problem, because any change to b causes a delete and a put, which touch all column families. And we always have at least the first column family locked. But maybe someone else can come up with a scenario?

That's a good point. Since the problem case requires the column to be part of the primary key, any mutations to that column involve all families automatically. So, maybe the only problematic case is two locking reads.

Reviewed 2 of 2 files at r6, 4 of 7 files at r7, 3 of 3 files at r8, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @mgartner, @michae2, and @nvanbenschoten)


pkg/sql/distsql_physical_planner.go line 3120 at r8 (raw file):

	if joinReaderSpec.LockingStrength != descpb.ScanLockingStrength_FOR_NONE &&
		planCtx.ExtendedEvalCtx.TxnIsoLevel != isolation.Serializable {
		splitter = span.MakeSplitterForSideEffect(n.table.desc, n.table.index, fetchOrdinals)

I'm happy with the current implementation, so please feel free to ignore, but one thought: we could just check that this condition is false before calling MakeSplitter, since a nil SplitFamilyIDs indicates that no splitting will be performed and all column families should be scanned. Then, we could even just get rid of the entire first commit, since we would be guaranteed to scan all column families, no need for the LockCols field or TableID duplication.


pkg/sql/logictest/testdata/logic_test/select_for_update_read_committed line 484 at r6 (raw file):

SELECT d FROM abcde WHERE a = 45 AND b = 46 FOR UPDATE
----
NULL

BTW, one of the things that makes this case interesting is that AFAIK we don't actually write a KV entry for entirely NULL column families other than family 0. So, if and when we decide to do something other than locking all families, we'll want to make sure the persistent locks interact well with that.

@rytaft rytaft requested a review from DrewKimball December 19, 2023 15:03
Copy link
Copy Markdown
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball, @mgartner, @michae2, and @nvanbenschoten)


pkg/sql/opt/ops/mutation.opt line 378 at r8 (raw file):


[Private]
define LockPrivate {

nit: This is all a bit confusing for someone without the context here. Can you add an example, maybe using the query above the Lock definition, and specify what Table, KeySource, KeyCols etc are in this case?

The new implementation of SELECT FOR UPDATE used for Read Committed
isolation starts as a lock operator built from a table scan, which
becomes a lookup join during execbuild. This lock operator was re-using
the same table and column references as the original table scan, which
caused the resulting lookup join to only lock the first column family of
the table.

Instead, we need to use fresh table and column references for the lock
operator, so that the resulting lookup join is able to control which
column families it locks.

For now we choose to lock all column families, even if some column
family would not normally be read by the select statement. We do this to
more closely match Postgres behavior for statements like:

```
SELECT 1 FROM foo FOR UPDATE
```

Fixes: cockroachdb#115014

Release note (sql change): The new SELECT FOR UPDATE implementation used
under Read Committed isolation (and under Serializable isolation when
optimizer_use_lock_op_for_serializable is true) now locks all column
families instead of only the first column family.
When determining which column families to read from an index for a set
of requested columns, the span splitter has an optimization: it skips
over the families for requested columns that are completely encoded in
the index key, because these columns can be decoded from the key of
*any* column family. For example, for the following query we do not need
to read column families foo and bar:

```
CREATE TABLE abc (
  a INT NOT NULL,
  b INT NOT NULL,
  c INT NOT NULL,
  PRIMARY KEY (a, b),
  FAMILY foo (a),
  FAMILY bar (b),
  FAMILY baz (c)
);
SELECT * FROM abc WHERE a = 5 AND b = 6;
```

When the read has a side-effect, however (e.g. SELECT FOR UPDATE) we
need to lock all of the requested column families. To do so, this commit
adds a variant of `span.MakeSplitter` for locked reads, called
`span.MakeSplitterForSideEffect`, which does not use this optimization.

Fixes: cockroachdb#115014

Release note (sql change): Fix a bug where SELECT FOR UPDATE under Read
Committed isolation on multi-column-family tables was not locking column
families containing only key columns.
Copy link
Copy Markdown
Collaborator Author

@michae2 michae2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTRs! (Looks like @mgartner has drafts so I will wait.)

Also had a conversation with Nathan in slack about locking all column families.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @DrewKimball, @mgartner, and @nvanbenschoten)


pkg/sql/distsql_physical_planner.go line 3120 at r8 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

I'm happy with the current implementation, so please feel free to ignore, but one thought: we could just check that this condition is false before calling MakeSplitter, since a nil SplitFamilyIDs indicates that no splitting will be performed and all column families should be scanned. Then, we could even just get rid of the entire first commit, since we would be guaranteed to scan all column families, no need for the LockCols field or TableID duplication.

Interesting. This is a really good point, and not something I thought of.

I'm a little scared to do this, because by the time we're doing physical planning we don't know if the locking lookup join came from a SELECT FOR UPDATE statement or a FK check. I believe FK checks are still only locking column family 0, and at least back in the time of #42572 we were trying to preserve this behavior. I think using a nil SplitFamilyIDs would cause all locking lookup joins, including those used for FK checks, to lock all column families.


pkg/sql/opt/ops/mutation.opt line 378 at r8 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

nit: This is all a bit confusing for someone without the context here. Can you add an example, maybe using the query above the Lock definition, and specify what Table, KeySource, KeyCols etc are in this case?

Done, lmk if that is helpful.

Copy link
Copy Markdown
Contributor

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 1 of 6 files at r3, 2 of 2 files at r6, 4 of 7 files at r7, 3 of 3 files at r8, 7 of 7 files at r9, 6 of 6 files at r10, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball, @michae2, and @nvanbenschoten)

@michae2
Copy link
Copy Markdown
Collaborator Author

michae2 commented Dec 19, 2023

Thanks!

bors r=DrewKimball,mgartner

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Dec 19, 2023

Build succeeded:

@blathers-crl
Copy link
Copy Markdown

blathers-crl bot commented Dec 19, 2023

Encountered an error creating backports. Some common things that can go wrong:

  1. The backport branch might have already existed.
  2. There was a merge conflict.
  3. The backport branch contained merge commits.

You might need to create your backport manually using the backport tool.


error getting backport branch release-23.1.12-rc.FROZEN: unexpected status code: 404 Not Found

Backport to branch 23.1.12-rc.FROZEN failed. See errors above.


🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

Copy link
Copy Markdown
Collaborator Author

@michae2 michae2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale)


pkg/sql/opt/optbuilder/locking.go line 405 at r4 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

I think we should introduce a syntax something like SELECT ... FOR UPDATE OF foo (a, b) to make it unambiguous which column families are being locked, instead of trying to rely on the select clause.

I like this idea.

Filed #116838 about this.


pkg/sql/logictest/testdata/logic_test/select_for_update_read_committed line 484 at r6 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

BTW, one of the things that makes this case interesting is that AFAIK we don't actually write a KV entry for entirely NULL column families other than family 0. So, if and when we decide to do something other than locking all families, we'll want to make sure the persistent locks interact well with that.

Yes, and as it turns out this is still an issue. Filed #116836.

michae2 added a commit to michae2/cockroach that referenced this pull request Apr 16, 2025
The span splitter has an optimization to skip reading column families
whose columns are contained entirely in the index key. In cockroachdb#116170 we
added a flag to the span splitter to ignore this optimization when the
read has a side-effect (i.e. it's a locking read).

The intention of cockroachdb#116170 was to ensure that SELECT FOR UPDATE statements
under Read Committed isolation lock all requested column families. But
the span splitter flag also applies to the implicit for-update locking
added to mutation statements. This was not intentional. The implicit
for-update locking added to mutation statements is not needed for
correctness, but is supposed to be a lightweight optimization to prevent
retries of racing write-write conflicts. The implicit for-update locking
is already not durable. It's fine if it does not lock every requested
column family.

This commit changes the decision about whether to use the span splitter
optimization to depend on the locking durability, rather than the
isolation level. This extends the meaning of locking durability
slightly, but I think it fits within the semantics of best effort = the
requested lock might not actually be held.

This should fix the YCSB regression under Read Committed isolation.

Fixes: cockroachdb#143138

Release note: None
michae2 added a commit to michae2/cockroach that referenced this pull request Apr 16, 2025
In cockroachdb#114401 we added some simple optimizations to push locking down into
input operations for SELECT FOR UPDATE under Read Committed
isolation. In cockroachdb#116170 we disabled these optimizations for
multi-column-family tables because we weren't sure that all necessary
column families would be locked.

This commit enables the optimizations for multi-column-family tables,
after confirming that all families needing to be locked are fetched by
the input operation.

Informs: cockroachdb#114566, cockroachdb#116838

Release note: None
michae2 added a commit to michae2/cockroach that referenced this pull request Apr 16, 2025
In cockroachdb#116170 we added span.MakeSplitterForSideEffect which does not
perform a column-family-skipping optimization. We were using this for
locking lookup joins of multi-column-family tables, but forgot to use it
in locking scans, locking index joins, and other places that could be
performing a locked read of a multi-column-family table.

This did not really matter until the previous commit, which made it more
likely that we would use one of these other operations with locking on a
multi-column-family table.

Informs: cockroachdb#116838

Release note: None
michae2 added a commit to michae2/cockroach that referenced this pull request Apr 16, 2025
The span splitter has an optimization to skip reading column families
whose columns are contained entirely in the index key. In cockroachdb#116170 we
added a flag to the span splitter to ignore this optimization when the
read has a side-effect (i.e. it's a locking read).

The intention of cockroachdb#116170 was to ensure that SELECT FOR UPDATE statements
under Read Committed isolation lock all requested column families. But
the span splitter flag also applies to the implicit for-update locking
added to mutation statements. This was not intentional. The implicit
for-update locking added to mutation statements is not needed for
correctness, but is supposed to be a lightweight optimization to prevent
retries of racing write-write conflicts. The implicit for-update locking
is already not durable. It's fine if it does not lock every requested
column family.

This commit changes the decision about whether to use the span splitter
optimization to depend on the locking durability, rather than the
isolation level. This extends the meaning of locking durability
slightly, but I think it fits within the semantics of best effort = the
requested lock might not actually be held.

This should fix the YCSB regression under Read Committed isolation.

Fixes: cockroachdb#143138

Release note: None
michae2 added a commit to michae2/cockroach that referenced this pull request Apr 16, 2025
In cockroachdb#116170 we added span.MakeSplitterForSideEffect which does not
perform a column-family-skipping optimization. We were using this for
locking lookup joins of multi-column-family tables, but forgot to use it
in locking scans, locking index joins, and other places that could be
performing a locked read of a multi-column-family table.

This did not really matter until the previous commit, which made it more
likely that we would use one of these other operations with locking on a
multi-column-family table.

Informs: cockroachdb#116838

Release note: None
blathers-crl bot pushed a commit that referenced this pull request Apr 16, 2025
The span splitter has an optimization to skip reading column families
whose columns are contained entirely in the index key. In #116170 we
added a flag to the span splitter to ignore this optimization when the
read has a side-effect (i.e. it's a locking read).

The intention of #116170 was to ensure that SELECT FOR UPDATE statements
under Read Committed isolation lock all requested column families. But
the span splitter flag also applies to the implicit for-update locking
added to mutation statements. This was not intentional. The implicit
for-update locking added to mutation statements is not needed for
correctness, but is supposed to be a lightweight optimization to prevent
retries of racing write-write conflicts. The implicit for-update locking
is already not durable. It's fine if it does not lock every requested
column family.

This commit changes the decision about whether to use the span splitter
optimization to depend on the locking durability, rather than the
isolation level. This extends the meaning of locking durability
slightly, but I think it fits within the semantics of best effort = the
requested lock might not actually be held.

This should fix the YCSB regression under Read Committed isolation.

Fixes: #143138

Release note: None
blathers-crl bot pushed a commit that referenced this pull request Apr 16, 2025
In #114401 we added some simple optimizations to push locking down into
input operations for SELECT FOR UPDATE under Read Committed
isolation. In #116170 we disabled these optimizations for
multi-column-family tables because we weren't sure that all necessary
column families would be locked.

This commit enables the optimizations for multi-column-family tables,
after confirming that all families needing to be locked are fetched by
the input operation.

Informs: #114566, #116838

Release note: None
blathers-crl bot pushed a commit that referenced this pull request Apr 16, 2025
In #116170 we added span.MakeSplitterForSideEffect which does not
perform a column-family-skipping optimization. We were using this for
locking lookup joins of multi-column-family tables, but forgot to use it
in locking scans, locking index joins, and other places that could be
performing a locked read of a multi-column-family table.

This did not really matter until the previous commit, which made it more
likely that we would use one of these other operations with locking on a
multi-column-family table.

Informs: #116838

Release note: None
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport-23.2.x PAST MAINTENANCE SUPPORT: 23.2 patch releases via ER request only

Projects

None yet

Development

Successfully merging this pull request may close these issues.

sql: SELECT FOR UPDATE of single column family not locked properly under read committed isolation

5 participants