Skip to content

sql: more precise use of MakeSplitterForSideEffect#144449

Merged
craig[bot] merged 3 commits intocockroachdb:masterfrom
michae2:ycsb_regression
Apr 16, 2025
Merged

sql: more precise use of MakeSplitterForSideEffect#144449
craig[bot] merged 3 commits intocockroachdb:masterfrom
michae2:ycsb_regression

Conversation

@michae2
Copy link
Copy Markdown
Collaborator

@michae2 michae2 commented Apr 15, 2025

Make the usage of span.MakeSplitterForSideEffect more targeted, so that we use it when locking multi-column-family tables for SELECT FOR UPDATE statements but not for implicit for-update locking in mutation statements.

The first commit should fix the recent YCSB regression under Read Committed isolation. (See individual commits for details.)

Informs: #114566, #116838
Fixes: #143138

Release note: None

@blathers-crl
Copy link
Copy Markdown

blathers-crl bot commented Apr 15, 2025

It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR?

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

Copy link
Copy Markdown
Contributor

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing this!

Only the first commit is needed to fix #143138, correct? Should we backport only that commit to 25.2 and give the others a chance to bake? Or do you think they are low risk?

Reviewed 1 of 1 files at r1, 1 of 1 files at r2, 4 of 4 files at r3, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @michae2)


pkg/sql/opt/exec/execbuilder/mutation.go line 1349 at r2 (raw file):

	// locking-semi-LookupJoin as the enforcer of last resort.
	tab := b.mem.Metadata().Table(lock.Table)
	lockedFamilies := familiesForCols(tab, lock.LockCols)

nit: call this lockFamilies instead?


pkg/sql/opt/exec/execbuilder/mutation.go line 1353 at r2 (raw file):

	case *memo.ScanExpr:
		if input.Table == lock.KeySource && input.Index == cat.PrimaryIndex &&
			// Check that all needed column families would be locked.

nit: "needed" is a bit unclear. What we're checking here is that all the families that would be locked by the lock operator are also locked by the scan/index-join/lookup-join if the lock is push into them, correct?


pkg/sql/opt/exec/execbuilder/mutation.go line 1405 at r2 (raw file):

}

// colsNeedAllLockedFamilies checks that the families needed for cols include

nit: the name is a bit confusing. maybe colFamiliesContainLockFamilies?

michae2 added 2 commits April 15, 2025 15:45
The span splitter has an optimization to skip reading column families
whose columns are contained entirely in the index key. In cockroachdb#116170 we
added a flag to the span splitter to ignore this optimization when the
read has a side-effect (i.e. it's a locking read).

The intention of cockroachdb#116170 was to ensure that SELECT FOR UPDATE statements
under Read Committed isolation lock all requested column families. But
the span splitter flag also applies to the implicit for-update locking
added to mutation statements. This was not intentional. The implicit
for-update locking added to mutation statements is not needed for
correctness, but is supposed to be a lightweight optimization to prevent
retries of racing write-write conflicts. The implicit for-update locking
is already not durable. It's fine if it does not lock every requested
column family.

This commit changes the decision about whether to use the span splitter
optimization to depend on the locking durability, rather than the
isolation level. This extends the meaning of locking durability
slightly, but I think it fits within the semantics of best effort = the
requested lock might not actually be held.

This should fix the YCSB regression under Read Committed isolation.

Fixes: cockroachdb#143138

Release note: None
In cockroachdb#114401 we added some simple optimizations to push locking down into
input operations for SELECT FOR UPDATE under Read Committed
isolation. In cockroachdb#116170 we disabled these optimizations for
multi-column-family tables because we weren't sure that all necessary
column families would be locked.

This commit enables the optimizations for multi-column-family tables,
after confirming that all families needing to be locked are fetched by
the input operation.

Informs: cockroachdb#114566, cockroachdb#116838

Release note: None
@michae2 michae2 changed the title fix ycsb regression sql: more precise use of MakeSplitterForSideEffect Apr 16, 2025
@michae2 michae2 requested review from a team, DrewKimball and rytaft April 16, 2025 08:00
@michae2 michae2 marked this pull request as ready for review April 16, 2025 08:01
Copy link
Copy Markdown
Collaborator Author

@michae2 michae2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed the second commit, added some tests, and wrote commit messages.

Only the first commit is needed to fix #143138, correct? Should we backport only that commit to 25.2 and give the others a chance to bake?

Yes, I think we should only backport the first commit to 25.2.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @DrewKimball, @mgartner, and @rytaft)


pkg/sql/opt/exec/execbuilder/mutation.go line 1349 at r2 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

nit: call this lockFamilies instead?

Done.


pkg/sql/opt/exec/execbuilder/mutation.go line 1353 at r2 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

nit: "needed" is a bit unclear. What we're checking here is that all the families that would be locked by the lock operator are also locked by the scan/index-join/lookup-join if the lock is push into them, correct?

Done.


pkg/sql/opt/exec/execbuilder/mutation.go line 1405 at r2 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

nit: the name is a bit confusing. maybe colFamiliesContainLockFamilies?

Done.

Copy link
Copy Markdown
Collaborator

@DrewKimball DrewKimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: Nice work!

Let's say we changed the splitter behavior so that if forSideEffect is true, we always say all column families are needed (just return NoopSplitter in that case). Then, could we always apply the lock operator inlining optimization?

Reviewed 6 of 6 files at r4, 3 of 3 files at r5, 5 of 5 files at r6, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @rytaft)


pkg/sql/distsql_physical_planner.go line 3470 at r6 (raw file):

	if (joinReaderSpec.LockingStrength != descpb.ScanLockingStrength_FOR_NONE &&
		joinReaderSpec.LockingDurability == descpb.ScanLockingDurability_GUARANTEED) ||
		joinReaderSpec.LockingWaitPolicy != descpb.ScanLockingWaitPolicy_BLOCK {

nit: This stuff seems like it's begging for a helper function that takes in the strength, durability, and wait policy, and spits out "should use side-effect splitter".

@michae2
Copy link
Copy Markdown
Collaborator Author

michae2 commented Apr 16, 2025

blathers backport 25.2

In cockroachdb#116170 we added span.MakeSplitterForSideEffect which does not
perform a column-family-skipping optimization. We were using this for
locking lookup joins of multi-column-family tables, but forgot to use it
in locking scans, locking index joins, and other places that could be
performing a locked read of a multi-column-family table.

This did not really matter until the previous commit, which made it more
likely that we would use one of these other operations with locking on a
multi-column-family table.

Informs: cockroachdb#116838

Release note: None
Copy link
Copy Markdown
Collaborator Author

@michae2 michae2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR! (get some sleep! 🙂)

Let's say we changed the splitter behavior so that if forSideEffect is true, we always say all column families are needed (just return NoopSplitter in that case). Then, could we always apply the lock operator inlining optimization?

Interesting, I had not thought of this. One thing that makes me hesitate is that assuming we do #52420 or #116838 one day, the requested columns might not actually cover every column family (though they currently do).

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball and @rytaft)


pkg/sql/distsql_physical_planner.go line 3470 at r6 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

nit: This stuff seems like it's begging for a helper function that takes in the strength, durability, and wait policy, and spits out "should use side-effect splitter".

Good call. I made a helper for the places that use opt.Locking, and added a comment to these two spots in the physical planner.

@michae2 michae2 added the backport-25.2.x Flags PRs that need to be backported to 25.2 label Apr 16, 2025
@michae2
Copy link
Copy Markdown
Collaborator Author

michae2 commented Apr 16, 2025

bors r=DrewKimball

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Apr 16, 2025

@craig craig bot merged commit 70b0bd8 into cockroachdb:master Apr 16, 2025
22 of 23 checks passed
@michae2 michae2 deleted the ycsb_regression branch April 16, 2025 20:05
Copy link
Copy Markdown
Contributor

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! Thanks!

Reviewed 6 of 6 files at r4, 3 of 3 files at r5, 5 of 5 files at r7, 6 of 6 files at r8, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale)

michae2 added a commit to michae2/cockroach that referenced this pull request May 2, 2025
The testcases I added in cockroachdb#144449 depend on a large enough batch size to
match the expected tracing.

Fixes: cockroachdb#144830

Release note: None
craig bot pushed a commit that referenced this pull request May 6, 2025
145984: logictest: deflake recent addition to select_for_update r=yuzefovich a=michae2

The testcases I added in #144449 depend on a large enough batch size to match the expected tracing.

Fixes: #144830

Release note: None

146155: explain: deflake TestMaximumMemoryUsage for good r=yuzefovich a=yuzefovich

This commit reverts adeddb5 which attempted to deflake `TestMaximumMemoryUsage` which could fail under metamorphic randomization. Instead, this commit simply forces production values for the relevant batch sizes (I think `kv-batch-size` is the one to blame) because the previous attempt of introducing the retry loop could still flake.

Fixes: #146052.

Release note: None

Co-authored-by: Michael Erickson <michae2@cockroachlabs.com>
Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
blathers-crl bot pushed a commit that referenced this pull request May 6, 2025
The testcases I added in #144449 depend on a large enough batch size to
match the expected tracing.

Fixes: #144830

Release note: None
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport-25.2.x Flags PRs that need to be backported to 25.2 v25.3.0-prerelease

Projects

None yet

Development

Successfully merging this pull request may close these issues.

perf: unknown regression in read committed benchmarks

4 participants