sql,backfill: avoid holding a protected timestamp during the backfill #143451

craig[bot] merged 4 commits into cockroachdb:master
Conversation
fqazi
left a comment
Reviewed 1 of 1 files at r1, 3 of 3 files at r2, 3 of 3 files at r3, 4 of 4 files at r4, all commit messages.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @annrpom, @DrewKimball, and @dt)
pkg/sql/rowexec/indexbackfiller.go line 200 at r4 (raw file):
MaxBufferSize: maxBufferSize, SkipDuplicates: ib.ContainsInvertedIndex(), BatchTimestamp: ib.spec.WriteAsOf,
I wonder if it is safe to move the write timestamp forward as well between batches, which might lead to fewer merges on long-running backfills.
rafiss
left a comment
tftr!
bors r+
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @annrpom, @DrewKimball, @dt, and @fqazi)
pkg/sql/rowexec/indexbackfiller.go line 200 at r4 (raw file):
Previously, fqazi (Faizan Qazi) wrote…
I wonder if it is safe to move the write timestamp forward as well between batches, which might lead to fewer merges on long-running backfills.
One consideration with that is advancing the write timestamp means that we'd need to re-scan the partially built index to push the timestamp cache. That was the original motivation of #63945. As the index gets larger, that scan gets more expensive. It still could be worth experimenting with this idea, though, since it's a tradeoff against making merges cheaper, as you pointed out.
Adding backport labels opportunistically to evaluate riskiness, since the behavior of treating the GC threshold error as permanent can be disruptive. Regardless, I'll let this bake on master for some time first.
Encountered an error creating backports. Some common things that can go wrong:
You might need to create your backport manually using the backport tool.

error creating merge commit from bae95e3 to blathers/backport-release-24.1-143451: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []. You may need to manually resolve merge conflicts with the backport tool. Backport to branch 24.1.x failed. See errors above.

error creating merge commit from bae95e3 to blathers/backport-release-24.3-143451: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []. You may need to manually resolve merge conflicts with the backport tool. Backport to branch 24.3.x failed. See errors above.

error creating merge commit from bae95e3 to blathers/backport-release-25.1-143451: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []. You may need to manually resolve merge conflicts with the backport tool. Backport to branch 25.1.x failed. See errors above.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.
If we have no PTS at all now, is there anything to guarantee that we can make progress in one batch if the GC TTL is very short and a single batch takes longer than that to generate? I see we're backporting this, so I'm just curious whether we've sketched out all the possible failure modes and ramifications to be 100% certain this cannot regress any cases that currently work, in order to qualify for backport.
sql,rowexec: use WriteTS as timestamp for AddSSTable in backfill
PR #64023 made it so we use the backfill's read TS as the batch TS for
the AddSSTable operation. Simultaneously with that change, #63945 was
being worked on, which made it so that we use a fixed write timestamp
for the keys that are backfilled. This write timestamp is the actual
minimum lower bound timestamp for the backfill operation, and the read
timestamp is allowed to change, so it makes more sense to use the write
timestamp for the AddSSTable operation. Looking at the diffs in the PRs
makes it seem pretty likely that this was just missed as part of a
trivial branch skew.
This commit does not cause any behavior change, since at the time of
writing, the read timestamp and write timestamp are always identical for
the backfill.
Release note: None
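The read/write timestamp distinction above can be sketched as follows. This is an illustrative toy, not CockroachDB's actual code: hlcTimestamp, backfillSpec, and batchTimestamp are hypothetical stand-ins for hlc.Timestamp and the backfiller spec's ReadAsOf/WriteAsOf fields.

```go
package main

import "fmt"

// hlcTimestamp is a simplified stand-in for hlc.Timestamp.
type hlcTimestamp struct {
	WallTime int64
}

// backfillSpec mirrors, in reduced form, the pair of timestamps the
// backfiller carries: keys are written at the fixed WriteAsOf, while
// ReadAsOf may advance across retries of the backfill.
type backfillSpec struct {
	ReadAsOf  hlcTimestamp
	WriteAsOf hlcTimestamp
}

// batchTimestamp picks the timestamp for the AddSSTable batch. Per the
// commit message, this should be the fixed write timestamp (the true
// lower bound of the backfill), not the movable read timestamp.
func batchTimestamp(spec backfillSpec) hlcTimestamp {
	return spec.WriteAsOf
}

func main() {
	spec := backfillSpec{
		ReadAsOf:  hlcTimestamp{WallTime: 200}, // read ts may have advanced
		WriteAsOf: hlcTimestamp{WallTime: 100}, // fixed lower bound
	}
	fmt.Println(batchTimestamp(spec).WallTime) // prints 100
}
```

Since the two timestamps are currently always equal, swapping one for the other is a no-op today; the sketch only shows which value the batch should track once they diverge.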
sql,backfill: choose new read timestamp when backfill job is resumed
When #73861 got merged, we seem to have accidentally made it so the index
backfiller will never select a new read timestamp. That is contrary to
the goals of #63945, where we initially added the ability to advance the
read timestamp forward.
This mistake is why we were seeing behavior where a GC threshold error
would cause the backfill to retry infinitely (which in turn led us to treat it
as a permanent error in #139203, which was something we never should have
done). We never noticed the bug because when CREATE INDEX was added to the
declarative schema changer in #92128, we turned off the declarative schema
changer in the test that would have caught this (i.e. TestIndexBackfillAfterGC).
This commit makes it so we choose a current timestamp as the read
timestamp when planning the backfill, and fixes up that test to show
that this works for index backfills in the declarative schema changer.
Release note (bug fix): Fixed a bug where a GC threshold error (which
appears as "batch timestamp must be after replica GC threshold ...")
could cause a schema change that backfills data to fail. Now, the error
will cause the backfill to be retried at a higher timestamp to avoid the
error.
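The retry behavior the fix restores can be sketched as below. This is a hedged toy model, not the backfiller's real control flow: scanAt, runBackfill, and the integer clock are hypothetical; the point is only that a GC threshold error becomes retriable when each attempt selects a fresh, current read timestamp.

```go
package main

import (
	"errors"
	"fmt"
)

// errGCThreshold stands in for the "batch timestamp must be after
// replica GC threshold" error returned when the read timestamp has
// fallen below the replica's GC threshold.
var errGCThreshold = errors.New("batch timestamp must be after replica GC threshold")

// scanAt is a hypothetical scan: it fails while readTS is at or below
// the GC threshold and succeeds otherwise.
func scanAt(readTS, gcThreshold int64) error {
	if readTS <= gcThreshold {
		return errGCThreshold
	}
	return nil
}

// runBackfill retries the scan, choosing a new (current, hence higher)
// read timestamp on each attempt, so a GC threshold error is retriable
// rather than permanent. now supplies the "current" timestamp.
func runBackfill(now func() int64, gcThreshold int64, maxRetries int) (int64, error) {
	var err error
	for i := 0; i < maxRetries; i++ {
		readTS := now() // choose a current timestamp when (re)planning
		if err = scanAt(readTS, gcThreshold); err == nil {
			return readTS, nil
		}
	}
	return 0, err
}

func main() {
	clock := int64(50)
	now := func() int64 { clock += 30; return clock } // toy advancing clock
	ts, err := runBackfill(now, 100, 5)
	if err != nil {
		panic(err)
	}
	fmt.Println("succeeded at read ts", ts) // prints "succeeded at read ts 110"
}
```

Under the pre-fix behavior, the loop above would call scanAt with the same stale readTS every iteration and never terminate successfully, which is the infinite-retry symptom described in the commit.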
sql,backfill: use a current timestamp to scan each chunk of the source index
There's no need for the backfill to use a fixed read timestamp for the
entire job. Instead, each chunk of the original index that needs to be
scanned can use a current timestamp. This allows us to remove all the
logic for adding protected timestamps for the backfill.
Release note (performance improvement): Schema changes that require
data to be backfilled no longer hold a protected timestamp for the
entire duration of the backfill, which means there is less overhead
caused by MVCC garbage collection after the backfill completes.
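The per-chunk scheme can be sketched as follows; again an illustrative toy under stated assumptions, not CockroachDB code (scanChunk, backfillChunks, and the integer clock are hypothetical). Each chunk reads at its own fresh timestamp, so nothing forces old MVCC history to be protected for the whole job.

```go
package main

import "fmt"

// scanChunk stands in for scanning one chunk of the source index at a
// given read timestamp.
func scanChunk(chunk int, readTS int64) string {
	return fmt.Sprintf("chunk %d scanned at ts=%d", chunk, readTS)
}

// backfillChunks reads each chunk at a fresh current timestamp rather
// than at one fixed read timestamp for the whole job. Because no chunk
// ever needs history older than its own scan time, no protected
// timestamp has to be held for the duration of the backfill.
func backfillChunks(numChunks int, now func() int64) []string {
	results := make([]string, 0, numChunks)
	for c := 0; c < numChunks; c++ {
		results = append(results, scanChunk(c, now())) // fresh timestamp per chunk
	}
	return results
}

func main() {
	clock := int64(0)
	now := func() int64 { clock++; return clock } // toy advancing clock
	for _, line := range backfillChunks(3, now) {
		fmt.Println(line)
	}
}
```

The tradeoff, raised elsewhere in this conversation, is that a very short GC TTL combined with a slow chunk could still race the GC threshold within a single batch; the fix handles that by retrying at a newer timestamp rather than by pinning history.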
sql: stop setting readAsOf in index backfiller
As of the parent commit, this is unused by the index backfiller.
Release note: None
fixes #140629
informs #142339
informs #142117
informs #141773