
sql,backfill: avoid holding a protected timestamp during the backfill#143451

Merged
craig[bot] merged 4 commits into cockroachdb:master from rafiss:backfill-avoid-long-pts
Mar 26, 2025

Conversation

@rafiss
Collaborator

@rafiss rafiss commented Mar 25, 2025

sql,rowexec: use WriteTS as timestamp for AddSSTable in backfill

PR #64023 made it so we use the backfill's read TS as the batch TS for
the AddSSTable operation. Simultaneously with that change, #63945 was
being worked on, which made it so that we use a fixed write timestamp
for the keys that are backfilled. This write timestamp is the actual
minimum lower bound timestamp for the backfill operation, and the read
timestamp is allowed to change, so it makes more sense to use the write
timestamp for the AddSSTable operation. Looking at the diffs in the PRs
makes it seem pretty likely that this was just missed as part of a
trivial branch skew.

This commit does not cause any behavior change, since at the time of
writing, the read timestamp and write timestamp are always identical for
the backfill.

Release note: None
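As a rough sketch of the timestamp choice described above (the types and field names here are simplified stand-ins for CockroachDB's `hlc.Timestamp` and the index backfiller spec, not the actual code):

```go
package main

import "fmt"

// ts is a stand-in for CockroachDB's hlc.Timestamp (pkg/util/hlc).
type ts struct{ wallNanos int64 }

// backfillSpec mirrors the two spec fields discussed above. WriteAsOf is
// fixed for the life of the backfill, while ReadAsOf is allowed to advance.
type backfillSpec struct {
	ReadAsOf  ts
	WriteAsOf ts
}

// batchTimestamp returns the timestamp to use for the AddSSTable batch.
// Using the fixed write timestamp (rather than the movable read timestamp)
// is the change this commit makes.
func batchTimestamp(spec backfillSpec) ts {
	return spec.WriteAsOf
}

func main() {
	spec := backfillSpec{ReadAsOf: ts{wallNanos: 200}, WriteAsOf: ts{wallNanos: 100}}
	fmt.Println(batchTimestamp(spec).wallNanos) // prints 100
}
```

Because the two timestamps are currently always identical, swapping which one feeds AddSSTable is a no-op today; it only matters once the read timestamp starts moving (as the later commits make it do).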

sql,backfill: choose new read timestamp when backfill job is resumed

When #73861 got merged, we seem to have accidentally made it so the index
backfiller will never select a new read timestamp. That is contrary to
the goals of #63945, where we initially added the ability to advance the
read timestamp forward.

This mistake is why we were seeing behavior where a GC threshold error
would cause the backfill to retry infinitely (which in turn led us to treat
it as a permanent error in #139203, something we never should have
done). We never noticed the bug because when CREATE INDEX was added to the
declarative schema changer in #92128, we turned off the declarative schema
changer in the test that would have caught this (i.e. TestIndexBackfillAfterGC).

This commit makes it so we choose a current timestamp as the read
timestamp when planning the backfill, and fixes up that test to show
that this works for index backfills in the declarative schema changer.

Release note (bug fix): Fixed a bug where a GC threshold error (which
appears as "batch timestamp must be after replica GC threshold ...")
could cause a schema change that backfills data to fail. Now, the error
will cause the backfill to be retried at a higher timestamp to avoid the
error.

sql, backfill: use a current timestamp to scan each chunk of the source index

There's no need for the backfill to use a fixed read timestamp for the
entire job. Instead, each chunk of the original index that needs to be
scanned can use a current timestamp. This allows us to remove all the
logic for adding protected timestamps for the backfill.

Release note (performance improvement): Schema changes that require
data to be backfilled no longer hold a protected timestamp for the
entire duration of the backfill, which means there is less overhead
caused by MVCC garbage collection after the backfill completes.

sql: stop setting readAsOf in index backfiller

As of the parent commit, this is unused by the index backfiller.

Release note: None


fixes #140629
informs #142339
informs #142117
informs #141773

@cockroach-teamcity
Member

This change is Reviewable

@rafiss rafiss changed the title Backfill avoid long pts sql,backfill: avoid holding a protected timestamp during the backfill Mar 25, 2025
@rafiss rafiss marked this pull request as ready for review March 25, 2025 21:13
@rafiss rafiss requested review from a team as code owners March 25, 2025 21:13
@rafiss rafiss requested review from DrewKimball, annrpom, dt and fqazi and removed request for a team March 25, 2025 21:13
@rafiss
Collaborator Author

rafiss commented Mar 25, 2025

Collaborator

@fqazi fqazi left a comment


:lgtm_strong: Great work!

Reviewed 1 of 1 files at r1, 3 of 3 files at r2, 3 of 3 files at r3, 4 of 4 files at r4, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @annrpom, @DrewKimball, and @dt)


pkg/sql/rowexec/indexbackfiller.go line 200 at r4 (raw file):

		MaxBufferSize:            maxBufferSize,
		SkipDuplicates:           ib.ContainsInvertedIndex(),
		BatchTimestamp:           ib.spec.WriteAsOf,

I wonder if it is safe to move the write timestamp forward as well between batches, which might lead to fewer merges on long-running backfills.

rafiss added 4 commits March 26, 2025 11:07
@rafiss rafiss force-pushed the backfill-avoid-long-pts branch from 109a389 to bae95e3 Compare March 26, 2025 15:07
Collaborator Author

@rafiss rafiss left a comment


tftr!

bors r+

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @annrpom, @DrewKimball, @dt, and @fqazi)


pkg/sql/rowexec/indexbackfiller.go line 200 at r4 (raw file):

Previously, fqazi (Faizan Qazi) wrote…

I wonder if it is safe to move the write timestamp forward as well between batches, which might lead to fewer merges on long-running backfills.

One consideration with that is that advancing the write timestamp means we'd need to re-scan the partially built index to push the timestamp cache. That was the original motivation of #63945. As the index gets larger, that scan gets more expensive. It still could be worth experimenting with this idea, though, since it's a tradeoff against making merges cheaper, as you pointed out.

@rafiss
Collaborator Author

rafiss commented Mar 26, 2025

Adding backport labels opportunistically to evaluate riskiness, since the behavior of treating the GC threshold error as permanent can be disruptive. Regardless, I'll let this bake on master for some time first.

@rafiss rafiss added the backport-24.1.x, backport-24.3.x, and backport-25.1.x labels Mar 26, 2025
@craig
Contributor

craig bot commented Mar 26, 2025

@craig craig bot merged commit 1c93e8a into cockroachdb:master Mar 26, 2025
24 checks passed
@blathers-crl

blathers-crl bot commented Mar 26, 2025

Encountered an error creating backports. Some common things that can go wrong:

  1. The backport branch might have already existed.
  2. There was a merge conflict.
  3. The backport branch contained merge commits.

You might need to create your backport manually using the backport tool.


error creating merge commit from bae95e3 to blathers/backport-release-24.1-143451: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 24.1.x failed. See errors above.


error creating merge commit from bae95e3 to blathers/backport-release-24.3-143451: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 24.3.x failed. See errors above.


error creating merge commit from bae95e3 to blathers/backport-release-25.1-143451: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 25.1.x failed. See errors above.


🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@dt
Contributor

dt commented Mar 28, 2025

If we have no PTS at all now, is there anything to guarantee that we can make progress in one batch if the GC TTL is very short and a single batch takes longer than the TTL to generate?

I see we’re backporting this so I’m just curious if we’ve sketched out all the possible failure modes and ramifications to be 100% certain this cannot regress any cases that currently work in order to qualify for backport.

@rafiss rafiss deleted the backfill-avoid-long-pts branch April 15, 2025 21:54
@cockroachdb cockroachdb deleted a comment from blathers-crl bot May 2, 2025

Labels

backport-24.1.x (Flags PRs that need to be backported to 24.1.)
backport-24.3.x (Flags PRs that need to be backported to 24.3.)
backport-failed
v25.2.0-prerelease

Projects

None yet

Development

Successfully merging this pull request may close these issues.

sql/schemachanger: backfills should not use protected timestamp for their entire life span

4 participants