sql: use read ts as batch ts to send index backfill SSTs#64023
Merged
craig[bot] merged 1 commit intocockroachdb:masterfrom Apr 22, 2021
Merged
sql: use read ts as batch ts to send index backfill SSTs#64023craig[bot] merged 1 commit intocockroachdb:masterfrom
craig[bot] merged 1 commit intocockroachdb:masterfrom
Conversation
Member
8c636e6 to
68e92b5
Compare
ajwerner
approved these changes
Apr 22, 2021
Contributor
ajwerner
left a comment
There was a problem hiding this comment.
Reviewed 7 of 7 files at r1.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @pbardea)
Contributor
Author
|
TFTR! bors r+ |
This was referenced Apr 22, 2021
Contributor
|
Build failed (retrying...): |
This changes SSTs sent by index backfills to be sent with a batch timestamp equal to the timestamp that was used to read the rows from which the index entries in that SST were generated. This is done to ensure that if the index span has already GCed any keys related to more recent SQL writes the SST will be rejected. Adding the SST generated from older writes could otherwise accidentally re-add an entry after a later SQL delete's tombstone had been GC'ed, as described in cockroachdb#64014. Release note (bug fix): fix a theoretical issue in index backfills that could result in stale entries that would likely fail validation (cockroachdb#64014).
Contributor
|
Canceled. |
Contributor
Author
|
bors r+ |
Contributor
|
Build succeeded: |
rafiss
added a commit
to rafiss/cockroach
that referenced
this pull request
Mar 26, 2025
PR cockroachdb#64023 made it so we use the backfill's read TS as the batch TS for the AddSSTable operation. Simultaneously with that change, cockroachdb#63945 was being worked on, which made it so that we use a fixed _write_ timestamp for the keys that are backfilled. This write timestamp is the actual minimum lower bound timestamp for the backfill operation, and the read timstamp is allowed to change, so it makes more sense to use the write timestamp for the AddSSTable operation. Looking at the diffs in the PRs makes it seem pretty likely that this was just missed as part of a trivial branch skew. This commit does not cause any behavior change, since at the time of writing, the read timestamp and write timestamp are always identical for the backfill. Release note: None
craig bot
pushed a commit
that referenced
this pull request
Mar 26, 2025
143270: kvserver: better obs in TestTxnReadWithinUncertaintyIntervalAfterRangeMerge r=tbg a=tbg Closes #143260. This is a complex test with a rare failure mode. Trace all of the relevant operations so that we can meaningfully engage with it. Epic: none Release note: none 143451: sql,backfill: avoid holding a protected timestamp during the backfill r=rafiss a=rafiss ### sql,rowexec: use WriteTS as timestamp for AddSSTable in backfill PR #64023 made it so we use the backfill's read TS as the batch TS for the AddSSTable operation. Simultaneously with that change, #63945 was being worked on, which made it so that we use a fixed _write_ timestamp for the keys that are backfilled. This write timestamp is the actual minimum lower bound timestamp for the backfill operation, and the read timstamp is allowed to change, so it makes more sense to use the write timestamp for the AddSSTable operation. Looking at the diffs in the PRs makes it seem pretty likely that this was just missed as part of a trivial branch skew. This commit does not cause any behavior change, since at the time of writing, the read timestamp and write timestamp are always identical for the backfill. Release note: None ### sql,backfill: choose new read timestamp when backfill job is resumed When #73861 got merged, we seem to have accidentally made it so the index backfiller will never select a new read timestamp. That is contrary to the goals of #63945, where we initially added the ability to advance the read timestamp forward. This mistake is why we were seeing behavior where a GC threshold error would cause the backfill to retry infinitely (which in turn led us to treat as a permanent error in #139203, which was something we never should have done). We never noticed the bug because when CREATE INDEX was added to the declarative schema changer in #92128, we turned off the declarative schema changer in the test that would have caught this (i.e. TestIndexBackfillAfterGC). This commit makes it so we choose a current timestamp as the read timestamp when planning the backfill, and fixes up that test to show that this works for index backfills in the declarative schema changer. Release note (bug fix): Fixed a bug where a GC threshold error (which appears as "batch timestamp must be after replica GC threshold ...") could cause a schema change that backfills data to fail. Now, the error will cause the backfill to be retried at a higher timestamp to avoid the error. ### sql, backfill: use a current timestamp to scan each chunk of the source index There's no need for the backfill to use a fixed read timestamp for the entire job. Instead, each chunk of the original index that needs to be scanned can use a current timestamp. This allows us to remove all the logic for adding protected timestamps for the backfill. Release note (performance improvement): Schema changes that require data to be backfilled no longer hold a protected timestamp for the entire duration of the backfill, which means there is less overhead caused by MVCC garbage collection after the backfill completes. ### sql: stop setting readAsOf in index backfiller As of the parent commit, this is unused by the index backfiller. Release note: None --- fixes #140629 informs #142339 informs #142117 informs #141773 Co-authored-by: Tobias Grieger <tobias.b.grieger@gmail.com> Co-authored-by: Rafi Shamim <rafi@cockroachlabs.com>
rafiss
added a commit
to rafiss/cockroach
that referenced
this pull request
Mar 26, 2025
PR cockroachdb#64023 made it so we use the backfill's read TS as the batch TS for the AddSSTable operation. Simultaneously with that change, cockroachdb#63945 was being worked on, which made it so that we use a fixed _write_ timestamp for the keys that are backfilled. This write timestamp is the actual minimum lower bound timestamp for the backfill operation, and the read timstamp is allowed to change, so it makes more sense to use the write timestamp for the AddSSTable operation. Looking at the diffs in the PRs makes it seem pretty likely that this was just missed as part of a trivial branch skew. This commit does not cause any behavior change, since at the time of writing, the read timestamp and write timestamp are always identical for the backfill. Release note: None
rafiss
added a commit
to rafiss/cockroach
that referenced
this pull request
Mar 27, 2025
PR cockroachdb#64023 made it so we use the backfill's read TS as the batch TS for the AddSSTable operation. Simultaneously with that change, cockroachdb#63945 was being worked on, which made it so that we use a fixed _write_ timestamp for the keys that are backfilled. This write timestamp is the actual minimum lower bound timestamp for the backfill operation, and the read timstamp is allowed to change, so it makes more sense to use the write timestamp for the AddSSTable operation. Looking at the diffs in the PRs makes it seem pretty likely that this was just missed as part of a trivial branch skew. This commit does not cause any behavior change, since at the time of writing, the read timestamp and write timestamp are always identical for the backfill. Release note: None
rafiss
added a commit
to rafiss/cockroach
that referenced
this pull request
Mar 27, 2025
PR cockroachdb#64023 made it so we use the backfill's read TS as the batch TS for the AddSSTable operation. Simultaneously with that change, cockroachdb#63945 was being worked on, which made it so that we use a fixed _write_ timestamp for the keys that are backfilled. This write timestamp is the actual minimum lower bound timestamp for the backfill operation, and the read timstamp is allowed to change, so it makes more sense to use the write timestamp for the AddSSTable operation. Looking at the diffs in the PRs makes it seem pretty likely that this was just missed as part of a trivial branch skew. This commit does not cause any behavior change, since at the time of writing, the read timestamp and write timestamp are always identical for the backfill. Release note: None
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This changes SSTs sent by index backfills to be sent with a batch
timestamp equal to the timestamp that was used to read the rows from
which the index entries in that SST were generated.
This is done to ensure that if the index span has already GCed any
keys related to more recent SQL writes the SST will be rejected. Adding
the SST generated from older writes could otherwise accidentally re-add
an entry after a later SQL delete's tombstone had been GC'ed, as described
in #64014.
Release note (bug fix): fix a theoretical issue in index backfills that could result in stale entries that would likely fail validation (#64014).