Skip to content

sql: use read ts as batch ts to send index backfill SSTs#64023

Merged
craig[bot] merged 1 commit intocockroachdb:masterfrom
dt:index-sst-batch-ts
Apr 22, 2021
Merged

sql: use read ts as batch ts to send index backfill SSTs#64023
craig[bot] merged 1 commit intocockroachdb:masterfrom
dt:index-sst-batch-ts

Conversation

@dt
Copy link
Copy Markdown
Contributor

@dt dt commented Apr 21, 2021

This changes SSTs sent by index backfills to be sent with a batch
timestamp equal to the timestamp that was used to read the rows from
which the index entries in that SST were generated.

This is done to ensure that if the index span has already GCed any
keys related to more recent SQL writes the SST will be rejected. Adding
the SST generated from older writes could otherwise accidentally re-add
an entry after a later SQL delete's tombstone had been GC'ed, as described
in #64014.

Release note (bug fix): fix a theoretical issue in index backfills that could result in stale entries that would likely fail validation (#64014).

@dt dt requested review from ajwerner and pbardea April 21, 2021 22:14
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@dt dt force-pushed the index-sst-batch-ts branch 3 times, most recently from 8c636e6 to 68e92b5 Compare April 22, 2021 03:15
Copy link
Copy Markdown
Contributor

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 7 of 7 files at r1.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @pbardea)

@dt
Copy link
Copy Markdown
Contributor Author

dt commented Apr 22, 2021

TFTR!

bors r+

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Apr 22, 2021

Build failed (retrying...):

This changes SSTs sent by index backfills to be sent with a batch
timestamp equal to the timestamp that was used to read the rows from
which the index entries in that SST were generated.

This is done to ensure that if the index span has already GCed any
keys related to more recent SQL writes the SST will be rejected. Adding
the SST generated from older writes could otherwise accidentally re-add
an entry after a later SQL delete's tombstone had been GC'ed, as described
in cockroachdb#64014.

Release note (bug fix): fix a theoretical issue in index backfills that could result in stale entries that would likely fail validation (cockroachdb#64014).
@dt dt force-pushed the index-sst-batch-ts branch from 68e92b5 to f8d44df Compare April 22, 2021 04:43
@dt dt requested a review from a team April 22, 2021 04:43
@craig
Copy link
Copy Markdown
Contributor

craig bot commented Apr 22, 2021

Canceled.

@dt
Copy link
Copy Markdown
Contributor Author

dt commented Apr 22, 2021

bors r+

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Apr 22, 2021

Build succeeded:

@craig craig bot merged commit ad0a1be into cockroachdb:master Apr 22, 2021
@dt dt deleted the index-sst-batch-ts branch April 22, 2021 13:53
rafiss added a commit to rafiss/cockroach that referenced this pull request Mar 26, 2025
PR cockroachdb#64023 made it so we use the backfill's read TS as the batch TS for
the AddSSTable operation. Simultaneously with that change, cockroachdb#63945 was
being worked on, which made it so that we use a fixed _write_ timestamp
for the keys that are backfilled. This write timestamp is the actual
minimum lower bound timestamp for the backfill operation, and the read
timstamp is allowed to change, so it makes more sense to use the write
timestamp for the AddSSTable operation. Looking at the diffs in the PRs
makes it seem pretty likely that this was just missed as part of a
trivial branch skew.

This commit does not cause any behavior change, since at the time of
writing, the read timestamp and write timestamp are always identical for
the backfill.

Release note: None
craig bot pushed a commit that referenced this pull request Mar 26, 2025
143270: kvserver: better obs in TestTxnReadWithinUncertaintyIntervalAfterRangeMerge r=tbg a=tbg

Closes #143260.

This is a complex test with a rare failure mode. Trace all of the relevant operations so that we can meaningfully engage with it.

Epic: none
Release note: none

143451: sql,backfill: avoid holding a protected timestamp during the backfill r=rafiss a=rafiss

### sql,rowexec: use WriteTS as timestamp for AddSSTable in backfill

PR #64023 made it so we use the backfill's read TS as the batch TS for
the AddSSTable operation. Simultaneously with that change, #63945 was
being worked on, which made it so that we use a fixed _write_ timestamp
for the keys that are backfilled. This write timestamp is the actual
minimum lower bound timestamp for the backfill operation, and the read
timstamp is allowed to change, so it makes more sense to use the write
timestamp for the AddSSTable operation. Looking at the diffs in the PRs
makes it seem pretty likely that this was just missed as part of a
trivial branch skew.

This commit does not cause any behavior change, since at the time of
writing, the read timestamp and write timestamp are always identical for
the backfill.

Release note: None

### sql,backfill: choose new read timestamp when backfill job is resumed

When #73861 got merged, we seem to have accidentally made it so the index
backfiller will never select a new read timestamp. That is contrary to
the goals of #63945, where we initially added the ability to advance the
read timestamp forward.

This mistake is why we were seeing behavior where a GC threshold error
would cause the backfill to retry infinitely (which in turn led us to treat
as a permanent error in #139203, which was something we never should have
done). We never noticed the bug because when CREATE INDEX was added to the
declarative schema changer in #92128, we turned off the declarative schema
changer in the test that would have caught this (i.e. TestIndexBackfillAfterGC).

This commit makes it so we choose a current timestamp as the read
timestamp when planning the backfill, and fixes up that test to show
that this works for index backfills in the declarative schema changer.

Release note (bug fix): Fixed a bug where a GC threshold error (which
appears as "batch timestamp must be after replica GC threshold ...")
could cause a schema change that backfills data to fail. Now, the error
will cause the backfill to be retried at a higher timestamp to avoid the
error.

### sql, backfill: use a current timestamp to scan each chunk of the source index

There's no need for the backfill to use a fixed read timestamp for the
entire job. Instead, each chunk of the original index that needs to be
scanned can use a current timestamp. This allows us to remove all the
logic for adding protected timestamps for the backfill.

Release note (performance improvement): Schema changes that require
data to be backfilled no longer hold a protected timestamp for the
entire duration of the backfill, which means there is less overhead
caused by MVCC garbage collection after the backfill completes.

### sql: stop setting readAsOf in index backfiller

As of the parent commit, this is unused by the index backfiller.

Release note: None

---

fixes #140629
informs #142339
informs #142117
informs #141773

Co-authored-by: Tobias Grieger <tobias.b.grieger@gmail.com>
Co-authored-by: Rafi Shamim <rafi@cockroachlabs.com>
rafiss added a commit to rafiss/cockroach that referenced this pull request Mar 26, 2025
PR cockroachdb#64023 made it so we use the backfill's read TS as the batch TS for
the AddSSTable operation. Simultaneously with that change, cockroachdb#63945 was
being worked on, which made it so that we use a fixed _write_ timestamp
for the keys that are backfilled. This write timestamp is the actual
minimum lower bound timestamp for the backfill operation, and the read
timstamp is allowed to change, so it makes more sense to use the write
timestamp for the AddSSTable operation. Looking at the diffs in the PRs
makes it seem pretty likely that this was just missed as part of a
trivial branch skew.

This commit does not cause any behavior change, since at the time of
writing, the read timestamp and write timestamp are always identical for
the backfill.

Release note: None
rafiss added a commit to rafiss/cockroach that referenced this pull request Mar 27, 2025
PR cockroachdb#64023 made it so we use the backfill's read TS as the batch TS for
the AddSSTable operation. Simultaneously with that change, cockroachdb#63945 was
being worked on, which made it so that we use a fixed _write_ timestamp
for the keys that are backfilled. This write timestamp is the actual
minimum lower bound timestamp for the backfill operation, and the read
timstamp is allowed to change, so it makes more sense to use the write
timestamp for the AddSSTable operation. Looking at the diffs in the PRs
makes it seem pretty likely that this was just missed as part of a
trivial branch skew.

This commit does not cause any behavior change, since at the time of
writing, the read timestamp and write timestamp are always identical for
the backfill.

Release note: None
rafiss added a commit to rafiss/cockroach that referenced this pull request Mar 27, 2025
PR cockroachdb#64023 made it so we use the backfill's read TS as the batch TS for
the AddSSTable operation. Simultaneously with that change, cockroachdb#63945 was
being worked on, which made it so that we use a fixed _write_ timestamp
for the keys that are backfilled. This write timestamp is the actual
minimum lower bound timestamp for the backfill operation, and the read
timstamp is allowed to change, so it makes more sense to use the write
timestamp for the AddSSTable operation. Looking at the diffs in the PRs
makes it seem pretty likely that this was just missed as part of a
trivial branch skew.

This commit does not cause any behavior change, since at the time of
writing, the read timestamp and write timestamp are always identical for
the backfill.

Release note: None
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants