sql,backfill: avoid holding a protected timestamp during the backfill #143451

craig[bot] merged 4 commits into cockroachdb:master
Conversation
fqazi
left a comment
Reviewed 1 of 1 files at r1, 3 of 3 files at r2, 3 of 3 files at r3, 4 of 4 files at r4, all commit messages.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @annrpom, @DrewKimball, and @dt)
pkg/sql/rowexec/indexbackfiller.go line 200 at r4 (raw file):
MaxBufferSize: maxBufferSize, SkipDuplicates: ib.ContainsInvertedIndex(), BatchTimestamp: ib.spec.WriteAsOf,
I wonder if it is safe to move the write timestamp forward as well between batches, which might lead to fewer merges on long-running backfills.
rafiss
left a comment
tftr!
bors r+
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @annrpom, @DrewKimball, @dt, and @fqazi)
pkg/sql/rowexec/indexbackfiller.go line 200 at r4 (raw file):
Previously, fqazi (Faizan Qazi) wrote…
I wonder if it is safe to move the write timestamp forward as well between batches, which might lead to fewer merges on long-running backfills.
One consideration with that is advancing the write timestamp means that we'd need to re-scan the partially built index to push the timestamp cache. That was the original motivation of #63945. As the index gets larger, that scan gets more expensive. It still could be worth experimenting with this idea, though, since it's a tradeoff against making merges cheaper, as you pointed out.
Adding backport labels opportunistically to evaluate riskiness, since the behavior of treating the GC threshold error as permanent can be disruptive. Regardless, I'll let this bake on master for some time first.
Encountered an error creating backports. Some common things that can go wrong:
You might need to create your backport manually using the backport tool.

error creating merge commit from bae95e3 to blathers/backport-release-24.1-143451: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []. You may need to manually resolve merge conflicts with the backport tool. Backport to branch 24.1.x failed. See errors above.

error creating merge commit from bae95e3 to blathers/backport-release-24.3-143451: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []. You may need to manually resolve merge conflicts with the backport tool. Backport to branch 24.3.x failed. See errors above.

error creating merge commit from bae95e3 to blathers/backport-release-25.1-143451: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []. You may need to manually resolve merge conflicts with the backport tool. Backport to branch 25.1.x failed. See errors above.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.
If we have no PTS at all now, is there anything to guarantee that we can make progress in one batch if the GC TTL is very short and a single batch takes longer than that to generate? I see we're backporting this, so I'm just curious whether we've sketched out all the possible failure modes and ramifications to be 100% certain this cannot regress any cases that currently work, in order to qualify for backport.
sql,rowexec: use WriteTS as timestamp for AddSSTable in backfill
PR #64023 made it so we use the backfill's read TS as the batch TS for
the AddSSTable operation. Simultaneously with that change, #63945 was
being worked on, which made it so that we use a fixed write timestamp
for the keys that are backfilled. This write timestamp is the actual
minimum lower bound timestamp for the backfill operation, and the read
timestamp is allowed to change, so it makes more sense to use the write
timestamp for the AddSSTable operation. Looking at the diffs in the PRs
makes it seem pretty likely that this was just missed as part of a
trivial branch skew.
This commit does not cause any behavior change, since at the time of
writing, the read timestamp and write timestamp are always identical for
the backfill.
Release note: None
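The read/write timestamp distinction above can be sketched as follows. This is an illustrative toy, not CockroachDB's actual code: hlcTimestamp, backfillSpec, and batchTimestamp are hypothetical stand-ins for hlc.Timestamp and the backfiller spec's ReadAsOf/WriteAsOf fields.

```go
package main

import "fmt"

// hlcTimestamp is a simplified stand-in for hlc.Timestamp.
type hlcTimestamp struct {
	WallTime int64
}

// backfillSpec mirrors, in reduced form, the pair of timestamps the
// backfiller carries: keys are written at the fixed WriteAsOf, while
// ReadAsOf may advance across retries of the backfill.
type backfillSpec struct {
	ReadAsOf  hlcTimestamp
	WriteAsOf hlcTimestamp
}

// batchTimestamp picks the timestamp for the AddSSTable batch. Per the
// commit message, this should be the fixed write timestamp (the true
// lower bound of the backfill), not the movable read timestamp.
func batchTimestamp(spec backfillSpec) hlcTimestamp {
	return spec.WriteAsOf
}

func main() {
	spec := backfillSpec{
		ReadAsOf:  hlcTimestamp{WallTime: 200}, // read ts may have advanced
		WriteAsOf: hlcTimestamp{WallTime: 100}, // fixed lower bound
	}
	fmt.Println(batchTimestamp(spec).WallTime) // prints 100
}
```

Since the two timestamps are currently always equal, swapping one for the other is a no-op today; the sketch only shows which value the batch should track once they diverge.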
sql,backfill: choose new read timestamp when backfill job is resumed
When #73861 got merged, we seem to have accidentally made it so the index
backfiller will never select a new read timestamp. That is contrary to
the goals of #63945, where we initially added the ability to advance the
read timestamp forward.
This mistake is why we were seeing behavior where a GC threshold error
would cause the backfill to retry infinitely (which in turn led us to treat it
as a permanent error in #139203, which was something we never should have
done). We never noticed the bug because when CREATE INDEX was added to the
declarative schema changer in #92128, we turned off the declarative schema
changer in the test that would have caught this (i.e. TestIndexBackfillAfterGC).
This commit makes it so we choose a current timestamp as the read
timestamp when planning the backfill, and fixes up that test to show
that this works for index backfills in the declarative schema changer.
Release note (bug fix): Fixed a bug where a GC threshold error (which
appears as "batch timestamp must be after replica GC threshold ...")
could cause a schema change that backfills data to fail. Now, the error
will cause the backfill to be retried at a higher timestamp to avoid the
error.
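The retry behavior the fix restores can be sketched as below. This is a hedged toy model, not the backfiller's real control flow: scanAt, runBackfill, and the integer clock are hypothetical; the point is only that a GC threshold error becomes retriable when each attempt selects a fresh, current read timestamp.

```go
package main

import (
	"errors"
	"fmt"
)

// errGCThreshold stands in for the "batch timestamp must be after
// replica GC threshold" error returned when the read timestamp has
// fallen below the replica's GC threshold.
var errGCThreshold = errors.New("batch timestamp must be after replica GC threshold")

// scanAt is a hypothetical scan: it fails while readTS is at or below
// the GC threshold and succeeds otherwise.
func scanAt(readTS, gcThreshold int64) error {
	if readTS <= gcThreshold {
		return errGCThreshold
	}
	return nil
}

// runBackfill retries the scan, choosing a new (current, hence higher)
// read timestamp on each attempt, so a GC threshold error is retriable
// rather than permanent. now supplies the "current" timestamp.
func runBackfill(now func() int64, gcThreshold int64, maxRetries int) (int64, error) {
	var err error
	for i := 0; i < maxRetries; i++ {
		readTS := now() // choose a current timestamp when (re)planning
		if err = scanAt(readTS, gcThreshold); err == nil {
			return readTS, nil
		}
	}
	return 0, err
}

func main() {
	clock := int64(50)
	now := func() int64 { clock += 30; return clock } // toy advancing clock
	ts, err := runBackfill(now, 100, 5)
	if err != nil {
		panic(err)
	}
	fmt.Println("succeeded at read ts", ts) // prints "succeeded at read ts 110"
}
```

Under the pre-fix behavior, the loop above would call scanAt with the same stale readTS every iteration and never terminate successfully, which is the infinite-retry symptom described in the commit.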
sql,backfill: use a current timestamp to scan each chunk of the source index
There's no need for the backfill to use a fixed read timestamp for the
entire job. Instead, each chunk of the original index that needs to be
scanned can use a current timestamp. This allows us to remove all the
logic for adding protected timestamps for the backfill.
Release note (performance improvement): Schema changes that require
data to be backfilled no longer hold a protected timestamp for the
entire duration of the backfill, which means there is less overhead
caused by MVCC garbage collection after the backfill completes.
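The per-chunk scheme can be sketched as follows; again an illustrative toy under stated assumptions, not CockroachDB code (scanChunk, backfillChunks, and the integer clock are hypothetical). Each chunk reads at its own fresh timestamp, so nothing forces old MVCC history to be protected for the whole job.

```go
package main

import "fmt"

// scanChunk stands in for scanning one chunk of the source index at a
// given read timestamp.
func scanChunk(chunk int, readTS int64) string {
	return fmt.Sprintf("chunk %d scanned at ts=%d", chunk, readTS)
}

// backfillChunks reads each chunk at a fresh current timestamp rather
// than at one fixed read timestamp for the whole job. Because no chunk
// ever needs history older than its own scan time, no protected
// timestamp has to be held for the duration of the backfill.
func backfillChunks(numChunks int, now func() int64) []string {
	results := make([]string, 0, numChunks)
	for c := 0; c < numChunks; c++ {
		results = append(results, scanChunk(c, now())) // fresh timestamp per chunk
	}
	return results
}

func main() {
	clock := int64(0)
	now := func() int64 { clock++; return clock } // toy advancing clock
	for _, line := range backfillChunks(3, now) {
		fmt.Println(line)
	}
}
```

The tradeoff, raised elsewhere in this conversation, is that a very short GC TTL combined with a slow chunk could still race the GC threshold within a single batch; the fix handles that by retrying at a newer timestamp rather than by pinning history.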
sql: stop setting readAsOf in index backfiller
As of the parent commit, this is unused by the index backfiller.
Release note: None
fixes #140629
informs #142339
informs #142117
informs #141773