upgrades: batch the job_info backfill upgrade #104545
craig[bot] merged 1 commit into cockroachdb:master
|
It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR? 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
adityamaru left a comment:
Changes LGTM, thanks for doing this. If you decide to make the batch size a cluster setting, we have a test, TestBackfillJobsInfoTable, that should maybe metamorphically set that cluster setting to a value in [1, num(jobs)], just to ensure the resumeAfter logic works fine?
|
yeah, I ran the test by hand with the |
Release note (bug fix): The backfill of system.job_info upgrade migration that runs during upgrades from 22.2 now processes rows in batches to avoid cases where it could become stuck due to contention and transaction retries. Epic: none.
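For readers unfamiliar with the pattern, here is a minimal sketch of a batched backfill driven by a resume cursor, exercised with every batch size in [1, num(rows)] as suggested in the review comment. It is not the code from this PR: `backfillInBatches`, the in-memory maps standing in for `system.jobs` and `system.job_info`, and the positive-integer IDs are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// backfillInBatches copies every entry of src into dst, at most batchSize
// rows at a time, in ascending ID order. Each loop iteration stands in for
// one short transaction. It returns the number of non-empty batches.
// Hypothetical helper: assumes all IDs are positive and batchSize >= 1.
func backfillInBatches(src, dst map[int]string, batchSize int) int {
	ids := make([]int, 0, len(src))
	for id := range src {
		ids = append(ids, id)
	}
	sort.Ints(ids)

	resumeAfter := 0 // last ID copied by a completed batch
	batches := 0
	for {
		// Roughly: SELECT ... WHERE id > resumeAfter ORDER BY id LIMIT batchSize
		start := sort.SearchInts(ids, resumeAfter+1)
		end := start + batchSize
		if end > len(ids) {
			end = len(ids)
		}
		batch := ids[start:end]
		if len(batch) == 0 {
			return batches // nothing left to backfill
		}
		for _, id := range batch {
			dst[id] = src[id]
		}
		// Advance the cursor only after the whole batch has been written.
		resumeAfter = batch[len(batch)-1]
		batches++
	}
}

func main() {
	src := map[int]string{3: "a", 7: "b", 8: "c", 15: "d", 42: "e"}
	// Try every batch size from 1 to the number of rows.
	for batchSize := 1; batchSize <= len(src); batchSize++ {
		dst := map[int]string{}
		n := backfillInBatches(src, dst, batchSize)
		fmt.Printf("batchSize=%d batches=%d copied=%d\n", batchSize, n, len(dst))
	}
}
```

Keeping each batch small bounds how much work a single retry has to redo, which is the motivation stated in the release note.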
|
TFTR! bors r+ |
|
This PR was included in a batch that successfully built, but then failed to merge into master. It will not be retried. Additional information: {"message":"Changes must be made through a pull request.","documentation_url":"https://docs.github.com/articles/about-protected-branches"} |
|
bors r+ single on |
|
Build succeeded: |
|
Encountered an error creating backports. Some common things that can go wrong:
You might need to create your backport manually using the backport tool. error setting reviewers, but backport branch blathers/backport-release-23.1-104545 is ready: POST https://api.github.com/repos/cockroachdb/cockroach/pulls/104574/requested_reviewers: 422 Reviews may only be requested from collaborators. One or more of the teams you specified is not a collaborator of the cockroachdb/cockroach repository. [] Backport to branch 23.1.x failed. See errors above. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
104752: upgrades: fix txn retry bug in upgrade batching r=stevendanna a=adityamaru
In #104545 we broke up the txn that is responsible for backfilling the `system.job_info` table as part of an upgrade. That diff had a bug where a txn retry inside the `db.Txn` closure could result in us skipping rows to backfill. The consequence is that some jobs will not have their payload and progress copied over from the `system.jobs` table to the `system.job_info` table. This is bad because once the cluster is fully upgraded, the job system will **only** consult the `system.job_info` table during execution. When it does so, the job is destined to fail, as there will be no payload or progress entry corresponding to that job.
Fixes: #104653
Release note (bug fix): Fixes a bug where a txn retry during the backfill of the jobs info table could result in job rows being missed.
Co-authored-by: adityamaru <adityamaru@gmail.com>
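One way such a retry bug can arise is sketched below. This is an illustration under stated assumptions, not the actual diff: the description above only says that a retry inside the `db.Txn` closure could skip rows, so the specific mechanism shown here (a cursor captured by the closure and advanced before the attempt commits) is assumed. `runTxn`, `backfill`, and `errRetry` are hypothetical names, and the first attempt of every non-empty batch is forced to retry to simulate contention.

```go
package main

import (
	"errors"
	"fmt"
)

var errRetry = errors.New("retryable txn error")

// runTxn is a hypothetical stand-in for a retrying transaction runner: it
// re-invokes fn until fn returns something other than errRetry.
func runTxn(fn func(attempt int) error) error {
	for attempt := 0; ; attempt++ {
		if err := fn(attempt); !errors.Is(err, errRetry) {
			return err
		}
	}
}

// backfill "copies" ids (positive, ascending) in batches and returns the set
// of IDs that were committed. With buggy=true the cursor captured by the
// closure keeps the value set by an aborted attempt, so the retry reads past
// rows whose writes were rolled back.
func backfill(ids []int, batchSize int, buggy bool) map[int]bool {
	copied := map[int]bool{}
	resumeAfter := 0 // position of the last committed batch
	for {
		var staged []int
		cursor := resumeAfter
		_ = runTxn(func(attempt int) error {
			staged = staged[:0] // writes of an aborted attempt are rolled back
			if !buggy {
				cursor = resumeAfter // fix: every attempt restarts from committed state
			}
			for _, id := range ids {
				if id > cursor && len(staged) < batchSize {
					staged = append(staged, id)
				}
			}
			if len(staged) > 0 {
				cursor = staged[len(staged)-1]
			}
			if attempt == 0 && len(staged) > 0 {
				return errRetry // simulated contention on the first attempt
			}
			return nil
		})
		if len(staged) == 0 {
			return copied
		}
		for _, id := range staged {
			copied[id] = true // the successful attempt commits
		}
		resumeAfter = cursor
	}
}

func main() {
	ids := []int{1, 2, 3, 4, 5, 6}
	fmt.Println("buggy copied:", len(backfill(ids, 2, true)), "of", len(ids))
	fmt.Println("fixed copied:", len(backfill(ids, 2, false)), "of", len(ids))
}
```

The general rule the sketch illustrates is that a closure passed to a retrying transaction runner should be idempotent: any state it mutates outside the transaction has to be reset at the top of each attempt, or only updated after the transaction has committed.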