jobs: fix mixed-version jobs flake #108357

Merged
craig[bot] merged 1 commit into cockroachdb:master from adityamaru:acceptance-fix on Aug 12, 2023
@adityamaru
Contributor

Similar to #107570, this is a short-term fix for the case where a query executed with AS OF SYSTEM TIME picks a transaction timestamp before the job_info migration has run. In that case, parts of the jobs infrastructure attempt to query the job_info table even though it doesn't exist at the transaction's timestamp.

As a short-term fix, when we encounter an UndefinedObject error for the job_info table, we generate a synthetic retryable error so that the txn is pushed to a higher timestamp, at which the upgrade will have completed and the job_info table will be visible. The longer-term fix is being tracked in #106764.

On master I can no longer reproduce the failure in #105032, but on 23.1 with this change I can successfully run 30 iterations of the test on a seed (-8690666577594439584) that previously hit this flake.

Fixes: #103239
Fixes: #105032

Release note: None

@adityamaru adityamaru requested review from a team as code owners August 8, 2023 15:07
@blathers-crl

blathers-crl bot commented Aug 8, 2023

It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR?

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@cockroach-teamcity
Member

This change is Reviewable

@adityamaru
Contributor Author

TFTR!

bors r=knz

@craig
Contributor

craig bot commented Aug 12, 2023

Build succeeded:

@craig craig bot merged commit 9184ffb into cockroachdb:master Aug 12, 2023
@adityamaru adityamaru added the backport-23.1.x PAST MAINTENANCE SUPPORT: 23.1 patch releases via ER request only label Aug 12, 2023
@adityamaru
Contributor Author

blathers backport 23.1

Development

Successfully merging this pull request may close these issues.

roachtest: acceptance/version-upgrade failed
jobs: "system.job_info does not exist" during cluster upgrade

3 participants