jobs: TestJobInfoUpgradeRegressionTests failed #106347

@cockroach-teamcity

jobs.TestJobInfoUpgradeRegressionTests failed with artifacts on master @ 818aec861357579eb3a3e987cf5887f3cf112be4:

I230706 22:11:07.395057 1840 upgrade/upgradecluster/cluster.go:121  [T1,n1,client=127.0.0.1:41314,hostssl,user=root,migration-mgr] 1767  executing bump-cluster-version=1000022.2-77(fence) on nodes n{1}
I230706 22:11:07.404103 16892 server/migration.go:150  [T1,n1,bump-cluster-version] 1768  active cluster version setting is now 1000022.2-77(fence) (up from 1000022.2-76)
I230706 22:11:07.404575 1840 upgrade/upgrademanager/manager.go:657  [T1,n1,client=127.0.0.1:41314,hostssl,user=root,migration-mgr] 1769  executing operation validate-cluster-version=1000022.2-78
I230706 22:11:07.404985 1840 upgrade/upgradecluster/cluster.go:121  [T1,n1,client=127.0.0.1:41314,hostssl,user=root,migration-mgr] 1770  executing validate-cluster-version=1000022.2-78 on nodes n{1}
I230706 22:11:07.406167 1840 upgrade/upgrademanager/manager.go:657  [T1,n1,client=127.0.0.1:41314,hostssl,user=root,migration-mgr] 1771  executing operation bump-cluster-version=1000022.2-78
I230706 22:11:07.406594 1840 upgrade/upgradecluster/cluster.go:121  [T1,n1,client=127.0.0.1:41314,hostssl,user=root,migration-mgr] 1772  executing bump-cluster-version=1000022.2-78 on nodes n{1}
I230706 22:11:07.406999 16897 server/migration.go:150  [T1,n1,bump-cluster-version] 1773  active cluster version setting is now 1000022.2-78 (up from 1000022.2-77(fence))
I230706 22:11:07.421796 1840 upgrade/upgrademanager/manager.go:517  [T1,n1,client=127.0.0.1:41314,hostssl,user=root,migration-mgr] 1774  stepping through 1000022.2-80
I230706 22:11:07.421963 1840 upgrade/upgrademanager/manager.go:657  [T1,n1,client=127.0.0.1:41314,hostssl,user=root,migration-mgr] 1775  executing operation bump-cluster-version=1000022.2-79(fence)
I230706 22:11:07.422398 1840 upgrade/upgradecluster/cluster.go:121  [T1,n1,client=127.0.0.1:41314,hostssl,user=root,migration-mgr] 1776  executing bump-cluster-version=1000022.2-79(fence) on nodes n{1}
I230706 22:11:07.423272 16939 server/migration.go:150  [T1,n1,bump-cluster-version] 1777  active cluster version setting is now 1000022.2-79(fence) (up from 1000022.2-78)
I230706 22:11:07.424778 1840 upgrade/upgrademanager/manager.go:657  [T1,n1,client=127.0.0.1:41314,hostssl,user=root,migration-mgr] 1778  executing operation validate-cluster-version=1000022.2-80
I230706 22:11:07.425030 1840 upgrade/upgradecluster/cluster.go:121  [T1,n1,client=127.0.0.1:41314,hostssl,user=root,migration-mgr] 1779  executing validate-cluster-version=1000022.2-80 on nodes n{1}
I230706 22:11:07.450150 16863 jobs/adopt.go:261  [T1,n1] 1781  job 880184745739550721: resuming execution
I230706 22:11:07.448976 1840 upgrade/upgrademanager/manager.go:742  [T1,n1,client=127.0.0.1:41314,hostssl,user=root,migration-mgr] 1780  running Upgrade to 1000022.2-80: "backfill the system.job_info table with the payload and progress of each job in the system.jobs table"
I230706 22:11:07.457530 16865 jobs/registry.go:1606  [T1,n1] 1782  MIGRATION job 880184745739550721: stepping through state running
I230706 22:11:07.547964 16865 upgrade/upgrades/backfill_job_info_table_migration.go:81  [T1,n1,job=MIGRATION id=880184745739550721,upgrade=1000022.2-80] 1783  backfilling job_info, step0, batch0 done; resume after 0, done false
I230706 22:11:07.551043 16865 upgrade/upgrades/backfill_job_info_table_migration.go:81  [T1,n1,job=MIGRATION id=880184745739550721,upgrade=1000022.2-80] 1784  backfilling job_info, step0, batch1 done; resume after 880184745739550721, done true
I230706 22:11:07.632088 16865 upgrade/upgrades/backfill_job_info_table_migration.go:81  [T1,n1,job=MIGRATION id=880184745739550721,upgrade=1000022.2-80] 1785  backfilling job_info, step1, batch0 done; resume after 0, done false
I230706 22:11:07.649981 16865 upgrade/upgrades/backfill_job_info_table_migration.go:81  [T1,n1,job=MIGRATION id=880184745739550721,upgrade=1000022.2-80] 1786  backfilling job_info, step1, batch1 done; resume after 880184745739550721, done true
I230706 22:11:07.651325 16865 jobs/registry.go:1606  [T1,n1] 1787  MIGRATION job 880184745739550721: stepping through state succeeded
I230706 22:11:07.662184 1840 jobs/wait.go:145  [T1,n1,client=127.0.0.1:41314,hostssl,user=root,migration-mgr] 1788  waited for 1 [880184745739550721] queued jobs to complete 210.019003ms
I230706 22:11:07.662257 1840 upgrade/upgrademanager/manager.go:657  [T1,n1,client=127.0.0.1:41314,hostssl,user=root,migration-mgr] 1789  executing operation bump-cluster-version=1000022.2-80
I230706 22:11:07.662566 1840 upgrade/upgradecluster/cluster.go:121  [T1,n1,client=127.0.0.1:41314,hostssl,user=root,migration-mgr] 1790  executing bump-cluster-version=1000022.2-80 on nodes n{1}
I230706 22:11:07.662831 17135 server/migration.go:150  [T1,n1,bump-cluster-version] 1791  active cluster version setting is now 1000022.2-80 (up from 1000022.2-79(fence))
I230706 22:11:07.667852 1840 util/log/event_log.go:32  [T1,n1,client=127.0.0.1:41314,hostssl,user=root] 1792 ={"Timestamp":1688681461014926435,"EventType":"set_cluster_setting","Statement":"SET CLUSTER SETTING version = $1","Tag":"SET CLUSTER SETTING","User":"root","PlaceholderValues":["'1000022.2-80'"],"SettingName":"version","Value":"1000022.2-80"}
    job_info_storage_test.go:366: query 'SELECT count(*) FROM crdb_internal.system_jobs WHERE job_type = 'BACKUP'': expected:
        1
        
        got:
        0
        
W230706 22:11:07.756134 17097 kv/kvserver/intentresolver/intent_resolver.go:826  [-] 1793  failed to gc transaction record: could not GC completed transaction anchored at /Table/6/1/"version"/0: node unavailable; try another peer
I230706 22:11:07.756204 900 sql/stats/automatic_stats.go:572  [T1,n1] 1794  quiescing auto stats refresher
I230706 22:11:07.756382 10921 jobs/registry.go:1606  [T1,n1] 1795  KEY VISUALIZER job 100: stepping through state succeeded
W230706 22:11:07.758610 10921 jobs/adopt.go:531  [T1,n1] 1796  could not clear job claim: clear-job-claim: failed to send RPC: sending to all replicas failed; last error: ba: Scan [/Table/15/1/100,/Table/15/1/101), [txn: cac76053], [can-forward-ts] RPC error: node unavailable; try another peer
I230706 22:11:07.759080 901 sql/stats/automatic_stats.go:624  [T1,n1] 1797  quiescing stats garbage collector
I230706 22:11:07.759309 373 server/start_listen.go:103  [T1,n1] 1798  server shutting down: instructing cmux to stop accepting
I230706 22:11:07.762217 9363 jobs/registry.go:1606  [T1,n1] 1799  AUTO SPAN CONFIG RECONCILIATION job 880184732354183169: stepping through state succeeded
W230706 22:11:07.762427 11268 jobs/adopt.go:531  [T1,n1] 1800  could not clear job claim: clear-job-claim: node unavailable; try another peer
W230706 22:11:07.762529 650 sql/sqlliveness/slinstance/slinstance.go:334  [T1,n1] 1801  exiting heartbeat loop
W230706 22:11:07.764876 650 sql/sqlliveness/slinstance/slinstance.go:321  [T1,n1] 1804  exiting heartbeat loop with error: node unavailable; try another peer
I230706 22:11:07.762669 977 jobs/registry.go:1606  [T1,n1] 1802  AUTO SPAN CONFIG RECONCILIATION job 880184715132862465: stepping through state succeeded
W230706 22:11:07.764785 9363 jobs/adopt.go:531  [T1,n1] 1803  could not clear job claim: clear-job-claim: node unavailable; try another peer
E230706 22:11:07.765004 650 server/server_sql.go:514  [T1,n1] 1805  failed to run update of instance with new session ID: node unavailable; try another peer
E230706 22:11:07.765174 977 jobs/registry.go:1004  [T1,n1] 1806  error getting live session: node unavailable; try another peer
I230706 22:11:07.768845 58 server/server_controller_orchestration.go:263  [T1,n1] 1807  server controller shutting down ungracefully
I230706 22:11:07.769028 58 server/server_controller_orchestration.go:274  [T1,n1] 1808  waiting for tenant servers to report stopped
W230706 22:11:07.769212 58 server/server_sql.go:1712  [T1,n1] 1809  server shutdown without a prior graceful drain
--- FAIL: TestJobInfoUpgradeRegressionTests (9.81s)
Help

See also: How To Investigate a Go Test Failure (internal)

/cc @cockroachdb/jobs @cockroachdb/disaster-recovery

This test on roachdash | Improve this report!

Jira issue: CRDB-29520

Labels: A-disaster-recovery, A-jobs, C-test-failure, O-robot, T-jobs, branch-master, release-blocker, v23.1.9
