backupccl: backups fail during upgrade to v21.2.0 #72839

@shermanCRL

Observed by @adwittumuluri on CockroachCloud:

We're seeing errors like the following for clusters that have been recently upgraded from 21.1 to 21.2:

I211116 20:33:07.555036 43 backupper/main.go:286  sleeping 20m0s {"loop_iteration":"2021-11-16T20:33:06.948Z"}
I211116 20:53:07.555181 43 backupper/backup.go:66  started backup runner for auto backup {"loop_iteration":"2021-11-16T20:53:07.555Z"}
I211116 20:53:07.555214 43 backupper/backup.go:208  running a full backup {"loop_iteration":"2021-11-16T20:53:07.555Z"}
I211116 20:53:08.084279 43 backupper/backup.go:258  finished full backup {"loop_iteration":"2021-11-16T20:53:07.555Z","elapsed_time":0.529054056}
E211116 20:53:08.084330 43 backupper/backup.go:127  failed backup run {error 25 0  pq: internal error: backup-lookup-tenants: descriptor not found} {"loop_iteration":"2021-11-16T20:53:07.555Z"}
I211116 20:53:08.180094 43 backupper/main.go:286  sleeping 20m0s {"loop_iteration":"2021-11-16T20:53:07.555Z"}
I211116 21:13:08.180241 43 backupper/backup.go:66  started backup runner for auto backup {"loop_iteration":"2021-11-16T21:13:08.180Z"}
I211116 21:13:08.180273 43 backupper/backup.go:208  running a full backup {"loop_iteration":"2021-11-16T21:13:08.180Z"}
I211116 21:13:08.700771 43 backupper/backup.go:258  finished full backup {"loop_iteration":"2021-11-16T21:13:08.180Z","elapsed_time":0.520495416}
E211116 21:13:08.700820 43 backupper/backup.go:127  failed backup run {error 25 0  pq: internal error: backup-lookup-tenants: descriptor not found} {"loop_iteration":"2021-11-16T21:13:08.180Z"}
I211116 21:13:08.778906 43 backupper/main.go:286  sleeping 20m0s {"loop_iteration":"2021-11-16T21:13:08.180Z"}

These "backup-lookup-tenants: descriptor not found" errors are happening for both full and incremental backups.

Diagnosis from @ajwerner:

const tenantMetadataQuery = `
SELECT
tenants.id, /* 0 */
tenants.active, /* 1 */
tenants.info, /* 2 */
tenant_usage.ru_burst_limit, /* 3 */
tenant_usage.ru_refill_rate, /* 4 */
tenant_usage.ru_current, /* 5 */
tenant_usage.total_consumption /* 6 */
FROM
system.tenants
LEFT JOIN system.tenant_usage ON
tenants.id = tenant_usage.tenant_id AND tenant_usage.instance_id = 0`
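was updated to join against the tenant usage table (system.tenant_usage), which is added as part of a migration. Until that migration has run, the table's descriptor does not exist, so the query fails with "descriptor not found". This breaks backups in the mixed-version state.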

The likely impact is that backups will fail while the upgrade is in progress but not yet finalized.
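
One possible way to avoid this class of failure is to choose the query based on whether the migration that creates system.tenant_usage has completed. The sketch below is illustrative only and is not the fix that was applied to CockroachDB. The legacyTenantQuery constant, the tenantQueryFor function, and the usageTableMigrated flag are hypothetical names; in a real cluster the flag would come from a cluster-version check, which is not shown here.

```go
package main

import "fmt"

// legacyTenantQuery is a hypothetical query that reads only system.tenants,
// so it does not depend on the table created by the migration.
const legacyTenantQuery = `
SELECT
  tenants.id,     /* 0 */
  tenants.active, /* 1 */
  tenants.info    /* 2 */
FROM
  system.tenants`

// tenantMetadataQuery is the query quoted in the diagnosis above. It joins
// against system.tenant_usage, which exists only after the migration has run.
const tenantMetadataQuery = `
SELECT
  tenants.id,                    /* 0 */
  tenants.active,                /* 1 */
  tenants.info,                  /* 2 */
  tenant_usage.ru_burst_limit,   /* 3 */
  tenant_usage.ru_refill_rate,   /* 4 */
  tenant_usage.ru_current,       /* 5 */
  tenant_usage.total_consumption /* 6 */
FROM
  system.tenants
  LEFT JOIN system.tenant_usage ON
    tenants.id = tenant_usage.tenant_id AND tenant_usage.instance_id = 0`

// tenantQueryFor returns the query that is safe to run in the current state.
// usageTableMigrated is an assumed stand-in for a cluster-version gate.
func tenantQueryFor(usageTableMigrated bool) string {
	if usageTableMigrated {
		return tenantMetadataQuery
	}
	return legacyTenantQuery
}

func main() {
	// In the mixed-version state the migration has not run yet, so the
	// query without the join is selected.
	fmt.Println(tenantQueryFor(false))
}
```

A caller using this approach would also need to handle the two result shapes, since the fallback query returns three columns instead of seven.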

Metadata

Labels

A-disaster-recovery
C-bug: Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
C-technical-advisory: Caused a technical advisory
T-disaster-recovery
branch-release-21.2: Used to mark GA and release blockers, technical advisories, and bugs for 21.2
