backupccl: scheduled backups should stop adding to an incremental chain after a certain number of running jobs

In https://github.com/cockroachlabs/support/issues/2030 we saw a schedule running incremental backups every hour with `on_previous_running` set to `start`. For *reasons*, the hourly incrementals were not completing, which resulted in a buildup of 30+ running incremental jobs. This in turn caused nodes to OOM and general cluster instability. Strictly speaking, this is working as expected, but it is an easy footgun and one we should safeguard against. If a backup schedule observes > x incremental backup jobs running on its behalf, we should do _something_. This could include skipping the scheduling of an incremental until the running job count falls below x, with adequate logs/warnings.
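To make the proposed safeguard concrete, here is a minimal sketch of the skip decision in Go. It is not the backupccl implementation: `Decision`, `decideIncremental`, and the `maxRunning` parameter (standing in for x) are hypothetical names for illustration only, and how the scheduler would actually count running jobs for a schedule is left out.

```go
package main

import "fmt"

// Decision is a hypothetical result type: whether the schedule should start
// another incremental backup, plus a message suitable for a log or warning.
type Decision struct {
	Schedule bool
	Reason   string
}

// decideIncremental is a hypothetical helper. runningJobs is the number of
// incremental backup jobs currently running on behalf of the schedule, and
// maxRunning plays the role of "x" in the issue text. A non-positive
// maxRunning is treated here as "no limit configured" (an assumption).
func decideIncremental(runningJobs, maxRunning int) Decision {
	if maxRunning <= 0 {
		return Decision{Schedule: true, Reason: "no running-job limit configured"}
	}
	if runningJobs > maxRunning {
		// Too many incrementals are already in flight: skip this run and
		// explain why, so that the skip is visible to operators.
		return Decision{
			Schedule: false,
			Reason: fmt.Sprintf(
				"skipping incremental backup: %d running jobs exceeds limit of %d",
				runningJobs, maxRunning),
		}
	}
	return Decision{
		Schedule: true,
		Reason: fmt.Sprintf(
			"scheduling incremental backup: %d running jobs within limit of %d",
			runningJobs, maxRunning),
	}
}
```

With a limit of 10, the 30+ running jobs seen in the support issue would produce a skip on every tick until enough of them finish or are cancelled. Whether to skip exactly at the limit or only above it is a policy choice; the sketch follows the "> x" wording above.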

Jira issue: CRDB-23951

Epic CRDB-21944

backupccl: scheduled backups should stop adding to an incremental chain after a certain number of running jobs #96110