spanconfig: checkpoint the reconciliation job and retry eagerly when possible #73694
Closed
Labels: A-zone-configs, C-enhancement, X-stale
Description
This is the tracking issue for follow-on work from #71994. Specifically, we want to:
- Checkpoint the `spanconfig.Reconciler`'s incremental progress.
- Use the checkpoint to (possibly) avoid redoing work that would otherwise require reconciling from scratch when the job restarts after failing for any reason, including a pod shutdown.
- Ensure that the reconciliation job opportunistically re-attempts reconciliation if it runs into the (unlikely) rangefeed errors surfaced in "rangefeed: surface unrecoverable errors and don't hopelessly retry" (#73086). These errors indicate that we were attempting to establish a rangefeed, with diffs, at a timestamp that had already been GC-ed. Bouncing the reconciler immediately, instead of failing the whole job, seems like saner recovery behavior.
Jira issue: CRDB-11696