spanconfig: checkpoint the reconciliation job and retry eagerly when possible #73694

@irfansharif

This is the tracking issue for follow-on work from #71994. Specifically we want to:

  • Checkpoint the spanconfig.Reconciler's incremental progress
  • Use the checkpoint to (possibly) avoid reconciling from scratch when the job is restarted after failing for any reason, including pod shutdown
  • Ensure that the reconciliation job opportunistically re-attempts reconciliation if it runs into the (unlikely) rangefeed errors surfaced in #73086 ("rangefeed: surface unrecoverable errors and don't hopelessly retry"). These errors indicate that we attempted to establish a rangefeed, with diffs, at a timestamp that was already GC-ed. Bouncing the reconciler immediately instead of failing the whole job seems like saner recovery behavior.
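
The three items above could fit together roughly as in the sketch below. This is a minimal, self-contained illustration and not the actual spanconfig code. `checkpointStore`, `reconcileFn`, `runWithRetry`, and `errTimestampGCed` are all hypothetical names, timestamps are plain integers, and falling back to a from-scratch reconciliation (checkpoint reset to zero) after the GC error is an assumption about what "bouncing the reconciler" would mean.

```go
package main

import (
	"errors"
	"fmt"
)

// errTimestampGCed is a hypothetical sentinel standing in for the
// unrecoverable rangefeed error discussed in #73086.
var errTimestampGCed = errors.New("rangefeed start timestamp already GC-ed")

// checkpointStore is a hypothetical stand-in for progress persisted with the
// job. A zero timestamp means "no checkpoint, reconcile from scratch".
type checkpointStore struct{ ts int64 }

// reconcileFn runs one reconciliation attempt starting at `from` and calls
// onCheckpoint whenever it has made incremental progress.
type reconcileFn func(from int64, onCheckpoint func(ts int64)) error

// runWithRetry resumes from the stored checkpoint. On the GC-ed timestamp
// error it retries immediately from scratch instead of failing; any other
// error is returned to the caller unchanged.
func runWithRetry(store *checkpointStore, reconcile reconcileFn, maxAttempts int) error {
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		err := reconcile(store.ts, func(ts int64) {
			if ts > store.ts { // checkpoints only move forward
				store.ts = ts
			}
		})
		if err == nil {
			return nil
		}
		if !errors.Is(err, errTimestampGCed) {
			return err
		}
		// Assumption: the checkpoint is too old to resume from, so discard it.
		store.ts = 0
	}
	return fmt.Errorf("giving up after %d attempts: %w", maxAttempts, errTimestampGCed)
}

func main() {
	store := &checkpointStore{ts: 100} // stale checkpoint from an earlier run
	err := runWithRetry(store, func(from int64, onCheckpoint func(int64)) error {
		if from != 0 {
			return errTimestampGCed // simulated: cannot resume at a GC-ed timestamp
		}
		onCheckpoint(200) // full reconciliation succeeded up to ts=200
		return nil
	}, 3)
	fmt.Println("err:", err, "checkpoint:", store.ts)
}
```

Capping the number of attempts is a choice made for the sketch only; the issue itself does not say whether retries should be bounded.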

Jira issue: CRDB-11696
