
Detect jobs on deleted nodes during run reconciliation #4622

Merged
mauriceyap merged 5 commits into master from reconciliation-deleted-nodes
Jan 27, 2026


@mauriceyap
Collaborator

Previously, the reconciler only checked for pool/reservation mismatches on existing nodes. Jobs assigned to nodes that no longer exist were silently ignored, leaving them in an invalid state.

Now the reconciler builds a set of current node IDs and fails reconciliation for any job whose assigned node is missing. This ensures orphaned jobs are properly marked as failed rather than remaining in limbo.

Also fixes typos in error messages ("this jobs" -> "this job's", "resevation" -> "reservation").
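
The check described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `Node`, `Job`, and `FailedReconciliation` types, the `findJobsOnDeletedNodes` function, and the error message text are all hypothetical stand-ins, and skipping unassigned jobs is an assumption.

```go
package main

import "fmt"

// Node and Job are hypothetical, simplified stand-ins for the scheduler's types.
type Node struct {
	ID string
}

type Job struct {
	ID     string
	NodeID string // empty when the job has no assigned node (assumption)
}

// FailedReconciliation is a hypothetical record of why a job failed reconciliation.
type FailedReconciliation struct {
	JobID  string
	Reason string
}

// findJobsOnDeletedNodes returns a failure for every job whose assigned node
// is not in the current list of nodes.
func findJobsOnDeletedNodes(jobs []Job, nodes []Node) []FailedReconciliation {
	// Build a set of current node IDs for constant-time membership checks.
	currentNodeIDs := make(map[string]struct{}, len(nodes))
	for _, n := range nodes {
		currentNodeIDs[n.ID] = struct{}{}
	}

	var failed []FailedReconciliation
	for _, j := range jobs {
		if j.NodeID == "" {
			// Assumption: unassigned jobs are outside the scope of this check.
			continue
		}
		if _, exists := currentNodeIDs[j.NodeID]; !exists {
			failed = append(failed, FailedReconciliation{
				JobID:  j.ID,
				Reason: fmt.Sprintf("node %s no longer exists", j.NodeID),
			})
		}
	}
	return failed
}
```

Building the set once keeps the check linear in the number of jobs plus nodes, rather than scanning the node list for each job.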


Signed-off-by: Maurice Yap <mauriceyap@hotmail.co.uk>
dejanzele
dejanzele previously approved these changes Jan 26, 2026
Signed-off-by: Maurice Yap <mauriceyap@hotmail.co.uk>
nikola-jokic
nikola-jokic previously approved these changes Jan 27, 2026
Signed-off-by: Maurice Yap <mauriceyap@hotmail.co.uk>
@mauriceyap mauriceyap enabled auto-merge (squash) January 27, 2026 18:56
@mauriceyap mauriceyap merged commit d794d17 into master Jan 27, 2026
14 checks passed
@mauriceyap mauriceyap deleted the reconciliation-deleted-nodes branch January 27, 2026 19:06
Sigele pushed a commit to Sigele/armada that referenced this pull request Jan 30, 2026
…#4622)

Signed-off-by: Maurice Yap <mauriceyap@hotmail.co.uk>
Signed-off-by: Sigele Nickerson-Adams <sigele.nickerson-adams@nmc2.ai>

4 participants