
fix(kanban): clear stale claim locks on ready tasks before dispatch #29214

Open
hettt211 wants to merge 1 commit into NousResearch:main from hettt211:fix/kanban-stale-claim-locks

@hettt211

What does this PR do?

Clear stale claim_lock/claim_expires/worker_pid fields on ready tasks
at the start of each dispatch tick. A task in ready status cannot have
a live worker, so any claim fields on it are residue from a previous
crashed or blocked run. release_stale_claims only inspects running
tasks, so these locks were never released and dispatch skipped the
affected tasks indefinitely.

Related Issue

Fixes #22926

Type of Change

  • 🐛 Bug fix

Changes Made

  • hermes_cli/kanban_db.py: added cleanup SQL (+11 lines) before the
    dispatch spawn loop:
    UPDATE tasks SET claim_lock=NULL, claim_expires=NULL, worker_pid=NULL
    WHERE status='ready' AND claim_lock IS NOT NULL
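
For readers who want to try the statement in isolation, here is a minimal sketch against an in-memory SQLite database. It is not the code from this PR: the five-column schema, the sample rows, and the helper name `clear_stale_ready_claims` are assumptions made for illustration, and the real `tasks` table in hermes_cli/kanban_db.py very likely has more columns. Only the UPDATE statement itself is taken from the description above.

```python
import sqlite3

# ASSUMPTION: a stripped-down schema with only the columns named in this PR.
SCHEMA = """
CREATE TABLE tasks (
    id INTEGER PRIMARY KEY,
    status TEXT NOT NULL,
    claim_lock TEXT,
    claim_expires INTEGER,
    worker_pid INTEGER
)
"""

# The cleanup statement quoted in "Changes Made".
CLEANUP_SQL = """
UPDATE tasks
SET claim_lock = NULL, claim_expires = NULL, worker_pid = NULL
WHERE status = 'ready' AND claim_lock IS NOT NULL
"""


def clear_stale_ready_claims(conn: sqlite3.Connection) -> int:
    """Hypothetical helper: clear claim fields on ready tasks.

    Returns the number of rows that were changed.
    """
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(CLEANUP_SQL)
        return cur.rowcount


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO tasks (id, status, claim_lock, claim_expires, worker_pid) "
        "VALUES (?, ?, ?, ?, ?)",
        [
            (1, "ready", "host:123", 1700000000, 123),    # stale claim on a ready task
            (2, "running", "host:456", 1900000000, 456),  # claim on a running task
            (3, "ready", None, None, None),               # ready task with no claim
        ],
    )
    changed = clear_stale_ready_claims(conn)
    rows = conn.execute(
        "SELECT id, status, claim_lock, claim_expires, worker_pid "
        "FROM tasks ORDER BY id"
    ).fetchall()
    conn.close()
    return changed, rows
```

Calling `demo()` in this sketch changes only task 1. The `status = 'ready'` filter leaves the running task's claim alone, which is why the statement does not interfere with release_stale_claims.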

How to Test

  1. Dispatch a task that crashes immediately (e.g. worker profile with
    an invalid API key)
  2. Wait for the gateway to auto-block it
  3. Manually reset it to ready via SQL without clearing claim_lock
  4. Run dispatch. Without the fix: spawned=0; with the fix: the task is
    picked up and dispatched normally
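
Steps 3 and 4 can also be approximated without a real worker or gateway. The sketch below makes several assumptions: the minimal schema, the idea that a blocked task keeps its claim fields, and above all the candidate query in `dispatchable_ids`, which is a hypothetical stand-in for whatever check the real dispatcher uses to skip claimed tasks. It only illustrates why a leftover claim_lock hides a ready task and how the cleanup makes it visible again.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # ASSUMPTION: minimal schema, not the real kanban table definition.
    conn.execute(
        "CREATE TABLE tasks ("
        "id INTEGER PRIMARY KEY, status TEXT, claim_lock TEXT, "
        "claim_expires INTEGER, worker_pid INTEGER)"
    )
    # Steps 1-2 (simulated): a crashed worker left its claim behind and
    # the task ended up blocked.
    conn.execute(
        "INSERT INTO tasks VALUES (1, 'blocked', 'dead-worker', 1700000000, 4242)"
    )
    # Step 3: manual reset to ready that forgets to clear claim_lock.
    conn.execute("UPDATE tasks SET status = 'ready' WHERE id = 1")
    conn.commit()
    return conn


def dispatchable_ids(conn: sqlite3.Connection) -> list:
    # ASSUMPTION: hypothetical candidate query; the real dispatcher may
    # detect claimed tasks differently.
    rows = conn.execute(
        "SELECT id FROM tasks "
        "WHERE status = 'ready' AND claim_lock IS NULL ORDER BY id"
    ).fetchall()
    return [row[0] for row in rows]


def dispatch_tick(conn: sqlite3.Connection, apply_fix: bool) -> list:
    """Simulated tick: optionally run the cleanup, then list candidates."""
    if apply_fix:
        conn.execute(
            "UPDATE tasks SET claim_lock = NULL, claim_expires = NULL, "
            "worker_pid = NULL "
            "WHERE status = 'ready' AND claim_lock IS NOT NULL"
        )
        conn.commit()
    return dispatchable_ids(conn)
```

In this simulation `dispatch_tick(make_db(), apply_fix=False)` finds no candidates, mirroring the spawned=0 case, while `apply_fix=True` returns the reset task.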

Checklist

Documentation & Housekeeping

  • Documentation: N/A
  • Config: N/A
  • Architecture: N/A
  • Cross-platform: N/A (pure SQL, platform-independent)
  • Tool schemas: N/A

Stale claim_lock/claim_expires/worker_pid fields can leak onto ready
tasks when a worker crashes and the task is later manually unblocked
(e.g. from 'blocked' back to 'ready' via SQL). The reclaim path only
inspects status='running' tasks, so these stale locks are never
released and the task is permanently skipped by dispatch.

Fix: clear claim_lock/claim_expires/worker_pid on any ready task
before the dispatch spawn loop. A task in status='ready' cannot have
a live worker by definition, so these fields are always stale.

Fixes NousResearch#22926
@alt-glitch added the type/bug (Something isn't working), P3 (Low: cosmetic, nice to have), and comp/plugins (Plugin system and bundled plugins) labels on May 20, 2026

Development

Successfully merging this pull request may close these issues.

[Bug]: Kanban stale claim locks from dead workers have no auto-cleanup — tasks permanently stuck until manual intervention