fix(kanban): skip invalid task skills and add recovery commands #23154

Closed
qWaitCrypto wants to merge 7 commits into NousResearch:main from qWaitCrypto:fix/kanban-dispatch-recovery-followup

qWaitCrypto commented on May 10, 2026

What does this PR do?

This PR tightens Kanban dispatch/recovery around operator-fixable task states without introducing a full capability model.

It adds a defensive dispatcher preflight for tasks whose persisted task.skills are already known-bad, keeps those tasks in ready while surfacing an explicit skip reason, and expands the existing recovery surface with narrow CLI actions for resetting failure counters and clearing stale claim fields.

It also adds diagnostics for persisted invalid task skills, missing assignee profiles, and expired running claims.
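The dispatch-time preflight described above can be sketched roughly as follows. This is an illustrative model only, not the code in the PR: the `Task` and `DispatchResult` shapes, the sample contents of `KNOWN_TOOLSET_NAMES`, and the `dispatch` function are assumptions made for the example. Only the `skipped_invalid_skills` name, the `ready` state, and `web` as an offending value come from the description.

```python
from dataclasses import dataclass, field

# Hypothetical sample; the real names come from the Hermes toolset registry.
KNOWN_TOOLSET_NAMES = frozenset({"web", "terminal", "browser"})


@dataclass
class Task:
    id: str
    status: str
    skills: list[str] = field(default_factory=list)


@dataclass
class DispatchResult:
    spawned: list[str] = field(default_factory=list)
    # Assumed shape: task id -> offending skill names.
    skipped_invalid_skills: dict[str, list[str]] = field(default_factory=dict)


def dispatch(tasks: list[Task]) -> DispatchResult:
    """Spawn ready tasks, skipping any whose skills are known-bad."""
    result = DispatchResult()
    for task in tasks:
        if task.status != "ready":
            continue
        bad = sorted(s for s in task.skills if s in KNOWN_TOOLSET_NAMES)
        if bad:
            # Do not spawn, and do not change status: the task stays
            # "ready" so an operator can fix task.skills and retry.
            result.skipped_invalid_skills[task.id] = bad
            continue
        task.status = "running"
        result.spawned.append(task.id)
    return result
```

The point of the sketch is the `continue` without a status change: a skipped task is reported with an explicit reason but is neither failed nor blocked.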

Related Issue

Related to #22925, #22926

Stacked on top of #22933.

Type of Change

  • 🐛 Bug fix (non-breaking change that fixes an issue)
  • ✨ New feature (non-breaking change that adds functionality)
  • 🔒 Security fix
  • 📝 Documentation update
  • ✅ Tests (adding or improving test coverage)
  • ♻️ Refactor (no behavior change)
  • 🎯 New skill (bundled or hub)

Changes Made

  • Skip dispatcher spawn for ready tasks whose persisted task.skills contain known toolset names (which are not valid skill names)
  • Surface skipped invalid-skill tasks in DispatchResult and hermes kanban dispatch output
  • Add edit --reset-failures recovery command
  • Add edit --clear-claim recovery command for non-running tasks with stale claim fields
  • Add diagnostics for:
    • invalid_task_skills
    • assignee_profile_not_found
    • stale_running_claim
  • Add targeted tests covering dispatch skip behavior, new recovery commands, clear-claim run cleanup, and diagnostics
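As a rough illustration of the three diagnostics listed above, the checks could look something like the sketch below. The diagnostic codes are taken from the list; everything else (the `Task` fields such as `claim_expires_at`, the `diagnose` helper, and the way profiles are passed in) is hypothetical and may differ from the actual implementation.

```python
import time
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical sample; the real set comes from the toolset registry.
KNOWN_TOOLSET_NAMES = frozenset({"web"})


@dataclass
class Task:
    id: str
    status: str
    assignee: Optional[str] = None
    skills: list[str] = field(default_factory=list)
    claim_expires_at: Optional[float] = None  # assumed field name


def diagnose(task: Task, known_profiles: set[str],
             now: Optional[float] = None) -> list[str]:
    """Return the diagnostic codes that apply to a single task."""
    now = time.time() if now is None else now
    codes = []
    if any(s in KNOWN_TOOLSET_NAMES for s in task.skills):
        codes.append("invalid_task_skills")
    if task.assignee is not None and task.assignee not in known_profiles:
        codes.append("assignee_profile_not_found")
    if (task.status == "running"
            and task.claim_expires_at is not None
            and task.claim_expires_at < now):
        codes.append("stale_running_claim")
    return codes
```

Each check is independent, so one task can carry several diagnostics at once.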

How to Test

  1. Run targeted Kanban tests:
    PYTHONWARNINGS=ignore pytest -q \
      tests/hermes_cli/test_kanban_db.py::test_dispatch_skips_invalid_task_skills_and_keeps_ready \
      tests/hermes_cli/test_kanban_db.py::test_dispatch_skips_invalid_task_skills_without_event_spam \
      tests/hermes_cli/test_kanban_db.py::test_reset_task_failures_clears_counter_and_emits_event \
      tests/hermes_cli/test_kanban_db.py::test_edit_task_recovery_fields_clear_claim_on_non_running_task \
      tests/hermes_cli/test_kanban_core_functionality.py::test_cli_edit_reset_failures \
      tests/hermes_cli/test_kanban_core_functionality.py::test_cli_edit_reset_failures_rejects_result_fields \
      tests/hermes_cli/test_kanban_core_functionality.py::test_cli_edit_clear_claim \
      tests/hermes_cli/test_kanban_core_functionality.py::test_cli_edit_clear_claim_rejects_result_fields \
      tests/hermes_cli/test_kanban_diagnostics.py::test_invalid_task_skills_warns \
      tests/hermes_cli/test_kanban_diagnostics.py::test_assignee_profile_not_found_warns \
      tests/hermes_cli/test_kanban_diagnostics.py::test_stale_running_claim_warns
  2. Create or patch a ready task so tasks.skills contains web, then run hermes kanban dispatch --json and verify it is reported under skipped_invalid_skills and remains ready
  3. Use hermes kanban edit <task_id> --reset-failures and hermes kanban edit <task_id> --clear-claim to verify both recovery actions succeed on eligible tasks
  4. For --clear-claim, verify that active running runs are closed as reclaimed, while already-terminal runs are not rewritten
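The semantics that steps 3 and 4 verify can be modelled with a small sketch. It is a simplified in-memory model written for illustration, not the PR's database code: the field names (`failure_count`, `claimed_by`, `claim_expires_at`), the `Run` type, and both helper functions are assumptions. The behaviour it encodes is taken from the description: `--clear-claim` applies to non-running tasks, closes still-running runs as `reclaimed`, and leaves already-terminal runs untouched.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Run:
    id: str
    status: str


@dataclass
class Task:
    id: str
    status: str
    failure_count: int = 0                    # assumed field name
    claimed_by: Optional[str] = None          # assumed field name
    claim_expires_at: Optional[float] = None  # assumed field name
    runs: list[Run] = field(default_factory=list)


def reset_failures(task: Task) -> None:
    """Model of `edit --reset-failures`: zero the failure counter."""
    task.failure_count = 0


def clear_claim(task: Task) -> None:
    """Model of `edit --clear-claim` for a non-running task."""
    if task.status == "running":
        raise ValueError("clear-claim only applies to non-running tasks")
    task.claimed_by = None
    task.claim_expires_at = None
    for run in task.runs:
        # Only runs still marked running are closed; terminal runs
        # keep whatever outcome they already recorded.
        if run.status == "running":
            run.status = "reclaimed"
```

Rejecting running tasks keeps the command from pulling a claim out from under a live worker, which matches the "non-running tasks with stale claim fields" wording in the change list.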

Checklist

Code

  • I've read the Contributing Guide
  • My commit messages follow Conventional Commits (fix(scope):, feat(scope):, etc.)
  • I searched for existing PRs to make sure this isn't a duplicate
  • My PR contains only changes related to this fix/feature (no unrelated commits)
  • I've run pytest tests/ -q and all tests pass
  • I've added tests for my changes (required for bug fixes, strongly encouraged for features)
  • I've tested on my platform: Linux (WSL-style dev environment)

Documentation & Housekeeping

  • I've updated relevant documentation (README, docs/, docstrings) — or N/A
  • I've updated cli-config.yaml.example if I added/changed config keys — or N/A
  • I've updated CONTRIBUTING.md or AGENTS.md if I changed architecture or workflows — or N/A
  • I've considered cross-platform impact (Windows, macOS) per the compatibility guide — or N/A
  • I've updated tool descriptions/schemas if I changed tool behavior — or N/A

Screenshots / Logs

  • Targeted verification passed: 9 passed in 18.73s

alt-glitch added the labels type/bug (Something isn't working), comp/cron (Cron scheduler and job management), and P3 (Low: cosmetic, nice to have) on May 10, 2026
qWaitCrypto changed the title from "Fix/kanban dispatch recovery followup" to "fix(kanban): skip invalid task skills and add recovery commands" on May 10, 2026
…ics-preflight-22921

# Conflicts:
#	tests/tools/test_kanban_tools.py
@teknium1

Closing in favor of #23183 below. This PR is a strict git superset of #22974 (every commit from there plus two more), and #23183 is in turn a strict superset of this one. See the combined comment on #23183 for the full picture.

Quick summary specific to this PR: the create-time validation refactor (INVALID_TASK_SKILL_NAMES rename) is redundant against PR #23273 which landed on main using KNOWN_TOOLSET_NAMES from the live toolset registry. The genuinely novel work — dispatch-time skip-and-keep-ready behavior with DispatchResult.skipped_invalid_skills, plus the --reset-failures and --clear-claim operator recovery flags — is real and worth keeping. But it's bundled in the larger #23183 stack now.

Thanks @qWaitCrypto!
