fix(kanban): validate worker profile before spawn #20065

Closed
steezkelly wants to merge 1 commit into NousResearch:main from steezkelly:fix/kanban-profile-readiness

Conversation

@steezkelly

Contributor

Summary

  • Add a cheap pre-spawn readiness guard for Kanban worker profiles
  • Fail fast when a non-default assignee profile does not exist or lacks config.yaml
  • Prevent _default_spawn from launching hermes -p <profile> for half-created profile directories
  • Update spawn-env tests to model runnable profiles explicitly

Fixes #20054

Scope

This intentionally checks deterministic local readiness only:

  • profile exists
  • profile has config.yaml

It does not attempt provider-specific credential validation; bad/expired credentials can still fail during worker startup and are reported through the existing spawn-failure path.
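
The two checks listed above could be sketched roughly as follows. This is an illustrative sketch, not the code from this PR: `check_profile_ready`, `ProfileNotReadyError`, the `profiles_root` layout (one directory per profile), and the `"default"` profile name are all assumptions made for the example.

```python
from pathlib import Path


class ProfileNotReadyError(RuntimeError):
    """Hypothetical error type: the assignee profile cannot be launched."""


def check_profile_ready(
    profiles_root: Path, assignee: str, default_profile: str = "default"
) -> None:
    """Raise ProfileNotReadyError unless the profile passes the local checks.

    Only deterministic, local conditions are checked. Credentials are not
    validated here, so a profile can pass and still fail at worker startup.
    """
    # Assumption: the default profile is always considered runnable.
    if assignee == default_profile:
        return

    profile_dir = profiles_root / assignee
    # Check 1: the profile directory exists.
    if not profile_dir.is_dir():
        raise ProfileNotReadyError(
            f"profile '{assignee}' does not exist at {profile_dir}"
        )
    # Check 2: the profile has a config.yaml (rejects half-created profiles).
    if not (profile_dir / "config.yaml").is_file():
        raise ProfileNotReadyError(f"profile '{assignee}' has no config.yaml")
```

Raising a dedicated exception with the failing condition in the message lets the caller report a precise reason through whatever failure path it already has.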

Test Plan

  • venv/bin/python -m pytest tests/hermes_cli/test_kanban_boards.py::TestWorkerSpawnEnv::test_default_spawn_rejects_half_created_profile -q -o 'addopts=' → failed before implementation as expected, then passed
  • venv/bin/python -m pytest tests/hermes_cli/test_kanban_boards.py::TestWorkerSpawnEnv -q -o 'addopts=' → 3 passed
  • venv/bin/python -m pytest tests/hermes_cli/test_kanban_boards.py tests/hermes_cli/test_kanban_db.py tests/hermes_cli/test_kanban_cli.py -q -o 'addopts=' → 124 passed
  • venv/bin/python -m pytest tests/tools/test_kanban_tools.py tests/plugins/test_kanban_dashboard_plugin.py -q -o 'addopts=' → 89 passed, 2 unrelated deprecation warnings
  • venv/bin/python -m py_compile hermes_cli/kanban_db.py tests/hermes_cli/test_kanban_boards.py tests/hermes_cli/test_kanban_db.py → passed

@steezkelly

Contributor Author

Local verification completed before opening this fork PR:

  • venv/bin/python -m pytest tests/hermes_cli/test_kanban_boards.py::TestWorkerSpawnEnv::test_default_spawn_rejects_half_created_profile -q -o 'addopts=' → failed before implementation as expected, then passed
  • venv/bin/python -m pytest tests/hermes_cli/test_kanban_boards.py::TestWorkerSpawnEnv -q -o 'addopts=' → 3 passed
  • venv/bin/python -m pytest tests/hermes_cli/test_kanban_boards.py tests/hermes_cli/test_kanban_db.py tests/hermes_cli/test_kanban_cli.py -q -o 'addopts=' → 124 passed
  • venv/bin/python -m pytest tests/tools/test_kanban_tools.py tests/plugins/test_kanban_dashboard_plugin.py -q -o 'addopts=' → 89 passed, 2 unrelated deprecation warnings
  • venv/bin/python -m py_compile hermes_cli/kanban_db.py tests/hermes_cli/test_kanban_boards.py tests/hermes_cli/test_kanban_db.py → passed

GitHub Actions are currently showing action_required for this fork PR, so CI appears to be waiting for maintainer approval rather than failing on code.

@teknium1

teknium1 commented May 5, 2026

Contributor

Closing in favor of #20165 (merged, commit f25d3ec), which fixes the same underlying issue #20054 at a tighter call site.

Your PR's readiness check ran inside _default_spawn (before the subprocess launch). The merged fix runs earlier — inside dispatch_once(), before the task is even claimed. That matters because claim-then-fail creates the crash loop (zombie worker, TTL reclaim, next tick re-spawn), whereas skip-before-claim is silent. The existence-only check also solves a broader real-world scenario (control-plane terminal lane names like orion-cc that never had a profile to begin with) rather than just the half-created-profile case.
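
The difference between the two call sites can be illustrated with a toy dispatcher loop. This is a simplified sketch, not the code merged in #20165: `dispatch_tick`, `Task`, `TickResult`, and the `profile_exists` and `spawn` callables are hypothetical stand-ins, and the real dispatcher's claim and TTL mechanics are not modelled.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable


@dataclass
class Task:
    """Hypothetical task record; `claimed` stands in for the real claim state."""
    task_id: str
    assignee: str
    claimed: bool = False


@dataclass
class TickResult:
    spawned: list[str] = field(default_factory=list)
    skipped: list[str] = field(default_factory=list)


def dispatch_tick(
    tasks: Iterable[Task],
    profile_exists: Callable[[str], bool],
    spawn: Callable[[Task], None],
) -> TickResult:
    """One dispatcher pass that checks the assignee before claiming."""
    result = TickResult()
    for task in tasks:
        if task.claimed:
            continue
        # Skip-before-claim: an assignee with no profile never gets claimed,
        # so nothing is left behind for a later reclaim-and-respawn cycle.
        if not profile_exists(task.assignee):
            result.skipped.append(task.task_id)
            continue
        # Only tasks with an existing profile are claimed and spawned.
        task.claimed = True
        spawn(task)
        result.spawned.append(task.task_id)
    return result
```

In this sketch a task assigned to a lane name with no profile, such as `orion-cc`, stays unclaimed on every tick, whereas a check placed after the claim would leave a claimed task with no live worker each time.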

On the config.yaml-specific check: we considered auto-materializing a default config for half-created profiles but rejected it. A profile without config.yaml has no model/provider/auth routing; synthesizing a default means the worker silently inherits the root ~/.hermes/config.yaml, which is almost certainly wrong (wrong model, wrong skills, missing provider creds → different failure mode, not a fix). Per the issue reporter's own expected behavior — "leave the task unclaimed and comment with the readiness failure" — fail-fast with a precise diagnostic is the right answer.

Thanks for the contribution — appreciate the clear scoping and test plan in the PR description.

#20165

@teknium1 teknium1 closed this May 5, 2026
@steezkelly steezkelly deleted the fix/kanban-profile-readiness branch May 5, 2026 17:49

Labels

  • comp/plugins: Plugin system and bundled plugins
  • P3: Low (cosmetic, nice to have)
  • type/bug: Something isn't working

Development

Successfully merging this pull request may close these issues.

Kanban dispatcher should validate assignee profile readiness before spawning workers

3 participants