UPDATE (Apr 28, 2026): This RFC now tracks the implemented PR. Current status, review history, and all bug/fix summaries live on PR #16100.
Since this RFC was filed:
- Moved from a cron-driven dispatcher to a real long-lived daemon (hermes kanban daemon) with a systemd unit. Cron was burning LLM tokens per tick.
- Runs are first-class: one row per attempt, preserving full retry history with a structured summary + metadata handoff. (Ported from @vulcan-artivus's review.)
- Dashboard plugin with drag-drop board, Run History, Worker Log panel, live WebSocket updates, and per-task event attribution.
- Four audit passes + external review by @erosika + full battle-test suite: 15 bugs found and fixed across the cycle. tests/stress/ exercises real multi-process concurrency, real subprocess E2E, property fuzzing (~40k randomized ops, 9 invariants), and scale benchmarks at 10k tasks.
- Tutorial + 10 dashboard screenshots walking the four user stories: website/docs/user-guide/features/kanban-tutorial.md.
- 182/182 main kanban test suite green.
Comments on this issue remain open for v2 design input (workflow templates, structured comments as multi-peer session substrate, skill-aware routing). PR #16100 is the merge-tracking location.
Request for review: Kanban — durable multi-profile collaboration board
PR: #16100
Design spec: docs/hermes-kanban-v1-spec.pdf (committed in the PR, 14 sections + diagrams + bibliography)
Design discussion: Nous Discord thread, April 25–26 2026 (contributors credited at bottom)
Kanban is a new durable, SQLite-backed task board shared across all Hermes profiles on a host. Tasks carry an assignee (a profile name), optional dependency links, a workspace kind (scratch / worktree / dir:<path>), and an optional tenant namespace. A cron-driven dispatcher atomically claims ready tasks and spawns the assigned profile as its own OS process — no in-process subagent swarms. The /kanban slash command works in both CLI and all gateway platforms (same COMMAND_REGISTRY pipe).
Before we merge, we'd like eyes on it from anyone who runs multiple profiles, has opinions on agent coordination primitives, or plans to use this for non-coding workloads (research, ops, digital twins, fleet work). The PR is substantial (~2900 LOC including tests, spec, skills, and docs) and introduces a new top-level concept users will need to reason about alongside delegate_task.
The shape at a glance
- Board: ~/.hermes/kanban.db, WAL-mode SQLite, profile-agnostic. Four tables (tasks, task_links, task_comments, task_events) + six indexes.
- Status machine: todo → ready → running → done (plus blocked and archived side branches). Only one role may transition each status; eliminates write contention.
- Atomic claim: compare-and-swap UPDATE ... WHERE status='ready' AND claim_lock IS NULL inside BEGIN IMMEDIATE. Proven-serial under SQLite's WAL; the test suite includes a concurrent-thread race where exactly one of 8 claimers wins.
- Dispatcher: hermes kanban dispatch reclaims stale running tasks (15-min claim TTL), promotes todo → ready when all parents done, atomically claims, spawns hermes -p <profile> chat -q "work kanban task <id>" with HERMES_KANBAN_TASK / HERMES_KANBAN_WORKSPACE / HERMES_TENANT env vars set, redirects output to ~/.hermes/kanban/logs/<id>.log.
- Workspace kinds:
  - scratch (default): fresh tmp dir per task, GC'd on archive.
  - worktree: git worktree under .worktrees/<id>/ for coding tasks.
  - dir:<path>: existing shared directory (Obsidian vault, mail ops dir, per-account folder).
- Tenant column: one nullable string; one specialist fleet can serve many business contexts (--tenant business-a) with data isolation by workspace path + memory key prefix.
- Zero changes to run_agent.py. No new core tools. No tool-schema bloat on any API call.
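The claim lifecycle described above (compare-and-swap claim, 15-minute TTL, heartbeat, stale reclaim) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code in the PR: the cut-down schema, the claim_expires column, and the function signatures (including this version of heartbeat_claim) are hypothetical; only status, claim_lock, BEGIN IMMEDIATE, WAL mode, and the TTL value come from the RFC text.

```python
import sqlite3
import threading
import time

CLAIM_TTL_SECONDS = 15 * 60  # 15-minute default claim TTL from the RFC

def init_db(path):
    """Create a cut-down tasks table and switch the file to WAL (persistent)."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tasks ("
        " id TEXT PRIMARY KEY,"
        " status TEXT NOT NULL DEFAULT 'todo',"
        " claim_lock TEXT,"       # holder of the claim, NULL when unclaimed
        " claim_expires REAL)"    # hypothetical column: expiry in epoch seconds
    )
    conn.commit()
    conn.close()

def connect(path):
    # isolation_level=None means autocommit mode, so the explicit
    # BEGIN IMMEDIATE / COMMIT below are the only transaction control.
    return sqlite3.connect(path, timeout=30, isolation_level=None)

def _write(conn, sql, params):
    """Run one UPDATE inside BEGIN IMMEDIATE; return the affected row count."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock up front
    try:
        changed = conn.execute(sql, params).rowcount
        conn.execute("COMMIT")
        return changed
    except BaseException:
        conn.execute("ROLLBACK")
        raise

def claim(conn, task_id, claimer, now=None):
    """Compare-and-swap: succeeds only if the task is ready and unclaimed."""
    now = time.time() if now is None else now
    return _write(conn,
        "UPDATE tasks SET status='running', claim_lock=?, claim_expires=? "
        "WHERE id=? AND status='ready' AND claim_lock IS NULL",
        (claimer, now + CLAIM_TTL_SECONDS, task_id)) == 1

def heartbeat_claim(conn, task_id, claimer, now=None):
    """Extend the TTL, but only for the current claim holder."""
    now = time.time() if now is None else now
    return _write(conn,
        "UPDATE tasks SET claim_expires=? "
        "WHERE id=? AND status='running' AND claim_lock=?",
        (now + CLAIM_TTL_SECONDS, task_id, claimer)) == 1

def reclaim_stale(conn, now=None):
    """Return expired running tasks to ready; report how many were reclaimed."""
    now = time.time() if now is None else now
    return _write(conn,
        "UPDATE tasks SET status='ready', claim_lock=NULL, claim_expires=NULL "
        "WHERE status='running' AND claim_expires < ?", (now,))

def race(path, task_id, n_claimers=8):
    """Start n_claimers threads that all try to claim the same task."""
    winners, guard = [], threading.Lock()

    def worker(name):
        conn = connect(path)  # one connection per thread
        try:
            if claim(conn, task_id, name):
                with guard:
                    winners.append(name)
        finally:
            conn.close()

    threads = [threading.Thread(target=worker, args=(f"claimer-{i}",))
               for i in range(n_claimers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return winners
```

Because the WHERE clause re-checks status and claim_lock inside the write transaction, a losing claimer sees zero affected rows rather than an error, so race() should report a single winner no matter how the threads interleave.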
CLI / gateway surface
Seventeen verbs, all available as both hermes kanban <verb> and /kanban <verb>:
init · create · list · show · assign · link · unlink · claim ·
comment · complete · block · unblock · archive · tail · dispatch · context · gc
The slash command bypasses the running-agent guard in the gateway — /kanban unblock can free a stuck worker while the main agent is mid-conversation. Board writes don't touch agent state.
Skills shipped alongside
kanban-worker — how a profile claims context, does work in its workspace, blocks on ambiguity, completes with a result, delegates follow-ups.
kanban-orchestrator — "you are a dispatcher, not a worker" template with anti-temptation rules and a standard specialist roster (researcher, writer, analyst, backend-eng, reviewer, ops).
Why not just delegate_task?
These look similar and they are not the same primitive. The one-sentence distinction: delegate_task is a function call; Kanban is a durable work queue where every handoff is a row any profile (or human) can read and edit. The full 12-dimension comparison table is in §6 of the spec.
They coexist. A kanban worker may call delegate_task internally for reasoning within its own run. The single test: does this handoff need to outlive a single API loop and be visible to others?
Use delegate_task for short, self-contained reasoning subtasks the parent agent wants an answer to before continuing — seconds-to-minutes, no human in the loop, result goes back into parent's context.
Use Kanban for work that crosses agent boundaries, needs to survive restarts, might need human input, might be picked up by a different role (engineer → reviewer → engineer), or needs to be discoverable after the fact.
What we'd especially like feedback on
- The delegate_task / Kanban boundary. Is the "does this handoff need to outlive a single API loop" test clear enough? Should the spec land a doc page explicitly titled "when to use which"? Are there workloads where you can't tell which side of the line they fall on?
- Eight collaboration patterns. Spec §5 names P1 Fan-out, P2 Pipeline, P3 Voting/quorum, P4 Long-running journal, P5 Human-in-the-loop triage, P6 @mention delegation, P7 Thread-scoped workspace, P8 Fleet farming. P6 and P8 are the only patterns that require infra beyond the base primitives (P6 is a parser hook; P8 is a dispatch-fleet helper). Is the set right? Missing any obvious shapes?
- Workspace kinds. Three: scratch, worktree, dir:<path>. Research / ops / digital-twin use cases all work with the default scratch; coding uses worktree; long-running journals and per-subject fleets use dir:. Is the kind vocabulary right, or should we flatten it (e.g., always dir:; scratch is just an auto-allocated path)?
- Tenant as one nullable column. Design choice: tenants are namespaces, not entity types. One researcher profile serves multiple businesses via --tenant business-a. Is this enough for people actually running multi-business setups (cc @sudo_relax from the design thread), or does it need more: per-tenant access control, cross-tenant task linking, tenant-scoped profile definitions?
- Dispatcher cadence. Runs via cron, default 60 seconds. Cheap "mini dispatch" (recompute ready) also runs on every hermes kanban list invocation to keep laptop-sleep-wake cases responsive. Too aggressive? Too conservative? Worth a dedicated long-lived ticker process instead?
- Claim TTL. Default 15 minutes before a claim is considered stale and reclaimed. Workers that know they'll run longer should call heartbeat_claim() periodically. Is 15m the right default, or should it scale with profile (e.g., 60m for backend-eng, 5m for researcher)?
- Terminal-based spawn, output to log file. dispatch_once uses subprocess.Popen with start_new_session=True and redirects output to ~/.hermes/kanban/logs/<id>.log. No stdin. Acceptable, or do we need more: PID recording on the task row? Structured log format? Live-streaming output back to the dispatcher's gateway session?
- Orchestrator profile design. The kanban-orchestrator skill plus a recommendation to restrict toolsets to [kanban, gateway, memory] is the proposed fix for the "orchestrator does the work itself" failure mode raised by @sudo_relax. Is this enough, or do we need kernel-level enforcement (a "router-only" profile flag that the dispatcher honors)?
- Running-agent guard bypass. /kanban is in the bypass list (same tier as /background). Mutations are allowed mid-run because the board is profile-agnostic and doesn't touch the running agent's state. Worth a stricter rule (mutations gated, reads allowed)?
- What's left out. Deliberately not in v1: per-tenant access control, cross-tenant links, tenant-scoped profile definitions, round-robin worker pools, auto-assignment ("any idle profile claims it"), smart routing, per-agent budgets, approval gates, fleet dashboards, org-chart types. All user-space (plugins or profile conventions). If any of these feel like they belong in the kernel, say so now.
- Bugs, edge cases, race conditions: the usual. The concurrent-claim race is tested; the stale-claim recovery is tested; cycle detection in link_tasks is tested (it caught a bug during implementation: the direction of the graph walk). What else should be in the test matrix?
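Since the direction of the graph walk is called out as the bug that cycle detection caught, here is one way the check can be sketched. It is an in-memory illustration only: the LinkGraph class and would_create_cycle helper are hypothetical names, and the PR's link_tasks presumably works against the task_links table instead. The point is the direction: a new parent → child link closes a cycle exactly when the parent is already reachable by walking downstream from the child.

```python
from collections import defaultdict

class LinkGraph:
    """Hypothetical in-memory stand-in for the task_links table."""

    def __init__(self):
        # parent id -> set of child ids (a child becomes ready after its parents)
        self.children = defaultdict(set)

    def would_create_cycle(self, parent, child):
        """True if adding parent -> child would make the graph cyclic."""
        if parent == child:
            return True
        # Walk DOWNSTREAM from the proposed child. If we can reach the
        # proposed parent, the new edge would close a loop. Walking upstream
        # from the child instead answers a different question and misses it.
        stack, seen = [child], set()
        while stack:
            node = stack.pop()
            if node == parent:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(self.children.get(node, ()))
        return False

    def link_tasks(self, parent, child):
        """Add a dependency edge, refusing any edge that creates a cycle."""
        if self.would_create_cycle(parent, child):
            raise ValueError(f"linking {parent} -> {child} would create a cycle")
        self.children[parent].add(child)
```

A diamond (a → b, a → c, b → d, c → d) is accepted because no path leads back upstream, while d → a is rejected. A test matrix would probably want both shapes, plus the self-link case.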
What Kanban does NOT do (intentionally)
- Does not run workers in-process — every worker is a full OS process with its own HERMES_HOME. No SDK-lifecycle fragility (the NanoClaw failure class).
- Does not auto-assign, auto-route, or auto-escalate. All those are user-space profile behaviors.
- Does not delete anything automatically. Archive only; gc removes scratch workspace dirs for archived tasks.
- Does not modify run_agent.py, model_tools.py, or any tool schema.
- Does not invalidate the main session's prompt cache. The board is external to any agent's context.
- Does not cross tenants with task links (v1 limitation; noted in spec §7).
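To make the "every worker is a full OS process" point concrete, the spawn step can be sketched roughly as follows. The command line, the three environment variable names, start_new_session=True, the absence of stdin, and the per-task log file come from this RFC; the spawn_worker function, its parameters, and the argv override (used here so the sketch can be exercised without hermes installed) are assumptions for illustration, not the PR's dispatch_once.

```python
import os
import subprocess
from pathlib import Path

def spawn_worker(task_id, profile, workspace, log_dir, tenant=None, argv=None):
    """Start one worker as a detached OS process and return (process, log path)."""
    log_dir = Path(log_dir).expanduser()
    log_dir.mkdir(parents=True, exist_ok=True)
    log_path = log_dir / f"{task_id}.log"

    if argv is None:
        # Command line as given in the RFC.
        argv = ["hermes", "-p", profile, "chat", "-q",
                f"work kanban task {task_id}"]

    env = dict(os.environ)
    env["HERMES_KANBAN_TASK"] = str(task_id)
    env["HERMES_KANBAN_WORKSPACE"] = str(workspace)
    if tenant is not None:
        env["HERMES_TENANT"] = tenant

    # Append mode keeps output from earlier attempts at the same task.
    with open(log_path, "ab") as log:
        proc = subprocess.Popen(
            argv,
            env=env,
            stdin=subprocess.DEVNULL,   # no stdin for workers
            stdout=log,
            stderr=subprocess.STDOUT,   # interleave stderr into the same log
            start_new_session=True,     # POSIX: detach from the dispatcher's session
        )
    # The child holds its own copy of the log descriptor, so closing ours is safe.
    return proc, log_path
```

Returning the Popen object leaves room for the PID-recording question raised earlier: proc.pid is available immediately after the spawn if the task row grows a column for it.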
How to try it locally
gh pr checkout 16100
hermes kanban init
hermes kanban create "research AI funding" --assignee researcher
hermes kanban list
hermes kanban dispatch --dry-run
# Then for real:
hermes kanban dispatch
The two skills (kanban-worker + kanban-orchestrator) are in skills/devops/ and load like any other skill.
For a full worked example, see spec §5 (research triage), §6 (the 8 patterns), and §9 (50-account fleet example).
Related systems & design input
The design synthesizes three existing systems plus one April-2026 release:
- Cline Kanban — board + linked tasks + ephemeral worktrees shape. We adopted.
- Paperclip — atomic task checkout + persistent agent identity. Mapped onto Hermes profiles.
- NanoClaw Agent Swarms — the negative lesson: in-process SDK subagent swarms are fragile to upstream lifecycle semantics. We explicitly reject.
- Google Gemini Enterprise Agent Designer + CLI Subagents (April 2026): portable subagent-as-file artifacts (we'll match in a follow-up hermes profile export), and @name delegation syntax (implemented as P6).
Community design input from the Nous Discord design thread, credited in the PR body: @Teknium, waxhy, A Real Icehole, Keimpe, LLM.STORE, caco, hunter_cat, djm, ionmanden, psbd, Aiz, Rikllo, sudo_relax, neo2k8.
Timing
Happy to let this sit for review. The PR is standing — not merged pending design approval. If something needs to change before we ship, flag it on the PR directly. Higher-level design concerns (primitives, scope boundaries, naming) go here on the RFC.
Thanks in advance for the eyes.