# Public termination endpoint for kanban runs (POST /runs/{run_id}/terminate) #23762

## Motivation

`_terminate_reclaimed_worker` in `hermes_cli/kanban_db.py` (~line 3057) already implements
SIGTERM → grace period → SIGKILL, but no HTTP route exposes it. The only user-facing
termination path today is `POST /tasks/{task_id}/reclaim`, which is a _recovery_ action
for stuck/dead workers — not a clean "stop this running task" API. Operators who need to
cancel a live, well-behaved worker have no dashboard or API surface to do so without
SSHing into the host.

Adjacent evidence: issue #22176 (_CLI interrupt /stop not working_) shows user demand for
a stop primitive; a public terminate endpoint would satisfy the same need for tasks already
claimed and running.

## Design options

**Option A — Open endpoint**
`POST /runs/{run_id}/terminate` sends SIGTERM (→ SIGKILL after grace) immediately. Any
authenticated dashboard caller can terminate any run. Simple; matches the "no RBAC layer"
reality of all other dashboard routes today. Downside: no audit trail, and no signal to the
dispatcher distinguishing a deliberate cancellation from a crash.
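
For readers unfamiliar with the escalation pattern, here is a minimal sketch of SIGTERM → grace period → SIGKILL using only the standard library. It is not the code in `_terminate_reclaimed_worker`; the function name `terminate_with_grace`, the return strings, and the 5-second default are assumptions made for illustration, and it assumes the caller holds a `subprocess.Popen` handle on a POSIX host.

```python
import subprocess


def terminate_with_grace(proc: subprocess.Popen, grace_seconds: float = 5.0) -> str:
    """Ask a child process to stop, then force it if it ignores the request.

    Hypothetical helper: the name, return values, and default grace period
    are illustrative and do not come from the hermes codebase.
    """
    if proc.poll() is not None:
        # The process has already exited; there is nothing to signal.
        return "already_exited"

    proc.terminate()  # sends SIGTERM on POSIX
    try:
        proc.wait(timeout=grace_seconds)
        return "terminated"
    except subprocess.TimeoutExpired:
        # The worker did not exit within the grace period, so escalate.
        proc.kill()  # sends SIGKILL on POSIX
        proc.wait()  # reap the child so it does not linger as a zombie
        return "killed"
```

Under Option A the HTTP handler would run something like this inline, which is why the request blocks for up to the grace period and why the dispatcher never learns the exit was deliberate.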

**Option B — Soft-cancel flag (proposed default)**
`POST /runs/{run_id}/terminate` returns 202 immediately and sets a
`runs.cancel_requested = 1` flag. The dispatcher's next tick reads the flag, sends SIGTERM,
waits for the grace period, sends SIGKILL if the worker is still alive, and closes the run with `outcome=cancelled`.
`?force=true` skips the flag and sends SIGKILL directly. Advantages: dispatcher-mediated
semantics match how reclaim/claim work elsewhere; `?force` documents destructive intent
explicitly; the flag survives a dashboard restart.
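
The flow described in Option B can be sketched roughly as follows. This is an illustration under stated assumptions, not the proposed implementation: the table layout (apart from `cancel_requested` and `outcome=cancelled`, which come from the text above), the `status` column, the function names, the 404/409/200 status codes, and how the `force` path closes the run are all hypothetical. The `terminate` and `kill` callbacks stand in for whatever signal-sending code the dispatcher already has.

```python
import sqlite3
from typing import Callable, List, Optional

# Hypothetical minimal schema; only cancel_requested and outcome are named in the proposal.
SCHEMA = """
CREATE TABLE runs (
    id TEXT PRIMARY KEY,
    status TEXT NOT NULL DEFAULT 'running',
    cancel_requested INTEGER NOT NULL DEFAULT 0,
    outcome TEXT
)
"""


def request_cancel(
    conn: sqlite3.Connection,
    run_id: str,
    force: bool = False,
    kill: Optional[Callable[[str], None]] = None,
) -> int:
    """Body of POST /runs/{run_id}/terminate; returns an HTTP status code."""
    row = conn.execute("SELECT status FROM runs WHERE id = ?", (run_id,)).fetchone()
    if row is None:
        return 404  # assumption: unknown run
    if row[0] != "running":
        return 409  # assumption: run already closed
    if force:
        # ?force=true skips the flag and kills directly (assumed to close the run too).
        if kill is not None:
            kill(run_id)
        conn.execute(
            "UPDATE runs SET status = 'closed', outcome = 'cancelled' WHERE id = ?",
            (run_id,),
        )
        conn.commit()
        return 200
    # Soft cancel: persist the request and return immediately.
    conn.execute("UPDATE runs SET cancel_requested = 1 WHERE id = ?", (run_id,))
    conn.commit()
    return 202


def dispatcher_tick(conn: sqlite3.Connection, terminate: Callable[[str], None]) -> List[str]:
    """One dispatcher pass: stop every flagged run and close it as cancelled."""
    rows = conn.execute(
        "SELECT id FROM runs WHERE status = 'running' AND cancel_requested = 1"
    ).fetchall()
    cancelled = []
    for (run_id,) in rows:
        terminate(run_id)  # SIGTERM, grace period, SIGKILL if needed
        conn.execute(
            "UPDATE runs SET status = 'closed', outcome = 'cancelled' WHERE id = ?",
            (run_id,),
        )
        cancelled.append(run_id)
    conn.commit()
    return cancelled
```

Because the flag lives in the database rather than in the dashboard process, a cancel requested just before a dashboard restart is still picked up on the dispatcher's next tick.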

**Option C — Scoped admin token**
Destructive ops (`terminate`, `kill`) require a separate `HERMES_ADMIN_TOKEN` env var
distinct from the dashboard read token. Safer for shared deployments; adds operational
overhead for solo installs.
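
A token gate for Option C could look something like the sketch below. Only the `HERMES_ADMIN_TOKEN` variable name comes from this proposal; the function name `is_admin_request`, the choice to fail closed when the variable is unset, and how the presented token reaches the handler (header, query parameter, etc.) are assumptions.

```python
import hmac
import os
from typing import Mapping, Optional


def is_admin_request(
    presented: Optional[str],
    env: Mapping[str, str] = os.environ,
) -> bool:
    """Return True only if the caller presented the configured admin token.

    Hypothetical helper. Fails closed: if HERMES_ADMIN_TOKEN is unset or
    empty, no request is treated as admin.
    """
    expected = env.get("HERMES_ADMIN_TOKEN")
    if not expected or presented is None:
        return False
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(presented.encode(), expected.encode())
```

Failing closed means solo installs would have to set the variable before they can terminate anything, which is the operational overhead noted above.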

## Proposed default

**Option B.** Soft-cancel + `?force` escape hatch is the right trade-off: it preserves
dispatcher-mediated semantics (everything goes through the loop), gives the worker a clean
shutdown path, and the `?force` flag makes SIGKILL an explicit opt-in rather than the
default. Option C can layer on top later if multi-user RBAC becomes a requirement.

## Next steps

I will follow up with a PR implementing Option B once a design preference is confirmed in this
thread. Read-only sibling endpoints (`GET /workers/active`, `GET /runs/{run_id}`,
`GET /runs/{run_id}/inspect`) land in the companion PR (link to be added once opened).

