feat(kanban): worker visibility endpoints (workers/active, runs/{id}, inspect) #23761

Closed
Interstellar-code wants to merge 1 commit into NousResearch:main from Interstellar-code:feat/kanban-worker-tracking

Conversation

@Interstellar-code

Contributor

Summary

Adds three read-only endpoints to the kanban dashboard plugin so the dashboard, the workspace UI, or any external consumer can track workers across tasks without N+1 round-trips through /tasks/{task_id}.

| Route | Purpose |
| --- | --- |
| `GET /workers/active` | List every currently-running worker on the board (single SQL JOIN) |
| `GET /runs/{run_id}` | Direct lookup of any `task_runs` row by id (404 when missing) |
| `GET /runs/{run_id}/inspect` | Live PID stats via psutil: cpu, memory, threads, fds, status, cmdline |

Why

Today the only way to enumerate active workers is to fetch the full board, filter for status='running', then GET /tasks/{id} for each — N+1 round-trips just to render an "active workers" pane. Likewise, a run_id shown in logs or events can only be resolved by knowing which task it belongs to; there's no direct fetch.

_run_dict and kanban_db.get_run already exist; this PR just exposes them at the route level and adds the cross-task active-worker JOIN + psutil inspector.

Endpoint details

`GET /workers/active`

SQL JOIN of `task_runs` + `tasks` where `r.ended_at IS NULL AND r.worker_pid IS NOT NULL AND t.status='running'`. Returns:
```json
{
  "workers": [
    {
      "run_id": 42, "task_id": "t_b621", "task_title": "...",
      "task_status": "running", "task_assignee": "neo",
      "profile": "claude", "worker_pid": 88421,
      "started_at": 1778425200, "claim_lock": "mac:88421",
      "claim_expires": 1778428800, "last_heartbeat_at": 1778425230,
      "max_runtime_seconds": 3600
    }
  ],
  "count": 1,
  "checked_at": 1778425280
}
```
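
For readers who want to see the filter in action, the join could look roughly like the sketch below, run against an in-memory SQLite database. Everything about the storage layer here is an assumption: the table layout is reconstructed from the field names in the response above, the use of SQLite is a guess, and `active_workers` and `demo` are illustrative names rather than functions from this PR.

```python
import sqlite3
import time

# Hypothetical minimal schema, inferred from the response fields above.
# The real kanban tables may have different columns and types.
SCHEMA = """
CREATE TABLE tasks (
    id TEXT PRIMARY KEY, title TEXT, status TEXT, assignee TEXT
);
CREATE TABLE task_runs (
    id INTEGER PRIMARY KEY, task_id TEXT REFERENCES tasks(id),
    profile TEXT, worker_pid INTEGER, started_at INTEGER, ended_at INTEGER,
    claim_lock TEXT, claim_expires INTEGER, last_heartbeat_at INTEGER,
    max_runtime_seconds INTEGER
);
"""

# One query for the whole board: open runs with a recorded PID
# whose parent task is still marked running.
ACTIVE_WORKERS_SQL = """
SELECT r.id AS run_id, t.id AS task_id, t.title AS task_title,
       t.status AS task_status, t.assignee AS task_assignee,
       r.profile, r.worker_pid, r.started_at, r.claim_lock,
       r.claim_expires, r.last_heartbeat_at, r.max_runtime_seconds
FROM task_runs r
JOIN tasks t ON t.id = r.task_id
WHERE r.ended_at IS NULL
  AND r.worker_pid IS NOT NULL
  AND t.status = 'running'
ORDER BY r.started_at
"""


def active_workers(conn):
    """Return the {workers, count, checked_at} payload for one connection."""
    conn.row_factory = sqlite3.Row
    rows = [dict(row) for row in conn.execute(ACTIVE_WORKERS_SQL)]
    return {"workers": rows, "count": len(rows), "checked_at": int(time.time())}


def demo():
    """Seed three runs; only run 42 satisfies all three conditions."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO tasks VALUES ('t_b621', 'demo task', 'running', 'neo')")
    conn.execute("INSERT INTO tasks VALUES ('t_done', 'old task', 'done', 'neo')")
    conn.executemany(
        "INSERT INTO task_runs VALUES (?,?,?,?,?,?,?,?,?,?)",
        [
            # live run with a PID: included
            (42, "t_b621", "claude", 88421, 1778425200, None,
             "mac:88421", 1778428800, 1778425230, 3600),
            # no worker_pid recorded: filtered out
            (43, "t_b621", "claude", None, 1778425100, None,
             None, None, None, 3600),
            # run already ended: filtered out
            (44, "t_done", "claude", 777, 1778420000, 1778421000,
             None, None, None, 3600),
        ],
    )
    return active_workers(conn)
```

Calling `demo()` returns a payload whose `workers` list contains only run 42, which matches the "ended runs filtered" and "no-pid runs filtered" cases listed under Tests.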

`GET /runs/{run_id}`

Mirrors `GET /tasks/{task_id}` 404 pattern. Returns `{run: _run_dict(r)}` or 404.

`GET /runs/{run_id}/inspect`

Short-circuits with `{alive: false, reason}` for: run already ended, no `worker_pid` recorded, psutil missing, or PID gone (`NoSuchProcess`). `AccessDenied` returns `{alive: true, pid, error: "access denied"}` rather than a 500. POSIX-only `num_fds` is omitted gracefully on Windows.
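
The short-circuit order described above can be sketched as follows. This is not the code from the PR: `inspect_run` is a hypothetical helper that takes a plain dict with `ended_at` and `worker_pid` keys, and the `reason` strings are placeholders, since the exact wording is not given here. The psutil calls (`Process.as_dict`, `NoSuchProcess`, `AccessDenied`) are real psutil APIs.

```python
try:
    import psutil  # optional dependency; the import is gated as in the PR
except ImportError:
    psutil = None

# Attributes requested from psutil. num_fds is appended only where it exists,
# because psutil defines Process.num_fds on POSIX platforms only.
BASE_ATTRS = ["cpu_percent", "memory_info", "num_threads",
              "status", "create_time", "cmdline"]


def inspect_run(run):
    """Build an inspect payload for a run dict (hypothetical shape)."""
    # 1. Cheap checks first: nothing to inspect if the run is over or has no PID.
    if run.get("ended_at") is not None:
        return {"alive": False, "reason": "run already ended"}
    pid = run.get("worker_pid")
    if pid is None:
        return {"alive": False, "reason": "no worker_pid recorded"}
    # 2. Missing psutil degrades to alive: false instead of crashing.
    if psutil is None:
        return {"alive": False, "pid": pid, "reason": "psutil not installed"}

    attrs = list(BASE_ATTRS)
    if hasattr(psutil.Process, "num_fds"):
        attrs.append("num_fds")

    # 3. Query the live process, mapping psutil errors to payloads, not 500s.
    try:
        info = psutil.Process(pid).as_dict(attrs=attrs)
    except psutil.NoSuchProcess:
        return {"alive": False, "pid": pid, "reason": "process not found"}
    except psutil.AccessDenied:
        return {"alive": True, "pid": pid, "error": "access denied"}

    # Flatten memory_info into the two byte counters named in the commit message.
    mem = info.pop("memory_info", None)
    info["memory_rss_bytes"] = mem.rss if mem else None
    info["memory_vms_bytes"] = mem.vms if mem else None
    return {"alive": True, "pid": pid, **info}
```

For example, `inspect_run({"ended_at": 1778425300, "worker_pid": 88421})` returns `{"alive": False, "reason": "run already ended"}` without touching psutil at all, so the ended-run and no-pid paths behave the same whether or not psutil is installed.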

Tests

`tests/plugins/test_kanban_worker_runs.py` — 11 new tests:

  • workers/active: empty board, running task present, ended runs filtered, no-pid runs filtered
  • runs/{id}: 404 unknown id, ok with shape
  • runs/{id}/inspect: 404, already-ended reason, no-pid reason, dead-pid reason, live-pid stats (psutil mocked)

All pass under `uv run --extra dev --extra web pytest tests/plugins/test_kanban_worker_runs.py`.

Out of scope

A public `POST /runs/{run_id}/terminate` endpoint (wrapping the existing internal `_terminate_reclaimed_worker`) is being proposed in a separate issue, since its RBAC + soft-cancel-vs-force design needs maintainer input before code. Read-only endpoints land here first.

Validation

  • `pytest tests/plugins/test_kanban_worker_runs.py` → 11 passed
  • Style matches existing kanban plugin routes (docstrings, `_resolve_board` + `_conn` pattern, HTTPException 404 shape)
  • psutil import is gated; missing-psutil path returns `alive: false` instead of crashing

🤖 Generated with Claude Code

… inspect)

Adds three read-only endpoints to the kanban dashboard plugin so the
SwitchUI workspace (and any other dashboard consumer) can track
workers across tasks without N+1 round-trips through /tasks/{task_id}.

- GET /workers/active
  Single SQL JOIN of task_runs + tasks where ended_at IS NULL,
  worker_pid IS NOT NULL, status='running'. Returns
  {workers: [...], count, checked_at}.

- GET /runs/{run_id}
  Direct lookup of any task_run row by id. Reuses existing
  kanban_db.get_run() helper and _run_dict() serialiser. 404 when
  not found. Mirrors GET /tasks/{task_id} 404 shape.

- GET /runs/{run_id}/inspect
  Live PID stats via psutil.Process.as_dict() — cpu_percent,
  memory_rss_bytes, memory_vms_bytes, num_threads, num_fds, status,
  create_time, cmdline. Short-circuits with alive:false when run
  has ended, has no worker_pid, the pid is gone, or psutil is
  unavailable. AccessDenied surfaces as alive:true with error
  rather than a 500.

11 new tests in tests/plugins/test_kanban_worker_runs.py cover the
empty-board case, running-task case, ended-run filtering,
missing-pid filtering, 404 paths, already-ended inspect, no-pid
inspect, dead-pid inspect, and live-pid inspect (psutil mocked).
All pass.

Companion termination endpoint (POST /runs/{run_id}/terminate) is
intentionally out of scope here — opening a separate issue first
since the RBAC and dispatcher-mediated soft-cancel design needs
maintainer input before code.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@Interstellar-code

Contributor Author

Related: #23762 (companion design discussion for the POST /runs/{run_id}/terminate endpoint).

@teknium1

Contributor

Merged via PR #28432 (cherry-picked onto current main with your authorship preserved via rebase-merge — commit 02efad7). Thanks for the contribution!

Labels

  • comp/plugins: Plugin system and bundled plugins
  • P3: Low (cosmetic, nice to have)
  • type/feature: New feature or request

3 participants