This repository was archived by the owner on May 26, 2026. It is now read-only.

feat(kora): KR-MCP-STOP-CONTROL ST1 — pause/resume wrappers #142

Merged
rafe-walker merged 1 commit into feature/phase2-upgrades from feat/kora-KR-MCP-STOP-CONTROL-ST1 on May 23, 2026

@rafe-walker

Owner

Summary

Adds kora__request_pause and kora__request_resume MCP tools wrapping the existing OperationalStateHolder — no duplicate state-machine code.

  • kora__request_pause: ACTIVE → PAUSED via holder.transition_to(PrimaryState.PAUSED, trigger=reason). Cap-gated on kora__request_pause in mcp_callers.yaml.
  • kora__request_resume: PAUSED → READY (see the K-DG note below). Cap-gated on kora__request_resume.

Two new entries each in ST2_TOOL_DESCRIPTORS and ST2_TOOL_DISPATCH (now 7 entries each). Audit events are emitted via _emit_audit (KR-AUDIT-JSONL-SINK dual-write).
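A minimal sketch of the wrapper-plus-dispatch shape described above. `PrimaryState` here is a three-member stand-in, `RecordingHolder` is a hypothetical fake that only records `transition_to` calls, and `TOOL_DISPATCH` is an illustrative dict in the spirit of `ST2_TOOL_DISPATCH`; none of these are the real module's definitions. Each wrapper only chooses a target state and forwards `reason` as `trigger`, leaving edge validation to the holder.

```python
from enum import Enum
from typing import Callable, Dict, List, Tuple


class PrimaryState(Enum):
    # Hypothetical three-state stand-in for the real enum.
    READY = "ready"
    ACTIVE = "active"
    PAUSED = "paused"


class RecordingHolder:
    """Hypothetical fake holder: records transition_to calls, validates nothing."""

    def __init__(self) -> None:
        self.calls: List[Tuple[PrimaryState, str]] = []

    def transition_to(self, target: PrimaryState, trigger: str) -> None:
        self.calls.append((target, trigger))


def request_pause(holder: RecordingHolder, params: Dict[str, str]) -> Dict[str, str]:
    # Thin wrapper: choose the target, forward the reason, no table logic here.
    holder.transition_to(PrimaryState.PAUSED, trigger=params["reason"])
    return {"to_state": PrimaryState.PAUSED.value}


def request_resume(holder: RecordingHolder, params: Dict[str, str]) -> Dict[str, str]:
    # Resume targets READY, not ACTIVE.
    holder.transition_to(PrimaryState.READY, trigger=params["reason"])
    return {"to_state": PrimaryState.READY.value}


# Illustrative tool-name -> handler map (not the real ST2_TOOL_DISPATCH).
TOOL_DISPATCH: Dict[str, Callable[[RecordingHolder, Dict[str, str]], Dict[str, str]]] = {
    "kora__request_pause": request_pause,
    "kora__request_resume": request_resume,
}
```

In the real holder, `transition_to` presumably rejects an illegal edge; keeping that check in one place is what avoids duplicating `TRANSITION_TABLE` logic in the wrappers.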

K-DG drift — resume targets READY, not ACTIVE

The bucket spec proposes PAUSED → ACTIVE. The actual R4.1 §9.1 TRANSITION_TABLE in agent/operational_state_holder.py has no PAUSED → ACTIVE entry; the canonical recovery edge from PAUSED is PAUSED → READY (operator-clearance / triage-complete recovery — the same edge the substrate-level kora_control L0 reset surfaces).

Resume therefore targets READY; the next claim cycle moves READY → ACTIVE naturally. This is what "resume" should mean semantically: "she's eligible to claim work again," not "she's holding a claim again." Verified by test_resume_from_paused_succeeds (asserts to_state == "ready" and holder.current.primary_state is PrimaryState.READY).
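To make the missing edge concrete, here is a toy transition table that models only the three edges discussed in this section. `TRANSITIONS`, `is_allowed`, and `walk` are illustrative names, not the R4.1 §9.1 `TRANSITION_TABLE` itself, which also covers states such as STOPPED and BOOTING.

```python
from enum import Enum
from typing import Dict, FrozenSet


class PrimaryState(Enum):
    READY = "ready"
    ACTIVE = "active"
    PAUSED = "paused"


# Toy excerpt: source state -> allowed targets. Only the edges discussed here
# are modeled; there is deliberately no PAUSED -> ACTIVE entry.
TRANSITIONS: Dict[PrimaryState, FrozenSet[PrimaryState]] = {
    PrimaryState.READY: frozenset({PrimaryState.ACTIVE}),   # claim cycle
    PrimaryState.ACTIVE: frozenset({PrimaryState.PAUSED}),  # pause
    PrimaryState.PAUSED: frozenset({PrimaryState.READY}),   # resume / recovery
}


def is_allowed(src: PrimaryState, dst: PrimaryState) -> bool:
    return dst in TRANSITIONS.get(src, frozenset())


def walk(start: PrimaryState, *targets: PrimaryState) -> PrimaryState:
    """Apply each target in order, refusing any edge absent from the table."""
    state = start
    for dst in targets:
        if not is_allowed(state, dst):
            raise ValueError(f"no edge {state.value} -> {dst.value}")
        state = dst
    return state
```

Resuming lands in READY, and the separate claim step supplies READY → ACTIVE, so the two-hop path reaches ACTIVE without needing a direct edge.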

Capability distinctness

kora__request_pause and kora__request_resume are separate caps from kora__request_state_transition (the full-transition cap added in KR-MCP-RUNTIME-SURFACE ST2). The operator can grant pause/resume without the full state-machine cap, and vice-versa. Tests:

  • test_transition_cap_does_not_grant_pause — caller with only kora__request_state_transition → -32001 capability_denied on pause.
  • test_transition_cap_does_not_grant_resume — same caller → -32001 on resume.
  • test_pause_without_capability_denied / test_resume_without_capability_denied — empty-caps caller.
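A sketch of exact-match capability gating, assuming each tool requires a capability of the same name and that holding one capability implies nothing about another. `REQUIRED_CAP` and `check_capability` are hypothetical; the error shape mirrors the `-32001 capability_denied` / `required_capability` contract described in this PR.

```python
from typing import Dict, FrozenSet, Optional

CAPABILITY_DENIED = -32001  # server-defined JSON-RPC code used by this surface

# Hypothetical: each tool is gated on a capability of the same name, and caps
# never imply one another (no hierarchy, no wildcard).
REQUIRED_CAP: Dict[str, str] = {
    "kora__request_pause": "kora__request_pause",
    "kora__request_resume": "kora__request_resume",
    "kora__request_state_transition": "kora__request_state_transition",
}


def check_capability(tool: str, caller_caps: FrozenSet[str]) -> Optional[dict]:
    """Return a JSON-RPC error object if the caller lacks the exact cap, else None."""
    needed = REQUIRED_CAP[tool]
    if needed in caller_caps:
        return None
    return {
        "code": CAPABILITY_DENIED,
        "message": "capability_denied",
        "data": {"required_capability": needed},
    }
```

Because the check is a plain set-membership test, a caller holding only `kora__request_state_transition` is denied on both pause and resume, and a pause-only caller is likewise denied on the full transition tool.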

ST2 deferred

kora__request_stop (L1/L2 levels writing to the kora_control SECDEF) is scoped for ST2 of this bucket. It needs a Python writer for public.issue_kora_control(...) and an actor_id: <uuid> field on mcp_callers.yaml entries; neither exists today.

Test plan

  • All 16 new tests pass: pytest tests/kora_cli/test_listeners/test_mcp_tools_stop_control.py
  • Full tests/kora_cli/test_listeners/ regression green (235 passed) — no MCP-surface regressions
  • Full repo regression delta vs base: zero new failures (the 47 failures are pre-existing on feature/phase2-upgrades)
  • Descriptors surface in /mcp/tools/list with requires_cap_gate=True + dev_only=False
  • SECURITY: caller bearer token never appears in any error envelope (asserted across invalid-transition + bad-reason paths)
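One way to express the bearer-token invariant as a reusable test helper: walk every string key and value in the error envelope and fail if the token appears anywhere. `assert_secret_absent` is a hypothetical helper, not necessarily how the 16 tests implement the check.

```python
from typing import Any, Iterator


def _strings(obj: Any) -> Iterator[str]:
    """Yield every string (keys and values) nested inside a JSON-like object."""
    if isinstance(obj, str):
        yield obj
    elif isinstance(obj, dict):
        for key, value in obj.items():
            yield from _strings(key)
            yield from _strings(value)
    elif isinstance(obj, (list, tuple)):
        for item in obj:
            yield from _strings(item)


def assert_secret_absent(envelope: Any, secret: str) -> None:
    """Raise AssertionError if `secret` is a substring of any string in `envelope`."""
    for text in _strings(envelope):
        if secret in text:
            raise AssertionError("bearer token leaked into error envelope")
```

Walking the structure, rather than searching the output of `json.dumps(envelope)`, avoids missing a token containing characters that JSON serialization would escape.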

🤖 Generated with Claude Code

Adds kora__request_pause and kora__request_resume to the MCP
runtime surface. Both wrap the existing OperationalStateHolder
state-machine via holder.transition_to(...) — no duplicate
TRANSITION_TABLE logic.

  - kora__request_pause: ACTIVE → PAUSED. Caller must hold
    kora__request_pause capability in mcp_callers.yaml.
  - kora__request_resume: PAUSED → READY. Caller must hold
    kora__request_resume capability.

K-DG note on the resume edge: the bucket spec proposes
PAUSED → ACTIVE, but the actual R4.1 §9.1 TRANSITION_TABLE has
PAUSED → READY (operator-clearance recovery) — there is no
PAUSED → ACTIVE entry. Resume therefore targets READY; the next
claim cycle moves READY → ACTIVE naturally. This is the
semantically-correct edge for a "she's eligible to work again"
intent ("she's holding a claim again" was never what resume
meant — claims are acquired separately).

Capability gating is DISTINCT from kora__request_state_transition.
An operator can grant pause/resume without granting the full
transition cap, and vice-versa; see
test_transition_cap_does_not_grant_pause/resume.

Errors:
  - Invalid transition (e.g. pause when STOPPED, resume when
    BOOTING) → JSON-RPC -32602 with from_state in message.
  - Missing/empty reason → -32602.
  - Caller lacks capability → -32001 capability_denied with
    required_capability in error data.
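The two -32602 paths above could be built by small helpers like these. Both names are hypothetical, and treating a whitespace-only reason as empty is an assumption (the text only says "missing/empty"). Neither helper receives request headers, so the caller's bearer token has no route into the message.

```python
from typing import Any, Dict, Optional

INVALID_PARAMS = -32602  # JSON-RPC 2.0 "Invalid params"


def validate_reason(params: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return an error object if `reason` is missing, not a string, or blank."""
    reason = params.get("reason")
    # Assumption: whitespace-only counts as empty.
    if not isinstance(reason, str) or not reason.strip():
        return {"code": INVALID_PARAMS, "message": "reason must be a non-empty string"}
    return None


def invalid_transition_error(tool: str, from_state: str) -> Dict[str, Any]:
    """Build the invalid-transition error, naming the current state only."""
    # Only the tool name and from_state are interpolated; no request headers
    # (and therefore no bearer token) are available to this function.
    return {
        "code": INVALID_PARAMS,
        "message": f"{tool}: transition not allowed from_state={from_state}",
    }
```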

Audit: _emit_audit dual-writes the structured-log line and the
KR-AUDIT-JSONL-SINK row; caller_actor_kind is tagged in both.
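A minimal sketch of the dual-write idea: build one record, send it to the structured logger, and append the same record as a JSONL row. `emit_audit`, the logger name, and the `write_row` callback are assumptions for illustration; the real `_emit_audit` signature and sink are not shown here.

```python
import json
import logging
from typing import Any, Callable, Dict

log = logging.getLogger("kora.mcp.audit")  # hypothetical logger name


def emit_audit(
    event: str,
    caller_actor_kind: str,
    write_row: Callable[[str], Any],
    **fields: Any,
) -> Dict[str, Any]:
    """Emit one audit record to both the structured log and a JSONL sink."""
    record: Dict[str, Any] = {"event": event, "caller_actor_kind": caller_actor_kind, **fields}
    line = json.dumps(record, sort_keys=True)
    # 1) Structured log line carrying the full record.
    log.info("audit %s", line)
    # 2) JSONL row: exactly one JSON object per line.
    write_row(line + "\n")
    return record
```

Serializing once and reusing `line` keeps both writes identical, so `caller_actor_kind` cannot be tagged in one sink and missing from the other. Passing an open file's `write` method as `write_row` appends to a real JSONL file.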

Tests at tests/kora_cli/test_listeners/test_mcp_tools_stop_control.py
cover all 7 ST1 scenarios from spec §2 plus 9 extras,
including the transition-cap distinctness pair and the
"bearer token never in error envelope" security invariant
(asserted across both invalid-transition and bad-reason error
paths).

ST2 (kora__request_stop writing kora_control SECDEF) deferred.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
rafe-walker merged commit 63093c6 into feature/phase2-upgrades on May 23, 2026
rafe-walker deleted the feat/kora-KR-MCP-STOP-CONTROL-ST1 branch on May 23, 2026 at 19:46
rafe-walker added a commit that referenced this pull request on May 23, 2026:
…152)

CI determinism restored. 77 insertions across 3 files — test-side only, no production changes.

Diagnostic findings:
1. Two leak sources identified — both from CC#3 prior buckets:
   - test_mcp_tools_stop_control.py (PR #142, ST1) — 13 sites set h_mod._HOLDER = holder with no teardown
   - test_mcp_audit_on_denial.py (PR #150, ST3) — 1 success-path test does the same
2. Why xdist surfaces it: xdist workers are separate processes (no cross-worker bleed), but within a worker, tests run sequentially in scheduler order, and the xdist worksteal scheduler does not preserve the alphabetical file ordering of serial mode, which happens to put handlers/ before test_listeners/.
3. Why serial mode does not flake: alphabetical ordering means the email handler tests always run before the mutating test_listeners tests.
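The ordering argument can be reproduced without xdist at all. The toy below models one worker process as a single module-level global and runs two illustrative tests in both orders; the leaky test mirrors the `h_mod._HOLDER = holder` pattern with no teardown. All names are hypothetical, and this does not model the xdist scheduler itself.

```python
from typing import Callable, List, Optional

_HOLDER: Optional[object] = None  # stand-in for the module-level singleton


def leaky_listener_test() -> None:
    """Installs a holder and never restores the global (the leak)."""
    global _HOLDER
    _HOLDER = object()


def email_handler_test() -> None:
    """Passes only if no earlier test in this process left a holder behind."""
    if _HOLDER is not None:
        raise AssertionError("inherited a leaked _HOLDER")


def run_in_order(tests: List[Callable[[], None]]) -> List[str]:
    """Run tests sequentially in one fresh 'worker'; return names that failed."""
    global _HOLDER
    _HOLDER = None  # a new worker process starts with a clean module
    failed: List[str] = []
    for test in tests:
        try:
            test()
        except AssertionError:
            failed.append(test.__name__)
    return failed
```

Serial-style order (`[email_handler_test, leaky_listener_test]`) yields no failures; swapping the order, as a work-stealing scheduler may, fails `email_handler_test`.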

Fix: three autouse _reset_operational_state_holder fixtures — two plug the leak sources, the third is defensive belt-and-suspenders on the email handler tests so a future leak source cannot re-poison them. Pattern documented in fixture docstrings + PR body.
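A sketch of the snapshot-and-restore shape such a fixture can take. `h_mod` here is a stand-in module created with `types.ModuleType`, and restoring the saved value (rather than forcing `None`) is an assumption. In the real suite the generator would carry `@pytest.fixture(autouse=True)`; it is wrapped with `contextlib.contextmanager` below only so the sketch runs without pytest.

```python
import types
from contextlib import contextmanager
from typing import Iterator

# Hypothetical stand-in for the module that owns the _HOLDER singleton.
h_mod = types.ModuleType("holder_module_stub")
h_mod._HOLDER = None


def _reset_operational_state_holder() -> Iterator[None]:
    """Body of an autouse fixture: snapshot _HOLDER, run the test, restore it.

    In a pytest suite this generator would be decorated with
    @pytest.fixture(autouse=True) so every test in the file gets it.
    """
    saved = h_mod._HOLDER
    try:
        yield
    finally:
        # Restore even if the test body raised.
        h_mod._HOLDER = saved


# Outside pytest, the same generator can be driven as a context manager.
reset_holder = contextmanager(_reset_operational_state_holder)
```

For tests that install a holder themselves, pytest's built-in `monkeypatch.setattr(h_mod, "_HOLDER", holder)` gives the same automatic restore at teardown.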

Production code (the _HOLDER module-level singleton) untouched — spec §3 explicit non-scope. Test-side fix only.

30/30 xdist verification runs all green; 5/5 spec-mandated runs all green; zero new failures vs ST3 baseline.