This repository was archived by the owner on May 26, 2026. It is now read-only.

feat(kora): KR-MCP-CONSUMPTION ST2 — active health-check + payload extension #113

Merged
rafe-walker merged 1 commit into feature/phase2-upgrades from feat/kora-KR-MCP-CONSUMPTION-ST2
May 22, 2026

Conversation

@rafe-walker

Owner

Summary

Closes the KR-MCP-CONSUMPTION bucket. After this PR the heartbeat
scheduler periodically polls each pool endpoint, and the
`/api/mcp/clients/list` panel surfaces live `connected` status,
real `tools_count`, and per-cycle `last_check_at` / `last_error`.

Bucket spec: `17_cc_bucket_prompts/KR-MCP-CONSUMPTION_pool_as_daemon_listener.md`

§4 ruling applied

5-minute default cadence per PM ruling. Operator override via
`KORA_MCP_HEALTH_CHECK_INTERVAL_SEC` (invalid/non-positive
values WARN + fall back to 300s).

Surface

| Layer | Addition |
| --- | --- |
| `kora_mcp/pool.py` | + `MCPClientPool.list_tools(prefix)` — per-endpoint with full error surfacing (mirrors call_tool error model) |
| `kora_cli/listeners/mcp_consumption.py` | + `HealthSnapshot` + `run_health_check` + `current_health_snapshots` + `_read_health_check_interval` + `register_periodic_task("mcp.health_check", ...)` |
| `kora_cli/web_server.py:list_mcp_clients` | Endpoint reads snapshots + derives status; adds `last_check_at` + `last_error` |
| `web/src/lib/api.ts:MCPClient` | Additive fields `last_check_at: string \| null` + `last_error: string \| null` |

Status derivation

| Condition | status |
| --- | --- |
| auth env unset / empty | `unhealthy` |
| auth env set, no snapshot OR stale OR connected=False | `configured_but_unconnected` |
| auth env set + fresh snapshot + connected=True | `connected` |

A snapshot is "stale" when `last_check_at` is older than the cadence,
so missed heartbeat cycles surface as `configured_but_unconnected`
rather than falsely reporting `connected`.

Security contract preserved

Both CC#2 security tests stay green:

  • Regex pin on `auth_token_env` shape
  • Walk-all-keys guard against token-value-shaped fields

`last_error` is the `str()` of an `MCPCallFailed`, which is
deliberately constructed so that it does not carry sensitive substrate state.

Test plan

  • 29 new tests (17 listener health + 5 endpoint snapshot wiring + 6 interval/registration + 1 updated shape)
  • 102/102 cross-bucket regression
  • 1 expected skip (integration test gated behind
    `KORA_INTEGRATION_TEST=1`)
  • Ruff clean
  • Manual smoke verifies new payload shape (null when no
    snapshots, populated after a cycle)

Out of scope (deferred to follow-on buckets)

  • FE rendering of `last_check_at` / `last_error` →
    KR-MCP-CLIENTS-HEALTH-DISPLAY (CC#2 lane, small)
  • Tool-call routing through pool from agent loop → runtime
    integration bucket
  • `kora.mcp.tool_called` audit chain events → KR-MCP-AUDIT
    once we decide which calls deserve substrate visibility

Cascade

Base: `feature/phase2-upgrades`. Closes KR-MCP-CONSUMPTION bucket
arc.

🤖 Generated with Claude Code

…tension

Closes the KR-MCP-CONSUMPTION bucket. After this PR, the heartbeat
scheduler periodically polls each pool endpoint, and the
`/api/mcp/clients/list` panel surfaces live `connected` status,
real `tools_count`, and per-cycle `last_check_at` / `last_error`.

# Periodic health-check (kora_cli/listeners/mcp_consumption.py)

  - `run_health_check()` — one pass: for each endpoint in the
    current pool, call `pool.list_tools(prefix)`; cache result as
    `HealthSnapshot(connected, tools_count, last_check_at, last_error)`.
  - Per-endpoint failure (MCPCallFailed or any other exception)
    captured on snapshot — does NOT crash the scheduler.
  - Per-endpoint 30s timeout via the endpoint's configured
    `timeout_seconds` (catalog defaults already set; matches §4 Q1).
  - `current_health_snapshots()` returns a defensive copy — caller
    mutations don't leak into the internal cache.
  - Shutdown clears the cache (avoid stale pre-restart data
    bleeding into the post-restart panel view).
  - Module-level `register_periodic_task("mcp.health_check",
    interval_seconds=_read_health_check_interval(), callable=run_health_check)`
    fires at import time.
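
The bullets above could be sketched roughly as follows. This is a minimal illustration, not the merged code: the `pool` argument, the `prefixes()` accessor, the `_SNAPSHOTS` cache name, and a synchronous `list_tools` are all assumptions made for the example, and `MCPCallFailed` is a local stand-in for the real exception type.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Optional


class MCPCallFailed(Exception):
    """Local stand-in for the pool's transport/protocol failure type."""


@dataclass(frozen=True)
class HealthSnapshot:
    connected: bool
    tools_count: Optional[int]
    last_check_at: datetime
    last_error: Optional[str]


# Hypothetical module-level cache, keyed by endpoint prefix.
_SNAPSHOTS: Dict[str, HealthSnapshot] = {}


def run_health_check(pool) -> None:
    """One pass over the pool; a failing endpoint never aborts the loop."""
    if pool is None:
        return  # no pool yet: skip cleanly
    for prefix in pool.prefixes():  # hypothetical accessor
        now = datetime.now(timezone.utc)
        try:
            tools = pool.list_tools(prefix)
            snap = HealthSnapshot(True, len(tools), now, None)
        except MCPCallFailed as exc:
            snap = HealthSnapshot(False, None, now, str(exc))
        except Exception as exc:
            # Unknown failure: prefix the message with the exception type.
            snap = HealthSnapshot(False, None, now, f"{type(exc).__name__}: {exc}")
        _SNAPSHOTS[prefix] = snap


def current_health_snapshots() -> Dict[str, HealthSnapshot]:
    """Return a shallow copy so caller mutations cannot reach the cache."""
    return dict(_SNAPSHOTS)
```

Because each snapshot is frozen and the returned dict is a copy, a caller can neither edit an entry nor remove one from the internal cache.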

# Cadence (§4 Q1: 5min ruling)

  - DEFAULT_HEALTH_CHECK_INTERVAL_SEC = 300.0
  - KORA_MCP_HEALTH_CHECK_INTERVAL_SEC env override
  - Invalid (non-numeric / ≤0) values WARN-log + fall back to default
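
A minimal sketch of the fallback rule, assuming the reader takes an injectable mapping for testability (the merged `_read_health_check_interval` may read `os.environ` directly). Rejecting `nan` / `inf` is an extra assumption not stated in this PR.

```python
import logging
import math
import os
from typing import Mapping

logger = logging.getLogger(__name__)

DEFAULT_HEALTH_CHECK_INTERVAL_SEC = 300.0
ENV_VAR = "KORA_MCP_HEALTH_CHECK_INTERVAL_SEC"


def read_health_check_interval(environ: Mapping[str, str] = os.environ) -> float:
    """Return the override if it parses to a positive number, else the default."""
    raw = environ.get(ENV_VAR)
    if raw is None:
        return DEFAULT_HEALTH_CHECK_INTERVAL_SEC  # unset: no warning needed
    try:
        value = float(raw)
    except ValueError:
        value = math.nan  # non-numeric: handled by the check below
    # Assumption: non-finite values are treated like any other invalid input.
    if not math.isfinite(value) or value <= 0:
        logger.warning(
            "%s=%r is invalid; falling back to %.0fs",
            ENV_VAR, raw, DEFAULT_HEALTH_CHECK_INTERVAL_SEC,
        )
        return DEFAULT_HEALTH_CHECK_INTERVAL_SEC
    return value
```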

# Additive pool API

`kora_mcp/pool.py:MCPClientPool.list_tools(prefix)` — per-endpoint
list_tools with full error surfacing. Mirrors `call_tool`'s
lazy-open + per-call timeout + cache-drop-on-error pattern;
raises MCPCallFailed on transport / protocol failure (vs
`list_tools_all` which catches per-endpoint errors + maps to
empty lists). Needed by the health-check task to populate
`last_error` per prefix.
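
The contrast between the two error models might look like the sketch below. `SketchPool` and its `fetchers` mapping are hypothetical stand-ins for the example; the lazy-open, per-call timeout, and cache-drop behavior of the real pool are deliberately left out.

```python
from typing import Callable, Dict, List


class MCPCallFailed(Exception):
    """Local stand-in for the pool's transport/protocol failure type."""


class SketchPool:
    """Toy pool: each prefix maps to a callable that returns tool names."""

    def __init__(self, fetchers: Dict[str, Callable[[], List[str]]]) -> None:
        self._fetchers = fetchers

    def list_tools(self, prefix: str) -> List[str]:
        """Per-endpoint listing: failures are raised, never hidden."""
        try:
            return self._fetchers[prefix]()
        except Exception as exc:
            raise MCPCallFailed(f"{prefix}: list_tools failed: {exc}") from exc

    def list_tools_all(self) -> Dict[str, List[str]]:
        """Bulk listing: a failing endpoint maps to an empty list."""
        result: Dict[str, List[str]] = {}
        for prefix in self._fetchers:
            try:
                result[prefix] = self.list_tools(prefix)
            except MCPCallFailed:
                result[prefix] = []
        return result
```

Only the raising variant gives the health-check task an error string to store, which is why the bulk method alone would not be enough to fill `last_error`.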

# Endpoint payload extension (kora_cli/web_server.py)

`/api/mcp/clients/list` now reads snapshots from
`current_health_snapshots()` and derives status:

  - auth env unset / empty                              → unhealthy
  - auth env set, no snapshot OR stale OR not connected → configured_but_unconnected
  - auth env set + fresh snapshot + connected           → connected

A snapshot is "stale" when `last_check_at` is older than the
health-check cadence (compares `(now - last_check_at) > timedelta(seconds=cadence)`).
Missed heartbeat cycles surface as `configured_but_unconnected`
rather than falsely reporting `connected`.
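
The three rules and the staleness comparison can be read as a small pure function. The sketch below is illustrative only: `derive_status` and its parameters are hypothetical names, and the real endpoint may obtain the auth value and the cadence differently.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class HealthSnapshot:
    connected: bool
    tools_count: Optional[int]
    last_check_at: datetime
    last_error: Optional[str]


def derive_status(
    auth_env_value: Optional[str],
    snapshot: Optional[HealthSnapshot],
    cadence_seconds: float,
    now: datetime,
) -> str:
    """Map (auth env, snapshot, cadence) to one of the three panel statuses."""
    if not auth_env_value:  # unset or empty overrides any snapshot
        return "unhealthy"
    if snapshot is None:
        return "configured_but_unconnected"
    stale = (now - snapshot.last_check_at) > timedelta(seconds=cadence_seconds)
    if stale or not snapshot.connected:
        return "configured_but_unconnected"
    return "connected"
```

Passing `now` in explicitly keeps the function deterministic, which makes the stale boundary easy to pin in a test.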

Additive payload fields (no breaking change for FE):
  - last_check_at: ISO string when snapshot was taken (or null)
  - last_error:    operator-readable failure string (or null)
  - tools_count:   from snapshot (null when not connected; preserves
                   last-known count when stale + previously-connected)
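
One way to read the field rules above, as a hedged sketch: `additive_fields` and the `Snapshot` tuple are hypothetical names for the example. Staleness is not consulted here, so a stale snapshot that was connected keeps its last-known count.

```python
from datetime import datetime, timezone
from typing import NamedTuple, Optional


class Snapshot(NamedTuple):
    connected: bool
    tools_count: Optional[int]
    last_check_at: datetime
    last_error: Optional[str]


def additive_fields(snapshot: Optional[Snapshot], auth_env_set: bool) -> dict:
    """Build the additive payload fields for one client row."""
    if not auth_env_set or snapshot is None:
        # No auth env or no snapshot yet: every additive field is null.
        return {"last_check_at": None, "last_error": None, "tools_count": None}
    return {
        "last_check_at": snapshot.last_check_at.isoformat(),
        "last_error": snapshot.last_error,
        # Null when not connected; otherwise the count recorded at check time.
        "tools_count": snapshot.tools_count if snapshot.connected else None,
    }
```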

# TS interface (web/src/lib/api.ts)

`MCPClient` interface extended with `last_check_at: string | null`
and `last_error: string | null`. Additive — existing consumers
keep working; new FE bucket (KR-MCP-CLIENTS-HEALTH-DISPLAY) lands
the operator-visible rendering on CC#2's lane.

# Security contract preserved (carried from KR-MCP-3 + FLIP)

Both CC#2 security tests stay green:
  - Regex pin on auth_token_env shape (UPPER_SNAKE)
  - Walk-all-keys guard against token-value-shaped fields

Neither last_check_at nor last_error introduces a token-bearing
field. last_error is the str(exception) from MCPCallFailed which
is constructed deliberately to NOT include sensitive substrate
state.
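
For readers unfamiliar with the two guards, a rough sketch of what such checks could look like follows. The exact regex, the suspicious-name pattern, and the helper names are assumptions; the real CC#2 tests are not shown in this PR and may define "token-value-shaped" differently.

```python
import re
from typing import Any, Iterator, List, Tuple

# Assumed UPPER_SNAKE shape for env-var names such as GITHUB_TOKEN.
UPPER_SNAKE = re.compile(r"^[A-Z][A-Z0-9_]*$")

# The only token-related key the payload is expected to expose.
ALLOWED_KEYS = {"auth_token_env"}

# Hypothetical pattern for key names that suggest a secret value.
SUSPICIOUS = re.compile(r"token|secret|password|api_key", re.IGNORECASE)


def walk_keys(node: Any, path: str = "") -> Iterator[Tuple[str, str]]:
    """Yield (path, key) for every dict key at any nesting depth."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}" if path else str(key)
            yield child, str(key)
            yield from walk_keys(value, child)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from walk_keys(item, f"{path}[{index}]")


def find_violations(payload: Any) -> List[str]:
    """Return the paths of keys that look like they carry a secret."""
    return [
        path
        for path, key in walk_keys(payload)
        if key not in ALLOWED_KEYS and SUSPICIOUS.search(key)
    ]
```

Under these assumptions, `last_check_at` and `last_error` pass the walk because neither key name matches the suspicious pattern.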

# Tests

29 new tests across 2 files:

`tests/kora_cli/test_listeners/test_mcp_consumption_health.py`
(17 tests):
  - run_health_check skips cleanly when no pool
  - One pass populates snapshot per endpoint
  - Successful list_tools → connected=True + tools_count=N
  - MCPCallFailed → connected=False + last_error preserved
  - Unknown exception type wrapped with ExceptionType prefix
  - Per-endpoint failure doesn't block siblings
  - current_health_snapshots returns a copy
  - Shutdown clears cache (with + without prior startup)
  - _read_health_check_interval: default / env override /
    invalid / non-positive / negative fallback
  - Periodic task registered in PERIODIC_TASK_REGISTRY at import

`tests/kora_cli/test_web_server_mcp_clients.py` (5 new tests +
1 updated shape test):
  - Updated `required = {...}` set to include last_check_at +
    last_error
  - Connected snapshot → status=connected + tools_count + ISO
    last_check_at
  - Failed snapshot → status=configured_but_unconnected +
    last_error surfaced
  - Stale snapshot (older than cadence) →
    status=configured_but_unconnected even if snapshot.connected=True
  - No snapshot → status=configured_but_unconnected + all
    additive fields null
  - Auth env unset → status=unhealthy overrides any snapshot;
    additive fields null

102/102 cross-bucket regression (test_listeners/ +
test_web_server_mcp_clients.py + tests/kora_mcp/) clean.
1 expected skip (integration test gated behind
KORA_INTEGRATION_TEST=1). Ruff clean.

Manual smoke: `curl localhost:9119/api/mcp/clients/list` returns
the new shape with last_check_at + last_error fields (both null
when no daemon/snapshots present, which matches the
unhealthy-because-no-auth-env path).

# After ST2 merges

External MCP calls wired end-to-end. The MCP-clients UI surfaces
live connection status + tools_count once the daemon's heartbeat
scheduler completes its first cycle (~5min after start). FE
visualization of last_check_at / last_error is a small CC#2
follow-on (KR-MCP-CLIENTS-HEALTH-DISPLAY).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@rafe-walker rafe-walker merged commit 94e572d into feature/phase2-upgrades May 22, 2026
@rafe-walker rafe-walker deleted the feat/kora-KR-MCP-CONSUMPTION-ST2 branch May 22, 2026 06:40
rafe-walker added a commit that referenced this pull request May 22, 2026
…st_error (#117)

Small FE follow-on. CC#1 KR-MCP-CONSUMPTION ST2 (PR #113) landed the fields in payload + TS interface; this PR renders them.

- mcpHealth.ts helpers (stale threshold + truncation constants)
- MCPClientsPanel.tsx — collapsed row gets relative timestamp + stale chip + truncated error line; expanded view gets Last Check section; aggregate strip unions last_error !== null + adds stale count.
- 16 source-pin tests.

3-layer security contract: React default escaping + dangerouslySetInnerHTML ban (with comment-stripping so warning comment does not self-trigger) + plain string | null TS type.

16/16 new tests pass; tsc + vite clean.

Pre-existing failures in test_web_server_mcp_clients.py (13 failed + 5 errors) confirmed on bare base — flagged in PR body as separate cleanup.