This repository was archived by the owner on May 26, 2026. It is now read-only.
feat(kora): KR-MCP-CONSUMPTION ST2 — active health-check + payload extension#113
Merged
rafe-walker merged 1 commit on May 22, 2026
Closes the KR-MCP-CONSUMPTION bucket. After this PR, the heartbeat
scheduler periodically polls each pool endpoint, and the
`/api/mcp/clients/list` panel surfaces live `connected` status,
real `tools_count`, and per-cycle `last_check_at` / `last_error`.
# Periodic health-check (kora_cli/listeners/mcp_consumption.py)
- `run_health_check()` — one pass: for each endpoint in the
current pool, call `pool.list_tools(prefix)`; cache result as
`HealthSnapshot(connected, tools_count, last_check_at, last_error)`.
- Per-endpoint failure (MCPCallFailed or any other exception)
captured on snapshot — does NOT crash the scheduler.
- Per-endpoint 30s timeout via the endpoint's configured
`timeout_seconds` (catalog defaults already set; matches §4 Q1).
- `current_health_snapshots()` returns a defensive copy — caller
mutations don't leak into the internal cache.
- Shutdown clears the cache (avoid stale pre-restart data
bleeding into the post-restart panel view).
- Module-level `register_periodic_task("mcp.health_check",
interval_seconds=_read_health_check_interval(), callable=run_health_check)`
fires at import time.
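A minimal sketch of one health-check pass, assuming a synchronous pool. `FakePool`, its `prefixes()` method, the `pool` argument, and storing `tools_count=None` on failure are illustrative assumptions; only the `HealthSnapshot` field names, `list_tools(prefix)`, and the copy-on-read behaviour come from the description above.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Optional


class MCPCallFailed(Exception):
    """Stand-in for the pool's transport / protocol error."""


@dataclass(frozen=True)
class HealthSnapshot:
    connected: bool
    tools_count: Optional[int]
    last_check_at: datetime
    last_error: Optional[str]


class FakePool:
    """Hypothetical pool: maps prefix -> tool list, or an exception to raise."""

    def __init__(self, endpoints):
        self._endpoints = endpoints

    def prefixes(self):
        return list(self._endpoints)

    def list_tools(self, prefix):
        result = self._endpoints[prefix]
        if isinstance(result, Exception):
            raise result
        return result


_snapshots: Dict[str, HealthSnapshot] = {}


def run_health_check(pool):
    """One pass over the pool; a failing endpoint never aborts the loop."""
    if pool is None:
        return  # no pool yet: skip cleanly
    for prefix in pool.prefixes():
        now = datetime.now(timezone.utc)
        try:
            tools = pool.list_tools(prefix)
        except MCPCallFailed as exc:
            snap = HealthSnapshot(False, None, now, str(exc))
        except Exception as exc:
            # Unknown exception types get an ExceptionType prefix.
            snap = HealthSnapshot(False, None, now, f"{type(exc).__name__}: {exc}")
        else:
            snap = HealthSnapshot(True, len(tools), now, None)
        _snapshots[prefix] = snap


def current_health_snapshots():
    """Return a copy so caller mutations do not reach the internal cache."""
    return dict(_snapshots)
```

Because the snapshot dataclass is frozen in this sketch, a shallow `dict` copy is enough to keep callers from mutating cached state.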
# Cadence (§4 Q1: 5min ruling)
- DEFAULT_HEALTH_CHECK_INTERVAL_SEC = 300.0
- KORA_MCP_HEALTH_CHECK_INTERVAL_SEC env override
- Invalid (non-numeric / ≤0) values WARN-log + fall back to default
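The fallback rule can be sketched as below. This is an illustration, not the code in `mcp_consumption.py`: the function name, the injectable `environ` parameter, and the rejection of `nan` / `inf` are assumptions added for the example.

```python
import logging
import math
import os

logger = logging.getLogger(__name__)

DEFAULT_HEALTH_CHECK_INTERVAL_SEC = 300.0
ENV_VAR = "KORA_MCP_HEALTH_CHECK_INTERVAL_SEC"


def read_health_check_interval(environ=None):
    """Return the cadence in seconds, falling back to the default on bad input."""
    env = os.environ if environ is None else environ
    raw = env.get(ENV_VAR)
    if raw is None:
        return DEFAULT_HEALTH_CHECK_INTERVAL_SEC
    try:
        value = float(raw)
    except ValueError:
        logger.warning("%s=%r is not numeric; using default", ENV_VAR, raw)
        return DEFAULT_HEALTH_CHECK_INTERVAL_SEC
    # Assumption: nan / inf are treated like non-positive values.
    if not math.isfinite(value) or value <= 0:
        logger.warning("%s=%r must be > 0; using default", ENV_VAR, raw)
        return DEFAULT_HEALTH_CHECK_INTERVAL_SEC
    return value
```

Passing a plain dict as `environ` keeps the function testable without touching the process environment.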
# Additive pool API
`kora_mcp/pool.py:MCPClientPool.list_tools(prefix)` — per-endpoint
list_tools with full error surfacing. Mirrors `call_tool`'s
lazy-open + per-call timeout + cache-drop-on-error pattern;
raises MCPCallFailed on transport / protocol failure (vs
`list_tools_all` which catches per-endpoint errors + maps to
empty lists). Needed by the health-check task to populate
`last_error` per prefix.
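The contrast between the two methods can be sketched as follows. `SketchPool`, its callable "fetchers", and the choice to wrap `OSError` are hypothetical; the real pool is likely asynchronous and applies a per-call timeout, which this synchronous sketch omits.

```python
from typing import Callable, Dict, List


class MCPCallFailed(Exception):
    """Stand-in for the pool's transport / protocol error."""


class SketchPool:
    def __init__(self, fetchers: Dict[str, Callable[[], List[str]]]):
        self._fetchers = fetchers
        self._open: Dict[str, Callable[[], List[str]]] = {}  # lazily opened sessions

    def list_tools(self, prefix: str) -> List[str]:
        """Per-endpoint listing that surfaces the error to the caller."""
        fetch = self._open.setdefault(prefix, self._fetchers[prefix])  # lazy-open
        try:
            return list(fetch())
        except OSError as exc:  # ConnectionError and TimeoutError are OSError subclasses
            self._open.pop(prefix, None)  # drop the cached session on error
            raise MCPCallFailed(f"{prefix}: {exc}") from exc

    def list_tools_all(self) -> Dict[str, List[str]]:
        """Bulk listing that swallows per-endpoint errors into empty lists."""
        results: Dict[str, List[str]] = {}
        for prefix in self._fetchers:
            try:
                results[prefix] = self.list_tools(prefix)
            except MCPCallFailed:
                results[prefix] = []
        return results
```

With only the bulk method, a failed endpoint and an endpoint that genuinely exposes zero tools look identical, which is why the health check needs the raising variant.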
# Endpoint payload extension (kora_cli/web_server.py)
`/api/mcp/clients/list` now reads snapshots from
`current_health_snapshots()` and derives status:
- auth env unset / empty → unhealthy
- auth env set, no snapshot OR stale OR not connected → configured_but_unconnected
- auth env set + fresh snapshot + connected → connected
A snapshot is "stale" when `last_check_at` is older than the
health-check cadence (compares `(now - last_check_at) > timedelta(seconds=cadence)`).
Missed heartbeat cycles surface as `configured_but_unconnected`
rather than falsely reporting `connected`.
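The three-way derivation and the staleness comparison can be sketched as below. The function name, its parameters, and the trimmed two-field `HealthSnapshot` are assumptions for illustration; the status strings and the `>` comparison follow the rules stated above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class HealthSnapshot:
    # Trimmed to the two fields the status rule needs.
    connected: bool
    last_check_at: datetime


def derive_status(
    auth_env_value: Optional[str],
    snapshot: Optional[HealthSnapshot],
    cadence_seconds: float,
    now: Optional[datetime] = None,
) -> str:
    if not auth_env_value:  # unset or empty
        return "unhealthy"
    if snapshot is None:
        return "configured_but_unconnected"
    now = now or datetime.now(timezone.utc)
    stale = (now - snapshot.last_check_at) > timedelta(seconds=cadence_seconds)
    if stale or not snapshot.connected:
        return "configured_but_unconnected"
    return "connected"
```

Note that the comparison is strict, so a snapshot exactly one cadence old still counts as fresh in this sketch.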
Additive payload fields (no breaking change for FE):
- last_check_at: ISO string when snapshot was taken (or null)
- last_error: operator-readable failure string (or null)
- tools_count: from snapshot (null when not connected; preserves
last-known count when stale + previously-connected)
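A sketch of how the three additive fields might be filled in, assuming the null rules described here and in the endpoint tests (auth env unset or no snapshot gives all nulls). The helper name and signature are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class HealthSnapshot:
    connected: bool
    tools_count: Optional[int]
    last_check_at: datetime
    last_error: Optional[str]


def additive_fields(auth_env_set: bool, snapshot: Optional[HealthSnapshot]) -> Dict[str, Any]:
    """Build last_check_at / last_error / tools_count for one client entry."""
    if not auth_env_set or snapshot is None:
        return {"last_check_at": None, "last_error": None, "tools_count": None}
    return {
        "last_check_at": snapshot.last_check_at.isoformat(),
        "last_error": snapshot.last_error,
        # Staleness is not consulted here, so a stale but previously
        # connected snapshot keeps its last-known count.
        "tools_count": snapshot.tools_count if snapshot.connected else None,
    }
```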
# TS interface (web/src/lib/api.ts)
`MCPClient` interface extended with `last_check_at: string | null`
and `last_error: string | null`. Additive — existing consumers
keep working; new FE bucket (KR-MCP-CLIENTS-HEALTH-DISPLAY) lands
the operator-visible rendering on CC#2's lane.
# Security contract preserved (carried from KR-MCP-3 + FLIP)
Both CC#2 security tests stay green:
- Regex pin on auth_token_env shape (UPPER_SNAKE)
- Walk-all-keys guard against token-value-shaped fields
Neither last_check_at nor last_error introduces a token-bearing
field. last_error is the str(exception) from MCPCallFailed which
is constructed deliberately to NOT include sensitive substrate
state.
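For orientation, the two guards might look roughly like this. Both regexes, the allow-list, and the helper names are assumptions; the actual CC#2 tests may pin a different pattern or inspect values rather than key names.

```python
import re

# Assumed UPPER_SNAKE shape for the env-var *name* (never the token value).
AUTH_TOKEN_ENV_RE = re.compile(r"^[A-Z][A-Z0-9_]*$")

# Assumed heuristic for key names that could carry a secret value.
SUSPICIOUS_KEY_RE = re.compile(r"token|secret|password|api_key", re.IGNORECASE)
ALLOWED_KEYS = {"auth_token_env"}  # holds an env-var name, not a token


def walk_keys(obj):
    """Yield every dict key found anywhere in a nested JSON-like payload."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            yield key
            yield from walk_keys(value)
    elif isinstance(obj, list):
        for item in obj:
            yield from walk_keys(item)


def find_token_shaped_keys(payload):
    """Return keys that look token-bearing and are not explicitly allowed."""
    return [
        key
        for key in walk_keys(payload)
        if key not in ALLOWED_KEYS and SUSPICIOUS_KEY_RE.search(str(key))
    ]
```

Under these assumptions, `last_check_at` and `last_error` pass the key walk because neither name matches the suspicious pattern.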
# Tests
29 new tests across 2 files:
`tests/kora_cli/test_listeners/test_mcp_consumption_health.py`
(17 tests):
- run_health_check skips cleanly when no pool
- One pass populates snapshot per endpoint
- Successful list_tools → connected=True + tools_count=N
- MCPCallFailed → connected=False + last_error preserved
- Unknown exception type wrapped with ExceptionType prefix
- Per-endpoint failure doesn't block siblings
- current_health_snapshots returns a copy
- Shutdown clears cache (with + without prior startup)
- _read_health_check_interval: default / env override /
invalid / non-positive / negative fallback
- Periodic task registered in PERIODIC_TASK_REGISTRY at import
`tests/kora_cli/test_web_server_mcp_clients.py` (5 new tests +
1 updated shape test):
- Updated `required = {...}` set to include last_check_at +
last_error
- Connected snapshot → status=connected + tools_count + ISO
last_check_at
- Failed snapshot → status=configured_but_unconnected +
last_error surfaced
- Stale snapshot (older than cadence) → status=configured_but_unconnected
even if snapshot.connected=True
- No snapshot → status=configured_but_unconnected + all
additive fields null
- Auth env unset → status=unhealthy overrides any snapshot;
additive fields null
102/102 cross-bucket regression (test_listeners/ +
test_web_server_mcp_clients.py + tests/kora_mcp/) clean.
1 expected skip (integration test gated behind
KORA_INTEGRATION_TEST=1). Ruff clean.
Manual smoke: `curl localhost:9119/api/mcp/clients/list` returns
the new shape with last_check_at + last_error fields (both null
when no daemon/snapshots present, which matches the
unhealthy-because-no-auth-env path).
# After ST2 merges
External MCP calls wired end-to-end. The MCP-clients UI surfaces
live connection status + tools_count once the daemon's heartbeat
scheduler completes its first cycle (~5min after start). FE
visualization of last_check_at / last_error is a small CC#2
follow-on (KR-MCP-CLIENTS-HEALTH-DISPLAY).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
rafe-walker added a commit that referenced this pull request on May 22, 2026:
…st_error (#117)
Small FE follow-on. CC#1 KR-MCP-CONSUMPTION ST2 (PR #113) landed the
fields in payload + TS interface; this PR renders them.
- mcpHealth.ts helpers (stale threshold + truncation constants)
- MCPClientsPanel.tsx: collapsed row gets relative timestamp + stale
chip + truncated error line; expanded view gets Last Check section;
aggregate strip unions last_error !== null + adds stale count.
- 16 source-pin tests.
3-layer security contract: React default escaping +
dangerouslySetInnerHTML ban (with comment-stripping so warning comment
does not self-trigger) + plain string | null TS type.
16/16 new tests pass; tsc + vite clean. Pre-existing failures in
test_web_server_mcp_clients.py (13 failed + 5 errors) confirmed on
bare base; flagged in PR body as separate cleanup.
Summary
Closes the KR-MCP-CONSUMPTION bucket. After this PR, the heartbeat
scheduler periodically polls each pool endpoint, and the
`/api/mcp/clients/list` panel surfaces live `connected` status,
real `tools_count`, and per-cycle `last_check_at` / `last_error`.
Bucket spec: `17_cc_bucket_prompts/KR-MCP-CONSUMPTION_pool_as_daemon_listener.md`
§4 ruling applied
5-minute default cadence per PM ruling. Operator override via
`KORA_MCP_HEALTH_CHECK_INTERVAL_SEC` (invalid/non-positive
values WARN + fall back to 300s).
Surface
Status derivation
Snapshot "stale" when `last_check_at` older than the cadence —
missed heartbeat cycles surface as `configured_but_unconnected`
rather than falsely reporting `connected`.
Security contract preserved
Both CC#2 security tests stay green:
`last_error` is the `str()` of an MCPCallFailed which is
constructed deliberately to not carry sensitive substrate state.
Test plan
`KORA_INTEGRATION_TEST=1`)
snapshots, populated after a cycle)
Out of scope (deferred to follow-on buckets)
KR-MCP-CLIENTS-HEALTH-DISPLAY (CC#2 lane, small)
integration bucket
once we decide which calls deserve substrate visibility
Cascade
Base: `feature/phase2-upgrades`. Closes KR-MCP-CONSUMPTION bucket
arc.
🤖 Generated with Claude Code