feat(kora): KR-FE-PROBE-INVESTIGATION-VIEWER — wake→reason→DM xref panel by rafe-walker · Pull Request #171 · rafe-walker/kora

rafe-walker · 2026-05-24T03:14:38Z

Summary

New cockpit page + GET /api/probe-investigations endpoint that joins three sources per probe wake event so the operator sees, in one place, what Kora actually did about each issue the heartbeat probes detected.

The unified-operator-interface "Kora actually acts" loop is now flowing data (PR #163 wake emitter + PR #166 wake consumer + this panel). Read-only v1.
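
As a rough illustration of the three-source join described above, here is a minimal sketch. The row field names (`probe_name`, `issue_category`, `caller_session_id`) and the function name are assumptions for illustration; the actual endpoint code is not shown in this description.

```python
from collections import defaultdict
from typing import Any, Optional


def join_probe_investigations(
    wake_rows: list[dict[str, Any]],
    reasoning_rows: list[dict[str, Any]],
    service_health: dict[str, str],
) -> list[dict[str, Any]]:
    """Attach reasoning rows and current health to each wake event (sketch)."""
    # Group reasoning.tool_called rows by caller_session_id (the join key).
    calls_by_sid: dict[str, list[dict[str, Any]]] = defaultdict(list)
    for row in reasoning_rows:
        calls_by_sid[row["caller_session_id"]].append(row)

    items = []
    for wake in wake_rows:
        # Hypothetical field names on the wake row.
        sid = f"probe:{wake['probe_name']}:{wake['issue_category']}"
        calls: Optional[list[dict[str, Any]]] = calls_by_sid.get(sid)
        items.append({
            "probe": wake["probe_name"],
            "category": wake["issue_category"],
            "caller_session_id": sid,
            # None when no reasoning rows joined for this wake.
            "investigation": {"tool_calls": calls} if calls else None,
            "current_probe_health": service_health.get(wake["probe_name"], "unknown"),
        })
    return items
```

This sketch joins on the session id alone, so repeated wakes for the same probe and category would share reasoning rows; whether the real endpoint also scopes by time is not stated here.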

Screenshots

Populated state (3 mock cards illustrating the variants):

Empty state (what most operators will see at first — PR #166 only merged today, so most envs have zero wakes):

Per spec — empty state is a calm "everything's healthy" message with a green Sparkles glyph, NOT an empty card list. Source-pinned by test_empty_state_message_in_page so the reassurance copy survives later refactors.

caller_session_id pattern verification

Spec said the join key is "probe:{probe}:{category}". Verified at kora_cli/reasoning/anthropic_engine.py:1283-1289:

if message.source == "probe_investigation":
    probe = meta.get("probe_name") or "unknown"
    category = meta.get("issue_category") or "unknown"
    return f"probe:{probe}:{category}"

Endpoint uses the same literal at kora_cli/web_server.py:_probe_caller_session_id. Drift-guarded by test_caller_session_id_matches_reasoning_engine which greps both source files for the f-string shape and fails if either side moves.
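
A drift guard of this kind can be sketched as below. The regex and helper name are hypothetical (the real test body is not included in this PR description); in a real test the two strings would come from reading `kora_cli/reasoning/anthropic_engine.py` and `kora_cli/web_server.py`.

```python
import re

# Assumed pattern for the f-string shape both files must contain.
SESSION_ID_SHAPE = re.compile(r'f"probe:\{probe\}:\{category\}"')


def both_sides_use_shape(engine_source: str, server_source: str) -> bool:
    """Return True only if both source texts contain the same f-string literal."""
    return all(
        SESSION_ID_SHAPE.search(src) is not None
        for src in (engine_source, server_source)
    )
```

Matching on source text rather than importing either module keeps the guard cheap, at the cost of failing on purely cosmetic edits to the literal.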

STOP-ASKs (3 raised by spec §4 — all resolved inline per spec license to simplify v1)

"_append_outbound_log_entry doesn't carry caller_session_id" → Reality is worse: probe DMs aren't in slack_dm_log.jsonl AT ALL. wake_consumer._send_operator_dm calls client.post_dm() directly without the SlackDMHandler outbound-log path. Resolution: omit dm_sent from the response entirely rather than fabricating. Flagged to operator via V1NotesBanner + follow-on bucket KR-PROBE-DM-JSONL-WIRE.
"caller_session_id pattern differs" → Spec was correct. Grepped and verified — pinned by drift guard test.
"Resolution-status semantics require probe-observation timeline" → Confirmed no per-probe observation timeline exists in v1 substrate. Resolution: per spec's "likely simplifies fine for v1" license, mapped current snapshot.service_health[probe] → resolved (healthy) / active (unhealthy or degraded) / unknown. Stale wakes naturally fall to unknown. Operator's real question ("is this one still firing?") is answered.

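The simplified resolution mapping from the third STOP-ASK can be sketched as follows. The function name is hypothetical, and treating any unrecognised or missing health value as `unknown` is an assumption consistent with the description above.

```python
from typing import Optional


def resolution_status(current_health: Optional[str]) -> str:
    """Map the current snapshot health of a probe to a v1 resolution status."""
    if current_health == "healthy":
        return "resolved"
    if current_health in ("unhealthy", "degraded"):
        return "active"
    # Missing or unrecognised health values fall through to unknown.
    return "unknown"
```
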
What this PR contains

Backend (kora_cli/web_server.py):

GET /api/probe-investigations?window={24h|7d|all}&limit={1-200} — joins 3 sources, returns summary counts + items
_PROBE_CALLER_SESSION_RE regex pre-filter so non-probe reasoning.tool_called rows (slack_dm:, email:, mcp:) are dropped before the join
_project_reasoning_call() projection that whitelists only the fields the audit writer actually emits — defensive against future writers adding result/arguments (SECURITY test pin)
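
To make the three backend pieces concrete, here is a minimal sketch of query-parameter handling, the session-id pre-filter, and the whitelist projection. The regex, the whitelisted field names, and the helper names are assumptions (the real `_PROBE_CALLER_SESSION_RE` and `_project_reasoning_call()` bodies are not shown here), and rejecting an unknown window with `ValueError` is an illustrative choice.

```python
import re
from datetime import timedelta
from typing import Any, Optional

# Assumed shape: "probe:{probe}:{category}" with no extra colons.
PROBE_SID_RE = re.compile(r"^probe:[^:]+:[^:]+$")
WINDOWS = {"24h": timedelta(hours=24), "7d": timedelta(days=7), "all": None}
# Assumed whitelist of fields the audit writer emits per tool call.
ALLOWED_FIELDS = ("tool_name", "duration", "status", "exc_type")


def parse_window(window: str) -> Optional[timedelta]:
    """Return the look-back span, or None for 'all'."""
    if window not in WINDOWS:
        raise ValueError(f"unsupported window: {window!r}")
    return WINDOWS[window]


def clamp_limit(limit: int) -> int:
    """Clamp the requested limit into the documented 1-200 range."""
    return max(1, min(200, limit))


def is_probe_session(caller_session_id: Optional[str]) -> bool:
    """Drop slack_dm:, email:, mcp: and other non-probe rows before the join."""
    return bool(PROBE_SID_RE.match(caller_session_id or ""))


def project_call(row: dict[str, Any]) -> dict[str, Any]:
    """Copy only whitelisted keys so raw tool bodies never reach the response."""
    return {key: row[key] for key in ALLOWED_FIELDS if key in row}
```

Whitelisting (rather than deleting known-sensitive keys) means a future writer that starts emitting `result` or `arguments` is ignored by default.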

Frontend:

web/src/pages/ProbeInvestigationsPage.tsx (new) — summary header, v1_notes banner, per-card layout with severity colour + resolution badge + tool-call chips, empty state
api.getProbeInvestigations() + ProbeInvestigationsResponse / ProbeInvestigationItem / ProbeReasoningToolCall / ProbeResolutionStatus types in web/src/lib/api.ts
/probe-investigations route + sidebar nav entry (Sparkles icon; placed right after /heartbeat for the natural operator-flow: raw probe state → what Kora did about it)
usePanelView("ProbeInvestigationsPage") on mount, following the panel-view instrumentation discipline from PR #159 (KR-PANEL-USE-INSTRUMENTATION)

v1 deferred (surfaced via inline V1NotesBanner so the roadmap is visible)

| Field | Why deferred | Follow-on |
| --- | --- | --- |
| per-call `cost_usd` / `model_used` | not durably recorded per-call (only aggregated in CostTelemetry) | link to `/cost-telemetry?route=probe_investigation`; `KR-PROBE-INVESTIGATION-COST-XREF` |
| `dm_sent` confirmation | probe DMs bypass `slack_dm_log.jsonl` | `KR-PROBE-DM-JSONL-WIRE` |
| Investigation summary text | engine response sent to Slack only, not persisted | `KR-REASONING-RESPONSE-AUDIT` |
| Time-windowed resolution ("resolved at T+N") | no per-probe observation timeline in substrate | substrate work, future |

The V1NotesBanner shows these inline so no operator assumes they'll never come.

Test plan

tests/kora_cli/test_probe_investigations_endpoint.py — 26 tests, all passing:
- Backend (14): caller_session_id drift guard, empty audit, window filtering (24h/7d/all), non-probe SIDs ignored, resolution_status (parametrized over 5 health values), wake-without-reasoning → investigation=null, wake-with-reasoning joined correctly, limit cap + clamping, tool_calls chronologised, any_errored flag, exc_type carry-through, SECURITY: response doesn't echo raw tool bodies
- FE source-pins (8): api wrapper, types declared, item fields, page exists + usePanelView, route + nav, empty state copy committed, v1_notes banner rendered
pnpm tsc -b clean
pnpm build clean
Manual smoke: open /probe-investigations against a daemon with a fresh wake → entries render, switch window → reloads, switch to 24h with no recent wakes → empty state shows
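
Several of the backend tests above pin the per-item investigation summary (chronological tool calls, `any_errored`, call count). A minimal sketch of that aggregation follows; the `ts` key, the `"error"` status value, and treating `duration` as milliseconds are assumptions, not confirmed details of the audit rows.

```python
from typing import Any, Optional


def summarize_investigation(calls: list[dict[str, Any]]) -> Optional[dict[str, Any]]:
    """Build the investigation block for one wake, or None when nothing joined."""
    if not calls:
        return None
    # Assumes 'ts' values are same-format ISO-8601 strings, which sort lexicographically.
    ordered = sorted(calls, key=lambda call: call["ts"])
    return {
        "tool_calls": ordered,
        "total_duration_ms": sum(call.get("duration", 0) for call in ordered),
        "any_errored": any(call.get("status") == "error" for call in ordered),
        "call_count": len(ordered),
    }
```
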

Refs

rafe-walker/kora-docs → 17_cc_bucket_prompts/KR-FE-PROBE-INVESTIGATION-VIEWER_wake_reason_dm_xref.md
PR feat(kora): KR-PROBE-AUDIT-AND-CONVERT — cheap-cron + wake-event + fix-envelope per probe #163 — probe.wake_requested wake_emitter (audit source 1)
PR feat(kora): KR-PROBE-WAKE-CONSUMER — wake event → reasoning → DM operator #166 — probe wake consumer + caller_session_id wiring (audit source 2)
PR feat(kora): KR-CHEAP-PRE-WARMED-SNAPSHOT — daemon state every 5 min at zero LLM cost #157 — snapshot infrastructure (current_probe_health source 3)
PR feat(kora): KR-PANEL-USE-INSTRUMENTATION — panel-view event emit (data-driven cut prerequisite) #159 — usePanelView instrumentation pattern
PR feat(kora): KR-FE-COST-TELEMETRY-PANEL — per-route cost visibility #164 — CostTelemetryPage (cross-link target for per-call cost)

🤖 Generated with Claude Code

New cockpit page + GET /api/probe-investigations endpoint that joins three data sources per probe wake event so the operator sees, in one place, what Kora actually did about each issue detected by the heartbeat probes.

Joined sources
==============
1. probe.wake_requested audit rows (PR #163 wake_emitter)
2. reasoning.tool_called audit rows keyed by caller_session_id == "probe:{probe}:{category}" (PR #166 wired this caller_session shape via _derive_caller_session_id in anthropic_engine.py; pinned by drift-guard test)
3. snapshot.service_health[probe] (current health → drives resolution_status: resolved / active / unknown)

Backend (kora_cli/web_server.py)
================================
GET /api/probe-investigations?window={24h|7d|all}&limit={1-200}

Returns summary counts (total / active / resolved / unknown), current_probe_health for the 5 known probes, and per-event items (newest first, capped by limit). Each item carries:
* wake metadata (probe / category / severity / title / detail / envelope_enabled / envelope_fix_name)
* caller_session_id (the join key, surfaced for debugging)
* investigation: { tool_calls: [{tool_name, duration, status, exc_type?}], total_duration_ms, any_errored, call_count } | null when no reasoning rows joined (engine_unavailable etc.)
* current_probe_health + resolution_status

Resolution semantics simplified for v1 to "currently healthy → resolved / currently unhealthy → active." A per-probe observation timeline doesn't exist in the v1 substrate (only current state), and "is this one still firing?" is the operator's actual question.

Frontend
========
* web/src/pages/ProbeInvestigationsPage.tsx: new top-level page; usePanelView("ProbeInvestigationsPage") on mount
* api.getProbeInvestigations() + ProbeInvestigationsResponse / ProbeInvestigationItem / ProbeReasoningToolCall / ProbeResolutionStatus TS types in web/src/lib/api.ts
* /probe-investigations route + sidebar nav entry. Placed after /heartbeat in the nav so the operator-flow is "Heartbeat (raw probe state) → Probe Investigations (what Kora did about an unhealthy probe)."
* Window tabs (24h / 7d / All) reload on click
* Per-card severity colour (critical=destructive, warning=yellow, info=outline) mirrors the wake_consumer's _SEVERITY_EMOJI mapping
* Per-card resolution badge (Resolved=success / Active=warning / Unknown=outline)
* Tool calls rendered as compact mono badges with per-call duration; errored calls toned warning

Empty state (spec-critical)
===========================
Per the spec: "data has only just started flowing (PR #166 merged today); render a calm 'everything's healthy' message when zero wakes, not an empty card list."

When total_count==0, the page renders a green-toned EmptyState card with a Sparkles glyph + "No probe wakes in the last 24 hours" headline + reassuring copy explaining that probes haven't escalated anything. Source-pinned by test_empty_state_message_in_page so no one accidentally regresses it to "No data" later.

v1 deferred scope (visible via V1NotesBanner)
=============================================
Three things spec asked for that we deliberately omit, surfaced to the operator inline so the roadmap is visible:
* Per-call cost_usd / model_used: not durably recorded; aggregate per-route cost lives at /cost-telemetry under route=probe_investigation. Follow-on KR-PROBE-INVESTIGATION-COST-XREF could add per-call records.
* DM-sent confirmation: probe DMs go through wake_consumer._send_operator_dm which calls client.post_dm() DIRECTLY, bypassing SlackDMHandler._append_outbound_log_entry. The outbound JSONL has no entry to join, so we can't truthfully say "DM sent at T+N" in v1. Follow-on KR-PROBE-DM-JSONL-WIRE.
* Per-investigation summary text / model_used: the engine's response text is sent to Slack only, not recorded.

The V1NotesBanner surfaces these to the operator so the roadmap is visible. Pinned by test_v1_notes_banner_rendered.

STOP-ASK conditions resolved inline
===================================
Per spec §4 license to simplify v1:
* "_append_outbound_log_entry doesn't carry caller_session_id" → Reality is stronger: probe DMs aren't in slack_dm_log.jsonl AT ALL. Omitting dm_sent from response rather than fabricating; flagged in v1_notes for follow-on.
* "caller_session_id pattern differs" → grepped + verified "probe:{probe}:{category}" matches engine literal at anthropic_engine.py:1289. Pinned by drift-guard test.
* "Resolution semantics require probe-observation timeline" → simplified to current health (matches spec's "fine for v1" fallback).

Tests (tests/kora_cli/test_probe_investigations_endpoint.py)
============================================================
26 tests, all passing:

Backend (14):
* caller_session_id drift guard vs anthropic_engine source
* empty audit → calm zero response
* window=24h / 7d / all filtering
* non-probe caller_session_ids ignored in join
* resolution_status parametrized (5 health values)
* wake without reasoning → investigation=null
* wake with reasoning → investigation populated correctly
* limit cap + clamping
* tool_calls chronologised within investigation
* any_errored flag + exc_type carry-through
* SECURITY: response doesn't echo raw tool input/output bodies even if a future writer adds them

FE source-pins (8):
* api.getProbeInvestigations wrapper
* ProbeInvestigationsResponse / Item / ToolCall types
* Discriminated investigation field
* Page exists + usePanelView wired
* Route + nav entry registered
* Empty state copy committed (regression guard)
* V1NotesBanner rendered

Verification
============
* 26/26 tests pass
* tsc -b clean
* vite build clean
* Screenshots rendered at web/docs/probe-investigations-viewer/ (populated.png + empty.png + preview.html sources)

Refs:
* rafe-walker/kora-docs 17_cc_bucket_prompts/KR-FE-PROBE-INVESTIGATION-VIEWER_wake_reason_dm_xref.md
* PR #163: probe.wake_requested wake_emitter
* PR #166: probe wake consumer + caller_session_id wiring
* PR #159: usePanelView instrumentation pattern
* PR #164: CostTelemetryPage (cross-link target for cost xref)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…esBanner gaps (#184)

All 4 audit streams share caller_session_id for the joinable probe investigation timeline:
1. probe.wake_requested (#163): probe runner emits
2. tool.probe_autofix_attempted (#182): during investigation
3. probe.investigation_completed (NEW): model/tokens/cost/summary/dm_status/autofix_attempted
4. slack_dm_log.jsonl entry (NEW path): wake_consumer DM routes via extracted free function append_outbound_log_entry

Key design calls:
- _append_outbound_log_entry extracted to free function; handler instance method delegates. Byte-identical JSONL rows from both call sites.
- Cost: estimate_usage_cost chosen over a telemetry snapshot (same calc as record_inference), which keeps audit-sum-by-day in lockstep with the cost-ladder rung. The snapshot approach was racy under concurrent investigations.
- dm_status enum combined to 4 values (sent / failed_send / engine_unavailable_fallback / engine_unavailable_failed_send) for single-pass chip-filter.

Follow-on flagged: KR-FE-PROBE-INVESTIGATION-VIEWER-V2 (already covered by CC#2's in-flight panel-kit megabucket; Deliverable D will auto-pick up probe.investigation_completed once added).

37 wake_consumer tests (28 existing + 9 new) + 401 cross-bucket regression + ruff clean.
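
One plausible reading of the combined `dm_status` enum is a product of two booleans collapsed into four values. The sketch below is an assumption about how the values relate (engine availability crossed with send success); the names `DmStatus` and `combine_dm_status` are hypothetical and not taken from the #184 code.

```python
from enum import Enum


class DmStatus(str, Enum):
    SENT = "sent"
    FAILED_SEND = "failed_send"
    ENGINE_UNAVAILABLE_FALLBACK = "engine_unavailable_fallback"
    ENGINE_UNAVAILABLE_FAILED_SEND = "engine_unavailable_failed_send"


def combine_dm_status(engine_available: bool, send_ok: bool) -> DmStatus:
    """Collapse two booleans into one value so a UI can filter in a single pass."""
    if engine_available:
        return DmStatus.SENT if send_ok else DmStatus.FAILED_SEND
    # Assumed: without the engine, a fallback DM is attempted and may itself fail.
    if send_ok:
        return DmStatus.ENGINE_UNAVAILABLE_FALLBACK
    return DmStatus.ENGINE_UNAVAILABLE_FAILED_SEND
```
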

rafe-walker merged commit 37bd929 into feature/phase2-upgrades May 24, 2026

rafe-walker deleted the feat/kora-KR-FE-PROBE-INVESTIGATION-VIEWER branch May 24, 2026 03:16

rafe-walker mentioned this pull request May 24, 2026

KR-PROBE-INVESTIGATION-DATA-COMPLETION — close #171 V1NotesBanner gaps #184

Merged

3 tasks

rafe-walker merged 1 commit into feature/phase2-upgrades from feat/kora-KR-FE-PROBE-INVESTIGATION-VIEWER
