This repository was archived by the owner on May 26, 2026. It is now read-only.
feat(kora): KR-FE-PROBE-INVESTIGATION-VIEWER — wake→reason→DM xref panel#171
Merged
rafe-walker merged 1 commit on May 24, 2026
New cockpit page + GET /api/probe-investigations endpoint that joins three data sources per probe wake event so the operator sees, in one place, what Kora actually did about each issue detected by the heartbeat probes.

Joined sources
==============
1. probe.wake_requested audit rows (PR #163 wake_emitter)
2. reasoning.tool_called audit rows keyed by caller_session_id == "probe:{probe}:{category}" (PR #166 wired this caller_session shape via _derive_caller_session_id in anthropic_engine.py; pinned by drift-guard test)
3. snapshot.service_health[probe] (current health → drives resolution_status: resolved / active / unknown)

Backend (kora_cli/web_server.py)
================================
GET /api/probe-investigations?window={24h|7d|all}&limit={1-200}

Returns summary counts (total / active / resolved / unknown), current_probe_health for the 5 known probes, and per-event items (newest first, capped by limit). Each item carries:
* wake metadata (probe / category / severity / title / detail / envelope_enabled / envelope_fix_name)
* caller_session_id (the join key, surfaced for debugging)
* investigation: { tool_calls: [{tool_name, duration, status, exc_type?}], total_duration_ms, any_errored, call_count } | null when no reasoning rows joined (engine_unavailable etc.)
* current_probe_health + resolution_status

Resolution semantics simplified for v1 to "currently healthy → resolved / currently unhealthy → active." A per-probe observation timeline doesn't exist in the v1 substrate (only current state), and "is this one still firing?" is the operator's actual question.

Frontend
========
* web/src/pages/ProbeInvestigationsPage.tsx: new top-level page; usePanelView("ProbeInvestigationsPage") on mount
* api.getProbeInvestigations() + ProbeInvestigationsResponse / ProbeInvestigationItem / ProbeReasoningToolCall / ProbeResolutionStatus TS types in web/src/lib/api.ts
* /probe-investigations route + sidebar nav entry. Placed after /heartbeat in the nav so the operator-flow is "Heartbeat (raw probe state) → Probe Investigations (what Kora did about an unhealthy probe)."
* Window tabs (24h / 7d / All) reload on click
* Per-card severity colour (critical=destructive, warning=yellow, info=outline) mirrors the wake_consumer's _SEVERITY_EMOJI mapping
* Per-card resolution badge (Resolved=success / Active=warning / Unknown=outline)
* Tool calls rendered as compact mono badges with per-call duration; errored calls toned warning

Empty state (spec-critical)
===========================
Per the spec: "data has only just started flowing (PR #166 merged today); render a calm 'everything's healthy' message when zero wakes, not an empty card list." When total_count==0, the page renders a green-toned EmptyState card with a Sparkles glyph + "No probe wakes in the last 24 hours" headline + reassuring copy explaining that probes haven't escalated anything. Source-pinned by test_empty_state_message_in_page so no one accidentally regresses it to "No data" later.

v1 deferred scope (visible via V1NotesBanner)
=============================================
Three things spec asked for that we deliberately omit, surfaced to the operator inline so the roadmap is visible:
* Per-call cost_usd / model_used: not durably recorded; aggregate per-route cost lives at /cost-telemetry under route=probe_investigation. Follow-on KR-PROBE-INVESTIGATION-COST-XREF could add per-call records.
* DM-sent confirmation: probe DMs go through wake_consumer._send_operator_dm which calls client.post_dm() DIRECTLY, bypassing SlackDMHandler._append_outbound_log_entry. The outbound JSONL has no entry to join, so we can't truthfully say "DM sent at T+N" in v1. Follow-on KR-PROBE-DM-JSONL-WIRE.
* Per-investigation summary text / model_used: the engine's response text is sent to Slack only, not recorded.

The V1NotesBanner surfaces these to the operator so the roadmap is visible. Pinned by test_v1_notes_banner_rendered.
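
A minimal sketch of the wake → reasoning join and the `investigation` object described in the commit message, assuming audit rows are plain dicts. The row field names `ts` and `duration_ms` and the `"error"` status value are hypothetical (the row schema is not shown here), and the real endpoint presumably also scopes reasoning rows to the time around each wake, which this sketch leaves out:

```python
from typing import Any, Dict, List, Optional


def probe_caller_session_id(probe: str, category: str) -> str:
    # Join key shape stated in the commit message: "probe:{probe}:{category}".
    return f"probe:{probe}:{category}"


def build_investigation(
    wake: Dict[str, Any], reasoning_rows: List[Dict[str, Any]]
) -> Optional[Dict[str, Any]]:
    """Return the investigation object for one wake, or None when nothing joins."""
    key = probe_caller_session_id(wake["probe"], wake["category"])
    joined = [r for r in reasoning_rows if r.get("caller_session_id") == key]
    if not joined:
        # No reasoning rows joined (e.g. engine unavailable) -> investigation is null.
        return None

    tool_calls = []
    # Chronological order within the investigation ("ts" is an assumed field).
    for row in sorted(joined, key=lambda r: r["ts"]):
        call = {
            "tool_name": row["tool_name"],
            "duration": row["duration_ms"],  # assumed field name
            "status": row["status"],
        }
        if row.get("exc_type"):
            call["exc_type"] = row["exc_type"]  # optional, only on errored calls
        tool_calls.append(call)

    return {
        "tool_calls": tool_calls,
        "total_duration_ms": sum(c["duration"] for c in tool_calls),
        "any_errored": any(c["status"] == "error" for c in tool_calls),
        "call_count": len(tool_calls),
    }
```

Returning `None` rather than an empty object keeps "no reasoning happened" distinguishable from "reasoning happened with zero tool calls", which matches the `| null` in the item shape above.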
STOP-ASK conditions resolved inline
===================================
Per spec §4 license to simplify v1:
* "_append_outbound_log_entry doesn't carry caller_session_id" → Reality is stronger: probe DMs aren't in slack_dm_log.jsonl AT ALL. Omitting dm_sent from response rather than fabricating; flagged in v1_notes for follow-on.
* "caller_session_id pattern differs" → grepped + verified "probe:{probe}:{category}" matches engine literal at anthropic_engine.py:1289. Pinned by drift-guard test.
* "Resolution semantics require probe-observation timeline" → simplified to current health (matches spec's "fine for v1" fallback).

Tests (tests/kora_cli/test_probe_investigations_endpoint.py)
============================================================
26 tests, all passing:

Backend (14):
* caller_session_id drift guard vs anthropic_engine source
* empty audit → calm zero response
* window=24h / 7d / all filtering
* non-probe caller_session_ids ignored in join
* resolution_status parametrized (5 health values)
* wake without reasoning → investigation=null
* wake with reasoning → investigation populated correctly
* limit cap + clamping
* tool_calls chronologised within investigation
* any_errored flag + exc_type carry-through
* SECURITY: response doesn't echo raw tool input/output bodies even if a future writer adds them

FE source-pins (8):
* api.getProbeInvestigations wrapper
* ProbeInvestigationsResponse / Item / ToolCall types
* Discriminated investigation field
* Page exists + usePanelView wired
* Route + nav entry registered
* Empty state copy committed (regression guard)
* V1NotesBanner rendered

Verification
============
* 26/26 tests pass
* tsc -b clean
* vite build clean
* Screenshots rendered at web/docs/probe-investigations-viewer/ (populated.png + empty.png + preview.html sources)

Refs:
* rafe-walker/kora-docs 17_cc_bucket_prompts/KR-FE-PROBE-INVESTIGATION-VIEWER_wake_reason_dm_xref.md
* PR #163: probe.wake_requested wake_emitter
* PR #166: probe wake consumer + caller_session_id wiring
* PR #159: usePanelView instrumentation pattern
* PR #164: CostTelemetryPage (cross-link target for cost xref)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
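
For illustration, the simplified v1 resolution semantics and the summary counts could look like the sketch below. The health literals `"healthy"`, `"unhealthy"` and `"degraded"` follow the wording of the PR description; the function names are hypothetical, and the five health values the parametrized test covers are not listed in this PR, so anything unrecognised is treated as unknown here:

```python
from collections import Counter
from typing import Dict, Iterable, Optional


def resolution_status(current_health: Optional[str]) -> str:
    """Map a probe's *current* health to a v1 resolution status."""
    if current_health == "healthy":
        return "resolved"
    if current_health in ("unhealthy", "degraded"):
        return "active"
    # None, a probe missing from the snapshot, or an unrecognised value.
    return "unknown"


def summary_counts(statuses: Iterable[str]) -> Dict[str, int]:
    """Aggregate per-item statuses into total / active / resolved / unknown."""
    counts = Counter(statuses)
    return {
        "total": sum(counts.values()),
        "active": counts["active"],
        "resolved": counts["resolved"],
        "unknown": counts["unknown"],
    }
```

Because the status depends only on current health, every wake for the same probe gets the same status at read time; that is the trade-off the commit message accepts for v1.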
rafe-walker added a commit that referenced this pull request on May 24, 2026
…esBanner gaps (#184)

All 4 audit streams share caller_session_id for the joinable probe investigation timeline:
1. probe.wake_requested (#163): probe runner emits
2. tool.probe_autofix_attempted (#182): during investigation
3. probe.investigation_completed (NEW): model/tokens/cost/summary/dm_status/autofix_attempted
4. slack_dm_log.jsonl entry (NEW path): wake_consumer DM routes via extracted free function append_outbound_log_entry

Key design calls:
- _append_outbound_log_entry extracted to free function; handler instance method delegates. Byte-identical JSONL rows from both call sites.
- Cost: estimate_usage_cost over telemetry snapshot (same calc as record_inference), which keeps audit-sum-by-day in lockstep with cost-ladder rung. Snapshot approach was racy under concurrent investigations.
- dm_status enum combined to 4 values (sent / failed_send / engine_unavailable_fallback / engine_unavailable_failed_send) for single-pass chip-filter.

Follow-on flagged: KR-FE-PROBE-INVESTIGATION-VIEWER-V2 (already covered by CC#2's in-flight panel-kit megabucket; Deliverable D will auto-pick up probe.investigation_completed once added).

37 wake_consumer tests (28 existing + 9 new) + 401 cross-bucket regression + ruff clean.
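
One way to read the combined 4-value `dm_status` is as the product of two booleans folded into a single field, so a UI chip filter needs one comparison per row. The sketch below assumes that reading; the `DmStatus` class, `combine_dm_status` function and the two boolean inputs are hypothetical names, and only the four string values come from the commit message:

```python
from enum import Enum
from typing import Dict, List


class DmStatus(str, Enum):
    SENT = "sent"
    FAILED_SEND = "failed_send"
    ENGINE_UNAVAILABLE_FALLBACK = "engine_unavailable_fallback"
    ENGINE_UNAVAILABLE_FAILED_SEND = "engine_unavailable_failed_send"


def combine_dm_status(engine_available: bool, send_ok: bool) -> DmStatus:
    """Fold (engine availability, send outcome) into one enum value (assumed mapping)."""
    if engine_available:
        return DmStatus.SENT if send_ok else DmStatus.FAILED_SEND
    if send_ok:
        return DmStatus.ENGINE_UNAVAILABLE_FALLBACK
    return DmStatus.ENGINE_UNAVAILABLE_FAILED_SEND


def filter_by_chip(rows: List[Dict[str, str]], chip: DmStatus) -> List[Dict[str, str]]:
    # Single pass: one equality check per row, no second field to consult.
    return [row for row in rows if row.get("dm_status") == chip.value]
```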
Summary
New cockpit page + `GET /api/probe-investigations` endpoint that joins three sources per probe wake event so the operator sees, in one place, what Kora actually did about each issue the heartbeat probes detected.

The unified-operator-interface "Kora actually acts" loop is now flowing data (PR #163 wake emitter + PR #166 wake consumer + this panel). Read-only v1.
Screenshots
Populated state (3 mock cards illustrating the variants):

Empty state (what most operators will see at first — PR #166 only merged today, so most envs have zero wakes):

Per spec: empty state is a calm "everything's healthy" message with a green Sparkles glyph, NOT an empty card list. Source-pinned by `test_empty_state_message_in_page` so the reassurance copy survives later refactors.

caller_session_id pattern verification
Spec said the join key is `"probe:{probe}:{category}"`. Verified at `kora_cli/reasoning/anthropic_engine.py:1283-1289`.

Endpoint uses the same literal at `kora_cli/web_server.py:_probe_caller_session_id`. Drift-guarded by `test_caller_session_id_matches_reasoning_engine`, which greps both source files for the f-string shape and fails if either side moves.

STOP-ASKs (3 raised by spec §4, all resolved inline per spec license to simplify v1)
- "`_append_outbound_log_entry` doesn't carry caller_session_id" → Reality is worse: probe DMs aren't in `slack_dm_log.jsonl` AT ALL. `wake_consumer._send_operator_dm` calls `client.post_dm()` directly without the SlackDMHandler outbound-log path. Resolution: omit `dm_sent` from the response entirely rather than fabricating. Flagged to operator via V1NotesBanner + follow-on bucket `KR-PROBE-DM-JSONL-WIRE`.
- "caller_session_id pattern differs" → Spec was correct. Grepped and verified; pinned by drift guard test.
- "Resolution-status semantics require probe-observation timeline" → Confirmed no per-probe observation timeline exists in v1 substrate. Resolution: per spec's "likely simplifies fine for v1" license, mapped current `snapshot.service_health[probe]` → `resolved` (healthy) / `active` (unhealthy or degraded) / `unknown`. Stale wakes naturally fall to `unknown`. Operator's real question ("is this one still firing?") is answered.

What this PR contains
Backend (`kora_cli/web_server.py`):
- `GET /api/probe-investigations?window={24h|7d|all}&limit={1-200}`: joins 3 sources, returns summary counts + items
- `_PROBE_CALLER_SESSION_RE` regex pre-filter so non-probe `reasoning.tool_called` rows (`slack_dm:`, `email:`, `mcp:`) are dropped before the join
- `_project_reasoning_call()` projection that whitelists only the fields the audit writer actually emits; defensive against future writers adding `result` / `arguments` (SECURITY test pin)

Frontend:
- `web/src/pages/ProbeInvestigationsPage.tsx` (new): summary header, v1_notes banner, per-card layout with severity colour + resolution badge + tool-call chips, empty state
- `api.getProbeInvestigations()` + `ProbeInvestigationsResponse` / `ProbeInvestigationItem` / `ProbeReasoningToolCall` / `ProbeResolutionStatus` types in `web/src/lib/api.ts`
- `/probe-investigations` route + sidebar nav entry (Sparkles icon; placed right after `/heartbeat` for the natural operator-flow: raw probe state → what Kora did about it)
- `usePanelView("ProbeInvestigationsPage")` per #159 discipline (KR-PANEL-USE-INSTRUMENTATION: panel-view event emit, data-driven cut prerequisite)

v1 deferred (surfaced via inline V1NotesBanner so the roadmap is visible)
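| Omitted in v1 | Why | Follow-on |
| --- | --- | --- |
| Per-call `cost_usd` / `model_used` | Not durably recorded; aggregate per-route cost lives at `/cost-telemetry?route=probe_investigation` | `KR-PROBE-INVESTIGATION-COST-XREF` |
| `dm_sent` confirmation | Probe DMs bypass the outbound log, so `slack_dm_log.jsonl` has no entry to join | `KR-PROBE-DM-JSONL-WIRE` |
| Per-investigation summary text | The engine's response text is sent to Slack only, not recorded | `KR-REASONING-RESPONSE-AUDIT` |

The V1NotesBanner shows these inline so no operator assumes they'll never come.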
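
The regex pre-filter and whitelist projection listed under the backend changes can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in `web_server.py`: the pattern is a guess at what `_PROBE_CALLER_SESSION_RE` matches (the PR names it but does not show it), and the allowed field names are taken from the `tool_calls` item shape in the commit message:

```python
import re
from typing import Any, Dict, Optional

# Hypothetical pattern: exactly "probe:<probe>:<category>", nothing else.
PROBE_CALLER_SESSION_RE = re.compile(r"probe:[^:]+:[^:]+")

# Whitelist: only fields the response is meant to carry.
ALLOWED_FIELDS = ("tool_name", "duration", "status", "exc_type")


def is_probe_session(caller_session_id: Optional[str]) -> bool:
    """True only for probe-originated sessions; slack_dm:/email:/mcp: are rejected."""
    return (
        isinstance(caller_session_id, str)
        and PROBE_CALLER_SESSION_RE.fullmatch(caller_session_id) is not None
    )


def project_reasoning_call(row: Dict[str, Any]) -> Dict[str, Any]:
    """Copy whitelisted keys only, so raw tool input/output never leaks."""
    return {key: row[key] for key in ALLOWED_FIELDS if key in row}
```

A whitelist fails closed: if a future audit writer starts adding `result` or `arguments`, those keys are dropped by default, whereas a blacklist would have to be updated for every new field.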
Test plan
- `tests/kora_cli/test_probe_investigations_endpoint.py`: 26 tests, all passing
- `pnpm tsc -b` clean
- `pnpm build` clean
- `/probe-investigations` against a daemon with a fresh wake → entries render, switch window → reloads, switch to `24h` with no recent wakes → empty state shows

Refs
- `rafe-walker/kora-docs` → `17_cc_bucket_prompts/KR-FE-PROBE-INVESTIGATION-VIEWER_wake_reason_dm_xref.md`
- `probe.wake_requested` wake_emitter (audit source 1)
- `usePanelView` instrumentation pattern

🤖 Generated with Claude Code