This repository was archived by the owner on May 26, 2026. It is now read-only.

feat(kora): KR-FE-PROBE-INVESTIGATION-VIEWER — wake→reason→DM xref panel #171

Merged
rafe-walker merged 1 commit into feature/phase2-upgrades from feat/kora-KR-FE-PROBE-INVESTIGATION-VIEWER
May 24, 2026

Conversation

@rafe-walker

Owner

Summary

New cockpit page + GET /api/probe-investigations endpoint that joins three sources per probe wake event so the operator sees, in one place, what Kora actually did about each issue the heartbeat probes detected.

The unified-operator-interface "Kora actually acts" loop is now flowing data (PR #163 wake emitter + PR #166 wake consumer + this panel). Read-only v1.

Screenshots

Populated state (3 mock cards illustrating the variants):
Probe Investigations — populated

Empty state (what most operators will see at first — PR #166 only merged today, so most envs have zero wakes):
Probe Investigations — empty

Per spec — empty state is a calm "everything's healthy" message with a green Sparkles glyph, NOT an empty card list. Source-pinned by test_empty_state_message_in_page so the reassurance copy survives later refactors.

caller_session_id pattern verification

Spec said the join key is "probe:{probe}:{category}". Verified at kora_cli/reasoning/anthropic_engine.py:1283-1289:

if message.source == "probe_investigation":
    probe = meta.get("probe_name") or "unknown"
    category = meta.get("issue_category") or "unknown"
    return f"probe:{probe}:{category}"

Endpoint uses the same literal at kora_cli/web_server.py:_probe_caller_session_id. Drift-guarded by test_caller_session_id_matches_reasoning_engine which greps both source files for the f-string shape and fails if either side moves.
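
A drift guard of the kind described above can be sketched as a plain text search over both source files. This is a minimal illustration, not the actual test: the pattern, the helper names, and the inline source snippets are all assumptions made for the example.

```python
import re

# Assumed pattern: matches the literal f"probe:{probe}:{category}" in source text.
FSTRING_SHAPE = re.compile(r'f"probe:\{probe\}:\{category\}"')


def has_session_id_literal(source_text: str) -> bool:
    """Return True when the source text still contains the expected f-string."""
    return FSTRING_SHAPE.search(source_text) is not None


def check_no_drift(engine_src: str, server_src: str) -> None:
    """Raise AssertionError when either side no longer carries the literal."""
    assert has_session_id_literal(engine_src), "engine literal moved"
    assert has_session_id_literal(server_src), "web_server literal moved"


# Stand-ins for the two files; a real test would read them from disk.
engine_src = '''
if message.source == "probe_investigation":
    return f"probe:{probe}:{category}"
'''
server_src = '''
def _probe_caller_session_id(probe, category):
    return f"probe:{probe}:{category}"
'''
check_no_drift(engine_src, server_src)
```

Matching on the source text rather than importing both modules keeps the guard cheap, at the cost of failing on harmless reformatting of the literal.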

STOP-ASKs (3 raised by spec §4 — all resolved inline per spec license to simplify v1)

  1. "_append_outbound_log_entry doesn't carry caller_session_id" → Reality is worse: probe DMs aren't in slack_dm_log.jsonl AT ALL. wake_consumer._send_operator_dm calls client.post_dm() directly without the SlackDMHandler outbound-log path. Resolution: omit dm_sent from the response entirely rather than fabricating. Flagged to operator via V1NotesBanner + follow-on bucket KR-PROBE-DM-JSONL-WIRE.

  2. "caller_session_id pattern differs" → Spec was correct. Grepped and verified — pinned by drift guard test.

  3. "Resolution-status semantics require probe-observation timeline" → Confirmed no per-probe observation timeline exists in v1 substrate. Resolution: per spec's "likely simplifies fine for v1" license, mapped current snapshot.service_health[probe] → resolved (healthy) / active (unhealthy or degraded) / unknown. Stale wakes naturally fall to unknown. Operator's real question ("is this one still firing?") is answered.
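
The simplified resolution mapping in item 3 could look roughly like the following. It is a sketch under assumptions: the function name is hypothetical, and the health values are assumed to be the lowercase strings "healthy", "unhealthy", and "degraded".

```python
from typing import Mapping, Optional


def resolution_status(service_health: Mapping[str, Optional[str]], probe: str) -> str:
    """Map the probe's current health to a resolution status (assumed value names)."""
    health = service_health.get(probe)
    if health == "healthy":
        return "resolved"
    if health in ("unhealthy", "degraded"):
        return "active"
    # Missing probes and any unrecognised value fall through to "unknown".
    return "unknown"


snapshot_health = {"disk": "healthy", "network": "degraded", "queue": None}
print(resolution_status(snapshot_health, "disk"))
print(resolution_status(snapshot_health, "network"))
print(resolution_status(snapshot_health, "missing_probe"))
```

Because only the current snapshot is consulted, every wake for the same probe reports the same status, which matches the "is this one still firing?" framing.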

What this PR contains

Backend (kora_cli/web_server.py):

  • GET /api/probe-investigations?window={24h|7d|all}&limit={1-200} — joins 3 sources, returns summary counts + items
  • _PROBE_CALLER_SESSION_RE regex pre-filter so non-probe reasoning.tool_called rows (slack_dm:, email:, mcp:) are dropped before the join
  • _project_reasoning_call() projection that whitelists only the fields the audit writer actually emits — defensive against future writers adding result/arguments (SECURITY test pin)
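
The pre-filter and whitelist projection can be illustrated together. The regex shape, the field whitelist, and the duration_ms field name below are assumptions for this sketch; the real _PROBE_CALLER_SESSION_RE and _project_reasoning_call() may differ.

```python
import re
from typing import Any, Dict, Optional

# Assumed shape of a probe session id: "probe:<probe>:<category>".
PROBE_SID_RE = re.compile(r"^probe:(?P<probe>[^:]+):(?P<category>[^:]+)$")

# Assumed whitelist; anything not listed here never reaches the response.
ALLOWED_FIELDS = ("tool_name", "duration_ms", "status", "exc_type")


def project_reasoning_call(row: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Drop non-probe rows, then copy only whitelisted fields from probe rows."""
    sid = row.get("caller_session_id")
    if not isinstance(sid, str) or PROBE_SID_RE.match(sid) is None:
        return None
    return {key: row[key] for key in ALLOWED_FIELDS if key in row}


probe_row = {
    "caller_session_id": "probe:disk:full",
    "tool_name": "read_logs",
    "duration_ms": 42,
    "status": "ok",
    "result": "raw tool output that must not be echoed",
}
slack_row = {"caller_session_id": "slack_dm:U123", "tool_name": "read_logs"}
print(project_reasoning_call(probe_row))
print(project_reasoning_call(slack_row))
```

A whitelist fails closed: a writer that later adds result or arguments to the audit row changes nothing in the response until the field is explicitly allowed.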

Frontend:

  • web/src/pages/ProbeInvestigationsPage.tsx (new) — summary header, v1_notes banner, per-card layout with severity colour + resolution badge + tool-call chips, empty state
  • api.getProbeInvestigations() + ProbeInvestigationsResponse / ProbeInvestigationItem / ProbeReasoningToolCall / ProbeResolutionStatus types in web/src/lib/api.ts
  • /probe-investigations route + sidebar nav entry (Sparkles icon; placed right after /heartbeat for the natural operator-flow: raw probe state → what Kora did about it)
  • usePanelView("ProbeInvestigationsPage") per the PR #159 discipline (feat(kora): KR-PANEL-USE-INSTRUMENTATION — panel-view event emit, data-driven cut prerequisite)

v1 deferred (surfaced via inline V1NotesBanner so the roadmap is visible)

| Field | Why deferred | Follow-on |
| --- | --- | --- |
| per-call cost_usd / model_used | not durably recorded per-call (only aggregated in CostTelemetry) | link to /cost-telemetry?route=probe_investigation; KR-PROBE-INVESTIGATION-COST-XREF |
| dm_sent confirmation | probe DMs bypass slack_dm_log.jsonl | KR-PROBE-DM-JSONL-WIRE |
| Investigation summary text | engine response sent to Slack only, not persisted | KR-REASONING-RESPONSE-AUDIT |
| Time-windowed resolution ("resolved at T+N") | no per-probe observation timeline in substrate | substrate work, future |

The V1NotesBanner shows these inline so no operator assumes they'll never come.

Test plan

  • tests/kora_cli/test_probe_investigations_endpoint.py (26 tests, all passing):
    • Backend (14): caller_session_id drift guard, empty audit, window filtering (24h/7d/all), non-probe SIDs ignored, resolution_status (parametrized over 5 health values), wake-without-reasoning → investigation=null, wake-with-reasoning joined correctly, limit cap + clamping, tool_calls chronologised, any_errored flag, exc_type carry-through, SECURITY: response doesn't echo raw tool bodies
    • FE source-pins (8): api wrapper, types declared, item fields, page exists + usePanelView, route + nav, empty state copy committed, v1_notes banner rendered
  • pnpm tsc -b clean
  • pnpm build clean
  • Manual smoke: open /probe-investigations against a daemon with a fresh wake → entries render, switch window → reloads, switch to 24h with no recent wakes → empty state shows

Refs

🤖 Generated with Claude Code

New cockpit page + GET /api/probe-investigations endpoint that
joins three data sources per probe wake event so the operator
sees, in one place, what Kora actually did about each issue
detected by the heartbeat probes.

Joined sources
==============

  1. probe.wake_requested audit rows (PR #163 wake_emitter)
  2. reasoning.tool_called audit rows keyed by caller_session_id
     == "probe:{probe}:{category}" (PR #166 wired this
     caller_session shape via _derive_caller_session_id in
     anthropic_engine.py — pinned by drift-guard test)
  3. snapshot.service_health[probe] (current health → drives
     resolution_status: resolved / active / unknown)

Backend (kora_cli/web_server.py)
================================

GET /api/probe-investigations?window={24h|7d|all}&limit={1-200}

Returns summary counts (total / active / resolved / unknown),
current_probe_health for the 5 known probes, and per-event
items (newest first, capped by limit). Each item carries:

  * wake metadata (probe / category / severity / title / detail
    / envelope_enabled / envelope_fix_name)
  * caller_session_id (the join key, surfaced for debugging)
  * investigation: { tool_calls: [{tool_name, duration, status,
    exc_type?}], total_duration_ms, any_errored, call_count } |
    null when no reasoning rows joined (engine_unavailable etc.)
  * current_probe_health + resolution_status
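
The per-item join described in the list above might be assembled along these lines. This is a sketch only: build_items, the ts timestamp field, duration_ms, and the "error" status value are assumptions, and the sketch attaches every row with a matching key to each wake, since the source does not state how rows are scoped to a single wake.

```python
from collections import defaultdict
from typing import Any, Dict, List

CALL_FIELDS = ("tool_name", "duration_ms", "status", "exc_type")


def build_items(wakes: List[Dict[str, Any]], tool_rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Join wake rows to reasoning rows on caller_session_id."""
    by_sid: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for row in tool_rows:
        by_sid[row["caller_session_id"]].append(row)

    items = []
    # Newest wake first, as the endpoint description states.
    for wake in sorted(wakes, key=lambda w: w["ts"], reverse=True):
        sid = f"probe:{wake['probe']}:{wake['category']}"
        calls = sorted(by_sid.get(sid, []), key=lambda r: r["ts"])  # chronological
        investigation = None  # stays None when no reasoning rows joined
        if calls:
            investigation = {
                "tool_calls": [{k: c[k] for k in CALL_FIELDS if k in c} for c in calls],
                "total_duration_ms": sum(c["duration_ms"] for c in calls),
                "any_errored": any(c["status"] == "error" for c in calls),
                "call_count": len(calls),
            }
        items.append({**wake, "caller_session_id": sid, "investigation": investigation})
    return items


wakes = [
    {"probe": "disk", "category": "full", "ts": 10},
    {"probe": "net", "category": "dns", "ts": 20},
]
tool_rows = [
    {"caller_session_id": "probe:disk:full", "ts": 12, "tool_name": "df",
     "duration_ms": 30, "status": "ok"},
    {"caller_session_id": "probe:disk:full", "ts": 11, "tool_name": "ls",
     "duration_ms": 5, "status": "error", "exc_type": "OSError"},
]
items = build_items(wakes, tool_rows)
```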

Resolution semantics simplified for v1 to "currently healthy
→ resolved / currently unhealthy → active." A per-probe
observation timeline doesn't exist in the v1 substrate (only
current state), and "is this one still firing?" is the
operator's actual question.
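
For the window and limit query parameters of the endpoint, one plausible handling is shown below. The helper names are hypothetical, and rejecting an unrecognised window with ValueError is an assumption; the source only says that limit is capped and clamped to 1-200.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# "all" maps to None, meaning no lower time bound.
WINDOWS = {"24h": timedelta(hours=24), "7d": timedelta(days=7), "all": None}


def clamp_limit(raw: int, lo: int = 1, hi: int = 200) -> int:
    """Clamp the requested limit into the documented 1-200 range."""
    return max(lo, min(hi, raw))


def window_cutoff(window: str, now: datetime) -> Optional[datetime]:
    """Return the oldest timestamp to include, or None for window=all."""
    if window not in WINDOWS:
        raise ValueError(f"unsupported window: {window!r}")  # assumed behaviour
    delta = WINDOWS[window]
    return None if delta is None else now - delta


now = datetime(2026, 5, 24, 12, 0, tzinfo=timezone.utc)
print(clamp_limit(500), window_cutoff("24h", now), window_cutoff("all", now))
```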

Frontend
========

  * web/src/pages/ProbeInvestigationsPage.tsx — new top-level
    page; usePanelView("ProbeInvestigationsPage") on mount
  * api.getProbeInvestigations() + ProbeInvestigationsResponse /
    ProbeInvestigationItem / ProbeReasoningToolCall /
    ProbeResolutionStatus TS types in web/src/lib/api.ts
  * /probe-investigations route + sidebar nav entry. Placed
    after /heartbeat in the nav so the operator-flow is
    "Heartbeat (raw probe state) → Probe Investigations (what
    Kora did about an unhealthy probe)."
  * Window tabs (24h / 7d / All) reload on click
  * Per-card severity colour (critical=destructive,
    warning=yellow, info=outline) mirrors the
    wake_consumer's _SEVERITY_EMOJI mapping
  * Per-card resolution badge (Resolved=success / Active=warning
    / Unknown=outline)
  * Tool calls rendered as compact mono badges with per-call
    duration; errored calls toned warning

Empty state (spec-critical)
===========================

Per the spec — "data has only just started flowing (PR #166
merged today); render a calm 'everything's healthy' message
when zero wakes, not an empty card list." When total_count==0,
the page renders a green-toned EmptyState card with a Sparkles
glyph + "No probe wakes in the last 24 hours" headline +
reassuring copy explaining that probes haven't escalated
anything. Source-pinned by test_empty_state_message_in_page so
no one accidentally regresses it to "No data" later.

v1 deferred scope (visible via V1NotesBanner)
=============================================

Three things spec asked for that we deliberately omit, surfaced
to the operator inline so the roadmap is visible:

  * Per-call cost_usd / model_used — not durably recorded;
    aggregate per-route cost lives at /cost-telemetry under
    route=probe_investigation. Follow-on
    KR-PROBE-INVESTIGATION-COST-XREF could add per-call records.
  * DM-sent confirmation — probe DMs go through
    wake_consumer._send_operator_dm which calls
    client.post_dm() DIRECTLY, bypassing
    SlackDMHandler._append_outbound_log_entry. The outbound
    JSONL has no entry to join, so we can't truthfully say
    "DM sent at T+N" in v1. Follow-on KR-PROBE-DM-JSONL-WIRE.
  * Per-investigation summary text / model_used — the
    engine's response text is sent to Slack only, not
    recorded.

The V1NotesBanner surfaces these to the operator so the
roadmap is visible. Pinned by test_v1_notes_banner_rendered.

STOP-ASK conditions resolved inline
===================================

Per spec §4 license to simplify v1:

  * "_append_outbound_log_entry doesn't carry caller_session_id"
    → Reality is stronger: probe DMs aren't in slack_dm_log.jsonl
      AT ALL. Omitting dm_sent from response rather than
      fabricating; flagged in v1_notes for follow-on.
  * "caller_session_id pattern differs" → grepped + verified
    "probe:{probe}:{category}" matches engine literal at
    anthropic_engine.py:1289. Pinned by drift-guard test.
  * "Resolution semantics require probe-observation timeline" →
    simplified to current health (matches spec's "fine for v1"
    fallback).

Tests (tests/kora_cli/test_probe_investigations_endpoint.py)
============================================================

26 tests, all passing:

  Backend (14):
    * caller_session_id drift guard vs anthropic_engine source
    * empty audit → calm zero response
    * window=24h / 7d / all filtering
    * non-probe caller_session_ids ignored in join
    * resolution_status parametrized (5 health values)
    * wake without reasoning → investigation=null
    * wake with reasoning → investigation populated correctly
    * limit cap + clamping
    * tool_calls chronologised within investigation
    * any_errored flag + exc_type carry-through
    * SECURITY: response doesn't echo raw tool input/output
      bodies even if a future writer adds them

  FE source-pins (8):
    * api.getProbeInvestigations wrapper
    * ProbeInvestigationsResponse / Item / ToolCall types
    * Discriminated investigation field
    * Page exists + usePanelView wired
    * Route + nav entry registered
    * Empty state copy committed (regression guard)
    * V1NotesBanner rendered

Verification
============

  * 26/26 tests pass
  * tsc -b clean
  * vite build clean
  * Screenshots rendered at web/docs/probe-investigations-viewer/
    (populated.png + empty.png + preview.html sources)

Refs:
  * rafe-walker/kora-docs
    17_cc_bucket_prompts/KR-FE-PROBE-INVESTIGATION-VIEWER_wake_reason_dm_xref.md
  * PR #163 — probe.wake_requested wake_emitter
  * PR #166 — probe wake consumer + caller_session_id wiring
  * PR #159 — usePanelView instrumentation pattern
  * PR #164 — CostTelemetryPage (cross-link target for cost xref)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@rafe-walker rafe-walker merged commit 37bd929 into feature/phase2-upgrades May 24, 2026
@rafe-walker rafe-walker deleted the feat/kora-KR-FE-PROBE-INVESTIGATION-VIEWER branch May 24, 2026 03:16
rafe-walker added a commit that referenced this pull request May 24, 2026
…esBanner gaps (#184)

All 4 audit streams share caller_session_id for the joinable probe investigation timeline:
1. probe.wake_requested (#163) — probe runner emits
2. tool.probe_autofix_attempted (#182) — during investigation
3. probe.investigation_completed (NEW) — model/tokens/cost/summary/dm_status/autofix_attempted
4. slack_dm_log.jsonl entry (NEW path) — wake_consumer DM routes via extracted free function append_outbound_log_entry

Key design calls:
- _append_outbound_log_entry extracted to free function; handler instance method delegates. Byte-identical JSONL rows from both call sites.
- Cost: estimate_usage_cost over telemetry snapshot (same calc as record_inference) — keeps audit-sum-by-day in lockstep with cost-ladder rung. Snapshot approach was racy under concurrent investigations.
- dm_status enum combined to 4 values (sent / failed_send / engine_unavailable_fallback / engine_unavailable_failed_send) for single-pass chip-filter.
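
The four dm_status values can be modelled as a single enum. The value names come from the list above; the derivation from two booleans is only one reading of those names (engine availability crossed with send success) and is an assumption of this sketch, as are the class and function names.

```python
from enum import Enum


class DmStatus(str, Enum):
    SENT = "sent"
    FAILED_SEND = "failed_send"
    ENGINE_UNAVAILABLE_FALLBACK = "engine_unavailable_fallback"
    ENGINE_UNAVAILABLE_FAILED_SEND = "engine_unavailable_failed_send"


def derive_dm_status(engine_available: bool, send_ok: bool) -> DmStatus:
    """Collapse two outcomes into one filterable value (assumed semantics)."""
    if engine_available:
        return DmStatus.SENT if send_ok else DmStatus.FAILED_SEND
    if send_ok:
        return DmStatus.ENGINE_UNAVAILABLE_FALLBACK
    return DmStatus.ENGINE_UNAVAILABLE_FAILED_SEND


print(derive_dm_status(True, True).value)
print(derive_dm_status(False, False).value)
```

A single field lets a chip filter match on one value instead of combining two flags per row.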

Follow-on flagged: KR-FE-PROBE-INVESTIGATION-VIEWER-V2 (already covered by CC#2's in-flight panel-kit megabucket — Deliverable D will auto-pick up probe.investigation_completed once added).

37 wake_consumer tests (28 existing + 9 new) + 401 cross-bucket regression + ruff clean.