This repository was archived by the owner on May 26, 2026. It is now read-only.

KR-PROBE-INVESTIGATION-DATA-COMPLETION — close #171 V1NotesBanner gaps #184

Merged
rafe-walker merged 1 commit into feature/phase2-upgrades from feat/kora-KR-PROBE-INVESTIGATION-DATA-COMPLETION on May 24, 2026

@rafe-walker (Owner)

Summary

Bundles the three follow-on gaps CC#2 surfaced honestly via V1NotesBanner in PR #171:

  1. DM events don't appear in slack_dm_log.jsonl, because wake_consumer bypassed the SlackDMHandler outbound log.
  2. Per-call cost and model are recorded only in the aggregate CostTelemetry; nothing is kept per investigation.
  3. Investigation summary text is sent to Slack but not persisted.

After this lands, the probe-investigation viewer (#171) can render the DM-sent time, investigation cost, and summary text for each row by joining four audit streams on the same caller_session_id.

Bucket spec: 17_cc_bucket_prompts/KR-PROBE-INVESTIGATION-DATA-COMPLETION_close_171_data_gaps.md.

K-DG findings

  • _append_outbound_log_entry is safely extractable. The only instance state it touches is self._log_path; everything after that is pure I/O. Extracted it to a free function append_outbound_log_entry(log_path, …) in kora_cli/handlers/slack_dm_handler.py, and the handler's instance method now delegates, so JSONL rows are byte-identical regardless of caller. Added a public resolve_slack_dm_log_path() accessor so non-handler call sites don't depend on a private helper.
  • caller_session_id derivation matches the engine's. _derive_caller_session_id in anthropic_engine.py:1546 already emits "probe:{probe}:{category}" for source="probe_investigation"; wake_consumer reuses that exact shape, so the four streams join on the same key without coordination.
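A minimal sketch of the extraction pattern in the first finding, assuming each outbound row is a flat dict written as one JSON line. The real `append_outbound_log_entry(log_path, …)` parameter list is elided in this PR, so the `**fields` keyword form and the `SlackDMHandlerSketch` class below are hypothetical stand-ins.

```python
import json
from pathlib import Path
from typing import Any


def append_outbound_log_entry(log_path: Path, **fields: Any) -> None:
    """Append one outbound DM row as a single JSON line (hypothetical signature).

    Touches no handler state: the path is the only input besides the row,
    so every caller goes through the same serializer.
    """
    log_path.parent.mkdir(parents=True, exist_ok=True)
    line = json.dumps(fields, ensure_ascii=False)
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(line + "\n")


class SlackDMHandlerSketch:
    """Hypothetical stand-in for SlackDMHandler, showing only the delegation."""

    def __init__(self, log_path: Path) -> None:
        self._log_path = log_path

    def _append_outbound_log_entry(self, **fields: Any) -> None:
        # Old private method kept for existing call sites; it now just delegates.
        append_outbound_log_entry(self._log_path, **fields)
```

Because both call paths funnel through one serializer, the handler and wake_consumer cannot drift apart in row format.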

Cost computation choice

Chose agent.usage_pricing.estimate_usage_cost over the cost_telemetry.snapshot() alternative. Two reasons:

  1. Per-call precision. The telemetry snapshot is aggregated across windows, so there's no per-investigation row to fetch, and picking "the most recent matching call" is racy under concurrent investigations.
  2. Accounting lockstep. estimate_usage_cost is the same calculation agent.cost_state_holder.record_inference runs to bill the cost-ladder. Reusing it keeps the audit row in lockstep with the holder's daily-spend rung, so the operator can reconcile SUM(audit_row.total_cost_usd) by day against the cost-ladder without rounding drift.

The helper returns None when the model is unknown to the pricing registry (e.g., a custom OpenRouter slug) or when usage_pricing raises. The field is always present, so the panel can render "(unknown)" without a KeyError.
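A sketch of the None-returning cost helper, assuming the estimator takes a model name and a token-usage mapping. The real `agent.usage_pricing.estimate_usage_cost` signature isn't shown in this PR, so it is injected here as a hypothetical `estimate` callable, and `compute_total_cost_usd` is an illustrative name.

```python
from typing import Any, Callable, Mapping, Optional

# Hypothetical estimator shape: (model, usage) -> USD, or None if the model is unpriced.
Estimator = Callable[[str, Mapping[str, int]], Optional[float]]

_TOKEN_KEYS = (
    "input_tokens",
    "output_tokens",
    "cache_creation_input_tokens",
    "cache_read_input_tokens",
)


def compute_total_cost_usd(meta: Mapping[str, Any], estimate: Estimator) -> Optional[float]:
    """Per-investigation cost, or None when it can't be priced.

    None covers: no model recorded (engine-unavailable path), a model the
    pricing registry doesn't know, or the estimator raising.
    """
    model = meta.get("model_used")
    if not model:
        return None
    # Missing or null token counts are treated as zero.
    usage = {k: int(meta.get(k) or 0) for k in _TOKEN_KEYS}
    try:
        cost = estimate(model, usage)
    except Exception:
        # Fail soft: the audit row must still be written.
        return None
    return None if cost is None else float(cost)
```

The caller can then always set `details["total_cost_usd"]`, which is either a float or `None` (serialized as JSON `null`).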

Sample 4-stream audit trace (one synthetic investigation)

All four entries share caller_session_id="probe:fly:service_unhealthy".

Stream 1 — probe.wake_requested (emitted by probe runner per #163):

{
  "seam": "probe.wake_requested",
  "details": {"probe": "fly", "severity": "critical", "category": "service_unhealthy",
              "title": "Fly app(s) unreachable: HTTP 401",
              "envelope_enabled": true, "envelope_fix_name": "restart_unhealthy_machine"},
  "caller_session_id": "probe:fly:service_unhealthy",
  "source": "cron"
}

Stream 2 — tool.probe_autofix_attempted (emitted by kora__attempt_probe_autofix per #182, during the in-flight investigation):

{
  "seam": "tool.probe_autofix_attempted",
  "details": {"probe": "fly", "action": "restart_machine", "target_id": "1781e9f6c12d83",
              "reason_from_reasoning": "machine state stopped for 3 consecutive probe cycles",
              "status": "attempted", "before_state": {"state": "stopped"},
              "after_state": {"state": "started"}, "executor_duration_ms": 842},
  "caller_session_id": "probe:fly:service_unhealthy",
  "source": "reasoning"
}

Stream 3 — probe.investigation_completed (NEW, this PR):

{
  "seam": "probe.investigation_completed",
  "details": {
    "probe": "fly", "issue_category": "service_unhealthy", "severity": "critical",
    "model_used": "claude-haiku-4-5-20251001",
    "input_tokens": 1200, "output_tokens": 250,
    "cache_creation_input_tokens": 0, "cache_read_input_tokens": 800,
    "total_cost_usd": 0.00123,
    "investigation_duration_ms": 3147,
    "investigation_summary_text": "Fly machine 1781e9f6c12d83 was in 'stopped' state for the last 3 probe cycles. I tried restart_machine via the envelope; before=stopped, after=started in 842ms. Next probe cycle (≤5 min) will confirm health holds. If it flaps back, check fly logs for OOM patterns before another restart.",
    "dm_status": "sent",
    "autofix_attempted": true
  },
  "caller_session_id": "probe:fly:service_unhealthy",
  "source": "reasoning"
}

Stream 4 — slack_dm_log.jsonl outbound entry (NEW path via free function):

{
  "sent_at": "2026-05-24T00:14:09.412+00:00",
  "channel_id": "U01JOSHUA",
  "thread_ts": null,
  "text": "🚨 Probe alert · fly\n[reasoning text body]",
  "slack_message_ts": "1742345059.123456",
  "send_status": "ok",
  "model_used": "claude-haiku-4-5-20251001",
  "input_tokens": 1200,
  "output_tokens": 250,
  "reasoning_duration_ms": 3147,
  "cache_read_input_tokens": 800,
  "caller_session_id": "probe:fly:service_unhealthy"
}

dm_status values in the completed audit row: sent / failed_send / engine_unavailable_fallback / engine_unavailable_failed_send. The last value covers the combined fallback + send-failure path, so CC#2 can branch cleanly.
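The four dm_status values read as the cross product of two outcomes: whether the reasoning engine produced a reply and whether the Slack send succeeded. The sketch below assumes both booleans are known when the row is emitted; `derive_dm_status` and `build_investigation_completed` are hypothetical names rather than helpers from this PR, and the row layout copies the Stream 3 sample.

```python
from typing import Any, Dict, Literal

DmStatus = Literal[
    "sent",
    "failed_send",
    "engine_unavailable_fallback",
    "engine_unavailable_failed_send",
]


def derive_dm_status(engine_available: bool, send_ok: bool) -> DmStatus:
    """Cross two independent outcomes into the four dm_status values."""
    if engine_available:
        return "sent" if send_ok else "failed_send"
    # The combined value keeps the fallback signal even when the send also failed.
    return "engine_unavailable_fallback" if send_ok else "engine_unavailable_failed_send"


def build_investigation_completed(
    probe: str,
    category: str,
    severity: str,
    engine_available: bool,
    send_ok: bool,
    extra: Dict[str, Any],
) -> Dict[str, Any]:
    """One probe.investigation_completed row, keyed like the other three streams."""
    return {
        "seam": "probe.investigation_completed",
        "details": {
            "probe": probe,
            "issue_category": category,
            "severity": severity,
            "dm_status": derive_dm_status(engine_available, send_ok),
            **extra,
        },
        # Same "probe:{probe}:{category}" shape the engine derives for probe_investigation.
        "caller_session_id": f"probe:{probe}:{category}",
        "source": "reasoning",
    }
```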

Privacy posture

investigation_summary_text IS recorded verbatim. This follows the #182 precedent, where the autofix reason_from_reasoning is recorded verbatim because operator triage of "what did Kora decide and why" is the primary use case, and differs from #179, where the email body is redacted because it is user-supplied text. Kora's investigation summary is Kora-composed and doesn't echo back arbitrary unsanitized strings from external sources.

CC#2 follow-on recommendation: KR-FE-PROBE-INVESTIGATION-VIEWER-V2

With the BE plumbing in place, CC#2 can:

  1. Update /api/probe-investigations to JOIN on caller_session_id:
    • dm_sent time from slack_dm_log.jsonl (sent_at of the send_status="ok" entry)
    • investigation.summary from the probe.investigation_completed audit row
    • investigation.cost_usd + investigation.model_used from same
    • investigation.autofix_attempted boolean for the badge column
  2. Remove the three V1NotesBanner entries that were specific to these gaps (Per-call cost not yet displayed, DM sent time not yet populated, Investigation summary not yet shown).
  3. Add a dm_status chip filter (the operator's primary triage lens; failed_send should be the top filter).

The wire shape is forward-compatible: total_cost_usd may be null when pricing is unknown, in which case the renderer shows "(unknown)". The same applies to model_used on the engine-unavailable path.
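A sketch of the step 1 join and the "(unknown)" rendering, assuming both streams are already available as parsed dicts. `join_investigations`, `parse_jsonl`, and `render_cost` are hypothetical names; the seam, field, and status values come from the sample trace. Because caller_session_id is `probe:{probe}:{category}`, repeated investigations of the same probe and category share a key, so this sketch simply lets the latest entry win; a real endpoint would likely also need to bound the join by time.

```python
import json
from collections import defaultdict
from typing import Any, Dict, Iterable, List


def parse_jsonl(lines: Iterable[str]) -> List[Dict[str, Any]]:
    """Parse JSONL text, skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]


def join_investigations(
    audit_rows: Iterable[Dict[str, Any]],
    dm_rows: Iterable[Dict[str, Any]],
) -> Dict[str, Dict[str, Any]]:
    """Build one viewer row per caller_session_id (later entries overwrite earlier ones)."""
    rows: Dict[str, Dict[str, Any]] = defaultdict(dict)
    for entry in audit_rows:
        key = entry.get("caller_session_id")
        if key and entry.get("seam") == "probe.investigation_completed":
            d = entry.get("details") or {}
            rows[key].update(
                summary=d.get("investigation_summary_text"),
                cost_usd=d.get("total_cost_usd"),
                model_used=d.get("model_used"),
                autofix_attempted=bool(d.get("autofix_attempted")),
                dm_status=d.get("dm_status"),
            )
    for entry in dm_rows:
        key = entry.get("caller_session_id")
        # Only a successful send gives the row its dm_sent time.
        if key and entry.get("send_status") == "ok":
            rows[key]["dm_sent"] = entry.get("sent_at")
    return dict(rows)


def render_cost(cost: Any) -> str:
    # total_cost_usd is null when the model is unknown to the pricing registry.
    return "(unknown)" if cost is None else f"${cost:.5f}"
```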

Files

  • MOD kora_cli/handlers/slack_dm_handler.py: extracted the append_outbound_log_entry free function and the resolve_slack_dm_log_path public accessor; the existing instance method delegates
  • MOD kora_cli/probes/wake_consumer.py: _send_operator_dm_routed writes to slack_dm_log; _emit_investigation_completed writes the new audit seam; module-level helpers for meta projection, cost computation, and the autofix back-reference
  • MOD kora_cli/audit/jsonl_sink.py: new probe.investigation_completed entry in the SeamName Literal
  • MOD tests/kora_cli/probes/test_wake_consumer.py: 9 new tests covering happy path / engine_unavailable_fallback / failed_send / combined fallback+failed / autofix back-reference (positive + negative) / audit-emit failure swallowed / 4-stream caller_session_id consistency / cost helper (missing model / zero tokens / pricing exception)

Test plan

  • 37 wake_consumer tests pass (28 existing + 9 new)
  • Regression: 401 passed across probes + handlers + audit + tools + reasoning
  • ruff check clean on all changed files

🤖 Generated with Claude Code

…esBanner gaps

Bundles the three follow-on gaps CC#2 surfaced honestly via
V1NotesBanner in PR #171 so the probe-investigation viewer can
render dm_sent / per-investigation cost / summary text per row.

Three changes
-------------
1. Route wake_consumer DM through SlackDMHandler outbound-log
   * Extracted `_append_outbound_log_entry` from the handler into
     a free function `append_outbound_log_entry(log_path, ...)`
     in `kora_cli/handlers/slack_dm_handler.py`. The handler's
     instance method now delegates — byte-identical JSONL rows
     regardless of caller.
   * Added `resolve_slack_dm_log_path()` public accessor so
     non-handler callers don't depend on the private helper.
   * `wake_consumer._send_operator_dm_routed()` calls the free
     function with the probe's `caller_session_id`
     (`probe:{probe}:{category}`) so CC#2's audit-stream join
     can light up. Failure paths (slack client unavailable,
     channel_id unset, post_dm raises) all write a `send_status=
     "failed"` row so the panel renders the failure too.

2. New audit seam `probe.investigation_completed`
   * Added to `SeamName` Literal next to `tool.probe_autofix_attempted`.
   * Emitted once per dispatched investigation (success + fallback
     paths alike). Fields per spec:
       probe / issue_category / severity
       model_used / input_tokens / output_tokens /
         cache_creation_input_tokens / cache_read_input_tokens
       total_cost_usd / investigation_duration_ms
       investigation_summary_text (VERBATIM — operator-decision-
         relevant per the #182 reason-field precedent; Kora-
         composed, no external-string leakage)
       dm_status ∈ {sent, failed_send, engine_unavailable_fallback,
                    engine_unavailable_failed_send}
       autofix_attempted (back-reference: did
         tool.probe_autofix_attempted fire with the same
         caller_session_id since investigation_started_at?)
       reasoning_error (when set)
   * source="reasoning" (probe_wake_consumer isn't in SourceName
     Literal; reasoning is the closest semantic match + matches
     the autofix seam's attribution).
   * caller_session_id="probe:{probe}:{category}" — matches the
     engine-side derivation in anthropic_engine._derive_caller_session_id
     for `probe_investigation` source, so the 4 audit streams
     (wake / autofix-attempted / completed / slack_dm_log) all
     join on the same key.

3. Cost computation
   * `_compute_total_cost_usd(meta)` uses
     `agent.usage_pricing.estimate_usage_cost` — the same call
     `cost_state_holder.record_inference` runs to bill the
     cost-ladder. Reuse vs. duplication keeps audit-sum-by-day
     and ladder accounting in lockstep without rounding drift.
   * Rejected the cost_telemetry snapshot alternative the spec
     mentioned: aggregated, no per-investigation row to fetch,
     racy under concurrent investigations.
   * Returns None when model unknown to pricing registry OR
     usage_pricing raises — field is always present (panel can
     render "(unknown)" without key-error).
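Change 1's failure handling can be sketched as a send that writes exactly one outbound-log row on every path. The `SlackClient` protocol, the `post_dm(channel_id, text)` signature returning the message ts, and the `append_row` callback (standing in for a bound `append_outbound_log_entry`) are assumptions; the row keys follow the Stream 4 sample, minus the reasoning meta.

```python
from datetime import datetime, timezone
from typing import Any, Callable, Dict, Optional, Protocol


class SlackClient(Protocol):
    # Hypothetical client interface; post_dm returns the Slack message ts.
    def post_dm(self, channel_id: str, text: str) -> str: ...


def send_operator_dm_routed(
    client: Optional[SlackClient],
    channel_id: Optional[str],
    text: str,
    caller_session_id: str,
    append_row: Callable[[Dict[str, Any]], None],
) -> bool:
    """Send the DM and always write exactly one outbound-log row."""
    row: Dict[str, Any] = {
        "sent_at": datetime.now(timezone.utc).isoformat(timespec="milliseconds"),
        "channel_id": channel_id,
        "thread_ts": None,
        "text": text,
        "slack_message_ts": None,
        "send_status": "failed",  # pessimistic default, flipped only on success
        "caller_session_id": caller_session_id,
    }
    try:
        if client is None or not channel_id:
            return False  # client unavailable or channel_id unset
        row["slack_message_ts"] = client.post_dm(channel_id, text)
        row["send_status"] = "ok"
        return True
    except Exception:
        return False  # post_dm raised; the row keeps send_status="failed"
    finally:
        # Runs on every path above, so the panel sees failures too.
        append_row(row)
```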

Helpers (module-level, pure)
----------------------------
* `_reasoning_meta_from_result(result)` — projects ResponseResult
  into the 5-key meta dict the outbound log + completed audit
  share. Tolerant of None / partial attribute presence.
* `_autofix_attempted_during(caller_session_id, since)` — reads
  recent audit JSONL via `read_audit_entries(seam=..., since=...)`
  and matches caller_session_id. Fail-soft on read error.
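Both helpers can be sketched as pure functions, assuming the five meta keys are the reasoning fields that appear in the Stream 4 sample and that the audit reader accepts `seam=` and `since=` keywords, as the `read_audit_entries(seam=..., since=...)` call above suggests. The reader is injected as a parameter, and the function names without a leading underscore are hypothetical.

```python
from datetime import datetime
from typing import Any, Callable, Dict, Iterable

# Assumed 5-key meta shape, taken from the Stream 4 sample row.
META_KEYS = (
    "model_used",
    "input_tokens",
    "output_tokens",
    "reasoning_duration_ms",
    "cache_read_input_tokens",
)


def reasoning_meta_from_result(result: Any) -> Dict[str, Any]:
    """Project a result object onto the meta dict; None or missing attributes become None."""
    # getattr(None, key, None) is None, so a missing result yields an all-None dict.
    return {key: getattr(result, key, None) for key in META_KEYS}


def autofix_attempted_during(
    caller_session_id: str,
    since: datetime,
    read_entries: Callable[..., Iterable[Dict[str, Any]]],
) -> bool:
    """True if tool.probe_autofix_attempted fired for this key since `since`."""
    try:
        entries = read_entries(seam="tool.probe_autofix_attempted", since=since)
        # Iteration stays inside the try, so lazy readers that fail mid-read are covered.
        return any(e.get("caller_session_id") == caller_session_id for e in entries)
    except Exception:
        return False  # fail soft: a read error must not block the completed row
```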

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>