This repository was archived by the owner on May 26, 2026. It is now read-only.

feat(kora): KR-AUDIT-PANEL-ENDPOINTS — flip 3 stub panels using audit JSONL#141

Merged
rafe-walker merged 1 commit into
feature/phase2-upgradesfrom
feat/kora-KR-AUDIT-PANEL-ENDPOINTS
May 23, 2026


@rafe-walker

Owner

Summary

Three endpoint flips at once, all reading from CC#3's ${KORA_HOME}/kora_audit_log.jsonl (KR-AUDIT-JSONL-SINK, PR #139). Backend-only — no FE changes; TS interfaces already match the projected shapes. Stub-then-real arc closes for agent-activity / reasoning / webhook-events.

Shared reader (new)

kora_cli/audit/jsonl_reader.py: read_audit_entries(seam, limit, since) is the single helper all 3 endpoints use. It looks up get_kora_home() per call, so monkeypatching in tests works without ContextVar plumbing (per #137's fixture lesson). It tolerates a missing file, malformed JSON, non-dict lines, and a Pydantic ValidationError on any single line (log + skip). Results are newest-first by emitted_at.
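A minimal, stdlib-only sketch of a reader with the tolerances listed above, not the PR's implementation. Assumptions: `kora_home` is passed in explicitly instead of calling `get_kora_home()`, and a check for the `seam` and `emitted_at` keys stands in for per-line Pydantic validation of the real `AuditEntry` model. The `KORA_AUDIT_LOG_PATH` override and the naive-datetime-is-UTC rule mirror the reader tests.

```python
import json
import os
from datetime import datetime, timezone
from pathlib import Path


def _parse_ts(raw):
    """Parse an ISO-8601 emitted_at; naive values are assumed to be UTC."""
    ts = datetime.fromisoformat(raw)
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=timezone.utc)
    return ts


def read_audit_entries(kora_home, seam=None, limit=100, since=None):
    """Return audit rows newest-first, skipping anything unreadable."""
    # The env var wins over the default ${KORA_HOME}/kora_audit_log.jsonl location.
    path = Path(os.environ.get("KORA_AUDIT_LOG_PATH", Path(kora_home) / "kora_audit_log.jsonl"))
    if not path.exists():
        return []  # missing file: empty panel, not an error
    if since is not None and since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)
    entries = []
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # blank line
            try:
                row = json.loads(line)
            except json.JSONDecodeError:
                continue  # malformed JSON: skip this line only
            if not isinstance(row, dict):
                continue  # e.g. a bare list or number
            try:
                # Stand-in for per-line model validation.
                ts = _parse_ts(row["emitted_at"])
                row_seam = row["seam"]
            except (KeyError, TypeError, ValueError):
                continue
            if seam is not None and row_seam != seam:
                continue
            if since is not None and ts < since:
                continue
            entries.append((ts, row))
    entries.sort(key=lambda pair: pair[0], reverse=True)  # newest first
    return [row for _, row in entries[:limit]]
```

Because the path is resolved inside the call, a test can point the reader at a temporary directory (or set the env var) without touching any module-level state.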

Flip 1 — /api/agent-activity/recent (mcp.tool_called)

K-DG drift caught: spec said details.duration_ms / tool_status / result_summary. Actual writer at mcp_tools.py:714-724 only emits tool_name / tool_kind / caller_actor_kind / args_keys / result.

Projection: duration_ms=0; status="ok" (the audit row only fires on the success path today; the KR-MCP-RUNTIME-SURFACE follow-on extends it); result_summary=details.result (the writer docstring confirms it is a pre-filtered short string).
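The projection above, sketched as a plain function over one audit row. The `details` keys come from the writer description; the output key `called_at` is a hypothetical name for the row timestamp, and the full AgentCall field set is not shown in this PR.

```python
def project_agent_call(entry: dict) -> dict:
    """Project one mcp.tool_called audit row into an AgentCall-shaped dict."""
    details = entry.get("details") or {}
    return {
        "called_at": entry.get("emitted_at"),  # hypothetical output key
        "tool_name": details.get("tool_name"),
        "tool_kind": details.get("tool_kind"),
        "caller_actor_kind": details.get("caller_actor_kind"),
        # The writer records no timing today, so duration is pinned to 0.
        "duration_ms": 0,
        # The audit row only fires on the success path today.
        "status": "ok",
        # The writer pre-filters `result` to a short string.
        "result_summary": details.get("result"),
    }
```

Note that `args_keys` is deliberately not projected: only the fields the panel renders cross the edge.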

Flip 2 — /api/reasoning/recent (reasoning.tool_called)

GROUPS by caller_session_id so a multi-tool reasoning iteration collapses into ONE ReasoningCall row with tools_used: [name1, name2, name3]. tools_used is an FE extension field: the existing TS ReasoningCall ignores it, and a follow-on FE bucket will render it.
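A sketch of the session grouping, assuming each row carries a top-level `caller_session_id` and that `details` holds `tool_name` and `duration_ms` (the exact AuditEntry layout from #139 is not shown here). Groups keep the newest-first order of their newest row, while `tools_used` is listed in call order.

```python
def group_reasoning_rows(rows):
    """Collapse newest-first reasoning.tool_called rows into one dict per session."""
    groups = {}
    for row in rows:
        # dicts preserve insertion order, so groups follow their newest row
        groups.setdefault(row["caller_session_id"], []).append(row)
    calls = []
    for session_id, members in groups.items():
        # ISO-8601 strings with the same offset sort chronologically
        ordered = sorted(members, key=lambda r: r["emitted_at"])
        details = [r.get("details") or {} for r in ordered]
        calls.append({
            "session_id": session_id,
            "started_at": ordered[0]["emitted_at"],
            "tools_used": [d.get("tool_name") for d in details],
            "duration_ms": sum(d.get("duration_ms", 0) for d in details),
        })
    return calls
```

Applying `?limit` after this step caps the number of grouped calls rather than raw rows, which matches the "?limit on groups" test.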

Status derivation: all-ok → ok; any not_allowed → halted + capability_denied; any execution_error → failed + handler_error.
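The derivation rule as a small function. Two assumptions not settled by the text: when a group contains both `not_allowed` and `execution_error`, this sketch lets `not_allowed` win, and any unrecognized per-row status raises so the gap shows up in tests. The second tuple element is a hypothetical name for the reason field.

```python
_KNOWN_ROW_STATUSES = {"ok", "not_allowed", "execution_error"}


def derive_group_status(row_statuses):
    """Return (status, reason) for one grouped reasoning call."""
    statuses = list(row_statuses)
    unexpected = set(statuses) - _KNOWN_ROW_STATUSES
    if unexpected:
        raise ValueError(f"unexpected row status: {sorted(unexpected)}")
    if "not_allowed" in statuses:
        return "halted", "capability_denied"
    if "execution_error" in statuses:
        return "failed", "handler_error"
    return "ok", None  # every row in the group was ok
```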

Limitations (per spec §2 Flip 2): model_used / input_tokens / output_tokens / response_text_truncated_200 set to null — those fields live in slack_dm_log.jsonl outbound entries, not audit. The KR-REASONING-PANEL-MODEL-XREF follow-on bucket cross-references by timestamp + thread_ts. cost_rung_at_call set to "unknown" (lowercase Enum.value — preserves the K-DG pin from PR #132 against engine.py:47-49 CostLadderRungName literal). Keeping this PR small per spec §4.
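The null placeholders and the lowercase cost-rung pin, sketched with a hypothetical enum standing in for the engine's CostLadderRungName (only the `unknown` member is shown because it is the only value this PR names). The point is to emit `.value`, never the member name.

```python
from enum import Enum


class CostRung(Enum):
    """Hypothetical stand-in for the engine's cost-rung enum."""
    UNKNOWN = "unknown"


def reasoning_placeholders() -> dict:
    """Fields the audit log cannot supply; a follow-on xref fills them."""
    return {
        "model_used": None,
        "input_tokens": None,
        "output_tokens": None,
        "response_text_truncated_200": None,
        # Lowercase Enum.value, not "UNKNOWN" or "CostRung.UNKNOWN".
        "cost_rung_at_call": CostRung.UNKNOWN.value,
    }
```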

Aggregate counts (total_recent_24h, by_status_24h) use individual audit rows, not groups, so the headline reflects total reasoning activity volume.
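A sketch of the headline counts over individual rows. The per-row `details["status"]` key and the mapping from row statuses to panel buckets are assumptions; `now` must be timezone-aware to compare against offset-bearing timestamps.

```python
from collections import Counter
from datetime import datetime, timedelta

# Assumed mapping from per-row audit status to the panel's status buckets.
ROW_TO_PANEL_STATUS = {"ok": "ok", "not_allowed": "halted", "execution_error": "failed"}


def reasoning_aggregates(rows, now):
    """Count individual audit rows (not grouped calls) from the last 24 hours."""
    cutoff = now - timedelta(hours=24)
    recent = [r for r in rows if datetime.fromisoformat(r["emitted_at"]) >= cutoff]
    by_status = Counter(
        ROW_TO_PANEL_STATUS.get((r.get("details") or {}).get("status"), "unknown")
        for r in recent
    )
    return {"total_recent_24h": len(recent), "by_status_24h": dict(by_status)}
```

With two rows in one session and one row in another, the panel lists two grouped calls but the headline total is 3.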

Flip 3 — /api/webhooks/events/recent (webhook.dead_letter)

Status pinned to "dead_letter": this seam ONLY emits dead-letters. Verified happy-path events stay in the chain log, and the slowapi rate-limited path doesn't currently call emit_audit.

SECURITY: source_ip is OCTET-MASKED at the projection edge. The audit writer at webhook_dead_letter.py:142 passes the RAW peer_ip; the endpoint enforces _mask_ipv4_last_two_octets ("54.203.99.142" → "54.203.x.x") per the KR-WEBHOOK-EVENTS #109 contract. IPv6 / unexpected shapes → "—" defensively (never leaks unmasked).
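A sketch of a masking helper with the same contract, using the stdlib `ipaddress` module for parsing. It is an approximation of the endpoint's `_mask_ipv4_last_two_octets`, not its source; the dash fallback is written as its Unicode escape.

```python
import ipaddress

MASKED_FALLBACK = "\u2014"  # the dash placeholder the panel renders


def mask_ipv4_last_two_octets(raw) -> str:
    """Keep the first two IPv4 octets; every other shape becomes the dash."""
    if not isinstance(raw, str):
        return MASKED_FALLBACK
    try:
        addr = ipaddress.ip_address(raw)
    except ValueError:
        return MASKED_FALLBACK  # not an IP at all
    if addr.version != 4:
        return MASKED_FALLBACK  # IPv6 (including IPv4-mapped) is never partially shown
    first, second, _, _ = str(addr).split(".")
    return f"{first}.{second}.x.x"
```

Failing closed to the dash means a new address format can degrade the panel but cannot leak a full address.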

SECURITY: details sub-set to {reason, header_present} — never the full audit details dict (which carries body_bytes / request_id / headers the panel hasn't vetted).
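The details allowlist, sketched as a projection that copies only vetted keys. Whether absent keys should be omitted or set to null is not stated; this sketch omits them.

```python
ALLOWED_DETAIL_KEYS = ("reason", "header_present")


def project_dead_letter_details(details) -> dict:
    """Copy only vetted keys; body_bytes, request_id, headers, etc. never pass."""
    if not isinstance(details, dict):
        return {}
    return {key: details[key] for key in ALLOWED_DETAIL_KEYS if key in details}
```

An allowlist (rather than a denylist of known-sensitive keys) keeps the panel safe when the writer later adds new fields.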

4-layer security carry-forward

Preserved across all 3 endpoints:

  • Walk-payload sweeps for Anthropic key shapes, Slack token shapes, HMAC-secret shapes, full IPv4 leaks.
  • Per-field caller_actor_kind label-shape pin (no hash/base64 runs).
  • result_summary no-raw-JSON pin.
  • cost_rung lowercase literal pin.
  • details sub-set enforcement (webhook).
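A sketch of a walk-payload sweep as a test helper: it visits every string key and value in a JSON-like response and checks it against leak patterns. The regexes are rough, hypothetical approximations of the shapes named above (the HMAC pattern in particular is a guess), not the suites' actual patterns.

```python
import re

# Hypothetical approximations of the leak shapes named in the PR.
LEAK_PATTERNS = {
    "anthropic_key": re.compile(r"sk-ant-[A-Za-z0-9_-]{10,}"),
    "slack_token": re.compile(r"xox[abprs]-[A-Za-z0-9-]{10,}"),
    "hmac_secret": re.compile(r"\b[0-9a-fA-F]{64}\b"),
    "full_ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def walk_strings(node):
    """Yield every string (dict keys included) in a nested payload."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for key, value in node.items():
            yield from walk_strings(key)
            yield from walk_strings(value)
    elif isinstance(node, (list, tuple)):
        for item in node:
            yield from walk_strings(item)


def find_leaks(payload):
    """Return (pattern_name, offending_string) pairs; empty means clean."""
    hits = []
    for text in walk_strings(payload):
        for name, pattern in LEAK_PATTERNS.items():
            if pattern.search(text):
                hits.append((name, text))
    return hits
```

A masked address such as "54.203.x.x" passes the IPv4 check, so the same sweep doubles as a regression test for the masking step.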

Fixture-isolation discipline (per #137 lesson)

All 3 endpoint test files monkeypatch get_kora_home in all 3 module namespaces (kora_constants, kora_cli.config, kora_cli.web_server). Reader tests use the pattern too. The reader's own resolution uses a local kora_constants import so the test patch takes effect on a fresh call.
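Why the patch must land in every namespace comes down to Python name binding: `from m import f` copies the reference at import time, while a lookup on the module at call time sees the patch. A stdlib-only demonstration with `unittest.mock.patch.object`; `kora_constants_sketch` and `consumer_sketch` are hypothetical throwaway modules standing in for the real ones.

```python
import sys
import types
from pathlib import Path
from unittest import mock


def _real_home():
    return Path("/real/kora")


def _fixture_home():
    return Path("/tmp/kora-fixture")


# Stand-in for kora_constants, registered so a local `import` can find it.
constants = types.ModuleType("kora_constants_sketch")
constants.get_kora_home = _real_home
sys.modules["kora_constants_sketch"] = constants

# A consumer that did `from kora_constants import get_kora_home` at import time.
consumer = types.ModuleType("consumer_sketch")
consumer.get_kora_home = constants.get_kora_home


def home_via_bound_name():
    return consumer.get_kora_home()


def home_via_local_import():
    import kora_constants_sketch  # resolved from sys.modules on every call
    return kora_constants_sketch.get_kora_home()


with mock.patch.object(constants, "get_kora_home", _fixture_home):
    local = home_via_local_import()   # sees the fixture
    stale = home_via_bound_name()     # still the real home
    with mock.patch.object(consumer, "get_kora_home", _fixture_home):
        fixed = home_via_bound_name()  # sees the fixture once patched too
```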

Test plan

  • tests/kora_cli/audit/test_jsonl_reader.py (14 reader tests): missing/empty file, seam filter, since filter, naive datetime UTC, limit cap, malformed/non-dict/validation-error tolerance, blank lines, env override.
  • test_web_server_agent_activity.py (11 tests, rewrite): empty, shape, projection, seam filtering, ?limit, newest-first, walk-payload SECURITY, label-shape caller_actor_kind, no-raw-JSON result_summary, by_caller_24h reconciliation, cron-regression sanity.
  • test_web_server_reasoning.py (15 tests, rewrite): empty, shape, single-tool, multi-row session grouping, separate sessions → separate rows, duration_ms sum, status derivation (ok/halted/failed), seam filtering, walk-payload SECURITY, lowercase cost_rung pin, ?limit on groups, by_status_24h counts individual rows not groups, cron-regression sanity.
  • test_web_server_webhook_events.py (15 tests, rewrite): empty, shape, projection, status pin, source → endpoint mapping, IPv4 octet-mask, walk-payload no-full-IPv4, IPv6/dash → "—" fallback, details sub-set enforcement, seam filtering, newest-first, ?limit, cron-regression sanity.
  • Full admin-panel regression: 357/357 across 27 suites.
  • Manual smoke: bring up daemon → exercise one of each (MCP tool call / Joshua DM with reasoning / webhook dead-letter) → reload the 3 panels → verify real data appears with stub:false.

K-DG drift summary

| Spec said | Actual | Resolution |
|---|---|---|
| §2 Flip 1: details.duration_ms / tool_status / result_summary | Only result in writer | duration_ms=0; status="ok"; result_summary=details.result; flagged in endpoint docstring |
| §2 Flip 2: cost_rung_at_call could be null | FE union requires a value | Used "unknown" (lowercase Enum.value member); preserves PR #132 K-DG pin |

Refs

🤖 Generated with Claude Code

… JSONL

Three endpoint flips at once, all reading from CC#3's
${KORA_HOME}/kora_audit_log.jsonl (KR-AUDIT-JSONL-SINK, PR #139).
Backend-only — no FE changes; TS interfaces already match the
projected shapes. Same stub-then-real pattern as PR #137's
slack-dm flip.

Shared reader (NEW):
  * kora_cli/audit/jsonl_reader.py with read_audit_entries(
    seam, limit, since) — single helper used by all 3 endpoints.
  * Per-call get_kora_home() lookup so monkeypatch in tests works
    without ContextVar plumbing (per the #137 fixture lesson).
  * Tolerates missing file, malformed JSON, non-dict lines,
    Pydantic ValidationError on a single row — log + skip.
  * Newest-first by emitted_at descending.

Flip 1 — /api/agent-activity/recent (mcp.tool_called):
  * Filters audit rows where seam == "mcp.tool_called", projects
    to AgentCall shape.
  * K-DG drift caught: spec said details has duration_ms /
    tool_status / result_summary. Actual writer at
    kora_cli/listeners/mcp_tools.py:714-724 only emits tool_name
    / tool_kind / caller_actor_kind / args_keys / result.
    Projection: duration_ms=0; status="ok" (audit only fires on
    success path today; KR-MCP-RUNTIME-SURFACE follow-on extends);
    result_summary=details.result (writer docstring confirms this
    is pre-filtered to a short string).

Flip 2 — /api/reasoning/recent (reasoning.tool_called):
  * Filters audit rows where seam == "reasoning.tool_called".
  * GROUPS by caller_session_id so a multi-tool reasoning
    iteration collapses into ONE ReasoningCall row with
    tools_used: [name1, name2, name3]. tools_used is a FE
    extension field — existing TS ReasoningCall ignores it;
    follow-on FE bucket renders.
  * Status derivation per spec: all-ok → ok; any not_allowed →
    halted+capability_denied; any execution_error → failed+
    handler_error.
  * Limitations (per spec §2 Flip 2): model_used / tokens /
    response_text_truncated_200 set to null — those fields live
    in slack_dm_log.jsonl outbound entries, not audit. The
    KR-REASONING-PANEL-MODEL-XREF follow-on bucket cross-
    references by timestamp + thread_ts. cost_rung_at_call set
    to "unknown" (lowercase Enum.value — preserves the K-DG pin
    from PR #132 against engine.py:47-49 CostLadderRungName
    literal). Keeping this PR small per spec §4.
  * Aggregate counts (total_recent_24h, by_status_24h) use
    INDIVIDUAL audit rows — not groups — so headline reflects
    total reasoning activity volume.

Flip 3 — /api/webhooks/events/recent (webhook.dead_letter):
  * Filters audit rows where seam == "webhook.dead_letter".
  * status pinned to "dead_letter" (this seam ONLY emits dead-
    letters; verified happy-path events stay in chain log; rate-
    limited from slowapi doesn't currently call emit_audit).
  * SECURITY: source_ip OCTET-MASKED at the projection edge.
    Audit writer at webhook_dead_letter.py:142 passes RAW
    peer_ip; endpoint enforces _mask_ipv4_last_two_octets
    ("54.203.99.142" → "54.203.x.x") per the KR-WEBHOOK-EVENTS
    #109 contract. IPv6 / unexpected shapes → "—" defensively
    (never leaks unmasked).
  * SECURITY: details sub-set to {reason, header_present} —
    never the full audit details dict (which carries
    body_bytes / request_id / headers the panel hasn't vetted).

4-layer SECURITY contract carry-forward (preserved across all 3
endpoints):
  * Walk-payload sweeps for Anthropic key shapes, Slack token
    shapes, HMAC-secret shapes, full IPv4 leaks.
  * Per-field caller_actor_kind label-shape pin (no hash/base64
    runs).
  * result_summary no-raw-JSON pin.
  * cost_rung lowercase literal pin.
  * details sub-set enforcement (webhook).

Fixture-isolation discipline applied per #137 lesson: all 3
endpoint test files monkeypatch get_kora_home in ALL THREE
module namespaces (kora_constants, kora_cli.config,
kora_cli.web_server). Reader tests also use the pattern. The
reader's own resolution uses a local kora_constants import so
the test patch takes effect on a fresh call.

Tests:
  * tests/kora_cli/audit/test_jsonl_reader.py — 14 reader tests:
    missing file, empty file, seam filter, since filter, naive
    datetime UTC assumption, limit cap, malformed JSON tolerance,
    non-dict line skip, Pydantic ValidationError per-line skip,
    blank lines, KORA_AUDIT_LOG_PATH env override.
  * test_web_server_agent_activity.py — 11 tests (rewrite):
    empty log, shape, mcp.tool_called projection, seam filtering,
    ?limit cap, newest-first, walk-payload SECURITY, label-shape
    caller_actor_kind, no-raw-JSON result_summary, by_caller_24h
    reconciliation, cron-regression sanity.
  * test_web_server_reasoning.py — 15 tests (rewrite): empty,
    shape, single-tool projection, multi-row session grouping,
    different sessions → separate rows, duration_ms sum, status
    derivation (ok / halted / failed), seam filtering,
    walk-payload SECURITY, lowercase cost_rung pin, ?limit on
    groups, by_status_24h counts individual rows not groups,
    cron-regression sanity.
  * test_web_server_webhook_events.py — 15 tests (rewrite):
    empty, shape, dead_letter projection, status pinned, source
    → endpoint mapping, IPv4 octet-mask enforcement, walk-payload
    no-full-IPv4 sweep, IPv6/dash → "—" fallback, details sub-
    set enforcement, seam filtering, newest-first, ?limit cap,
    cron-regression sanity.
  * Full admin-panel regression: 357/357 across 27 suites.

K-DG drift summary:
  * spec §2 Flip 1 said details has duration_ms / tool_status /
    result_summary — only `result` is present in the actual
    writer. Documented + handled in projection.
  * spec §2 Flip 2 said cost_rung could be null — FE
    ReasoningCostRung union requires a value; used "unknown"
    (which IS in the enum) instead. Preserves the lowercase
    Enum.value contract from PR #132.

Follow-on buckets cited:
  * KR-REASONING-PANEL-MODEL-XREF — cross-ref to
    slack_dm_log.jsonl for model_used / tokens / response_text
  * KR-MCP-RUNTIME-SURFACE follow-on — extends audit writer with
    duration_ms + failure-path emit (status taxonomy)

Refs:
  * rafe-walker/kora-docs 17_cc_bucket_prompts/KR-AUDIT-PANEL-ENDPOINTS_three_flips.md
  * PR #139 — KR-AUDIT-JSONL-SINK (writer + AuditEntry shape)
  * PR #137 — KR-SLACK-DM-PANEL-FLIP (fixture-isolation pattern)
  * PR #114 / #132 / #109 — original stub panels being flipped

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@rafe-walker rafe-walker merged commit 1069da7 into feature/phase2-upgrades May 23, 2026
@rafe-walker rafe-walker deleted the feat/kora-KR-AUDIT-PANEL-ENDPOINTS branch May 23, 2026 19:46
rafe-walker added a commit that referenced this pull request May 23, 2026
…ia xref (#143)

Cross-references audit JSONL with slack_dm_log outbound entries to populate previously-null model_used / tokens / response_text fields on reasoning panel rows.

- NEW kora_cli/audit/reasoning_xref.py (cross-ref helper with parsing/matching/cost-rung-derivation/text-truncation) + /api/reasoning/recent endpoint update + 27 new xref tests.

K-DG drift caught up-front: spec said verify if outbound JSONL writes caller_session_id; grep found it does NOT. Documented in module header + commit body.

Correlation algorithm: caller_session_id → (channel_id, event_ts) → match outbound with same channel_id where thread_ts == event_ts, fallback to closest sent_at within ±60s.
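A sketch of the matching step once `(channel_id, event_ts)` has been recovered from the session id (that parse is not shown in the commit). Assumptions: outbound entries carry `channel_id`, a string `thread_ts`, and `sent_at` as epoch seconds. Returning `None` corresponds to the graceful-degradation path where the row keeps null fields.

```python
def match_outbound(channel_id, event_ts, outbound, window_s=60.0):
    """Find the outbound DM entry for one reasoning group, or None."""
    same_channel = [o for o in outbound if o.get("channel_id") == channel_id]
    # 1) Exact thread match: the reply was threaded on the triggering event.
    for entry in same_channel:
        if entry.get("thread_ts") == event_ts:
            return entry
    # 2) Fallback: closest sent_at within +/- window_s of the event.
    target = float(event_ts)  # Slack ts strings are epoch seconds
    candidates = [
        o for o in same_channel
        if isinstance(o.get("sent_at"), (int, float)) and abs(o["sent_at"] - target) <= window_s
    ]
    if not candidates:
        return None  # caller renders the row with null fields
    return min(candidates, key=lambda o: abs(o["sent_at"] - target))
```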

Cost-rung derivation: substring-match on model name (opus/sonnet/haiku) so future model revs keep mapping correctly; cost_ladder_halted reasoning_error supersedes; preserves lowercase Enum.value pin from #132/#141.
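A sketch of substring-based rung derivation. The rung values below, including the halted value, are hypothetical: the real CostLadderRungName members are not listed in this thread, apart from the lowercase "unknown" already used by the panel.

```python
# Hypothetical rung values; only "unknown" is confirmed by this thread.
FAMILY_TO_RUNG = (("opus", "opus"), ("sonnet", "sonnet"), ("haiku", "haiku"))
HALTED_RUNG = "halted"   # hypothetical
UNKNOWN_RUNG = "unknown"


def derive_cost_rung(model_used, reasoning_error=None):
    """Map a model name to a lowercase rung value; a cost-ladder halt wins."""
    if reasoning_error == "cost_ladder_halted":
        return HALTED_RUNG
    name = (model_used or "").lower()
    for family, rung in FAMILY_TO_RUNG:
        if family in name:  # substring match tolerates version suffixes
            return rung
    return UNKNOWN_RUNG
```

Matching on the family substring rather than an exact model id is what lets a future model revision map without a code change.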

Graceful degradation: when xref fails per-group, row renders with null fields — identical shape to #141 pre-xref output so FE handles both with no conditional logic.

Security carve-out: response_text_truncated_200 is intentionally Joshua-content (carved out from PII sweep, same pattern as #141 message_id and slack_dm panel text).

384/384 admin-panel tests pass across 28 suites.
rafe-walker added a commit that referenced this pull request May 23, 2026
…ound (#148)

CC#2 follow-on, unblocked once CC#1's KR-EMAIL-OUTBOUND-REASONING-META (#146) closed the gap her STOP-ASK caught.

- 2 files, +737/-7: extension to reasoning_xref.py (email path: loader + parser + 3-tier matcher) + 20 new email-specific tests.

All 3 K-DG gates verified before drafting per re-dispatch: send_email kwargs ✓; opt-in writer ✓; caller_session_id literal format symmetry between handler + engine ✓.

3-tier cascade: PRIMARY caller_session_id literal equality (closed by #146) → SECONDARY in_reply_to chain → LAST RESORT ±60s timestamp window.
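A sketch of the three-tier cascade for email rows. Assumptions: outbound email entries carry `caller_session_id`, `in_reply_to`, and `sent_at` (epoch seconds), and the in_reply_to tier follows a single hop rather than a full chain. Returning the tier name alongside the match is one way to carry the `xref_source` value mentioned below.

```python
def match_email_outbound(session_id, inbound_message_id, started_at, outbound, window_s=60.0):
    """Return (entry, tier) for one reasoning group, or (None, None)."""
    # PRIMARY: literal caller_session_id equality.
    for entry in outbound:
        if entry.get("caller_session_id") == session_id:
            return entry, "caller_session_id"
    # SECONDARY: the outbound mail replies to the inbound message (single hop).
    if inbound_message_id:
        for entry in outbound:
            if entry.get("in_reply_to") == inbound_message_id:
                return entry, "in_reply_to"
    # LAST RESORT: closest send time within +/- window_s.
    timed = [
        e for e in outbound
        if isinstance(e.get("sent_at"), (int, float)) and abs(e["sent_at"] - started_at) <= window_s
    ]
    if timed:
        best = min(timed, key=lambda e: abs(e["sent_at"] - started_at))
        return best, "time_window"
    return None, None
```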

Slack-first precedence preserved: existing #141/#143 tests (42/42) still pass without modification.

response_text carve-out for email-sourced rows: stays null per #124 design (body never in email JSONL); same shape as slack_dm text + #143 message_id carve-outs. Tracked via xref_source local so the conditional null-set cannot regress to populating from a future field rename.

400/400 admin-panel + audit tests pass across 29 suites.