feat(kora): KR-AUDIT-JSONL-SINK — JSONL bridge for 5 audit seams by rafe-walker · Pull Request #139 · rafe-walker/kora

rafe-walker · 2026-05-23T01:11:17Z

Summary

Bridge bucket between today's structured-log audit lines and the future substrate-backed audit. Promotes the 4 audit emitters (mcp.tool_called covers MCP read + mutating; webhook.dead_letter; slack_dm.reply_failed; reasoning.tool_called) to ALSO write JSONL rows that operator panels can consume programmatically.

Bucket spec: `kora_docs/17_cc_bucket_prompts/KR-AUDIT-JSONL-SINK_bridge_to_substrate.md`.

Base: `feature/phase2-upgrades` — NOT main.

New module

`kora_cli/audit/jsonl_sink.py` (~210 LOC) — `AuditEntry` Pydantic model with `extra="forbid"` (catches schema drift across emit sites) + `emit_audit()` JSONL-only writer (best-effort; OSError WARN + continue). Path resolution via `kora_constants.get_kora_home()` (KORA_HOME primary + legacy HERMES_HOME fallback).
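A minimal, stdlib-only sketch of the sink's shape follows. The real module uses a Pydantic model with `extra="forbid"`; the frozen dataclass here is a stand-in that rejects unknown fields in a similar way (it does not validate `source`). The field names (`ts`, `seam`, `source`, `details`), the `resolve_audit_path` helper, the `emit_audit` signature and boolean return, the `~/.kora` default, and the file sitting directly under the home directory are all assumptions for illustration. Only the seam names, `kora_audit_log.jsonl`, the two environment variable names, and the `[kora.audit.skipped]` tag come from this PR.

```python
import json
import logging
import os
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Dict, Optional

logger = logging.getLogger("kora.audit")

SEAMS = frozenset({
    "mcp.tool_called",
    "webhook.dead_letter",
    "slack_dm.reply_failed",
    "reasoning.tool_called",
})


@dataclass(frozen=True)
class AuditEntry:
    """Stand-in for the Pydantic model; unknown kwargs raise TypeError."""
    seam: str
    source: str
    details: Dict[str, Any] = field(default_factory=dict)
    ts: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())

    def __post_init__(self) -> None:
        if self.seam not in SEAMS:
            raise ValueError(f"invalid seam: {self.seam!r}")


def resolve_audit_path() -> Path:
    # KORA_HOME first, legacy HERMES_HOME second; the ~/.kora default is assumed.
    home = os.environ.get("KORA_HOME") or os.environ.get("HERMES_HOME")
    base = Path(home) if home else Path.home() / ".kora"
    return base / "kora_audit_log.jsonl"


def emit_audit(seam: str, source: str, details: Dict[str, Any],
               path: Optional[Path] = None) -> bool:
    """Append one JSON line, best-effort: warn and return False on failure."""
    try:
        entry = AuditEntry(seam=seam, source=source, details=details)
    except ValueError as exc:
        # Invalid seam: defensive log, no JSONL write, no raise.
        logger.warning("[kora.audit.skipped] %s", exc)
        return False
    target = path if path is not None else resolve_audit_path()
    try:
        target.parent.mkdir(parents=True, exist_ok=True)
        with target.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(asdict(entry), sort_keys=True) + "\n")
    except OSError as exc:
        # Unwritable path or full disk: warn and let the caller continue.
        logger.warning("[kora.audit.skipped] path=%s error=%s", target, exc)
        return False
    return True
```

Opening in append mode per call keeps the writer stateless, which is one plausible reason operator-managed rotation (such as copytruncate) can work without the daemon reopening a handle.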

Dual-write architecture

Each caller retains its existing `[kora.<seam>]` structured-log line VERBATIM AND calls `emit_audit()` afterward to write the JSONL row. PM's "no breaking change" constraint is preserved byte-for-byte across all 4 emit sites, so operator grep workflows that targeted the prior shapes keep working.

`emit_audit()` is JSONL-only by design — keeps the structured-log format under each caller's control, avoiding format drift across the 4 seams that ship distinct line shapes today.

Refactored emit sites (4 sites covering 5 seam usages)

| Seam | File | Source bucket |
|---|---|---|
| `mcp.tool_called` (mutating) | `kora_cli/listeners/mcp_tools.py:_emit_audit` | KR-MCP-RUNTIME-SURFACE ST2 |
| `webhook.dead_letter` | `kora_cli/listeners/webhook_dead_letter.py:emit_webhook_dead_letter` | KR-D-DAEMON ST3 |
| `slack_dm.reply_failed` | `kora_cli/handlers/slack_dm_handler.py:_emit_reply_failed_event` | KR-FEAT-SLACK-DM ST2 |
| `reasoning.tool_called` | `kora_cli/reasoning/anthropic_engine.py:_emit_tool_called_audit` | KR-FEAT-AGENTIC-REASONING ST2 |

(MCP read tools share the `mcp.tool_called` seam name; the existing emit helper covers mutating tools only — read-tool audit is a deferred follow-on.)

Bug-on-first-pass caught + fixed

Initial draft moved the structured-log emit INTO `emit_audit`'s generic kv-pair builder. That changed the byte-for-byte line format (`tool=X` → `tool_name=X`, field ordering shifted) and broke 9 prior-bucket tests asserting verbatim shape. Restructured to JSONL-only emit_audit + caller-retained structured-log lines. All 339 prior-bucket tests pass unmodified.
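To illustrate why the first pass broke verbatim-shape tests, the sketch below contrasts a hand-written line with one produced by a generic key=value builder. Both functions and the `ok` field are hypothetical; only the `tool=X` to `tool_name=X` rename and the ordering shift are taken from the description above.

```python
from typing import Any, Dict


def legacy_line(tool: str, ok: bool) -> str:
    # Hand-written shape that prior tests assert on (the ok field is assumed).
    return f"[kora.mcp.tool_called] tool={tool} ok={ok}"


def generic_line(seam: str, details: Dict[str, Any]) -> str:
    # Generic builder: keys come straight from the details dict, sorted by name.
    kv = " ".join(f"{key}={value}" for key, value in sorted(details.items()))
    return f"[kora.{seam}] {kv}"


before = legacy_line("create_ticket", True)
after = generic_line("mcp.tool_called", {"tool_name": "create_ticket", "ok": True})
# The key is renamed and the field order changes, so a byte-for-byte
# assertion on the old line fails even though the information is the same.
```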

Tests (17 new, 481 total all passing)

`test_jsonl_sink.py` (17 tests):

- AuditEntry shape: minimal / full construction; `extra="forbid"` rejects unknown field; rejects invalid seam / source
- emit_audit append: parseable JSONL line per call; append-only multi-call; creates parent dir
- Path resolution: env override / KORA_HOME default / HERMES_HOME fallback
- Degrade-to-log-only: unwritable path → WARN + return, no crash; invalid seam → defensive log + no JSONL write
- SECURITY walk-payload sweep (2 tests):
  - Clean batch (4 seams × realistic safe details) passes
  - Polluted batch (Slack token / Anthropic OAuth / Bearer header / email PII) tripped by sweep regexes
- Per-seam allow-list: exercises all 4 refactored emitters indirectly + verifies JSONL `details` keys are a subset of the declared per-seam allow-list. Drift catch: any new field added to an emit site requires updating `_SEAM_ALLOWED_KEYS` + security review of the new field's content.
- Dual-write verification: single emit produces BOTH the verbatim structured-log line AND the JSONL row.
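The per-seam allow-list check could be sketched roughly as follows. `_SEAM_ALLOWED_KEYS` is the name used in this PR, but the key sets assigned to each seam here and the `unexpected_keys` helper are assumptions for illustration only.

```python
from typing import Any, Dict, FrozenSet, Set

# Hypothetical key sets; the real allow-list lives next to the emitters.
_SEAM_ALLOWED_KEYS: Dict[str, FrozenSet[str]] = {
    "mcp.tool_called": frozenset({"tool", "args_keys"}),
    "webhook.dead_letter": frozenset({"body_bytes"}),
    "slack_dm.reply_failed": frozenset({"text_len"}),
    "reasoning.tool_called": frozenset({"tool", "args_keys"}),
}


def unexpected_keys(seam: str, details: Dict[str, Any]) -> Set[str]:
    """Return the details keys that are not declared for this seam."""
    return set(details) - _SEAM_ALLOWED_KEYS[seam]
```

A test can then assert that `unexpected_keys(...)` is empty for every row an emitter produces, so an undeclared field fails fast and prompts a review.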

SECURITY — 4-layer carry-forward

Per spec:

1. `details` filter contract: each caller pre-filters its dict to safe shapes (`args_keys` not values / `body_bytes` not body / `text_len` not text). Same shape preserved from each emit site's pre-existing safe field set.
2. Walk-payload sweep: regex against token shapes (xoxb / xoxp / xapp / sk-ant-oat / sk-ant / Bearer / AKIA) + PII (email-address). Clean batch passes; polluted batch tripped.
3. Per-seam allow-list: declared key set; new fields require allow-list update + security review.
4. No engine input/output bodies: existing audit emitters already excluded these (asserted by prior bucket tests); refactor preserves the boundary.
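A walk-payload sweep along the lines of layer 2 might be sketched as below. The token prefixes mirror the list above, but the exact regexes (especially the `Bearer` and email patterns), the pattern names, and both helper functions are assumptions, not the ones in `test_jsonl_sink.py`.

```python
import re
from typing import Any, Dict, Iterable, Iterator, List, Tuple

PATTERNS: Dict[str, "re.Pattern[str]"] = {
    "slack_bot": re.compile(r"xoxb-"),
    "slack_user": re.compile(r"xoxp-"),
    "slack_app": re.compile(r"xapp-"),
    "anthropic_oauth": re.compile(r"sk-ant-oat"),
    "anthropic_key": re.compile(r"sk-ant-"),
    "bearer": re.compile(r"Bearer\s+\S+"),
    "aws_key": re.compile(r"AKIA[0-9A-Z]{16}"),
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
}


def walk_strings(node: Any) -> Iterator[str]:
    """Yield every string found in nested dicts, lists, and tuples."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for key, value in node.items():
            yield from walk_strings(key)
            yield from walk_strings(value)
    elif isinstance(node, (list, tuple)):
        for item in node:
            yield from walk_strings(item)


def sweep(rows: Iterable[Any]) -> List[Tuple[int, str]]:
    """Return (row_index, pattern_name) for every pattern hit in the batch."""
    hits: List[Tuple[int, str]] = []
    for index, row in enumerate(rows):
        for text in walk_strings(row):
            for name, pattern in PATTERNS.items():
                if pattern.search(text):
                    hits.append((index, name))
    return hits
```

Running the sweep over both a clean and a deliberately polluted batch gives a negative control: if the polluted batch produces no hits, the regexes themselves are suspect.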

Operator runbook note

`kora_runtime_first_deploy_runbook.md` extended with new "Operator obligations — JSONL log rotation" section listing the 4 append-only JSONL files, the operator-managed rotation mechanism (logrotate copytruncate / Fly log-tailing), and the disk-full failure mode (`[kora.audit.skipped]` WARN + graceful structured-log degradation).

§4 ship checklist

- Base `feature/phase2-upgrades`
- Title per format
- All 4 emitters refactored; structured-log lines preserved VERBATIM (asserted by 339 prior-bucket tests passing unmodified)
- `AuditEntry` Pydantic with `extra="forbid"`
- SECURITY walk-payload sweep over diverse batch passes
- KORA_HOME / HERMES_HOME fallback verified
- Operator runbook note added
- Tests pass locally (481/481)

What unblocks

Per the bucket spec, the JSONL surface unblocks 3 panel flips (small follow-on bucket KR-AUDIT-PANEL-ENDPOINTS):

- AGENT-ACTIVITY-PANEL: filters to seam in (mcp.tool_called, reasoning.tool_called)
- REASONING-PANEL: filters to seam=reasoning.tool_called
- WEBHOOK-EVENTS-PANEL: filters to seam=webhook.dead_letter
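A panel endpoint's read side could filter the JSONL file by seam along these lines. The `read_audit_rows` helper and the policy of silently skipping unparseable lines are assumptions; the follow-on bucket may do this differently.

```python
import json
from pathlib import Path
from typing import Any, Dict, Iterable, Iterator


def read_audit_rows(path: Path, seams: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield rows from a JSONL audit log whose seam is in the wanted set."""
    wanted = frozenset(seams)
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            try:
                row = json.loads(line)
            except json.JSONDecodeError:
                # For example a torn line during rotation; skipping is assumed.
                continue
            if isinstance(row, dict) and row.get("seam") in wanted:
                yield row


# Example filter for an agent-activity style view:
AGENT_ACTIVITY_SEAMS = ("mcp.tool_called", "reasoning.tool_called")
```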

When substrate ships the audit-ledger contract (coord ask 2026-05-22), `emit_audit` extends to triple-writer (structured log + JSONL + substrate event_log row); panels continue reading JSONL OR move to substrate — same row shape.

🤖 Generated with Claude Code

Bridge bucket between today's structured-log audit lines and the future substrate-backed audit. Promotes the 4 audit emitters (mcp.tool_called covers MCP read + mutating; webhook.dead_letter; slack_dm.reply_failed; reasoning.tool_called) to ALSO write JSONL rows that operator panels can consume programmatically. When substrate-team ships the audit-ledger contract (coord ask 2026-05-22), emit_audit extends to triple-write; panel endpoints stay reading the same JSONL shape OR pivot to substrate reads.

## New module

**`kora_cli/audit/`** (NEW package):

- **`jsonl_sink.py`** (~210 lines): `AuditEntry` Pydantic model with `extra="forbid"` (catches schema drift across emit sites) + `emit_audit()` JSONL-only writer (best-effort; OSError WARN + continue). Uses `kora_constants.get_kora_home()` path resolution (KORA_HOME primary + legacy HERMES_HOME fallback via the same pattern as slack_dm_log.jsonl).
- **`__init__.py`**: re-exports `AuditEntry` + `emit_audit`.

## Dual-write architecture

Each caller retains its existing `[kora.<seam>]` structured-log line **VERBATIM** AND calls `emit_audit()` afterward to write the JSONL row. PM's "no breaking change" constraint preserved byte-for-byte across all 4 emit sites; operator grep workflows that targeted the prior shapes keep working.

`emit_audit()` is JSONL-only by design: it keeps the structured-log format under each caller's control, avoiding format drift across the 4 seams that ship distinct line shapes today.

## Refactored emit sites

| Seam | File | Source bucket |
|---|---|---|
| `mcp.tool_called` (mutating; read pending follow-on) | `kora_cli/listeners/mcp_tools.py:_emit_audit` | KR-MCP-RUNTIME-SURFACE ST2 |
| `webhook.dead_letter` | `kora_cli/listeners/webhook_dead_letter.py:emit_webhook_dead_letter` | KR-D-DAEMON ST3 |
| `slack_dm.reply_failed` | `kora_cli/handlers/slack_dm_handler.py:_emit_reply_failed_event` | KR-FEAT-SLACK-DM ST2 |
| `reasoning.tool_called` | `kora_cli/reasoning/anthropic_engine.py:_emit_tool_called_audit` | KR-FEAT-AGENTIC-REASONING ST2 |

Each emitter: (1) keeps its existing logger.info/warning call verbatim, (2) imports `from kora_cli.audit import emit_audit`, (3) calls emit_audit with the same details it was already building locally + a seam-shaped `source` literal.

## Pre-bug-on-first-pass caught

Initial draft moved the structured-log emit INTO `emit_audit`'s generic kv-pair builder. That changed the byte-for-byte line format (`tool=X` → `tool_name=X`, field ordering shifted) and broke 9 prior-bucket tests that asserted the verbatim shape. Restructured to JSONL-only emit_audit + caller-retained structured-log lines. All 339 prior-bucket tests pass unmodified.

## Tests (17 new, 481 total all passing)

**`test_jsonl_sink.py`** (17 tests):

- **AuditEntry shape**: minimal construction / full construction / rejects unknown top-level field (extra="forbid") / rejects invalid seam / rejects invalid source
- **emit_audit JSONL append**: parseable JSONL line per call / append-only multi-call / creates parent dir
- **Path resolution**: env override / KORA_HOME default / HERMES_HOME fallback chain
- **Degrade-to-log-only**: unwritable path → WARN + return, no crash / invalid seam → defensive log + no JSONL write, no raise
- **SECURITY walk-payload sweep** (2 tests):
  - Clean batch (4 seams × realistic safe details) passes
  - Polluted batch (Slack token / Anthropic OAuth / Bearer header / email PII) tripped by sweep regex
- **Per-seam allow-list test**: exercises all 4 refactored emitters indirectly + verifies JSONL `details` keys are subset of declared per-seam allow-list. Drift catch: any new field added to an emit site must update _SEAM_ALLOWED_KEYS + get a security review of the new field's content.
- **Dual-write verification**: single emit produces BOTH the verbatim structured-log line + the JSONL row.

## SECURITY — 4-layer carry-forward

Per spec §2 SECURITY:

1. **`details` filter contract**: each caller pre-filters its `details` dict to safe shapes: `args_keys` (sorted key names, values dropped) / `body_bytes` (count, not body) / `text_len` (length, not text) / etc. Same shape preserved from each emit site's pre-existing safe field set.
2. **Walk-payload sweep**: synthetic JSONL batch run against token-shape regexes (`xoxb-`, `xoxp-`, `xapp-`, `sk-ant-oat-`, `sk-ant-`, `Bearer`, `AKIA[0-9A-Z]{16}`) + PII regex (email-address). Clean batch passes; polluted batch tripped.
3. **Per-seam allow-list**: declared key set per seam. Adding a new field requires updating both the allow-list AND the security review.
4. **No engine input/output bodies**: existing audit emitters already exclude tool input/output bodies (asserted by prior bucket tests); refactor preserves that boundary, so only `args_keys` (not args values) and `text_len` (not text) surface in details.

## Operator runbook note

`kora_runtime_first_deploy_runbook.md` extended with a new "Operator obligations — JSONL log rotation" section listing:

- The 4 append-only JSONL files (slack_dm + email outbound + email inbound + kora_audit_log)
- Daemon does NOT auto-rotate; operator manages via logrotate copytruncate / Fly log-tailing / periodic ssh-rotate
- Disk-full failure mode (`[kora.audit.skipped]` WARN + TCP healthcheck red); structured-log lines still emit for graceful degradation
- When substrate-audit lands → JSONL becomes debug/forensics-only

## §4 ship checklist

- [x] PR base = `feature/phase2-upgrades`
- [x] Title format `feat(kora): KR-AUDIT-JSONL-SINK — JSONL bridge for 5 audit seams`
- [x] All 4 emitters refactored; existing structured-log lines preserved VERBATIM (asserted by 339 prior-bucket tests passing unmodified)
- [x] AuditEntry Pydantic with `extra="forbid"` (asserted by test)
- [x] SECURITY walk-payload sweep over diverse batch passes (clean + polluted negative-control)
- [x] KORA_HOME / HERMES_HOME fallback verified
- [x] Operator runbook note about log rotation added

## After this lands

Per the bucket spec, the JSONL surface unblocks 3 panel flips (small follow-on bucket KR-AUDIT-PANEL-ENDPOINTS):

- **AGENT-ACTIVITY-PANEL** flip: reads kora_audit_log.jsonl, filters to seam in ("mcp.tool_called", "reasoning.tool_called")
- **REASONING-PANEL** flip: filters to seam="reasoning.tool_called"
- **WEBHOOK-EVENTS-PANEL** flip: filters to seam="webhook.dead_letter"

When substrate ships the audit-ledger contract (coord ask 2026-05-22), `emit_audit` extends to triple-writer (structured log + JSONL + substrate event_log row); panels can continue reading JSONL OR move to substrate reads, same row shape.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

rafe-walker merged commit 5416fdc into feature/phase2-upgrades May 23, 2026

rafe-walker deleted the feat/kora-KR-AUDIT-JSONL-SINK branch May 23, 2026 01:13

rafe-walker mentioned this pull request May 23, 2026

feat(kora): KR-AUDIT-PANEL-ENDPOINTS — flip 3 stub panels using audit JSONL #141

Merged

6 tasks

rafe-walker merged 1 commit into feature/phase2-upgrades from feat/kora-KR-AUDIT-JSONL-SINK
