This repository was archived by the owner on May 26, 2026. It is now read-only.
feat(kora): KR-AUDIT-JSONL-SINK — JSONL bridge for 5 audit seams#139
Merged
rafe-walker merged 1 commit on May 23, 2026
Bridge bucket between today's structured-log audit lines and the
future substrate-backed audit. Promotes the 4 audit emitters
(mcp.tool_called, the seam name shared by MCP read + mutating tools,
with only the mutating path emitting today; webhook.dead_letter;
slack_dm.reply_failed; reasoning.tool_called) to ALSO write JSONL
rows that operator panels can consume programmatically.
When substrate-team ships the audit-ledger contract (coord ask
2026-05-22), emit_audit extends to triple-write; panel endpoints
stay reading the same JSONL shape OR pivot to substrate reads.
## New module
**`kora_cli/audit/`** (NEW package):
- **`jsonl_sink.py`** (~210 lines) — `AuditEntry` Pydantic model
with `extra="forbid"` (catches schema drift across emit sites)
+ `emit_audit()` JSONL-only writer (best-effort; OSError WARN +
continue). Uses `kora_constants.get_kora_home()` path resolution
(KORA_HOME primary + legacy HERMES_HOME fallback via the same
pattern as slack_dm_log.jsonl).
- **`__init__.py`** — re-exports `AuditEntry` + `emit_audit`.
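
A minimal sketch of what an `extra="forbid"` audit model can look like, assuming Pydantic v2. Only the four seam names and the `seam` / `source` / `details` vocabulary come from this PR; the `ts` field, the exact field set, and the plain-string `source` (the real model also rejects invalid sources) are assumptions for illustration, not the contents of `jsonl_sink.py`.

```python
from datetime import datetime, timezone
from typing import Any, Literal

from pydantic import BaseModel, ConfigDict, Field, ValidationError

# Seam names are from the PR; everything else about the shape is assumed.
Seam = Literal[
    "mcp.tool_called",
    "webhook.dead_letter",
    "slack_dm.reply_failed",
    "reasoning.tool_called",
]


class AuditEntry(BaseModel):
    # extra="forbid" turns any unknown top-level field into a validation error,
    # which is what surfaces schema drift between emit sites.
    model_config = ConfigDict(extra="forbid")

    ts: datetime = Field(default_factory=lambda: datetime.now(timezone.utc))  # assumed field
    seam: Seam
    source: str  # simplified: plain string here
    details: dict[str, Any] = Field(default_factory=dict)


entry = AuditEntry(seam="webhook.dead_letter", source="webhook", details={"body_bytes": 512})
line = entry.model_dump_json()  # one JSON object, suitable for one JSONL row

try:
    # `request_body` is not a declared field, so construction fails.
    AuditEntry(seam="webhook.dead_letter", source="webhook", request_body="oops")
    drift_rejected = False
except ValidationError:
    drift_rejected = True
```

With this shape, an emit site that starts passing a new top-level field fails loudly in tests instead of silently widening the row format.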
## Dual-write architecture
Each caller retains its existing `[kora.<seam>]` structured-log
line **VERBATIM** AND calls `emit_audit()` afterward to write
the JSONL row. PM's "no breaking change" constraint preserved
byte-for-byte across all 4 emit sites — operator grep workflows
that targeted the prior shapes keep working.
`emit_audit()` is JSONL-only by design — keeps the structured-log
format under each caller's control, avoiding format drift across
the 4 seams that ship distinct line shapes today.
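
The dual-write pattern can be sketched with the standard library alone. The code below is an illustration under assumptions, not the merged code: `audit_log_path`, `on_tool_called`, the `~/.kora` default, the `mcp_tools` source value, and the exact log-line text are hypothetical, while the `KORA_HOME` / `HERMES_HOME` order, the `kora_audit_log.jsonl` file name, and the `[kora.audit.skipped]` warning tag follow the PR text.

```python
import json
import logging
import os
import tempfile
from pathlib import Path

logger = logging.getLogger("kora")


def audit_log_path() -> Path:
    # Assumed resolution order: KORA_HOME, then legacy HERMES_HOME, then ~/.kora.
    home = os.environ.get("KORA_HOME") or os.environ.get("HERMES_HOME")
    base = Path(home) if home else Path.home() / ".kora"
    return base / "kora_audit_log.jsonl"


def emit_audit(seam: str, source: str, details: dict) -> bool:
    """Append one JSONL row; log a warning instead of raising on I/O failure."""
    row = {"seam": seam, "source": source, "details": details}
    path = audit_log_path()
    try:
        path.parent.mkdir(parents=True, exist_ok=True)
        with path.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(row, sort_keys=True) + "\n")
        return True
    except OSError as exc:
        logger.warning("[kora.audit.skipped] seam=%s error=%s", seam, exc)
        return False


def on_tool_called(tool: str, args: dict) -> None:
    # (1) The caller keeps its own structured-log line (shape is illustrative).
    logger.info("[kora.mcp.tool_called] tool=%s", tool)
    # (2) Then the JSONL row: key names only, argument values are dropped.
    emit_audit("mcp.tool_called", "mcp_tools", {"tool": tool, "args_keys": sorted(args)})


# Usage: point KORA_HOME at a scratch directory and emit one event.
os.environ["KORA_HOME"] = tempfile.mkdtemp()
on_tool_called("create_ticket", {"title": "x", "body": "y"})
rows = [
    json.loads(line)
    for line in audit_log_path().read_text(encoding="utf-8").splitlines()
]
```

Because the writer never touches the log line, each seam's existing format stays owned by its caller.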
## Refactored emit sites
| Seam | File | Source bucket |
|---|---|---|
| `mcp.tool_called` (mutating; read pending follow-on) | `kora_cli/listeners/mcp_tools.py:_emit_audit` | KR-MCP-RUNTIME-SURFACE ST2 |
| `webhook.dead_letter` | `kora_cli/listeners/webhook_dead_letter.py:emit_webhook_dead_letter` | KR-D-DAEMON ST3 |
| `slack_dm.reply_failed` | `kora_cli/handlers/slack_dm_handler.py:_emit_reply_failed_event` | KR-FEAT-SLACK-DM ST2 |
| `reasoning.tool_called` | `kora_cli/reasoning/anthropic_engine.py:_emit_tool_called_audit` | KR-FEAT-AGENTIC-REASONING ST2 |
Each emitter: (1) keeps its existing logger.info/warning call
verbatim, (2) imports `from kora_cli.audit import emit_audit`,
(3) calls emit_audit with the same details it was already
building locally + a seam-shaped `source` literal.
## Bug-on-first-pass caught + fixed
Initial draft moved the structured-log emit INTO `emit_audit`'s
generic kv-pair builder. That changed the byte-for-byte line
format (`tool=X` → `tool_name=X`, field ordering shifted) and
broke 9 prior-bucket tests that asserted the verbatim shape.
Restructured to JSONL-only emit_audit + caller-retained
structured-log lines. All 339 prior-bucket tests pass unmodified.
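
To make the failure mode concrete, here is a small sketch of how routing a line through a generic key=value builder changes its bytes. Both functions and the `ok` field are hypothetical stand-ins; only the `tool=X` → `tool_name=X` rename and the shifted field order are taken from the description above.

```python
def verbatim_line(tool: str, ok: bool) -> str:
    # Hypothetical pre-existing shape: fixed field names in a fixed order.
    return f"[kora.mcp.tool_called] tool={tool} ok={ok}"


def generic_kv_line(seam: str, details: dict) -> str:
    # Generic builder: names come from the dict keys, order from sorting them.
    pairs = " ".join(f"{key}={value}" for key, value in sorted(details.items()))
    return f"[kora.{seam}] {pairs}"


before = verbatim_line("search", True)
after = generic_kv_line("mcp.tool_called", {"tool_name": "search", "ok": True})

# An operator grep (or a test) pinned to the old shape no longer matches.
old_grep_still_matches = "tool=search" in after
```

Here `before` and `after` describe the same event but differ in both field name and order, which is the kind of change a verbatim-shape assertion is there to catch.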
## Tests (17 new; 481 total, all passing)
**`test_jsonl_sink.py`** (17 tests):
- **AuditEntry shape**: minimal construction / full construction /
rejects unknown top-level field (extra="forbid") / rejects
invalid seam / rejects invalid source
- **emit_audit JSONL append**: parseable JSONL line per call /
append-only multi-call / creates parent dir
- **Path resolution**: env override / KORA_HOME default /
HERMES_HOME fallback chain
- **Degrade-to-log-only**: unwritable path → WARN + return, no
crash / invalid seam → defensive log, no JSONL write, no raise
- **SECURITY walk-payload sweep** (2 tests):
- Clean batch (4 seams × realistic safe details) passes
- Polluted batch (Slack token / Anthropic OAuth / Bearer
header / email PII) trips the sweep regex
- **Per-seam allow-list test**: exercises all 4 refactored
emitters indirectly + verifies that JSONL `details` keys are a subset
of the declared per-seam allow-list. Drift catch: any new field
added to an emit site must update `_SEAM_ALLOWED_KEYS` + get a
security review of the new field's content.
- **Dual-write verification** — single emit produces BOTH the
verbatim structured-log line + the JSONL row.
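
The per-seam allow-list check reduces to a set difference. In the sketch below the name `_SEAM_ALLOWED_KEYS` is from this PR, but the key sets, the `unexpected_keys` helper, and the sample rows are assumptions chosen for illustration.

```python
# Name taken from the PR; the key sets themselves are illustrative assumptions.
_SEAM_ALLOWED_KEYS = {
    "mcp.tool_called": {"tool", "args_keys"},
    "webhook.dead_letter": {"body_bytes"},
    "slack_dm.reply_failed": {"text_len"},
    "reasoning.tool_called": {"tool", "args_keys"},
}


def unexpected_keys(row: dict) -> set:
    """Return the `details` keys that the row's seam does not declare."""
    allowed = _SEAM_ALLOWED_KEYS.get(row["seam"], set())
    return set(row["details"]) - allowed


rows = [
    {"seam": "webhook.dead_letter", "details": {"body_bytes": 512}},
    # Drift: a new field appeared at the emit site without an allow-list update.
    {"seam": "slack_dm.reply_failed", "details": {"text_len": 40, "text": "hi"}},
]

drifted = [
    (row["seam"], sorted(unexpected_keys(row)))
    for row in rows
    if unexpected_keys(row)
]
```

A test built this way fails as soon as an emitter adds a key, which forces the allow-list update and the accompanying review to happen in the same change.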
## SECURITY — 4-layer carry-forward
Per spec §2 SECURITY:
1. **`details` filter contract**: each caller pre-filters its
`details` dict to safe shapes — `args_keys` (sorted key names,
values dropped) / `body_bytes` (count, not body) /
`text_len` (length, not text) / etc. Same shape preserved
from each emit site's pre-existing safe field set.
2. **Walk-payload sweep**: synthetic JSONL batch run against
token-shape regexes (`xoxb-`, `xoxp-`, `xapp-`, `sk-ant-oat-`,
`sk-ant-`, `Bearer`, `AKIA[0-9A-Z]{16}`) + PII regex
(email-address). Clean batch passes; polluted batch tripped.
3. **Per-seam allow-list**: declared key set per seam. Adding a
new field requires both an allow-list update AND a security
review of that field.
4. **No engine input/output bodies**: existing audit emitters
already exclude tool input/output bodies (asserted by prior
bucket tests); refactor preserves that boundary — only
`args_keys` (not args values) and `text_len` (not text)
surface in details.
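
A walk-payload sweep along the lines of layer 2 might look like the following sketch. The token prefixes mirror the list above; the email pattern, the `walk_strings` and `sweep` helpers, and both sample batches are assumptions, and the real test's regexes may be stricter.

```python
import re

# Token shapes from the list above plus a deliberately simple email pattern.
_SWEEP = re.compile(
    r"xoxb-|xoxp-|xapp-|sk-ant-oat-|sk-ant-|Bearer\s|AKIA[0-9A-Z]{16}"
    r"|[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"
)


def walk_strings(value):
    """Yield every string in a nested dict/list payload, dict keys included."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for key, item in value.items():
            yield from walk_strings(key)
            yield from walk_strings(item)
    elif isinstance(value, (list, tuple)):
        for item in value:
            yield from walk_strings(item)


def sweep(rows):
    """Return the rows in which any string matches a token or PII pattern."""
    return [row for row in rows if any(_SWEEP.search(s) for s in walk_strings(row))]


clean = [
    {"seam": "webhook.dead_letter", "details": {"body_bytes": 512}},
    {"seam": "mcp.tool_called", "details": {"args_keys": ["body", "title"]}},
]
polluted = [  # obviously fake values, used as a negative control
    {"seam": "slack_dm.reply_failed", "details": {"text": "token xoxb-123"}},
    {"seam": "mcp.tool_called", "details": {"header": "Bearer abc"}},
    {"seam": "webhook.dead_letter", "details": {"user": "ops@example.com"}},
]

clean_hits = sweep(clean)
polluted_hits = sweep(polluted)
```

Running both batches matters: the clean batch shows the emitters stay quiet, and the polluted batch proves the sweep would actually fire if a secret or address slipped into `details`.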
## Operator runbook note
`kora_runtime_first_deploy_runbook.md` extended with a new
"Operator obligations — JSONL log rotation" section listing:
- The 4 append-only JSONL files (slack_dm + email outbound +
email inbound + kora_audit_log)
- Daemon does NOT auto-rotate; operator manages via logrotate
copytruncate / Fly log-tailing / periodic ssh-rotate
- Disk-full failure mode (`[kora.audit.skipped]` WARN +
TCP healthcheck red); structured-log lines still emit for
graceful degradation
- When substrate-audit lands → JSONL becomes
debug/forensics-only
## §4 ship checklist
- [x] PR base = `feature/phase2-upgrades`
- [x] Title format `feat(kora): KR-AUDIT-JSONL-SINK — JSONL bridge for 5 audit seams`
- [x] All 4 emitters refactored; existing structured-log lines preserved VERBATIM (asserted by 339 prior-bucket tests passing unmodified)
- [x] AuditEntry Pydantic with `extra="forbid"` (asserted by test)
- [x] SECURITY walk-payload sweep over diverse batch passes (clean + polluted negative-control)
- [x] KORA_HOME / HERMES_HOME fallback verified
- [x] Operator runbook note about log rotation added
## After this lands
Per the bucket spec, the JSONL surface unblocks 3 panel flips
(small follow-on bucket KR-AUDIT-PANEL-ENDPOINTS):
- **AGENT-ACTIVITY-PANEL** flip — reads kora_audit_log.jsonl,
filters to seam in ("mcp.tool_called", "reasoning.tool_called")
- **REASONING-PANEL** flip — filters to seam="reasoning.tool_called"
- **WEBHOOK-EVENTS-PANEL** flip — filters to seam="webhook.dead_letter"
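
The panel filters listed above amount to a line-by-line scan of the JSONL file. This is a sketch under assumptions: `filter_rows` is a hypothetical helper, the sample rows are invented, and skipping unparseable lines is a design choice made here rather than confirmed panel behavior.

```python
import json
from typing import Iterable, Iterator


def filter_rows(lines: Iterable[str], seams: set) -> Iterator[dict]:
    """Yield parsed rows whose `seam` is in `seams`, skipping unusable lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            row = json.loads(line)
        except json.JSONDecodeError:
            continue  # e.g. a partially written final line
        if isinstance(row, dict) and row.get("seam") in seams:
            yield row


sample = [
    '{"seam": "mcp.tool_called", "details": {"tool": "search"}}',
    '{"seam": "webhook.dead_letter", "details": {"body_bytes": 512}}',
    '{"seam": "reasoning.tool_called", "details": {"tool": "lookup"}}',
    '{"seam": "reasoning.tool_ca',  # truncated line
]

agent_activity = list(filter_rows(sample, {"mcp.tool_called", "reasoning.tool_called"}))
webhook_events = list(filter_rows(sample, {"webhook.dead_letter"}))
```

In practice `lines` would be an open handle on `kora_audit_log.jsonl`, so the file is streamed rather than loaded whole.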
When substrate ships the audit-ledger contract (coord ask
2026-05-22), `emit_audit` extends to triple-writer (structured
log + JSONL + substrate event_log row); panels can continue
reading JSONL OR move to substrate reads — same row shape.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Bucket spec: `kora_docs/17_cc_bucket_prompts/KR-AUDIT-JSONL-SINK_bridge_to_substrate.md`.
Base: `feature/phase2-upgrades` — NOT main.
Refactored emit sites (4 sites covering 5 seam usages)
(MCP read tools share the `mcp.tool_called` seam name; the existing emit helper covers mutating tools only — read-tool audit is a deferred follow-on.)
🤖 Generated with Claude Code