This repository was archived by the owner on May 26, 2026. It is now read-only.

feat(kora): KR-AUDIT-JSONL-SINK — JSONL bridge for 5 audit seams#139

Merged
rafe-walker merged 1 commit into feature/phase2-upgrades from feat/kora-KR-AUDIT-JSONL-SINK
May 23, 2026


Conversation

@rafe-walker

Owner

Summary

Bridge bucket between today's structured-log audit lines and the future substrate-backed audit. Promotes the 4 audit emitters (mcp.tool_called, the seam name shared by MCP read and mutating tools; webhook.dead_letter; slack_dm.reply_failed; reasoning.tool_called) to ALSO write JSONL rows that operator panels can consume programmatically.

Bucket spec: `kora_docs/17_cc_bucket_prompts/KR-AUDIT-JSONL-SINK_bridge_to_substrate.md`.

Base: `feature/phase2-upgrades` — NOT main.

New module

`kora_cli/audit/jsonl_sink.py` (~210 LOC) — `AuditEntry` Pydantic model with `extra="forbid"` (catches schema drift across emit sites) + `emit_audit()` JSONL-only writer (best-effort; OSError WARN + continue). Path resolution via `kora_constants.get_kora_home()` (KORA_HOME primary + legacy HERMES_HOME fallback).

Dual-write architecture

Each caller retains its existing `[kora.<seam>]` structured-log line VERBATIM AND calls `emit_audit()` afterward to write the JSONL row. PM's "no breaking change" constraint is preserved byte-for-byte across all 4 emit sites, so operator grep workflows that targeted the prior shapes keep working.

`emit_audit()` is JSONL-only by design — keeps the structured-log format under each caller's control, avoiding format drift across the 4 seams that ship distinct line shapes today.

Refactored emit sites (4 sites covering 5 seam usages)

| Seam | File | Source bucket |
|---|---|---|
| `mcp.tool_called` (mutating) | `kora_cli/listeners/mcp_tools.py:_emit_audit` | KR-MCP-RUNTIME-SURFACE ST2 |
| `webhook.dead_letter` | `kora_cli/listeners/webhook_dead_letter.py:emit_webhook_dead_letter` | KR-D-DAEMON ST3 |
| `slack_dm.reply_failed` | `kora_cli/handlers/slack_dm_handler.py:_emit_reply_failed_event` | KR-FEAT-SLACK-DM ST2 |
| `reasoning.tool_called` | `kora_cli/reasoning/anthropic_engine.py:_emit_tool_called_audit` | KR-FEAT-AGENTIC-REASONING ST2 |

(MCP read tools share the `mcp.tool_called` seam name; the existing emit helper covers mutating tools only — read-tool audit is a deferred follow-on.)

Bug-on-first-pass caught + fixed

Initial draft moved the structured-log emit INTO `emit_audit`'s generic kv-pair builder. That changed the byte-for-byte line format (`tool=X` → `tool_name=X`, field ordering shifted) and broke 9 prior-bucket tests asserting verbatim shape. Restructured to JSONL-only emit_audit + caller-retained structured-log lines. All 339 prior-bucket tests pass unmodified.

Tests (17 new, 481 total all passing)

`test_jsonl_sink.py` (17 tests):

  • AuditEntry shape: minimal / full construction; `extra="forbid"` rejects unknown field; rejects invalid seam / source
  • emit_audit append: parseable JSONL line per call; append-only multi-call; creates parent dir
  • Path resolution: env override / KORA_HOME default / HERMES_HOME fallback
  • Degrade-to-log-only: unwritable path → WARN + return without crashing; invalid seam → defensive log + no JSONL write
  • SECURITY walk-payload sweep (2 tests):
    • Clean batch (4 seams × realistic safe details) passes
    • Polluted batch (Slack token / Anthropic OAuth / Bearer header / email PII) tripped by sweep regexes
  • Per-seam allow-list — exercises all 4 refactored emitters indirectly + verifies JSONL `details` keys are subset of declared per-seam allow-list. Drift catch: any new field added to an emit site requires updating `_SEAM_ALLOWED_KEYS` + security review of the new field's content.
  • Dual-write verification — single emit produces BOTH the verbatim structured-log line AND the JSONL row.

SECURITY — 4-layer carry-forward

Per spec:

  1. `details` filter contract: each caller pre-filters its dict to safe shapes (`args_keys` not values / `body_bytes` not body / `text_len` not text). Same shape preserved from each emit site's pre-existing safe field set.
  2. Walk-payload sweep: regex against token shapes (xoxb / xoxp / xapp / sk-ant-oat / sk-ant / Bearer / AKIA) + PII (email-address). Clean batch passes; polluted batch tripped.
  3. Per-seam allow-list: declared key set; new fields require allow-list update + security review.
  4. No engine input/output bodies: existing audit emitters already excluded these (asserted by prior bucket tests); refactor preserves the boundary.

Operator runbook note

`kora_runtime_first_deploy_runbook.md` extended with new "Operator obligations — JSONL log rotation" section listing the 4 append-only JSONL files, the operator-managed rotation mechanism (logrotate copytruncate / Fly log-tailing), and the disk-full failure mode (`[kora.audit.skipped]` WARN + graceful structured-log degradation).

§4 ship checklist

  • Base `feature/phase2-upgrades`
  • Title per format
  • All 4 emitters refactored; structured-log lines preserved VERBATIM (asserted by 339 prior-bucket tests passing unmodified)
  • `AuditEntry` Pydantic with `extra="forbid"`
  • SECURITY walk-payload sweep over diverse batch passes
  • KORA_HOME / HERMES_HOME fallback verified
  • Operator runbook note added
  • Tests pass locally (481/481)

What unblocks

Per the bucket spec, the JSONL surface unblocks 3 panel flips (small follow-on bucket KR-AUDIT-PANEL-ENDPOINTS):

  • AGENT-ACTIVITY-PANEL — filters to seam in (mcp.tool_called, reasoning.tool_called)
  • REASONING-PANEL — filters to seam=reasoning.tool_called
  • WEBHOOK-EVENTS-PANEL — filters to seam=webhook.dead_letter

When substrate ships the audit-ledger contract (coord ask 2026-05-22), `emit_audit` extends to triple-writer (structured log + JSONL + substrate event_log row); panels continue reading JSONL OR move to substrate — same row shape.

🤖 Generated with Claude Code

Bridge bucket between today's structured-log audit lines and the
future substrate-backed audit. Promotes the 4 audit emitters
(mcp.tool_called, the seam name shared by MCP read and mutating
tools; webhook.dead_letter; slack_dm.reply_failed;
reasoning.tool_called) to ALSO write JSONL rows that operator
panels can consume programmatically.

When substrate-team ships the audit-ledger contract (coord ask
2026-05-22), emit_audit extends to triple-write; panel endpoints
stay reading the same JSONL shape OR pivot to substrate reads.

## New module

**`kora_cli/audit/`** (NEW package):

- **`jsonl_sink.py`** (~210 lines) — `AuditEntry` Pydantic model
  with `extra="forbid"` (catches schema drift across emit sites)
  + `emit_audit()` JSONL-only writer (best-effort; OSError WARN +
  continue). Uses `kora_constants.get_kora_home()` path resolution
  (KORA_HOME primary + legacy HERMES_HOME fallback via the same
  pattern as slack_dm_log.jsonl).
- **`__init__.py`** — re-exports `AuditEntry` + `emit_audit`.
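
A minimal sketch of the best-effort writer described above, using only the standard library. The function signature, the `ts` field, the `~/.kora` default, and the set-membership check (standing in for the `AuditEntry` Pydantic validation) are assumptions for illustration, not the module's actual code; the file name `kora_audit_log.jsonl` and the `[kora.audit.skipped]` WARN come from this description.

```python
import json
import logging
import os
from datetime import datetime, timezone
from pathlib import Path

logger = logging.getLogger("kora.audit")

# Assumption: the real module validates via the AuditEntry Pydantic model;
# a plain membership check stands in for that here.
KNOWN_SEAMS = {
    "mcp.tool_called",
    "webhook.dead_letter",
    "slack_dm.reply_failed",
    "reasoning.tool_called",
}


def resolve_audit_path() -> Path:
    """KORA_HOME first, then legacy HERMES_HOME, then an assumed default."""
    home = os.environ.get("KORA_HOME") or os.environ.get("HERMES_HOME")
    base = Path(home) if home else Path.home() / ".kora"  # default is hypothetical
    return base / "kora_audit_log.jsonl"


def emit_audit(seam: str, source: str, details: dict) -> bool:
    """Append one JSONL row; never raise. Returns True if a row was written."""
    if seam not in KNOWN_SEAMS:
        logger.warning("[kora.audit.skipped] invalid seam=%s", seam)
        return False
    row = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "seam": seam,
        "source": source,
        "details": details,
    }
    path = resolve_audit_path()
    try:
        path.parent.mkdir(parents=True, exist_ok=True)
        with path.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(row, sort_keys=True) + "\n")
    except OSError as exc:
        # Best-effort: warn and continue so the caller's own log line survives.
        logger.warning("[kora.audit.skipped] write failed: %s", exc)
        return False
    return True
```

Returning a boolean rather than raising keeps a full disk or a bad path from taking down the emit site; whether the real writer returns anything is not stated here.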

## Dual-write architecture

Each caller retains its existing `[kora.<seam>]` structured-log
line **VERBATIM** AND calls `emit_audit()` afterward to write
the JSONL row. PM's "no breaking change" constraint preserved
byte-for-byte across all 4 emit sites — operator grep workflows
that targeted the prior shapes keep working.

`emit_audit()` is JSONL-only by design — keeps the structured-log
format under each caller's control, avoiding format drift across
the 4 seams that ship distinct line shapes today.

## Refactored emit sites

| Seam | File | Source bucket |
|---|---|---|
| `mcp.tool_called` (mutating; read pending follow-on) | `kora_cli/listeners/mcp_tools.py:_emit_audit` | KR-MCP-RUNTIME-SURFACE ST2 |
| `webhook.dead_letter` | `kora_cli/listeners/webhook_dead_letter.py:emit_webhook_dead_letter` | KR-D-DAEMON ST3 |
| `slack_dm.reply_failed` | `kora_cli/handlers/slack_dm_handler.py:_emit_reply_failed_event` | KR-FEAT-SLACK-DM ST2 |
| `reasoning.tool_called` | `kora_cli/reasoning/anthropic_engine.py:_emit_tool_called_audit` | KR-FEAT-AGENTIC-REASONING ST2 |

Each emitter: (1) keeps its existing logger.info/warning call
verbatim, (2) imports `from kora_cli.audit import emit_audit`,
(3) calls emit_audit with the same details it was already
building locally + a seam-shaped `source` literal.

## Bug caught on first pass

Initial draft moved the structured-log emit INTO `emit_audit`'s
generic kv-pair builder. That changed the byte-for-byte line
format (`tool=X` → `tool_name=X`, field ordering shifted) and
broke 9 prior-bucket tests that asserted the verbatim shape.
Restructured to JSONL-only emit_audit + caller-retained
structured-log lines. All 339 prior-bucket tests pass unmodified.

## Tests (17 new, 481 total all passing)

**`test_jsonl_sink.py`** (17 tests):

- **AuditEntry shape**: minimal construction / full construction /
  rejects unknown top-level field (extra="forbid") / rejects
  invalid seam / rejects invalid source
- **emit_audit JSONL append**: parseable JSONL line per call /
  append-only multi-call / creates parent dir
- **Path resolution**: env override / KORA_HOME default /
  HERMES_HOME fallback chain
- **Degrade-to-log-only**: unwritable path → WARN + return without
  crashing / invalid seam → defensive log, no JSONL write, no raise
- **SECURITY walk-payload sweep** (2 tests):
  - Clean batch (4 seams × realistic safe details) passes
  - Polluted batch (Slack token / Anthropic OAuth / Bearer
    header / email PII) tripped by sweep regex
- **Per-seam allow-list test** — exercises all 4 refactored
  emitters indirectly + verifies JSONL `details` keys are subset
  of declared per-seam allow-list. Drift catch: any new field
  added to an emit site must update _SEAM_ALLOWED_KEYS + get a
  security review of the new field's content.
- **Dual-write verification** — single emit produces BOTH the
  verbatim structured-log line + the JSONL row.
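
The per-seam allow-list check reduces to a set difference. In the sketch below, `_SEAM_ALLOWED_KEYS` is the name used in this PR, but its contents and the `unexpected_keys` helper are assumptions made for illustration; only `args_keys`, `body_bytes`, and `text_len` are named in this description.

```python
# Key sets are illustrative; only the names args_keys, body_bytes, and
# text_len appear in this PR description.
_SEAM_ALLOWED_KEYS = {
    "mcp.tool_called": {"tool", "args_keys"},
    "webhook.dead_letter": {"body_bytes"},
    "slack_dm.reply_failed": {"text_len"},
    "reasoning.tool_called": {"tool", "args_keys"},
}


def unexpected_keys(seam: str, details: dict) -> set:
    """Return the details keys not declared for this seam (empty set means OK)."""
    return set(details) - _SEAM_ALLOWED_KEYS.get(seam, set())
```

A test can then assert `unexpected_keys(seam, row["details"]) == set()` for every row an emitter produces, so an undeclared field fails loudly instead of slipping into the log.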

## SECURITY — 4-layer carry-forward

Per spec §2 SECURITY:

1. **`details` filter contract**: each caller pre-filters its
   `details` dict to safe shapes — `args_keys` (sorted key names,
   values dropped) / `body_bytes` (count, not body) /
   `text_len` (length, not text) / etc. Same shape preserved
   from each emit site's pre-existing safe field set.
2. **Walk-payload sweep**: synthetic JSONL batch run against
   token-shape regexes (`xoxb-`, `xoxp-`, `xapp-`, `sk-ant-oat-`,
   `sk-ant-`, `Bearer`, `AKIA[0-9A-Z]{16}`) + PII regex
   (email-address). Clean batch passes; polluted batch tripped.
3. **Per-seam allow-list**: declared key set per seam. Adding a
   new field requires both an allow-list update AND a security
   review.
4. **No engine input/output bodies**: existing audit emitters
   already exclude tool input/output bodies (asserted by prior
   bucket tests); refactor preserves that boundary — only
   `args_keys` (not args values) and `text_len` (not text)
   surface in details.
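
As a rough illustration of layer 2, a walk-payload sweep can recurse through each parsed row and test every string against the token-shape patterns. The prefixes and the `AKIA[0-9A-Z]{16}` pattern are taken from the list above; the exact regex bodies, the email pattern, and the function names are assumptions, not the test suite's actual code.

```python
import json
import re

# Prefixes and the AKIA pattern come from the list above; suffix character
# classes and the email regex are assumptions for this sketch.
_SWEEP_PATTERNS = [
    re.compile(r"xox[bp]-[A-Za-z0-9-]+"),
    re.compile(r"xapp-[A-Za-z0-9-]+"),
    re.compile(r"sk-ant-[A-Za-z0-9_-]+"),  # also matches the sk-ant-oat- prefix
    re.compile(r"Bearer\s+\S+"),
    re.compile(r"AKIA[0-9A-Z]{16}"),
    re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
]


def walk_strings(value):
    """Yield every string (keys and values) nested anywhere in a JSON-like value."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for key, item in value.items():
            yield from walk_strings(key)
            yield from walk_strings(item)
    elif isinstance(value, (list, tuple)):
        for item in value:
            yield from walk_strings(item)


def sweep(jsonl_text: str) -> list:
    """Return the 1-based line numbers of rows containing a secret-shaped string."""
    hits = []
    for lineno, line in enumerate(jsonl_text.splitlines(), start=1):
        row = json.loads(line)
        if any(p.search(s) for s in walk_strings(row) for p in _SWEEP_PATTERNS):
            hits.append(lineno)
    return hits
```

Walking the parsed structure rather than grepping the raw line means escaped or nested values are checked in their decoded form; the polluted batch acts as the negative control showing the patterns actually fire.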

## Operator runbook note

`kora_runtime_first_deploy_runbook.md` extended with a new
"Operator obligations — JSONL log rotation" section listing:

- The 4 append-only JSONL files (slack_dm + email outbound +
  email inbound + kora_audit_log)
- Daemon does NOT auto-rotate; operator manages via logrotate
  copytruncate / Fly log-tailing / periodic ssh-rotate
- Disk-full failure mode (`[kora.audit.skipped]` WARN +
  TCP healthcheck red); structured-log lines still emit for
  graceful degradation
- When substrate-audit lands → JSONL becomes
  debug/forensics-only

## §4 ship checklist

- [x] PR base = `feature/phase2-upgrades`
- [x] Title format `feat(kora): KR-AUDIT-JSONL-SINK — JSONL bridge for 5 audit seams`
- [x] All 4 emitters refactored; existing structured-log lines preserved VERBATIM (asserted by 339 prior-bucket tests passing unmodified)
- [x] AuditEntry Pydantic with `extra="forbid"` (asserted by test)
- [x] SECURITY walk-payload sweep over diverse batch passes (clean + polluted negative-control)
- [x] KORA_HOME / HERMES_HOME fallback verified
- [x] Operator runbook note about log rotation added

## After this lands

Per the bucket spec, the JSONL surface unblocks 3 panel flips
(small follow-on bucket KR-AUDIT-PANEL-ENDPOINTS):

- **AGENT-ACTIVITY-PANEL** flip — reads kora_audit_log.jsonl,
  filters to seam in ("mcp.tool_called", "reasoning.tool_called")
- **REASONING-PANEL** flip — filters to seam="reasoning.tool_called"
- **WEBHOOK-EVENTS-PANEL** flip — filters to seam="webhook.dead_letter"

When substrate ships the audit-ledger contract (coord ask
2026-05-22), `emit_audit` extends to triple-writer (structured
log + JSONL + substrate event_log row); panels can continue
reading JSONL OR move to substrate reads — same row shape.
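
A panel endpoint's read side could be as small as the following sketch. The `read_seam_rows` helper is hypothetical and assumes each row carries a top-level `seam` field, as the filters above imply; tolerating malformed lines is a design assumption for an append-only file that may be read mid-write.

```python
import json
from pathlib import Path


def read_seam_rows(path: Path, seams: set) -> list:
    """Return parsed rows whose "seam" is in `seams`, skipping malformed lines."""
    rows = []
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            try:
                row = json.loads(line)
            except json.JSONDecodeError:
                continue  # e.g. a partially written last line
            if isinstance(row, dict) and row.get("seam") in seams:
                rows.append(row)
    return rows
```

For example, the AGENT-ACTIVITY-PANEL filter would pass `{"mcp.tool_called", "reasoning.tool_called"}` as `seams`.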

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@rafe-walker rafe-walker merged commit 5416fdc into feature/phase2-upgrades May 23, 2026
@rafe-walker rafe-walker deleted the feat/kora-KR-AUDIT-JSONL-SINK branch May 23, 2026 01:13