This repository was archived by the owner on May 26, 2026. It is now read-only.
feat(kora): KR-FEAT-AGENTIC-REASONING ST2 — audit + system prompt + JSONL extension #136
Merged
rafe-walker merged 1 commit on May 23, 2026
Closes the agentic-reasoning arc. ST1 wired the tool-use loop;
ST2 adds the operator-visible audit surface, updates Kora's
system prompt to make her tool-aware, and extends the Slack-DM
outbound JSONL with the `tools_used` field for cockpit
consumption.
## Three deliverables per spec §2
### 1. Tool-call audit logging
Every tool execution inside the reasoning loop emits one
`[kora.reasoning.tool_called]` structured-log line with:
- `tool_name`
- `triggered_by` — `slack_dm` / `email` / `mcp` (matches
`IncomingMessage.source`)
- `caller_session_id` — source-shaped stable identifier so
operator log-analysis can join reasoning audit lines back
to the inbound JSONL entry that triggered them:
- slack_dm: `"{channel_id}:{event_ts}"`
- email: `"email:{message_id}"`
- mcp: `"mcp:{caller_actor_kind}:{tool_name}"`
- `tool_duration_ms` — engine-side wall-clock per tool call
- `tool_status` — one of `"ok"` / `"not_allowed"` /
`"execution_error"` (with `exc_type=<class>` annotation on
execution_error)
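
The three `caller_session_id` shapes listed above could be derived roughly as follows. This is a minimal sketch, not the code in `anthropic_engine.py`: the metadata key names and the `"unknown"` fallback for missing values are assumptions made for illustration.

```python
from typing import Any, Mapping


def derive_caller_session_id(source: str, metadata: Mapping[str, Any]) -> str:
    """Build a source-shaped identifier for joining audit lines to inbound entries.

    Metadata keys and the "unknown" fallback are hypothetical.
    """
    if source == "slack_dm":
        # Shape from the list above: "{channel_id}:{event_ts}"
        channel_id = metadata.get("channel_id", "unknown")
        event_ts = metadata.get("event_ts", "unknown")
        return f"{channel_id}:{event_ts}"
    if source == "email":
        # Shape: "email:{message_id}"
        return f"email:{metadata.get('message_id', 'unknown')}"
    if source == "mcp":
        # Shape: "mcp:{caller_actor_kind}:{tool_name}"
        kind = metadata.get("caller_actor_kind", "unknown")
        tool = metadata.get("tool_name", "unknown")
        return f"mcp:{kind}:{tool}"
    # Unrecognised source: still return a stable, body-free identifier.
    return f"{source}:unknown"
```

Keeping the helper a pure function of `source` and `metadata` is what makes it easy to cover directly in tests, which matches the module-level exposure described under Engine changes.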
**No substrate write yet** — same audit-seam precedent as
KR-MCP-RUNTIME-SURFACE ST2's `[kora.mcp.tool_called]` and
KR-D-DAEMON ST3's `[kora.webhook.dead_letter]`. The next bucket
(substrate-backed conversation memory + audit) promotes all
three seams together.
**Body-content hygiene**: tool input args + Joshua's inbound
text NEVER appear in audit log lines. Only stable identifiers
(tool name, status codes, IDs) are logged. A test asserts this
with a sensitive marker that lives ONLY in the tool input +
final response text: the marker is absent from all audit lines.
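
One way to get this hygiene property by construction is an emitter whose signature never receives tool input or message text. The sketch below assumes a `key=value` line layout and a logger named `kora.reasoning`; both are illustrative guesses, and only the field names come from the list above.

```python
import logging
from typing import Optional

logger = logging.getLogger("kora.reasoning")  # hypothetical logger name


def format_tool_called_audit(
    tool_name: str,
    triggered_by: str,
    caller_session_id: str,
    tool_duration_ms: int,
    tool_status: str,
    exc_type: Optional[str] = None,
) -> str:
    """Render one audit line from identifiers only (no tool input, no message body)."""
    line = (
        "[kora.reasoning.tool_called] "
        f"tool_name={tool_name} triggered_by={triggered_by} "
        f"caller_session_id={caller_session_id} "
        f"tool_duration_ms={tool_duration_ms} tool_status={tool_status}"
    )
    # exc_type is only annotated on execution_error, and carries the class name only.
    if tool_status == "execution_error" and exc_type is not None:
        line += f" exc_type={exc_type}"
    return line


def emit_tool_called_audit(**fields) -> None:
    """Log the formatted audit line; accepts the same keywords as the formatter."""
    logger.info(format_tool_called_audit(**fields))
```

Because the formatter has no parameter that could carry body content, a sensitive-marker test like the one described above reduces to checking that nothing upstream passes the marker in through an identifier.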
### 2. JSONL outbound `tools_used` extension
`SlackDMHandler` outbound JSONL entries now carry
`tools_used: list[str] | omitted` with three states:
- **Key absent**: engine bypassed / refused / errored (no
tools could have been called)
- **Key present, `[]`**: engine completed reasoning + chose
to use zero tools (different signal than "engine bypassed")
- **Key present, `[<names>]`**: engine called these tools
(oldest→newest, duplicates preserved for accurate count)
Backwards-compatible: pre-ST2 outbound entries simply omit the
field. The REASONING-PANEL frontend (CC#2) consumes
`tools_used` to surface "this response used N tools" without
parsing structured logs.
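
A consumer can tell the three states apart by checking key presence before checking the value. The following is a minimal sketch; the function name and the summary strings are hypothetical and do not describe the REASONING-PANEL implementation.

```python
import json
from typing import Any, Dict


def summarize_tools_used(entry: Dict[str, Any]) -> str:
    """Map one outbound JSONL entry to its tools_used state."""
    # Key absent: pre-ST2 entry, or the engine was bypassed / refused / errored.
    if "tools_used" not in entry:
        return "no reasoning metadata"
    tools = entry["tools_used"]
    # Key present but empty: the engine ran and chose to call zero tools.
    if not tools:
        return "reasoning completed, 0 tools used"
    # Duplicates are preserved upstream, so len() is the true call count.
    return f"this response used {len(tools)} tool call(s)"


if __name__ == "__main__":
    lines = [
        '{"text": "canned reply"}',
        '{"text": "hello", "tools_used": []}',
        '{"text": "state", "tools_used": ["kora__get_operational_state"]}',
    ]
    for raw in lines:
        print(summarize_tools_used(json.loads(raw)))
```

Testing `"tools_used" not in entry` rather than `entry.get("tools_used")` matters here: `.get()` would collapse the omitted-key and empty-list states into one falsy value.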
### 3. System prompt — Tool Use section
`kora_docs/00_canonical_current_state/kora_system_prompt.md`
extended (+75 lines) with:
- Tool surface enumeration (5 read-only tools + when to use
each)
- Tone guidance: **never pre-announce** (no "Let me check…").
Just call the tool + respond with the answer.
- Cost awareness: each tool call = separate API roundtrip
against the $200/mo Max plan; don't sweep "just to be
thorough"
- Max 5 tool calls per response (safety cap)
- Result-citation guidance: use concrete returned values
directly, not paraphrases
- **Mutation boundary** explicitly framed: "Kora REASONS in
her DM thread; AGENTS DRIVE her via MCP." When Joshua asks
for mutation (pause yourself / create a ticket / send a
message), Kora explains she can't initiate that + suggests
the operator-driven path
Memory + context section also updated — removed the
"future buckets will let you call tools" placeholder (those
buckets are now THIS bucket); replaced with the live tool
surface description.
## Engine changes
`kora_cli/reasoning/anthropic_engine.py` (+162 lines):
- `respond()` derives `triggered_by` (from `message.source`) +
`caller_session_id` (from `IncomingMessage` metadata via
`_derive_caller_session_id()`) at entry; threads them down to
`_tool_use_loop()` → `_execute_tool_calls()`.
- `_execute_tool_calls()` times each tool call with
`time.monotonic()`; emits a `_emit_tool_called_audit()` line
per result (ok / not_allowed / execution_error).
- `_derive_caller_session_id()` source-shape helper exposed at
module level for direct test coverage.
- `_emit_tool_called_audit()` is the stable audit-seam — the
signature mirrors what a substrate-backed promotion would
consume.
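
The per-call timing and status classification described above might look like the sketch below. The `timed_tool_call` helper and its `allowed` flag are hypothetical stand-ins for whatever `_execute_tool_calls()` does internally; only `time.monotonic()` and the three status strings come from this PR.

```python
import time
from typing import Any, Callable, Dict, Tuple


def timed_tool_call(
    executor: Callable[..., Any],
    tool_input: Dict[str, Any],
    allowed: bool,
) -> Tuple[str, int, Any]:
    """Run one tool call and return (tool_status, tool_duration_ms, result_or_exception)."""
    # monotonic() is unaffected by wall-clock adjustments, so durations never go negative.
    start = time.monotonic()
    if not allowed:
        status, result = "not_allowed", None
    else:
        try:
            result = executor(**tool_input)
            status = "ok"
        except Exception as exc:  # fail-soft: the reasoning loop must keep going
            status, result = "execution_error", exc
    duration_ms = int((time.monotonic() - start) * 1000)
    return status, duration_ms, result
```

The returned status and duration are exactly the values the audit line needs, so the caller can emit one audit line per result without inspecting the tool output itself.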
## Handler changes
`kora_cli/handlers/slack_dm_handler.py` (+41 lines):
- `reasoning_meta` dict gains a `tools_used` key. Set to
`list(result.tools_used)` on engine success;
`None` on engine error / bypass / exception (engine refused
before invoking tools — semantically equivalent to bypass).
- `_append_outbound_log_entry()` accepts new
`tools_used: Optional[List[str]]` kwarg. Writes `tools_used`
key only when not None — preserves the empty-list vs
omitted-key distinction.
- `typing.List` import added.
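
The write-only-when-not-None rule can be sketched as follows. The entry fields other than `tools_used` (`direction`, `text`) and the function names are assumptions for illustration, not the handler's actual schema.

```python
import json
from typing import Any, Dict, List, Optional


def build_outbound_entry(text: str, tools_used: Optional[List[str]] = None) -> Dict[str, Any]:
    """Build one outbound JSONL entry; field names besides tools_used are hypothetical."""
    entry: Dict[str, Any] = {"direction": "outbound", "text": text}
    # None means the engine never reached tool use: omit the key entirely.
    # An empty list is a real signal (zero tools chosen) and must be written.
    if tools_used is not None:
        entry["tools_used"] = list(tools_used)
    return entry


def append_outbound_log_entry(path: str, text: str, tools_used: Optional[List[str]] = None) -> None:
    """Append the entry as one JSON object per line."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(build_outbound_entry(text, tools_used)) + "\n")
```

Checking `is not None` instead of truthiness is the design choice that keeps `[]` distinct from an omitted key.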
## Tool-input validation strictness — verified
Per spec §2 ST2(c). Test exercises a TypeError raised by the
executor (mirrors what Pydantic-validation-failure /
unknown-kwarg looks like at the dispatcher boundary, e.g. when
Claude hallucinates a kwarg name):
- Engine catches the TypeError at `_execute_tool_calls`
- Emits tool_result error block with `is_error: true` +
`content: "tool_execution_error: TypeError"`
- Records audit log line with
`tool_status="execution_error"` + `exc_type=TypeError`
- Engine continues to next iteration; final API call sees
the error block + recovers
- ResponseResult.error is None (engine itself didn't crash)
- Handler completes the response normally
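
The recovery path above can be illustrated with a small sketch that turns an executor failure into a `tool_result` block instead of propagating the exception. The helper name and the success-path `content` formatting are assumptions; the `is_error` flag and the `tool_execution_error: <class>` content string follow the bullets above.

```python
from typing import Any, Callable, Dict


def run_tool_to_result_block(
    tool_use_id: str,
    executor: Callable[..., Any],
    tool_input: Dict[str, Any],
) -> Dict[str, Any]:
    """Execute one tool call and always return a tool_result block."""
    try:
        output = executor(**tool_input)
    except Exception as exc:
        # Only the exception class name is surfaced, never its message or the input.
        return {
            "type": "tool_result",
            "tool_use_id": tool_use_id,
            "is_error": True,
            "content": f"tool_execution_error: {type(exc).__name__}",
        }
    return {"type": "tool_result", "tool_use_id": tool_use_id, "content": str(output)}


def get_operational_state() -> str:
    """Stand-in read-only tool that accepts no arguments."""
    return "running"


if __name__ == "__main__":
    # A hallucinated kwarg raises TypeError inside the call; the block records it.
    print(run_tool_to_result_block("toolu_1", get_operational_state, {"verbose": True}))
```

Returning the error as a block lets the next API call see the failure and recover, which is why `ResponseResult.error` can stay `None` in this scenario.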
## Tests (18 new, 404 total all passing)
**`test_anthropic_engine_tool_audit.py`** (12 tests):
- `_derive_caller_session_id` source-shape: slack_dm (full +
missing event_ts + empty metadata), email, mcp
- Audit log on successful tool call (tool_status=ok)
- Audit log on tool_not_allowed (mutating tool)
- Audit log on execution_error (with exc_type)
- Audit log NEVER contains tool input bodies (sensitive-marker
asserted absent)
- Multiple tool calls → one audit line per call (3 tools across
2 iterations → 3 audit lines)
- **Validation strictness**: malformed Claude input
(TypeError) → engine doesn't crash + tool_result error block
reaches next iteration + audit records execution_error
**`test_slack_dm_tools_used.py`** (6 tests):
- Successful reasoning with no tools → `tools_used: []` in JSONL
- Successful reasoning with N tools → `tools_used: [<names>]`
- Duplicates preserved (counts honored)
- `tools_used` OMITTED when engine_unavailable / engine returns
error / engine raises exception
- **Schema backwards-compat**: with a mixed-path JSONL file, a
consumer branches on `tools_used` key presence + finds
well-formed entries for all 3 paths (tools/no-tools/canned)
## §5 ship checklist
- [x] Base `feature/phase2-upgrades`
- [x] Title per format
- [x] Tool execution paths fail-soft (verified — engine
continues across all 3 failure modes)
- [x] Audit emitter pure-functional (no body content; only
identifiers + status)
- [x] System prompt updated (tone + boundary)
- [x] JSONL extension backwards-compatible (key presence
branches semantically)
- [x] Validation strictness test green
- [x] K-DG: literal field names — `_emit_tool_called_audit`
params + `IncomingMessage.source` literal type +
`IncomingMessage.metadata` shape per-source confirmed
- [x] Tests pass locally (**404/404** across full suite)
## Phase 2 + AGENTIC-REASONING closes
Joshua DMs "what's my daemon doing?" → Kora calls
`kora__get_operational_state` → real-data response. The
agentic-reasoning extension to Feature 5 is shipped end-to-end:
- Tool-use loop (ST1)
- Operator-visible audit + JSONL surface (this PR)
- System prompt makes Kora tool-aware (this PR)
- Validation strictness verified (this PR)
## What's next
Per PM: **substrate-backed conversation memory + audit** —
replaces:
- `slack_dm_log.jsonl` context-loading with IsoKron substrate reads
- `[kora.slack_dm.received]` / `[kora.slack_dm.reply_failed]` /
`[kora.mcp.tool_called]` / `[kora.reasoning.tool_called]` /
`[kora.webhook.dead_letter]` structured-log seams with
substrate event_log writes
Big architectural piece; needs cross-team coord with IsoKron
PM on the substrate schema (new event vocab literals + a
permissive `kora_operation_ledger` shape OR a dedicated
`kora_audit_log` table). PM is drafting the bucket + coord
ask now.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Closes the agentic-reasoning arc. ST1 wired the tool-use loop; ST2 adds the operator-visible audit surface, updates Kora's system prompt to make her tool-aware, and extends the Slack-DM outbound JSONL with the `tools_used` field.
Bucket spec: `kora_docs/17_cc_bucket_prompts/KR-FEAT-AGENTIC-REASONING_tool_use_mid_loop.md`.
Base: `feature/phase2-upgrades` — NOT main.
Three deliverables
1. Tool-call audit logging
Every tool execution emits one `[kora.reasoning.tool_called]` structured-log line with:
No substrate write yet — same audit-seam precedent as KR-MCP-RUNTIME-SURFACE ST2 + KR-D-DAEMON ST3 + KR-FEAT-SLACK-DM. Substrate-backed audit follow-on promotes all seams together.
Body-content hygiene: tool input args + Joshua's inbound text NEVER appear in audit lines (asserted via sensitive-marker test).
2. JSONL outbound `tools_used` extension
`SlackDMHandler` outbound entries gain `tools_used: list[str] | omitted` with three semantic states:
Backwards-compatible: pre-ST2 entries omit the field.
3. System prompt — Tool Use section
`kora_docs/00_canonical_current_state/kora_system_prompt.md` (+75 lines):
Engine changes (+162 LOC)
Handler changes (+41 LOC)
Tool-input validation strictness verified
Per spec §2 ST2(c). Test exercises a `TypeError` raised by the executor (mirrors Pydantic validation failure / unknown-kwarg at dispatcher boundary):
Tests (18 new, 404 total all passing)
`test_anthropic_engine_tool_audit.py` (12 tests):
`test_slack_dm_tools_used.py` (6 tests):
§5 ship checklist
Agentic-reasoning closes
Joshua DMs "what's my daemon doing?" → Kora calls `kora__get_operational_state` → real-data response. The full extension to Feature 5 is shipped:
What's next
Per PM: substrate-backed conversation memory + audit — replaces:
Cross-team coord with IsoKron PM on the substrate schema (new event vocab literals + permissive `kora_operation_ledger` shape OR dedicated `kora_audit_log` table). Big architectural piece.
🤖 Generated with Claude Code