feat(kora): KR-CHEAP-PROMPT-CACHING — ephemeral cache on system prompt + tool descriptions by rafe-walker · Pull Request #158 · rafe-walker/kora

rafe-walker · 2026-05-24T01:22:50Z

Summary

Prompt caching active. Expected ~50% input-token cost reduction on warm cache. Baseline-setting for subsequent cheap-substrate work (per Council R3 Lock R3-4 — this ships FIRST so post-caching telemetry measures the post-caching world).

Call sites with `cache_control`

Site	What's cached	How
`anthropic_engine.py:_tool_use_loop` — `kwargs["system"]`	Entire `kora_system_prompt.md` content	Bare string → `[{type: text, text: ..., cache_control: {type: ephemeral}}]` content-block list via new helper `_wrap_system_as_cacheable`
`anthropic_engine.py:_tool_use_loop` — `kwargs["tools"]`	Full tool list (5 read-only reasoning tools)	Last tool gets `cache_control: ephemeral` via new helper `_wrap_tools_as_cacheable`. Marker on last block covers everything before it per Anthropic API semantics. Input list never mutated.

Both structures are built once outside the iteration loop so byte-identical kwargs go to the SDK on every roundtrip — cache key matches → reads hit. Verified by `test_engine_sends_same_system_and_tools_each_iteration`.

Empty-tool-list path preserved: registry fails → `tools=[]` → kwargs["tools"] omitted entirely (some SDK versions reject empty arrays).

Cost-ladder accounting patch

The infrastructure ALREADY supported cache tokens (`CanonicalUsage(cache_read_tokens, cache_write_tokens)` + `PricingEntry.cache_*_cost_per_million` — Opus 4.7 priced at $0.50 read / $6.25 write per million). The gap was the engine→handler→holder data flow. Six-step patch:

`ResponseResult` extended with `cache_creation_input_tokens` + `cache_read_input_tokens` (default 0, backwards-compatible).
`_tool_use_loop` accumulates both per-iteration from `usage.cache_creation_input_tokens` / `usage.cache_read_input_tokens` (`getattr`-default 0 for older SDK or uncached call).
`_project_final_response` signature + body pass cache totals through. Max-iter exit + projection-failed branches also wired.
`slack_dm_handler.py` `reasoning_meta` dict gained the two cache fields at init + the engine-exception branch + the success-path meta build.
`_record_inference_to_cost_ladder` reads cache fields from meta + builds `CanonicalUsage(input, output, cache_write_tokens, cache_read_tokens)`. Early-return guard now checks ALL four token buckets so a pure-cache-read call still bills.
`_append_outbound_log_entry` extended with the same kwargs so cache tokens persist into the slack DM outbound JSONL — reasoning panel can surface per-call cache-hit rates without grepping logs.

K-DG verification

Anthropic SDK 0.87.0 (one minor ahead of spec's 0.86.0) verified to expose `CacheControlEphemeralParam`, `TextBlockParam`, `ToolParam` — cache_control syntax identical between minor versions per https://platform.claude.com/docs/en/agents-and-tools/prompt-caching. The cost-ladder code path was K-DG'd BEFORE edits: `grep -rn record_inference` surfaced the staticmethod at `slack_dm_handler.py:645` + the `PricingEntry` table in `agent/usage_pricing.py` with rates for every Claude 4.X model.

Tests

16 new tests in `tests/kora_cli/reasoning/test_anthropic_engine_caching.py` covering:

Wrapper unit tests (system shape, tool shape, mutation safety, empty + single-tool edge cases)
Engine integration (system + tools shape on SDK call; tools omitted when registry empty; SAME structures across iterations for cache key stability)
ResponseResult surfaces accumulated cache totals; tolerates missing usage attrs
Handler cost-ladder integration (cache fields → CanonicalUsage; None → 0; pure-cache-read still bills; all-zero skips)

1 pre-existing test updated in `test_anthropic_engine.py`: `test_system_prompt_passed_to_sdk` now asserts the content-block list shape.

Test plan

16 new caching tests pass
483/483 reasoning + handlers + listeners pass serially
Full repo xdist: 9228 passed; 43 failed identical to baseline (test_anthropic_adapter / test_backup / test_config / test_gateway_* / test_web_server* / test_kanban_db / test_list_picker_providers / test_model_switch_* / test_startup_plugin_gating). Zero failures in reasoning or handlers.
No regression in tool-use loop semantics (parallel dispatch, audit emit, max-iteration cap, error isolation)
No regression in SlackDM outbound JSONL backwards compat (cache fields use the same None-omit semantic as the existing reasoning fields)

🤖 Generated with Claude Code

…t + tool descriptions Per Council R3 Lock R3-4, this is the build-list #1 baseline- setter: prompt caching ships FIRST so subsequent cheap-substrate telemetry measures the post-caching world. Expected ~50% input- token cost reduction on warm reasoning calls. # Two cache breakpoints (Anthropic API allows 4) System prompt + tool descriptions both marked with ``cache_control: {"type": "ephemeral"}``. The marker on a block caches everything UP TO AND INCLUDING that block, so two breakpoints cover both static input regions. # Call sites with cache_control 1. ``kora_cli/reasoning/anthropic_engine.py:_tool_use_loop`` — SDK kwargs["system"] is now a content-block list (was bare string) with cache_control on its single text block. Built ONCE outside the iteration loop so the same structure is sent on every roundtrip → cache key matches → reads hit. Wrapped via new module-level helper ``_wrap_system_as_ cacheable``. 2. Same loop — kwargs["tools"] now has cache_control on the LAST tool only (covers everything before it per API semantics). Wrapped via ``_wrap_tools_as_cacheable``; input list never mutated. Both wrappers tested as pure functions + via engine-level integration assertions. Tools-empty case preserved: registry fails → tools=[] → kwargs["tools"] omitted entirely (some SDK versions reject empty arrays). # Cost-ladder accounting patch The cost-ladder infrastructure ALREADY supported cache tokens via ``CanonicalUsage(cache_read_tokens, cache_write_tokens)`` + ``PricingEntry.cache_*_cost_per_million`` (Opus 4.7: $0.50 cache_read, $6.25 cache_write per million tokens). The gap was the engine→handler→holder data flow: 1. ``ResponseResult`` extended with ``cache_creation_input_tokens`` + ``cache_read_input_tokens`` fields (default 0, backwards-compatible). 2. ``_tool_use_loop`` accumulates both per iteration (mirror of input/output accumulation) reading from ``usage.cache_creation_input_tokens`` / ``usage.cache_read_input_tokens`` (getattr-default to 0 when older SDK / uncached call leaves them absent). 3. ``_project_final_response`` signature + body pass them through. Max-iter exit + the projection-failed branch also wired. 4. ``slack_dm_handler.py``: initial ``reasoning_meta`` dict + the engine-exception branch + the success-path meta build all gained the two cache fields. 5. ``_record_inference_to_cost_ladder`` reads the cache fields from meta + constructs ``CanonicalUsage(input_tokens, output_tokens, cache_write_tokens, cache_read_tokens)`` — PricingEntry's per-million rates do the rest. Early-return guard now checks ALL four token buckets (a pure-cache-read call still bills). 6. ``_append_outbound_log_entry`` extended with the same two optional kwargs so the cache totals persist into the slack DM outbound JSONL — reasoning panel can surface cache-hit rate per call without grepping logs. # K-DG verification Anthropic SDK 0.87.0 (one minor ahead of the spec's 0.86.0) verified to expose ``CacheControlEphemeralParam``, ``TextBlockParam``, ``ToolParam`` — cache_control syntax identical between minor versions per the docs at https://platform.claude.com/docs/en/agents-and-tools/prompt-caching. # Tests tests/kora_cli/reasoning/test_anthropic_engine_caching.py — 16 new tests: - Wrapper unit tests: system → content-block list with marker; tools → last marked, earlier unmarked; input not mutated; empty → empty; single-tool gets marker - Engine integration: system kwarg shape; tools[-1] has marker + tools[:-1] do NOT; tools kwarg omitted when registry returns empty; SAME system + tools sent on every iteration (cache key stability across tool-use loop) - ResponseResult surfaces cache_creation + cache_read totals; accumulates across iterations (write on iter 1, reads on iters 2+3); defaults to 0 when usage attrs missing - Handler cost-ladder: reasoning_meta cache fields flow into CanonicalUsage(cache_write_tokens, cache_read_tokens); None → 0 fallback; pure-cache-read call still bills; all- zero call still skips tests/kora_cli/reasoning/test_anthropic_engine.py — 1 pre- existing test updated: ``test_system_prompt_passed_to_sdk`` now asserts the content-block list shape instead of bare-string. # Regression 483/483 reasoning + handlers + listeners pass serially. Full repo xdist: 9228 passed, 43 failed identical to baseline (test_anthropic_adapter / test_backup / test_config / test_gateway_* / test_web_server* / test_kanban_db / test_list_picker_providers / test_model_switch_* / test_startup_plugin_gating / test_web_server_cron_profiles). Zero failures in reasoning or handlers. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Lock R3-8 sub-cut (c) implementation. 34 panels instrumented (8 panels + 26 pages). Backend: POST /api/panel_view → ${KORA_HOME}/panel_views.jsonl (Path B chosen — separate file from kora_audit_log.jsonl to preserve audit log's forensic semantics per CC#2's K-DG sweep). Hook: web/src/hooks/usePanelView.ts — fire-and-forget POST on mount; silent failure (instrumentation must never break UX). 18/18 endpoint+pin tests + 210/210 regression. tsc -b + vite build clean. Rebased onto current feature/phase2-upgrades (post #157 snapshot + #158 caching) to resolve adjacent-endpoint-addition conflict in kora_cli/web_server.py.

rafe-walker merged commit 559c46f into feature/phase2-upgrades May 24, 2026

rafe-walker deleted the feat/kora-KR-CHEAP-PROMPT-CACHING branch May 24, 2026 01:28

This was referenced May 24, 2026

feat(kora): KR-CHEAP-TRIVIAL-DM-SHORTCIRCUIT — regex + snapshot-interp phrasebook (R3-4 #2) #160

Merged

research(kora): KR-FORK-HOOK-VERIFY — Hermes Gateway hook coverage report #168

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(kora): KR-CHEAP-PROMPT-CACHING — ephemeral cache on system prompt + tool descriptions#158

feat(kora): KR-CHEAP-PROMPT-CACHING — ephemeral cache on system prompt + tool descriptions#158
rafe-walker merged 1 commit into
feature/phase2-upgradesfrom
feat/kora-KR-CHEAP-PROMPT-CACHING

rafe-walker commented May 24, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

rafe-walker commented May 24, 2026

Summary

Call sites with `cache_control`

Cost-ladder accounting patch

K-DG verification

Tests

Test plan

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant