This repository was archived by the owner on May 26, 2026. It is now read-only.
feat(kora): KR-CHEAP-PROMPT-CACHING — ephemeral cache on system prompt + tool descriptions #158
Merged
rafe-walker merged 1 commit on May 24, 2026
…t + tool descriptions

Per Council R3 Lock R3-4, this is the build-list #1 baseline-setter: prompt caching ships FIRST so subsequent cheap-substrate telemetry measures the post-caching world. Expected ~50% input-token cost reduction on warm reasoning calls.

# Two cache breakpoints (Anthropic API allows 4)

System prompt + tool descriptions both marked with ``cache_control: {"type": "ephemeral"}``. The marker on a block caches everything UP TO AND INCLUDING that block, so two breakpoints cover both static input regions.

# Call sites with cache_control

1. ``kora_cli/reasoning/anthropic_engine.py:_tool_use_loop`` — SDK kwargs["system"] is now a content-block list (was bare string) with cache_control on its single text block. Built ONCE outside the iteration loop so the same structure is sent on every roundtrip → cache key matches → reads hit. Wrapped via new module-level helper ``_wrap_system_as_cacheable``.
2. Same loop — kwargs["tools"] now has cache_control on the LAST tool only (covers everything before it per API semantics). Wrapped via ``_wrap_tools_as_cacheable``; input list never mutated.

Both wrappers tested as pure functions + via engine-level integration assertions. Tools-empty case preserved: registry fails → tools=[] → kwargs["tools"] omitted entirely (some SDK versions reject empty arrays).

# Cost-ladder accounting patch

The cost-ladder infrastructure ALREADY supported cache tokens via ``CanonicalUsage(cache_read_tokens, cache_write_tokens)`` + ``PricingEntry.cache_*_cost_per_million`` (Opus 4.7: $0.50 cache_read, $6.25 cache_write per million tokens). The gap was the engine→handler→holder data flow:

1. ``ResponseResult`` extended with ``cache_creation_input_tokens`` + ``cache_read_input_tokens`` fields (default 0, backwards-compatible).
2. ``_tool_use_loop`` accumulates both per iteration (mirror of input/output accumulation) reading from ``usage.cache_creation_input_tokens`` / ``usage.cache_read_input_tokens`` (getattr-default to 0 when older SDK / uncached call leaves them absent).
3. ``_project_final_response`` signature + body pass them through. Max-iter exit + the projection-failed branch also wired.
4. ``slack_dm_handler.py``: initial ``reasoning_meta`` dict + the engine-exception branch + the success-path meta build all gained the two cache fields.
5. ``_record_inference_to_cost_ladder`` reads the cache fields from meta + constructs ``CanonicalUsage(input_tokens, output_tokens, cache_write_tokens, cache_read_tokens)`` — PricingEntry's per-million rates do the rest. Early-return guard now checks ALL four token buckets (a pure-cache-read call still bills).
6. ``_append_outbound_log_entry`` extended with the same two optional kwargs so the cache totals persist into the slack DM outbound JSONL — reasoning panel can surface cache-hit rate per call without grepping logs.

# K-DG verification

Anthropic SDK 0.87.0 (one minor ahead of the spec's 0.86.0) verified to expose ``CacheControlEphemeralParam``, ``TextBlockParam``, ``ToolParam`` — cache_control syntax identical between minor versions per the docs at https://platform.claude.com/docs/en/agents-and-tools/prompt-caching.

# Tests

tests/kora_cli/reasoning/test_anthropic_engine_caching.py — 16 new tests:

- Wrapper unit tests: system → content-block list with marker; tools → last marked, earlier unmarked; input not mutated; empty → empty; single-tool gets marker
- Engine integration: system kwarg shape; tools[-1] has marker + tools[:-1] do NOT; tools kwarg omitted when registry returns empty; SAME system + tools sent on every iteration (cache key stability across tool-use loop)
- ResponseResult surfaces cache_creation + cache_read totals; accumulates across iterations (write on iter 1, reads on iters 2+3); defaults to 0 when usage attrs missing
- Handler cost-ladder: reasoning_meta cache fields flow into CanonicalUsage(cache_write_tokens, cache_read_tokens); None → 0 fallback; pure-cache-read call still bills; all-zero call still skips

tests/kora_cli/reasoning/test_anthropic_engine.py — 1 pre-existing test updated: ``test_system_prompt_passed_to_sdk`` now asserts the content-block list shape instead of bare-string.

# Regression

483/483 reasoning + handlers + listeners pass serially. Full repo xdist: 9228 passed, 43 failed identical to baseline (test_anthropic_adapter / test_backup / test_config / test_gateway_* / test_web_server* / test_kanban_db / test_list_picker_providers / test_model_switch_* / test_startup_plugin_gating / test_web_server_cron_profiles). Zero failures in reasoning or handlers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
rafe-walker added a commit that referenced this pull request on May 24, 2026
Lock R3-8 sub-cut (c) implementation. 34 panels instrumented (8 panels + 26 pages).
Backend: POST /api/panel_view → ${KORA_HOME}/panel_views.jsonl (Path B chosen — separate file from kora_audit_log.jsonl to preserve audit log's forensic semantics per CC#2's K-DG sweep).
Hook: web/src/hooks/usePanelView.ts — fire-and-forget POST on mount; silent failure (instrumentation must never break UX).
18/18 endpoint+pin tests + 210/210 regression. tsc -b + vite build clean.
Rebased onto current feature/phase2-upgrades (post #157 snapshot + #158 caching) to resolve adjacent-endpoint-addition conflict in kora_cli/web_server.py.
Summary
Prompt caching active. Expected ~50% input-token cost reduction on warm cache. Baseline-setting for subsequent cheap-substrate work (per Council R3 Lock R3-4 — this ships FIRST so post-caching telemetry measures the post-caching world).
Call sites with `cache_control`
Both structures (the system content-block list and the tools list) are built once outside the iteration loop, so byte-identical kwargs go to the SDK on every roundtrip: the cache key matches and reads hit. Verified by `test_engine_sends_same_system_and_tools_each_iteration`.
Empty-tool-list path preserved: registry fails → `tools=[]` → kwargs["tools"] omitted entirely (some SDK versions reject empty arrays).
Cost-ladder accounting patch
The infrastructure ALREADY supported cache tokens (`CanonicalUsage(cache_read_tokens, cache_write_tokens)` + `PricingEntry.cache_*_cost_per_million`, with Opus 4.7 priced at $0.50 read / $6.25 write per million tokens). The gap was the engine→handler→holder data flow, which the six-step patch itemized in the commit message above closes.
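The billing side of that flow can be sketched as follows. This is an illustration under stated assumptions, not the repository's code: the `CanonicalUsage` and `PricingEntry` class names and the cache rate fields come from the PR text, while the input/output field names, the `usage_from_meta` and `cost_usd` helpers, and the input/output rates are hypothetical placeholders.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class CanonicalUsage:
    input_tokens: int = 0
    output_tokens: int = 0
    cache_write_tokens: int = 0
    cache_read_tokens: int = 0


@dataclass(frozen=True)
class PricingEntry:
    input_cost_per_million: float        # field name assumed
    output_cost_per_million: float       # field name assumed
    cache_write_cost_per_million: float
    cache_read_cost_per_million: float


def usage_from_meta(meta: dict[str, Any]) -> CanonicalUsage:
    """Map reasoning_meta onto CanonicalUsage, treating missing/None fields as 0."""
    return CanonicalUsage(
        input_tokens=meta.get("input_tokens") or 0,
        output_tokens=meta.get("output_tokens") or 0,
        cache_write_tokens=meta.get("cache_creation_input_tokens") or 0,
        cache_read_tokens=meta.get("cache_read_input_tokens") or 0,
    )


def cost_usd(usage: CanonicalUsage, pricing: PricingEntry) -> Optional[float]:
    """Return the call cost in USD, or None when all four buckets are zero."""
    buckets = (
        (usage.input_tokens, pricing.input_cost_per_million),
        (usage.output_tokens, pricing.output_cost_per_million),
        (usage.cache_write_tokens, pricing.cache_write_cost_per_million),
        (usage.cache_read_tokens, pricing.cache_read_cost_per_million),
    )
    # Guard checks ALL four buckets, so a pure-cache-read call is still billed.
    if not any(tokens for tokens, _ in buckets):
        return None
    return sum(tokens * rate for tokens, rate in buckets) / 1_000_000


# Cache rates are the ones quoted in the PR; input/output rates are placeholders.
EXAMPLE_PRICING = PricingEntry(
    input_cost_per_million=5.00,
    output_cost_per_million=25.00,
    cache_write_cost_per_million=6.25,
    cache_read_cost_per_million=0.50,
)
```

With this guard, a call whose only non-zero bucket is `cache_read_tokens` still produces a cost, while an all-zero call is skipped.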
K-DG verification
Anthropic SDK 0.87.0 (one minor ahead of spec's 0.86.0) verified to expose `CacheControlEphemeralParam`, `TextBlockParam`, `ToolParam` — cache_control syntax identical between minor versions per https://platform.claude.com/docs/en/agents-and-tools/prompt-caching. The cost-ladder code path was K-DG'd BEFORE edits: `grep -rn record_inference` surfaced the staticmethod at `slack_dm_handler.py:645` + the `PricingEntry` table in `agent/usage_pricing.py` with rates for every Claude 4.X model.
Tests
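16 new tests in `tests/kora_cli/reasoning/test_anthropic_engine_caching.py` covering the wrapper helpers, the engine integration (kwarg shapes and per-iteration stability), `ResponseResult` cache totals, and the handler cost-ladder path.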
1 pre-existing test updated in `test_anthropic_engine.py`: `test_system_prompt_passed_to_sdk` now asserts the content-block list shape.
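The accumulation behaviour exercised by those tests (a cache write on iteration 1, reads on iterations 2 and 3, and a fallback to 0 when the usage object lacks the cache attributes) can be approximated with stand-in usage objects. `accumulate_usage` is a hypothetical helper and the token counts are made up for illustration; only the four attribute names come from the commit message.

```python
from types import SimpleNamespace

USAGE_FIELDS = (
    "input_tokens",
    "output_tokens",
    "cache_creation_input_tokens",
    "cache_read_input_tokens",
)


def accumulate_usage(usages):
    """Sum token counts across iterations, defaulting absent or None attrs to 0."""
    totals = {name: 0 for name in USAGE_FIELDS}
    for usage in usages:
        for name in USAGE_FIELDS:
            totals[name] += getattr(usage, name, 0) or 0
    return totals


# Three roundtrips of a tool-use loop: the first writes the cache, the rest read it.
iterations = [
    SimpleNamespace(input_tokens=120, output_tokens=40,
                    cache_creation_input_tokens=2000, cache_read_input_tokens=0),
    SimpleNamespace(input_tokens=150, output_tokens=35,
                    cache_creation_input_tokens=0, cache_read_input_tokens=2000),
    SimpleNamespace(input_tokens=180, output_tokens=60,
                    cache_creation_input_tokens=0, cache_read_input_tokens=2000),
]
totals = accumulate_usage(iterations)

# A usage object shaped like an older SDK response: no cache attributes at all.
legacy_totals = accumulate_usage([SimpleNamespace(input_tokens=10, output_tokens=5)])
```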
Test plan
🤖 Generated with Claude Code