User                    Archon                          AI Client (Claude/llama.cpp)
────                    ──────                          ────────────────────────────
sends message ────────▶ buildFullPrompt()
                          ├─ loads codebases from DB
                          ├─ discovers workflows from disk
                          ├─ fetches recent workflow results
                          ├─ assembles single prompt string
                          │   [X] system prompt varies by codebase count
                          │   [X] workflowContextSuffix appears/disappears
                          │   [X] threadContext changes every turn
                          │   [X] issueContext/fileSuffix conditional
                          │
                        sends prompt + resumeSessionId
                                                        ──▶ KV cache: partial or no hit
                                                        ──▶ full context rebuild
                                                        ──▶ full inference
                        streams response ◀─────
sees response ◀────────
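The [X] items matter because a prefix cache (llama.cpp's KV cache, or prompt caching on the Claude side) can only reuse a new prompt up to the first position where it differs from the previous one. The sketch below illustrates that rule under stated assumptions: SYSTEM, threadContext and history are invented stand-ins, not Archon's real buildFullPrompt() output, and characters stand in for tokens.

```typescript
// Hypothetical illustration: SYSTEM, threadContext and history are invented stand-ins
// for prompt sections, and characters stand in for tokens.

/** Length of the shared prefix of two prompts, i.e. what a prefix cache can reuse. */
function commonPrefixLength(a: string, b: string): number {
  const n = Math.min(a.length, b.length);
  let i = 0;
  while (i < n && a[i] === b[i]) i++;
  return i;
}

interface Turn {
  threadContext: string; // per-turn section that changes every message
  history: string;       // conversation so far; only ever appended to
}

const SYSTEM = "SYSTEM: you manage 2 codebases.\n"; // unchanged between the two turns here

// Per-turn section placed before the history.
function volatileFirst(t: Turn): string {
  return SYSTEM + t.threadContext + t.history;
}

// Same sections with the per-turn section placed after the history.
function volatileLast(t: Turn): string {
  return SYSTEM + t.history + t.threadContext;
}

const h1 = "user: hi\nassistant: hello\nuser: list workflows\nassistant: (3 workflows)\n";
const turn1: Turn = { threadContext: "THREAD: turn 1\n", history: h1 };
const turn2: Turn = {
  threadContext: "THREAD: turn 2\n",
  history: h1 + "user: run the first one\nassistant: ok\n",
};

const layouts: Array<[string, (t: Turn) => string]> = [
  ["volatile-first", volatileFirst],
  ["volatile-last", volatileLast],
];
for (const [label, build] of layouts) {
  const next = build(turn2);
  const reused = commonPrefixLength(build(turn1), next);
  console.log(`${label}: ${reused} of ${next.length} chars reusable`);
}
```

In the volatile-first layout reuse stops inside the per-turn section, so the whole history has to be evaluated again; in the volatile-last layout the previous turn's history stays reusable. Any [X] section that changes cuts reuse off at its own position, so the earlier it sits in the prompt, the more of the context is rebuilt.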
llama.cpp server log for one follow-up turn, showing most of the cached context being discarded and rebuilt:
2026-05-05 15:21:36.138 [Info] slot update_slots: task 14335 | new prompt, n_ctx_slot = 262144, n_keep = 8192, task.n_tokens = 52755
2026-05-05 15:21:36.138 [Info] slot update_slots: task 14335 | n_past = 17792, slot.prompt.tokens.size() = 46384
2026-05-05 15:21:36.139 [Info] slot update_slots: task 14335 | Checking checkpoint with [46333, 46333] against 17792...
2026-05-05 15:21:36.139 [Info] slot update_slots: task 14335 | Checking checkpoint with [45821, 45821] against 17792...
2026-05-05 15:21:36.139 [Info] slot update_slots: task 14335 | Checking checkpoint with [40959, 40959] against 17792...
2026-05-05 15:21:36.139 [Info] slot update_slots: task 14335 | Checking checkpoint with [32767, 32767] against 17792...
2026-05-05 15:21:36.139 [Info] slot update_slots: task 14335 | Checking checkpoint with [24575, 24575] against 17792...
2026-05-05 15:21:36.139 [Info] slot update_slots: task 14335 | Checking checkpoint with [16383, 16383] against 17792...
2026-05-05 15:21:36.149 [Info] slot update_slots: task 14335 | restored context checkpoint (pos_min = 16383, pos_max = 16383, n_tokens = 16384, n_past = 16384, size = 62.813 MiB)
2026-05-05 15:21:36.151 [Info] slot update_slots: task 14335 | erased invalidated context checkpoint (pos_min = 24575, ...)
2026-05-05 15:21:36.153 [Info] slot update_slots: task 14335 | erased invalidated context checkpoint (pos_min = 32767, ...)
2026-05-05 15:21:36.154 [Info] slot update_slots: task 14335 | erased invalidated context checkpoint (pos_min = 40959, ...)
2026-05-05 15:21:36.155 [Info] slot update_slots: task 14335 | erased invalidated context checkpoint (pos_min = 45821, ...)
2026-05-05 15:21:36.156 [Info] slot update_slots: task 14335 | erased invalidated context checkpoint (pos_min = 46333, ...)
2026-05-05 15:21:36.157 [Info] slot update_slots: task 14335 | n_tokens = 16384, memory_seq_rm [16384, end)
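Breakdown:
- task.n_tokens = 52,755 → size of the new prompt for this turn
- slot.prompt.tokens.size() = 46,384 → tokens cached in the slot from the previous turn (the new prompt is 6,371 tokens longer)
- n_past = 17,792 → length of the common prefix between the cached tokens and the new prompt, i.e. the prompt diverges at token 17,792
- Only the checkpoint at [16383, 16383] lies before the divergence point (16,383 < 17,792), so it is restored and n_past drops to 16,384
- All later checkpoints (24,575 through 46,333) are erased, and memory_seq_rm [16384, end) discards every cached token from position 16,384 onward
- 52,755 - 16,384 = 36,371 tokens are re-evaluated on this turn
- Reuse: 16,384 / 52,755 ≈ 31% of the new prompt (≈ 35% of the 46,384 previously cached tokens); the rest is a full rebuild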
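The checkpoint handling in the log can be modelled roughly as below. This is a simplified sketch inferred from the log lines, not llama.cpp's implementation; planReuse, SlotState and ReusePlan are hypothetical names, and the model ignores details such as checkpoints whose pos_min differs from pos_max.

```typescript
// Simplified model inferred from the llama.cpp log above (not llama.cpp source).
// planReuse, SlotState and ReusePlan are hypothetical names used for illustration.

interface SlotState {
  cachedTokens: number;  // slot.prompt.tokens.size(): tokens left over from the previous turn
  checkpoints: number[]; // checkpoint positions from the "Checking checkpoint" lines
}

interface ReusePlan {
  restoredPos: number; // checkpoint rolled back to, or -1 if none is usable
  reused: number;      // tokens kept in the KV cache
  reprocessed: number; // prompt tokens that must be evaluated again
  erased: number[];    // checkpoints invalidated by the divergence
}

function planReuse(slot: SlotState, promptTokens: number, nPast: number): ReusePlan {
  // Only a checkpoint before the divergence point (n_past) can be restored;
  // take the latest such checkpoint.
  const restoredPos = slot.checkpoints
    .filter((pos) => pos < nPast)
    .reduce((best, pos) => Math.max(best, pos), -1);
  // Restoring the checkpoint at position p keeps tokens [0, p], i.e. p + 1 tokens.
  const reused = restoredPos + 1;
  return {
    restoredPos,
    reused,
    reprocessed: promptTokens - reused,
    erased: slot.checkpoints.filter((pos) => pos > restoredPos),
  };
}

// Values taken from task 14335 in the log, checkpoints in the order they were checked.
const slot: SlotState = {
  cachedTokens: 46384,
  checkpoints: [46333, 45821, 40959, 32767, 24575, 16383],
};
const plan = planReuse(slot, 52755, 17792);

console.log(plan.restoredPos, plan.reused, plan.reprocessed); // 16383 16384 36371
console.log(plan.erased.join(", ")); // 46333, 45821, 40959, 32767, 24575

const shareOfPrompt = ((100 * plan.reused) / 52755).toFixed(1);           // reused share of the new prompt
const shareOfCache = ((100 * plan.reused) / slot.cachedTokens).toFixed(1); // share of the old cache kept
console.log(shareOfPrompt, shareOfCache); // 31.1 35.3
```

The model makes the cost of a single early divergence explicit: everything between the restored checkpoint and the end of the prompt is re-evaluated, regardless of how much of the later prompt is actually unchanged.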
Summary
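The dynamic sections that buildFullPrompt() assembles into the single prompt string (system prompt, workflowContextSuffix, threadContext, issueContext/fileSuffix) change between turns, invalidating the cache prefix on every message. As a result, most of the context is rebuilt on every turn, even when resumeSessionId is used.
This is a major issue: functional correctness is unaffected, but performance degrades significantly from repeated cache misses (higher latency and token cost).
Steps to Reproduce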
… resumeSessionId)
Observe that cache_creation_input_tokens is high on every turn, while cache_read_input_tokens is 0 or minimal.
Expected vs Actual
Expected: cache_read_input_tokens >> cache_creation_input_tokens.
User Flow
Environment
Logs
Impact
… handleMessage → aiClient.sendQuery): all AI-powered conversations are affected. CLI workflow DAG nodes are not affected, because each node gets a fresh session.
Scope
core, orchestrator:prompt-builder, orchestrator:orchestrator-agent