Prompt caching is broken for orchestrator calls resulting in repeatedly high TTFT #1591

@aarononeal

Summary

  • What broke: Prompt caching is broken for orchestrator agent calls — the system prompt and workflow context sections are embedded in the prompt string and change between turns, invalidating the cache prefix on every message. This causes full context rebuild on every turn even when using resumeSessionId.
  • When it started (if known): Unknown; this has likely always been the behavior.
  • Severity: major — functional correctness is unaffected; performance degrades significantly from repeated cache misses (higher latency and token cost).

Steps to Reproduce

  1. Start Archon server with at least one registered codebase and one workflow
  2. Send a message to a conversation (e.g., "hello")
  3. Check Anthropic/llama.cpp usage stats and record the cache token counts for this first turn
  4. Send another message in the same conversation (session is resumed via resumeSessionId)
  5. Check usage stats again — observe cache_creation_input_tokens is high on every turn, cache_read_input_tokens is 0 or minimal
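
To make steps 3 and 5 comparable across turns, the usage stats can be reduced to a single ratio. The following is a minimal sketch, assuming the three token fields reported in the Anthropic Messages API usage object; `cacheReadRatio` and the sample numbers are hypothetical and only illustrate the shape of a broken versus a healthy turn.

```typescript
// Cache-related usage fields as reported by the Anthropic Messages API.
interface CacheUsage {
  input_tokens: number;                // uncached tokens after the last cache breakpoint
  cache_creation_input_tokens: number; // tokens written to the cache on this turn
  cache_read_input_tokens: number;     // tokens served from the cache on this turn
}

// Fraction of all prompt tokens that were served from cache (0 for an empty prompt).
function cacheReadRatio(usage: CacheUsage): number {
  const total =
    usage.input_tokens +
    usage.cache_creation_input_tokens +
    usage.cache_read_input_tokens;
  return total === 0 ? 0 : usage.cache_read_input_tokens / total;
}

// Hypothetical numbers for illustration only, not measured values.
const brokenTurn: CacheUsage = {
  input_tokens: 50,
  cache_creation_input_tokens: 9000,
  cache_read_input_tokens: 0,
};
const healthyTurn: CacheUsage = {
  input_tokens: 50,
  cache_creation_input_tokens: 450,
  cache_read_input_tokens: 9000,
};

console.log(cacheReadRatio(brokenTurn).toFixed(2));
console.log(cacheReadRatio(healthyTurn).toFixed(2));
```

A ratio that stays near zero on the second and later turns of a resumed session matches the behavior described in this report.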

Expected vs Actual

  • Expected: The stable prefix of the prompt (system preset + routing rules + project list) is cached on the first turn. Subsequent turns within the same session reuse that cache — only the new user message portion is processed, with cache_read_input_tokens >> cache_creation_input_tokens.
  • Actual: Every turn produces a fresh prompt string with different content (conditional workflow context, dynamic project list, variable message content before the cache breakpoint), causing full context rebuild on every turn.
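
One possible direction, sketched under assumptions: keep the parts named in "Expected" (system preset, routing rules, project list) in a prefix that is byte-identical across turns, and move everything conditional behind it. The types and the `buildPromptParts` function below are hypothetical and are not Archon's actual prompt builder.

```typescript
// Hypothetical inputs; field names are illustrative, not Archon's real types.
interface PromptInputs {
  systemPreset: string;
  routingRules: string;
  projectNames: string[];
  workflowContext?: string; // present only on some turns
  threadContext?: string;   // changes every turn
  userMessage: string;
}

interface PromptParts {
  stablePrefix: string;  // must not change within a session
  dynamicSuffix: string; // free to change on every turn
}

function buildPromptParts(inputs: PromptInputs): PromptParts {
  // Sort a copy so the project list does not depend on DB return order.
  const projects = [...inputs.projectNames].sort().join("\n");

  const stablePrefix = [inputs.systemPreset, inputs.routingRules, projects].join("\n\n");

  // Conditional sections go after the prefix, so their presence or absence
  // cannot shift or alter any cached token.
  const dynamicSuffix = [inputs.workflowContext, inputs.threadContext, inputs.userMessage]
    .filter((s): s is string => typeof s === "string" && s.length > 0)
    .join("\n\n");

  return { stablePrefix, dynamicSuffix };
}

// Two turns with different conditional content and a different project order.
const turn1 = buildPromptParts({
  systemPreset: "preset",
  routingRules: "rules",
  projectNames: ["beta", "alpha"],
  userMessage: "hello",
});
const turn2 = buildPromptParts({
  systemPreset: "preset",
  routingRules: "rules",
  projectNames: ["alpha", "beta"],
  workflowContext: "last run: ok",
  threadContext: "3 earlier messages",
  userMessage: "and now?",
});

console.log(turn1.stablePrefix === turn2.stablePrefix);
```

Whether the prefix is then sent as a separate system prompt or placed before a cache breakpoint depends on the client; this sketch only shows the ordering constraint.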

User Flow

User                   Archon                   AI Client (Claude/llama.cpp)
────                   ──────                   ────────────────────────────
sends message ────────▶ buildFullPrompt()
                        ├─ loads codebases from DB
                        ├─ discovers workflows from disk
                        ├─ fetches recent workflow results
                        ├─ assembles single prompt string
                        │   [X] system prompt varies by codebase count
                        │   [X] workflowContextSuffix appears/disappears
                        │   [X] threadContext changes every turn
                        │   [X] issueContext/fileSuffix conditional
                        │
                        sends prompt + resumeSessionId
                                              ──▶ KV cache: partial or no hit
                                              ──▶ full context rebuild
                                              ──▶ full inference
                         streams response ◀─────
sees response ◀────────
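
To find which of the [X] items above breaks the prefix first, the prompt strings from two consecutive turns can be compared directly. This is a minimal diagnostic sketch; `commonPrefixLength` and `describeDivergence` are hypothetical helpers, and they compare characters rather than tokens, so the offset is only an approximation of where the token-level cache stops matching.

```typescript
// Number of leading UTF-16 code units shared by two strings.
function commonPrefixLength(a: string, b: string): number {
  const limit = Math.min(a.length, b.length);
  let i = 0;
  while (i < limit && a.charCodeAt(i) === b.charCodeAt(i)) {
    i++;
  }
  return i;
}

// Report where the current prompt stops matching the previous one,
// with a short snippet of the first differing text.
function describeDivergence(previous: string, current: string, context = 40): string {
  const at = commonPrefixLength(previous, current);
  if (at === previous.length && at === current.length) {
    return "prompts are identical";
  }
  const snippet = current.slice(at, at + context);
  return `diverges at char ${at}: ${JSON.stringify(snippet)}`;
}

// Example: a section that appears only on the second turn.
const previousPrompt = "SYSTEM\nprojects: 2\nUSER: hello";
const currentPrompt = "SYSTEM\nprojects: 2\nWORKFLOW: done\nUSER: hi";
console.log(describeDivergence(previousPrompt, currentPrompt));
```

Logging this once per turn in a debug build would show whether the divergence point sits in the system section or only in the trailing user message.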

Environment

  • Platform: All (Slack / Telegram / GitHub / Discord / Web / CLI) — affects orchestrator path only
  • Database: SQLite / PostgreSQL (both affected)
  • Running in worktree? Yes / No (both affected)
  • OS: All

Logs

llama.cpp KV cache behavior on a resumed turn, showing most of the context being rebuilt:

2026-05-05 15:21:36.138 [Info] slot update_slots: task 14335 | new prompt, n_ctx_slot = 262144, n_keep = 8192, task.n_tokens = 52755
2026-05-05 15:21:36.138 [Info] slot update_slots: task 14335 | n_past = 17792, slot.prompt.tokens.size() = 46384
2026-05-05 15:21:36.139 [Info] slot update_slots: task 14335 | Checking checkpoint with [46333, 46333] against 17792...
2026-05-05 15:21:36.139 [Info] slot update_slots: task 14335 | Checking checkpoint with [45821, 45821] against 17792...
2026-05-05 15:21:36.139 [Info] slot update_slots: task 14335 | Checking checkpoint with [40959, 40959] against 17792...
2026-05-05 15:21:36.139 [Info] slot update_slots: task 14335 | Checking checkpoint with [32767, 32767] against 17792...
2026-05-05 15:21:36.139 [Info] slot update_slots: task 14335 | Checking checkpoint with [24575, 24575] against 17792...
2026-05-05 15:21:36.139 [Info] slot update_slots: task 14335 | Checking checkpoint with [16383, 16383] against 17792...
2026-05-05 15:21:36.149 [Info] slot update_slots: task 14335 | restored context checkpoint (pos_min = 16383, pos_max = 16383, n_tokens = 16384, n_past = 16384, size = 62.813 MiB)
2026-05-05 15:21:36.151 [Info] slot update_slots: task 14335 | erased invalidated context checkpoint (pos_min = 24575, ...)
2026-05-05 15:21:36.153 [Info] slot update_slots: task 14335 | erased invalidated context checkpoint (pos_min = 32767, ...)
2026-05-05 15:21:36.154 [Info] slot update_slots: task 14335 | erased invalidated context checkpoint (pos_min = 40959, ...)
2026-05-05 15:21:36.155 [Info] slot update_slots: task 14335 | erased invalidated context checkpoint (pos_min = 45821, ...)
2026-05-05 15:21:36.156 [Info] slot update_slots: task 14335 | erased invalidated context checkpoint (pos_min = 46333, ...)
2026-05-05 15:21:36.157 [Info] slot update_slots: task 14335 | n_tokens = 16384, memory_seq_rm [16384, end)

Breakdown:
  - task.n_tokens = 52,755  → total size of the new prompt
  - slot.prompt.tokens.size() = 46,384  → tokens held in the slot from the previous turn (6,371 fewer than the new prompt)
  - n_past = 17,792         → length of the prefix the new prompt shares with the cached one; the two diverge after this point
  - Only the checkpoint at [16383, 16383] is usable (16383 < 17792)
  - All five larger checkpoints are erased → 36,371 tokens (52,755 - 16,384) are reprocessed
  - Cache hit ratio: ~31% (16,384 / 52,755); the rest is rebuilt
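
The arithmetic behind the log can be reproduced with a small model. This is a simplified sketch, assuming a checkpoint is usable when its position lies before the point where the new prompt diverges from the cached one (the "against 17792" checks in the log); llama.cpp's real selection logic has more conditions, and `estimateReuse` is a hypothetical name.

```typescript
interface ReuseEstimate {
  reusedTokens: number;      // tokens restored from the best usable checkpoint
  reprocessedTokens: number; // tokens that must be evaluated again
  hitRatio: number;          // reusedTokens / promptTokens
}

function estimateReuse(
  promptTokens: number,
  nPast: number,
  checkpointPositions: number[],
): ReuseEstimate {
  // Keep only checkpoints located before the divergence point.
  const usable = checkpointPositions.filter((pos) => pos < nPast);
  const best = usable.length > 0 ? Math.max(...usable) : -1;

  // A checkpoint at position p covers tokens 0..p, i.e. p + 1 tokens
  // (the log reports n_tokens = 16384 for pos_max = 16383).
  const reusedTokens = best + 1;
  const reprocessedTokens = promptTokens - reusedTokens;
  const hitRatio = promptTokens === 0 ? 0 : reusedTokens / promptTokens;

  return { reusedTokens, reprocessedTokens, hitRatio };
}

// Values taken from the log for task 14335.
const estimate = estimateReuse(
  52755,
  17792,
  [46333, 45821, 40959, 32767, 24575, 16383],
);
console.log(estimate);
```

With the logged values this yields 16,384 reused tokens and 36,371 reprocessed tokens, a hit ratio of roughly 31%.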

Impact

  • Affected workflows/commands: Orchestrator agent path (handleMessage → aiClient.sendQuery), which covers all AI-powered conversations. CLI workflow DAG nodes are not affected (each node gets a fresh session).
  • Reproduction rate: Always
  • Workaround available? No — users pay full token cost on every turn with no cache reuse.
  • Data loss risk? No

Scope

  • Package(s) likely involved: core
  • Module: orchestrator:prompt-builder, orchestrator:orchestrator-agent

Metadata

Labels

  • P1 (High priority - Address soon, next in queue)
  • area: orchestrator (Main conversation orchestration)
  • bug (Something is broken)
  • effort/medium (Few files, one domain or module, some coordination needed)
  • performance (Performance improvements)
