test(prompt-budget): lock cache-prefix byte budget by esengine · Pull Request #1320 · esengine/DeepSeek-Reasonix

esengine · 2026-05-19T10:21:24Z

Summary

Every byte of the system prompt and tool spec list ships in every request to DeepSeek — paid at full miss price the first time, ~10% on every subsequent cache hit. Recent PRs have grown both without anyone tracking the cumulative cost, and several users have reported higher token consumption since 0.46.0.
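The pricing argument can be made concrete with a small sketch. It assumes a cache hit is billed at exactly 10% of the miss price (the text only says "~10%") and that the prefix stays cached for the whole session. `prefixCost` and the 10,000-token, 50-request session are hypothetical illustration values, not figures from this PR.

```typescript
// Assumption: a cache hit costs 10% of a miss, per the "~10%" figure above.
const HIT_RATIO = 0.1;

/**
 * Cost of re-sending a fixed prefix, in miss-priced token equivalents:
 * the first request is a full-price miss, every later one a discounted hit.
 */
function prefixCost(prefixTokens: number, requests: number): number {
  if (requests <= 0) return 0;
  return prefixTokens * (1 + HIT_RATIO * (requests - 1));
}

// Hypothetical session: a 10,000-token prefix sent 50 times.
const sessionCost = prefixCost(10_000, 50);
console.log(`prefix cost ≈ ${Math.round(sessionCost)} miss-priced tokens`);
```

Because the cost is linear in `prefixTokens`, every token trimmed from the prefix is saved on every request: once at the miss rate and at the hit rate thereafter.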

This PR adds tests/prompt-budget.test.ts, a regression net that locks the current values:

CODE_SYSTEM_PROMPT.length ≤ 24,500 bytes (currently 24,387 — under cap)
code-mode tool spec list ≤ 40,000 bytes (currently 39,377 across 35 tools — under cap)
codeSystemPrompt(root) with built-in skills ≤ 26,500 bytes (currently 26,337 — under cap)
no single tool description > 8 KiB on its own

A failing assertion forces the PR author to either compress an equivalent section, or raise the budget explicitly in the commit message — visible in code review.
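The test file itself is not shown on this page, so the following is only a sketch of what such a regression net might check, using the caps listed above. `ToolSpec`, `utf8Bytes`, and `findOverBudget` are hypothetical names, the tool list is measured here as its `JSON.stringify` output (an assumption about how the real test serializes it), and bytes are counted as UTF-8. Note that `String.prototype.length` counts UTF-16 code units, which equals the byte count only for ASCII text.

```typescript
// Hypothetical shape; the real tool spec type lives in the repo.
interface ToolSpec {
  name: string;
  description: string;
  parameters: unknown;
}

interface BudgetLine {
  label: string;
  actual: number; // measured size in bytes
  cap: number; // allowed size in bytes
}

// Caps taken from the list above.
const SYSTEM_PROMPT_CAP = 24_500;
const TOOL_LIST_CAP = 40_000;
const TOOL_DESCRIPTION_CAP = 8 * 1024; // 8 KiB

/** UTF-8 byte length of a string. */
function utf8Bytes(text: string): number {
  return new TextEncoder().encode(text).length;
}

/** Returns every budget line whose measured size exceeds its cap. */
function findOverBudget(systemPrompt: string, tools: ToolSpec[]): BudgetLine[] {
  const lines: BudgetLine[] = [
    { label: "system prompt", actual: utf8Bytes(systemPrompt), cap: SYSTEM_PROMPT_CAP },
    // Assumption: the spec list is measured as serialized JSON.
    { label: "tool spec list", actual: utf8Bytes(JSON.stringify(tools)), cap: TOOL_LIST_CAP },
    ...tools.map((tool) => ({
      label: `description of ${tool.name}`,
      actual: utf8Bytes(tool.description),
      cap: TOOL_DESCRIPTION_CAP,
    })),
  ];
  return lines.filter((line) => line.actual > line.cap);
}

// Usage with placeholder data: a prompt one byte over the cap is reported.
const violations = findOverBudget("x".repeat(24_501), [
  { name: "read_file", description: "Read a file.", parameters: {} },
]);
for (const v of violations) {
  console.log(`${v.label}: ${v.actual} bytes exceeds cap of ${v.cap}`);
}
```

In a real test each entry would become an assertion in the project's test runner, so a regression names the offending item and its size.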

This is PR #1 of a four-PR token-optimization series:

#1 (this): lock the budget so future regressions are caught
#2: compress the largest tool descriptions (search_content / install_skill / create_skill / remember / read_file)
#3: compress the system prompt (cut redundant tool inventory, merge audit rails)
#4: lower read_file outline threshold from 512 KiB to 64 KiB (recover the 0.46.0 regression)

Test plan

npm run verify — all 231 test files / 3,241 tests pass
New tests/prompt-budget.test.ts — 4 tests, all pass
npm run lint clean
npm run typecheck clean

Every byte of the system prompt and tool spec list ships in every request: paid at full miss price the first time and ~10% on every subsequent cache hit. Recent PRs have grown both without anyone tracking the cumulative cost.

Lock the current values:

- CODE_SYSTEM_PROMPT.length ≤ 24_500 bytes
- code-mode tool spec list ≤ 40_000 bytes
- no single tool description > 8 KiB

A failing assertion forces the PR author to either compress an equivalent section or raise the budget explicitly with a justification in the commit message, visible in code review.

Co-authored-by: reasonix <reasonix@deepseek.com>

…st) (#1323)

The system prompt was 24,387 bytes (≈ 6,100 tokens), much of it overlapping with the tool descriptions sitting right next to it in the cache prefix. Sections like "When to propose a plan", "When to ask the user to pick", and "When to track multi-step intent" each recited rules that the tool's own description already carried.

Aggressive dedup pass:

- Drop the redundant "you have these filesystem tools" opening sentence; the API ships the tool list separately.
- Merge the three independent submit_plan / ask_choice / todo_write sections into one short "Picking the right tool" block.
- Fold "Exploration", "Trust what you already know", and "When the user wants to switch project" into shorter equivalents: same rules, no narrative.
- Collapse the foreground/background section. The full how-to lives in the run_command / run_background tool descriptions; the prompt only needs the picking rule.
- Compress the audit-mode rails (#610) prose around the six rails themselves. Every rail's load-bearing phrase is preserved verbatim so tests/code-prompt.test.ts still asserts on them.

Result: 24,387 → 11,956 bytes (-51%, ≈ 3,100 tokens per request). Combined with PR #1320 / #1321 the cache-prefix tax per request is now ~16k tokens instead of ~36k.

Behaviour unchanged: every rail / gate / mode constraint is still asserted by the existing prompt tests.

Co-authored-by: reasonix <reasonix@deepseek.com>
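The figures in this commit message can be cross-checked with a few lines of arithmetic. The sketch below derives the bytes-per-token ratio implied by "24,387 bytes (≈ 6,100 tokens)" and applies it to the byte saving. `summarizeSaving` is a hypothetical helper, and a fixed ratio is only an approximation, since real tokenization varies with content.

```typescript
interface SavingSummary {
  savedBytes: number;
  savedPercent: number;
  bytesPerToken: number;
  savedTokens: number;
}

/** Derives the saving from before/after byte sizes and a known token count. */
function summarizeSaving(
  beforeBytes: number,
  afterBytes: number,
  beforeTokens: number,
): SavingSummary {
  const savedBytes = beforeBytes - afterBytes;
  const bytesPerToken = beforeBytes / beforeTokens;
  return {
    savedBytes,
    savedPercent: (savedBytes / beforeBytes) * 100,
    bytesPerToken,
    // Assumption: the removed text tokenizes at the same average ratio.
    savedTokens: savedBytes / bytesPerToken,
  };
}

// Sizes and token estimate quoted in the commit message above.
const saving = summarizeSaving(24_387, 11_956, 6_100);
console.log(
  `${saving.savedBytes} bytes saved (${saving.savedPercent.toFixed(0)}%), ` +
    `about ${Math.round(saving.savedTokens)} tokens at ` +
    `${saving.bytesPerToken.toFixed(2)} bytes/token`,
);
```

The saving works out to 12,431 bytes, about 51% of the original and roughly 3,100 tokens at about 4 bytes per token. That matches the commit message if "≈ 3,100 tokens per request" is read as the saving rather than the new prompt size.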

esengine merged commit 548568b into main May 19, 2026
4 checks passed

esengine deleted the feat/prompt-budget-regression-net branch May 19, 2026 11:05

esengine merged 1 commit into main from feat/prompt-budget-regression-net
