
test(prompt-budget): lock cache-prefix byte budget #1320

Merged: esengine merged 1 commit into main from feat/prompt-budget-regression-net on May 19, 2026

@esengine (Owner)

Summary

Every byte of the system prompt and tool spec list ships in every request to DeepSeek: it is paid at the full cache-miss price the first time and at roughly 10% of that price on every subsequent cache hit. Recent PRs have grown both without anyone tracking the cumulative cost, and several users have reported higher token consumption since 0.46.0.
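To make the pricing argument concrete, here is a minimal sketch of the cost model described above. The function name `prefixCost` and the constant `CACHE_HIT_RATIO` are hypothetical, the 10% figure is the approximate ratio quoted in this PR rather than an exact price, and the result is expressed in "tokens billed at miss price" rather than in currency.

```typescript
// Approximate ratio quoted in the PR text: a cache hit costs ~10% of a miss.
// This is an assumption for illustration, not an exact price.
const CACHE_HIT_RATIO = 0.1;

// Cost of re-sending the same prefix over a session, measured in
// miss-price token units: the first request pays the full prefix,
// every later request pays the cache-hit fraction of it.
function prefixCost(prefixTokens: number, requests: number): number {
  if (requests <= 0) {
    return 0;
  }
  const firstRequest = prefixTokens;
  const laterRequests = (requests - 1) * prefixTokens * CACHE_HIT_RATIO;
  return firstRequest + laterRequests;
}

// Illustrative numbers only: a 10,000-token prefix over 21 requests.
const sessionCost = prefixCost(10_000, 21);
console.log("miss-price token units:", sessionCost);
```

Under this model the prefix is never free: each extra token in the prompt or tool list adds a small charge to every later request in the session, which is why growth is worth capping.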

This PR adds tests/prompt-budget.test.ts, a regression net that locks the current values:

  • CODE_SYSTEM_PROMPT.length ≤ 24,500 bytes (currently 24,387 — under cap)
  • code-mode tool spec list ≤ 40,000 bytes (currently 39,377 across 35 tools — under cap)
  • codeSystemPrompt(root) with built-in skills ≤ 26,500 bytes (currently 26,337 — under cap)
  • no single tool description > 8 KiB on its own

A failing assertion forces the PR author to either compress an equivalent section, or raise the budget explicitly in the commit message — visible in code review.
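The budget checks listed above could be sketched roughly as follows. This is not the content of tests/prompt-budget.test.ts: the `ToolSpec` shape, the `checkBudgets` helper, and the use of `JSON.stringify` to size the tool list are all assumptions made for illustration. Only the three caps come from this PR.

```typescript
// Hypothetical shape of a tool spec; the project's real type may differ.
interface ToolSpec {
  name: string;
  description: string;
}

// Caps quoted in the PR summary.
const PROMPT_CAP = 24_500;
const TOOL_LIST_CAP = 40_000;
const TOOL_DESCRIPTION_CAP = 8 * 1024; // 8 KiB

// String.length counts UTF-16 code units, which equals the byte count
// only for pure-ASCII text. The PR measures with .length, so this does too.
function size(text: string): number {
  return text.length;
}

// Returns one message per exceeded budget; an empty array means all pass.
// Assumption: the tool list is sized by its JSON serialization.
function checkBudgets(prompt: string, tools: ToolSpec[]): string[] {
  const checks = [
    { label: "system prompt", actual: size(prompt), cap: PROMPT_CAP },
    { label: "tool spec list", actual: size(JSON.stringify(tools)), cap: TOOL_LIST_CAP },
  ];
  for (const tool of tools) {
    checks.push({
      label: "description of " + tool.name,
      actual: size(tool.description),
      cap: TOOL_DESCRIPTION_CAP,
    });
  }
  return checks
    .filter((c) => c.actual > c.cap)
    .map((c) => c.label + ": " + c.actual + " > " + c.cap);
}
```

In a real test file each entry would likely be its own assertion, so that a failure names the budget that was exceeded and the author has to either trim the text or raise the cap in a visible diff.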

This is PR #1 of a four-PR token-optimization series:

Test plan

  • npm run verify — all 231 test files / 3,241 tests pass
  • New tests/prompt-budget.test.ts — 4 tests, all pass
  • npm run lint clean
  • npm run typecheck clean

Every byte of the system prompt and tool spec list ships in every
request — paid at full miss price the first time and ~10% on every
subsequent cache hit. Recent PRs have grown both without anyone
tracking the cumulative cost.

Lock the current values:
- CODE_SYSTEM_PROMPT.length ≤ 24_500 bytes
- code-mode tool spec list ≤ 40_000 bytes
- no single tool description > 8 KiB

A failing assertion forces the PR author to either compress an
equivalent section or raise the budget explicitly with a
justification in the commit message — visible in code review.

esengine merged commit 548568b into main on May 19, 2026
4 checks passed
esengine deleted the feat/prompt-budget-regression-net branch on May 19, 2026 11:05
esengine added a commit that referenced this pull request May 19, 2026
…st) (#1323)

The system prompt was 24,387 bytes (≈ 6,100 tokens) — much of it
overlapping with the tool descriptions sitting right next to it in
the cache prefix. Sections like "When to propose a plan", "When to
ask the user to pick", and "When to track multi-step intent" each
recited rules that the tool's own description already carried.

Aggressive dedup pass:
- Drop the redundant "you have these filesystem tools" opening
  sentence — the API ships the tool list separately.
- Merge the three independent submit_plan / ask_choice / todo_write
  sections into one short "Picking the right tool" block.
- Fold "Exploration", "Trust what you already know", and "When the
  user wants to switch project" into shorter equivalents — same
  rules, no narrative.
- Collapse the foreground/background section. The full how-to lives
  in the run_command / run_background tool descriptions; the prompt
  only needs the picking rule.
- Compress the audit-mode rails (#610) prose around the six rails
  themselves. Every rail's load-bearing phrase is preserved verbatim
  so tests/code-prompt.test.ts still asserts on them.

Result: 24,387 → 11,956 bytes (-51%, ≈ 3,100 tokens per request).
Combined with PR #1320 / #1321 the cache-prefix tax per request is
now ~16k tokens instead of ~36k.

Behaviour unchanged — every rail / gate / mode constraint is still
asserted by the existing prompt tests.

Co-authored-by: reasonix <reasonix@deepseek.com>
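
The size figures in the commit message above can be re-derived with a few lines of arithmetic. This sketch assumes roughly 4 bytes per token, a ratio implied by the message's own "24,387 bytes (≈ 6,100 tokens)" and not an official tokenizer figure. The names `BYTES_PER_TOKEN` and `promptSavings` are hypothetical.

```typescript
// Assumption: ~4 bytes per token, implied by "24,387 bytes (≈ 6,100 tokens)".
const BYTES_PER_TOKEN = 4;

interface Savings {
  savedBytes: number;
  percent: number; // rounded to the nearest whole percent
  savedTokens: number; // rounded to the nearest token
}

// Compares prompt sizes before and after a compression pass.
function promptSavings(beforeBytes: number, afterBytes: number): Savings {
  const savedBytes = beforeBytes - afterBytes;
  return {
    savedBytes,
    percent: Math.round((savedBytes / beforeBytes) * 100),
    savedTokens: Math.round(savedBytes / BYTES_PER_TOKEN),
  };
}

// Figures from the commit message: 24,387 bytes before, 11,956 bytes after.
const result = promptSavings(24_387, 11_956);
console.log(result.savedBytes, result.percent, result.savedTokens);
```

With those inputs the saving is 12,431 bytes, which rounds to 51% and to about 3,100 tokens at the assumed ratio. That matches the quoted result and shows that "≈ 3,100 tokens per request" is the amount saved, not the size of the new prompt.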
ChasLui pushed a commit to ChasLui/DeepSeek-Reasonix that referenced this pull request May 23, 2026
ChasLui pushed a commit to ChasLui/DeepSeek-Reasonix that referenced this pull request May 23, 2026
…st) (esengine#1323)