refactor(tools): compress tool spec descriptions (-28%, ~2.7k tokens/request) #1321

Merged
esengine merged 1 commit into main from feat/compress-tool-descriptions
May 19, 2026

Conversation

@esengine

Owner

Summary

Every byte of a tool's description and JSON-schema parameter blob ships in every request — so the size of tools.specs() is a per-request cache-prefix tax. The list had grown to 39,377 bytes (~10k tokens) across 35 tools, with much of the bulk being:

  • teaching guidance that the system prompt already covers (when to use submit_plan vs ask_choice, when to spawn a subagent);
  • verbose JSON-schema property descriptions restating what the parameter name already conveys.

This PR tightens the heaviest twelve descriptions:

Tool                Before (bytes)  After (bytes)     Δ
submit_plan                   3517           1648  -53%
revise_plan                   2554           1482  -42%
create_skill                  2379           1367  -43%
install_skill                 2293           1510  -34%
search_content                2167           1263  -42%
mark_step_complete            1988           1037  -48%
add_mcp_server                1898           1213  -36%
run_command                   1862           1376  -26%
todo_write                    1675            872  -48%
ask_choice                    1807           1313  -27%
remember                      1812           1306  -28%
run_background                1611           1186  -26%

Total: 39,377 → 28,412 bytes (-28%, ≈ 2,740 tokens per request).
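
The percentages and the token estimate can be rechecked with a few lines of arithmetic. The sketch below is illustrative only: `BYTES_PER_TOKEN = 4` is an assumption inferred from the figures quoted here (39,377 bytes ≈ 10k tokens), not a constant taken from the repository, and `reduction` is a hypothetical helper name.

```typescript
// Assumption: roughly 4 bytes per token, inferred from the PR's own figures.
const BYTES_PER_TOKEN = 4;

interface Reduction {
  savedBytes: number;
  percent: number; // rounded to the nearest whole percent
  savedTokens: number; // rounded estimate under the assumption above
}

/** Bytes saved, percentage saved, and the estimated tokens saved. */
function reduction(before: number, after: number): Reduction {
  const savedBytes = before - after;
  return {
    savedBytes,
    percent: Math.round((savedBytes / before) * 100),
    savedTokens: Math.round(savedBytes / BYTES_PER_TOKEN),
  };
}

const totals = reduction(39377, 28412);
console.log(totals); // { savedBytes: 10965, percent: 28, savedTokens: 2741 }

const submitPlan = reduction(3517, 1648);
console.log(submitPlan.percent); // 53
```

Under that assumed ratio the total works out to about 2,741 tokens, which matches the ≈ 2,740 quoted above.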

What is preserved

The behaviourally load-bearing rules stay verbatim:

  • submit_plan: don't-use-for-A/B/C-menus / use ask_choice for branches.
  • run_command: the chain operator + redirect support list, the cd doesn't persist warning, the filter-at-source hint.
  • remember: the "won't re-load until next /new" warning.
  • ChoiceRequestedError's stop-calling-tools semantics (this lives in the error object, not the description).

The longer-form teaching content already lives in the system prompt (# When to propose a plan, # When to ask the user to pick, # When to track multi-step intent, # Foreground vs. background commands), so trimming the duplication in tool descriptions is pure dedup.

This is PR #2 of a four-PR token-optimization series. PR #1 (#1320) added a regression net at tests/prompt-budget.test.ts that locks current values; the new floor here will be tightened in a follow-up commit once both land.
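
For readers unfamiliar with the idea of a budget regression net, here is a minimal sketch of the kind of check such a test could make. It is not the contents of tests/prompt-budget.test.ts, which this PR does not show: `checkBudget` and `MAX_TOOL_SPEC_BYTES` are hypothetical names, and the ceiling value is invented for illustration.

```typescript
// Hypothetical ceiling, chosen slightly above the post-PR total of 28,412 bytes.
const MAX_TOOL_SPEC_BYTES = 28500;

/** Throws when a measured size exceeds its budget; otherwise does nothing. */
function checkBudget(label: string, actualBytes: number, maxBytes: number): void {
  if (actualBytes > maxBytes) {
    throw new Error(`${label}: ${actualBytes} bytes exceeds budget of ${maxBytes} bytes`);
  }
}

// The post-PR total fits under the assumed ceiling.
checkBudget("tool specs", 28412, MAX_TOOL_SPEC_BYTES);

// The pre-PR total would trip the check.
try {
  checkBudget("tool specs", 39377, MAX_TOOL_SPEC_BYTES);
} catch (err) {
  console.log((err as Error).message);
  // tool specs: 39377 bytes exceeds budget of 28500 bytes
}
```

Lowering the ceiling after a compression pass is what keeps later edits from quietly growing the descriptions back.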

Diagnostic script

scripts/measure-tool-sizes.mts is added so future audits can run npx tsx scripts/measure-tool-sizes.mts and get a per-tool byte breakdown without re-deriving the table by hand.
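
A rough idea of what a per-tool byte breakdown involves is sketched below. This is not the script added in this PR. The `ToolSpec` shape, the helper names, and the sample specs are assumptions, and measuring the UTF-8 length of `JSON.stringify(spec)` is only one plausible way to count bytes.

```typescript
// Hypothetical spec shape; the real objects returned by tools.specs() may differ.
interface ToolSpec {
  name: string;
  description: string;
  parameters: Record<string, unknown>; // JSON-schema object
}

interface SizeRow {
  name: string;
  bytes: number;
}

const encoder = new TextEncoder();

/** UTF-8 byte length of the JSON-serialized spec. */
function specBytes(spec: ToolSpec): number {
  return encoder.encode(JSON.stringify(spec)).length;
}

/** Per-tool sizes, largest first. */
function measureToolSizes(specs: ToolSpec[]): SizeRow[] {
  return specs
    .map((spec) => ({ name: spec.name, bytes: specBytes(spec) }))
    .sort((a, b) => b.bytes - a.bytes);
}

// Made-up specs for demonstration; these are not the project's real tools.
const sampleSpecs: ToolSpec[] = [
  { name: "short_tool", description: "Does one thing.", parameters: { type: "object", properties: {} } },
  { name: "long_tool", description: "A much longer description. ".repeat(20), parameters: { type: "object", properties: {} } },
];

const rows = measureToolSizes(sampleSpecs);
for (const row of rows) {
  console.log(`${row.name.padEnd(20)} ${String(row.bytes).padStart(6)}`);
}
const totalBytes = rows.reduce((sum, row) => sum + row.bytes, 0);
console.log(`${"total".padEnd(20)} ${String(totalBytes).padStart(6)}`);
```

Counting encoded bytes rather than string length matters once descriptions contain non-ASCII characters such as arrows or bullets.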

Test plan

  • npm run verify — all 230 test files / 3,237 tests pass
  • npm run lint clean
  • npm run typecheck clean
  • tests/tools.test.ts, tests/plan.test.ts, tests/skills.test.ts, tests/choice.test.ts all green

Every byte of a tool's description and JSON-schema parameter blob ships
in every request. The spec list had grown to 39,377 bytes (≈ 10k tokens)
across 35 tools — much of it teaching guidance that was already covered
in the system prompt, plus verbose schema property descriptions that
restated what the parameter name already conveys.

Compress the heaviest twelve tools while preserving the actually
load-bearing rules (don't-use-for-A/B/C-menus on submit_plan, plan-mode
gate behavior on run_command, prefix-stable warning on remember):

  submit_plan          3517 → 1648  (-53%)
  revise_plan          2554 → 1482  (-42%)
  create_skill         2379 → 1367  (-43%)
  install_skill        2293 → 1510  (-34%)
  search_content       2167 → 1263  (-42%)
  mark_step_complete   1988 → 1037  (-48%)
  add_mcp_server       1898 → 1213  (-36%)
  run_command          1862 → 1376  (-26%)
  todo_write           1675 →  872  (-48%)
  ask_choice           1807 → 1313  (-27%)
  remember             1812 → 1306  (-28%)
  run_background       1611 → 1186  (-26%)

Total: 39,377 → 28,412 bytes (-28%, ≈ 2,740 tokens per request).

Behaviour unchanged — guidance that lived only in tool descriptions
(plan-mode interaction, ChoiceRequestedError stop semantics, prefix
re-load timing) is preserved verbatim. The system prompt and tool
error messages already carry the longer-form teaching content.

scripts/measure-tool-sizes.mts added so future audits don't have to
re-derive per-tool byte counts by hand.
esengine merged commit 6e5fa83 into main on May 19, 2026
4 checks passed
esengine deleted the feat/compress-tool-descriptions branch on May 19, 2026 at 11:09
esengine added a commit that referenced this pull request May 19, 2026
…st) (#1323)

The system prompt was 24,387 bytes (≈ 6,100 tokens) — much of it
overlapping with the tool descriptions sitting right next to it in
the cache prefix. Sections like "When to propose a plan", "When to
ask the user to pick", and "When to track multi-step intent" each
recited rules that the tool's own description already carried.

Aggressive dedup pass:
- Drop the redundant "you have these filesystem tools" opening
  sentence — the API ships the tool list separately.
- Merge the three independent submit_plan / ask_choice / todo_write
  sections into one short "Picking the right tool" block.
- Fold "Exploration", "Trust what you already know", and "When the
  user wants to switch project" into shorter equivalents — same
  rules, no narrative.
- Collapse the foreground/background section. The full how-to lives
  in the run_command / run_background tool descriptions; the prompt
  only needs the picking rule.
- Compress the audit-mode rails (#610) prose around the six rails
  themselves. Every rail's load-bearing phrase is preserved verbatim
  so tests/code-prompt.test.ts still asserts on them.
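
The "preserved verbatim" guarantee can be pictured as a substring check over the compressed prompt. The sketch below is an assumption about the general technique, not the contents of tests/code-prompt.test.ts: `missingPhrases` is a hypothetical helper, and the phrases and prompt text are placeholders invented for illustration.

```typescript
/** Returns the required phrases that no longer appear verbatim in the prompt. */
function missingPhrases(prompt: string, required: readonly string[]): string[] {
  return required.filter((phrase) => !prompt.includes(phrase));
}

// Placeholder phrases and prompt text; not the project's real rails.
const requiredPhrases = [
  "never edit files in audit mode",
  "ask before running destructive commands",
];
const compressedPrompt =
  "Audit mode: never edit files in audit mode. Ask before running destructive commands.";

// The match is exact and case-sensitive, so the capitalized "Ask" does not count.
console.log(missingPhrases(compressedPrompt, requiredPhrases));
// [ 'ask before running destructive commands' ]
```

An exact match is deliberately strict: a compression pass that rewords or recapitalizes a rail shows up as a missing phrase.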

Result: 24,387 → 11,956 bytes (-51%, ≈ 3,100 tokens per request).
Combined with PR #1320 / #1321 the cache-prefix tax per request is
now ~16k tokens instead of ~36k.

Behaviour unchanged — every rail / gate / mode constraint is still
asserted by the existing prompt tests.

Co-authored-by: reasonix <reasonix@deepseek.com>
ChasLui pushed a commit to ChasLui/DeepSeek-Reasonix that referenced this pull request May 23, 2026
…request) (esengine#1321)

Co-authored-by: reasonix <reasonix@deepseek.com>
ChasLui pushed a commit to ChasLui/DeepSeek-Reasonix that referenced this pull request May 23, 2026
…st) (esengine#1323)

Co-authored-by: reasonix <reasonix@deepseek.com>