refactor(tools): compress tool spec descriptions (-28%, ~2.7k tokens/request) by esengine · Pull Request #1321 · esengine/DeepSeek-Reasonix

esengine · 2026-05-19T10:31:51Z

Summary

Every byte of a tool's description and JSON-schema parameter blob ships in every request — so the size of tools.specs() is a per-request cache-prefix tax. The list had grown to 39,377 bytes (~10k tokens) across 35 tools, with much of the bulk being:

teaching guidance that the system prompt already covers (when to use submit_plan vs ask_choice, when to spawn a subagent);
verbose JSON-schema property descriptions restating what the parameter name already conveys.
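The cost model is simple: the serialized spec, parameter descriptions included, is resent verbatim on every turn. The minimal sketch below measures one spec's serialized size. It assumes an OpenAI-style function-tool shape (`name`, `description`, `parameters` as a JSON Schema object) and a rough 4-bytes-per-token rule of thumb. `ToolSpec`, `specBytes`, `approxTokens` and the `sample_read` tool are hypothetical and are not Reasonix's actual types.

```typescript
// Hypothetical spec shape: an OpenAI-style function tool with a JSON Schema blob.
interface ToolSpec {
  name: string;
  description: string;
  parameters: Record<string, unknown>;
}

// Bytes that ship on every request for this spec (UTF-8 of the JSON payload).
function specBytes(spec: ToolSpec): number {
  return new TextEncoder().encode(JSON.stringify(spec)).length;
}

// Rough token estimate; ~4 bytes/token is a heuristic, not a real tokenizer.
function approxTokens(bytes: number): number {
  return Math.round(bytes / 4);
}

// The same tool twice: once with a property description that restates the
// parameter name at length, once with a terse one.
const verbose: ToolSpec = {
  name: "sample_read",
  description: "Read a file from the workspace.",
  parameters: {
    type: "object",
    properties: {
      path: {
        type: "string",
        description: "The path of the file that should be read, relative to the workspace root.",
      },
    },
    required: ["path"],
  },
};

const terse: ToolSpec = {
  ...verbose,
  parameters: {
    type: "object",
    properties: { path: { type: "string", description: "Workspace-relative path." } },
    required: ["path"],
  },
};

const saved = specBytes(verbose) - specBytes(terse);
console.log(`saved ${saved} bytes (~${approxTokens(saved)} tokens) per request`);
```

A saving this small looks negligible for one property. It adds up only because the same bytes are paid on every request, for every property of every tool.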

This PR tightens the heaviest twelve descriptions:

Tool	Before (bytes)	After (bytes)	Δ
submit_plan	3517	1648	-53%
revise_plan	2554	1482	-42%
create_skill	2379	1367	-43%
install_skill	2293	1510	-34%
search_content	2167	1263	-42%
mark_step_complete	1988	1037	-48%
add_mcp_server	1898	1213	-36%
run_command	1862	1376	-26%
todo_write	1675	872	-48%
ask_choice	1807	1313	-27%
remember	1812	1306	-28%
run_background	1611	1186	-26%

Total: 39,377 → 28,412 bytes (-28%, ≈ 2,740 tokens per request).
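The Δ column and the token figure follow from plain arithmetic. The small sketch below reproduces them. It assumes Δ is the rounded percentage change and that the ≈ token numbers use a rule of thumb of about 4 bytes per token: 39,377 / 4 ≈ 9.8k and 10,965 / 4 ≈ 2,741 match the stated ≈ 10k and ≈ 2,740. `pctChange` is a hypothetical helper.

```typescript
// Δ% as shown in the table: rounded percentage change from before to after.
function pctChange(before: number, after: number): number {
  return Math.round(((after - before) / before) * 100);
}

// A few rows from the table above (bytes).
const rows: Array<[string, number, number]> = [
  ["submit_plan", 3517, 1648],
  ["todo_write", 1675, 872],
  ["run_background", 1611, 1186],
];
for (const [name, before, after] of rows) {
  console.log(`${name}: ${pctChange(before, after)}%`);
}

// Totals across all 35 specs; tokens use the ~4 bytes/token heuristic.
const totalBefore = 39_377;
const totalAfter = 28_412;
const savedBytes = totalBefore - totalAfter; // 10,965 bytes
console.log(`total: ${pctChange(totalBefore, totalAfter)}%, ~${Math.round(savedBytes / 4)} tokens saved`);
```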

What is preserved

The behaviourally load-bearing rules stay verbatim:

submit_plan: don't-use-for-A/B/C-menus / use ask_choice for branches.
run_command: the chain operator + redirect support list, the cd doesn't persist warning, the filter-at-source hint.
remember: the "won't re-load until next /new" warning.
ChoiceRequestedError's stop-calling-tools semantics (this lives in the error object, not the description).
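A later trimming pass could accidentally drop one of these rules. One cheap guard is to assert on each rule's key phrase. The sketch below assumes specs expose plain `name` and `description` strings. `missingPhrases`, the sample descriptions and the phrase list are hypothetical; they paraphrase the rules listed above and are not the real description text.

```typescript
// Minimal spec shape for this check (hypothetical).
interface ToolSpec {
  name: string;
  description: string;
}

// Returns "tool: phrase" for every required phrase missing from that tool's description.
function missingPhrases(specs: ToolSpec[], required: Record<string, string[]>): string[] {
  const byName = new Map(specs.map((s) => [s.name, s.description] as const));
  const missing: string[] = [];
  for (const [tool, phrases] of Object.entries(required)) {
    const desc = byName.get(tool);
    for (const phrase of phrases) {
      if (desc === undefined || !desc.includes(phrase)) missing.push(`${tool}: ${phrase}`);
    }
  }
  return missing;
}

// Illustrative descriptions only; the real wording lives in the tool sources.
const specs: ToolSpec[] = [
  { name: "remember", description: "Saves a note. It won't re-load until next /new." },
  { name: "run_command", description: "Runs a shell command." },
];
const required = {
  remember: ["won't re-load until next /new"],
  run_command: ["cd doesn't persist"],
};
console.log(missingPhrases(specs, required)); // flags the run_command phrase
```

A substring check is deliberately strict. It fails on any rewording of a protected phrase, and that is the point when the phrase is meant to stay verbatim.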

The longer-form teaching content already lives in the system prompt (# When to propose a plan, # When to ask the user to pick, # When to track multi-step intent, # Foreground vs. background commands), so trimming the duplication in tool descriptions is pure dedup.
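Finding this kind of duplication can be partly automated. The minimal sketch below assumes that exact sentence-level overlap is a useful first signal. It splits both texts on sentence-ending punctuation and newlines, normalizes case and whitespace, and reports description sentences that also occur in the system prompt. `sentences`, `duplicatedSentences` and the sample strings are hypothetical. Paraphrased duplication, which is the more common case, would need a fuzzier comparison.

```typescript
// Split text into normalized sentences (trimmed, whitespace collapsed, lowercased).
function sentences(text: string): string[] {
  return text
    .replace(/([.!?])\s+/g, "$1\n") // break after sentence-ending punctuation
    .split("\n")
    .map((s) => s.trim().replace(/\s+/g, " ").toLowerCase())
    .filter((s) => s.length > 0);
}

// Sentences in a tool description that also appear verbatim in the system prompt.
function duplicatedSentences(systemPrompt: string, description: string): string[] {
  const inPrompt = new Set(sentences(systemPrompt));
  return sentences(description).filter((s) => inPrompt.has(s));
}

// Illustrative strings only (not the real prompt or descriptions).
const prompt = "# When to ask the user to pick\nUse ask_choice for branches. Keep it short.";
const desc = "Propose a plan. Use ask_choice for branches.";
console.log(duplicatedSentences(prompt, desc));
```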

This is PR #2 of a four-PR token-optimization series. PR #1 (#1320) added a regression net at tests/prompt-budget.test.ts that locks current values; the new floor here will be tightened in a follow-up commit once both land.
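The budget test from #1320 is not reproduced here. The following is a minimal sketch of how a lock-current-values check can work, assuming it compares measured byte counts against fixed ceilings. `overBudget` and the budget objects are hypothetical, and the 39,377-byte lock is an assumption based on this PR's "before" total. The sketch also shows why the lock needs tightening: against the old value, this PR's result passes with 10,965 bytes of headroom that future growth could silently use up.

```typescript
// Hypothetical budget check: each measured size must stay at or below its locked ceiling (bytes).
function overBudget(measured: Record<string, number>, ceilings: Record<string, number>): string[] {
  return Object.entries(ceilings)
    .filter(([key, ceiling]) => (measured[key] ?? 0) > ceiling)
    .map(([key, ceiling]) => `${key}: ${measured[key]} > ${ceiling} bytes`);
}

// Assumed lock taken before this PR; after it lands, the same check passes with slack.
const lockedBefore = { toolSpecs: 39_377 };
const afterThisPr = { toolSpecs: 28_412 };
console.log(overBudget(afterThisPr, lockedBefore)); // []
console.log(`headroom: ${lockedBefore.toolSpecs - afterThisPr.toolSpecs} bytes`); // headroom: 10965 bytes

// Tightening the lock to the new value means any regrowth fails the check.
const tightened = { toolSpecs: afterThisPr.toolSpecs };
console.log(overBudget({ toolSpecs: 28_500 }, tightened)); // [ 'toolSpecs: 28500 > 28412 bytes' ]
```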

Diagnostic script

scripts/measure-tool-sizes.mts is added so future audits can run npx tsx scripts/measure-tool-sizes.mts and get a per-tool byte breakdown without re-deriving the table by hand.
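The script itself is not shown in this PR description. The following is a minimal sketch of the kind of per-tool byte breakdown it is described as producing. It assumes spec objects that serialize with `JSON.stringify`, standing in for whatever `tools.specs()` actually returns. `ToolSpec`, `measure`, `report` and the sample specs are hypothetical.

```typescript
// Hypothetical stand-in for the objects returned by tools.specs(); the real shape may differ.
interface ToolSpec {
  name: string;
  description: string;
  parameters: Record<string, unknown>;
}

const enc = new TextEncoder();

// Per-tool byte size of the serialized spec, largest first.
function measure(specs: ToolSpec[]): Array<{ name: string; bytes: number }> {
  return specs
    .map((s) => ({ name: s.name, bytes: enc.encode(JSON.stringify(s)).length }))
    .sort((a, b) => b.bytes - a.bytes);
}

// Fixed-width table with a TOTAL row, similar in spirit to the table above.
function report(specs: ToolSpec[]): string {
  const rows = measure(specs);
  const total = rows.reduce((sum, r) => sum + r.bytes, 0);
  const lines = rows.map((r) => `${r.name.padEnd(20)}${String(r.bytes).padStart(7)}`);
  lines.push(`${"TOTAL".padEnd(20)}${String(total).padStart(7)}`);
  return lines.join("\n");
}

const sample: ToolSpec[] = [
  { name: "a_small_tool", description: "Short.", parameters: { type: "object", properties: {} } },
  {
    name: "a_big_tool",
    description: "A much longer description. ".repeat(10),
    parameters: { type: "object", properties: {} },
  },
];
console.log(report(sample));
```

Sorting largest-first mirrors how this PR picked its targets: the top of the list is where trimming pays off most.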

Test plan

npm run verify — all 230 test files / 3,237 tests pass
npm run lint clean
npm run typecheck clean
tests/tools.test.ts, tests/plan.test.ts, tests/skills.test.ts, tests/choice.test.ts all green

Every byte of a tool's description and JSON-schema parameter blob ships in every request. The spec list had grown to 39,377 bytes (≈ 10k tokens) across 35 tools. Much of it was teaching guidance that was already covered in the system prompt, plus verbose schema property descriptions that restated what the parameter name already conveys.

Compress the heaviest twelve tools while preserving the actually load-bearing rules (don't-use-for-A/B/C-menus on submit_plan, plan-mode gate behavior on run_command, prefix-stable warning on remember):

submit_plan 3517 → 1648 (-53%)
revise_plan 2554 → 1482 (-42%)
create_skill 2379 → 1367 (-43%)
install_skill 2293 → 1510 (-34%)
search_content 2167 → 1263 (-42%)
mark_step_complete 1988 → 1037 (-48%)
add_mcp_server 1898 → 1213 (-36%)
run_command 1862 → 1376 (-26%)
todo_write 1675 → 872 (-48%)
ask_choice 1807 → 1313 (-27%)
remember 1812 → 1306 (-28%)
run_background 1611 → 1186 (-26%)

Total: 39,377 → 28,412 bytes (-28%, ≈ 2,740 tokens per request).

Behaviour unchanged: guidance that lived only in tool descriptions (plan-mode interaction, ChoiceRequestedError stop semantics, prefix re-load timing) is preserved verbatim. The system prompt and tool error messages already carry the longer-form teaching content.

scripts/measure-tool-sizes.mts added so future audits don't have to re-derive per-tool byte counts by hand.

…st) (#1323)

The system prompt was 24,387 bytes (≈ 6,100 tokens), much of it overlapping with the tool descriptions sitting right next to it in the cache prefix. Sections like "When to propose a plan", "When to ask the user to pick", and "When to track multi-step intent" each recited rules that the tool's own description already carried.

Aggressive dedup pass:

- Drop the redundant "you have these filesystem tools" opening sentence; the API ships the tool list separately.
- Merge the three independent submit_plan / ask_choice / todo_write sections into one short "Picking the right tool" block.
- Fold "Exploration", "Trust what you already know", and "When the user wants to switch project" into shorter equivalents: same rules, no narrative.
- Collapse the foreground/background section. The full how-to lives in the run_command / run_background tool descriptions; the prompt only needs the picking rule.
- Compress the audit-mode rails (#610) prose around the six rails themselves. Every rail's load-bearing phrase is preserved verbatim so tests/code-prompt.test.ts still asserts on them.

Result: 24,387 → 11,956 bytes (-51%, ≈ 3,100 tokens per request). Combined with PR #1320 / #1321 the cache-prefix tax per request is now ~16k tokens instead of ~36k.

Behaviour unchanged: every rail / gate / mode constraint is still asserted by the existing prompt tests.

Co-authored-by: reasonix <reasonix@deepseek.com>


esengine merged commit 6e5fa83 into main May 19, 2026
4 checks passed

esengine deleted the feat/compress-tool-descriptions branch May 19, 2026 11:09

esengine mentioned this pull request May 19, 2026

fix(skill): surface /skill in welcome hints + drop stale built-ins line (#1213) #1333 (merged)

esengine merged 1 commit into main from feat/compress-tool-descriptions
