feat(agent): context-aware tool result budgeting #6339
Open
jbarket wants to merge 6 commits into
Budget = max(floor, min(baseline, available_context))
- baseline: 25% of context window (absolute ceiling per result)
- available: remaining context * 4 chars/token (dynamic)
- floor: 2000 tokens minimum (never return useless slivers)
Oversized results spill to disk with pagination metadata.
Made-with: Cursor
- Init ToolBudget after compressor (uses real context_length)
- _apply_tool_budget() intercepts results in both dispatch paths
- Compaction-before-spill when context is tight
- Dynamic turn budget passed to enforce_turn_budget()
Made-with: Cursor
The budget layer now provides context-aware protection, making the infinity exemption unnecessary. read_file falls back to the default 100K char inner limit with the budget layer as the outer guard.
Made-with: Cursor
- cli-config.yaml.example: tool_budgets block with result_pct, turn_pct, floor_tokens, compact_before_spill
- Developer guide explaining budget calculation, scaling, and interaction with existing systems
Made-with: Cursor
The test previously asserted read_file had float('inf') threshold.
Updated to verify it now uses DEFAULT_RESULT_SIZE_CHARS since the
budget layer provides context-aware protection.
Made-with: Cursor
What does this PR do?
Adds context-aware budgeting for tool results. When a tool's output would exceed the model's available context, the result is spilled to disk and the model gets a bounded preview with pagination instructions (use read_file with offset/limit for more).
Budget = max(floor, min(baseline, available_context))
On a 32K model, this prevents the exact scenario where ps aux or a large file read returns 36K tokens into a 32K window, producing an HTTP 400 "request exceeds context size" error. On 128K+ models, the budget is generous enough that it effectively never triggers.
When context is tight, the agent compacts conversation history before accepting a small budget, so the model gets a useful chunk rather than a drip-feed of 5%-of-context slivers.
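The formula can be sketched as follows, using the constants from the commit message (baseline at 25% of the context window, a 2000-token floor, roughly 4 characters per token). The function and parameter names are illustrative assumptions, not the contents of agent/tool_budget.py, and the sketch works in tokens throughout before converting to characters.

```python
CHARS_PER_TOKEN = 4  # rough conversion quoted in the commit message


def result_budget_tokens(context_length: int, used_tokens: int,
                         result_pct: float = 0.25,
                         floor_tokens: int = 2000) -> int:
    """Budget = max(floor, min(baseline, available_context)), in tokens."""
    baseline = int(context_length * result_pct)        # absolute ceiling per result
    available = max(context_length - used_tokens, 0)   # space left right now
    return max(floor_tokens, min(baseline, available))


def result_budget_chars(context_length: int, used_tokens: int) -> int:
    """Same budget expressed in characters (hypothetical helper)."""
    return result_budget_tokens(context_length, used_tokens) * CHARS_PER_TOKEN


# Roomy 32K context: the 25% baseline is the binding limit.
print(result_budget_tokens(32_000, 10_000))    # 8000
# Tight 32K context: remaining space is the binding limit.
print(result_budget_tokens(32_000, 29_000))    # 3000
# Nearly full: the floor wins even though only 500 tokens remain.
print(result_budget_tokens(32_000, 31_500))    # 2000
# 200K context: 50K tokens, far above typical tool output.
print(result_budget_tokens(200_000, 20_000))   # 50000
```

In the third case the floor exceeds the space actually left, so the result cannot fit unless something is freed first. That appears to be the situation compaction-before-spill is meant to handle.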
Scaling
Related Issues
read_file)
Type of Change
Changes Made
New files
- agent/tool_budget.py: ToolBudget class (budget calculation, spill-to-disk, preview generation with pagination metadata)
- tests/agent/test_tool_budget.py: 25 unit tests for budget calculation, compaction triggers, apply/spill logic
- tests/test_tool_budget_integration.py: 15 integration tests for agent wiring and end-to-end behavior
- tests/tools/test_read_file_budget.py: 3 tests verifying that the read_file exemption is removed
- website/docs/developer-guide/tool-budgets.md: Developer documentation
Modified files
- run_agent.py: Init ToolBudget after compressor, _apply_tool_budget() wrapper with compaction-before-spill, intercept in both concurrent and sequential dispatch paths, pass dynamic turn budget to enforce_turn_budget()
- tools/budget_config.py: Remove read_file: float("inf") from PINNED_THRESHOLDS (budget layer makes it unnecessary)
- tools/file_tools.py: Remove max_result_size_chars=float('inf') from read_file registration
- tools/tool_result_storage.py: Updated existing test from infinity assertion to default threshold assertion
- cli-config.yaml.example: Added tool_budgets config block
How to Test
Unit tests
pytest tests/agent/test_tool_budget.py tests/test_tool_budget_integration.py \
    tests/tools/test_read_file_budget.py tests/tools/test_tool_result_storage.py -v
43 new tests + 41 existing storage tests all pass. Zero regressions across 1023 locally-runnable tests.
Manual (32K model)
ps aux, find /, cat a big file)
read_file instructions, no HTTP 400 errors
read_file offset=N
Manual (128K+ model)
Manual (eviction + compaction interaction)
Checklist
Code
feat(agent):, fix(tools):, test(tools):, docs:)
pytest tests/ -q and all tests pass
Documentation & Housekeeping
website/docs/developer-guide/tool-budgets.md
cli-config.yaml.example: added tool_budgets block
Design Notes
Why centralized (not per-tool)
Hermes has a plugin-style tool ecosystem — anyone can add tools. A centralized budget layer protects ALL tools automatically, including third-party ones that don't know about context limits. Tool authors never need to think about budgets.
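A minimal sketch of the centralized idea: every tool call passes through a single dispatch function, so the size limit applies to tools that know nothing about it. The registry, register_tool, and dispatch names are hypothetical and do not reflect the actual wiring in run_agent.py. Plain truncation stands in here for the PR's spill-to-disk step.

```python
from typing import Callable, Dict

ToolFn = Callable[..., str]
_TOOLS: Dict[str, ToolFn] = {}  # hypothetical registry, for illustration only


def register_tool(name: str, fn: ToolFn) -> None:
    """Tool authors only register a function; they never handle budgets."""
    _TOOLS[name] = fn


def dispatch(name: str, budget_chars: int, **kwargs) -> str:
    """Run any registered tool, then bound its output in one central place."""
    raw = _TOOLS[name](**kwargs)
    if len(raw) <= budget_chars:
        return raw
    omitted = len(raw) - budget_chars
    # Truncation is a stand-in; the PR spills the full result to disk instead.
    return raw[:budget_chars] + f"\n[truncated: {omitted} more chars]"


# A third-party tool with no notion of context limits.
register_tool("noisy", lambda n: "x" * n)
print(dispatch("noisy", budget_chars=10, n=100))
```

Because the limit lives in dispatch rather than in each tool, adding a new tool requires no budget-specific code.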
Why read_file for pagination
The model already knows read_file with offset/limit. No new tools to register or maintain. Improvements to read_file benefit both file reading and result pagination. One tool, one improvement path.
Prompt caching compatibility
The budget layer runs at insertion time, before a result enters the message array. Once a message is in the array, it never changes. This preserves Anthropic/OpenAI prefix cache hits across turns (addressing the concern raised in #415).
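A sketch of what applying the budget at insertion time might look like: the full result is written to disk once, and only a bounded preview with a read_file hint is appended to the message list. The names apply_budget and append_tool_result, the footer wording, and the 1-based line offset are all assumptions for illustration, not the PR's actual format.

```python
import tempfile
from pathlib import Path
from typing import Dict, List


def apply_budget(result: str, budget_chars: int, spill_dir: Path) -> str:
    """Return the result unchanged if it fits, else spill it and return a preview."""
    if len(result) <= budget_chars:
        return result
    spill_dir.mkdir(parents=True, exist_ok=True)
    with tempfile.NamedTemporaryFile("w", dir=spill_dir, suffix=".txt",
                                     delete=False, encoding="utf-8") as fh:
        fh.write(result)
        path = fh.name
    lines = result.splitlines(keepends=True)
    preview: List[str] = []
    used = 0
    for line in lines:  # keep whole lines until the budget is exhausted
        if used + len(line) > budget_chars:
            break
        preview.append(line)
        used += len(line)
    shown = len(preview)
    if not preview:  # one very long line: fall back to a character cut
        preview = [result[:budget_chars]]
    footer = (f"\n[preview: {shown} of {len(lines)} lines shown; "
              f"full output saved to {path}; "
              f"continue with read_file offset={shown + 1}]")
    return "".join(preview) + footer


def append_tool_result(messages: List[Dict[str, str]], result: str,
                       budget_chars: int, spill_dir: Path) -> None:
    """Bound the result once, then append; earlier messages are never rewritten."""
    messages.append({"role": "tool",
                     "content": apply_budget(result, budget_chars, spill_dir)})
```

Since the preview is computed before the append and never revised afterwards, the message prefix stays byte-identical across turns, which is the property prefix caching depends on.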
Self-regulating behavior
The feature is designed to be invisible on large-context models:
- baseline = context_length × 0.25: on a 200K model that's 50K tokens, far above typical tool output
Made with Cursor