feat(agent): context-aware tool result budgeting by jbarket · Pull Request #6339 · NousResearch/hermes-agent

jbarket · 2026-04-09T00:09:05Z

What does this PR do?

Adds context-aware budgeting for tool results. When a tool's output would exceed the model's available context, the result is spilled to disk and the model gets a bounded preview with pagination instructions (use read_file with offset/limit for more).

Budget = max(floor, min(baseline, available_context))

On a 32K model, this prevents the exact scenario where ps aux or a large file read returns 36K tokens into a 32K window — producing an HTTP 400 "request exceeds context size" error. On 128K+ models, the budget is generous enough that it effectively never triggers.

When context is tight, the agent compacts conversation history before accepting a small budget, so the model gets a useful chunk rather than drip-feeding 5%-of-context slivers.

Scaling

Model	Context	Per-result baseline	Behavior
Gemma 4 31B	32K	~8K tokens (~32K chars)	Active — large results paginated
GPT-5.4	128K	~32K tokens (~128K chars)	Rarely triggers
Claude 4 Opus	200K	~50K tokens (~200K chars)	Almost never triggers
Gemini 2.5	1M	~250K tokens (~1M chars)	Invisible

Related Issues

Feature: Insertion-Time Tool Result Trimming — Cache-Friendly Context Management #415 — Insertion-Time Tool Result Trimming (this implements insertion-time budgeting, cache-friendly)
Feature: Two-Phase Context Management — Prune Tool Outputs Before Full Compaction (inspired by Kilocode) #513 — Two-Phase Context Management (budget layer acts as the insertion-time defense)
Feature: Granular Improvements from Roo Code Deep-Dive — Tool Output, Patch Refinements, Anti-Hallucination, Prompt Methodology #507 — Roo Code Deep-Dive: Tool Output (spill-to-disk + pagination via read_file)
Context overrun #348 — Context overrun (the exact bug: local model exceeded context from tool output)
Token overhead analysis: 73% of each API call is fixed overhead (~13.9K tokens) — data + suggestions #4379 — Token overhead analysis (quantified the problem this addresses)

Type of Change

✨ New feature (non-breaking change that adds functionality)

Changes Made

New files

agent/tool_budget.py — ToolBudget class: budget calculation, spill-to-disk, preview generation with pagination metadata
tests/agent/test_tool_budget.py — 25 unit tests for budget calculation, compaction triggers, apply/spill logic
tests/test_tool_budget_integration.py — 15 integration tests for agent wiring and end-to-end behavior
tests/tools/test_read_file_budget.py — 3 tests verifying read_file exemption removed
website/docs/developer-guide/tool-budgets.md — Developer documentation

Modified files

run_agent.py — Init ToolBudget after compressor, _apply_tool_budget() wrapper with compaction-before-spill, intercept in both concurrent and sequential dispatch paths, pass dynamic turn budget to enforce_turn_budget()
tools/budget_config.py — Remove read_file: float("inf") from PINNED_THRESHOLDS (budget layer makes it unnecessary)
tools/file_tools.py — Remove max_result_size_chars=float('inf') from read_file registration
tools/tool_result_storage.py — Updated existing test from infinity assertion to default threshold assertion
cli-config.yaml.example — Added tool_budgets config block

How to Test

Unit tests

pytest tests/agent/test_tool_budget.py tests/test_tool_budget_integration.py \
       tests/tools/test_read_file_budget.py tests/tools/test_tool_result_storage.py -v

43 new tests + 41 existing storage tests all pass. Zero regressions across 1023 locally-runnable tests.

Manual (32K model)

Configure a model with ≤32K context (e.g., local Gemma 4 via llama.cpp)
Run commands that produce large output (ps aux, find /, cat a big file)
Verify results are paginated with read_file instructions, no HTTP 400 errors
Verify the model can page through with read_file offset=N

Manual (128K+ model)

Configure a large-context model
Same commands — verify results pass through unchanged, budget never triggers

Manual (eviction + compaction interaction)

Use a 32K model, have a long conversation
Run a command producing large output when context is ~90% full
Check logs — compaction should fire before spill, freeing room for a useful chunk

Checklist

Code

I've read the Contributing Guide
My commit messages follow Conventional Commits (feat(agent):, fix(tools):, test(tools):, docs:)
I searched for existing PRs to make sure this isn't a duplicate
My PR contains only changes related to this feature (no unrelated commits)
I've run pytest tests/ -q and all tests pass
I've added tests for my changes (43 new tests across 3 test files)
I've tested on my platform: Ubuntu (kernel 6.17.0-19-generic), Python 3.13.7, RTX 5090 w/ Gemma 4 31B Q4_K_M via llama.cpp (32K context)

Documentation & Housekeeping

I've updated relevant documentation — website/docs/developer-guide/tool-budgets.md
I've updated cli-config.yaml.example — added tool_budgets block
I've considered cross-platform impact — pure Python, no OS-specific code, no new dependencies
N/A — no changes to tool descriptions/schemas for existing tools

Design Notes

Why centralized (not per-tool)

Hermes has a plugin-style tool ecosystem — anyone can add tools. A centralized budget layer protects ALL tools automatically, including third-party ones that don't know about context limits. Tool authors never need to think about budgets.

Why `read_file` for pagination

The model already knows read_file with offset/limit. No new tools to register or maintain. Improvements to read_file benefit both file reading and result pagination. One tool, one improvement path.

Prompt caching compatibility

The budget layer runs at insertion time — before a result enters the message array. Once a message is in the array, it never changes. This preserves Anthropic/OpenAI prefix cache hits across turns (addressing the concern raised in #415).

Self-regulating behavior

The feature is designed to be invisible on large-context models:

baseline = context_length × 0.25 — on a 200K model that's 50K tokens, far above typical tool output
Compaction only fires when context is genuinely tight
Spill only happens when the result actually exceeds available space
The floor ensures the model always gets a useful chunk, never a useless sliver

Made with Cursor

Budget = max(floor, min(baseline, available_context)) - baseline: 25% of context window (absolute ceiling per result) - available: remaining context * 4 chars/token (dynamic) - floor: 2000 tokens minimum (never return useless slivers) Oversized results spill to disk with pagination metadata. Made-with: Cursor

- Init ToolBudget after compressor (uses real context_length) - _apply_tool_budget() intercepts results in both dispatch paths - Compaction-before-spill when context is tight - Dynamic turn budget passed to enforce_turn_budget() Made-with: Cursor

The budget layer now provides context-aware protection, making the infinity exemption unnecessary. read_file falls back to the default 100K char inner limit with the budget layer as the outer guard. Made-with: Cursor

- cli-config.yaml.example: tool_budgets block with result_pct, turn_pct, floor_tokens, compact_before_spill - Developer guide explaining budget calculation, scaling, and interaction with existing systems Made-with: Cursor

The test previously asserted read_file had float('inf') threshold. Updated to verify it now uses DEFAULT_RESULT_SIZE_CHARS since the budget layer provides context-aware protection. Made-with: Cursor

Made-with: Cursor

jbarket force-pushed the feat/tool-budget branch from 9ccc11c to 756291b Compare April 9, 2026 00:27

jbarket added 5 commits April 9, 2026 10:23

docs: add tool_budgets config and developer guide

e49f565

- cli-config.yaml.example: tool_budgets block with result_pct, turn_pct, floor_tokens, compact_before_spill - Developer guide explaining budget calculation, scaling, and interaction with existing systems Made-with: Cursor

test(tools): update read_file threshold test for budget layer change

048253d

The test previously asserted read_file had float('inf') threshold. Updated to verify it now uses DEFAULT_RESULT_SIZE_CHARS since the budget layer provides context-aware protection. Made-with: Cursor

jbarket force-pushed the feat/tool-budget branch from 756291b to 048253d Compare April 9, 2026 15:23

fix(docs): add tool-budgets to developer guide sidebar

f53a138

Made-with: Cursor

alt-glitch added P3 Low — cosmetic, nice to have type/feature New feature or request comp/agent Core agent loop, run_agent.py, prompt builder labels Apr 30, 2026

pinguarmy mentioned this pull request May 20, 2026

feat(tool-result-compaction): add opt-in large tool result compaction #29454

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(agent): context-aware tool result budgeting#6339

feat(agent): context-aware tool result budgeting#6339
jbarket wants to merge 6 commits into
NousResearch:mainfrom
jbarket:feat/tool-budget

jbarket commented Apr 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

jbarket commented Apr 9, 2026

What does this PR do?

Scaling

Related Issues

Type of Change

Changes Made

New files

Modified files

How to Test

Unit tests

Manual (32K model)

Manual (128K+ model)

Manual (eviction + compaction interaction)

Checklist

Code

Documentation & Housekeeping

Design Notes

Why centralized (not per-tool)

Why read_file for pagination

Prompt caching compatibility

Self-regulating behavior

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Why `read_file` for pagination