Bugfix: recover from structured reasoning budget exhaustion #9452

Open
HiddenPuppy wants to merge 3 commits into NousResearch:main from HiddenPuppy:codex/fix-thinking-budget-recovery

Conversation

@HiddenPuppy

Contributor

Summary

  • detect thinking-budget exhaustion for chat-completions responses that return structured reasoning fields without visible text
  • record usage before length/empty-response recovery so Hermes can use real token pressure for recovery decisions
  • compact context before continuation/prefill when a reasoning-only response already shows the conversation is over the compaction threshold
  • add regression tests for structured reasoning truncation and proactive compression retry paths
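The second and third bullets can be sketched as a small decision step that runs once usage has been recorded. This is only an illustration, not the code in this PR: `Usage`, `choose_recovery`, `context_limit`, and the 0.8 `compaction_ratio` are hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    """Token counts reported by the provider for one response (hypothetical shape)."""
    prompt_tokens: int
    completion_tokens: int


def choose_recovery(usage: Usage, context_limit: int, compaction_ratio: float = 0.8) -> str:
    """Pick a recovery path after a reasoning-only response.

    Returns "compact" when recorded usage is at or above the compaction
    threshold, otherwise "continue" (continuation/prefill retry).
    The threshold formula is an assumption made for this sketch.
    """
    threshold = int(context_limit * compaction_ratio)
    used = usage.prompt_tokens + usage.completion_tokens
    return "compact" if used >= threshold else "continue"
```

The point of recording usage first is that this check works from the token counts the provider actually reported rather than from an estimate, so a conversation that is already over the threshold is compacted before any retry adds more context.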

Root Cause

Hermes already had a thinking-budget guard for inline <think> content, but OpenAI-compatible models such as glm-5-turbo often return reasoning in reasoning_content/reasoning_details and leave content empty. Those responses bypassed the guard and fell through to continuation or prefill retries, which grew the context further without ever giving compression a chance to run.
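A minimal sketch of the missing check, assuming the assistant message is available as a dict-like object and that truncation is signalled by a finish reason of "length". The helper name `is_reasoning_only_truncation` is hypothetical and is not the function added in run_agent.py. Only the field names `content`, `reasoning_content`, and `reasoning_details` come from the description above.

```python
from typing import Any, Mapping


def is_reasoning_only_truncation(message: Mapping[str, Any], finish_reason: str) -> bool:
    """Return True when a truncated response carries reasoning but no visible text.

    Hypothetical helper: assumes "length" is the finish reason reported when
    the output budget runs out.
    """
    if finish_reason != "length":
        return False

    # Any visible text means the model produced an answer, so this is not
    # a reasoning-only response.
    content = message.get("content") or ""
    if str(content).strip():
        return False

    # Structured reasoning may arrive as a plain string or as a list of details.
    reasoning_text = message.get("reasoning_content") or ""
    reasoning_details = message.get("reasoning_details") or []
    return bool(str(reasoning_text).strip()) or bool(reasoning_details)
```

A response that matches this check is the case the inline <think> guard could not see: the output budget was spent entirely on reasoning, so retrying without first relieving token pressure is likely to fail the same way.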

Notes

Validation

  • git diff --check
  • python3 -m py_compile run_agent.py tests/run_agent/test_run_agent.py
  • Full pytest could not be run locally: this machine does not currently have the repo's required Python 3.11 + dev test toolchain installed.

Closes #9344

@alt-glitch alt-glitch added type/bug Something isn't working P2 Medium — degraded but workaround exists comp/agent Core agent loop, run_agent.py, prompt builder labels Apr 27, 2026
Development

Successfully merging this pull request may close these issues.

[Bug] Thinking model (glm-5-turbo) reasoning tokens exhaust output budget, producing empty responses with no recovery path

2 participants