fix(agent): output-token parse for OpenRouter; empty-stream guard; cron-session prefix (#38652 #38725 #38788) #40405
Closed
ashishpatel26 wants to merge 1 commit into
…on-session prefix

Three P1 fixes in one batch:

1. parse_available_output_tokens_from_error (NousResearch#38652): guard and extraction now recognise the OpenRouter/Nous error format ("maximum context length is N... K in the output"). The old guard required the "max_tokens" and "available_tokens" keywords, both absent in OpenRouter responses, so the function returned None, the caller could not reduce max_tokens, and the session entered an infinite auto-reset loop.

2. Zero-chunk stream guard (NousResearch#38725): _call_chat_completions now raises RuntimeError when the SSE loop exits with finish_reason=None and no accumulated content/tool-calls. Previously the or "stop" fallback fabricated a syntactically valid but empty completion, masking upstream errors as successful turns.

3. Cron session_id prefix preservation (NousResearch#38788): compression-triggered session rotation now carries forward the cron_ prefix so post-compression cron sessions remain identifiable. ContextCompressor.on_session_start clears _previous_summary on every session switch, preventing stale cron summaries from bleeding into live conversations resumed via /resume.

Fixes NousResearch#38652, NousResearch#38725, NousResearch#38788.
Contributor
Salvaged the two isolated reliability fixes (zero-chunk stream guard + OpenRouter output-cap parsing) into #40589 with credit. I split out the two compression-state changes ( |
teknium1
added a commit
that referenced
this pull request
Jun 7, 2026
…ut-cap errors (#40589)

Two isolated reliability fixes:

- chat_completion_helpers: raise on a zero-chunk stream (no finish_reason, no content/reasoning/tool_calls) so retry handles it instead of fabricating a successful empty turn.
- model_metadata: parse the OpenRouter/Nous output-cap error phrasing ("maximum context length is N ... (A of text input, B of tool input, C in the output)") so parse_available_output_tokens_from_error returns a real cap and the caller stops looping on it.

Salvaged from #40405 (@ashishpatel26): took the two stream/error-parsing fixes. The PR also bundled compression-state changes (on_session_start clearing _previous_summary; cron session-id prefix preservation, #38788); those touch the compression hot path and are split out for separate review.

Co-authored-by: ashishpatel26 <ashishpatel26@users.noreply.github.com>
Summary
Three P1 bug fixes: infinite token-reset loop, fabricated empty stream turn, cron summary leak.
Fix 1 - #38652: parse_available_output_tokens_from_error misses OpenRouter format
Root cause: the guard required both the "max_tokens" and "available_tokens" keywords. OpenRouter instead reports "maximum context length is N" and "K in the output", so neither keyword is present, the function returns None, and the caller, unable to reduce max_tokens, loops forever.
Fix: expanded the guard to accept the OpenRouter phrasing and added an extraction step that computes available = context_length - text_input - tool_input.
Files: agent/model_metadata.py
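The extraction described in Fix 1 can be sketched as follows. This is a minimal illustration, not the code in agent/model_metadata.py: the function name `sketch_parse_output_cap`, the regular expressions, and the sample message wording are assumptions built only from the phrasing quoted in this PR ("maximum context length is N ... (A of text input, B of tool input, C in the output)").

```python
import re
from typing import Optional

# Hypothetical patterns, derived only from the phrasing quoted in this PR.
_CONTEXT_RE = re.compile(r"maximum context length is\s+(\d[\d,]*)", re.IGNORECASE)
_TEXT_INPUT_RE = re.compile(r"(\d[\d,]*)\s+of text input", re.IGNORECASE)
_TOOL_INPUT_RE = re.compile(r"(\d[\d,]*)\s+of tool input", re.IGNORECASE)


def _to_int(raw: str) -> int:
    """Convert '8,192' or '8192' to an int."""
    return int(raw.replace(",", ""))


def sketch_parse_output_cap(error_message: str) -> Optional[int]:
    """Return the tokens left for output, or None if the format is not recognised."""
    lowered = error_message.lower()
    # Guard: accept the OpenRouter-style phrasing instead of requiring
    # the "max_tokens" / "available_tokens" keywords.
    if "maximum context length is" not in lowered or "in the output" not in lowered:
        return None

    context = _CONTEXT_RE.search(error_message)
    text_input = _TEXT_INPUT_RE.search(error_message)
    if context is None or text_input is None:
        return None

    # Assumption: a message without a tool-input figure means zero tool tokens.
    tool_input = _TOOL_INPUT_RE.search(error_message)
    tool_tokens = _to_int(tool_input.group(1)) if tool_input else 0

    # available = context_length - text_input - tool_input
    available = _to_int(context.group(1)) - _to_int(text_input.group(1)) - tool_tokens
    # A non-positive result cannot be used as a max_tokens value.
    return available if available > 0 else None
```

With a sample message such as "maximum context length is 8192 tokens ... (6000 of text input, 1000 of tool input, 2000 in the output)", the sketch returns 1192, which the caller could then use as a reduced max_tokens instead of retrying with the same request.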
Fix 2 - #38725: Streaming parser fabricates empty stop turn on zero-chunk stream
Root cause: when the stream loop finishes without receiving any chunks, finish_reason is still None. The or "stop" fallback then fabricated a successful empty turn, hiding the provider error.
Fix: a zero-chunk guard raises RuntimeError so the retry machinery handles recovery.
Files: agent/chat_completion_helpers.py
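A rough sketch of the guard in Fix 2, under stated assumptions: chunks are modelled here as plain dicts with optional "content", "tool_calls", and "finish_reason" keys, and `sketch_collect_stream` is a hypothetical stand-in for the real streaming loop, whose chunk objects and accumulation logic are not shown in this PR description.

```python
from typing import Any, Dict, Iterable, List, Optional


def sketch_collect_stream(chunks: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    """Accumulate a simplified stream and refuse to fabricate an empty turn."""
    content_parts: List[str] = []
    tool_calls: List[Any] = []
    finish_reason: Optional[str] = None

    for chunk in chunks:
        if chunk.get("content"):
            content_parts.append(chunk["content"])
        tool_calls.extend(chunk.get("tool_calls") or [])
        if chunk.get("finish_reason"):
            finish_reason = chunk["finish_reason"]

    # Zero-chunk guard: nothing was produced and the provider never reported
    # a finish reason, so treat this as an upstream failure rather than
    # returning a "successful" empty completion.
    if finish_reason is None and not content_parts and not tool_calls:
        raise RuntimeError("stream ended without content, tool calls, or finish_reason")

    return {
        "content": "".join(content_parts),
        "tool_calls": tool_calls,
        # The fallback is still applied, but only after the guard has passed.
        "finish_reason": finish_reason or "stop",
    }
```

Raising here, rather than returning an empty result, is what lets a surrounding retry layer see the failure; the exception type matches the RuntimeError named in the fix, while the message text is illustrative.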
Fix 3 - #38788: Cron session summary leaks into live conversations after compression
Root cause A: compression rotated session_id dropping cron_ prefix; later resume injected cron summaries into live conversations.
Root cause B: _previous_summary never cleared on session switch.
Fix A: session rotation preserves cron_ prefix when session is cron-sourced.
Fix B: added ContextCompressor.on_session_start() clearing _previous_summary on each switch.
Files: agent/conversation_compression.py, agent/context_compressor.py
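The two parts of Fix 3 might look roughly like this. Everything below is a hypothetical sketch: the session-id format (a random hex string), detecting a cron session by its id prefix, and the `SketchCompressor` class are assumptions for illustration, not the contents of agent/conversation_compression.py or agent/context_compressor.py.

```python
import uuid
from typing import Optional

CRON_PREFIX = "cron_"


def sketch_rotate_session_id(old_session_id: str) -> str:
    """Create a fresh session id, carrying the cron_ prefix forward (Fix A)."""
    new_id = uuid.uuid4().hex  # assumed id format; the real one may differ
    if old_session_id.startswith(CRON_PREFIX):
        return CRON_PREFIX + new_id
    return new_id


class SketchCompressor:
    """Stand-in for a compressor that remembers the last summary it produced."""

    def __init__(self) -> None:
        self._previous_summary: Optional[str] = None

    def remember(self, summary: str) -> None:
        self._previous_summary = summary

    def on_session_start(self, session_id: str) -> None:
        # Fix B: drop any summary left over from the previous session so it
        # cannot be injected into the one that is starting.
        self._previous_summary = None
```

In this sketch a rotated cron session stays recognisable by its prefix, and calling on_session_start on every switch means a summary from a cron run is gone before a live conversation resumes.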
Test plan
#38725: zero-chunk SSE stream - confirm RuntimeError is raised rather than a silent empty turn