Summary
When a streaming chat.completions request to an OpenAI-compatible endpoint returns HTTP 200 but the SSE body yields zero usable chunks (an empty stream, or a non-standard error: data frame that the OpenAI SDK doesn't surface as content), the streaming consumer in agent/chat_completion_helpers.py exits its loop with no content and finish_reason = None, and then fabricates a successful empty stop turn instead of raising. The agent accepts the empty turn (and can re-loop on a persistent condition), which presents to the user as a silent "stuck"/hung agent with no error surfaced.
Where (function/symbols; line numbers approximate)
In interruptible_streaming_api_call (agent/chat_completion_helpers.py):
- The consume loop for chunk in stream: (~L1758) exits normally on a zero-chunk stream, so content_parts, reasoning_parts, and tool_calls_acc all stay empty and finish_reason stays None.
- When the mock response is assembled (~L1938), effective_finish_reason = finish_reason or 'stop' turns the empty/errored result into a valid-looking stop turn. No error is raised, so it never enters the retry/error path.
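The two steps above can be reproduced in isolation. The following is a minimal sketch, not the real helper: consume and fake_chunk are hypothetical names, and the chunk shape is only loosely modeled on OpenAI streaming chunks. It shows that a zero-chunk stream and a normal one-word reply both end up with finish_reason 'stop'.

```python
from types import SimpleNamespace


def consume(stream):
    """Simplified stand-in for the consume loop (hypothetical, not the real code)."""
    content_parts = []
    finish_reason = None
    for chunk in stream:
        for choice in chunk.choices:
            if choice.delta.content:
                content_parts.append(choice.delta.content)
            if choice.finish_reason:
                finish_reason = choice.finish_reason
    # The fallback described in the issue: None silently becomes 'stop'.
    effective_finish_reason = finish_reason or "stop"
    return "".join(content_parts), effective_finish_reason


def fake_chunk(content=None, finish_reason=None):
    # Hypothetical chunk shape used only for this illustration.
    delta = SimpleNamespace(content=content)
    choice = SimpleNamespace(delta=delta, finish_reason=finish_reason)
    return SimpleNamespace(choices=[choice])


# A zero-chunk stream: the loop body never runs.
empty_result = consume(iter([]))
# A normal stream: one content chunk, then a terminating chunk.
normal_result = consume(iter([fake_chunk("hi"), fake_chunk(None, "stop")]))
```

After the fallback, the caller can only tell the two results apart by the empty content string, which is why the empty case looks like a legitimate turn.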
Related: the OpenAI-wire path has no stream-parse-error recognition equivalent to the Anthropic path (run_agent.py _is_provider_stream_parse_error only matches Anthropic's "expected ident at line"), so a malformed OpenAI-style stream is dropped silently rather than surfaced.
Why it matters
Some OpenAI-compatible servers/proxies emit a 200 with an empty or non-standard-error SSE body under certain conditions (e.g. context overflow, or an upstream that returns an error inside a data: frame). Today that becomes an invisible empty assistant turn rather than a recoverable error, and on a persistent condition the agent can loop on empty turns.
Proposed fix (provider-agnostic, defense-in-depth)
Add a post-loop zero-chunk guard just before the mock response is built (~L1900), so an empty result becomes a real, retryable error that flows through the existing retry/recovery machinery instead of a fabricated successful turn:
    if finish_reason is None and not content_parts and not reasoning_parts and not tool_calls_acc:
        raise RuntimeError(
            "Provider returned an empty stream with no finish_reason "
            "(possible upstream error or malformed SSE)."
        )
Optional companion — a first-token deadline (e.g. HERMES_STREAM_FIRST_TOKEN_TIMEOUT, default ~60s) enforced in the existing outer poll loop and not suppressed for local endpoints, to bound the blocking-empty case (stream opens but never emits a chunk). This matters for local endpoints in particular because the stale-stream detector is disabled for them (~L2299-2301) and the httpx read timeout is raised to the API timeout (~L1695-1696), leaving no short backstop. (For non-local endpoints the standard read timeout already bounds the blocking case, so the post-loop guard above is the key fix.)
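One possible shape for the first-token deadline is sketched below. It does not reflect the actual poll loop in chat_completion_helpers.py: read_with_first_token_deadline, FirstTokenTimeout, and poll_interval are hypothetical names, and the background-thread pump is an assumption made to keep the example self-contained. Only the environment variable name and the ~60s default come from the proposal above.

```python
import os
import queue
import threading
import time


class FirstTokenTimeout(RuntimeError):
    """Hypothetical error for a stream that opens but never emits a chunk."""


def read_with_first_token_deadline(stream, first_token_timeout=None, poll_interval=0.05):
    # Env var name and 60s default follow the proposal; everything else is illustrative.
    if first_token_timeout is None:
        first_token_timeout = float(
            os.environ.get("HERMES_STREAM_FIRST_TOKEN_TIMEOUT", "60")
        )

    chunks = queue.Queue()
    done = object()  # sentinel marking the end of the stream

    def pump():
        # Runs in a background thread so the poll loop below never blocks on the stream.
        try:
            for chunk in stream:
                chunks.put(chunk)
        finally:
            chunks.put(done)

    threading.Thread(target=pump, daemon=True).start()

    deadline = time.monotonic() + first_token_timeout
    received = []
    while True:
        try:
            item = chunks.get(timeout=poll_interval)
        except queue.Empty:
            # Only the wait for the FIRST chunk is bounded by this deadline.
            if not received and time.monotonic() > deadline:
                raise FirstTokenTimeout(
                    f"No chunk received within {first_token_timeout}s of opening the stream."
                )
            continue
        if item is done:
            return received
        received.append(item)
```

In this sketch a stream that closes immediately returns an empty list rather than raising, so it still relies on the post-loop guard to turn the empty result into an error.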
Repro sketch
Point Hermes at an OpenAI-compatible endpoint that returns HTTP 200 with an empty SSE body (or an error: frame) for some request, send a message, and observe the agent produce a silent empty turn / appear to hang, rather than surfacing an error or retrying.
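For a local repro, a stub server along these lines may be enough. This is a sketch using only the Python standard library: EmptySSEHandler, start_empty_sse_server, and post_and_read are hypothetical names, and it assumes the client can be pointed at an arbitrary base URL. It answers every POST with HTTP 200, a text/event-stream content type, and an empty body.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class EmptySSEHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Drain the request body so the client is not left mid-write.
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)
        # HTTP 200 with an SSE content type but no data: frames at all.
        self.send_response(200)
        self.send_header("Content-Type", "text/event-stream")
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the repro quiet


def start_empty_sse_server(port=0):
    # Port 0 lets the OS pick a free port; read it back from server.server_address.
    server = HTTPServer(("127.0.0.1", port), EmptySSEHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def post_and_read(server, path="/v1/chat/completions"):
    # Sanity check that the stub behaves as intended before pointing the agent at it.
    url = f"http://127.0.0.1:{server.server_address[1]}{path}"
    req = urllib.request.Request(
        url, data=b"{}", headers={"Content-Type": "application/json"}, method="POST"
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.status, resp.headers["Content-Type"], resp.read()
```

With the stub running, configuring the agent's OpenAI-compatible base URL as http://127.0.0.1:<port>/v1 should exercise the empty-stream path described in this issue. The error: frame variant is not covered by this stub.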
Impact
Low-risk, contained change in the streaming finalizer; turns a silent-empty-turn failure mode (which looks like a hang) into a normal recoverable error for any provider/proxy that emits a malformed or empty stream.