
Streaming parser silently fabricates an empty stop turn when an OpenAI-wire stream yields zero chunks #38725

@HoltYoung


Summary

When a streaming chat.completions request to an OpenAI-compatible endpoint returns HTTP 200 but the SSE body yields zero usable chunks (an empty stream, or a non-standard error: data frame that the OpenAI SDK doesn't surface as content), the streaming consumer in agent/chat_completion_helpers.py exits its loop with no content and finish_reason = None. It then fabricates a successful, empty stop turn instead of raising. The agent accepts that empty turn, and if the upstream condition persists it can keep re-looping on empty turns. To the user this looks like a silent "stuck" or hung agent, with no error surfaced.

Where (function/symbols; line numbers approximate)

In interruptible_streaming_api_call (agent/chat_completion_helpers.py):

  • The consume loop (for chunk in stream:, ~L1758) exits normally on a zero-chunk stream. content_parts, reasoning_parts, and tool_calls_acc all stay empty, and finish_reason stays None.
  • When the mock response is assembled (~L1938), effective_finish_reason = finish_reason or 'stop' turns the empty or errored result into a valid-looking stop turn. Because nothing is raised, the result never enters the retry/error path.
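To make the failure mode concrete, here is a simplified, self-contained model of the consume-then-finalize sequence described above. The chunk shape (flat dicts with "delta" and "finish_reason" keys), StreamResult, and consume_stream_current are hypothetical simplifications, not Hermes code; only the finish_reason or 'stop' fallback is taken from the issue.

```python
from dataclasses import dataclass, field
from typing import Any, Iterable, Optional


@dataclass
class StreamResult:
    """Hypothetical stand-in for the mock response the streaming consumer builds."""
    content: str
    finish_reason: str
    tool_calls: list = field(default_factory=list)


def consume_stream_current(stream: Iterable[dict]) -> StreamResult:
    """Simplified model of the current behavior (hypothetical chunk shape)."""
    content_parts: list[str] = []
    reasoning_parts: list[str] = []
    tool_calls_acc: dict[int, Any] = {}
    finish_reason: Optional[str] = None

    for chunk in stream:  # a zero-chunk stream never enters this body
        delta = chunk.get("delta", {})
        if delta.get("content"):
            content_parts.append(delta["content"])
        if delta.get("reasoning"):
            reasoning_parts.append(delta["reasoning"])
        for tc in delta.get("tool_calls", []):
            tool_calls_acc[tc["index"]] = tc
        if chunk.get("finish_reason"):
            finish_reason = chunk["finish_reason"]

    # The fallback in question: a missing finish_reason silently becomes "stop".
    effective_finish_reason = finish_reason or "stop"
    return StreamResult(
        content="".join(content_parts),
        finish_reason=effective_finish_reason,
        tool_calls=list(tool_calls_acc.values()),
    )


result = consume_stream_current(iter([]))
print(result)  # StreamResult(content='', finish_reason='stop', tool_calls=[])
```

An empty iterator and a genuine "stop" with no text are indistinguishable in the returned object, which is why nothing downstream treats the empty case as an error.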

Related: the OpenAI-wire path has no stream-parse-error recognition equivalent to the Anthropic path's. In run_agent.py, _is_provider_stream_parse_error only matches Anthropic's "expected ident at line" message, so a malformed OpenAI-style stream is silently dropped rather than surfaced.
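One way to give the OpenAI-wire path something comparable is a small classifier for raw SSE lines. This is a hypothetical sketch: openai_sse_error_message is not an existing Hermes or OpenAI SDK function, and the frame shapes it treats as errors (an event: error line, a data: frame whose JSON object has an "error" key, or a data: frame that is not valid JSON) are assumptions about what non-standard proxies send. Where such a check would hook into the stream is left open.

```python
import json
from typing import Optional


def openai_sse_error_message(raw_line: str) -> Optional[str]:
    """Return an error message if an SSE line looks like an in-band error frame.

    Hypothetical helper; the recognized shapes are assumptions, not a spec.
    """
    line = raw_line.strip()
    # An explicit "event: error" line (assumed convention for some proxies).
    if line.startswith("event:") and line[len("event:"):].strip() == "error":
        return "provider sent an SSE 'error' event"
    if not line.startswith("data:"):
        return None  # comments, ids, retry fields, blank lines: not errors
    payload = line[len("data:"):].strip()
    if not payload or payload == "[DONE]":
        return None
    try:
        obj = json.loads(payload)
    except json.JSONDecodeError:
        return f"malformed SSE data frame: {payload[:80]!r}"
    # A data frame carrying an "error" object instead of choices.
    if isinstance(obj, dict) and "error" in obj:
        err = obj["error"]
        if isinstance(err, dict):
            return str(err.get("message") or err)
        return str(err)
    return None
```

Returning a message rather than a bool lets the caller put the provider's own text into the raised error, so the user sees something like "context length exceeded" instead of a silent empty turn.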

Why it matters

Some OpenAI-compatible servers and proxies emit a 200 with an empty or non-standard-error SSE body under certain conditions (e.g. context overflow, or an upstream that returns an error inside a data: frame). Today that becomes an invisible empty assistant turn rather than a recoverable error, and if the condition persists the agent can loop on empty turns.

Proposed fix (provider-agnostic, defense-in-depth)

Add a post-loop zero-chunk guard just before the mock response is built (~L1900), so an empty result becomes a real, retryable error that flows through the existing retry/recovery machinery instead of a fabricated successful turn:

if finish_reason is None and not content_parts and not reasoning_parts and not tool_calls_acc:
    raise RuntimeError(
        "Provider returned an empty stream with no finish_reason "
        "(possible upstream error or malformed SSE)."
    )
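A runnable sketch of how the guard turns an empty stream into something a retry loop can act on. EmptyStreamError, guard_empty_stream, and call_with_retries are hypothetical names; the retry wrapper is only a stand-in for Hermes's existing retry/recovery machinery, whose actual shape the issue doesn't describe. The subclass keeps the proposal's RuntimeError semantics while letting callers tell this case apart.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class EmptyStreamError(RuntimeError):
    """Hypothetical subclass; still a RuntimeError, as in the proposed guard."""


def guard_empty_stream(finish_reason, content_parts, reasoning_parts, tool_calls_acc) -> None:
    # Same condition as the proposed guard: the stream produced nothing at all.
    if finish_reason is None and not content_parts and not reasoning_parts and not tool_calls_acc:
        raise EmptyStreamError(
            "Provider returned an empty stream with no finish_reason "
            "(possible upstream error or malformed SSE)."
        )


def call_with_retries(fn: Callable[[], T], attempts: int = 3, backoff_s: float = 0.0) -> T:
    """Hypothetical stand-in for the agent's retry/recovery path (attempts >= 1)."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except RuntimeError:
            if attempt == attempts:
                raise  # retries exhausted: surface the error instead of an empty turn
            time.sleep(backoff_s * attempt)  # linear backoff, 0 by default
    raise ValueError("attempts must be >= 1")


# Usage: the first call sees a zero-chunk stream, the retry succeeds.
responses = iter([
    (None, [], [], {}),           # zero-chunk stream: guard raises
    ("stop", ["Hello"], [], {}),  # healthy stream on retry
])


def fake_streaming_call() -> str:
    finish_reason, content_parts, reasoning_parts, tool_calls_acc = next(responses)
    guard_empty_stream(finish_reason, content_parts, reasoning_parts, tool_calls_acc)
    return "".join(content_parts)


print(call_with_retries(fake_streaming_call))  # Hello
```

The guard only fires when nothing at all arrived. A stream that delivered partial content but no finish_reason still falls through to the finish_reason or 'stop' fallback; the issue doesn't propose changing that case.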

Optional companion: a first-token deadline (e.g. HERMES_STREAM_FIRST_TOKEN_TIMEOUT, default ~60s), enforced in the existing outer poll loop and not suppressed for local endpoints, to bound the blocking-empty case where the stream opens but never emits a chunk. This matters most for local endpoints, because the stale-stream detector is disabled for them (~L2299-2301) and the httpx read timeout is raised to the API timeout (~L1695-1696), leaving no short backstop. For non-local endpoints the standard read timeout already bounds the blocking case, so the post-loop guard above is the key fix.
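A minimal sketch of the first-token deadline, assuming the stream can be drained from a background thread into a queue that a poll loop reads. Only the environment variable name and the ~60s default come from the issue; FirstTokenTimeout, first_token_timeout_s, and iter_with_first_token_deadline are hypothetical names. In Hermes the check would presumably sit in the existing outer poll loop rather than in a separate wrapper like this one.

```python
import os
import queue
import threading
import time
from typing import Iterable, Iterator


class FirstTokenTimeout(RuntimeError):
    """Hypothetical error: the stream opened but no chunk arrived in time."""


def first_token_timeout_s(default: float = 60.0) -> float:
    # Variable name as suggested in the issue; unset, empty, or non-numeric -> default.
    raw = os.environ.get("HERMES_STREAM_FIRST_TOKEN_TIMEOUT", "")
    try:
        return float(raw) if raw.strip() else default
    except ValueError:
        return default


def iter_with_first_token_deadline(stream: Iterable, timeout_s: float, poll_s: float = 0.05) -> Iterator:
    """Yield chunks from `stream`; raise FirstTokenTimeout if none arrives within timeout_s.

    The deadline applies only to the first chunk. A real implementation would also
    close the underlying HTTP response on timeout so the reader thread unblocks;
    here that thread is simply left behind as a daemon.
    """
    q: queue.Queue = queue.Queue()

    def pump() -> None:
        try:
            for chunk in stream:
                q.put(("chunk", chunk))
            q.put(("done", None))
        except Exception as exc:  # hand stream errors to the consumer
            q.put(("error", exc))

    threading.Thread(target=pump, daemon=True).start()
    start = time.monotonic()
    got_first = False
    while True:  # the outer poll loop
        try:
            kind, item = q.get(timeout=poll_s)
        except queue.Empty:
            if not got_first and time.monotonic() - start > timeout_s:
                raise FirstTokenTimeout(f"no chunk within {timeout_s:.1f}s of opening the stream")
            continue
        if kind == "chunk":
            got_first = True
            yield item
        elif kind == "error":
            raise item
        else:  # "done": a zero-chunk stream ends here and is left to the post-loop guard
            return


def never_emits():
    time.sleep(5)  # simulates a stream that opens but never sends a chunk
    yield "late"


try:
    list(iter_with_first_token_deadline(never_emits(), timeout_s=0.2))
except FirstTokenTimeout as exc:
    print("surfaced:", exc)  # surfaced: no chunk within 0.2s of opening the stream
```

A stream that ends immediately with no chunks passes through this wrapper without any error, so the deadline complements, rather than replaces, the post-loop guard.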

Repro sketch

Point Hermes at an OpenAI-compatible endpoint that returns HTTP 200 with an empty SSE body (or an error: frame) for some request, send a message, and observe the agent produce a silent empty turn / appear to hang, rather than surfacing an error or retrying.
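For the repro, a throwaway local endpoint that always answers HTTP 200 with an empty SSE body, or with a single error-bearing data: frame, is enough. This is a stdlib-only sketch: EmptySSEHandler and start_server are hypothetical names, the /v1/chat/completions path and the error payload shape are assumptions, and how Hermes is pointed at a custom base URL isn't covered in the issue, so that step depends on your setup.

```python
import http.server
import json
import threading
import urllib.request


class EmptySSEHandler(http.server.BaseHTTPRequestHandler):
    """Answers every POST with HTTP 200 and an SSE body containing no usable chunks."""

    mode = "empty"  # "empty" or "error_frame"

    def do_POST(self) -> None:
        # Read and discard the request body (the chat.completions payload).
        self.rfile.read(int(self.headers.get("Content-Length", 0)))
        if self.mode == "error_frame":
            # Assumed shape of an in-band error; real proxies may differ.
            frame = {"error": {"message": "context length exceeded"}}
            body = f"data: {json.dumps(frame)}\n\n".encode()
        else:
            body = b""
        self.send_response(200)
        self.send_header("Content-Type", "text/event-stream")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args) -> None:
        pass  # silence per-request logging


def start_server(mode: str = "empty") -> http.server.ThreadingHTTPServer:
    handler = type("Handler", (EmptySSEHandler,), {"mode": mode})
    # Port 0 asks the OS for a free port; read it back from server_address.
    server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


# Usage: confirm the endpoint really returns 200 with an empty body.
server = start_server("empty")
url = f"http://127.0.0.1:{server.server_address[1]}/v1/chat/completions"
req = urllib.request.Request(
    url, data=b'{"stream": true}', headers={"Content-Type": "application/json"}, method="POST"
)
with urllib.request.urlopen(req) as resp:
    print(resp.status, resp.read())  # 200 b''
server.shutdown()
server.server_close()
```

Because the server listens on 127.0.0.1, it also exercises the local-endpoint case discussed above, where the stale-stream detector is disabled.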

Impact

Low-risk, contained change in the streaming finalizer; turns a silent-empty-turn failure mode (which looks like a hang) into a normal recoverable error for any provider/proxy that emits a malformed or empty stream.

Metadata


Assignees

No one assigned

Labels

P1 (High: major feature broken, no workaround)
comp/agent (Core agent loop, run_agent.py, prompt builder)
type/bug (Something isn't working)
