Streaming parser silently fabricates an empty `stop` turn when an OpenAI-wire stream yields zero chunks

## Summary

When a streaming `chat.completions` request to an OpenAI-compatible endpoint returns **HTTP 200 but the SSE body yields zero usable chunks** (an empty stream, or a non-standard `error:` data frame that the OpenAI SDK doesn't surface as content), the streaming consumer in `agent/chat_completion_helpers.py` exits its loop with no content and `finish_reason = None`, and then **fabricates a successful empty `stop` turn** instead of raising. The agent accepts the empty turn (and can re-loop on a persistent condition), which presents to the user as a silent "stuck"/hung agent with no error surfaced.

## Where (function/symbols; line numbers approximate)

In `interruptible_streaming_api_call` (`agent/chat_completion_helpers.py`):

- The consume loop `for chunk in stream:` (~L1758) exits normally on a zero-chunk stream — `content_parts`, `reasoning_parts`, and `tool_calls_acc` all stay empty and `finish_reason` stays `None`.
- When the mock response is assembled (~L1938): `effective_finish_reason = finish_reason or 'stop'` — this turns the empty/errored result into a valid-looking `stop` turn. No error is raised, so it never enters the retry/error path.
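
The two steps above can be reproduced in isolation. This is a simplified stand-in, not the actual Hermes code: `consume` and the `SimpleNamespace` chunk shape are assumptions made for illustration, and only the `finish_reason or "stop"` fallback is taken from the report.

```python
from types import SimpleNamespace


def consume(stream):
    """Hypothetical, stripped-down version of the consume loop and finalizer."""
    content_parts, reasoning_parts, tool_calls_acc = [], [], {}
    finish_reason = None

    for chunk in stream:  # body never runs when the stream yields zero chunks
        choice = chunk.choices[0]
        if choice.delta.content:
            content_parts.append(choice.delta.content)
        if choice.finish_reason:
            finish_reason = choice.finish_reason

    # The fallback that masks the empty result as a normal completion.
    effective_finish_reason = finish_reason or "stop"
    return SimpleNamespace(
        content="".join(content_parts),
        finish_reason=effective_finish_reason,
    )


# A zero-chunk stream comes back looking like a successful, empty "stop" turn.
empty_turn = consume(iter([]))
```

Nothing in the returned object distinguishes this from a model that legitimately answered with an empty string, which is why the caller never reaches the retry/error path.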

Related: the OpenAI-wire path has no stream-parse-error recognition equivalent to the Anthropic path (`run_agent.py` `_is_provider_stream_parse_error` only matches Anthropic's `"expected ident at line"`), so a malformed OpenAI-style stream is dropped silently rather than surfaced.

## Why it matters

Some OpenAI-compatible servers/proxies emit a `200` with an empty or non-standard-error SSE body under certain conditions (e.g. context overflow, or an upstream that returns an error inside a `data:` frame). Today that becomes an invisible empty assistant turn rather than a recoverable error, and on a persistent condition the agent can loop on empty turns.

## Proposed fix (provider-agnostic, defense-in-depth)

Add a **post-loop zero-chunk guard** just before the mock response is built (~L1900), so an empty result becomes a real, retryable error that flows through the existing retry/recovery machinery instead of a fabricated successful turn:

```python
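# Nothing was accumulated and the provider never sent a finish_reason:
# raise instead of falling through to `finish_reason or 'stop'`.
if finish_reason is None and not content_parts and not reasoning_parts and not tool_calls_acc:
    raise RuntimeError(
        "Provider returned an empty stream with no finish_reason "
        "(possible upstream error or malformed SSE)."
    )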
```
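
A regression test for the guard could look roughly like the sketch below. The helper name `check_stream_not_empty` is hypothetical (in the real change the condition would sit inline in `interruptible_streaming_api_call`), and the accumulator types are assumed to be two lists and a dict.

```python
def check_stream_not_empty(finish_reason, content_parts, reasoning_parts, tool_calls_acc):
    """Hypothetical extraction of the proposed guard so it can be tested alone."""
    if finish_reason is None and not content_parts and not reasoning_parts and not tool_calls_acc:
        raise RuntimeError(
            "Provider returned an empty stream with no finish_reason "
            "(possible upstream error or malformed SSE)."
        )


def raises_on(finish_reason, content_parts, reasoning_parts, tool_calls_acc):
    """Return True if the guard rejects this combination of accumulators."""
    try:
        check_stream_not_empty(finish_reason, content_parts, reasoning_parts, tool_calls_acc)
    except RuntimeError:
        return True
    return False


# Zero chunks: everything empty and no finish_reason, so the guard fires.
empty_stream_rejected = raises_on(None, [], [], {})

# An explicit finish_reason with no content is a legitimate empty reply.
explicit_stop_rejected = raises_on("stop", [], [], {})

# Content without a finish_reason (truncated stream) is out of scope here.
truncated_rejected = raises_on(None, ["partial"], [], {})
```

The guard is deliberately narrow: it only fires when all four signals are absent, so a provider that sends an explicit `stop` with empty content is unaffected.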

**Optional companion** — a first-token deadline (e.g. `HERMES_STREAM_FIRST_TOKEN_TIMEOUT`, default ~60s) enforced in the existing outer poll loop and **not** suppressed for local endpoints, to bound the *blocking*-empty case (stream opens but never emits a chunk). This matters for local endpoints in particular because the stale-stream detector is disabled for them (~L2299-2301) and the httpx read timeout is raised to the API timeout (~L1695-1696), leaving no short backstop. (For non-local endpoints the standard read timeout already bounds the blocking case, so the post-loop guard above is the key fix.)
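
One possible shape for the first-token deadline is sketched below. It assumes the stream is consumed on a worker thread while an outer loop polls a queue; the actual structure of the existing poll loop may differ, and `consume_with_first_token_deadline` and `poll_interval` are names invented for this sketch. Only the environment variable name and the ~60s default come from the proposal above.

```python
import os
import queue
import threading
import time

# Proposed knob from this issue; 60 seconds is the suggested default.
FIRST_TOKEN_TIMEOUT = float(os.environ.get("HERMES_STREAM_FIRST_TOKEN_TIMEOUT", "60"))


def consume_with_first_token_deadline(stream, first_token_timeout=FIRST_TOKEN_TIMEOUT,
                                      poll_interval=0.05):
    """Collect chunks, raising TimeoutError if no chunk arrives before the deadline."""
    chunks = queue.Queue()
    done = object()  # sentinel marking the end of the stream
    errors = []

    def worker():
        try:
            for chunk in stream:
                chunks.put(chunk)
        except Exception as exc:  # hand the failure to the polling thread
            errors.append(exc)
        finally:
            chunks.put(done)

    threading.Thread(target=worker, daemon=True).start()
    started = time.monotonic()
    received = []

    while True:
        try:
            item = chunks.get(timeout=poll_interval)
        except queue.Empty:
            # The deadline applies only until the first chunk has been seen.
            if not received and time.monotonic() - started > first_token_timeout:
                raise TimeoutError(
                    f"No chunk received within {first_token_timeout:.1f}s of opening the stream."
                )
            continue
        if item is done:
            if errors:
                raise errors[0]
            return received
        received.append(item)
```

A stream that closes immediately with zero chunks returns an empty list here rather than timing out, so this check complements the post-loop guard instead of replacing it.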

## Repro sketch

Point Hermes at an OpenAI-compatible endpoint that returns `HTTP 200` with an empty SSE body (or an `error:` frame) for some request, send a message, and observe the agent produce a silent empty turn / appear to hang, rather than surfacing an error or retrying.
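
A throwaway local endpoint for the empty-body case can be built from the standard library alone. This is a minimal sketch: the handler answers every POST the same way regardless of path, and the helper names `start_background` and `probe` are made up for illustration. It covers only the empty-stream variant, not the `error:` frame variant.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class EmptyStreamHandler(BaseHTTPRequestHandler):
    """Answers any POST with HTTP 200 and an empty text/event-stream body."""

    def do_POST(self):
        # Drain the request body so the client is not left mid-write.
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)
        self.send_response(200)
        self.send_header("Content-Type", "text/event-stream")
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the console quiet


def start_background(port=0):
    """Start the fake endpoint on 127.0.0.1; port 0 picks a free port."""
    server = ThreadingHTTPServer(("127.0.0.1", port), EmptyStreamHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def probe(server):
    """POST once to the fake endpoint and return (status, body)."""
    port = server.server_address[1]
    request = urllib.request.Request(
        f"http://127.0.0.1:{port}/v1/chat/completions",
        data=b"{}",
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status, response.read()
```

Calling `start_background(8089)` (the port is arbitrary) and setting the provider base URL to `http://127.0.0.1:8089/v1` should give a streaming request that opens successfully and yields no chunks.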

## Impact

Low-risk, contained change in the streaming finalizer; turns a silent-empty-turn failure mode (which looks like a hang) into a normal recoverable error for any provider/proxy that emits a malformed or empty stream.
