
[Bug] Raw terminated from upstream stream close leaks to assistant message without translation #754

@Astro-Han

Description


What happened?

An assistant message ends with error = { name: "UnknownError", data: { message: "terminated" } }, and the UI shows the raw string "terminated". The string comes from undici, the HTTP client behind Node's built-in fetch, which raises it when a response body is closed before its read finishes. PawWork passes it through to the user without translation or normalization.

Captured from a real session export (docs/debug-session-log/pawwork-session-neon-orchid-2026-05-19-05-08-23-terminated.json, local-only):

  • Top-level: assistant.error = { name: "UnknownError", data: { message: "terminated" } }.
  • LLM trace flags.stream_error: true, tokens all zero. stream_events shows start, start_step, reasoning_start/end, tool_input_start, 5x tool_input_delta, then no further events for ~147s before the message fails.
  • The trailing apply_patch tool part is finalised with state.error = "Tool execution aborted", metadata.interrupted: true.

The 147s duration is the wall-clock time between the request start and the failure. It is not the local stream watchdog firing. SILENT_STREAM_TIMEOUT_MS defaults to 10 minutes (packages/opencode/src/session/llm.ts:30) and resets on every provider-progress event (resetTimeout, llm.ts:509), so a 147s window cannot reach it. The most plausible trigger is an upstream stream closure by a provider gateway, an intermediate proxy, or the network. That closure causes undici to throw TypeError("terminated") while the iterator is awaiting the next chunk.
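The timing argument can be checked with a small model. The sketch below is hypothetical: watchdogWouldFire is not a function in llm.ts. It assumes two budgets. The first is a connect-phase budget before the first progress event (30000ms, as in the related export under Diagnostics). The second is a silent-stream budget that restarts on every progress event (the 10-minute SILENT_STREAM_TIMEOUT_MS default). The event times are illustrative, because the export only gives the ~147s total.

```typescript
// Hypothetical model of the two watchdog budgets; not the llm.ts implementation.
// All times are milliseconds relative to request start; progress times are sorted ascending.
function watchdogWouldFire(
  progressTimesMs: number[], // when each provider-progress event arrived
  failureTimeMs: number,     // when the message failed
  connectTimeoutMs: number,  // budget before the first progress event
  silentTimeoutMs: number,   // idle budget, restarted by every progress event
): boolean {
  // No progress at all: only the connect-phase budget applies.
  if (progressTimesMs.length === 0) return failureTimeMs >= connectTimeoutMs;
  // First progress arrived too late: the connect-phase budget ran out.
  if (progressTimesMs[0] >= connectTimeoutMs) return true;
  // After that, each progress event resets the silent timer, so only the gaps
  // between consecutive events (and from the last event to failure) matter.
  const points = [...progressTimesMs, failureTimeMs];
  for (let i = 1; i < points.length; i++) {
    if (points[i] - points[i - 1] >= silentTimeoutMs) return true;
  }
  return false;
}

const CONNECT_TIMEOUT_MS = 30_000;
const SILENT_STREAM_TIMEOUT_MS = 10 * 60 * 1000;

// Worst case for this report: a single progress event at the very start and
// failure at ~147s, giving the largest possible quiet gap.
console.log(watchdogWouldFire([0], 147_000, CONNECT_TIMEOUT_MS, SILENT_STREAM_TIMEOUT_MS)); // false
```

Wherever the progress events actually fell inside the 147s, no quiet gap can exceed 147s. That is well under the 600000ms silent budget.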

Root cause

Stream-boundary errors are wrapped verbatim into the assistant message. MessageV2.fromError() (packages/opencode/src/session/message-v2.ts:1187) treats any unknown Error as UnknownError and stores only error.message in data.message. There is no translation step that recognises undici / fetch low-level identifiers (terminated, other side closed, etc.) and converts them into a user-facing description of what happened to the stream.
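A simplified, hypothetical model of that wrapping shows what gets lost. fromErrorSketch is not the code at message-v2.ts:1187. It only mirrors the behaviour described above: keep error.message and drop everything else.

The input error imitates what undici commonly raises for a cut-off body. That is a TypeError with message "terminated" whose cause is a SocketError (code UND_ERR_SOCKET, message "other side closed"). Treat that exact cause as an assumption, since it varies with how the connection dies.

```typescript
// Shape stored on the assistant message, per the report.
interface UnknownErrorShape {
  name: "UnknownError";
  data: { message: string };
}

// Hypothetical stand-in for MessageV2.fromError(): unknown errors keep only
// their top-level message; the cause chain is discarded.
function fromErrorSketch(e: unknown): UnknownErrorShape {
  const message = e instanceof Error ? e.message : String(e);
  return { name: "UnknownError", data: { message } };
}

// Hand-built imitation of an undici body-read failure (fields assumed, see above).
const socketError = Object.assign(new Error("other side closed"), {
  name: "SocketError",
  code: "UND_ERR_SOCKET",
});
const undiciError = Object.assign(new TypeError("terminated"), { cause: socketError });

const stored = fromErrorSketch(undiciError);
console.log(JSON.stringify(stored));
// {"name":"UnknownError","data":{"message":"terminated"}}
```

The socket-level detail, which says more about what happened to the stream, never reaches the stored error. Only the bare identifier does.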

This is independent of the local stream watchdog, whose two branches behave as designed. Connect-phase exhaustion produces a friendly error, while a post-progress idle timeout simply lets the stream end. The existing test "silent stream timeout cancels provider response body promptly" (packages/opencode/test/session/llm.test.ts:617-688) covers this and explicitly asserts Exit.isFailure(exit) === false. The bug in this report is the missing translation layer when the closure comes from outside the watchdog.
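The two watchdog branches can be sketched as a small decision function, which also shows why an external closure bypasses both. WatchdogOutcome and onWatchdogTimeout are hypothetical names, not llm.ts identifiers. The friendly message text is taken from the related connect-timeout export listed under Diagnostics.

```typescript
// Hypothetical outcomes of the local watchdog; not identifiers from llm.ts.
type WatchdogOutcome =
  | { kind: "connect-timeout"; message: string } // fails with a friendly error
  | { kind: "idle-end" };                        // stream ends, no error raised

// Only ever called when the local timer expires.
function onWatchdogTimeout(sawProgress: boolean, connectTimeoutMs: number): WatchdogOutcome {
  if (!sawProgress) {
    return {
      kind: "connect-timeout",
      message: `LLM stream connection timed out after ${connectTimeoutMs}ms without provider progress`,
    };
  }
  // After progress has been seen, an idle timeout is treated as end of stream.
  return { kind: "idle-end" };
}
```

An upstream closure never reaches a function like this. It surfaces as an exception from the stream iterator before any timer expires, so it skips both branches and falls through to the generic wrapping in fromError().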

Steps to reproduce

  1. Start a long-running model turn that emits at least one provider-progress event (any reasoning or tool-input chunk).
  2. Cause the upstream stream to be closed mid-response, without the local watchdog reaching its threshold. Easiest in development: kill the provider connection from a proxy, or use a flaky provider/gateway that drops the long-lived HTTP/2 stream.
  3. Observe that the assistant message fails and the literal text "terminated" is shown to the user.
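For step 2, a local server can stand in for the dropping proxy. This is a sketch under two assumptions: Node 18+, whose global fetch is built on undici, and an HTTP/1.1 chunked response rather than the HTTP/2 stream a real gateway might use. reproduceUpstreamClose is a hypothetical helper, not part of the existing test suite. It sends one SSE-style chunk, destroys the socket mid-body, and returns whatever the client's next read rejects with.

```typescript
import { createServer } from "node:http";
import type { AddressInfo } from "node:net";

// Hypothetical repro helper: start a streaming response, cut the socket
// mid-body, and return the error the client sees while reading the body.
async function reproduceUpstreamClose(): Promise<unknown> {
  const server = createServer((req, res) => {
    res.writeHead(200, { "content-type": "text/event-stream" });
    res.write("data: partial\n\n"); // one chunk, so the client sees progress
    setTimeout(() => req.socket.destroy(), 100); // simulate the upstream closure
  });
  await new Promise<void>((resolve) => server.listen(0, "127.0.0.1", () => resolve()));
  const { port } = server.address() as AddressInfo;
  try {
    const response = await fetch(`http://127.0.0.1:${port}/`);
    if (!response.body) throw new Error("expected a streaming body");
    const reader = response.body.getReader();
    // Keep reading until the body errors; a clean end means no repro.
    for (;;) {
      const { done } = await reader.read();
      if (done) return undefined;
    }
  } catch (err) {
    return err;
  } finally {
    server.close();
  }
}
```

Awaiting reproduceUpstreamClose() is expected to yield a TypeError whose message is "terminated", the same string that reaches the UI. Logging its cause as well may help when comparing against provider samples.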

What did you expect to happen?

The assistant message should fail with a human-readable description of what happened to the stream (for example: the stream was closed by the upstream provider before the response finished). The raw undici identifier terminated should never reach the UI.

PawWork version

v2026.5.19 (prod-202605181651)

OS version

macOS 15.4.0 (darwin 25.4.0)

Can you reproduce it again?

Only once so far

Diagnostics

  • Failing session export (local): docs/debug-session-log/pawwork-session-neon-orchid-2026-05-19-05-08-23-terminated.json.
  • Related-but-different export from the same session: docs/debug-session-log/pawwork-session-neon-orchid-2026-05-19-04-58-27-LLM stream connection timed out after 30000ms without provider progress.json (this one is the connect-timeout watchdog firing correctly with a friendly message; included for contrast — it demonstrates the watchdog is not the source of the terminated bug).
  • Error-wrapping site: packages/opencode/src/session/message-v2.ts:1187 (fromError → UnknownError).
  • Stream consumption / error rethrow: packages/opencode/src/session/processor.ts:719 and packages/opencode/src/session/llm.ts:563-597 (failOnTimeout).
  • Fix plan will be posted as a comment after review.

Waiting for

Holding scope intentionally. A one-off undici terminated is not enough signal to design the fix — different providers and SDKs emit different raw identifiers for "upstream closed the stream mid-response" (undici: terminated; OpenAI SDK: variations like stream interrupted / response closed; Anthropic SDK: its own fetch wrapper strings). Patching one substring per occurrence is the wrong direction and would compound existing substring-matching debt in retry.ts.

Before committing to a fix, collect:

  • 2-3 more reproductions to confirm this is recurring, not a one-time provider hiccup.
  • Cross-provider samples (at minimum: one non-OpenAI provider) to confirm whether the raw strings actually differ and how.
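When collecting those samples, it helps to record the whole cause chain rather than only the top-level message. undici commonly puts the more specific socket error in cause. Below is a minimal sketch, assuming errors follow the standard Error.cause convention. describeErrorChain is a hypothetical helper, and the sample error is built by hand to imitate undici's shape.

```typescript
// One entry per error in the chain, kept as plain data for session exports.
interface ErrorLink {
  name: string;
  message: string;
  code?: string; // e.g. Node/undici string codes such as UND_ERR_SOCKET
}

// Hypothetical helper: walk err, err.cause, err.cause.cause, ...
function describeErrorChain(err: unknown, maxDepth = 8): ErrorLink[] {
  const chain: ErrorLink[] = [];
  const seen = new Set<unknown>(); // guards against cyclic cause references
  let current: unknown = err;
  while (current instanceof Error && !seen.has(current) && chain.length < maxDepth) {
    seen.add(current);
    const code = (current as { code?: unknown }).code;
    chain.push({
      name: current.name,
      message: current.message,
      ...(typeof code === "string" ? { code } : {}),
    });
    current = (current as { cause?: unknown }).cause;
  }
  return chain;
}

// Hand-built imitation of an undici body-read failure.
const sample = Object.assign(new TypeError("terminated"), {
  cause: Object.assign(new Error("other side closed"), {
    name: "SocketError",
    code: "UND_ERR_SOCKET",
  }),
});
console.log(JSON.stringify(describeErrorChain(sample)));
// [{"name":"TypeError","message":"terminated"},{"name":"SocketError","message":"other side closed","code":"UND_ERR_SOCKET"}]
```

Comparing these records across providers shows whether the difference lies only in the top-level string or also in the structured codes underneath.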

Once there is enough evidence, the right shape is a stream-boundary error taxonomy (e.g., StreamInterrupted with provider, phase, cause) plumbed through MessageV2.error and rendered with localised UI copy — and the same pass should sweep retry.ts's substring-based error matching. Until then, do not add per-string translation patches.
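Below is a sketch of the target shape, assuming the three fields named above. StreamInterrupted, StreamPhase and renderStreamInterrupted are hypothetical; nothing by these names exists yet. The English strings stand in for localised copy, and "example-gateway" is a made-up provider ID. How a raw error gets classified into this shape is deliberately left out, since that is what the extra samples are meant to inform.

```typescript
// Whether the stream failed before or after the first provider-progress event.
type StreamPhase = "connect" | "streaming";

// Hypothetical MessageV2.error variant; field names follow the report.
interface StreamInterrupted {
  name: "StreamInterrupted";
  data: {
    provider: string;
    phase: StreamPhase;
    cause?: string; // raw identifier such as "terminated", for diagnostics only
  };
}

// Placeholder copy table; a real version would also be keyed by locale.
const COPY: Record<StreamPhase, (provider: string) => string> = {
  connect: (provider) => `The connection to ${provider} closed before the response started.`,
  streaming: (provider) => `${provider} closed the stream before the response finished.`,
};

// UI text is built from provider and phase only; cause never reaches it.
function renderStreamInterrupted(error: StreamInterrupted): string {
  return COPY[error.data.phase](error.data.provider);
}

const example: StreamInterrupted = {
  name: "StreamInterrupted",
  data: { provider: "example-gateway", phase: "streaming", cause: "terminated" },
};
console.log(renderStreamInterrupted(example));
// example-gateway closed the stream before the response finished.
```

Keeping the raw identifier in a diagnostics-only field preserves it for exports like the one above, while the rendered text stays independent of which SDK produced it.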

Metadata


Assignees

No one assigned

    Labels

    P2 (Medium priority), bug (Something isn't working), harness (Model harness, prompts, tool descriptions, and session mechanics), tech-debt (Supplemental cleanup, maintainability, architecture, test, or quality debt context)

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
