Goal
Design and implement a first-class LLM run diagnostics architecture so that recurring `terminated` failures can be debugged, classified, and eventually recovered without one-off string patches.
What should be true when this is done:
- Session exports show where an LLM request failed: before connection, early in the stream, mid-generation, after a tool side effect, on a local abort, on a watchdog timeout, or on a provider error.
- The exported diagnostics distinguish user/PawWork aborts from upstream, provider, and socket disconnects.
- The diagnostics include enough safe correlation data to compare PawWork behavior against Codex App and upstream OpenCode.
- Recovery policy can be derived from facts: safe auto-retry, user-confirmed retry, or no retry.
- A raw `terminated` / `UND_ERR_SOCKET` error is no longer treated as an opaque user-facing failure.
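One way to make the failure phases and recovery policies above derivable from facts is a pure function over a small record of run facts. The sketch below is illustrative only: `RunFacts`, `classifyPhase`, `retryPolicy`, and the specific retry rules are hypothetical names and assumptions, not existing PawWork or OpenCode code. It checks local provenance (watchdog, then abort) before anything else, so a PawWork or user abort can never be reported as an upstream disconnect.

```typescript
// Hypothetical types and rules for illustration; not existing PawWork/OpenCode APIs.
type AbortSource = "user" | "pawwork";

interface RunFacts {
  providerProgressed: boolean;                  // at least one stream event arrived
  visibleTextEmitted: boolean;                  // text already shown to the user
  toolCallStarted: boolean;                     // a tool began executing (possible side effect)
  abortSource: AbortSource | null;              // set only when our own AbortSignal fired
  watchdog: "connect" | "silent_stream" | null; // which watchdog fired, if any
  providerStatus: number | null;                // HTTP status of a provider error response
}

type FailurePhase =
  | "watchdog"
  | "local_abort"
  | "provider_error"
  | "tool_side_effect"
  | "mid_generation"
  | "early_stream"
  | "before_connection";

type RetryPolicy = "safe_auto_retry" | "user_confirmed_retry" | "no_retry";

function classifyPhase(f: RunFacts): FailurePhase {
  // Local provenance first; a watchdog abort is a PawWork abort with a known cause.
  if (f.watchdog !== null) return "watchdog";
  if (f.abortSource !== null) return "local_abort";
  if (f.providerStatus !== null) return "provider_error";
  // Otherwise the transport dropped; place it by how far the run got.
  if (f.toolCallStarted) return "tool_side_effect";
  if (f.visibleTextEmitted) return "mid_generation";
  if (f.providerProgressed) return "early_stream";
  return "before_connection";
}

function retryPolicy(f: RunFacts): RetryPolicy {
  // Nothing shown to the user and nothing with side effects: a replay is invisible.
  const unobservable = !f.visibleTextEmitted && !f.toolCallStarted;
  switch (classifyPhase(f)) {
    case "local_abort":
      return "no_retry"; // the user or PawWork chose to stop
    case "tool_side_effect":
    case "mid_generation":
      return "user_confirmed_retry"; // a replay could repeat output or side effects
    case "early_stream":
    case "before_connection":
      return "safe_auto_retry";
    case "watchdog":
      return unobservable ? "safe_auto_retry" : "user_confirmed_retry";
    case "provider_error": {
      const transient = f.providerStatus === 429 || (f.providerStatus ?? 0) >= 500;
      if (!transient) return "no_retry";
      return unobservable ? "safe_auto_retry" : "user_confirmed_retry";
    }
  }
}
```

With the signature from the recent exports (provider progressed, no watchdog, no abort, no text yet), this sketch classifies the run as `early_stream` and returns `safe_auto_retry`; whether that rule is actually safe is exactly what the design needs to confirm.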
Scope
In scope:
- Design a run-level diagnostics recorder for LLM requests, not just more ad hoc fields on `llm_trace.stream.error`.
- Define a failure classifier and a retry-safety classifier based on stream facts, tool side effects, abort provenance, watchdog state, and transport errors.
- Extend session exports with a human-readable summary plus a bounded engineering timeline.
- Add deterministic local harness coverage for early disconnect, mid-stream disconnect, local abort, watchdog timeout, and post-tool-call disconnect cases.
- Keep exported fields safe: no prompts, secrets, raw auth headers, or unbounded response bodies.
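A run-level recorder, rather than more fields on one error event, could look roughly like the following sketch. `RunDiagnosticsRecorder`, the event kinds, and both limits are hypothetical. It bounds the timeline in two ways: repeated text deltas are coalesced into a counter, and once the cap is reached the oldest events are dropped (and counted) so the events nearest the failure survive. Each note is truncated, and the caller is assumed to pass only strings that are already safe to export.

```typescript
// Hypothetical recorder for illustration; not an existing PawWork/OpenCode API.
type EventKind =
  | "request_start"
  | "response_headers"
  | "first_part"
  | "text_delta"
  | "tool_call"
  | "abort"
  | "watchdog"
  | "error"
  | "finish";

interface TimelineEvent {
  atMs: number;    // offset from run start, not wall-clock time
  kind: EventKind;
  detail?: string; // short note the caller has already made safe
}

const MAX_EVENTS = 32;        // assumed cap on exported timeline length
const MAX_DETAIL_CHARS = 120; // assumed cap on each note

class RunDiagnosticsRecorder {
  private readonly runId: string;
  private readonly startMs: number;
  private readonly timeline: TimelineEvent[] = [];
  private readonly counts = new Map<EventKind, number>();
  private dropped = 0;

  constructor(runId: string, startMs: number) {
    this.runId = runId;
    this.startMs = startMs;
  }

  record(kind: EventKind, nowMs: number, detail?: string): void {
    const seen = this.counts.get(kind) ?? 0;
    this.counts.set(kind, seen + 1);
    // Coalesce high-frequency text deltas: only the first enters the timeline.
    if (kind === "text_delta" && seen > 0) return;
    if (this.timeline.length >= MAX_EVENTS) {
      this.timeline.shift(); // evict the oldest so events near the failure survive
      this.dropped++;
    }
    const event: TimelineEvent = { atMs: nowMs - this.startMs, kind };
    if (detail !== undefined) event.detail = detail.slice(0, MAX_DETAIL_CHARS);
    this.timeline.push(event);
  }

  // Human-readable summary plus the bounded engineering timeline.
  toExport(): { summary: string; timeline: TimelineEvent[]; droppedEvents: number } {
    const last = this.timeline[this.timeline.length - 1];
    const summary =
      `run ${this.runId}: ${this.counts.get("text_delta") ?? 0} text deltas, ` +
      `${this.counts.get("tool_call") ?? 0} tool calls, ` +
      `ended with ${last ? `${last.kind} at +${last.atMs}ms` : "no events"}`;
    return { summary, timeline: [...this.timeline], droppedEvents: this.dropped };
  }
}
```

Keeping the most recent events trades away the start of very long runs; a variant could pin `request_start` and `response_headers` and evict only from the middle of the timeline.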
Out of scope for the first design pass:
- Changing provider credentials or OpenAI account routing.
- Blindly adding substring-specific UI translations for `terminated`.
- Auto-retrying all stream failures before retry safety is explicit.
- Uploading session exports automatically.
- Replacing the whole LLM runtime in the same PR unless the design proves it is necessary.
Upstream context:
- Upstream OpenCode appears to rely mostly on logs, optional OpenTelemetry, and the AI SDK `onError` hook for this path.
- Upstream has an experimental native LLM runtime path, which may eventually help compare AI SDK transport behavior against a more controlled runtime.
Verification
A design is acceptable when it specifies:
- The diagnostic events and their bounded schema.
- The failure taxonomy and the retry-safety taxonomy.
- How request, runtime, and transport facts are captured without leaking secrets.
- How the session export summary and timeline are structured.
- How deterministic tests simulate stream disconnects.
- Which parts are minimum viable diagnostics, which are architectural foundation, and which are deferred.
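For capturing transport facts without leaking secrets, an allowlist is easier to audit than a denylist. In the sketch below, `sanitizeTransportMetadata`, the key list, the length cap, and the credential patterns are assumptions for illustration. Unknown keys and non-scalar values (objects, bodies, buffers) are dropped and counted, and string values on allowed keys are still scrubbed for bearer tokens and `sk-`-style keys and then truncated.

```typescript
// Hypothetical export sanitizer; the key list and patterns are assumptions.
type SafeValue = string | number | boolean | null;

const SAFE_KEYS = new Set<string>([
  "stream.error.constructor_name",
  "stream.error.cause_name",
  "stream.error.cause_code",
  "stream.error.cause_message",
  "watchdog.fired",
  "watchdog.provider_progressed",
  "abort.signal_aborted_at_error",
  "http.status",
  "provider.request_id",
]);

const MAX_VALUE_CHARS = 160;

// Credential-looking substrings are masked even inside allowed values.
const SECRET_PATTERNS: RegExp[] = [
  /bearer\s+[\w.~+\/-]+=*/gi, // bearer tokens
  /\bsk-[A-Za-z0-9_-]{8,}/g,  // sk- style API keys (assumed format)
];

function sanitizeTransportMetadata(
  raw: Record<string, unknown>,
): { fields: Record<string, SafeValue>; droppedKeys: number } {
  const fields: Record<string, SafeValue> = {};
  let droppedKeys = 0;
  for (const [key, value] of Object.entries(raw)) {
    if (!SAFE_KEYS.has(key)) {
      droppedKeys++; // unknown keys never leave the process
      continue;
    }
    if (typeof value === "number" || typeof value === "boolean" || value === null) {
      fields[key] = value;
      continue;
    }
    if (typeof value !== "string") {
      droppedKeys++; // objects, bodies, and buffers are not exported
      continue;
    }
    let text = value;
    for (const pattern of SECRET_PATTERNS) text = text.replace(pattern, "[redacted]");
    fields[key] = text.length > MAX_VALUE_CHARS ? `${text.slice(0, MAX_VALUE_CHARS)}…` : text;
  }
  return { fields, droppedKeys };
}
```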
Implementation verification should eventually include targeted tests for:
- Early provider progress followed by `UND_ERR_SOCKET` before any text or tool output.
- A mid-generation disconnect after visible text.
- A disconnect after a tool call or side effect has started.
- A local abort with provenance.
- A watchdog connect timeout and a silent-stream timeout.
- Export sanitizer redaction of transport metadata, so that only safe fields are exported.
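The cases above can be made deterministic by replacing the network with a scripted stream. In this sketch, the `Step` script format, `runScript`, and the 30-second watchdog budget are hypothetical. The disconnect step throws a `TypeError` whose `cause` carries `code: "UND_ERR_SOCKET"`, meant to mimic the shape recorded in the recent exports, while watchdog and abort steps throw an `AbortError` tagged with a local reason so provenance is preserved.

```typescript
// Hypothetical deterministic harness; the script format and budget are assumptions.
type StreamPart =
  | { type: "progress" } // e.g. a keep-alive or "response created" event
  | { type: "text-delta"; text: string }
  | { type: "tool-call"; toolName: string };

type Step =
  | StreamPart
  | { type: "stall"; ms: number }                  // silence on the wire
  | { type: "disconnect" }                         // peer closes the socket
  | { type: "abort"; source: "user" | "pawwork" }; // local AbortController fires

const SILENT_BUDGET_MS = 30000; // assumed watchdog budget

// Mimics the exported shape: TypeError("terminated") caused by a SocketError.
function socketError(): Error {
  const cause = Object.assign(new Error("other side closed"), { name: "SocketError", code: "UND_ERR_SOCKET" });
  return Object.assign(new TypeError("terminated"), { cause });
}

function abortError(reason: string): Error {
  return Object.assign(new Error("aborted"), { name: "AbortError", reason });
}

function* scripted(steps: Step[]): Generator<StreamPart> {
  let progressed = false;
  for (const step of steps) {
    if (step.type === "disconnect") throw socketError();
    if (step.type === "abort") throw abortError(step.source);
    if (step.type === "stall") {
      // A watchdog abort is local; record which watchdog it was.
      if (step.ms > SILENT_BUDGET_MS) throw abortError(progressed ? "watchdog_silent_stream" : "watchdog_connect");
      continue;
    }
    progressed = true;
    yield step;
  }
}

interface Observed {
  providerProgressed: boolean;
  visibleTextEmitted: boolean;
  toolCallStarted: boolean;
  abortReason: string | null; // "user", "pawwork", or a watchdog reason
  causeCode: string | null;   // e.g. "UND_ERR_SOCKET"
}

function runScript(steps: Step[]): Observed {
  const obs: Observed = { providerProgressed: false, visibleTextEmitted: false, toolCallStarted: false, abortReason: null, causeCode: null };
  try {
    for (const part of scripted(steps)) {
      obs.providerProgressed = true;
      if (part.type === "text-delta") obs.visibleTextEmitted = true;
      if (part.type === "tool-call") obs.toolCallStarted = true;
    }
  } catch (err) {
    const e = err as { name?: string; reason?: string; cause?: { code?: string } };
    if (e.name === "AbortError") obs.abortReason = e.reason ?? "unknown";
    else obs.causeCode = e.cause?.code ?? null;
  }
  return obs;
}
```

Each case in the list maps to one short script; for example, `[{ type: "progress" }, { type: "disconnect" }]` reproduces early provider progress followed by a socket close.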
Execution mode
Investigate and propose a plan first. The agent must post the plan as an issue comment and wait for explicit approval before writing code or opening a PR.
Relevant files or context
Recent evidence:
- `docs/debug-session-log/pawwork-session-hidden-mountain-2026-05-20-04-10-54-terminated.json`
- `docs/debug-session-log/pawwork-session-quiet-wizard-2026-05-20-08-25-08-terminated.json`

Both show the same failure signature after the PR #771 diagnostics:
- `stream.error.constructor_name`: `TypeError`
- `stream.error.cause_name`: `SocketError`
- `stream.error.cause_code`: `UND_ERR_SOCKET`
- `stream.error.cause_message`: `other side closed`
- `stack_hint`: `Fetch.onAborted ... undici`
- `watchdog.fired`: `false`
- `abort.signal_aborted_at_error`: `false`
- `watchdog.provider_progressed`: `true`

Related work:
- #754 ("`terminated` from upstream stream close leaks to assistant message without translation") captured the earlier raw `terminated` leak but was intentionally conservative while waiting for more samples.

Likely code areas:
- `packages/opencode/src/session/llm.ts`
- `packages/opencode/src/session/llm-trace/*`
- `packages/opencode/src/session/export.ts`
- `packages/opencode/src/session/retry.ts`
- `packages/opencode/src/session/processor.ts`
- `packages/opencode/test/session/llm*.test.ts`
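The signature recorded in these exports suggests the useful facts live on the error's `cause` chain rather than in its message, which is only the opaque `terminated`. A bounded walk of that chain could extract them; `describeTransportError`, `isUpstreamSocketClose`, the depth limit, and the message cap are hypothetical. Classifying on the structured `causeCode` instead of matching message text is what would keep this from becoming another substring patch.

```typescript
// Hypothetical helper; field names mirror the exported stream.error.* keys.
interface TransportErrorFacts {
  constructorName: string;
  causeName: string | null;
  causeCode: string | null;
  causeMessage: string | null;
  depth: number; // number of cause links followed
}

const MAX_CAUSE_DEPTH = 5;     // assumed guard against cyclic or very deep chains
const MAX_MESSAGE_CHARS = 120; // assumed cap on exported messages

function describeTransportError(err: unknown): TransportErrorFacts {
  const facts: TransportErrorFacts = {
    constructorName:
      typeof err === "object" && err !== null ? err.constructor?.name ?? "Object" : typeof err,
    causeName: null,
    causeCode: null,
    causeMessage: null,
    depth: 0,
  };
  let current: unknown = err;
  // Walk error.cause links; the last link that carries a string code wins.
  while (facts.depth < MAX_CAUSE_DEPTH && typeof current === "object" && current !== null && "cause" in current) {
    current = (current as { cause: unknown }).cause;
    facts.depth++;
    if (typeof current !== "object" || current === null) break;
    const link = current as { name?: unknown; code?: unknown; message?: unknown };
    if (typeof link.code === "string") {
      facts.causeName = typeof link.name === "string" ? link.name : null;
      facts.causeCode = link.code;
      facts.causeMessage = typeof link.message === "string" ? link.message.slice(0, MAX_MESSAGE_CHARS) : null;
    }
  }
  return facts;
}

// Classify on the structured code, never on the "terminated" message text.
function isUpstreamSocketClose(facts: TransportErrorFacts): boolean {
  return facts.causeCode === "UND_ERR_SOCKET";
}
```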