[Task] Add LLM stream failure diagnostics v2 to session exports #760

@Astro-Han

Goal

Add structured LLM stream failure diagnostics to PawWork session exports so the next stream failure can be attributed to the correct boundary instead of requiring per-incident guesswork.

When this task is done, a failed assistant turn should indicate which boundary most likely caused the failure: local cancellation, PawWork's stream watchdog, SDK / transport stream reading, or provider / gateway closure. If none of these can be identified, the export should record an unknown boundary with enough evidence to continue the investigation.
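As a rough illustration of the attribution described above, the sketch below maps a small evidence record to one of the five boundaries. Every name here (`FailureBoundary`, `FailureEvidence`, `classifyBoundary`) is hypothetical, and the precedence order is an assumption: the watchdog is checked before the abort signal on the guess that a watchdog timeout may itself abort the request.

```typescript
// Hypothetical names throughout; none of these are existing PawWork APIs.
type FailureBoundary =
  | "local_cancellation"
  | "stream_watchdog"
  | "sdk_transport"
  | "provider_gateway"
  | "unknown";

interface FailureEvidence {
  watchdogFired: boolean;       // PawWork's stream watchdog triggered the failure
  abortSignalAborted: boolean;  // the LLM abort signal was already aborted at failure time
  providerErrorEvent: boolean;  // the provider / gateway sent an explicit error or close event
  errorCode?: string;           // raw error code from the thrown error or its cause, if any
}

// Heuristic only: the first matching rule wins, so order encodes precedence.
function classifyBoundary(e: FailureEvidence): FailureBoundary {
  if (e.watchdogFired) return "stream_watchdog";
  if (e.abortSignalAborted) return "local_cancellation";
  if (e.providerErrorEvent) return "provider_gateway";
  if (e.errorCode !== undefined) return "sdk_transport";
  return "unknown";
}
```

An `unknown` result is still useful as long as the raw evidence is exported alongside it, since the classification can then be revisited later.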

This is a diagnostic foundation task, not a user-visible behavior fix.

Scope

In scope:

  • Extend the existing llm_trace / session export diagnostics with a schema-versioned v2 shape or compatible v1 extension.
  • Capture a compact stream phase timeline: request creation, SDK stream returned, watchdog armed, first event, first provider-progress event, last provider-progress event, failure/completion.
  • Capture watchdog configuration and state: connectTimeoutMs, streamTimeoutMs, provider-progress state, and timeout/failure phase.
  • Capture a sanitized error fingerprint for stream failures: constructor name, error name, message, code, cause name/message/code, and at most a safe stack/module hint.
  • Capture abort state at failure time: whether the LLM abort signal was already aborted, plus any available abort provenance.
  • Capture safe provider correlation data when available, such as request id / response id / non-sensitive response headers.
  • Keep exports safe: do not include auth headers, cookies, prompt text, tool args, raw provider response body, or arbitrary URLs.
  • Preserve current runtime behavior. This task should improve diagnosis only.
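One possible shape for the in-scope items above is sketched below, together with a sanitizing error fingerprint. This is not the existing llm_trace schema: `StreamDiagnosticsV2`, `ErrorFingerprint`, `fingerprintError`, the field names, and the 200-character cap are all assumptions made for illustration. The stack/module hint and response headers are left out to keep the sketch short.

```typescript
// Illustrative only; all type and field names are assumptions.
interface ErrorFingerprint {
  constructorName?: string;
  name?: string;
  message?: string;
  code?: string;
  cause?: { name?: string; message?: string; code?: string };
}

interface StreamDiagnosticsV2 {
  schemaVersion: 2;
  // Milliseconds since request creation; undefined means the phase was never reached.
  timeline: {
    requestCreated: number;
    sdkStreamReturned?: number;
    watchdogArmed?: number;
    firstEvent?: number;
    firstProviderProgress?: number;
    lastProviderProgress?: number;
    ended?: number;
  };
  watchdog: {
    connectTimeoutMs: number;
    streamTimeoutMs: number;
    sawProviderProgress: boolean;
    failurePhase?: "connect" | "stream";
  };
  abort: { aborted: boolean; reason?: string };
  error?: ErrorFingerprint;
  provider?: { requestId?: string; responseId?: string };
}

const MAX_MESSAGE_LENGTH = 200; // assumed cap, not a project setting

// Strip URLs and obvious credential patterns, then truncate.
function redact(text: string): string {
  return text
    .replace(/https?:\/\/\S+/gi, "[url]")
    .replace(/\b(bearer|basic)\s+\S+/gi, "$1 [redacted]")
    .slice(0, MAX_MESSAGE_LENGTH);
}

// Copy only string-valued name/message/code; everything else is dropped.
function pick(value: unknown): { name?: string; message?: string; code?: string } {
  if (typeof value !== "object" || value === null) {
    return { message: redact(String(value)) };
  }
  const v = value as { name?: unknown; message?: unknown; code?: unknown };
  return {
    name: typeof v.name === "string" ? v.name : undefined,
    message: typeof v.message === "string" ? redact(v.message) : undefined,
    code: typeof v.code === "string" ? v.code : undefined,
  };
}

function fingerprintError(err: unknown): ErrorFingerprint {
  if (typeof err !== "object" || err === null) return pick(err);
  const cause = (err as { cause?: unknown }).cause;
  return {
    constructorName: err.constructor?.name,
    ...pick(err),
    cause: cause === undefined ? undefined : pick(cause),
  };
}
```

Building the fingerprint from an allowlist of fields, rather than serializing the error and removing known-bad keys, means an unexpected property on a provider error cannot reach the export by default.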

Out of scope:

  • Translating the raw "terminated" error message into user-facing copy.
  • Changing retry behavior or timeout policy.
  • Reworking the full watchdog architecture.
  • Recording every stream chunk or provider packet.
  • Building a general observability or telemetry platform.
  • Provider-specific error taxonomy beyond safe raw fingerprints and phase classification.

Relevant files or context

Related issues / PRs:

Likely files:

  • packages/opencode/src/session/llm.ts
  • packages/opencode/src/session/llm-trace/types.ts
  • packages/opencode/src/session/llm-trace/recorder.ts
  • packages/opencode/src/session/processor.ts
  • packages/opencode/src/session/export.ts
  • packages/opencode/src/session/message-v2.ts
  • packages/opencode/test/session/llm.test.ts
  • packages/opencode/test/session/export.test.ts

Observed gap from #754's second reproduction:

  • The export shows UnknownError with data.message = "terminated" and flags.stream_error = true.
  • The trace proves provider progress happened before failure.
  • The trace does not show whether the abort signal was already aborted, whether a watchdog fired, what raw error class/code/cause was thrown by undici/SDK, or whether provider request correlation is available.
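To make the abort-state part of that gap concrete, here is a minimal sketch of recording the abort signal's state at the moment the stream iterator throws. `snapshotAbort` and `readWithDiagnostics` are hypothetical helpers, not existing code in llm.ts; only the standard `AbortSignal.aborted` and `AbortSignal.reason` properties are relied on. The error is rethrown unchanged so runtime behavior stays the same.

```typescript
// Hypothetical helpers; names and shapes are assumptions for illustration.
interface AbortSnapshot {
  aborted: boolean;
  reasonName?: string;
  reasonMessage?: string;
}

// Read the signal's state without modifying it.
function snapshotAbort(signal: AbortSignal): AbortSnapshot {
  if (!signal.aborted) return { aborted: false };
  const reason: unknown = signal.reason;
  if (reason instanceof Error) {
    return { aborted: true, reasonName: reason.name, reasonMessage: reason.message };
  }
  return {
    aborted: true,
    reasonMessage: reason === undefined ? undefined : String(reason),
  };
}

interface FailureRecord {
  abort: AbortSnapshot;
  error: unknown;      // raw error; sanitize it before it goes into an export
  eventsSeen: number;  // how many events arrived before the failure
}

// Drain a stream and report the abort state at failure time, then rethrow.
async function readWithDiagnostics<T>(
  stream: AsyncIterable<T>,
  signal: AbortSignal,
  onFailure: (record: FailureRecord) => void,
): Promise<void> {
  let eventsSeen = 0;
  try {
    for await (const _event of stream) {
      eventsSeen++;
    }
  } catch (error) {
    onFailure({ abort: snapshotAbort(signal), error, eventsSeen });
    throw error; // diagnostics only: the caller sees the same failure as before
  }
}
```

Taking the snapshot inside the `catch` matters: an abort that lands after the failure would otherwise be indistinguishable from one that caused it.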

Verification

  • Add focused unit tests for stream failure diagnostics on:
    • connect timeout before first provider-progress event;
    • mid-stream external iterator error after provider progress;
    • local abort / interrupt path if currently observable in the test surface.
  • Add export tests proving diagnostics are included and sanitized.
  • Add regression tests ensuring sensitive data is not exported: auth headers, cookies, raw prompt text, raw tool args, and response bodies.
  • Run targeted checks, likely:
    • bun --cwd packages/opencode test test/session/llm.test.ts test/session/export.test.ts --timeout 30000
    • bun --cwd packages/opencode typecheck
    • git diff --check
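For the sanitization regression tests, one workable approach is to plant sentinel strings in the fixture (prompt text, tool args, header values) and assert that none of them, and no forbidden header keys, appear anywhere in the exported object. The helper below is a sketch of that idea; `findLeaks` and the `FORBIDDEN_KEYS` list are assumptions, not existing test utilities.

```typescript
// Hypothetical test helper; the forbidden key list is an assumption.
const FORBIDDEN_KEYS = new Set(["authorization", "cookie", "set-cookie"]);

// Walk an exported value and return the JSON-style paths of every leak:
// a forbidden key name, or a string containing one of the planted secrets.
function findLeaks(value: unknown, secrets: string[], path: string = "$"): string[] {
  const leaks: string[] = [];
  if (typeof value === "string") {
    if (secrets.some((secret) => value.includes(secret))) leaks.push(path);
    return leaks;
  }
  if (Array.isArray(value)) {
    value.forEach((item, index) => {
      leaks.push(...findLeaks(item, secrets, `${path}[${index}]`));
    });
    return leaks;
  }
  if (typeof value === "object" && value !== null) {
    const record = value as Record<string, unknown>;
    for (const key of Object.keys(record)) {
      const childPath = `${path}.${key}`;
      if (FORBIDDEN_KEYS.has(key.toLowerCase())) {
        leaks.push(childPath); // the key itself is the leak; no need to descend
      } else {
        leaks.push(...findLeaks(record[key], secrets, childPath));
      }
    }
  }
  return leaks;
}
```

In an export test, the assertion would then be that `findLeaks(exported, sentinels)` returns an empty array; a failing run lists the exact paths that leaked, which is easier to act on than a substring match against the serialized export.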

Execution mode

Investigate and propose a plan first: the agent must post the plan as an issue comment and wait for an explicit "approved" comment before writing code or opening a PR.

Metadata

Assignees: No one assigned

Labels:

  • P2: Medium priority
  • harness: Model harness, prompts, tool descriptions, and session mechanics
  • task: Narrow execution, audit, spike, migration, tracking, or upstream follow-up work
  • tech-debt: Supplemental cleanup, maintainability, architecture, test, or quality debt context

Projects: No projects

Milestone: No milestone

Relationships: None yet

Development: No branches or pull requests