[Bug] Reasoning-only UND_ERR_SOCKET disconnects do not auto recover #939

@Astro-Han

What happened?

A PawWork prod session hit a provider stream disconnect during reasoning generation. The run did not automatically recover even though the exported diagnostics show no final text output and no tool input, tool call, or tool execution before the disconnect.

The failing export was pawwork-session-curious-meadow-2026-05-27-01-44-00.json from a local user report. Key diagnostic facts:

  • App version: 0.0.0-prod-202605261611
  • Provider/model: opencode-go/deepseek-v4-pro
  • Terminal cause: provider_transport_disconnect / during_text_generation
  • Error shape: TypeError: terminated, cause SocketError: other side closed, cause code UND_ERR_SOCKET
  • Stream phase: reasoning_generation
  • provider_progress_seen: true
  • reasoning_output_started: true
  • text_output_started: false
  • tool_input_started: false
  • tool_call_materialized: false
  • tool_execution_started: false
  • unsafe_side_effect_started: false
  • Observability retry safety: candidate_safe_auto_retry, reason reasoning_only_without_final_text_or_tool_activity
  • Final recovery decision: technical_retryable: false, recovery_mode: stop, retry_attempted: false

In other words, the safety layer recognized the failure as a likely safe replay candidate, but the technical retryability gate rejected the transport error first, so the safe recovery replay never ran.
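A minimal sketch of how the two gates could agree, assuming the technical gate only needs to find `UND_ERR_SOCKET` somewhere in the error's `cause` chain. This is not PawWork's actual code: `findTransportCode`, `decideRecovery`, `RETRYABLE_TRANSPORT_CODES`, and the `"replay"` mode are hypothetical names for illustration. Only the diagnostic field names and the `"stop"` mode come from the export above.

```typescript
// Hypothetical allow-list; only UND_ERR_SOCKET is taken from the report.
const RETRYABLE_TRANSPORT_CODES = new Set<string>(["UND_ERR_SOCKET"]);

// Walk the `cause` chain looking for a known transport code.
// The depth bound guards against cyclic cause references.
function findTransportCode(err: unknown, maxDepth = 5): string | undefined {
  let current: unknown = err;
  for (let depth = 0; depth < maxDepth && current != null; depth++) {
    if (typeof current !== "object") return undefined;
    const { code, cause } = current as { code?: unknown; cause?: unknown };
    if (typeof code === "string" && RETRYABLE_TRANSPORT_CODES.has(code)) {
      return code;
    }
    current = cause;
  }
  return undefined;
}

// Field names mirror the diagnostics listed above.
interface StreamFacts {
  text_output_started: boolean;
  tool_input_started: boolean;
  tool_call_materialized: boolean;
  tool_execution_started: boolean;
  unsafe_side_effect_started: boolean;
}

interface RecoveryDecision {
  technical_retryable: boolean;
  recovery_mode: "replay" | "stop"; // "replay" is an assumed label
}

function decideRecovery(err: unknown, facts: StreamFacts): RecoveryDecision {
  const technical_retryable = findTransportCode(err) !== undefined;
  // Reasoning-only output is deliberately not part of this check.
  const safeToReplay =
    !facts.text_output_started &&
    !facts.tool_input_started &&
    !facts.tool_call_materialized &&
    !facts.tool_execution_started &&
    !facts.unsafe_side_effect_started;
  return {
    technical_retryable,
    recovery_mode: technical_retryable && safeToReplay ? "replay" : "stop",
  };
}
```

With the error shape from this report (`TypeError: terminated` wrapping a `SocketError` with code `UND_ERR_SOCKET`) and all five flags false, this sketch would return `technical_retryable: true` instead of the observed `false`.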

Which area seems affected?

Model harness, prompts, tools, or session mechanics

How much does this affect you?

Breaks an important workflow

Steps to reproduce

  1. Start a PawWork session using a reasoning model through opencode-go.
  2. Hit an undici/fetch stream termination while the model is producing reasoning only, before final text or tool activity.
  3. Inspect the exported diagnostics.
  4. Observe that run observability classifies UND_ERR_SOCKET as a provider transport disconnect, while the recovery decision reports technical_retryable: false.
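Because step 2 depends on a real network failure, a test double may make the reproduction deterministic. The sketch below is an assumption about how that could look, not an existing PawWork fixture: `StreamEvent`, `makeTerminatedError`, `reasoningThenDisconnect`, and `collectUntilFailure` are hypothetical names. The error it throws copies the shape from the diagnostics (`TypeError: terminated`, cause `SocketError: other side closed`, code `UND_ERR_SOCKET`).

```typescript
// Hypothetical event type: only reasoning output is modeled here.
type StreamEvent = { type: "reasoning-delta"; text: string };

// Build an error with the same shape as the one in the export.
function makeTerminatedError(): Error {
  const socketError = Object.assign(new Error("other side closed"), {
    name: "SocketError",
    code: "UND_ERR_SOCKET",
  });
  return Object.assign(new TypeError("terminated"), { cause: socketError });
}

// Fake provider stream: emits reasoning only, then disconnects
// before any final text or tool activity.
async function* reasoningThenDisconnect(): AsyncGenerator<StreamEvent> {
  yield { type: "reasoning-delta", text: "Considering the request" };
  throw makeTerminatedError();
}

// Drain a stream, returning what was seen plus the terminating error, if any.
async function collectUntilFailure(
  stream: AsyncIterable<StreamEvent>,
): Promise<{ events: StreamEvent[]; error: unknown }> {
  const events: StreamEvent[] = [];
  try {
    for await (const event of stream) events.push(event);
    return { events, error: undefined };
  } catch (error) {
    return { events, error };
  }
}
```

Feeding `reasoningThenDisconnect()` into the session's stream handling should then exercise the same recovery decision path as the reported failure, without waiting for a real socket close.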

What did you expect to happen?

PawWork should automatically replay once when a retryable transport disconnect happens before final text output or tool activity, including reasoning-only output. The failed reasoning draft should be removed before replay, matching the existing safe recovery behavior.
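The expected behavior can be sketched as a single-replay wrapper. This is an illustration under assumptions, not the existing safe recovery implementation: `runWithSingleReplay`, the `draft` array, and the `isSafeRetryable` callback are hypothetical names, and the real replay path may store the reasoning draft differently.

```typescript
// Run one attempt; on a safe, retryable failure, discard the reasoning
// draft and replay exactly once. A second failure is not retried.
async function runWithSingleReplay<T>(
  attempt: (draft: string[]) => Promise<T>,
  isSafeRetryable: (err: unknown) => boolean,
): Promise<{ value: T; retry_attempted: boolean }> {
  const draft: string[] = []; // reasoning chunks collected by the attempt
  try {
    return { value: await attempt(draft), retry_attempted: false };
  } catch (err) {
    // Unsafe or non-retryable failures propagate unchanged.
    if (!isSafeRetryable(err)) throw err;
    // Remove the failed reasoning draft before replaying.
    draft.length = 0;
    return { value: await attempt(draft), retry_attempted: true };
  }
}
```

Here `isSafeRetryable` stands in for the combined check: the transport error passes the technical retryability gate, and no final text or tool activity has started.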

PawWork version

0.0.0-prod-202605261611

OS version

macOS 25.5.0

Can you reproduce it again?

Sometimes

Diagnostics

The original session export is local-only and not attached here. Relevant sanitized fields are included above.

Related context: #927 tracks broader recovery after partial output and tool activity. This issue is narrower: reasoning-only transport disconnects should pass the technical retryability gate and reach the existing safe recovery replay path.

Metadata

Labels: P1 (High priority), bug (Something isn't working), harness (Model harness, prompts, tool descriptions, and session mechanics)
