[Bug] Stuck '思考中' ("Thinking") state when provider/model is misconfigured (no connect timeout) #554

@Astro-Han

What happened?

When a session is submitted against an invalid provider/model combination, the request silently fails to produce any stream events. Concrete repro: providerID: alibaba-coding-plan-cn + modelID: glm-5. The Alibaba coding plan gateway serves Qwen-family models and GLM is a Zhipu product, so the pair has no valid upstream route. The assistant message is created with empty parts, no time.completed, and no info.error, and no llm_trace is recorded. In the UI the session shows "思考中" ("Thinking") indefinitely while session_status remains busy. Users perceive this as the model erroring out, but there is no way to recover: they can't tell whether the session is still loading or stuck, and they can't cancel it with a normal Stop press if the composer is hidden behind the busy spinner.

Steps to reproduce

  1. Configure a provider/model pair that resolves locally but is rejected by the upstream gateway (e.g. add glm-5 modelID under alibaba-coding-plan-cn provider).
  2. Submit any prompt with that pair selected.
  3. Observe: the composer flips to busy, "思考中" ("Thinking") appears, and it never clears. There is no error toast, no error message body, and no provider error event.

Expected

If the first stream event doesn't arrive within a reasonable connect timeout (e.g. 30 s), the session should mark the assistant message as errored, with a clear provider-rejected reason, and flip session_status back to idle so the user can switch models or retry.

Diagnostics

In the attached session export (pawwork-session-stellar-pixel-2026-05-11-10-53-32.json) the second user submit at 10:50:57.589 used provider: alibaba-coding-plan-cn + model: glm-5. The corresponding assistant message (msg_e16a958f70017yxOn0NXKAT7ut) has:

  • time.created: 1778496657655 but no time.completed
  • parts: [] — no step-start, reasoning, text, or tool parts
  • no info.error field
  • no diagnostics.llm_trace recorded (compare against the GPT message in the same session, which has a full trace including aborted: true)
  • export captured at 10:53:32 — 2 m 35 s after submit, still no resolution
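
The signature above can be checked mechanically against an export. The following is a minimal sketch, assuming a hypothetical message shape that models only the fields named in this issue (the real export schema may nest them differently); `findStuckMessages` and `minAgeMs` are illustrative names, not existing PawWork code.

```typescript
// Hypothetical shape: only the fields mentioned in this issue are modeled.
interface ExportedMessage {
  id: string;
  role: "user" | "assistant";
  time: { created: number; completed?: number };
  parts: unknown[];
  info?: { error?: unknown };
  diagnostics?: { llm_trace?: unknown };
}

// Returns assistant messages that match the stuck signature:
// never completed, no parts, no error, no trace, and at least `minAgeMs` old
// at the time the export was captured.
function findStuckMessages(
  messages: ExportedMessage[],
  exportedAt: number,
  minAgeMs = 30_000,
): ExportedMessage[] {
  return messages.filter(
    (m) =>
      m.role === "assistant" &&
      m.time.completed === undefined &&
      m.parts.length === 0 &&
      m.info?.error === undefined &&
      m.diagnostics?.llm_trace === undefined &&
      exportedAt - m.time.created >= minAgeMs,
  );
}
```

With the values from this report (created at 1778496657655, export captured about 2 m 35 s later), the message would be flagged under the default 30 s threshold.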

runtime_context.model_refs lists alibaba-coding-plan-cn/glm-5 as resolved: true, but resolved: true only means the local config layer registered the entry, not that the upstream gateway accepts the model id. There is no upstream validation gate before the first real request.
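
One way to make that distinction explicit is to split the single flag into two. This is a sketch under stated assumptions: the `ModelRef` shape, the `Probe` callback, and `validateModelRef` are hypothetical names for illustration, and how a real probe would query the gateway is left to the caller.

```typescript
// Hypothetical replacement for a single `resolved` flag.
interface ModelRef {
  providerID: string;
  modelID: string;
  registered: boolean; // the local config layer knows the entry
  validated: boolean; // the upstream gateway accepted the model id
  validationError: string | null;
}

// Caller-supplied check (an assumption): resolves true if the provider
// serves the model, false if it rejects it, and may throw on network errors.
type Probe = (providerID: string, modelID: string) => Promise<boolean>;

// Returns a copy of `ref` with `validated` set from the probe result.
async function validateModelRef(ref: ModelRef, probe: Probe): Promise<ModelRef> {
  try {
    const ok = await probe(ref.providerID, ref.modelID);
    return ok
      ? { ...ref, validated: true, validationError: null }
      : { ...ref, validated: false, validationError: "model id rejected by provider" };
  } catch (err) {
    // A failed probe leaves the entry registered but not validated.
    const message = err instanceof Error ? err.message : String(err);
    return { ...ref, validated: false, validationError: message };
  }
}
```

Under this sketch, alibaba-coding-plan-cn/glm-5 would remain registered: true but carry validated: false, which the UI could surface before a prompt is submitted.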

The existing SILENT_STREAM_TIMEOUT_MS (packages/opencode/src/session/llm.ts:30, default 10 min) was designed for "stream stalled after producing events" — it doesn't fire when the stream never produces any event at all, which is the failure mode here.
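
To illustrate the gap, here is a minimal sketch of a timer that bounds only the wait for the first event and then gets out of the way. It assumes the provider stream can be consumed as an `AsyncIterable`; `withFirstEventTimeout` and `FirstEventTimeoutError` are hypothetical names and are not taken from llm.ts.

```typescript
class FirstEventTimeoutError extends Error {
  constructor(ms: number) {
    super(`provider did not respond within ${ms} ms`);
    this.name = "FirstEventTimeoutError";
  }
}

// Wraps a stream so that waiting for the FIRST event is bounded by `timeoutMs`.
// Later events are not timed here; a stall-after-first-event timeout
// (the role SILENT_STREAM_TIMEOUT_MS plays) would be a separate concern.
async function* withFirstEventTimeout<T>(
  source: AsyncIterable<T>,
  timeoutMs: number,
): AsyncGenerator<T> {
  const iterator = source[Symbol.asyncIterator]();
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new FirstEventTimeoutError(timeoutMs)), timeoutMs);
  });

  // Race the first event against the timer; always clear the timer afterwards.
  let result = await Promise.race([iterator.next(), timeout]).finally(() =>
    clearTimeout(timer),
  );

  try {
    while (!result.done) {
      yield result.value;
      result = await iterator.next();
    }
  } finally {
    // Release the upstream iterator if the consumer stops early.
    await iterator.return?.();
  }
}
```

On timeout the wrapper only throws; aborting the underlying request (for example through an `AbortController` owned by the caller) and writing the error onto the assistant message are left out of this sketch.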

Suggested fixes

  1. Connect-timeout watchdog in packages/opencode/src/session/llm.ts. Separate from SILENT_STREAM_TIMEOUT_MS, arm a shorter timer (e.g. 30 s) that fires if no provider event has arrived since dispatch. On fire, write an APIError("provider did not respond") onto the assistant message and flip status to idle.
  2. Boot-time validation of provider/model pairs in runtime_context.model_refs. resolved: true should require at least a successful capabilities probe, or be downgraded to registered: true with validated: false so an invalid pair surfaces before the user submits a real prompt against it.
  3. Status reconciliation: if neither an llm_trace nor any message parts are written within a wall-clock budget, the session_status watchdog should release the busy lock so the composer doesn't trap the user in "思考中" ("Thinking").
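
The reconcile rule in item 3 can be expressed as a pure decision function that a periodic watchdog would call. This is a sketch only: `SessionSnapshot`, its field names, and `reconcileStatus` are hypothetical and do not reflect the actual session state model.

```typescript
type SessionStatus = "busy" | "idle";

// Hypothetical snapshot of the facts the watchdog needs.
interface SessionSnapshot {
  status: SessionStatus;
  busySince: number; // epoch ms when the session went busy
  partsWritten: number; // message parts written since dispatch
  hasLlmTrace: boolean; // whether any llm_trace was recorded
}

interface ReconcileResult {
  status: SessionStatus;
  error?: string;
}

// Releases the busy lock only when nothing at all has been produced
// within `budgetMs`; any sign of progress leaves the status untouched.
function reconcileStatus(
  s: SessionSnapshot,
  now: number,
  budgetMs: number,
): ReconcileResult {
  const noProgress = s.partsWritten === 0 && !s.hasLlmTrace;
  if (s.status === "busy" && noProgress && now - s.busySince >= budgetMs) {
    return { status: "idle", error: "provider did not respond" };
  }
  return { status: s.status };
}
```

Keeping the decision separate from the timer makes the rule easy to unit test, and it acts as a backstop if the connect-timeout watchdog in item 1 fails to fire.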

Environment

  • PawWork version: local build (session export runtime_context.app_version: "local")
  • OS: macOS 15.x (Darwin 25.3.0)
  • Reproducibility: Yes, every time (with the invalid provider/model combo)

Metadata

Labels

  • P2: Medium priority
  • app: Application behavior and product flows
  • bug: Something isn't working
  • harness: Model harness, prompts, tool descriptions, and session mechanics