[Bug] 30s LLM connect timeout aborts OpenAI reasoning streams (post-#729 residual)

### What happened?

PawWork's 30-second "first provider progress" watchdog (`CONNECT_STREAM_TIMEOUT_MS` in `packages/opencode/src/session/llm.ts:31`) aborts in-flight reasoning-model streams when the model spends more than 30 seconds on internal reasoning before emitting any event that `isProviderProgressEvent` (L546-561) whitelists. The whitelist counts only `text-*` / `reasoning-*` / `tool-input-*` / `tool-call` / `tool-result` / `tool-error` as progress, so connection establishment, the synthetic `start` envelope, and `start-step` do not reset the timer. With OpenAI `gpt-5.5` (reasoning-capable, default `reasoningEffort: "medium"` from `provider/transform.ts:1185`), 14 active tools, and a long session, first-chunk latency intermittently exceeds 30s, and the watchdog then kills an otherwise-healthy stream.
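The interaction between the whitelist and the timer can be sketched as follows. This is a minimal illustration, not the code in `llm.ts`: `isProgress`, `watchdogFires`, and the `TimedEvent` shape are hypothetical names, and only the event-type list and the 30000 ms value come from the description above. It also assumes the stream is still open when the timeout elapses.

```typescript
// Hypothetical sketch; names below are illustrative and not taken from llm.ts.
const CONNECT_STREAM_TIMEOUT_MS = 30_000; // value reported in this issue

// Event types the issue describes as counting toward "provider progress".
const PROGRESS_PREFIXES = ["text-", "reasoning-", "tool-input-"];
const PROGRESS_EXACT = new Set(["tool-call", "tool-result", "tool-error"]);

function isProgress(type: string): boolean {
  return PROGRESS_EXACT.has(type) || PROGRESS_PREFIXES.some((p) => type.startsWith(p));
}

interface TimedEvent {
  type: string;
  atMs: number; // milliseconds since the HTTP request was sent (assumed clock origin)
}

// Returns true when no whitelisted event arrives within the timeout window.
function watchdogFires(
  events: TimedEvent[],
  timeoutMs: number = CONNECT_STREAM_TIMEOUT_MS,
): boolean {
  const firstProgress = events.find((e) => isProgress(e.type));
  return firstProgress === undefined || firstProgress.atMs > timeoutMs;
}

// Only the synthetic `start` envelope arrived: the watchdog fires.
const stalled = watchdogFires([{ type: "start", atMs: 40 }]);

// A reasoning event arrives at 12s: the watchdog does not fire.
const healthy = watchdogFires([
  { type: "start", atMs: 40 },
  { type: "reasoning-start", atMs: 12_000 },
]);
```

Under these assumptions, a stream that has delivered `start` and `start-step` but is still reasoning at the 30s mark is indistinguishable from a connection that never came up.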

The error reaches the user as `UnknownError: LLM stream connection timed out after 30000ms without provider progress`. `SessionRetry.policy.retryable()` (`packages/opencode/src/session/retry.ts:55-105`) does not classify this local error as retryable, so there is no automatic retry. This is distinct from #728 / PR #729: that PR fixed the timer starting before the HTTP request was actually sent, and the build in this report already includes that fix. The residual issue is the 30s ceiling itself, which #729's body explicitly deferred.
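One possible shape for the missing retry classification is sketched below. It is not the actual `SessionRetry.policy` implementation: `isConnectTimeout`, `shouldRetry`, and the `maxAttempts` default of 3 are assumptions made for illustration, and only the error text is taken from this report.

```typescript
// Hypothetical sketch; not the actual SessionRetry.policy code.
// Matches the local watchdog error text quoted in this issue, with any timeout value.
const CONNECT_TIMEOUT_PATTERN =
  /LLM stream connection timed out after \d+ms without provider progress/;

function isConnectTimeout(message: string): boolean {
  return CONNECT_TIMEOUT_PATTERN.test(message);
}

// `attempt` is 1-based; `maxAttempts` is an assumed knob, not an existing setting.
function shouldRetry(message: string, attempt: number, maxAttempts: number = 3): boolean {
  return isConnectTimeout(message) && attempt < maxAttempts;
}

const reported =
  "UnknownError: LLM stream connection timed out after 30000ms without provider progress";

const retryFirstFailure = shouldRetry(reported, 1); // first failure: retry
const retryLastAttempt = shouldRetry(reported, 3); // attempts exhausted: surface the error
```

Matching on message text is brittle; a dedicated error name or tag on the watchdog error would likely be a sturdier signal if the real policy allows it.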

### Which area seems affected?

Model harness, prompts, tools, or session mechanics

### How much does this affect you?

Breaks an important workflow

### Steps to reproduce

1. Open a long-running build-agent session with OpenAI `gpt-5.5` (or a comparable reasoning-capable model at `reasoningEffort: "medium"` or higher).
2. Let the model run several tool-call rounds so the session accumulates meaningful context (this report: 269 messages / 1063 parts).
3. Issue a follow-up turn whose first model action requires non-trivial reasoning before any text or tool-input chunk.
4. Observe that the assistant message intermittently fails with the 30000ms timeout error before any provider chunk is received.

### What did you expect to happen?

The assistant message completes, or, if the stream must be aborted, the retry policy attempts it again automatically and only surfaces a hard error after repeated failures, rather than failing on the first occurrence with no provider chunk ever received.

### PawWork version

`0.0.0-prod-202605181651`

### OS version

macOS 26 (Darwin 25.4.0)

### Can you reproduce it again?

Sometimes

### Diagnostics

- Session: `ses_1c1b6ccdbffes5qfwa7ovaOcLH`. Failing assistant message: `msg_e3e9723a10015WNXnu81BTQeXD`.
- Trace counters from the session export: `dur_ms: 30204`, `stream_events.start: 1`, all other counters (`start_step`, `text_*`, `reasoning_*`, `tool_input_*`, `tool_call`, `tool_result`, `tool_error`, `error`, `finish_step`, `finish`) `0`, `tokens.input/output/reasoning: 0`, `flags.stream_error: true`, `flags.empty_completion: false`. Provider emitted no `error` event; PawWork's watchdog aborted the stream.
- The preceding trace `msg_e3e96c8030015laTiPT5gmzjpD` finished cleanly with `finish_reason: tool-calls` 17 seconds earlier, so this is not a stale connection. A user retry 16 seconds after the failure (`msg_e3e97d832001lbotXyQzAOF05y`) succeeded in 16.6s with 26 text deltas, confirming the model and account were healthy.
- Investigation chain confirming this is the #729 residual: grepping for the error literal points to `session/llm.ts:466`. `git log -- packages/opencode/src/session/llm.ts` shows `610241905 fix: defer LLM stream connect timeout to after HTTP request is sent (#729)` as the most recent change to that file. PR #729's body explicitly defers two follow-ups: (1) `SessionRetry.policy` not treating connect timeouts as retryable, and (2) `connectTimeoutMs` not being configurable end-to-end. The build identifier in this report (`0.0.0-prod-202605181651`, built 2026-05-18 16:51) postdates the PR #729 merge (2026-05-18 08:06 UTC), so the timer-start fix is present and the residual 30s ceiling is what fired here.
- Full session export (`pawwork-session-neon-orchid-2026-05-19-04-58-27-...json`, ~5.1MB) available locally on request.
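
The counter pattern above could be checked mechanically when triaging other exports. The sketch below assumes a `TraceSummary` shape built from the field names quoted in this report (`dur_ms`, `stream_events`, `flags`); the real export layout may differ, and `looksLikeLocalWatchdogAbort` is a hypothetical helper.

```typescript
// Hypothetical sketch; the TraceSummary shape is assumed from the field names in this report.
interface TraceSummary {
  dur_ms: number;
  stream_events: Record<string, number>;
  flags: { stream_error: boolean; empty_completion: boolean };
}

// True when the trace shows a stream error at or past the timeout with only
// the synthetic `start` event recorded and no other stream event of any kind.
function looksLikeLocalWatchdogAbort(trace: TraceSummary, timeoutMs: number = 30_000): boolean {
  const counts = trace.stream_events;
  const sawStart = (counts["start"] ?? 0) >= 1;
  const nothingElse = Object.keys(counts).every((k) => k === "start" || counts[k] === 0);
  return trace.flags.stream_error && sawStart && nothingElse && trace.dur_ms >= timeoutMs;
}

// Values taken from the failing trace in this report.
const failing: TraceSummary = {
  dur_ms: 30204,
  stream_events: { start: 1, start_step: 0, tool_call: 0, error: 0, finish: 0 },
  flags: { stream_error: true, empty_completion: false },
};

const isWatchdogAbort = looksLikeLocalWatchdogAbort(failing);
```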

