fix: widen LLM connect timeout for reasoning models #758

Merged
Astro-Han merged 3 commits into dev from claude/i755-reasoning-connect-timeout
May 19, 2026

@Astro-Han Astro-Han commented May 19, 2026

Owner

Summary

Add ProviderTransform.streamTimeouts(model) returning { connectTimeoutMs: 120_000 } for reasoning-capable models (gated on model.capabilities.reasoning), and apply it at the two production llm.stream() call sites — session/processor.ts (main response) and session/prompt.ts (title generation) — with helper-first spread order so any caller-provided StreamInput.connectTimeoutMs still wins. The default CONNECT_STREAM_TIMEOUT_MS is untouched; non-reasoning models keep the 30s ceiling.
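
The helper and the helper-first spread described above can be sketched as follows. This is a minimal illustration, not the merged code: `Model` and `StreamInput` are simplified stand-ins for the real types, and `buildStreamOptions` and `effectiveConnectTimeout` are hypothetical names introduced only to make the precedence visible.

```typescript
// Sketch only: simplified stand-ins for the real Model and StreamInput types.
interface Model {
  id: string
  capabilities: { reasoning: boolean }
}

interface StreamInput {
  connectTimeoutMs?: number
}

// Default ceiling kept for non-reasoning models (30s, per the PR description).
const CONNECT_STREAM_TIMEOUT_MS = 30_000

// Returns an override only for reasoning-capable models, otherwise {}.
function streamTimeouts(model: Model): StreamInput {
  return model.capabilities.reasoning ? { connectTimeoutMs: 120_000 } : {}
}

// Hypothetical call-site wrapper: the helper value is spread first and the
// caller input last, so a caller-provided connectTimeoutMs wins.
function buildStreamOptions(model: Model, streamInput: StreamInput): StreamInput {
  return { ...streamTimeouts(model), ...streamInput }
}

// Hypothetical resolution step: use the default when nothing was set.
function effectiveConnectTimeout(options: StreamInput): number {
  return options.connectTimeoutMs ?? CONNECT_STREAM_TIMEOUT_MS
}

// Illustrative models; the ids are made up.
const reasoningModel: Model = { id: "example-reasoning", capabilities: { reasoning: true } }
const plainModel: Model = { id: "example-plain", capabilities: { reasoning: false } }
```

With these definitions, a reasoning model with no caller input resolves to 120_000, a non-reasoning model resolves to the 30_000 default, and a caller passing `{ connectTimeoutMs: 45_000 }` gets 45_000 regardless of the model.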

Why

The 30-second first-progress watchdog in session/llm.ts aborts reasoning-model streams whose first observable provider event arrives later than the ceiling. The incident recorded in #755 shows OpenAI gpt-5.5 on a long session spending 30204ms with only start: 1 and zero content events before the watchdog aborted. PR #729 fixed an earlier residual where the connect timer armed before the HTTP request was actually sent; the residual addressed here is the 30s ceiling itself, which #729's body explicitly deferred. The capability gate keys off model.capabilities.reasoning after verifying the current models.json catalog has correct reasoning=true labels on the gpt-5 family (22/23, only gpt-5.3-chat-latest excluded), all o-series, Claude haiku/sonnet/opus 4.x thinking variants, and the Gemini 2.5+/3.1 series — so no provider-id allowlist fallback is needed at this time.
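
To make the ceiling's role concrete, here is a rough model of a first-progress watchdog. It is an assumption-laden sketch: the real watchdog in session/llm.ts presumably uses timers, whereas this hypothetical `FirstProgressWatchdog` class takes explicit timestamps so the behavior is deterministic. It only shows that the watchdog would not have fired at the 30204ms mark under a 120s ceiling; it says nothing about when the first provider event would actually have arrived.

```typescript
// Hypothetical, timer-free model of a first-progress watchdog.
class FirstProgressWatchdog {
  private armedAtMs: number | undefined = undefined
  private sawProgress = false
  private readonly ceilingMs: number

  constructor(ceilingMs: number) {
    this.ceilingMs = ceilingMs
  }

  // Arm only once the HTTP request has actually been sent (the #729 fix).
  arm(nowMs: number): void {
    this.armedAtMs = nowMs
  }

  // Record the first observable provider event.
  progress(): void {
    this.sawProgress = true
  }

  // True when the ceiling elapsed with no provider event observed.
  shouldAbort(nowMs: number): boolean {
    if (this.armedAtMs === undefined || this.sawProgress) return false
    return nowMs - this.armedAtMs >= this.ceilingMs
  }
}

// Elapsed time taken from the #755 incident described above.
const incidentElapsedMs = 30_204

const defaultWatchdog = new FirstProgressWatchdog(30_000)
const widenedWatchdog = new FirstProgressWatchdog(120_000)
defaultWatchdog.arm(0)
widenedWatchdog.arm(0)

const abortsAtDefault = defaultWatchdog.shouldAbort(incidentElapsedMs)
const abortsAtWidened = widenedWatchdog.shouldAbort(incidentElapsedMs)
```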

Related Issue

Refs #755 (short-term path; the full deferred set — SessionRetry.policy.retryable() not classifying local timeouts, watchdog architectural rewrite, and the parallel mid-stream terminated failure mode — is documented in the issue body and left to follow-up PRs).

Human Review Status

Pending

Review Focus

  • Helper-first spread order at session/processor.ts:954 and session/prompt.ts:465. processor.ts receives streamInput from process() callers (session/prompt.ts:2020, session/compaction.ts:480); neither sets connectTimeoutMs today, so the spread defaults to the helper value. The order is helper-first specifically so that a future caller wishing to override does not need code changes here.
  • Capability-based gate vs model-id-pattern matching. Catalog cross-checked at PR-prep time and labels are reliable across the four target families; a second pair of eyes on whether model.capabilities.reasoning === true is the right axis is welcome.
  • The new contract test floor at >= 90_000 in transform.test.ts codifies the policy direction (the lowest value considered for reasoning models during the issue discussion). Worth questioning whether a corresponding upper bound is also needed.
  • Title-generation path applies the helper to mdl from provider.getSmallModel(input.providerID) or the title agent's explicit override. The typical small model is non-reasoning so the helper returns {}; only when a user explicitly configures a reasoning small variant for the title agent does the 120s apply, and the failure mode is silent (caught at prompt.ts:482-488, logs and falls back to the default title). The deliberate choice was to avoid a per-mode branch inside the helper.
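
The policy-floor point in the list above can be illustrated with a small contract check. This is a sketch rather than the actual test in transform.test.ts: `checkStreamTimeoutContract`, `currentHelper`, and `regressedHelper` are hypothetical names, and the helper is reduced to a function of a single boolean.

```typescript
const CONNECT_STREAM_TIMEOUT_MS = 30_000
// Lowest ceiling considered for reasoning models in the issue discussion.
const REASONING_FLOOR_MS = 90_000

type Timeouts = { connectTimeoutMs?: number }

// Returns the names of violated expectations; an empty list means the contract holds.
function checkStreamTimeoutContract(helper: (reasoning: boolean) => Timeouts): string[] {
  const failures: string[] = []
  const forReasoning = helper(true).connectTimeoutMs
  // Floor: assert the policy direction without pinning the chosen constant.
  if (forReasoning === undefined || forReasoning < REASONING_FLOOR_MS) failures.push("floor")
  // Routing: non-reasoning models must emit no override at all.
  if (Object.keys(helper(false)).length !== 0) failures.push("routing")
  return failures
}

// The earlier, looser assertion: anything above the 30s default passes.
function passesLooseCheck(helper: (reasoning: boolean) => Timeouts): boolean {
  const value = helper(true).connectTimeoutMs
  return value !== undefined && value > CONNECT_STREAM_TIMEOUT_MS
}

// Stand-in for the helper as described in this PR.
const currentHelper = (reasoning: boolean): Timeouts =>
  reasoning ? { connectTimeoutMs: 120_000 } : {}

// Stand-in for a regression that drops the value to 31s.
const regressedHelper = (reasoning: boolean): Timeouts =>
  reasoning ? { connectTimeoutMs: 31_000 } : {}
```

Under this sketch the 31s regression still passes the loose check but fails the floor, which is the gap the >= 90_000 bound closes. A corresponding upper bound would be one more comparison in the same function.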

Risk Notes

  • After the 120s ceiling is reached, a first-progress timeout still surfaces as a hard UnknownError because SessionRetry.policy.retryable() does not type-tag local timeouts and does not match the bare error message. The user-experience improvement here is the lower hit rate; no automatic retry is added in this PR. Retry classification is deferred until the parallel mid-stream terminated failure mode is analyzed, so retry can be designed once against both failure shapes rather than speculatively per-shape.
  • Explicit connectTimeoutMs: undefined from a caller would clobber the helper value during spread and fall through to the 30s default via llm.ts:434-439. No production path constructs streamInput this way today, but a future caller with conditional override should pass a positive number or omit the field rather than set it to undefined.
  • Behavior change is strictly scoped to reasoning-capable models. Non-reasoning models keep the 30s default.
  • No visible UI or copy changed; the visible-UI conditional checkbox is left unticked for that reason.
  • No platform / packaging / updater / signing / paths / shell / permissions surface was touched; the macOS/Windows conditional checkbox is left unticked for that reason.
  • No docs / release notes / dependencies / permissions / credentials / deletion behavior / generated content / local file changes; the related conditional checkbox is left unticked for that reason.
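
The explicit-undefined risk in the list above follows from how object spread works in JavaScript: a key that is present with the value `undefined` still overwrites an earlier value. The sketch below assumes a nullish fallback to the 30s default, which is one plausible reading of the llm.ts behavior described above; `resolveConnectTimeout` and `callerOverride` are hypothetical names.

```typescript
interface StreamInput {
  connectTimeoutMs?: number | undefined
}

const CONNECT_STREAM_TIMEOUT_MS = 30_000
const helperValue: StreamInput = { connectTimeoutMs: 120_000 }

// Assumed resolution step: fall back to the default when the field is unset.
function resolveConnectTimeout(options: StreamInput): number {
  return options.connectTimeoutMs ?? CONNECT_STREAM_TIMEOUT_MS
}

// Pitfall: the key exists with value undefined, so it clobbers the helper value.
const callerWithUndefined: StreamInput = { connectTimeoutMs: undefined }
const clobbered: StreamInput = { ...helperValue, ...callerWithUndefined }

// Safer pattern for a conditional override: include the key only when
// there is a positive number to pass, otherwise omit it entirely.
function callerOverride(ms: number | undefined): StreamInput {
  return ms !== undefined && ms > 0 ? { connectTimeoutMs: ms } : {}
}

const preserved: StreamInput = { ...helperValue, ...callerOverride(undefined) }
const overridden: StreamInput = { ...helperValue, ...callerOverride(60_000) }
```

Here `clobbered` resolves to the 30_000 default, `preserved` keeps the helper's 120_000, and `overridden` resolves to 60_000.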

How To Verify

typecheck (bun --cwd packages/opencode run typecheck):                       ok
bun test packages/opencode/test/provider/transform.test.ts (streamTimeouts):  3 pass / 0 fail
bun test packages/opencode/test/session/ packages/opencode/test/provider/:    940 pass / 4 skip / 1 todo / 0 fail
internal cross-review (Claude Opus + Codex high, parallel):                   0 P0 / 0 P1 / 1 P3 (policy floor too loose, flagged by both reviewers; fixed by tightening the test to >= 90_000)

Screenshots or Recordings

Not applicable (no UI change).

Checklist

  • Type label — this PR carries exactly one of bug, enhancement, task, documentation. Type labels are author-added; the labeler bot does NOT assign them. Add the label in the GitHub UI, then tick this.
  • Routing labels — this PR carries at least one of app, ui, platform, harness, ci. The labeler bot assigns these on PR open based on changed paths. Confirm the bot's choice (or override if wrong), then tick this.
  • Priority label — this PR carries exactly one of P0, P1, P2, P3. The priority-triage bot suggests one on PR open. Confirm or override, then tick this.
  • Human Review Status above is set to Pending, Approved by @<reviewer>, or Not required: <reason> (default is Pending; "not required" is restricted to bot-authored low-risk PRs).
  • I linked the related issue, or stated in Summary why there is no issue.
  • I described the review focus and any meaningful risks.
  • I replaced the example block in How To Verify with the real verification steps and the key result for each.
  • I did not introduce unrelated refactors, dependencies, generated files, or file changes beyond the stated scope.
  • (conditional) I manually checked visible UI or copy changes when needed, with screenshots or recordings. Leave unticked only if no visible UI or copy changed.
  • (conditional) I considered macOS and Windows impact for platform, packaging, updater, signing, paths, shell, or permissions changes. Leave unticked only if no platform/packaging surface was touched.
  • (conditional) I called out docs, release notes, dependencies, permissions, credentials, deletion behavior, generated content, or local file changes when relevant. Leave unticked only if none of those surfaces was touched.
  • I reviewed the final diff for unrelated changes and suspicious dependency changes.
  • I am targeting dev, and my PR title and commit messages use Conventional Commits in English.

Summary by CodeRabbit

  • New Features

    • Added intelligent timeout management for AI model streaming with extended timeouts for reasoning-capable models, improving stability during complex inference tasks.
    • Enhanced title generation streams with optimized timeout configuration for better performance on reasoning models.
  • Tests

    • Added test coverage for new timeout management functionality, validating behavior across different model types.

Astro-Han added 2 commits May 19, 2026 15:59
The 30s first-progress watchdog in session/llm.ts aborts reasoning-model
streams whose first observable provider event arrives later than the
ceiling. This is reproducible with OpenAI gpt-5.5 on long sessions and
was missed by #729 (which only fixed the timer-start moment).

Inject a 120s connect timeout via a new ProviderTransform.streamTimeouts
helper, gated on model.capabilities.reasoning. Apply it at the two
production llm.stream() call sites (processor main response + prompt
title generation) with helper-first spread order so any caller-provided
StreamInput.connectTimeoutMs still wins.

Three contract tests in transform.test.ts:
- policy floor: helper output exceeds CONNECT_STREAM_TIMEOUT_MS
- routing: reasoning emits override, non-reasoning emits empty
- caller override precedence: explicit StreamInput value wins

Out of scope, tracked separately on the issue:
- SessionRetry.policy.retryable() does not classify local timeouts
- watchdog architecture rewrite (typed errors, wall-clock budget)
- mid-stream "terminated" errors (separate incident, separate PR)

Refs #755

Crosscheck flagged the original >30s assertion as too loose: a
regression that dropped the helper value to 31s would still pass.
Add a >=90_000 lower bound; 90s is the lowest ceiling considered for
reasoning models in #755 discussion, so this floor codifies the
policy direction without pinning the chosen 120s constant.

Refs #755

coderabbitai Bot commented May 19, 2026

Contributor

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 07fe7edd-91c9-4081-ac61-72b4c563d13e

📥 Commits

Reviewing files that changed from the base of the PR and between d44b7ed and a8268e7.

📒 Files selected for processing (1)
  • packages/opencode/src/provider/transform.ts

Walkthrough

This PR adds a provider-aware LLM stream connection timeout transformer. A new streamTimeouts function in provider/transform.ts returns a 120-second timeout for reasoning-capable models and is injected into the main processor and title-generation LLM streams; comprehensive tests validate reasoning/non-reasoning behavior and caller override precedence.

Changes

Stream Connect Timeout Wiring

| Layer / File(s) | Summary |
| --- | --- |
| Stream timeout transformer definition: packages/opencode/src/provider/transform.ts | New streamTimeouts(model) function returns { connectTimeoutMs: 120000 } when model.capabilities.reasoning is enabled, otherwise {}. Exported via ProviderTransform namespace. |
| Processor and title generation stream integration: packages/opencode/src/session/processor.ts, packages/opencode/src/session/prompt.ts | ProviderTransform.streamTimeouts(model) is spread into llm.stream() options in both the main message stream and title-generation stream, injecting reasoning-aware timeout values into LLM calls. |
| Stream timeout transformer tests: packages/opencode/test/provider/transform.test.ts | New test block validates: reasoning models produce connectTimeoutMs ≥ 90,000 ms, non-reasoning models emit no override (an empty object), and explicit caller-provided connectTimeoutMs overrides the computed value. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related PRs

  • Astro-Han/pawwork#729: Both PRs manage LLM stream connect-timeout timing—this PR injects connectTimeoutMs via ProviderTransform.streamTimeouts, while the retrieved PR defers timeout arming in session/llm.ts to measure the window correctly.
  • Astro-Han/pawwork#558: Both PRs wire LLM stream connectTimeoutMs behavior—this PR introduces the transformer and injection, while the retrieved PR implements the timeout enforcement and failure handling in session/llm.ts.

Suggested labels

bug, P2

Poem

🐰 A timeout for thought, so swift and so true,
One hundred twenty seconds for models that brew,
Reasoning rockets need time to ascend,
While quick ones sail fast—no timeout to spend. ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately summarizes the main change (widening the LLM connect timeout for reasoning models), matching the core changeset across all four modified files. |
| Description check | ✅ Passed | The description is comprehensive and fully populated across all required template sections: Summary, Why, Related Issue, Human Review Status, Review Focus, Risk Notes, How To Verify, and a completed checklist with all applicable items checked. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@github-actions github-actions Bot added the harness (Model harness, prompts, tool descriptions, and session mechanics) and P2 (Medium priority) labels May 19, 2026

@github-actions github-actions Bot left a comment

Suggested priority: P2 (includes non-doc, non-test paths outside the low-risk bucket).

P1/P0 are reserved for maintainer confirmation. Please relabel manually if this is a release blocker, security issue, data-loss risk, or updater/runtime failure.

@Astro-Han Astro-Han added the bug (Something isn't working) label May 19, 2026

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a specialized connection timeout for reasoning models. It adds a streamTimeouts utility in ProviderTransform that sets a 120-second timeout when reasoning capabilities are detected. This utility is integrated into the LLM streaming logic in both the session processor and prompt generation. Comprehensive tests were added to verify the timeout logic and ensure that manual overrides are respected. I have no feedback to provide.

GPT Pro pre-merge review noted the helper-spread convention is only
enforceable by code reading today. Add JSDoc so future readers see the
"spread at every call site" expectation at the helper definition. Not a
test addition — three internal/external reviewers agreed adding a heavy
integration test for a 2-call-site contract is overkill.

Refs #755
@Astro-Han Astro-Han merged commit 461b025 into dev May 19, 2026
25 checks passed
@Astro-Han Astro-Han deleted the claude/i755-reasoning-connect-timeout branch May 19, 2026 08:46
Labels

  • bug: Something isn't working
  • harness: Model harness, prompts, tool descriptions, and session mechanics
  • P2: Medium priority
