Skip to content

fix: defer LLM stream connect timeout to after HTTP request is sent#729

Merged
Astro-Han merged 1 commit into
Astro-Han:devfrom
Spongeacer:fix/llm-stream-connect-timeout
May 18, 2026
Merged

fix: defer LLM stream connect timeout to after HTTP request is sent#729
Astro-Han merged 1 commit into
Astro-Han:devfrom
Spongeacer:fix/llm-stream-connect-timeout

Conversation

@Spongeacer

@Spongeacer Spongeacer commented May 18, 2026

Copy link
Copy Markdown
Contributor

Summary

Defer the LLM stream connect timeout (arm() call) until after run() returns, so the 30-second window measures only network/provider response time, not internal async setup (provider lookup, config, plugin hooks).

Why

The connect timer was started during request-object construction, before the HTTP request was actually sent. run()'s async setup work (provider lookup, config load, experimental.chat.system.transform / chat.params / chat.headers plugin hooks) ate into the 30s window, so users on slower providers (Kimi, Xiaomi token plan, other domestic-China endpoints) saw "timed out after 30000ms" before they had actually waited 30 seconds. Issue #728 documents 12 occurrences in a single user's database.

Fix moves arm() into a new startTimeout() method on the request object and calls it after run() returns, right before stream consumption.

Related Issue

Fixes #728.

Human Review Status

Pending.

Review Focus

  • packages/opencode/src/session/llm.ts: the arm() call site move. Confirm nothing else in the request lifecycle depends on the timer being armed during construction.
  • resetTimeout() and timeoutFailure semantics are unchanged.

Risk Notes

  • Behavior change is scoped to the connect timer. The "silent stream" timeout (between events after the first one) is untouched.
  • If run() itself never returns (e.g., provider lookup hangs), the connect timer never starts. This is the same failure mode as before — run() hangs were never covered by the connect timer. No new regression surface.
  • Issue LLM stream connect timeout fires prematurely — arm() starts before HTTP request #728 also lists two follow-ups not addressed here: (1) SessionRetry.policy does not treat connect timeouts as retryable, (2) connectTimeoutMs is not configurable end-to-end. Both intentionally left for separate PRs.

How To Verify

typecheck (packages/opencode):                 ok
bun test packages/opencode/test/session/llm:   20 pass, 0 fail

Screenshots or Recordings

Not applicable (no UI change).

Checklist

  • Human review status is stated above as pending
  • I linked the related issue (LLM stream connect timeout fires prematurely — arm() starts before HTTP request #728)
  • Labels: maintainer to apply (bug, platform, priority)
  • I described the review focus and any meaningful risks
  • I listed the relevant verification steps and the key result for each
  • I did not introduce unrelated refactors, dependencies, generated files, or file changes beyond the stated scope
  • Not a UI change
  • No macOS/Windows platform impact
  • Targeting dev, Conventional Commits in English

Note to @Spongeacer: The branch was rewritten by a maintainer to narrow scope to just the llm.ts fix (the original PR also pulled in unrelated audit/dead-code changes that broke typecheck via a duplicate function in tool-info.ts). The other changes can be reproposed individually if you still want them in. To resync your local branch:

```
git fetch origin
git checkout fix/llm-stream-connect-timeout
git reset --hard origin/fix/llm-stream-connect-timeout
```

@github-actions github-actions Bot added app Application behavior and product flows ui Design system and user interface harness Model harness, prompts, tool descriptions, and session mechanics P2 Medium priority labels May 18, 2026

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested priority: P2 (includes user-path files (packages/app/src/components/terminal.tsx, packages/app/src/pages/layout/sidebar-workspace.tsx)).

P1/P0 are reserved for maintainer confirmation. Please relabel manually if this is a release blocker, security issue, data-loss risk, or updater/runtime failure.

@coderabbitai

coderabbitai Bot commented May 18, 2026

Copy link
Copy Markdown
Contributor
📝 Walkthrough

Walkthrough

This PR delivers security validations and robustness improvements across the CLI, LLM session, and data handling layers. Terminal font loading is now fault-tolerant, CLI commands validate user inputs (SQL read-only enforcement, URL protocols, config path traversal), LLM stream timeout is fixed to measure only network time (not setup), and data parsing now guards against invalid parameters.

Changes

Validation and resilience improvements across CLI, session, and UI

Layer / File(s) Summary
UI resilience: font loading and tool-info
packages/app/src/components/terminal.tsx, packages/app/src/pages/layout/sidebar-workspace.tsx, packages/ui/src/components/tool-info.ts
Terminal font-ready promise now catches failures, preventing unhandled rejections; sidebar type closing adjusted; tool-info helper functions reformatted for consistency.
CLI input validation: SQL, URL, and path security
packages/opencode/src/cli/cmd/db.ts, packages/opencode/src/cli/cmd/providers.ts, packages/opencode/src/config/paths.ts
db command rejects non-SELECT/PRAGMA/WITH queries; providers login rejects non-HTTP(S) URLs; config path substitution guards against directory-escape references in {file:...} tokens.
LLM stream timeout fix: arm after HTTP request sent
packages/opencode/src/session/llm.ts
Stream request timeout timer (startTimeout()) is invoked after run() returns, ensuring the 30-second connection timeout measures only network/provider waiting time instead of including async setup work. Fixes issue #728.
Data handling robustness: type safety and parameter parsing
packages/opencode/src/provider/transform.ts, packages/opencode/src/session/prompt.ts
System message sanitization removes unsafe any[] cast; file-range URL parameter parsing adds radix-10 parsing and NaN guards to prevent invalid offset/limit propagation.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related PRs

  • Astro-Han/pawwork#558: Modifies packages/opencode/src/session/llm.ts to adjust LLM streaming timeout behavior similarly.
  • Astro-Han/pawwork#646: Also modifies packages/app/src/components/terminal.tsx to handle font-loading promise failures and improve Enter key handling.

Suggested labels

bug, P2, app

Poem

🐰 A timeout that came far too soon,
Now counts from the network's true tune.
Paths are checked, URLs validated bright,
And promises caught when they take flight.
Robustness blooms in validation's light!

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Out of Scope Changes check ⚠️ Warning Most changes align with the stated scope (timeout fix in llm.ts and related changes). However, multiple unrelated changes appear out-of-scope: security boundary checks (paths.ts, cmd/db.ts, cmd/providers.ts), memory/type fixes (transform.ts), query parsing fixes (prompt.ts), and formatting adjustments (tool-info.ts, sidebar-workspace.tsx, terminal.tsx). Isolate the timeout fix (llm.ts + related minimal changes) into this PR. Move unrelated security checks, fixes, and formatting changes to separate PRs focused on their specific objectives.
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title directly and accurately summarizes the main change: deferring the LLM stream connect timeout until after the HTTP request is sent.
Linked Issues check ✅ Passed The PR successfully addresses the primary coding requirement from issue #728: moving the timeout arming from request initialization to after the HTTP request is sent via the new startTimeout() method.
Description check ✅ Passed PR description comprehensively covers all required template sections with clear, specific details about the timeout fix, its rationale, verification steps, and risk analysis.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/opencode/src/cli/cmd/db.ts`:
- Around line 31-35: The current validation in db.ts allowing queries that start
with "with" is unsafe because SQLite permits CTEs before INSERT/UPDATE/DELETE;
update the check in the query validation (the normalized/query logic) to either
(A) remove "with" from the allowlist so only "select" and "pragma" are accepted,
or (B) implement deeper inspection: parse the normalized string after the CTE
block to locate the first real statement token and verify it is "select" or
"pragma" before allowing execution; update the UI.error message and keep the
existing process.exit(1) behavior on rejection.

In `@packages/opencode/src/cli/cmd/providers.ts`:
- Around line 300-303: The code path that validates the URL (the block checking
/^https?:\/\//i against url) logs an error and returns, which leaves the process
exit code as 0; change this to set a non-zero exit status before returning (for
example call process.exit(1) or throw an error) so failures are visible to
automation—update the branch that calls prompts.log.error("URL must use http or
https protocol") and prompts.outro("Done") to exit with a non-zero code (e.g.,
process.exit(1)) instead of just returning.

In `@packages/opencode/src/config/paths.ts`:
- Around line 122-125: The current lexical check using
path.resolve/path.relative can be bypassed via symlinks; update the validation
in the code that computes resolvedPath and relativeToConfig so both the config
root and the target file are canonicalized with fs.realpath (or
fs.promises.realpath) before comparing. Specifically, call realpath on configDir
(or configSource root) and on the resolvedPath, then use path.relative between
those real paths and throw the same JsonError (referencing JsonError and token)
if the real-relative path escapes; also ensure subsequent reading
(Filesystem.readText) uses the realpath to avoid following a symlink that points
outside. Handle realpath errors and convert them into the existing JsonError
flow.

In `@packages/ui/src/components/tool-info.ts`:
- Around line 42-58: The file declares duplicate top-level functions
enterWorktreeOwnerProject and enterWorktreeTarget (causing TS compilation
failure); remove the redundant implementations so only one definition of each
remains (keep the original/desired implementation and delete the duplicates you
found at lines 42–58), and ensure any exports or references still point to the
retained functions (enterWorktreeOwnerProject, enterWorktreeTarget) and that
helper utilities pickString/getFilename are available where used.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 91e5eab7-48e1-4889-8d0d-292139bed5cb

📥 Commits

Reviewing files that changed from the base of the PR and between 0e64c23 and 92a20bc.

📒 Files selected for processing (9)
  • packages/app/src/components/terminal.tsx
  • packages/app/src/pages/layout/sidebar-workspace.tsx
  • packages/opencode/src/cli/cmd/db.ts
  • packages/opencode/src/cli/cmd/providers.ts
  • packages/opencode/src/config/paths.ts
  • packages/opencode/src/provider/transform.ts
  • packages/opencode/src/session/llm.ts
  • packages/opencode/src/session/prompt.ts
  • packages/ui/src/components/tool-info.ts

Comment thread packages/opencode/src/cli/cmd/db.ts Outdated
Comment thread packages/opencode/src/cli/cmd/providers.ts Outdated
Comment thread packages/opencode/src/config/paths.ts Outdated
Comment thread packages/ui/src/components/tool-info.ts Outdated

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces several security and stability enhancements, including read-only query enforcement for the database CLI, URL protocol validation, and path traversal protection for configuration file references. It also refactors LLM request timeouts to measure only network wait time, improves terminal font loading resilience, and adds robust parsing for prompt range parameters. A critical issue was identified in the UI components where duplicate function definitions were introduced, which would cause redeclaration errors.

Comment thread packages/ui/src/components/tool-info.ts Outdated
The connect timeout (30s) was started during request object creation
(arm() call), but the HTTP request wasn't sent until after run()
completed its async setup work (provider lookup, config, plugin hooks).
This caused premature timeouts — users saw "timed out after 30000ms"
before actually waiting 30 seconds for the provider.

Move arm() into a new startTimeout() method on the request object,
called after run() returns. The timeout now only measures actual
network/provider wait time.

Fixes Astro-Han#728
@Astro-Han Astro-Han force-pushed the fix/llm-stream-connect-timeout branch from 92a20bc to bac1267 Compare May 18, 2026 07:33
@github-actions github-actions Bot removed app Application behavior and product flows ui Design system and user interface labels May 18, 2026
@Astro-Han Astro-Han added bug Something isn't working P1 High priority and removed P2 Medium priority labels May 18, 2026
@Astro-Han Astro-Han merged commit 6102419 into Astro-Han:dev May 18, 2026
26 of 30 checks passed
Astro-Han added a commit that referenced this pull request May 19, 2026
…758)

## Summary

Add `ProviderTransform.streamTimeouts(model)` returning `{ connectTimeoutMs: 120_000 }` for reasoning-capable models (gated on `model.capabilities.reasoning`), and apply it at the two production `llm.stream()` call sites — `session/processor.ts` (main response) and `session/prompt.ts` (title generation) — with helper-first spread order so any caller-provided `StreamInput.connectTimeoutMs` still wins. The default `CONNECT_STREAM_TIMEOUT_MS` is untouched; non-reasoning models keep the 30s ceiling.

## Why

The 30-second first-progress watchdog in `session/llm.ts` aborts reasoning-model streams whose first observable provider event arrives later than the ceiling. The incident recorded in #755 shows OpenAI `gpt-5.5` on a long session spending 30204ms with only `start: 1` and zero content events before the watchdog aborted. PR #729 fixed an earlier residual where the connect timer armed before the HTTP request was actually sent; the residual addressed here is the 30s ceiling itself, which #729's body explicitly deferred. The capability gate keys off `model.capabilities.reasoning` after verifying the current models.json catalog has correct `reasoning=true` labels on the gpt-5 family (22/23, only `gpt-5.3-chat-latest` excluded), all o-series, Claude haiku/sonnet/opus 4.x thinking variants, and the Gemini 2.5+/3.1 series — so no provider-id allowlist fallback is needed at this time.

## Related Issue

Refs #755 (short-term path; the full deferred set — `SessionRetry.policy.retryable()` not classifying local timeouts, watchdog architectural rewrite, and the parallel mid-stream `terminated` failure mode — is documented in the issue body and left to follow-up PRs).

## Human Review Status

Pending

## Review Focus

- Helper-first spread order at `session/processor.ts:954` and `session/prompt.ts:465`. `processor.ts` receives `streamInput` from `process()` callers (`session/prompt.ts:2020`, `session/compaction.ts:480`); neither sets `connectTimeoutMs` today, so the spread defaults to the helper value. The order is helper-first specifically so that a future caller wishing to override does not need code changes here.
- Capability-based gate vs model-id-pattern matching. Catalog cross-checked at PR-prep time and labels are reliable across the four target families; a second pair of eyes on whether `model.capabilities.reasoning === true` is the right axis is welcome.
- The new contract test floor at `>= 90_000` in `transform.test.ts` codifies the policy direction (the lowest value considered for reasoning models during the issue discussion). Worth questioning whether a corresponding upper bound is also needed.
- Title-generation path applies the helper to `mdl` from `provider.getSmallModel(input.providerID)` or the title agent's explicit override. The typical small model is non-reasoning so the helper returns `{}`; only when a user explicitly configures a reasoning small variant for the title agent does the 120s apply, and the failure mode is silent (caught at `prompt.ts:482-488`, logs and falls back to the default title). The deliberate choice was to avoid a per-mode branch inside the helper.

## Risk Notes

- After the 120s ceiling is reached, a first-progress timeout still surfaces as a hard `UnknownError` because `SessionRetry.policy.retryable()` does not type-tag local timeouts and does not match the bare error message. The user-experience improvement here is the lower hit rate; no automatic retry is added in this PR. Retry classification is deferred until the parallel mid-stream `terminated` failure mode is analyzed, so retry can be designed once against both failure shapes rather than speculatively per-shape.
- Explicit `connectTimeoutMs: undefined` from a caller would clobber the helper value during spread and fall through to the 30s default via `llm.ts:434-439`. No production path constructs `streamInput` this way today, but a future caller with conditional override should pass a positive number or omit the field rather than set it to `undefined`.
- Behavior change is strictly scoped to reasoning-capable models. Non-reasoning models keep the 30s default.
- No visible UI or copy changed; the visible-UI conditional checkbox is left unticked for that reason.
- No platform / packaging / updater / signing / paths / shell / permissions surface was touched; the macOS/Windows conditional checkbox is left unticked for that reason.
- No docs / release notes / dependencies / permissions / credentials / deletion behavior / generated content / local file changes; the related conditional checkbox is left unticked for that reason.

## How To Verify

```text
typecheck (bun --cwd packages/opencode run typecheck):                       ok
bun test packages/opencode/test/provider/transform.test.ts (streamTimeouts):  3 pass / 0 fail
bun test packages/opencode/test/session/ packages/opencode/test/provider/:    940 pass / 4 skip / 1 todo / 0 fail
internal cross-review (Claude Opus + Codex high, parallel):                   0 P0 / 0 P1 / 1 P3 both reviewers flagged (policy floor too loose) fixed by tightening test to >= 90_000
```

## Screenshots or Recordings

Not applicable (no UI change).

## Checklist

- [ ] **Type label** — this PR carries exactly one of `bug`, `enhancement`, `task`, `documentation`. Type labels are author-added; the labeler bot does NOT assign them. Add the label in the GitHub UI, then tick this.
- [ ] **Routing labels** — this PR carries at least one of `app`, `ui`, `platform`, `harness`, `ci`. The labeler bot assigns these on PR open based on changed paths. Confirm the bot's choice (or override if wrong), then tick this.
- [ ] **Priority label** — this PR carries exactly one of `P0`, `P1`, `P2`, `P3`. The priority-triage bot suggests one on PR open. Confirm or override, then tick this.
- [x] Human Review Status above is set to `Pending`, `Approved by @<reviewer>`, or `Not required: <reason>` (default is `Pending`; "not required" is restricted to bot-authored low-risk PRs).
- [x] I linked the related issue, or stated in Summary why there is no issue.
- [x] I described the review focus and any meaningful risks.
- [x] I replaced the example block in How To Verify with the real verification steps and the key result for each.
- [x] I did not introduce unrelated refactors, dependencies, generated files, or file changes beyond the stated scope.
- [ ] **(conditional)** I manually checked visible UI or copy changes when needed, with screenshots or recordings. Leave unticked only if no visible UI or copy changed.
- [ ] **(conditional)** I considered macOS and Windows impact for platform, packaging, updater, signing, paths, shell, or permissions changes. Leave unticked only if no platform/packaging surface was touched.
- [ ] **(conditional)** I called out docs, release notes, dependencies, permissions, credentials, deletion behavior, generated content, or local file changes when relevant. Leave unticked only if none of those surfaces was touched.
- [x] I reviewed the final diff for unrelated changes and suspicious dependency changes.
- [x] I am targeting `dev`, and my PR title and commit messages use Conventional Commits in English.


<!-- This is an auto-generated comment: release notes by coderabbit.ai -->

## Summary by CodeRabbit

* **New Features**
  * Added intelligent timeout management for AI model streaming with extended timeouts for reasoning-capable models, improving stability during complex inference tasks.
  * Enhanced title generation streams with optimized timeout configuration for better performance on reasoning models.

* **Tests**
  * Added test coverage for new timeout management functionality, validating behavior across different model types.

<!-- review_stack_entry_start -->

[![Review Change Stack](https://storage.googleapis.com/coderabbit_public_assets/review-stack-in-coderabbit-ui.svg)](https://app.coderabbit.ai/change-stack/Astro-Han/pawwork/pull/758?utm_source=github_walkthrough&utm_medium=github&utm_campaign=change_stack)

<!-- review_stack_entry_end -->

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working harness Model harness, prompts, tool descriptions, and session mechanics P1 High priority

Projects

None yet

Development

Successfully merging this pull request may close these issues.

LLM stream connect timeout fires prematurely — arm() starts before HTTP request

2 participants