feat(cli): add API preconnect to reduce first-call latency#3318
Conversation
📋 Review SummaryThis PR implements an API preconnect feature that fires a fire-and-forget HEAD request during startup to warm TCP+TLS connections, potentially reducing first API call latency by 100-200ms. The implementation is well-structured with comprehensive test coverage (18 unit tests) and thoughtful skip conditions for proxy/enterprise environments. Overall, this is a solid optimization with minimal risk to the main code flow. 🔍 General Feedback
🎯 Specific Feedback🟡 High
🟢 Medium
🔵 Low
✅ Highlights
|
Code Coverage Summary
CLI Package - Full Text ReportCore Package - Full Text ReportFor detailed HTML reports, please see the 'coverage-reports-22.x-ubuntu-latest' artifact from the main CI run. |
479f597 to
98c08c3
Compare
wenshao
left a comment
There was a problem hiding this comment.
No issues found. LGTM! ✅ — gpt-5.4 via Qwen Code /review
Fire a fire-and-forget HEAD request early in startup to warm the TCP+TLS connection. Subsequent actual API calls reuse the connection, saving 100-200ms on the first request. - Skip when proxy env vars (HTTPS_PROXY etc.) are set - Skip when NODE_EXTRA_CA_CERTS is set (enterprise TLS inspection) - Skip in sandbox mode (process restarts, preconnect is ineffective) - Skip when a non-default baseUrl is configured (mTLS / private deployment) - Disable via QWEN_CODE_DISABLE_PRECONNECT=1 - Uses AbortSignal.timeout(5s) to bound resource lifetime - Add benchmark-api-latency.sh to measure TCP+TLS handshake improvement Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
- Fix isDefaultBaseUrl to use exact/path-prefix matching after stripping protocol, preventing subdomain-spoofed URLs (e.g. api.openai.com.evil.com) from being mistaken for default URLs - Add debug log to catch block in gemini.tsx for observability when authType retrieval fails - Add @internal JSDoc tag to resetPreconnectState to signal testing-only API - Add test case verifying subdomain-spoofed URLs are correctly rejected Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
- Derive preconnect target from modelsConfig.getGenerationConfig().baseUrl instead of raw settings, respecting the full resolution priority chain (modelProviders > cli > env > settings) - Cache undici dispatcher via getOrCreateSharedDispatcher() so preconnect and SDK clients share the same connection pool, enabling actual TCP+TLS connection reuse - Remove redundant ENV_BASE_URL_MAP / getEnvBaseUrlForAuthType since the caller already resolves env vars through the model config resolver Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
Pass Config.getProxy() into preconnect dispatcher creation and add tests that verify proxy propagation to the shared dispatcher. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
Proxy is now handled via config.getProxy() passed as options.proxy, so the env-var proxy check is redundant. Also add runtime detection guard to skip preconnect on non-Node runtimes (e.g. Bun) where undici's connection pool is not shared. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
85aed40 to
7317a4b
Compare
The runtime guard at line 151 already ensures only Node.js reaches the dispatcher logic, so the second detectRuntime() ternary was dead code. Simplified to directly call getOrCreateSharedDispatcher(). Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
…-alive - Remove gemini and vertex-ai from preconnect targets since their SDK uses bare fetch() with a separate connection pool, not the shared undici dispatcher - Set keepAliveTimeout to 60s on the shared dispatcher so warmed connections survive the typical startup-to-first-prompt gap in interactive sessions (undici defaults to 4s) - Fix misleading comment in gemini.tsx Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
wenshao
left a comment
There was a problem hiding this comment.
[Critical] packages/core/src/services/loopDetectionService.ts: checkReadFileLoop() now triggers READ_FILE_LOOP whenever 8 read-like tools appear within the last 15 calls after any non-read tool has occurred, even if productive non-read calls are interleaved. This can abort normal investigation/refactor flows that are still making progress. Suggested fix: reset the read-loop window/counter on any non-read tool, or require a consecutive read-like suffix instead of counting reads anywhere in the sliding window.
[Critical] packages/core/src/tools/swarm.ts: swarm workers are instructed not to spawn sub-agents, but the default worker tool config still exposes recursive orchestration tools because only ask_user_question is disallowed and the default allowlist is ['*']. A worker can recursively launch agent/swarm, defeating bounded concurrency and causing runaway tool/token usage or hangs. Suggested fix: add recursive orchestration tools like agent and swarm to DEFAULT_WORKER_DISALLOWED_TOOLS, or replace the wildcard with an explicit non-recursive allowlist.
— gpt-5.4 via Qwen Code /review
There was a problem hiding this comment.
@wenshao Thanks for the review! Regarding the two [Critical] comments:
-
loopDetectionService.ts—checkReadFileLoop()false positives: This file is not part of this PR's changeset. The concern is valid and worth tracking separately — filed or will file as a standalone issue. -
swarm.ts— recursive orchestration tools not blocked: Also not part of this PR. The observation thatagent/swarmshould be inDEFAULT_WORKER_DISALLOWED_TOOLSis reasonable and should be addressed in a dedicated PR.
This PR only touches API preconnect (apiPreconnect.ts, runtimeFetchOptions.ts, gemini.tsx). The other inline suggestions (proxy skip, Bun guard, gemini/vertex removal, keepAliveTimeout) have been addressed in the latest commits.
|
Re: the two [Critical] items in the review — Both |
Keep main's --bare early-parse logic; the PR's loadSettings() and cleanupCheckpoints() calls already exist downstream after bare-mode detection. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
The mock factory was using `await import('@qwen-code/qwen-code-core')`
to get `createDebugLogger`, which recursively loaded the full core
entrypoint and failed on `undici` resolution. Replace with a simple
stub created via `vi.hoisted()`.
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
wenshao
left a comment
There was a problem hiding this comment.
Additional findings
[Critical] packages/cli/src/config/config.ts:903-905,1089-1091
In bare mode we now drop CLI-provided --core-tools entirely: resolvedCoreTools excludes argv.coreTools, and later Config gets coreTools: undefined. That breaks the bare-mode contract of honoring explicit CLI inputs, so commands like qwen --bare --core-tools read_file,write_file no longer preserve the requested tool allowlist.
Suggested fix: keep argv.coreTools in bare mode and only suppress settings-derived settings.tools.core values.
[Suggestion] packages/vscode-ide-companion/src/webview/providers/WebViewProvider.ts:145-176
This slash-command notification bridge forwards every event straight into the active webview without checking event.sessionId. That means /insight progress or ready events from a background or previous session can leak into the conversation the user is currently viewing.
Suggested fix: ignore slash-command notifications whose sessionId does not match the active session bound to the webview before sending insightProgress, insightReportReady, or fallback streamChunk updates.
— gpt-5.4 via Qwen Code /review
Move `preconnectFired = true` past the targetUrl resolution check. Permanent skip conditions (disabled, sandbox, custom CA, non-Node runtime) still set the flag immediately, but when the target URL cannot be resolved (e.g. auth/baseUrl not yet available), the flag stays unset so a later call can retry once the config is ready. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
- Remove `resetDispatcherCache` from core public exports (test-only internal) - Wrap dispatcher creation and fetch in try-catch to guard against sync throws - Add test for synchronous dispatcher error handling Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
… script - Add `https://dashscope.aliyuncs.com` to DEFAULT_BASE_URLS so users on the DashScope openai-compatible API (dashscope.aliyuncs.com/compatible-mode/v1) benefit from preconnect instead of being silently skipped - Add 2 tests: dashscope compatible-mode URL triggers preconnect, spoofed subdomain is rejected - Replace benchmark-api-latency.sh with benchmark-api-latency.mjs: uses undici directly (same library as preconnect) for accurate in-process connection reuse measurement, covers all 3 endpoints, adds per-request timeout to avoid hanging on dashscope HEAD requests 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
The benchmark-api-latency.mjs script uses Node 18+ built-ins (fetch, AbortSignal, setTimeout) that were not covered by the existing scripts pattern (*.js only). Extend the pattern to *.mjs and add globals.browser so ESLint recognises these globals. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
|
@tanzhenxin @pomelo-nwu Could you help review this pr ? this is one sub issue of improve startup time. I have already tested by myself, can see screenshots for details. |
… endpoints Import ALIBABA_STANDARD_API_KEY_ENDPOINTS so isDefaultBaseUrl() accepts dashscope-intl (sg), dashscope-us (us-virginia), and cn-hongkong in addition to the mainland cn-beijing endpoint. Also fix the test mock factory to avoid importing the real core package. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
DYNAMIC_QWEN_OAUTH_BASE_URL sentinel is truthy but not a real URL, causing getPreconnectTargetUrl() to skip the authType fallback and silently disable preconnect for default Qwen OAuth path. Add URL format validation so non-http sentinels fall through to the default URL lookup. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
wenshao
left a comment
There was a problem hiding this comment.
No issues found. LGTM! ✅ — gpt-5.5 via Qwen Code /review
|
Thanks @doudouOUC — merged. Two minor follow-ups (non-blocking, leaving here for visibility): 1. The current comment in 2.
Neither is urgent; happy to take a small follow-up PR when convenient. |
Fire a fire-and-forget HEAD request early in startup to warm the TCP+TLS connection. Subsequent SDK calls share an undici dispatcher with preconnect, reusing the warmed connection to save 100-200ms on the first request. Skip conditions: - NODE_EXTRA_CA_CERTS set (enterprise TLS inspection) - Sandbox mode (process-restart context) - Non-default baseUrl (mTLS / private deployment) - Non-Node runtimes (Bun) Disable via QWEN_CODE_DISABLE_PRECONNECT=1. Closes QwenLM#3223
Closes #3223
Summary
Fire a fire-and-forget HEAD request early in startup to warm the TCP+TLS connection. Subsequent actual API calls reuse this connection, saving 100-200ms on the first request.
Smart skip conditions:
HTTPS_PROXY,https_proxy,HTTP_PROXY,http_proxy)NODE_EXTRA_CA_CERTSset (enterprise TLS inspection)baseUrlconfigured (mTLS / private deployment)Controls:
QWEN_CODE_DISABLE_PRECONNECT=1AbortSignal.timeout(5s)to bound resource lifetimeChanges in This PR
https://dashscope.aliyuncs.comto the preconnect whitelist so users on the DashScope openai-compatible API (dashscope.aliyuncs.com/compatible-mode/v1) benefit from preconnect instead of being silently skippedbenchmark-api-latency.shwithbenchmark-api-latency.mjs: uses undici directly (same library as preconnect) for accurate in-process connection reuse measurement, covers all 3 default endpoints, adds per-request timeout to avoid hanging on dashscope HEAD requestsBenchmark Results
Run with
node scripts/benchmark-api-latency.mjs:Test Plan
benchmark-api-latency.mjsfor accurate in-process measurementManual Verification
场景 1:正常启动时预连接触发
预期日志:
场景 2:
NODE_EXTRA_CA_CERTS设置时跳过预期日志:
场景 3:
QWEN_CODE_DISABLE_PRECONNECT=1手动禁用预期日志:
场景 4:Sandbox 模式时跳过
预期日志:
场景 5:性能基准测试
🤖 Generated with Qwen Code