feat(telemetry): Phase 4b — retry visibility for qwen-code.llm_request (#3731)#4432
Conversation
📋 Review SummaryThis PR (Phase 4b) adds per-attempt HTTP-status retry telemetry to make retry behavior visible in traces, logs, and metrics. The implementation uses AsyncLocalStorage to propagate retry state from 🔍 General FeedbackPositive aspects:
Architectural decisions:
🎯 Specific Feedback🟡 High
🟢 Medium
🔵 Low
✅ Highlights
|
There was a problem hiding this comment.
Pull request overview
Adds Phase 4b retry observability for LLM requests by propagating per-attempt retry state via AsyncLocalStorage and emitting a new api_retry telemetry event + metric, while also fixing a Phase 4a sampling_ms derivation bug.
Changes:
- Introduces
retryContext(ALS) and wiresretryWithBackoffto populate per-attempt retry metadata and optionally invoke anonRetrycallback. - Adds
ApiRetryEvent+logApiRetry(RUM + OTel log + counter) and wires the 4 LLM retry call sites to emit per-attempt retry telemetry withsubagent_name. - Fixes
sampling_mscalculation inendLLMRequestSpanto avoid double-subtracting setup time; updates unit tests accordingly.
Reviewed changes
Copilot reviewed 18 out of 18 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| packages/core/src/utils/retryContext.ts | Adds ALS container/type for per-attempt retry metadata propagation. |
| packages/core/src/utils/retry.ts | Adds onRetry callback, ALS propagation, and accumulators for retry delay/setup timing. |
| packages/core/src/utils/retry.test.ts | Adds unit coverage for ALS context propagation + onRetry behavior and monotonic attempt counting. |
| packages/core/src/telemetry/types.ts | Adds ApiRetryEvent and includes it in the union type. |
| packages/core/src/telemetry/session-tracing.ts | Updates LLM metadata docs and fixes sampling_ms derivation. |
| packages/core/src/telemetry/session-tracing.test.ts | Updates tests to match corrected sampling_ms formula. |
| packages/core/src/telemetry/qwen-logger/qwen-logger.ts | Adds RUM emission for API retry events. |
| packages/core/src/telemetry/metrics.ts | Defines and records new qwen-code.api.retry.count counter (tagged by model). |
| packages/core/src/telemetry/loggers.ts | Adds logApiRetry three-sink fan-out (RUM + OTel log + metric). |
| packages/core/src/telemetry/loggers.test.ts | Adds coverage for logApiRetry fan-out + subagent_name propagation + SDK-disabled behavior. |
| packages/core/src/telemetry/index.ts | Re-exports recordApiRetry. |
| packages/core/src/telemetry/constants.ts | Adds EVENT_API_RETRY constant. |
| packages/core/src/core/loggingContentGenerator/loggingContentGenerator.ts | Snapshots ALS retry metadata in sync prelude and threads it to all LLM span end sites (stream + non-stream). |
| packages/core/src/core/loggingContentGenerator/loggingContentGenerator.test.ts | Adds tests ensuring retry metadata propagates (and defaults) for stream/non-stream paths. |
| packages/core/src/core/geminiChat.ts | Wires retryWithBackoff.onRetry to emit ApiRetryEvent for Gemini streaming retries. |
| packages/core/src/core/client.ts | Wires retryWithBackoff.onRetry to emit ApiRetryEvent for a Gemini client LLM call site. |
| packages/core/src/core/baseLlmClient.ts | Wires retryWithBackoff.onRetry to emit ApiRetryEvent for two BaseLlmClient call sites. |
| docs/design/telemetry-llm-request-timing-design.md | Updates design doc D4 to reflect ALS-based propagation after implementation discovery. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
wenshao
left a comment
There was a problem hiding this comment.
Second opinion review (qwen3.7-max): no new issues found beyond the 9 inline comments from the previous review round. The AsyncLocalStorage propagation, streaming snapshot closure pattern, sampling_ms formula fix, and ALS isolation tests are all sound. CI green (24/24). — qwen3.7-max via Qwen Code /review
维护者本地真实测试验证报告 (PR #4432)在本地 worktree 中将 PR 合并到 latest 合并/编译
单元测试 (telemetry + retry + LCG,合计 20 个文件)逐文件覆盖(只列与本 PR 直接相关或被改动的):
PR 描述写的是 ~580,实测 587,且全绿。 端到端验证(真实 OTel SDK + outfile,无 mock)按 PR 描述照搬方法,自己写了一个一次性 vitest 文件 四个用例分别校验:(1) outfile 里 2 条 实测从 outfile 解析出来的关键数值
{
"qwen-code.model": "phase4b-test-model",
"gen_ai.request.model": "phase4b-test-model",
"duration_ms": 800,
"input_tokens": 100, "gen_ai.usage.input_tokens": 100,
"output_tokens": 20, "gen_ai.usage.output_tokens": 20,
"ttft_ms": 80, "gen_ai.server.time_to_first_token": 0.08,
"request_setup_ms": 129,
"attempt": 3,
"retry_total_delay_ms": 90.63,
"sampling_ms": 720,
"output_tokens_per_second": 27.78,
"success": true
}
Phase 4a
|
| 检查项 | 期望 | 实测 | ✓ |
|---|---|---|---|
| LogRecord 条数 | 2 | 2 | ✓ |
attempt_number 单调 |
1 → 2 | 1, 2 | ✓ |
status_code 透传 |
500 | 500 | ✓ |
LLM span attempt |
3 (1 success + 2 prior) | 3 | ✓ |
request_setup_ms ≥ retry_total_delay_ms (setup 包含 attempt 自身耗时) |
true | 129 ≥ 90.63 | ✓ |
sampling_ms = duration - ttft(Phase 4a 修复) |
720 | 720 | ✓ |
| 旧公式会算成 591 (bug) | ≠720 | 591 ≠ 720(已修复) | ✓ |
output_tokens_per_second |
27.78 | 27.78 | ✓ |
GenAI semconv time_to_first_token (秒) |
0.08 | 0.08 | ✓ |
Counter qwen-code.api.retry.count |
model 维度 = 2 | {model}: 2 |
✓ |
| ALS 隔离:attempt 1 setup=0, attempt 2 > 0, attempt 3 单调增 | as expected | 0 → 25ms → 80ms | ✓ |
一处与 PR 描述需对照的实际行为(并非 bug)
PR 提到 "api_retry LogToSpanProcessor 桥接 span 以 sibling 形式挂在 caller active span 下"。这是 OTLP 路径 的行为(useOtlp 分支才 new LogToSpanProcessor(...));本次 outfile 模式的验证里没有桥接 span 产出,只有 LogRecord 本身写入了 outfile,这与 sdk.ts:259-342 的代码完全一致 — outfile 模式下 LogRecord 直接走 FileLogExporter,不经过 LogToSpanProcessor。这是预期行为,PR 单测 log-to-span-processor.test.ts (32 tests) 已单独覆盖该桥接逻辑。如果未来希望 outfile 模式也产出桥接 span 以方便本地观察,可在 4c 或后续 PR 里单独考虑(超出本 PR 范围)。
复现命令
git fetch origin pull/4432/head:pr-4432-test && git fetch origin main
git worktree add -b pr-4432-merge-test /tmp/pr4432-merged origin/main
cd /tmp/pr4432-merged && git merge pr-4432-test --no-edit
npm ci && npm run build && npm run typecheck
# 单元测试(20 个文件,587 tests)
cd packages/core && npx vitest run --coverage=false \
src/telemetry/ src/utils/retry.test.ts \
src/core/loggingContentGenerator/loggingContentGenerator.test.ts
# E2E:把验证文件 phase4b-e2e-verify.test.ts 放到 packages/core/src/utils/ 后
npx vitest run --coverage=false src/utils/phase4b-e2e-verify.test.ts(验证用的 e2e 测试文件用完即弃,与 PR 描述一致;如有需要我可以贴完整源码到下方 thread。)
结论
- ✅ 与 latest
main合并干净,无冲突 - ✅ build / typecheck / lint 全绿
- ✅ 587 个单元测试全部通过
- ✅ 真实 OTel SDK 端到端验证:3 个 sink(LogRecord / Span / Counter)全部按预期写入 outfile,数值与 PR 描述一致
- ✅ Phase 4a
sampling_ms公式修复在真实场景下证实是必要的且生效 - 同意合并。
Fixes 6 of 9 inline review comments from wenshao + Copilot. The remaining 3 are pushback (duration_ms semantic = design intent per D5; persistent retry cap = explicitly deferred in PR description). 1. Fix JSDoc inaccuracy on `onRetry` contract (QwenLM#1+QwenLM#2): the comment incorrectly said "synchronous throws inside fn execute OUTSIDE the ALS frame." In fact fn() runs inside retryContext.run() so throws ARE inside the frame. What's outside the frame is the onRetry callback itself (it fires from the catch block). Rewritten per wenshao's suggestion: tells callers not to read retryContext.getStore() inside onRetry — all data comes via the RetryAttemptInfo parameter. 2. Add doc comment on content-retry delay inflation (QwenLM#3): retryTotalDelayMs accumulator includes content-retry delays (shouldRetryOnContent path) which don't fire onRetry. This is intentional — the LLM span attribute reports total user-perceived backoff time — but was undocumented. 3. Add signal?.aborted guard before onRetry invocations (QwenLM#6): if the abort signal fires between the catch and onRetry execution point, we now skip the callback to avoid phantom retry events that inflate the counter for retries that never actually proceeded. Applied to both persistent and normal retry paths. 4. Add persistent retry path test (status=429 + persistentMode) (QwenLM#4): the highest-volume production retry path had zero Phase 4b test coverage. Now verifies onRetry fires with monotonic attempt counter and that persistent-mode exponential backoff produces increasing delayMs. 5. Add Retry-After header path test (status=429 + retry-after: 2) (QwenLM#7): verifies that when the error carries a Retry-After header, onRetry.delayMs reflects the parsed header value (2000ms) instead of the exponential backoff calculation. 6. Add stream idle-timeout retry-attr propagation test (QwenLM#8): verifies that the closure-captured retrySnapshot reaches the setTimeout-fired endLLMRequestSpan call with correct retry context values (attempt=4, requestSetupMs=3000, retryTotalDelayMs=2500). All 186 affected tests pass (retry 68 + LCG 48 + session-tracing 70). tsc --noEmit clean. eslint clean. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
… in RUM (QwenLM#4432) Addresses 2 of 5 R3 review comments from wenshao (2026-05-26): 1. loggingContentGenerator.test.ts:2290 — replace `if (timeoutRecord)` guard with `expect(timeoutRecord).toBeDefined()` so the idle-timeout retry-attr test fails loudly instead of passing with 0 assertions when setTimeout doesn't fire. Also rewrote the test to use fake timers from the START (so the 5-min idle timeout is created under fake clock and can be advanced via vi.advanceTimersByTimeAsync), fixing the underlying reason it wasn't firing. 2. qwen-logger.ts:963 — add `prompt_id: event.prompt_id` to logApiRetryEvent RUM properties. Without this, RUM dashboards cannot correlate api_retry events with specific prompts, unlike the analogous logApiErrorEvent which already includes prompt_id. 165 affected tests pass. Remaining 3 R3 items (QwenLM#9 onRetry helper, QwenLM#10 error-path test coverage, QwenLM#11 caller integration assertions) deferred to follow-up PR — non-blocking refactor/test-hardening. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
wenshao
left a comment
There was a problem hiding this comment.
No review findings. Downgraded from Approve to Comment: CI still running. — qwen3.7-max via Qwen Code /review
Maintainer Verification ReportBuild
Tests — PR-Touched Files (Targeted)
Tests — Full Core Suite (Regression)
The single failure ( Code Review FindingsArchitecture: Clean AsyncLocalStorage-based retry context propagation, consistent with existing Key observations:
No issues found. The implementation is thorough, well-documented, and matches the design doc. Verified by: wenshao |
QwenLM#3731) Adds per-attempt retry telemetry for HTTP-status retries (429/5xx) emitted by retryWithBackoff at the 4 LLM call sites. Second slice of Phase 4 (sub-issue Architectural discovery (mid-planning) -------------------------------------- The Phase 4 design doc assumed claude-code's "one LLM span owns the retry loop" pattern. Reading the 4 retryWithBackoff call sites revealed qwen-code inverts that: retryWithBackoff sits ABOVE LoggingContentGenerator. Each attempt creates a fresh LLM span. The original "in-LCG accumulator" plan wouldn't work. Resolution: propagate retry state via AsyncLocalStorage (`retryContext`). retryWithBackoff wraps each `await fn()` in `retryContext.run(...)`, and LoggingContentGenerator reads the ALS in its synchronous prelude (before the first await) and threads the snapshot into all endLLMRequestSpan callsites — success / error / idle-timeout / abort. Matches existing patterns (promptIdContext, subagentNameContext, agent-context). Plan went through 3 review rounds (Plan-agent reviews) finding 22 issues total — all addressed before implementation. Changes ------- - New retryContext.ts (AsyncLocalStorage<RetryAttemptContext>) with attempt + requestSetupMs + retryTotalDelayMs fields. Computed in retry.ts immediately before `await fn()` so values are anchored to the attempt's actual start, not derived downstream. - retry.ts: - New `onRetry?: (info: RetryAttemptInfo) => void` option on RetryOptions. Opt-in per caller: non-LLM callers stay silent. - Monotonic `iterationCount` decoupled from `attempt` (which is clamped at `maxAttempts - 1` in persistent mode). Always reflects "this is the Nth fn() call" — no flip-flopping for mixed-error sequences. - retryContext.run wrap around fn() so LCG can read the ALS. - onRetry invocations wrapped in try/catch: telemetry exceptions never break the retry loop (logged via debugLogger). - logRetryAttempt debug log line KEPT — useful when OTel SDK isn't wired up (local CLI debugging, integration tests, early-startup errors). - ApiRetryEvent telemetry event class (types.ts) with model + promptId + attempt_number + error fields + subagent_name. JSDoc cross-references ContentRetryEvent (they cover different retry budgets — HTTP-status vs invalid-stream — and can both fire for one prompt). - logApiRetry function in loggers.ts — three-sink fan-out matching logContentRetry: QwenLogger RUM, OTel log signal (bridged via LogToSpanProcessor), recordApiRetry metric counter. - recordApiRetry metric (metrics.ts) — `qwen-code.api.retry.count` Counter tagged with {model}. Full COUNTER_DEFINITIONS entry + initialization + recording function + index.ts export. - qwen-logger.ts adds logApiRetryEvent for RUM consistency. - 4 LLM caller wiring sites (client.ts, baseLlmClient.ts x2, geminiChat.ts) opt in with onRetry callback that emits ApiRetryEvent with subagentName from subagentNameContext.getStore(). - LoggingContentGenerator: snapshotRetryMetadata() helper called in the SYNCHRONOUS prelude of generateContent / generateContentStream — only point where retryContext is guaranteed active for the streaming path (the returned AsyncGenerator is iterated AFTER retryWithBackoff resolves). Snapshot threaded as parameter to loggingStreamWrapper so every endLLMRequestSpan callsite (success / error / idle-timeout / abort) sees the same values. `attempt` defaults to 1 when no retry context is present (warmup, side-queries, direct calls) so dashboards filtering WHERE attempt=1 include those. Bundled Phase 4a bug fix (sampling_ms formula) ----------------------------------------------- Phase 4a's `sampling_ms = duration_ms - ttft_ms - (requestSetupMs ?? 0)` was silently wrong. `duration_ms` only covers `ttft + sampling` for the span (startTime is captured when startLLMRequestSpan runs, AFTER any setup phase). Subtracting setup again is double-counting. Phase 4a masked the bug because requestSetupMs was always undefined → 0. Phase 4b populates requestSetupMs with cumulative retry overhead — without this fix, sampling_ms would clamp to 0 for every retried request, wiping output-throughput data exactly when operators need it most. Fix: `sampling_ms = duration_ms - ttft_ms` (drop the setup subtraction). Phase 4a tests updated accordingly: 1 test rewritten to use inputs that actually exercise the clamp under the new formula (ttft > duration = clock skew); 1 test renamed to assert the FIX (setup is NOT subtracted). Out of scope (deferred, noted in PR description) ------------------------------------------------ - Persistent retry mode emission cap (50+ events under QWEN_CODE_UNATTENDED_RETRY). Aggregated attempt/retry_total_delay_ms remain accurate regardless. - SDK-internal retries (openai/google-genai maxRetries=3) remain invisible — operator awareness only. - Stream-iteration errors (mid-stream network drop during for-await) bypass retryWithBackoff entirely. Pre-existing behavior, not a Phase 4b regression. - shouldRetryOnContent content-retry path (retry.ts:184-193) skips onRetry. No caller uses this path today — code path is dead. Tests ----- - retry.test.ts: 9 new cases (monotonic counter, requestSetupMs growth, first-try success, onRetry callback contract, absent-callback silence, callback-throws resilience, shouldRetryOnError mid-loop giveup, parallel-call ALS isolation, nested-retry inner-frame read). - loggers.test.ts: 3 new cases (3-sink fan-out, subagent_name propagation, SDK-not-initialized path). - loggingContentGenerator.test.ts: 4 new cases (non-stream ALS propagation, non-stream default attempt=1, stream ALS propagation through wrapper closure, stream default attempt=1). - session-tracing.test.ts: 1 test rewritten + 1 renamed for the sampling_ms fix. All 580 telemetry + retry + LCG tests pass. tsc --noEmit clean. eslint clean. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
Fixes 6 of 9 inline review comments from wenshao + Copilot. The remaining 3 are pushback (duration_ms semantic = design intent per D5; persistent retry cap = explicitly deferred in PR description). 1. Fix JSDoc inaccuracy on `onRetry` contract (QwenLM#1+QwenLM#2): the comment incorrectly said "synchronous throws inside fn execute OUTSIDE the ALS frame." In fact fn() runs inside retryContext.run() so throws ARE inside the frame. What's outside the frame is the onRetry callback itself (it fires from the catch block). Rewritten per wenshao's suggestion: tells callers not to read retryContext.getStore() inside onRetry — all data comes via the RetryAttemptInfo parameter. 2. Add doc comment on content-retry delay inflation (QwenLM#3): retryTotalDelayMs accumulator includes content-retry delays (shouldRetryOnContent path) which don't fire onRetry. This is intentional — the LLM span attribute reports total user-perceived backoff time — but was undocumented. 3. Add signal?.aborted guard before onRetry invocations (QwenLM#6): if the abort signal fires between the catch and onRetry execution point, we now skip the callback to avoid phantom retry events that inflate the counter for retries that never actually proceeded. Applied to both persistent and normal retry paths. 4. Add persistent retry path test (status=429 + persistentMode) (QwenLM#4): the highest-volume production retry path had zero Phase 4b test coverage. Now verifies onRetry fires with monotonic attempt counter and that persistent-mode exponential backoff produces increasing delayMs. 5. Add Retry-After header path test (status=429 + retry-after: 2) (QwenLM#7): verifies that when the error carries a Retry-After header, onRetry.delayMs reflects the parsed header value (2000ms) instead of the exponential backoff calculation. 6. Add stream idle-timeout retry-attr propagation test (QwenLM#8): verifies that the closure-captured retrySnapshot reaches the setTimeout-fired endLLMRequestSpan call with correct retry context values (attempt=4, requestSetupMs=3000, retryTotalDelayMs=2500). All 186 affected tests pass (retry 68 + LCG 48 + session-tracing 70). tsc --noEmit clean. eslint clean. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
… in RUM (QwenLM#4432) Addresses 2 of 5 R3 review comments from wenshao (2026-05-26): 1. loggingContentGenerator.test.ts:2290 — replace `if (timeoutRecord)` guard with `expect(timeoutRecord).toBeDefined()` so the idle-timeout retry-attr test fails loudly instead of passing with 0 assertions when setTimeout doesn't fire. Also rewrote the test to use fake timers from the START (so the 5-min idle timeout is created under fake clock and can be advanced via vi.advanceTimersByTimeAsync), fixing the underlying reason it wasn't firing. 2. qwen-logger.ts:963 — add `prompt_id: event.prompt_id` to logApiRetryEvent RUM properties. Without this, RUM dashboards cannot correlate api_retry events with specific prompts, unlike the analogous logApiErrorEvent which already includes prompt_id. 165 affected tests pass. Remaining 3 R3 items (QwenLM#9 onRetry helper, QwenLM#10 error-path test coverage, QwenLM#11 caller integration assertions) deferred to follow-up PR — non-blocking refactor/test-hardening. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
479bbf3 to
5458a9e
Compare
yiliang114
left a comment
There was a problem hiding this comment.
LGTM. The ALS-based retry context propagation is clean and matches existing patterns (promptIdContext, subagentNameContext). The sampling_ms formula fix is correct — duration_ms already excludes setup time since startTime is captured after startLLMRequestSpan. All prior review comments are addressed or explicitly deferred as non-blocking.
The 3 follow-up rounds of review (22 findings total) and the author's responses look thorough. Test coverage is solid — including the idle-timeout path and parallel ALS isolation.
Summary
Phase 4b of P3 LLM request timing decomposition (#3731 / #4413). Adds per-attempt HTTP-status retry telemetry so operators can see retry behavior in traces, logs, and metrics — previously the
retryWithBackoffpath at the 4 LLM call sites was completely invisible to telemetry (onlydebugLogger.warn).After Phase 4b, a request that goes
500 → 429 → successproduces:qwen-code.llm_requestspans (each with its ownduration_ms/ttft_ms/error/ status code).qwen-code.api_retryLogToSpanProcessor bridge spans sibling-parented under the caller's active span (interaction or tool).qwen-code.api_retryLogRecords (OTel log signal + QwenLogger RUM).qwen-code.api.retry.countcounter, tagged{model}.attempt: 3,request_setup_ms: ~1200(cumulative retry overhead),retry_total_delay_ms: ~1500.Architectural discovery (mid-planning)
The Phase 4 design doc assumed claude-code's "one LLM span owns the retry loop" pattern. Reading the 4 retry call sites carefully revealed qwen-code inverts that:
retryWithBackoffsits aboveLoggingContentGenerator. Each retry attempt creates a fresh LLM span. The original "in-LCG accumulator" plan wouldn't work.Resolution: propagate retry state via AsyncLocalStorage (
retryContext), matching the existingpromptIdContext/subagentNameContext/agent-contextpatterns.retryWithBackoffwraps eachawait fn()inretryContext.run(...);LoggingContentGeneratorreads the ALS in its synchronous prelude (before the first await) and threads the snapshot to allendLLMRequestSpancallsites (success / error / idle-timeout / abort).The plan went through 3 review rounds by Plan-agents finding 22 issues total — all addressed before implementation. Design doc updated to reflect the ALS-based approach (D4 has a "Phase 4b update" subsection at the top).
Bundled Phase 4a bug fix (sampling_ms formula)
Phase 4a's
sampling_ms = duration_ms - ttft_ms - (requestSetupMs ?? 0)was silently wrong:duration_msonly coversttft + samplingfor the span (startTime is captured whenstartLLMRequestSpanruns, AFTER any setup phase), so subtractingsetupagain is double-counting. Phase 4a masked the bug becauserequestSetupMswas always undefined → 0. Phase 4b populatesrequestSetupMswith cumulative retry overhead — without this fix, sampling_ms would clamp to 0 for every retried request, wiping output-throughput data exactly when operators need it most.Fix:
sampling_ms = duration_ms - ttft_ms(drop the setup subtraction). Phase 4a tests updated.Closes
Design doc
docs/design/telemetry-llm-request-timing-design.md— D4 section updated with the Phase 4b discovery + ALS resolution.Implementation overview
New (
packages/core/src/utils/retryContext.ts):AsyncLocalStorage<RetryAttemptContext>withattempt/requestSetupMs/retryTotalDelayMsfields. MirrorspromptIdContext.tsstyle.retry.ts:onRetry?: (info: RetryAttemptInfo) => voidoption onRetryOptions(opt-in per caller).iterationCountdecoupled fromattempt(which gets clamped atmaxAttempts - 1in persistent mode). No flip-flopping for mixed-error sequences.retryContext.runwrap aroundfn().onRetryinvocations wrapped in try/catch — telemetry exceptions never break the retry loop.logRetryAttemptdebug log line (useful when OTel SDK isn't wired up).types.ts: newApiRetryEventclass withsubagent_namefield. JSDoc cross-referencesContentRetryEvent(they cover different retry budgets and can both fire for one prompt).loggers.ts: newlogApiRetryfunction — three-sink fan-out matchinglogContentRetry: QwenLogger RUM, OTellogger.emit(bridged viaLogToSpanProcessor),recordApiRetryCounter.metrics.ts: newqwen-code.api.retry.countCounter tagged with{model}. FullCOUNTER_DEFINITIONSentry + initialization + recording function + index.ts export.qwen-logger.ts:logApiRetryEventfor RUM consistency.4 LLM caller wiring sites:
client.ts,baseLlmClient.ts(×2),geminiChat.ts— each opts in withonRetrycallback that emitsApiRetryEventwithsubagentNamefromsubagentNameContext.getStore().LoggingContentGenerator:snapshotRetryMetadata()helper called in the synchronous prelude ofgenerateContent/generateContentStream— only point whereretryContextis guaranteed active for the streaming path (returned AsyncGenerator is iterated AFTERretryWithBackoffresolves). Snapshot threaded as a parameter tologgingStreamWrapper.attemptdefaults to 1 when no retry context (warmup / side-queries / direct calls).session-tracing.ts: Phase 4asampling_msformula fix +LLMRequestMetadata.requestSetupMs/attemptJSDoc updates.PR split
This is Phase 4b (~500 LOC production + ~520 LOC tests). Phase 4a (#4417, merged) shipped TTFT + GenAI semconv dual-emit. Phase 4c (next PR) —
recordApiRequestBreakdown()activation for the 3 of 4ApiRequestPhasevalues that qwen-code has real data for.Test plan
retry.test.ts+9 cases: monotonic counter,requestSetupMsgrowth across attempts, first-try success (no telemetry), onRetry callback contract, absent-callback silence, callback-throws resilience,shouldRetryOnErrormid-loop giveup, parallel-call ALS isolation, nested-retry inner-frame read.loggers.test.ts+3 cases: 3-sink fan-out,subagent_namepropagation, SDK-not-initialized path.loggingContentGenerator.test.ts+4 cases: non-stream ALS propagation, non-stream defaultattempt=1, stream ALS propagation through wrapper closure, stream defaultattempt=1.session-tracing.test.tsupdates: 1 test rewritten + 1 renamed for thesampling_msformula fix.tsc --noEmitcleaneslintcleanEnd-to-end verification (real OTel SDK + outfile)
Beyond unit tests with mocked SDK, drove the full retry → telemetry → exporter chain with the real OTel SDK initialised in outfile mode to confirm the wiring works against actual
@opentelemetry/sdk-*packages, not just mocks.Method
One-shot vitest file under
packages/core/src/utils/:Configwithtelemetry.enabled=true, target='local', outfile=<tmp>.initializeTelemetry(config)— boots the real OTel SDK withFileSpanExporter+FileLogExporter.startLLMRequestSpan.retryWithBackoff(fn, opts)wherefnthrowsstatus=500for the first 2 calls then returns'ok'on the 3rd.onRetrycallback emitsApiRetryEventvialogApiRetry(config, ...)for each failure.fn, snapshotretryContext.getStore()to verify ALS values flow correctly.endLLMRequestSpanwith the final attempt's retry context fields.shutdownTelemetry()to force-flush exporters.Inputs
phase4b-test-modelprompt_idverify-prompt-1status=500each)initialDelayMs/maxDelayMsmaxAttemptsinputTokens=100,outputTokens=20,ttftMs=80,durationMs=800Outputs (verbatim from outfile, formatting collapsed)
qwen-code.api_retryLogRecord 1:{ "session.id": "phase4b-verify-session", "event.name": "qwen-code.api_retry", "model": "phase4b-test-model", "prompt_id": "verify-prompt-1", "attempt_number": 1, "error_type": "Error", "error_message": "Simulated 500 #1", "status_code": 500, "retry_delay_ms": 28.80, "duration_ms": 28.80 } // _body: "API retry attempt 1 for phase4b-test-model (status 500)."qwen-code.api_retryLogRecord 2: same shape withattempt_number=2,retry_delay_ms=50.17.qwen-code.llm_requestspan attributes:{ "qwen-code.model": "phase4b-test-model", "gen_ai.request.model": "phase4b-test-model", "duration_ms": 800, "input_tokens": 100, "gen_ai.usage.input_tokens": 100, "output_tokens": 20, "gen_ai.usage.output_tokens": 20, "ttft_ms": 80, "gen_ai.server.time_to_first_token": 0.08, "request_setup_ms": 81, "attempt": 3, "retry_total_delay_ms": 78.97, "sampling_ms": 720, "output_tokens_per_second": 27.78, "success": true }Expectations matrix
attempt_numbermonotonicstatus_codepropagatedretry_delay_ms≈ jittered backoffattemptretry_total_delay_ms≈ sum of backoffsrequest_setup_ms≥retry_total_delay_ms(includes attempt durations)sampling_ms(Phase 4a fix:duration - ttft, NOT... - setup)output_tokens_per_secondgen_ai.server.time_to_first_token(seconds)logApiRetryThe Phase 4a
sampling_msformula fix is particularly important to call out: under the old (buggy) formuladuration - ttft - setup, Phase 4b's non-zerorequest_setup_ms=81would have producedsampling_ms = 800 - 80 - 81 = 639(wrong) instead of the correct720. Verified the fix produces the right value live.Reproduction
The verification file was throwaway and removed after running. To re-run:
Outfile written to
/var/folders/.../telemetry.jsonl(test mirrors a copy to/tmp/phase4b-e2e-output.jsonlfor inspection).Out of scope (deferred / pre-existing)
ApiRetryEventrecords underQWEN_CODE_UNATTENDED_RETRY). Aggregatedattempt/retry_total_delay_mson the LLM span remain accurate regardless.maxRetries=3) remain invisible — operator awareness.for await) bypassretryWithBackoffentirely. Pre-existing behavior, not a Phase 4b regression.shouldRetryOnContentcontent-retry path (retry.ts:184-193) skipsonRetry. No caller in the codebase uses this path today — code path is dead. JSDoc documents the contract.ApiRetryEventvsContentRetryEventcoexistenceCleanly partitioned by failure mode — operators summing "total retry count for a prompt" must sum both event streams:
ContentRetryEvent(existing) — emitted bygeminiChat's for-loop forInvalidStreamErrorretries viaINVALID_CONTENT_RETRY_OPTIONS(NOT throughretryWithBackoff).ApiRetryEvent(new in 4b) — emitted byretryWithBackoff'sonRetrycallback for HTTP 429 / 5xx retries.ApiRetryEventJSDoc explicitly referencesContentRetryEventto document this.Bridge span parentage
onRetryfires from insideretry.ts's catch block — at that moment, the failed attempt's LLM span has already ended, and the next attempt's span hasn't started.LogToSpanProcessor-bridgedqwen-code.api_retryspans therefore parent to the caller's active span (typically interaction or tool), appearing as siblings of the failed LLM span, not children. This is by design and produces a coherent timeline:failed-LLM-span → api_retry-bridge-sibling → next-LLM-spanreads correctly in trace viewers.