fix(telemetry): improve LogToSpan bridge error info and TUI handling#4482
Conversation
📋 Review SummaryThis PR addresses two important issues in the 🔍 General Feedback
🎯 Specific Feedback🟡 High
🟢 Medium
🔵 Low
✅ Highlights
|
There was a problem hiding this comment.
Pull request overview
This PR improves the OpenTelemetry log-to-span “bridge” behavior used when an OTLP backend supports traces/metrics but not logs, by (1) emitting more actionable export-failure diagnostics and (2) preventing those diagnostics from polluting the interactive Ink TUI by allowing hosts to route them to a configurable sink.
Changes:
- Add an injectable
diagnosticsSinktoLogToSpanProcessor, and route internal diagnostics (export failure/timeout, buffer overflow, exporter throw) through it. - Improve export-failure diagnostics formatting to include a structured error identifier, optional numeric HTTP status, and a truncated response-body snippet with JSON escaping to keep logs single-line.
- Update telemetry SDK wiring and add/extend unit tests to cover interactive vs non-interactive routing and diagnostic formatting edge cases.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| packages/core/src/telemetry/sdk.ts | Injects a diagnostics sink in interactive mode to route bridge diagnostics into the OTEL debug logger instead of raw stderr (avoids TUI breakage). |
| packages/core/src/telemetry/sdk.test.ts | Adds coverage ensuring interactive mode injects the sink and non-interactive mode retains stderr behavior; adjusts assertions to be robust to added options. |
| packages/core/src/telemetry/log-to-span-processor.ts | Introduces LogToSpanDiagnosticsSink, default stderr sink, sink-safe emission helper, and richer export error formatting. |
| packages/core/src/telemetry/log-to-span-processor.test.ts | Adds extensive tests for diagnostic formatting (newline escaping, truncation, httpCode emission) and sink routing behavior. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| const extra = err as { code?: number; data?: string }; | ||
| const msg = err.message || err.name || 'unknown'; | ||
| const parts = [`error=${JSON.stringify(msg)}`]; | ||
| // `code` is only meaningful as an HTTP status. Networking errors surface | ||
| // string codes like 'ECONNREFUSED' on the same field — labelling those as | ||
| // `httpCode` would be a lie, so only emit for numeric codes. | ||
| if (typeof extra.code === 'number') parts.push(`httpCode=${extra.code}`); | ||
| if (typeof extra.data === 'string' && extra.data.length > 0) { | ||
| parts.push(`data=${JSON.stringify(extra.data.slice(0, 200))}`); | ||
| } |
There was a problem hiding this comment.
Addressed in b7aa282 by going the wording route: renamed the test name and reworded the comment to say "characters" (and added a note that the cap is a leak/noise budget, not a hard byte limit). Switching to a byte-based truncation via Buffer/TextEncoder would be more correct but adds runtime cost on every diagnostic for marginal benefit — the cap is there to bound log noise, and an over-budget UTF-8 emoji string at 199 chars is still well within "reasonable diagnostic size".
| it('truncates response data snippets to 200 bytes before stringifying', async () => { | ||
| await processor.shutdown(); | ||
| const err = Object.assign(new Error(''), { | ||
| name: 'OTLPExporterError', | ||
| code: 500, | ||
| data: 'x'.repeat(500), | ||
| }); |
There was a problem hiding this comment.
Renamed in b7aa282 to "truncates response data snippets to 200 characters before stringifying" so the assertion matches the actual code-unit semantics. See the sibling thread on line 464 for why we kept the cheaper char-based slice.
| throw new Error('sink exploded'); | ||
| }); | ||
| const exportFn = vi.fn((_spans, cb: (r: { code: number }) => void) => | ||
| cb({ code: 1, error: new Error('boom') }), |
There was a problem hiding this comment.
[Critical] TypeScript compilation error (TS2353): Object literal may only specify known properties, and 'error' does not exist in type '{ code: number; }'.
The callback type annotation is too narrow — it doesn't include the error property that ExportResult carries. This causes tsc to fail and is the root cause of the CI test/lint failures.
| cb({ code: 1, error: new Error('boom') }), | |
| const exportFn = vi.fn((_spans, cb: (r: { code: number; error?: Error }) => void) => |
— qwen3.7-max via Qwen Code /review
There was a problem hiding this comment.
Fixed in b7aa282 — widened the callback type to { code: number; error?: Error } so the literal with error is accepted. This was indeed the root cause of the CI lint/test failures across all 3 OS; tests are now green locally.
| code: 500, | ||
| data: '{\n "error": "boom"\n}', | ||
| }); | ||
| processor = makeFailingProcessor(err); |
There was a problem hiding this comment.
[Suggestion] Dead makeFailingProcessor(err) call — the processor created here is immediately overwritten by the explicit new LogToSpanProcessor(...) below without ever being used. The orphaned processor retains a live setInterval timer that is never cleaned up.
| processor = makeFailingProcessor(err); | |
| const sink = vi.fn(); |
— qwen3.7-max via Qwen Code /review
There was a problem hiding this comment.
Fixed in b7aa282 — removed the orphaned makeFailingProcessor(err) assignment. Good catch on the timer leak; setInterval from the discarded processor would have run until the test process exited.
| process.stderr.write( | ||
| `[LogToSpan] export threw: ${err instanceof Error ? err.message : String(err)}\n`, | ||
| this.emitDiagnostic( | ||
| `[LogToSpan] export threw: ${err instanceof Error ? err.message : String(err)}`, |
There was a problem hiding this comment.
[Suggestion] The export threw diagnostic interpolates err.message / String(err) raw, without JSON escaping. Meanwhile, formatExportError() (added in this same PR) was specifically designed to enforce the single-line invariant via JSON.stringify. A synchronous exporter throw with embedded newlines in its message will produce a multi-line diagnostic — the exact failure mode formatExportError was added to prevent.
Additionally, the String(err) false branch (for non-Error throws like throw 'raw string') is untested.
| `[LogToSpan] export threw: ${err instanceof Error ? err.message : String(err)}`, | |
| `[LogToSpan] export threw: ${JSON.stringify(err instanceof Error ? err.message : String(err))}`, |
— qwen3.7-max via Qwen Code /review
There was a problem hiding this comment.
Fixed in b7aa282 — export threw now JSON.stringify-s its payload, matching the single-line invariant enforced by formatExportError. Also added two test cases: one for embedded-newline payloads, and one exercising the String(err) branch via a non-Error throw (the previously-untested fallback).
- Fix TS2353 compile error in keeps-processing-after-sink-throw test:
the mock callback type was narrowed to `{ code: number }` and rejected
the `error?: Error` field that `ExportResult` actually carries. Widen
the type. (wenshao Critical — this was the root cause of CI lint/test
failures across all 3 OS.)
- JSON.stringify the payload of the `export threw` diagnostic so a
synchronously-thrown error with embedded newlines stays on one line,
same single-line invariant enforced by `formatExportError`. Add
coverage for both the newline case and the non-Error throw branch.
(wenshao Suggestion)
- Remove the dead `makeFailingProcessor(err)` call in the JSON-escape
test that was immediately overwritten — the orphaned processor
retained a live `setInterval` timer with no cleanup. (wenshao
Suggestion)
- Rename the "200 bytes" test name and comment to "200 characters" to
match the actual `string.slice(0, 200)` (UTF-16 code units) behavior;
add a note on the cap being a leak/noise budget, not a hard byte
limit. (Copilot 2x)
- Strengthen the non-interactive test to actually trigger a failed
export against the real `LogToSpanProcessor` and assert the default
sink writes to stderr, not just that `diagnosticsSink === undefined`.
(github-actions High QwenLM#2)
- Reword the "shell-active bytes" comment to "characters that would
break log parsing" — the actual concern is log-line tearing, not
shell semantics. (github-actions Medium)
- Update class JSDoc to mention the diagnostics-sink responsibility
alongside the bridge purpose. (github-actions Low)
- Minor JSDoc wording fix on `LogToSpanDiagnosticsSink` type for
clarity around the no-trailing-newline contract. (github-actions Low)
🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
|
Triaged the automated review summary against b7aa282: 🟡 High — sink-failure fallback to stderr: Push back. The whole point of the 🟡 High — verify non-interactive mode actually emits to stderr: Valid, fixed. The previous test only asserted 🟢 Medium — configurable 🟢 Medium — "shell-active bytes" wording: Fixed. Reworded to "characters that would break log parsing" — the concern is log-line tearing for greps and the file-logger format, not shell semantics. 🟢 Medium — explicit 🔵 Low — class JSDoc mentions TUI fix: Fixed. Added a paragraph pointing at the 🔵 Low — extract 🔵 Low — sink type JSDoc wording: Fixed. Reworded to "Messages are passed without a trailing newline — the sink implementation decides how to terminate them" for clarity. 🔵 Low — buffer-overflow full-format assertion: Skipping. That test verifies a specific signal (the warning prefix). Pinning to the full format would just churn on every wording tweak without catching anything the prefix doesn't. 🤖 Generated with Qwen Code |
wenshao
left a comment
There was a problem hiding this comment.
No issues found. LGTM! ✅ — qwen3.7-max via Qwen Code /review
Qwen Code Review (STANDARD)What this PR does (2–3 sentences synthesized from the diff and PR title/body): 🟡 P2 — Medium
🟢 P3 — Low
Cross-file notes
✅ Highlights
Validation Evidence
Qwen Code |
|
Triage of the automated review: 🟡 P2 #1 (TSDoc 🟡 P2 #2 (sink drops diagnostics if no debug session): Acknowledged trade-off — `setDebugLogSession` is called in the `Config` constructor (`config.ts:958`) before `initializeTelemetry` (`config.ts:1136`), so in normal startup the session is established before the bridge constructs. The startup-window concern only materializes if telemetry init somehow precedes Config construction, which the current call graph forbids. Documented in `emitDiagnostic` JSDoc with the `isDebugLoggingDegraded()` fallback rationale; no follow-up planned. 🟢 P3 #1 (extract No code change this round. 🤖 Generated with Qwen Code |
Qwen Code Review (DEEP)CI-safe profile adapted from bundled Correctness / SecurityNo correctness or security issues found. The PR is well-implemented:
Needs VerificationNo concerns requiring additional verification. Validation Evidence
Test Coverage
Needs VerificationNo cross-file concerns — the changed symbols ( Validation Evidence
Maintainability / PerformanceNo maintainability or performance issues found. Code-quality highlights (verified against diff + surrounding code):
Performance highlights:
Needs Verification
Validation EvidencePRESENT — PR body cites unit results (
Undirected AuditNo additional undirected-audit issues found. I reviewed the diff through attacker, 3 AM oncall, and six-months-later maintainer personas, cross-referencing the production code paths in the working tree. The changes are tight and well-reasoned:
Needs VerificationNo concerns that require code outside the supplied context. All production call sites ( Qwen Code |
Add coverage for two paths flagged by the DEEP-tier review on QwenLM#4482: - `err.message || err.name || 'unknown'` chain: the third branch (both message and name empty) was never exercised. Scenario: minified environments that strip `Error.name`. Test constructs `Object.assign(new Error(''), { name: '' })` and asserts the output contains `error="unknown"`. - `typeof extra.data === 'string' && extra.data.length > 0` guard: the empty-string case (HTTP response with empty body) was never tested, so a future loosening to `!== undefined` would silently start emitting `data=""`. Test asserts `data=` is absent. Both branches are real and reachable in production failure modes; the tests are guards for the documented intent. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
|
Triage of the DEEP-tier review (correctness/security, test coverage, maintainability/performance, undirected audit): Correctness / Maintainability / Undirected: no findings, all positive. Test Coverage (2 × P3): both real branches that weren't covered. Repro confirmed each is reachable, and the bot's "future-refactor regression" concern on the
Tests now 46/46 green. 🤖 Generated with Qwen Code |
wenshao
left a comment
There was a problem hiding this comment.
No issues found. LGTM! ✅ — qwen3.7-max via Qwen Code /review
| // units), not bytes — non-ASCII payloads may stringify to more bytes; this | ||
| // is fine because the cap is a leak/noise budget, not a hard byte limit. | ||
| function formatExportError(err: Error | undefined): string { | ||
| if (!err) return 'error=unknown'; |
There was a problem hiding this comment.
[Suggestion] Inconsistent quoting: the early return produces error=unknown (unquoted), but when message and name are both empty, the fallthrough path yields error="unknown" (JSON-quoted via JSON.stringify). Two semantically identical "unknown error" states produce greppable-different output.
| if (!err) return 'error=unknown'; | |
| if (!err) return 'error="unknown"'; |
Also update the corresponding test expectation at line 873 from 'code=1 error=unknown\n' to 'code=1 error="unknown"\n'.
— qwen3.7-max via Qwen Code /review
There was a problem hiding this comment.
Fixed in d6f63b4 — early-return now emits error="unknown" matching the JSON-quoted fallthrough path. Updated test expectation. Both paths greppable-identical now.
| // is fine because the cap is a leak/noise budget, not a hard byte limit. | ||
| function formatExportError(err: Error | undefined): string { | ||
| if (!err) return 'error=unknown'; | ||
| const extra = err as { code?: number; data?: string }; |
There was a problem hiding this comment.
[Suggestion] The type assertion claims code?: number, but at runtime err.code can be a string (e.g. 'ECONNREFUSED' — exercised by the PR's own ECONNREFUSED test). The typeof guard is correct, but the annotation misleads future maintainers who might simplify it to if (extra.code) and incorrectly label ECONNREFUSED as httpCode=ECONNREFUSED.
| const extra = err as { code?: number; data?: string }; | |
| const extra = err as { code?: number | string; data?: string }; |
— qwen3.7-max via Qwen Code /review
There was a problem hiding this comment.
Fixed in d6f63b4 — widened to code?: number | string and added a comment marking the typeof === 'number' guard as load-bearing. Type now matches the runtime reality your ECONNREFUSED test exercises.
| // `httpCode` would be a lie, so only emit for numeric codes. | ||
| if (typeof extra.code === 'number') parts.push(`httpCode=${extra.code}`); | ||
| if (typeof extra.data === 'string' && extra.data.length > 0) { | ||
| parts.push(`data=${JSON.stringify(extra.data.slice(0, 200))}`); |
There was a problem hiding this comment.
[Suggestion] No test exercises formatExportError when err.data is a non-string truthy value (e.g. Buffer, number, plain object). The typeof extra.data === 'string' guard's false branch is only tested for undefined and ''. If a future OTLP exporter version returns binary response data, the diagnostic silently omits the data field with no test to catch it.
Consider adding:
it('omits data field when err.data is a non-string type (e.g. Buffer)', async () => {
const err = Object.assign(new Error('fail'), { code: 500, data: Buffer.from('binary') });
// ... assert msg contains httpCode=500 but not data=
});— qwen3.7-max via Qwen Code /review
There was a problem hiding this comment.
Added in d6f63b4 — it('omits data field when err.data is a non-string truthy value (e.g. Buffer)', ...). Same guard-against-future-loosening rationale you (and the DEEP-tier bot) raised on the empty-string case.
| process.stderr.write( | ||
| `[LogToSpan] export timeout after ${EXPORT_TIMEOUT_MS}ms\n`, | ||
| this.emitDiagnostic( | ||
| `[LogToSpan] export timeout after ${EXPORT_TIMEOUT_MS}ms`, |
There was a problem hiding this comment.
[Suggestion] The export timeout diagnostic omits the batch span count. spans.length is available in the closure but not included in the message. When an operator sees a timeout, they cannot distinguish "network is slow" from "batch was too large."
| `[LogToSpan] export timeout after ${EXPORT_TIMEOUT_MS}ms`, | |
| this.emitDiagnostic( | |
| `[LogToSpan] export timeout after ${EXPORT_TIMEOUT_MS}ms (${spans.length} span(s))`, | |
| ); |
— qwen3.7-max via Qwen Code /review
There was a problem hiding this comment.
Fixed in d6f63b4 — timeout diagnostic now reads export timeout after 30000ms (N span(s)). Updated the matching test regex accordingly.
| // JSON.stringify so embedded newlines in the thrown message stay on | ||
| // one line — same single-line invariant enforced by formatExportError. | ||
| const detail = err instanceof Error ? err.message : String(err); | ||
| this.emitDiagnostic( |
There was a problem hiding this comment.
[Suggestion] The export-threw (synchronous exception) path extracts only err.message, while the callback-based failure path uses formatExportError which additionally surfaces httpCode and data. If an exporter synchronously throws an OTLPExporterError (carrying code and data), those fields are silently discarded. Consider reusing formatExportError for Error instances:
| this.emitDiagnostic( | |
| const detail = err instanceof Error | |
| ? formatExportError(err) | |
| : `error=${JSON.stringify(String(err))}`; | |
| this.emitDiagnostic( | |
| `[LogToSpan] export threw: ${detail}`, | |
| ); |
— qwen3.7-max via Qwen Code /review
There was a problem hiding this comment.
Fixed in d6f63b4 — sync-throw path now delegates Error instances to formatExportError, picking up httpCode + data when an OTLPExporterError is thrown synchronously. Non-Error throws keep the JSON.stringify(String(err)) fallback for the single-line invariant. Added a regression test (surfaces httpCode/data when a sync-thrown error carries OTLPExporterError fields).
| // alone so CI / scripts can still see export failures on stderr — | ||
| // the canonical diagnostic channel for batch runs. | ||
| ...(config.isInteractive() && { | ||
| diagnosticsSink: (message: string) => debugLogger.warn(message), |
There was a problem hiding this comment.
[Suggestion] When isInteractive() is true but no debug log session is established yet (or QWEN_DEBUG_LOG_FILE=0), debugLogger.warn() silently no-ops. The diagnostic is fully lost — no stderr, no file. The emitDiagnostic JSDoc references isDebugLoggingDegraded() as the safety net, but that function only tracks file-write failures, not "no session" or "logging disabled."
Consider guarding the injection so the processor falls back to the default stderr sink when the debug logger won't write, or at minimum document this tradeoff in the code comment.
— qwen3.7-max via Qwen Code /review
There was a problem hiding this comment.
Partial — pushed back on the stderr fallback but documented the trade-off in d6f63b4.
The two failure modes you described:
-
"No session yet": in the normal call graph this can't happen —
setDebugLogSession(this)runs in theConfigconstructor atconfig.ts:958, whileinitializeTelemetry(this)runs later in the sameinitialize()flow atconfig.ts:1136. The session is established before the bridge is constructed, let alone before the bridge can fire its first export. -
QWEN_DEBUG_LOG_FILE=0: this is a user-explicit opt-out. Falling back to stderr here would re-introduce exactly the TUI-pollution problem the sink injection was added to prevent — for an audience (interactive-mode users with debug logging deliberately disabled) that has signed up for full silence by setting that env var.
Documented as an accepted trade-off at the sink wiring site. isDebugLoggingDegraded() is admittedly only a partial signal (file-write failures only) — if it turns out to be insufficient in practice, the right escalation is to broaden that signal in debugLogger.ts so all silent-loss cases surface uniformly, rather than splitting the policy at each sink consumer.
The OTel `LogToSpanProcessor` bridge (used when traces+metrics are over
OTLP but logs aren't, e.g. Alibaba Cloud ARMS) had two diagnostic issues:
1. Empty error messages. When the OTLP HTTP exporter callback returned
`{ code: FAILED, error }`, `error.message` is the HTTP reason-phrase —
always empty on HTTP/2. The bridge printed literally
`[LogToSpan] export failed: code=1 error=` with zero actionable info.
Now we surface `name`, `httpCode` (only when numeric), and a 200-byte
`data` snippet from the underlying OTLPExporterError, with JSON-escape
on user content so embedded newlines can't tear the log line.
2. TUI pollution. The processor wrote diagnostics to `process.stderr`
directly. Ink only manages stdout, so those writes punched through
into the rendered terminal area. The processor now accepts an
injectable `diagnosticsSink`; in interactive mode `sdk.ts` injects a
sink that routes through `debugLogger.warn` (file-backed). Non-
interactive runs (CI/scripts) keep the default stderr sink so export
failures remain visible on the canonical batch-diagnostic channel.
Backward compatibility is preserved: the legacy numeric-arg constructor
keeps stderr behavior; the options-object overload gains the new field.
Other raw `process.stderr.write` sites in the CLI (errors.ts,
startupProfiler.ts, useGeminiStream.ts, etc.) have the same TUI-leak
pattern but are intentionally left out of this PR.
🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
- Fix TS2353 compile error in keeps-processing-after-sink-throw test:
the mock callback type was narrowed to `{ code: number }` and rejected
the `error?: Error` field that `ExportResult` actually carries. Widen
the type. (wenshao Critical — this was the root cause of CI lint/test
failures across all 3 OS.)
- JSON.stringify the payload of the `export threw` diagnostic so a
synchronously-thrown error with embedded newlines stays on one line,
same single-line invariant enforced by `formatExportError`. Add
coverage for both the newline case and the non-Error throw branch.
(wenshao Suggestion)
- Remove the dead `makeFailingProcessor(err)` call in the JSON-escape
test that was immediately overwritten — the orphaned processor
retained a live `setInterval` timer with no cleanup. (wenshao
Suggestion)
- Rename the "200 bytes" test name and comment to "200 characters" to
match the actual `string.slice(0, 200)` (UTF-16 code units) behavior;
add a note on the cap being a leak/noise budget, not a hard byte
limit. (Copilot 2x)
- Strengthen the non-interactive test to actually trigger a failed
export against the real `LogToSpanProcessor` and assert the default
sink writes to stderr, not just that `diagnosticsSink === undefined`.
(github-actions High QwenLM#2)
- Reword the "shell-active bytes" comment to "characters that would
break log parsing" — the actual concern is log-line tearing, not
shell semantics. (github-actions Medium)
- Update class JSDoc to mention the diagnostics-sink responsibility
alongside the bridge purpose. (github-actions Low)
- Minor JSDoc wording fix on `LogToSpanDiagnosticsSink` type for
clarity around the no-trailing-newline contract. (github-actions Low)
🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
Add coverage for two paths flagged by the DEEP-tier review on QwenLM#4482: - `err.message || err.name || 'unknown'` chain: the third branch (both message and name empty) was never exercised. Scenario: minified environments that strip `Error.name`. Test constructs `Object.assign(new Error(''), { name: '' })` and asserts the output contains `error="unknown"`. - `typeof extra.data === 'string' && extra.data.length > 0` guard: the empty-string case (HTTP response with empty body) was never tested, so a future loosening to `!== undefined` would silently start emitting `data=""`. Test asserts `data=` is absent. Both branches are real and reachable in production failure modes; the tests are guards for the documented intent. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
- Quote `error="unknown"` in the err-missing early return so it matches
the JSON.stringify output produced when message+name fall back to
'unknown'. Two paths now emit identical greppable output for
semantically identical "unknown error" states.
- Widen the duck-typed cast to `code?: number | string` and add a
load-bearing comment on the `typeof === 'number'` guard. The type now
matches reality (Node networking errors surface string codes like
ECONNREFUSED), preventing a future simplification to `if (extra.code)`
that would mislabel networking errors as HTTP statuses.
- Reuse `formatExportError` in the sync-throw path so a synchronously-
thrown OTLPExporterError surfaces its httpCode and data, matching the
callback-failure path. Non-Error throws still fall back to
JSON.stringify on String(err) to preserve the single-line invariant.
- Include batch span count in the timeout diagnostic
("(N span(s))") — lets an operator distinguish slow network from
oversized batch when troubleshooting timeouts.
- Add a test for non-string truthy err.data (Buffer) — the `typeof ===
'string'` guard's false branch was only covered for undefined and
empty string, so a future refactor relaxing the guard would silently
start emitting binary garbage with no test to catch it.
- Document the QWEN_DEBUG_LOG_FILE=0 trade-off at the sink wiring site:
interactive mode plus disabled debug log = full diagnostic silence.
This is an accepted user opt-in trade-off; falling back to stderr
would re-introduce the TUI pollution this injection prevents.
🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
d6f63b4 to
7fc32b1
Compare
wenshao
left a comment
There was a problem hiding this comment.
No review findings. Downgraded from Approve to Comment: CI still running. — qwen3.7-max via Qwen Code /review
wenshao
left a comment
There was a problem hiding this comment.
No issues found. LGTM! ✅ — qwen3.7-max via Qwen Code /review
本地真实测试报告(merge 参考)按 PR body 的 Test plan 完成了静态检查 + 单元测试 + 真实 tmux 端到端验证。LogToSpan bridge 的诊断输出确实被正确路由到 debug 日志文件、TUI 渲染区彻底干净,新的 测试环境
1. 构建与静态检查2. 单元测试
实际数量 > PR body 声明,因为后续两次 review-driven commit 又补了 25 个 case: 3. 真实 CLI 端到端(tmux + 假 OTLP 服务器)这是 PR body Test plan 里 "Manual verification" 的可重复版本——验证两件事:
步骤
export QWEN_TELEMETRY_ENABLED=true
export QWEN_TELEMETRY_OTLP_PROTOCOL=http
export QWEN_TELEMETRY_OTLP_ENDPOINT="" # ← 空串,让 logsUrl=undefined
export QWEN_TELEMETRY_OTLP_TRACES_ENDPOINT=http://127.0.0.1:14318/v1/traces
unset QWEN_TELEMETRY_TARGET # 注意:valid 值仅 local/gcp,不需要设
tmux new-session -d -s pr4482 -x 220 -y 50
tmux send-keys -t pr4482 'npm run dev' Enter
假服务器收到的请求Debug 日志文件中的 bridge 诊断(关键证据)
逐项核对:
TUI 渲染区扫描(反向证据)整个 2000 行 scrollback 内零 覆盖矩阵:PR claim → 验证
4. 一点配置可用性提醒(非阻塞)我在搭建复现时踩过两个坑,文档/帮助里似乎没明说,记下来供其他 reviewer 或后续用户参考:
5. 结论
从本地真实测试角度,该 PR 可以合并。 ARMS Singapore 那种 "traces+metrics 通但 logs 不通" 的真实失败场景下,运维拿到的诊断从原来的 |
|
感谢这么细的端到端验证 🙏 特别想 highlight 几条:
第 4 节那两条 OTLP 配置可用性提示( PR side 不再有改动,等 OWNER/MEMBER 一票就能合。再次感谢。 🤖 Generated with Qwen Code |
…enLM#4482) Co-authored-by: Seth Troisi <sethtroisi@google.com>
Summary
Two fixes to the
LogToSpanProcessorbridge — used when the OTLP backend supports traces+metrics but not logs (e.g. Alibaba Cloud ARMS):[LogToSpan] export failed: code=1 error=becauseOTLPExporterError.messageis the HTTP reason-phrase, which is always empty on HTTP/2 (and frequently stripped by gateways). Now we surfaceerror=\"<name|message>\",httpCode=<n>(only when numeric —ECONNREFUSEDand friends are not labelled as HTTP codes), and a 200-bytedatasnippet from the response body. All user-supplied content is JSON-escaped so embedded newlines cannot tear the log line.process.stderr, which Ink does not control, so messages broke out of the rendered terminal area (see screenshot in https://github.com/QwenLM/qwen-code/issues/... if a tracking issue exists). The processor now accepts an injectablediagnosticsSink; in interactive modesdk.tsinjects a sink that routes throughdebugLogger.warn(file-backed). Non-interactive runs (CI, scripts,-p) keep the default stderr sink so export failures stay visible on the canonical batch-diagnostic channel.Reproduction (before)
DataWorks online IDE + ARMS Singapore internal endpoint:
And the TUI shows:
[LogToSpan] export failed: code=1 error=After this PR (interactive): the diagnostic goes to
~/.qwen/logs/<session>.logonly, with full context:Backward compatibility
new LogToSpanProcessor(exporter, flushIntervalMs, maxBufferSize)) is preserved and keeps stderr behavior.diagnosticsSinkis a new optional field on the options-object overload only.LogToSpanDiagnosticsSinkis exported from the processor module but is NOT re-exported fromtelemetry/index.tsorcore/src/index.ts— internal API, no semver implication.Out of scope
Other raw
process.stderr.writecall sites in the CLI (utils/errors.ts,startupProfiler.ts,useGeminiStream.ts:2543,acpAgent.ts, others) have the same TUI-leak pattern but are left for follow-up to keep this PR focused.Test plan
vitest run packages/core/src/telemetry/log-to-span-processor.test.ts— 42 cases (covers the 4 diagnostic sites routing through the sink, sink-throw recovery, format edge cases including newlines, ECONNREFUSED, missing error, and data truncation)vitest run packages/core/src/telemetry/sdk.test.ts— 38 cases (covers interactive injects file sink + non-interactive leaves stderr sink alone)eslintclean on all 4 touched files~/.qwen/logs/<session>.loginstead of the TUI render area🤖 Generated with Qwen Code