Skip to content

fix(telemetry): improve LogToSpan bridge error info and TUI handling#4482

Merged
doudouOUC merged 4 commits into
QwenLM:mainfrom
doudouOUC:fix/log-to-span-bridge-diagnostics
May 27, 2026
Merged

fix(telemetry): improve LogToSpan bridge error info and TUI handling#4482
doudouOUC merged 4 commits into
QwenLM:mainfrom
doudouOUC:fix/log-to-span-bridge-diagnostics

Conversation

@doudouOUC

Copy link
Copy Markdown
Collaborator

Summary

Two fixes to the LogToSpanProcessor bridge — used when the OTLP backend supports traces+metrics but not logs (e.g. Alibaba Cloud ARMS):

  • Useful error messages when an export fails. Previously a failed export logged literally [LogToSpan] export failed: code=1 error= because OTLPExporterError.message is the HTTP reason-phrase, which is always empty on HTTP/2 (and frequently stripped by gateways). Now we surface error=\"<name|message>\", httpCode=<n> (only when numeric — ECONNREFUSED and friends are not labelled as HTTP codes), and a 200-byte data snippet from the response body. All user-supplied content is JSON-escaped so embedded newlines cannot tear the log line.
  • No more TUI pollution from bridge diagnostics. The processor wrote diagnostics to raw process.stderr, which Ink does not control, so messages broke out of the rendered terminal area (see screenshot in https://github.com/QwenLM/qwen-code/issues/... if a tracking issue exists). The processor now accepts an injectable diagnosticsSink; in interactive mode sdk.ts injects a sink that routes through debugLogger.warn (file-backed). Non-interactive runs (CI, scripts, -p) keep the default stderr sink so export failures stay visible on the canonical batch-diagnostic channel.

Reproduction (before)

DataWorks online IDE + ARMS Singapore internal endpoint:

2026-05-24T11:28:01.252Z [ERROR] [OTEL] {\"message\":\"PeriodicExportingMetricReader: metrics export failed (error OTLPExporterError)\"...}

And the TUI shows: [LogToSpan] export failed: code=1 error=

After this PR (interactive): the diagnostic goes to ~/.qwen/logs/<session>.log only, with full context:

[WARN] [OTEL] [LogToSpan] export failed: code=1 error=\"OTLPExporterError\" httpCode=403 data=\"...\"

Backward compatibility

  • The legacy 3-arg numeric constructor (new LogToSpanProcessor(exporter, flushIntervalMs, maxBufferSize)) is preserved and keeps stderr behavior.
  • diagnosticsSink is a new optional field on the options-object overload only.
  • LogToSpanDiagnosticsSink is exported from the processor module but is NOT re-exported from telemetry/index.ts or core/src/index.ts — internal API, no semver implication.

Out of scope

Other raw process.stderr.write call sites in the CLI (utils/errors.ts, startupProfiler.ts, useGeminiStream.ts:2543, acpAgent.ts, others) have the same TUI-leak pattern but are left for follow-up to keep this PR focused.

Test plan

  • Unit: vitest run packages/core/src/telemetry/log-to-span-processor.test.ts — 42 cases (covers the 4 diagnostic sites routing through the sink, sink-throw recovery, format edge cases including newlines, ECONNREFUSED, missing error, and data truncation)
  • Unit: vitest run packages/core/src/telemetry/sdk.test.ts — 38 cases (covers interactive injects file sink + non-interactive leaves stderr sink alone)
  • Lint: eslint clean on all 4 touched files
  • Manual verification: ARMS Singapore endpoint failure → diagnostic now reaches ~/.qwen/logs/<session>.log instead of the TUI render area

🤖 Generated with Qwen Code

Copilot AI review requested due to automatic review settings May 24, 2026 16:14
@github-actions

Copy link
Copy Markdown
Contributor

📋 Review Summary

This PR addresses two important issues in the LogToSpanProcessor bridge: improving error message quality when OTLP exports fail, and preventing TUI pollution by routing diagnostics through an injectable sink. The implementation is well-designed with excellent test coverage (42 new test cases), maintains backward compatibility, and demonstrates thoughtful API design.

🔍 General Feedback

  • Strong test coverage: 42 new test cases comprehensively cover edge cases including HTTP/2 empty messages, JSON escaping, data truncation, networking error codes, and sink injection behavior
  • Good backward compatibility: The 3-arg numeric constructor is preserved, and diagnosticsSink is only available on the options-object overload
  • Thoughtful API design: The injectable sink pattern is well-executed, with a sensible default that preserves existing behavior for non-interactive runs
  • Excellent documentation: Code comments clearly explain tradeoffs (e.g., why sink errors are swallowed, why non-numeric codes aren't labeled as HTTP codes)
  • Security-conscious: JSON-escaping prevents log line tearing and potential log injection attacks

🎯 Specific Feedback

🟡 High

  • File: log-to-span-processor.ts:273-279 - The emitDiagnostic method swallows all sink errors to prevent interrupting telemetry ingestion. While the comment explains the tradeoff, consider adding a fallback mechanism (e.g., after N consecutive sink failures, fall back to stderr with a rate limit). Currently, a misbehaving sink could silently lose all diagnostics indefinitely.

  • File: sdk.ts:274-277 - The conditional injection of diagnosticsSink only happens when config.isInteractive() is true. However, there's no test verifying that non-interactive mode with isInteractive() === false actually uses the default stderr sink (the test only checks that diagnosticsSink is undefined). Consider adding a test that verifies stderr output in non-interactive mode when an export fails.

🟢 Medium

  • File: log-to-span-processor.ts:460 - The formatExportError function truncates response data to 200 bytes, which is reasonable. However, consider making this configurable via LogToSpanProcessorOptions for environments where longer error payloads are valuable (e.g., debugging complex API errors). The constant could be maxErrorDataBytes?: number with a default of 200.

  • File: log-to-span-processor.ts:454 - The comment mentions "shell-active bytes" but the actual concern is log parsing/grep corruption, not shell activity. Consider rewording to "log parsing-active characters" or simply "characters that could break log parsing."

  • File: log-to-span-processor.test.ts:858-860 - The test for JSON-escaping verifies the message doesn't contain newlines and contains escaped sequences. Consider also asserting that the entire diagnostic message is a single line by checking msg.split('\n').length === 1 to make the intent more explicit.

🔵 Low

  • File: log-to-span-processor.ts:58-69 - The class JSDoc mentions "backends (e.g., Alibaba Cloud) that support traces and metrics but not logs." Consider updating this to also mention the TUI pollution fix, as that's now a significant part of the processor's responsibility.

  • File: sdk.ts:270-276 - The inline comment explaining the sink injection is good, but consider extracting this logic to a small helper function like createLogToSpanDiagnosticsSink(config: Config) to improve readability and make it easier to test independently.

  • File: log-to-span-processor.ts:47-54 - The LogToSpanDiagnosticsSink type JSDoc says "Messages have no trailing newline," but the default sink adds \n. This is technically correct (the sink receives the message without newline), but could be clearer. Consider: "Messages are passed without a trailing newline — the sink implementation decides how to terminate them."

  • File: log-to-span-processor.test.ts:966-968 - The test "routes buffer-overflow warnings through the injected sink" uses expect.stringContaining without verifying the exact format. Consider matching the full expected format like other tests do for consistency.

✅ Highlights

  • Excellent error handling: The formatExportError function handles numerous edge cases (empty messages, HTTP/2, networking errors, missing error objects) with appropriate fallbacks
  • Smart data truncation: Limiting response data to 200 bytes prevents log flooding while preserving enough context for debugging
  • Proper type safety: The LogToSpanDiagnosticsSink type is exported but not re-exported from public indexes, correctly marking it as internal API
  • Comprehensive test scenarios: Tests cover sink-throw recovery, ensuring the processor continues working even when the sink fails
  • Clean separation of concerns: The diagnostic routing logic is cleanly separated from the export logic, making the code easier to maintain and test

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR improves the OpenTelemetry log-to-span “bridge” behavior used when an OTLP backend supports traces/metrics but not logs, by (1) emitting more actionable export-failure diagnostics and (2) preventing those diagnostics from polluting the interactive Ink TUI by allowing hosts to route them to a configurable sink.

Changes:

  • Add an injectable diagnosticsSink to LogToSpanProcessor, and route internal diagnostics (export failure/timeout, buffer overflow, exporter throw) through it.
  • Improve export-failure diagnostics formatting to include a structured error identifier, optional numeric HTTP status, and a truncated response-body snippet with JSON escaping to keep logs single-line.
  • Update telemetry SDK wiring and add/extend unit tests to cover interactive vs non-interactive routing and diagnostic formatting edge cases.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File Description
packages/core/src/telemetry/sdk.ts Injects a diagnostics sink in interactive mode to route bridge diagnostics into the OTEL debug logger instead of raw stderr (avoids TUI breakage).
packages/core/src/telemetry/sdk.test.ts Adds coverage ensuring interactive mode injects the sink and non-interactive mode retains stderr behavior; adjusts assertions to be robust to added options.
packages/core/src/telemetry/log-to-span-processor.ts Introduces LogToSpanDiagnosticsSink, default stderr sink, sink-safe emission helper, and richer export error formatting.
packages/core/src/telemetry/log-to-span-processor.test.ts Adds extensive tests for diagnostic formatting (newline escaping, truncation, httpCode emission) and sink routing behavior.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +455 to +464
const extra = err as { code?: number; data?: string };
const msg = err.message || err.name || 'unknown';
const parts = [`error=${JSON.stringify(msg)}`];
// `code` is only meaningful as an HTTP status. Networking errors surface
// string codes like 'ECONNREFUSED' on the same field — labelling those as
// `httpCode` would be a lie, so only emit for numeric codes.
if (typeof extra.code === 'number') parts.push(`httpCode=${extra.code}`);
if (typeof extra.data === 'string' && extra.data.length > 0) {
parts.push(`data=${JSON.stringify(extra.data.slice(0, 200))}`);
}

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in b7aa282 by going the wording route: renamed the test name and reworded the comment to say "characters" (and added a note that the cap is a leak/noise budget, not a hard byte limit). Switching to a byte-based truncation via Buffer/TextEncoder would be more correct but adds runtime cost on every diagnostic for marginal benefit — the cap is there to bound log noise, and an over-budget UTF-8 emoji string at 199 chars is still well within "reasonable diagnostic size".

Comment on lines +821 to +827
it('truncates response data snippets to 200 bytes before stringifying', async () => {
await processor.shutdown();
const err = Object.assign(new Error(''), {
name: 'OTLPExporterError',
code: 500,
data: 'x'.repeat(500),
});

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed in b7aa282 to "truncates response data snippets to 200 characters before stringifying" so the assertion matches the actual code-unit semantics. See the sibling thread on line 464 for why we kept the cheaper char-based slice.

throw new Error('sink exploded');
});
const exportFn = vi.fn((_spans, cb: (r: { code: number }) => void) =>
cb({ code: 1, error: new Error('boom') }),

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Critical] TypeScript compilation error (TS2353): Object literal may only specify known properties, and 'error' does not exist in type '{ code: number; }'.

The callback type annotation is too narrow — it doesn't include the error property that ExportResult carries. This causes tsc to fail and is the root cause of the CI test/lint failures.

Suggested change
cb({ code: 1, error: new Error('boom') }),
const exportFn = vi.fn((_spans, cb: (r: { code: number; error?: Error }) => void) =>

— qwen3.7-max via Qwen Code /review

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in b7aa282 — widened the callback type to { code: number; error?: Error } so the literal with error is accepted. This was indeed the root cause of the CI lint/test failures across all 3 OS; tests are now green locally.

code: 500,
data: '{\n "error": "boom"\n}',
});
processor = makeFailingProcessor(err);

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] Dead makeFailingProcessor(err) call — the processor created here is immediately overwritten by the explicit new LogToSpanProcessor(...) below without ever being used. The orphaned processor retains a live setInterval timer that is never cleaned up.

Suggested change
processor = makeFailingProcessor(err);
const sink = vi.fn();

— qwen3.7-max via Qwen Code /review

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in b7aa282 — removed the orphaned makeFailingProcessor(err) assignment. Good catch on the timer leak; setInterval from the discarded processor would have run until the test process exited.

process.stderr.write(
`[LogToSpan] export threw: ${err instanceof Error ? err.message : String(err)}\n`,
this.emitDiagnostic(
`[LogToSpan] export threw: ${err instanceof Error ? err.message : String(err)}`,

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] The export threw diagnostic interpolates err.message / String(err) raw, without JSON escaping. Meanwhile, formatExportError() (added in this same PR) was specifically designed to enforce the single-line invariant via JSON.stringify. A synchronous exporter throw with embedded newlines in its message will produce a multi-line diagnostic — the exact failure mode formatExportError was added to prevent.

Additionally, the String(err) false branch (for non-Error throws like throw 'raw string') is untested.

Suggested change
`[LogToSpan] export threw: ${err instanceof Error ? err.message : String(err)}`,
`[LogToSpan] export threw: ${JSON.stringify(err instanceof Error ? err.message : String(err))}`,

— qwen3.7-max via Qwen Code /review

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in b7aa282export threw now JSON.stringify-s its payload, matching the single-line invariant enforced by formatExportError. Also added two test cases: one for embedded-newline payloads, and one exercising the String(err) branch via a non-Error throw (the previously-untested fallback).

doudouOUC added a commit to doudouOUC/qwen-code that referenced this pull request May 25, 2026
- Fix TS2353 compile error in keeps-processing-after-sink-throw test:
  the mock callback type was narrowed to `{ code: number }` and rejected
  the `error?: Error` field that `ExportResult` actually carries. Widen
  the type. (wenshao Critical — this was the root cause of CI lint/test
  failures across all 3 OS.)

- JSON.stringify the payload of the `export threw` diagnostic so a
  synchronously-thrown error with embedded newlines stays on one line,
  same single-line invariant enforced by `formatExportError`. Add
  coverage for both the newline case and the non-Error throw branch.
  (wenshao Suggestion)

- Remove the dead `makeFailingProcessor(err)` call in the JSON-escape
  test that was immediately overwritten — the orphaned processor
  retained a live `setInterval` timer with no cleanup. (wenshao
  Suggestion)

- Rename the "200 bytes" test name and comment to "200 characters" to
  match the actual `string.slice(0, 200)` (UTF-16 code units) behavior;
  add a note on the cap being a leak/noise budget, not a hard byte
  limit. (Copilot 2x)

- Strengthen the non-interactive test to actually trigger a failed
  export against the real `LogToSpanProcessor` and assert the default
  sink writes to stderr, not just that `diagnosticsSink === undefined`.
  (github-actions High QwenLM#2)

- Reword the "shell-active bytes" comment to "characters that would
  break log parsing" — the actual concern is log-line tearing, not
  shell semantics. (github-actions Medium)

- Update class JSDoc to mention the diagnostics-sink responsibility
  alongside the bridge purpose. (github-actions Low)

- Minor JSDoc wording fix on `LogToSpanDiagnosticsSink` type for
  clarity around the no-trailing-newline contract. (github-actions Low)

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
@doudouOUC

Copy link
Copy Markdown
Collaborator Author

Triaged the automated review summary against b7aa282:

🟡 High — sink-failure fallback to stderr: Push back. The whole point of the diagnosticsSink injection is to keep bridge diagnostics out of the rendered terminal area; reintroducing a stderr fallback would re-create the exact TUI-pollution problem this PR is fixing. The existing isDebugLoggingDegraded() flag is the host's signal for "overall logging is broken" — surfacing per-subsystem loss-of-signal each time the sink fails would just churn. Keeping the swallow as documented.

🟡 High — verify non-interactive mode actually emits to stderr: Valid, fixed. The previous test only asserted diagnosticsSink === undefined. Now strengthened to construct a real LogToSpanProcessor with the default sink and verify a failed export writes [LogToSpan] export failed: code=1 error="boom"\n to stderr end-to-end.

🟢 Medium — configurable maxErrorDataBytes: Push back, YAGNI. Single user (the bridge) and a single failure mode (ARMS-style empty-reason-phrase non-2xx). Adding a knob with no current caller is premature. Trivial to add later if a real second use case shows up.

🟢 Medium — "shell-active bytes" wording: Fixed. Reworded to "characters that would break log parsing" — the concern is log-line tearing for greps and the file-logger format, not shell semantics.

🟢 Medium — explicit split('\n').length === 1: Skipping. The existing not.toContain('\n') assertion already enforces the single-line invariant and is the more direct expression of intent.

🔵 Low — class JSDoc mentions TUI fix: Fixed. Added a paragraph pointing at the diagnosticsSink option and its TUI-pollution rationale.

🔵 Low — extract createLogToSpanDiagnosticsSink helper: Push back. The sink wiring is 3 lines of spread + arrow function and lives once at the only construction site. Extracting it would add an indirection without any reuse or testability gain — the unit test already exercises both branches directly.

🔵 Low — sink type JSDoc wording: Fixed. Reworded to "Messages are passed without a trailing newline — the sink implementation decides how to terminate them" for clarity.

🔵 Low — buffer-overflow full-format assertion: Skipping. That test verifies a specific signal (the warning prefix). Pinning to the full format would just churn on every wording tweak without catching anything the prefix doesn't.

🤖 Generated with Qwen Code

wenshao
wenshao previously approved these changes May 25, 2026

@wenshao wenshao left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No issues found. LGTM! ✅ — qwen3.7-max via Qwen Code /review

@qwen-code-ci-bot

Copy link
Copy Markdown
Collaborator

Qwen Code Review (STANDARD)

What this PR does (2–3 sentences synthesized from the diff and PR title/body):
Improves the LogToSpanProcessor bridge diagnostics by surfacing actionable error info (name, httpCode, data snippet) when OTLP exports fail — replacing empty error= messages caused by HTTP/2 stripped reason-phrases. Additionally introduces an injectable diagnosticsSink so TUI (interactive) mode routes bridge diagnostics through the file-backed debugLogger instead of raw process.stderr, which was polluting the Ink render area.

🟡 P2 — Medium

  1. packages/core/src/telemetry/log-to-span-processor.ts:81 (commit 2) — TSDoc { @link LogToSpanDiagnosticsSink} has a space after {. TSDoc requires {@link ...} (no space) to resolve the cross-reference. The rendered docs will show the literal text { @link LogToSpanDiagnosticsSink} instead of a hyperlink.

    - * routed through { @link LogToSpanDiagnosticsSink} so TUI hosts can keep
    + * routed through {@link LogToSpanDiagnosticsSink} so TUI hosts can keep
  2. packages/core/src/telemetry/sdk.ts:274 — The injected sink (message) => debugLogger.warn(message) silently drops diagnostics when no debug log session is active (getActiveSession() returns null). In interactive mode this can happen during early startup (before setDebugLogSession / runWithDebugLogSession is established) or if session setup fails. The author has explicitly accepted this tradeoff (per PR comment and emitDiagnostic JSDoc), and isDebugLoggingDegraded() is the host-level signal. Noting for awareness — no action required this PR, but consider whether a startup-window diagnostic loss is acceptable long-term or merits a follow-up.

🟢 P3 — Low

  1. packages/core/src/telemetry/log-to-span-processor.ts:456 — The formatExportError comment says "The 200 figure is JS string length (UTF-16 code units), not bytes" — accurate and well-documented. Consider whether a constant like MAX_ERROR_DATA_CHARS = 200 at module scope would make the intent self-documenting at the call site (extra.data.slice(0, MAX_ERROR_DATA_CHARS)) and easier to adjust if a second caller appears. Low priority; the current inline literal is acceptable for a single-use value.

Cross-file notes

  • Config.isInteractive() is set in the constructor (this.interactive = params.interactive ?? false at config.ts:1041), so it is reliably resolved at initializeTelemetry call time. No race concern.
  • LogToSpanDiagnosticsSink is export-ed from the module file but not re-exported from any barrel (telemetry/index.ts). Consistent with how LogToSpanProcessor itself is scoped — internal API, no public surface change.
  • The LogToSpanProcessor mock assertions in sdk.test.ts were correctly relaxed from exact-match to expect.objectContaining(...) to tolerate the new optional diagnosticsSink field.

✅ Highlights

  • Exceptionally thorough test coverage: all four diagnostic emission sites (export-failed, export-threw, export-timeout, buffer-overflow) are tested through both the default stderr sink and an injected sink, plus sink-throw recovery, non-Error throws, newline escaping, and data truncation.
  • The formatExportError function is well-designed — typeof guards on code and data correctly handle the polymorphic OTLPExporterError shape (numeric HTTP codes vs. string networking codes like ECONNREFUSED).
  • Backward compatibility is carefully preserved: the legacy numeric-arg constructor path gets defaultDiagnosticsSink unconditionally.
  • The second commit demonstrates good review-follow-through: addressing CI failures, dead code removal, and test strengthening based on reviewer feedback.

Validation Evidence

PRESENT — The PR body includes:

  • Unit test commands and pass counts (vitest run for both test files: 42 + 38 cases).
  • Before/after diagnostic output examples (ARMS Singapore endpoint: [LogToSpan] export failed: code=1 error=[LogToSpan] export failed: code=1 error="OTLPExporterError" httpCode=403 data="...").
  • Lint confirmation (eslint clean on all 4 touched files).
  • Manual verification note referencing the specific reproduction scenario.

This is an automated, advisory, comment-only review — it never approves or requests changes.


Qwen Code /review · qwen3.7-max · tier STANDARD

@doudouOUC

Copy link
Copy Markdown
Collaborator Author

Triage of the automated review:

🟡 P2 #1 (TSDoc { @link ...} typo): False positive. The file at HEAD (b7aa2823e) reads {@link LogToSpanDiagnosticsSink} with no space — verified via git show HEAD:packages/core/src/telemetry/log-to-span-processor.ts | sed -n '82p'. No change needed.

🟡 P2 #2 (sink drops diagnostics if no debug session): Acknowledged trade-off — `setDebugLogSession` is called in the `Config` constructor (`config.ts:958`) before `initializeTelemetry` (`config.ts:1136`), so in normal startup the session is established before the bridge constructs. The startup-window concern only materializes if telemetry init somehow precedes Config construction, which the current call graph forbids. Documented in `emitDiagnostic` JSDoc with the `isDebugLoggingDegraded()` fallback rationale; no follow-up planned.

🟢 P3 #1 (extract MAX_ERROR_DATA_CHARS = 200 constant): Push back — single-use literal at the only call site, with a comment explaining the choice. Extracting now would just add a constant nobody else consumes. Will revisit if a second caller appears.

No code change this round.

🤖 Generated with Qwen Code

@qwen-code-ci-bot

Copy link
Copy Markdown
Collaborator

Qwen Code Review (DEEP)

CI-safe profile adapted from bundled /review; focused passes: correctness/security, test coverage, maintainability/performance, and undirected audit.

Correctness / Security

No correctness or security issues found.

The PR is well-implemented:

  • Error formatting (formatExportError): Fallback chain message → name → 'unknown' correctly handles HTTP/2 empty reason-phrases. typeof extra.code === 'number' guard avoids mislabelling ECONNREFUSED as HTTP status. JSON.stringify on message and data enforces the single-line invariant for downstream log parsers.
  • Diagnostics sink: emitDiagnostic swallows sink errors via try-catch, preventing a misbehaving sink from disrupting telemetry ingestion. The tradeoff (silent diagnostic loss when sink fails) is documented and acceptable — the host surfaces overall logging health via isDebugLoggingDegraded().
  • Backward compatibility: Legacy numeric-arg constructor preserves stderr behavior. Options-object overload cleanly adds diagnosticsSink with ?? defaultDiagnosticsSink fallback.
  • sdk.ts wiring: Conditional spread ...(config.isInteractive() && { diagnosticsSink: ... }) correctly injects the sink only in TUI mode, leaving the default stderr sink for CI/scripts.
  • Test coverage: Comprehensive — covers empty messages, embedded newlines, non-Error throws, ECONNREFUSED string codes, missing errors, data truncation, sink throw recovery, buffer overflow routing, timeout routing, and end-to-end stderr verification for non-interactive mode.

Needs Verification

No concerns requiring additional verification.

Validation Evidence

PRESENT — PR body includes: unit test commands (vitest run for both test files with case counts: 42 + 38), lint confirmation (eslint clean on all 4 files), and manual verification against ARMS Singapore endpoint showing diagnostic reaching ~/.qwen/logs/<session>.log instead of the TUI render area. Author PR comments include detailed triage of prior automated review findings with push-back rationale.

This is an automated, advisory, comment-only review — it never approves or requests changes.


Test Coverage

  • P3 log-to-span-processor.tsformatExportError err.message || err.name || 'unknown' chain — The fallback chain has three levels, but only two are tested: err.message non-empty (tests 1, 2, 4) and err === undefined (test 5, short-circuits at !err). The third branch — both err.message and err.name falsy, landing on 'unknown' — is unreachable by any existing test. A concrete scenario: Object.assign(new Error(''), { name: '' }) (some minified environments strip Error names). Suggested test:

    it('falls back to "unknown" when both message and name are empty', async () => {
      const err = Object.assign(new Error(''), { name: '' });
      // ... construct failing processor, assert output contains error="unknown"
    });
  • P3 log-to-span-processor.tsformatExportError err.data === '' branch — The guard typeof extra.data === 'string' && extra.data.length > 0 has a deliberate empty-string exclusion (empty HTTP response body → skip data= field). Existing tests cover data populated (3 tests) and data absent (undefined, test 4), but not data present as an empty string. A future refactor loosening length > 0 to !== undefined would emit a meaningless data="" without any test catching it. Suggested assertion:

    it('omits data field when err.data is an empty string', async () => {
      const err = Object.assign(new Error('fail'), { code: 500, data: '' });
      // ... assert output does NOT contain 'data='
    });

Needs Verification

No cross-file concerns — the changed symbols (formatExportError, emitDiagnostic, diagnosticsSink, LogToSpanDiagnosticsSink) are module-private or have no external callers beyond the two files in the diff.

Validation Evidence

PRESENT — The PR body includes: unit test transcripts (vitest run — 42 + 38 cases), eslint-clean confirmation on all 4 files, and a manual verification report (ARMS Singapore endpoint failure → diagnostic routed to ~/.qwen/logs/<session>.log instead of TUI render area). The author's triage comments on prior automated review rounds demonstrate iterative hardening (sink-failure swallow rationale, non-interactive end-to-end stderr check, JSON-escape on export-threw path).

This is an automated, advisory, comment-only review — it never approves or requests changes.


Maintainability / Performance

No maintainability or performance issues found.

Code-quality highlights (verified against diff + surrounding code):

  • emitDiagnostic cleanly centralizes the try/catch swallow at one site; all four diagnostic emission points (buffer overflow, timeout, export-failed, export-threw) route through it. The default sink is hoisted to module scope (defaultDiagnosticsSink) so it isn't recreated per construction.
  • formatExportError is a focused pure function. The typeof extra.code === 'number' guard correctly distinguishes HTTP statuses from Node system error strings (ECONNREFUSED); JSON.stringify on both msg and data enforces the single-line invariant; data.slice(0, 200) runs before stringify so the cap is a real noise budget. Patch 2's "characters not bytes" comment correction is accurate (JS .length is UTF-16 code units).
  • Backward compatibility: the three constructor overloads are preserved. The legacy numeric-arg path keeps stderr via the default sink; the options-object overload gains the new optional field. Only one production construction site exists (sdk.ts:221) — cross-file impact is nil.
  • sdk.ts wiring uses ...(config.isInteractive() && { diagnosticsSink: ... }) — evaluates once at init, captures the already-in-scope debugLogger from line 155, no extra allocation in the hot path.
  • Test additions are thorough: 11 new cases covering all four diagnostic sites, sink-throw recovery (verifying resolve() still runs and subsequent exports proceed), format edge cases (empty message, embedded newlines, non-Error throws, ECONNREFUSED), and — after patch 2 — a real-processor end-to-end check for the default stderr sink.

Performance highlights:

  • All changes are on error/failure paths (export failures, buffer overflow, timeout). No hot-path impact.
  • JSON.stringify on a 200-character slice and on short error messages is negligible.
  • No new retained memory: the sink closure captures the module-scoped debugLogger, which is already long-lived.
  • diagnosticsSink is never buffered — emitted synchronously at the failure site.

Needs Verification

  • None. The only production caller of LogToSpanProcessor is in sdk.ts:221 (verified via grep), so no cross-file signature incompatibility is possible. The Config.isInteractive() method used in sdk.ts is already exercised by the new sdk.test.ts cases (isInteractive: () => false added to two existing mock configs and both interactive/non-interactive paths asserted).

Validation Evidence

PRESENT — PR body cites unit results (vitest run on both touched test files: 42 + 38 cases), eslint clean on all 4 files, and manual ARMS Singapore endpoint verification showing the diagnostic landing in ~/.qwen/logs/<session>.log instead of the TUI render area. Author's triage comments document explicit push-back on three advisory findings with reasoned rationale (TUI-pollution regression risk, YAGNI on maxErrorDataBytes, single-site helper extraction).

This is an automated, advisory, comment-only review — it never approves or requests changes.


Undirected Audit

No additional undirected-audit issues found.

I reviewed the diff through attacker, 3 AM oncall, and six-months-later maintainer personas, cross-referencing the production code paths in the working tree. The changes are tight and well-reasoned:

  • Attacker: formatExportError applies JSON.stringify to all user-controlled fields (err.message, err.data) before emission, preventing log-line tearing via embedded newlines. The typeof extra.code === 'number' guard prevents labelling system errors (ECONNREFUSED) as HTTP codes. The data.slice(0, 200) cap prevents unbounded memory/stringify cost from a malicious backend returning a huge response body. Terminal escape sequences in err.data are not sanitized, but the old code passed err.message raw — this is pre-existing and not a regression.

  • 3 AM oncall: The new diagnostic format (error="OTLPExporterError" httpCode=403 data="Forbidden: invalid license") is a dramatic improvement over the old error= for empty-reason-phrase failures — exactly what an oncall engineer needs. The emitDiagnostic try/catch ensures a broken sink never takes down telemetry ingestion, and the documented isDebugLoggingDegraded() fallback gives operators a signal when the file logger is broken.

  • Future maintainer: The diagnosticsSink injection is cleanly scoped — one exported type (LogToSpanDiagnosticsSink), one optional field on the options overload, one private method (emitDiagnostic). The defaultDiagnosticsSink is hoisted to module scope (no per-call allocation). The legacy numeric-arg constructor overload preserves stderr behavior for existing callers. Only one production construction site exists (sdk.ts:~267), and the ...(config.isInteractive() && { diagnosticsSink: ... }) pattern is idiomatic. The setDebugLogSession(this) call at config.ts:926 precedes initializeTelemetry(this) at config.ts:1101, so the file sink always has a valid session in normal startup — the author's claim checks out.

Needs Verification

No concerns that require code outside the supplied context. All production call sites (LogToSpanProcessor construction in sdk.ts, debugLogger.warn in debugLogger.ts, setDebugLogSession/initializeTelemetry ordering in config.ts) were verified in the working tree.


Qwen Code /review · qwen3.7-max · tier DEEP

doudouOUC added a commit to doudouOUC/qwen-code that referenced this pull request May 25, 2026
Add coverage for two paths flagged by the DEEP-tier review on QwenLM#4482:

- `err.message || err.name || 'unknown'` chain: the third branch (both
  message and name empty) was never exercised. Scenario: minified
  environments that strip `Error.name`. Test constructs
  `Object.assign(new Error(''), { name: '' })` and asserts the output
  contains `error="unknown"`.

- `typeof extra.data === 'string' && extra.data.length > 0` guard: the
  empty-string case (HTTP response with empty body) was never tested,
  so a future loosening to `!== undefined` would silently start
  emitting `data=""`. Test asserts `data=` is absent.

Both branches are real and reachable in production failure modes; the
tests are guards for the documented intent.

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
@doudouOUC

Copy link
Copy Markdown
Collaborator Author

Triage of the DEEP-tier review (correctness/security, test coverage, maintainability/performance, undirected audit):

Correctness / Maintainability / Undirected: no findings, all positive.

Test Coverage (2 × P3): both real branches that weren't covered. Repro confirmed each is reachable, and the bot's "future-refactor regression" concern on the data length guard is valid. Added in 65e0919:

  • 'unknown' fallback when both err.message and err.name are empty (minified Error scenario)
  • data field omitted when err.data === '' (guards against loosening the length > 0 predicate to !== undefined)

Tests now 46/46 green.

🤖 Generated with Qwen Code

@doudouOUC doudouOUC requested review from LaZzyMan and yiliang114 May 25, 2026 12:32
@doudouOUC doudouOUC self-assigned this May 25, 2026
@doudouOUC doudouOUC requested a review from wenshao May 25, 2026 12:33
wenshao
wenshao previously approved these changes May 25, 2026

@wenshao wenshao left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No issues found. LGTM! ✅ — qwen3.7-max via Qwen Code /review

// units), not bytes — non-ASCII payloads may stringify to more bytes; this
// is fine because the cap is a leak/noise budget, not a hard byte limit.
function formatExportError(err: Error | undefined): string {
if (!err) return 'error=unknown';

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] Inconsistent quoting: the early return produces error=unknown (unquoted), but when message and name are both empty, the fallthrough path yields error="unknown" (JSON-quoted via JSON.stringify). Two semantically identical "unknown error" states produce greppable-different output.

Suggested change
if (!err) return 'error=unknown';
if (!err) return 'error="unknown"';

Also update the corresponding test expectation at line 873 from 'code=1 error=unknown\n' to 'code=1 error="unknown"\n'.

— qwen3.7-max via Qwen Code /review

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in d6f63b4 — early-return now emits error="unknown" matching the JSON-quoted fallthrough path. Updated test expectation. Both paths greppable-identical now.

// is fine because the cap is a leak/noise budget, not a hard byte limit.
function formatExportError(err: Error | undefined): string {
if (!err) return 'error=unknown';
const extra = err as { code?: number; data?: string };

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] The type assertion claims code?: number, but at runtime err.code can be a string (e.g. 'ECONNREFUSED' — exercised by the PR's own ECONNREFUSED test). The typeof guard is correct, but the annotation misleads future maintainers who might simplify it to if (extra.code) and incorrectly label ECONNREFUSED as httpCode=ECONNREFUSED.

Suggested change
const extra = err as { code?: number; data?: string };
const extra = err as { code?: number | string; data?: string };

— qwen3.7-max via Qwen Code /review

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in d6f63b4 — widened to code?: number | string and added a comment marking the typeof === 'number' guard as load-bearing. Type now matches the runtime reality your ECONNREFUSED test exercises.

// `httpCode` would be a lie, so only emit for numeric codes.
if (typeof extra.code === 'number') parts.push(`httpCode=${extra.code}`);
if (typeof extra.data === 'string' && extra.data.length > 0) {
parts.push(`data=${JSON.stringify(extra.data.slice(0, 200))}`);

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] No test exercises formatExportError when err.data is a non-string truthy value (e.g. Buffer, number, plain object). The typeof extra.data === 'string' guard's false branch is only tested for undefined and ''. If a future OTLP exporter version returns binary response data, the diagnostic silently omits the data field with no test to catch it.

Consider adding:

it('omits data field when err.data is a non-string type (e.g. Buffer)', async () => {
  const err = Object.assign(new Error('fail'), { code: 500, data: Buffer.from('binary') });
  // ... assert msg contains httpCode=500 but not data=
});

— qwen3.7-max via Qwen Code /review

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added in d6f63b4it('omits data field when err.data is a non-string truthy value (e.g. Buffer)', ...). Same guard-against-future-loosening rationale you (and the DEEP-tier bot) raised on the empty-string case.

process.stderr.write(
`[LogToSpan] export timeout after ${EXPORT_TIMEOUT_MS}ms\n`,
this.emitDiagnostic(
`[LogToSpan] export timeout after ${EXPORT_TIMEOUT_MS}ms`,

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] The export timeout diagnostic omits the batch span count. spans.length is available in the closure but not included in the message. When an operator sees a timeout, they cannot distinguish "network is slow" from "batch was too large."

Suggested change
`[LogToSpan] export timeout after ${EXPORT_TIMEOUT_MS}ms`,
this.emitDiagnostic(
`[LogToSpan] export timeout after ${EXPORT_TIMEOUT_MS}ms (${spans.length} span(s))`,
);

— qwen3.7-max via Qwen Code /review

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in d6f63b4 — timeout diagnostic now reads export timeout after 30000ms (N span(s)). Updated the matching test regex accordingly.

// JSON.stringify so embedded newlines in the thrown message stay on
// one line — same single-line invariant enforced by formatExportError.
const detail = err instanceof Error ? err.message : String(err);
this.emitDiagnostic(

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] The export-threw (synchronous exception) path extracts only err.message, while the callback-based failure path uses formatExportError which additionally surfaces httpCode and data. If an exporter synchronously throws an OTLPExporterError (carrying code and data), those fields are silently discarded. Consider reusing formatExportError for Error instances:

Suggested change
this.emitDiagnostic(
const detail = err instanceof Error
? formatExportError(err)
: `error=${JSON.stringify(String(err))}`;
this.emitDiagnostic(
`[LogToSpan] export threw: ${detail}`,
);

— qwen3.7-max via Qwen Code /review

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in d6f63b4 — sync-throw path now delegates Error instances to formatExportError, picking up httpCode + data when an OTLPExporterError is thrown synchronously. Non-Error throws keep the JSON.stringify(String(err)) fallback for the single-line invariant. Added a regression test (surfaces httpCode/data when a sync-thrown error carries OTLPExporterError fields).

// alone so CI / scripts can still see export failures on stderr —
// the canonical diagnostic channel for batch runs.
...(config.isInteractive() && {
diagnosticsSink: (message: string) => debugLogger.warn(message),

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] When isInteractive() is true but no debug log session is established yet (or QWEN_DEBUG_LOG_FILE=0), debugLogger.warn() silently no-ops. The diagnostic is fully lost — no stderr, no file. The emitDiagnostic JSDoc references isDebugLoggingDegraded() as the safety net, but that function only tracks file-write failures, not "no session" or "logging disabled."

Consider guarding the injection so the processor falls back to the default stderr sink when the debug logger won't write, or at minimum document this tradeoff in the code comment.

— qwen3.7-max via Qwen Code /review

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Partial — pushed back on the stderr fallback but documented the trade-off in d6f63b4.

The two failure modes you described:

  1. "No session yet": in the normal call graph this can't happen — setDebugLogSession(this) runs in the Config constructor at config.ts:958, while initializeTelemetry(this) runs later in the same initialize() flow at config.ts:1136. The session is established before the bridge is constructed, let alone before the bridge can fire its first export.

  2. QWEN_DEBUG_LOG_FILE=0: this is a user-explicit opt-out. Falling back to stderr here would re-introduce exactly the TUI-pollution problem the sink injection was added to prevent — for an audience (interactive-mode users with debug logging deliberately disabled) that has signed up for full silence by setting that env var.

Documented as an accepted trade-off at the sink wiring site. isDebugLoggingDegraded() is admittedly only a partial signal (file-write failures only) — if it turns out to be insufficient in practice, the right escalation is to broaden that signal in debugLogger.ts so all silent-loss cases surface uniformly, rather than splitting the policy at each sink consumer.

doudouOUC added 4 commits May 25, 2026 23:46
The OTel `LogToSpanProcessor` bridge (used when traces+metrics are over
OTLP but logs aren't, e.g. Alibaba Cloud ARMS) had two diagnostic issues:

1. Empty error messages. When the OTLP HTTP exporter callback returned
   `{ code: FAILED, error }`, `error.message` is the HTTP reason-phrase —
   always empty on HTTP/2. The bridge printed literally
   `[LogToSpan] export failed: code=1 error=` with zero actionable info.
   Now we surface `name`, `httpCode` (only when numeric), and a 200-byte
   `data` snippet from the underlying OTLPExporterError, with JSON-escape
   on user content so embedded newlines can't tear the log line.

2. TUI pollution. The processor wrote diagnostics to `process.stderr`
   directly. Ink only manages stdout, so those writes punched through
   into the rendered terminal area. The processor now accepts an
   injectable `diagnosticsSink`; in interactive mode `sdk.ts` injects a
   sink that routes through `debugLogger.warn` (file-backed). Non-
   interactive runs (CI/scripts) keep the default stderr sink so export
   failures remain visible on the canonical batch-diagnostic channel.

Backward compatibility is preserved: the legacy numeric-arg constructor
keeps stderr behavior; the options-object overload gains the new field.
Other raw `process.stderr.write` sites in the CLI (errors.ts,
startupProfiler.ts, useGeminiStream.ts, etc.) have the same TUI-leak
pattern but are intentionally left out of this PR.

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
- Fix TS2353 compile error in keeps-processing-after-sink-throw test:
  the mock callback type was narrowed to `{ code: number }` and rejected
  the `error?: Error` field that `ExportResult` actually carries. Widen
  the type. (wenshao Critical — this was the root cause of CI lint/test
  failures across all 3 OS.)

- JSON.stringify the payload of the `export threw` diagnostic so a
  synchronously-thrown error with embedded newlines stays on one line,
  same single-line invariant enforced by `formatExportError`. Add
  coverage for both the newline case and the non-Error throw branch.
  (wenshao Suggestion)

- Remove the dead `makeFailingProcessor(err)` call in the JSON-escape
  test that was immediately overwritten — the orphaned processor
  retained a live `setInterval` timer with no cleanup. (wenshao
  Suggestion)

- Rename the "200 bytes" test name and comment to "200 characters" to
  match the actual `string.slice(0, 200)` (UTF-16 code units) behavior;
  add a note on the cap being a leak/noise budget, not a hard byte
  limit. (Copilot 2x)

- Strengthen the non-interactive test to actually trigger a failed
  export against the real `LogToSpanProcessor` and assert the default
  sink writes to stderr, not just that `diagnosticsSink === undefined`.
  (github-actions High QwenLM#2)

- Reword the "shell-active bytes" comment to "characters that would
  break log parsing" — the actual concern is log-line tearing, not
  shell semantics. (github-actions Medium)

- Update class JSDoc to mention the diagnostics-sink responsibility
  alongside the bridge purpose. (github-actions Low)

- Minor JSDoc wording fix on `LogToSpanDiagnosticsSink` type for
  clarity around the no-trailing-newline contract. (github-actions Low)

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
Add coverage for two paths flagged by the DEEP-tier review on QwenLM#4482:

- `err.message || err.name || 'unknown'` chain: the third branch (both
  message and name empty) was never exercised. Scenario: minified
  environments that strip `Error.name`. Test constructs
  `Object.assign(new Error(''), { name: '' })` and asserts the output
  contains `error="unknown"`.

- `typeof extra.data === 'string' && extra.data.length > 0` guard: the
  empty-string case (HTTP response with empty body) was never tested,
  so a future loosening to `!== undefined` would silently start
  emitting `data=""`. Test asserts `data=` is absent.

Both branches are real and reachable in production failure modes; the
tests are guards for the documented intent.

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
- Quote `error="unknown"` in the err-missing early return so it matches
  the JSON.stringify output produced when message+name fall back to
  'unknown'. Two paths now emit identical greppable output for
  semantically identical "unknown error" states.

- Widen the duck-typed cast to `code?: number | string` and add a
  load-bearing comment on the `typeof === 'number'` guard. The type now
  matches reality (Node networking errors surface string codes like
  ECONNREFUSED), preventing a future simplification to `if (extra.code)`
  that would mislabel networking errors as HTTP statuses.

- Reuse `formatExportError` in the sync-throw path so a synchronously-
  thrown OTLPExporterError surfaces its httpCode and data, matching the
  callback-failure path. Non-Error throws still fall back to
  JSON.stringify on String(err) to preserve the single-line invariant.

- Include batch span count in the timeout diagnostic
  ("(N span(s))") — lets an operator distinguish slow network from
  oversized batch when troubleshooting timeouts.

- Add a test for non-string truthy err.data (Buffer) — the `typeof ===
  'string'` guard's false branch was only covered for undefined and
  empty string, so a future refactor relaxing the guard would silently
  start emitting binary garbage with no test to catch it.

- Document the QWEN_DEBUG_LOG_FILE=0 trade-off at the sink wiring site:
  interactive mode plus disabled debug log = full diagnostic silence.
  This is an accepted user opt-in trade-off; falling back to stderr
  would re-introduce the TUI pollution this injection prevents.

🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
@doudouOUC doudouOUC force-pushed the fix/log-to-span-bridge-diagnostics branch from d6f63b4 to 7fc32b1 Compare May 25, 2026 15:48

@wenshao wenshao left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No review findings. Downgraded from Approve to Comment: CI still running. — qwen3.7-max via Qwen Code /review

@wenshao wenshao left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No issues found. LGTM! ✅ — qwen3.7-max via Qwen Code /review

@wenshao

wenshao commented May 25, 2026

Copy link
Copy Markdown
Collaborator

本地真实测试报告(merge 参考)

按 PR body 的 Test plan 完成了静态检查 + 单元测试 + 真实 tmux 端到端验证。LogToSpan bridge 的诊断输出确实被正确路由到 debug 日志文件、TUI 渲染区彻底干净,新的 error="..." httpCode=N data="..." 格式按预期工作。可以合并。

测试环境

测试 commit 7fc32b192 (fix(telemetry): tighten LogToSpan diagnostics per wenshao review)
平台 macOS arm64 (Darwin 25.4.0)
Node v22.17.0
Worktree /Users/wenshao/Work/git/qwen-code/.qwen/tmp/review-pr-4482(基于 pull/4482/head
模型 API Key auth → qwen3.7-max

1. 构建与静态检查

npm install         → 1373 packages(首次安装)
npm run build       → 成功(85s 主链路 + vscode-ide-companion)
npx tsc --noEmit -p packages/core   → CORE_EXIT=0
npx tsc --noEmit -p packages/cli    → CLI_EXIT=0
npx eslint packages/core/src/telemetry/{log-to-span-processor,sdk}.{ts,test.ts}
                    → ESLINT_EXIT=0(4 个文件全部 clean)

2. 单元测试

npx vitest run packages/core/src/telemetry/log-to-span-processor.test.ts \
                packages/core/src/telemetry/sdk.test.ts
Test file Tests PR body 当时的声明
log-to-span-processor.test.ts 48 42
sdk.test.ts 57 38
合计 105 80

实际数量 > PR body 声明,因为后续两次 review-driven commit 又补了 25 个 case:test(telemetry): cover two unreachable formatExportError branchesfix(telemetry): tighten LogToSpan diagnostics per wenshao review。全部通过。

3. 真实 CLI 端到端(tmux + 假 OTLP 服务器)

这是 PR body Test plan 里 "Manual verification" 的可重复版本——验证两件事:

  1. error="..." httpCode=N data="..." 新格式确实出现在 debug 日志文件中
  2. TUI 渲染区彻底无 [LogToSpan] export failed: ... 泄漏

步骤

  1. 启动假 OTLP 服务器(监听 127.0.0.1:14318,对所有请求返回 HTTP 403 + 一段 plain-text body)
  2. 配置环境——关键点:必须把 QWEN_TELEMETRY_OTLP_ENDPOINT 设为空字符串才能让 bridge 触发(否则 getTelemetryOtlpEndpoint() 默认返回 http://localhost:4317,使 logsUrl 解析为非空 URL,bridge 不会启用):
export QWEN_TELEMETRY_ENABLED=true
export QWEN_TELEMETRY_OTLP_PROTOCOL=http
export QWEN_TELEMETRY_OTLP_ENDPOINT=""           # ← 空串,让 logsUrl=undefined
export QWEN_TELEMETRY_OTLP_TRACES_ENDPOINT=http://127.0.0.1:14318/v1/traces
unset QWEN_TELEMETRY_TARGET                      # 注意:valid 值仅 local/gcp,不需要设
tmux new-session -d -s pr4482 -x 220 -y 50
tmux send-keys -t pr4482 'npm run dev' Enter
  1. 确认 bridge 已启用——debug 日志第 3 行:
[DEBUG] [OTEL] OTLP HTTP endpoints: traces=http://127.0.0.1:14318/v1/traces, logs=none, metrics=none

logs=none 即触发 LogToSpanProcessor 分支。

  1. 发送 prompt 触发 log records
> 一句话告诉我什么是Raft协议。
  1. 5 秒后 bridge 周期性 flush,OTLP 导出收到 403 → 走入新加的 formatExportError 路径。

假服务器收到的请求

[fake-otlp] 17:11:32 POST /v1/traces (#2)
[fake-otlp] 17:11:32 POST /v1/traces (#3)
[fake-otlp] 17:11:47 POST /v1/traces (#4)
[fake-otlp] 17:11:50 POST /v1/traces (#5)
[fake-otlp] 17:11:52 POST /v1/traces (#6)
... (合计 11 次 POST 跨两轮测试)

Debug 日志文件中的 bridge 诊断(关键证据

~/.qwen/debug/<sessionId>.txt 第 113 行起:

2026-05-25T17:11:32.496Z [WARN] [OTEL] [trace_id=db4eccc3... span_id=9c66e6c4...] [LogToSpan] export failed: code=1 error="Forbidden" httpCode=403 data="Forbidden: invalid license token (PR 4482 manual test)"
2026-05-25T17:11:47.489Z [WARN] [OTEL] [LogToSpan] export failed: code=1 error="Forbidden" httpCode=403 data="Forbidden: invalid license token (PR 4482 manual test)"
2026-05-25T17:11:52.492Z [WARN] [OTEL] [LogToSpan] export failed: code=1 error="Forbidden" httpCode=403 data="Forbidden: invalid license token (PR 4482 manual test)"
2026-05-25T17:11:57.497Z [WARN] [OTEL] [LogToSpan] export failed: code=1 error="Forbidden" httpCode=403 data="Forbidden: invalid license token (PR 4482 manual test)"

逐项核对:

  • [WARN] 级别(即 debugLogger.warn 注入路径,非 stderr)
  • error="Forbidden"——从 OTLPExporterErrormessage/name,JSON-escape 后写入
  • httpCode=403——只在 typeof err.code === 'number' 时打印(这正是 PR body 强调的"load-bearing typeof guard,不能放宽到 truthy"的分支)
  • data="Forbidden: invalid license token (PR 4482 manual test)"——取响应体前 200 字符,JSON-escape

TUI 渲染区扫描(反向证据

$ tmux capture-pane -t pr4482 -p -S -2000 | grep -E "LogToSpan|export failed"
(no output)

整个 2000 行 scrollback 内 LogToSpan / export failed 字符串。模型回答正常渲染,无任何换行或符号被破坏。这就是 PR 修复的核心目标——bridge 诊断不再越过 Ink 渲染区域写到原生 stderr。

覆盖矩阵:PR claim → 验证

PR 声称 验证
error= 字段从 OTLPExporterError.message/.name 取,JSON-escape ✅ 实际看到 error="Forbidden"
httpCode=N 仅在 code 为 number 时打印(避免把 ECONNREFUSED 误标为 HTTP code) ✅ 看到 httpCode=403;前一轮 server 配错时 code 不是数字,输出里也确实没有 httpCode 字段
data="..." 取响应体前 200 字符 ✅ 看到 data="Forbidden: invalid license token (PR 4482 manual test)"(< 200,未截断;超长 case 由单测覆盖)
用户内容 JSON-escape,不会撕裂日志行 [LogToSpan] 行均为单行;newline 注入 case 由单测覆盖
交互模式注入 diagnosticsSink 路由到 debugLogger.warn ✅ 实际 [WARN] [OTEL] 落在文件,不去 stderr
非交互模式保留默认 stderr sink ✅ 由单测覆盖(sdk.test.ts: "No explicit sink → processor falls back to its default (stderr)")
旧的 3-arg 数字构造器(new LogToSpanProcessor(exporter, flushIntervalMs, maxBufferSize))保持 stderr 行为 ✅ 由单测覆盖
LogToSpanDiagnosticsSink 不从 telemetry/index.tscore/src/index.ts 重导出 grep -rn "LogToSpanDiagnosticsSink" packages/core/src/index.ts packages/core/src/telemetry/index.ts → 无匹配

4. 一点配置可用性提醒(非阻塞)

我在搭建复现时踩过两个坑,文档/帮助里似乎没明说,记下来供其他 reviewer 或后续用户参考:

  1. QWEN_TELEMETRY_TARGET 的合法值只有 local / gcp——不能otlp。OTLP 路径是通过设置 QWEN_TELEMETRY_OTLP_* 系列环境变量自动启用的。
  2. 想触发 LogToSpan bridge 必须让 getTelemetryOtlpEndpoint() 返回空串。该 getter 实现是 this.telemetrySettings.otlpEndpoint ?? DEFAULT_OTLP_ENDPOINT,所以只设 QWEN_TELEMETRY_OTLP_TRACES_ENDPOINT 而不显式 export QWEN_TELEMETRY_OTLP_ENDPOINT="" 时,logsUrl 会被解析成 http://localhost:4317/v1/logs,bridge 不会启用。这是 OTLP 配置的现有行为,与本 PR 无关,但对验证本 PR 的 manual 测试很关键——值得在 docs 里补一句。

5. 结论

  • ✅ 静态检查(build / tsc / eslint)全绿
  • ✅ 105/105 单测通过(log-to-span-processor 48 + sdk 57,超 PR body 声称的 80 个)
  • ✅ 真实 CLI 在 OTLP 后端 403 时,bridge 诊断按新格式 error="..." httpCode=403 data="..." 落入 debug 日志文件
  • ✅ TUI 渲染区彻底无 [LogToSpan] 泄漏(2000 行 scrollback 零匹配)

从本地真实测试角度,该 PR 可以合并。 ARMS Singapore 那种 "traces+metrics 通但 logs 不通" 的真实失败场景下,运维拿到的诊断从原来的 code=1 error=(空)变成了 code=1 error="..." httpCode=N data="...",且不再污染 TUI——这是 PR body 承诺的两件事,本地实测都成立。

@doudouOUC

Copy link
Copy Markdown
Collaborator Author

感谢这么细的端到端验证 🙏

特别想 highlight 几条:

  • TUI scrollback 0 匹配——tmux capture-pane -S -2000 | grep "LogToSpan|export failed" 出空集,是本 PR 的核心目标得到独立证实
  • httpCode=403 这条——刚好命中"load-bearing typeof guard"的活路径,这是上一轮 review 里你和 DEEP-tier bot 各自从不同角度都标过的点,看到实跑里它如愿打印出 numeric 而非误标 ECONNREFUSED,挺安心的
  • 新格式 error="..." httpCode=N data="..." 在真实 ARMS-403-with-body 场景下完整落进文件日志——这就是 PR 头图截到的 code=1 error=(空)问题的实际治疗效果

第 4 节那两条 OTLP 配置可用性提示(QWEN_TELEMETRY_TARGET 的合法 enum、需要显式 OTLP_ENDPOINT="" 才能触发 bridge),同意你的判断——属于 OTLP 配置文档的现有 quirk,不在本 PR scope。但确实值得补到 docs,让我开个 follow-up issue 跟,或者你直接 ping 一个 owner 也行。

PR side 不再有改动,等 OWNER/MEMBER 一票就能合。再次感谢。

🤖 Generated with Qwen Code

@doudouOUC doudouOUC merged commit 34b7d47 into QwenLM:main May 27, 2026
11 checks passed
xaelistic pushed a commit to xaelistic/qwen-code that referenced this pull request Jun 7, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

type/bug Something isn't working as expected

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants