feat(telemetry): define HTTP OTLP endpoint behavior and signal routing#3779
Conversation
- Add resolveHttpOtlpUrl() that appends /v1/traces, /v1/logs, /v1/metrics to base HTTP OTLP endpoints per the OpenTelemetry specification - Add per-signal endpoint overrides (otlpTracesEndpoint, otlpLogsEndpoint, otlpMetricsEndpoint) for backends with non-standard paths (e.g. Alibaba Cloud) - Add LogToSpanProcessor that bridges OTel log records to spans for traces-only backends, with session-based traceId correlation and error status propagation - Auto-wire LogToSpanProcessor when traces URL exists but logs URL doesn't - Validate per-signal URLs gracefully (log error + skip, don't crash) - Preserve query strings when appending signal paths to URLs - Guard gRPC branch against missing base endpoint with per-signal config - Update telemetry documentation with signal routing semantics and Alibaba Cloud HTTP per-signal endpoint examples Closes #3734 Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
…ests Use typed ExportedSpan interface and bracket notation for index signature properties to satisfy strict TypeScript checks in CI. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
CodeQL flagged MD5 as a weak cryptographic algorithm when used with session.id (considered sensitive data). Switch to SHA-256 truncated to 32 hex chars to satisfy CodeQL while maintaining the same traceId format required by the OTel specification. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
Code Coverage Summary
CLI Package - Full Text ReportCore Package - Full Text ReportFor detailed HTML reports, please see the 'coverage-reports-22.x-ubuntu-latest' artifact from the main CI run. |
wenshao
left a comment
There was a problem hiding this comment.
Additional Critical (no diff anchor): process.on('exit', shutdownTelemetry) at packages/core/src/telemetry/sdk.ts:258-260 is an async no-op. Node's 'exit' listeners must be synchronous — shutdownTelemetry() is async, so await sdk.shutdown() enqueues microtasks that never run before the process tears down. The SIGINT/SIGTERM handlers also lack await and a follow-up process.exit(). Up to 5 s of telemetry (the BatchSpanProcessor scheduled delay + LogToSpanProcessor flush interval) is silently dropped on every clean termination. Fix: SIGINT/SIGTERM should await shutdownTelemetry(); process.exit(0);, and the 'exit' handler should be removed since it cannot do useful async work.
— claude-opus-4-7 via Claude Code /qreview
… INDEX - QwenLM/qwen-code#3779 — feat(telemetry): HTTP OTLP signal routing (merge-after-nits) - aaif-goose/goose#8940 — show extension status and usage in goose2 (needs-discussion) - INDEX: record drip-231 (8 PRs, full 6/6 repo coverage, 2 as-is / 5 after-nits / 0 RC / 1 ND)
…ness - Wrap JSON.stringify in try/catch to handle circular refs and BigInt - Add export timeout (30s) and try/catch to prevent hung shutdown - Track in-flight exports to avoid interval-vs-shutdown race condition - Fix deriveSpanStatus: use truthy checks (!!), drop success===false heuristic since declined tool calls are normal, not errors - Enforce http(s) scheme in validateUrl to reject file:/javascript: URLs - Change DiagLogLevel from ERROR to WARN to preserve operational diagnostics - Preserve logRecord.instrumentationScope instead of hardcoding - Forward severityNumber/severityText as span attributes - Add tests for circular refs, error status edge cases, severity Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
Remove async process exit handlers from telemetry initialization and route SDK shutdown through Config cleanup so normal CLI exit paths await pending telemetry exports. Keep shutdown idempotent while an SDK shutdown is in flight. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
wenshao
left a comment
There was a problem hiding this comment.
[Critical] Process signal handlers removed with no replacement in interactive mode.
process.on('SIGTERM'/'SIGINT'/'exit', ...) handlers were removed from initializeTelemetry(). Config.shutdown() now calls shutdownTelemetry() in its finally block, but Config.shutdown() is never called from the interactive (TUI) code path — only nonInteractiveCli.ts and mcp/reconnect.ts call it. When a user hits Ctrl+C or the process receives SIGTERM in interactive mode, the OTel SDK (including LogToSpanProcessor's buffered spans) will not be flushed, silently dropping telemetry data.
Suggested fix: Either register process signal handlers in the interactive session startup that call config.shutdown(), or restore the signal handlers in sdk.ts.
— pai/glm-5 via Qwen Code /review
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
|
Addressed the no-diff shutdown review bodies as well:
GitHub does not expose review-body comments as resolvable review threads, so there is no separate "Resolve conversation" action for these two top-level review bodies. |
There was a problem hiding this comment.
Pull request overview
This PR clarifies and implements HTTP OTLP signal routing (traces/logs/metrics) with optional per-signal endpoint overrides, and adds a logs→spans bridge for traces-only backends. It also adjusts shutdown/cleanup wiring so telemetry shutdown happens via the app’s cleanup paths rather than async process-exit handlers.
Changes:
- Add
resolveHttpOtlpUrl()and per-signal HTTP OTLP endpoint resolution/validation, including config/env overrides. - Introduce
LogToSpanProcessorto bridge OTel log records into spans when logs export isn’t configured but traces export is. - Update CLI interactive signal handling and related tests; update telemetry docs and config tests for the new behavior.
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| packages/core/src/telemetry/sdk.ts | Implements HTTP OTLP per-signal URL routing, per-signal overrides, optional logs→spans bridge, and shutdown promise guarding. |
| packages/core/src/telemetry/sdk.test.ts | Adds unit tests for URL routing and updated exporter wiring behavior. |
| packages/core/src/telemetry/log-to-span-processor.ts | Adds a LogToSpanProcessor that converts log records into spans and exports them. |
| packages/core/src/telemetry/log-to-span-processor.test.ts | Adds unit coverage for the bridge processor behavior and lifecycle. |
| packages/core/src/telemetry/config.ts | Resolves per-signal OTLP endpoints from QWEN_/OTEL_ env vars and settings. |
| packages/core/src/telemetry/config.test.ts | Updates expectations and adds coverage for per-signal env/settings resolution. |
| packages/core/src/config/config.ts | Extends TelemetrySettings, adds getters for per-signal endpoints, and ensures telemetry shutdown during Config.shutdown(). |
| packages/core/src/config/config.test.ts | Tests telemetry shutdown gating and new per-signal endpoint getters. |
| packages/cli/src/gemini.tsx | Adds interactive SIGINT/SIGTERM cleanup handler installation and removes raw-mode-only signal hooks. |
| packages/cli/src/gemini.test.tsx | Adds test coverage for interactive SIGTERM cleanup and ensures signal listeners are cleaned up between tests. |
| docs/developers/development/telemetry.md | Documents HTTP OTLP routing semantics and per-signal endpoint overrides/env vars. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
wenshao
left a comment
There was a problem hiding this comment.
No issues found. LGTM! ✅ — gpt-5.5 via Qwen Code /review
#3779) * feat(telemetry): define HTTP OTLP endpoint behavior and signal routing - Add resolveHttpOtlpUrl() that appends /v1/traces, /v1/logs, /v1/metrics to base HTTP OTLP endpoints per the OpenTelemetry specification - Add per-signal endpoint overrides (otlpTracesEndpoint, otlpLogsEndpoint, otlpMetricsEndpoint) for backends with non-standard paths (e.g. Alibaba Cloud) - Add LogToSpanProcessor that bridges OTel log records to spans for traces-only backends, with session-based traceId correlation and error status propagation - Auto-wire LogToSpanProcessor when traces URL exists but logs URL doesn't - Validate per-signal URLs gracefully (log error + skip, don't crash) - Preserve query strings when appending signal paths to URLs - Guard gRPC branch against missing base endpoint with per-signal config - Update telemetry documentation with signal routing semantics and Alibaba Cloud HTTP per-signal endpoint examples Closes #3734 Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(telemetry): fix TS noPropertyAccessFromIndexSignature errors in tests Use typed ExportedSpan interface and bracket notation for index signature properties to satisfy strict TypeScript checks in CI. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(telemetry): replace MD5 with SHA-256 for traceId derivation CodeQL flagged MD5 as a weak cryptographic algorithm when used with session.id (considered sensitive data). Switch to SHA-256 truncated to 32 hex chars to satisfy CodeQL while maintaining the same traceId format required by the OTel specification. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(telemetry): address review feedback for LogToSpanProcessor robustness - Wrap JSON.stringify in try/catch to handle circular refs and BigInt - Add export timeout (30s) and try/catch to prevent hung shutdown - Track in-flight exports to avoid interval-vs-shutdown race condition - Fix deriveSpanStatus: use truthy checks (!!), drop success===false heuristic since declined tool calls are normal, not errors - Enforce http(s) scheme in validateUrl to reject file:/javascript: URLs - Change DiagLogLevel from ERROR to WARN to preserve operational diagnostics - Preserve logRecord.instrumentationScope instead of hardcoding - Forward severityNumber/severityText as span attributes - Add tests for circular refs, error status edge cases, severity Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(telemetry): flush sdk shutdown through cleanup Remove async process exit handlers from telemetry initialization and route SDK shutdown through Config cleanup so normal CLI exit paths await pending telemetry exports. Keep shutdown idempotent while an SDK shutdown is in flight. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(telemetry): harden bridged log shutdown Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(telemetry): address review follow-ups Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> --------- Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
… and CLI flags (#4066) * docs(telemetry): align config and docs semantics for target, outfile, and CLI flags - Remove stale warning note "This feature requires corresponding code changes" — the OTLP implementation is now complete (#3779, #4061) - Clarify that `target` is an informational destination label and does not control exporter routing; `otlpEndpoint` or `outfile` must be set to configure where data is sent - Mark `--telemetry-target` CLI flag as deprecated in the configuration table to match the deprecateOption() call in cli/src/config/config.ts - Fix `outfile` / `QWEN_TELEMETRY_OUTFILE` descriptions: remove the incorrect "when target is local" qualifier — outfile overrides OTLP export regardless of the target value - Simplify the file-based output example by removing the now-redundant `"target": "local"` and `"otlpEndpoint": ""` fields Closes the "Align telemetry config and docs semantics for target, useCollector, otlpEndpoint, otlpProtocol, and outfile" checklist item in #3731. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) * docs(telemetry): address Copilot review comments on outfile and target descriptions - Fix outfile table row in telemetry.md: "overrides `otlpEndpoint`" → "overrides OTLP export" (outfile disables all OTLP exporting, not just the base endpoint) - Use fully-qualified setting names (`telemetry.otlpEndpoint`, `telemetry.outfile`) in the target description in settings.md for consistency with the rest of the table 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) * docs(telemetry): update QWEN_TELEMETRY_TARGET env var description and add outfile note - Align QWEN_TELEMETRY_TARGET env var description with the updated telemetry.target setting semantics (informational label, not routing) - Add a note after the file-based output example clarifying that outfile automatically disables OTLP export 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
… and CLI flags (QwenLM#4066) * docs(telemetry): align config and docs semantics for target, outfile, and CLI flags - Remove stale warning note "This feature requires corresponding code changes" — the OTLP implementation is now complete (QwenLM#3779, QwenLM#4061) - Clarify that `target` is an informational destination label and does not control exporter routing; `otlpEndpoint` or `outfile` must be set to configure where data is sent - Mark `--telemetry-target` CLI flag as deprecated in the configuration table to match the deprecateOption() call in cli/src/config/config.ts - Fix `outfile` / `QWEN_TELEMETRY_OUTFILE` descriptions: remove the incorrect "when target is local" qualifier — outfile overrides OTLP export regardless of the target value - Simplify the file-based output example by removing the now-redundant `"target": "local"` and `"otlpEndpoint": ""` fields Closes the "Align telemetry config and docs semantics for target, useCollector, otlpEndpoint, otlpProtocol, and outfile" checklist item in QwenLM#3731. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) * docs(telemetry): address Copilot review comments on outfile and target descriptions - Fix outfile table row in telemetry.md: "overrides `otlpEndpoint`" → "overrides OTLP export" (outfile disables all OTLP exporting, not just the base endpoint) - Use fully-qualified setting names (`telemetry.otlpEndpoint`, `telemetry.outfile`) in the target description in settings.md for consistency with the rest of the table 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) * docs(telemetry): update QWEN_TELEMETRY_TARGET env var description and add outfile note - Align QWEN_TELEMETRY_TARGET env var description with the updated telemetry.target setting semantics (informational label, not routing) - Add a note after the file-based output example clarifying that outfile automatically disables OTLP export 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
… and CLI flags (#4066) * docs(telemetry): align config and docs semantics for target, outfile, and CLI flags - Remove stale warning note "This feature requires corresponding code changes" — the OTLP implementation is now complete (#3779, #4061) - Clarify that `target` is an informational destination label and does not control exporter routing; `otlpEndpoint` or `outfile` must be set to configure where data is sent - Mark `--telemetry-target` CLI flag as deprecated in the configuration table to match the deprecateOption() call in cli/src/config/config.ts - Fix `outfile` / `QWEN_TELEMETRY_OUTFILE` descriptions: remove the incorrect "when target is local" qualifier — outfile overrides OTLP export regardless of the target value - Simplify the file-based output example by removing the now-redundant `"target": "local"` and `"otlpEndpoint": ""` fields Closes the "Align telemetry config and docs semantics for target, useCollector, otlpEndpoint, otlpProtocol, and outfile" checklist item in #3731. 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) * docs(telemetry): address Copilot review comments on outfile and target descriptions - Fix outfile table row in telemetry.md: "overrides `otlpEndpoint`" → "overrides OTLP export" (outfile disables all OTLP exporting, not just the base endpoint) - Use fully-qualified setting names (`telemetry.otlpEndpoint`, `telemetry.outfile`) in the target description in settings.md for consistency with the rest of the table 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code) * docs(telemetry): update QWEN_TELEMETRY_TARGET env var description and add outfile note - Align QWEN_TELEMETRY_TARGET env var description with the updated telemetry.target setting semantics (informational label, not routing) - Add a note after the file-based output example clarifying that outfile automatically disables OTLP export 🤖 Generated with [Qwen Code](https://github.com/QwenLM/qwen-code)
QwenLM#3779) * feat(telemetry): define HTTP OTLP endpoint behavior and signal routing - Add resolveHttpOtlpUrl() that appends /v1/traces, /v1/logs, /v1/metrics to base HTTP OTLP endpoints per the OpenTelemetry specification - Add per-signal endpoint overrides (otlpTracesEndpoint, otlpLogsEndpoint, otlpMetricsEndpoint) for backends with non-standard paths (e.g. Alibaba Cloud) - Add LogToSpanProcessor that bridges OTel log records to spans for traces-only backends, with session-based traceId correlation and error status propagation - Auto-wire LogToSpanProcessor when traces URL exists but logs URL doesn't - Validate per-signal URLs gracefully (log error + skip, don't crash) - Preserve query strings when appending signal paths to URLs - Guard gRPC branch against missing base endpoint with per-signal config - Update telemetry documentation with signal routing semantics and Alibaba Cloud HTTP per-signal endpoint examples Closes QwenLM#3734 Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(telemetry): fix TS noPropertyAccessFromIndexSignature errors in tests Use typed ExportedSpan interface and bracket notation for index signature properties to satisfy strict TypeScript checks in CI. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(telemetry): replace MD5 with SHA-256 for traceId derivation CodeQL flagged MD5 as a weak cryptographic algorithm when used with session.id (considered sensitive data). Switch to SHA-256 truncated to 32 hex chars to satisfy CodeQL while maintaining the same traceId format required by the OTel specification. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(telemetry): address review feedback for LogToSpanProcessor robustness - Wrap JSON.stringify in try/catch to handle circular refs and BigInt - Add export timeout (30s) and try/catch to prevent hung shutdown - Track in-flight exports to avoid interval-vs-shutdown race condition - Fix deriveSpanStatus: use truthy checks (!!), drop success===false heuristic since declined tool calls are normal, not errors - Enforce http(s) scheme in validateUrl to reject file:/javascript: URLs - Change DiagLogLevel from ERROR to WARN to preserve operational diagnostics - Preserve logRecord.instrumentationScope instead of hardcoding - Forward severityNumber/severityText as span attributes - Add tests for circular refs, error status edge cases, severity Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(telemetry): flush sdk shutdown through cleanup Remove async process exit handlers from telemetry initialization and route SDK shutdown through Config cleanup so normal CLI exit paths await pending telemetry exports. Keep shutdown idempotent while an SDK shutdown is in flight. Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(telemetry): harden bridged log shutdown Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> * fix(telemetry): address review follow-ups Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com> --------- Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
Summary
resolveHttpOtlpUrl()appends/v1/traces,/v1/logs,/v1/metricsto base endpoints per the OTel specification, with query string preservationotlpTracesEndpoint,otlpLogsEndpoint,otlpMetricsEndpoint) in config/env vars for backends with non-standard paths (e.g., Alibaba Cloud's/api/otlp/traces)LogToSpanProcessorthat bridges OTel log records to spans for traces-only backends, with session-based traceId correlation (SHA-256 ofsessionId, truncated to a 128-bit trace ID) and error status propagationTest plan
resolveHttpOtlpUrl: base URL, trailing slash, explicit signal path, custom prefix, HTTPS, query stringsLogToSpanProcessor: attribute mapping, duration handling, traceId derivation, error status, flush/shutdown lifecycleQWEN_TELEMETRY_OTLP_{TRACES,LOGS,METRICS}_ENDPOINTenv vars人工验证
Closes #3734
🤖 Generated with Qwen Code