feat(telemetry): add interaction span and detailed sensitive attributes #4097
Pull request overview
This PR introduces new telemetry primitives for hierarchical “session tracing” and adds optional (flag-gated) detailed span attributes to capture richer LLM/tool content in traces.
Changes:
- Added a new `session-tracing` module to create interaction/LLM/tool/tool-execution spans and manage their lifecycle.
- Added `detailed-span-attributes` helpers (with tests) to attach truncated/deduplicated prompt/tool/schema/result content to spans when sensitive attributes are enabled.
- Wired interaction span start/end and user-prompt attributes into `GeminiClient.sendMessageStream`, and added tool input/result + LLM prompt/output attribute attachment in core execution paths.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| packages/core/src/telemetry/session-tracing.ts | New session tracing span lifecycle + AsyncLocalStorage tracking + testing reset helper. |
| packages/core/src/telemetry/session-tracing.test.ts | Unit tests for the new session tracing API. |
| packages/core/src/telemetry/sdk.ts | Ensures active interaction is ended on telemetry shutdown. |
| packages/core/src/telemetry/index.ts | Re-exports new tracing and detailed-attribute helpers from the telemetry barrel. |
| packages/core/src/telemetry/detailed-span-attributes.ts | New helpers for adding sensitive prompt/tool/schema/output attributes with truncation + hash dedup. |
| packages/core/src/telemetry/detailed-span-attributes.test.ts | Unit tests for detailed span attribute helpers. |
| packages/core/src/telemetry/constants.ts | Adds span name constants for the new session tracing spans. |
| packages/core/src/core/loggingContentGenerator/loggingContentGenerator.ts | Attaches system prompt/tool schema/model output attributes onto existing API spans. |
| packages/core/src/core/coreToolScheduler.ts | Attaches tool input/result attributes onto tool spans. |
| packages/core/src/core/client.ts | Starts/ends interaction spans around top-level message handling and attaches user prompt attributes. |
Cross-cutting review notes (in addition to inline comments)
1. Branch is stacked on a now-merged PR: please rebase
PR #4071 was merged ~6h before this review as Rebasing on 2.
| Item | Block merge? |
|---|---|
| Rebase | Yes |
| Doc update | Yes |
| Integration tests | Nice to have |
| Cost note in docs | Nice to have |
| E2E check | Yes |
e6bd431 to 146fb0c
Code Coverage Summary
CLI Package - Full Text Report
Core Package - Full Text Report
For detailed HTML reports, please see the 'coverage-reports-22.x-ubuntu-latest' artifact from the main CI run.
wenshao
left a comment
[Critical] packages/cli/src/config/settingsSchema.ts — includeSensitiveSpanAttributes description is now inaccurate.
The schema description says "Only controls bridge spans; OTel logs and other telemetry sinks may still receive response_text." This PR now uses the same flag to gate direct native OTel span attribute writes for user prompts (new_context), system prompts (system_prompt), tool I/O (tool_input, tool_result), and model output (response.model_output) via detailed-span-attributes.ts. An operator reading this description would believe enabling the flag only affects the log-to-span bridge, not that it streams full conversation content and tool I/O into every trace span.
Suggested fix: Update the description to reflect the expanded scope:
"When enabled, user prompts, system prompts, tool inputs, tool outputs, and model responses are written to native OTel span attributes (not limited to the log-to-span bridge). Defaults to false. Warning: enabling this may expose sensitive data from file contents, shell commands, and conversation history to your OTLP backend."
— DeepSeek/deepseek-v4-pro via Qwen Code /review
Re: review comment on Adopted in 3eb6a8c — updated
Good catch: the old description ("Only controls bridge spans") was no longer accurate after this PR.
wenshao
left a comment
No issues found in incremental diff (description string update). LGTM! ✅ — DeepSeek/deepseek-v4-pro via Qwen Code /review
wenshao
left a comment
Review summary
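I ran a full local verification (unit tests + tsc + a real E2E run against the dashscope backend). Functionally, the detailed span attributes part is solidly implemented, but the "hierarchical interaction → llm_request / tool" behavior claimed in the PR title/description does not actually take effect at runtime, and there is also a type error that has to be fixed before merging. Details below.
🛑 Blocker: tsc --build fails to compile
After merging into current main, the following type error blocks npm run build / npm run typecheck: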
packages/core/src/telemetry/detailed-span-attributes.test.ts(47,3): error TS2739:
Type '{ ... }' is missing the following properties from type 'MockSpan': addLink, addLinks
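Cause: the Span interface in @opentelemetry/api@1.9.0 requires addLink / addLinks, and createMockSpan() lacks these two methods. CI passing on 2026-05-14 was presumably because the PR branch had not yet been rebased onto the main commit that introduces this dependency expectation. Adding two lines fixes it: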
// packages/core/src/telemetry/detailed-span-attributes.test.ts ~ line 85
addLink() { return this; },
addLinks() { return this; },
🚨 Functional gap: the hierarchy is not actually connected
PR description:
Adds hierarchical session tracing spans (`interaction` → `llm_request` / `tool` → `tool.execution`) … with proper parent-child relationships using AsyncLocalStorage
E2E measurement (QWEN_TELEMETRY_TARGET=local + INCLUDE_SENSITIVE_SPAN_ATTRIBUTES=1, running "what is 2+2" and a prompt that uses the shell tool):
qwen-code.interaction traceId=8d5209bf… parentSpanId=null ← separate trace
api.generateContentStream traceId=e611263a… parentSpanId=<session-root>
tool.run_shell_command traceId=e611263a… parentSpanId=<session-root>
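The interaction span and the LLM/tool spans are not in the same trace, and parentSpanId does not match either. Any backend (Jaeger / Honeycomb / Tempo) will display it as an orphan span rather than as a parent span.
Root cause, in session-tracing.ts:
- `startInteractionSpan` uses `tracer.startSpan(...)` (session-tracing.ts:129), not `startActiveSpan`, so the newly created span never enters the OTel active context
- It only calls `interactionContext.enterWith(spanContextObj)` (session-tracing.ts:144) to write into Node's `AsyncLocalStorage`, but the OTel SDK's own `ContextManager` does not read that store; the two mechanisms do not interoperate
- Downstream, `withSpan` / `startSpanWithContext` (tracer.ts) look up the parent span via `context.active()`, so they can only find session-root and never see the interaction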
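The root cause described above can be illustrated with a small model. This is a sketch only: it does not use the OpenTelemetry API, and `FakeSpan`, `tracerContext`, `interactionStore`, and `startSpan` are hypothetical stand-ins. It assumes the tracer resolves parents from one store while the interaction span is written to a different one, which is the situation the review reports.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

// Minimal stand-in type; NOT the OpenTelemetry Span interface.
interface FakeSpan {
  name: string;
  parent: string | null;
}

// Store the tracer consults when resolving a parent
// (plays the role of the OTel active context).
const tracerContext = new AsyncLocalStorage<FakeSpan>();
// Separate store that only the interaction code writes to
// (plays the role of interactionContext).
const interactionStore = new AsyncLocalStorage<FakeSpan>();

// Stand-in for the session-root fallback.
const sessionRoot: FakeSpan = { name: "session-root", parent: null };

// Parent resolution reads tracerContext only and falls back to the session root.
function startSpan(name: string): FakeSpan {
  const parent = tracerContext.getStore() ?? sessionRoot;
  return { name, parent: parent.name };
}

const interaction = startSpan("interaction");

// Case 1: the span lives only in the side store, so the child cannot see it.
const orphaned = interactionStore.run(interaction, () => startSpan("llm_request"));

// Case 2: the span is placed in the store the tracer actually reads.
const nested = tracerContext.run(interaction, () => startSpan("llm_request"));

console.log(orphaned.parent); // session-root
console.log(nested.parent); // interaction
```

In the model, the child only nests under the interaction when the span is placed in the store that parent resolution reads; writing it to a second store has no effect on the resulting tree.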
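More direct evidence: startLLMRequestSpan / endLLMRequestSpan / startToolSpan / endToolSpan / startToolExecutionSpan / endToolExecutionSpan have no call sites in production code (repo-wide grep -v test.ts):
packages/core/src/telemetry/session-tracing.ts ← definition
packages/core/src/telemetry/index.ts ← re-export
(no other callers)
So the three span types qwen-code.llm_request / qwen-code.tool / qwen-code.tool.execution are never produced at runtime. coreToolScheduler.ts and loggingContentGenerator.ts still go through the original withSpan('api.generateContentStream', …) / withSpan('tool.<name>', …) path; the PR only attaches a few extra attributes to them.
✅ Parts that actually work
- The `qwen-code.interaction` span itself is emitted correctly, with `session.id` / `prompt_id` / `message_type` / `model` / `approval_mode` / `interaction.sequence` / `interaction.duration_ms` / `qwen-code.turn_status`, which is very valuable for turn-level observability
- `addUserPromptAttributes` writes `new_context` correctly, including the `[USER PROMPT]\n…` prefix
- `addSystemPromptAttributes`: `sp_<hex>` hash + preview + length; the second time the same hash appears, the `system_prompt` field is indeed deduplicated (measured on two `api.generateContentStream` spans: the first has `has_full=true`, the second `has_full=false`)
- `addToolSchemaAttributes` / `addModelOutputAttributes` / `addToolInputAttributes` / `addToolResultAttributes` all write as expected, with the `[TOOL INPUT: <name>]` / `[TOOL RESULT: <name>]` prefixes
- 60KB truncation plus the `*_truncated` / `*_original_length` metadata is correct
- With `includeSensitiveSpanAttributes=false` (the default), none of the 6 sensitive attributes are written; the privacy gate works
- 75 new/related unit tests + 141 callsite tests all pass, and lint passes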
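The truncation and hash-dedup behavior listed above can be sketched roughly as follows. This is an illustration, not the PR's implementation: `systemPromptAttributes`, `writeTruncated`, the 200-character preview, and truncating by string length rather than bytes are all assumptions. The 12-hex-digit `sp_` hash merely mirrors the sample value `sp_cd566f5b1cb7` seen in the logs.

```typescript
import { createHash } from "node:crypto";

// 60KB limit quoted in the review; applied to string length here for simplicity.
const MAX_CONTENT_LENGTH = 60 * 1024;

type Attributes = Record<string, string | number | boolean>;

// Hashes already written in this session; the full text is emitted only once.
const seenHashes = new Set<string>();

// Writes `value` under `key`, adding truncation metadata when it is too long.
function writeTruncated(key: string, value: string, out: Attributes): void {
  if (value.length > MAX_CONTENT_LENGTH) {
    out[key] = value.slice(0, MAX_CONTENT_LENGTH);
    out[`${key}_truncated`] = true;
    out[`${key}_original_length`] = value.length;
  } else {
    out[key] = value;
  }
}

// Hypothetical helper: hash, preview, and length are always written;
// the full prompt is written only the first time a hash is seen.
function systemPromptAttributes(prompt: string): Attributes {
  const digest = createHash("sha256").update(prompt).digest("hex");
  const hash = `sp_${digest.slice(0, 12)}`;
  const out: Attributes = {
    system_prompt_hash: hash,
    system_prompt_preview: prompt.slice(0, 200), // preview size is an assumption
    system_prompt_length: prompt.length,
  };
  if (!seenHashes.has(hash)) {
    seenHashes.add(hash);
    writeTruncated("system_prompt", prompt, out);
  }
  return out;
}

const first = systemPromptAttributes("You are a helpful assistant.");
const second = systemPromptAttributes("You are a helpful assistant.");
console.log("system_prompt" in first); // true
console.log("system_prompt" in second); // false
```

The second call still carries the hash, so a backend can look up the full text on the earlier span instead of storing it again.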
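Two suggested resolutions
A. Narrow the scope + remove dead code (faster, recommended)
Explicitly scope this PR to "interaction span + detailed attributes", update the title/description to drop the hierarchy wording, and delete the following parts that are not used on any production path:
- In `session-tracing.ts`: `startLLMRequestSpan` / `endLLMRequestSpan` / `startToolSpan` / `endToolSpan` / `startToolExecutionSpan` / `endToolExecutionSpan` and their types
- In `constants.ts`: `SPAN_LLM_REQUEST` / `SPAN_TOOL` / `SPAN_TOOL_EXECUTION`
- The re-exports of the above in `index.ts`
- The tests for these functions in `session-tracing.test.ts`
What remains, qwen-code.interaction plus all the add*Attributes helpers, is a complete, shippable PR that lives up to its description.
B. Actually finish the hierarchy (more thorough, needs another round of review)
If you want to keep the original goal, the hierarchy has to be genuinely wired up:
- Change `startInteractionSpan` to use `startActiveSpan`, or explicitly wrap the subsequent logic in `context.with(trace.setSpan(context.active(), span), …)`, so the interaction span is pushed into the OTel context
- In `loggingContentGenerator.generateContentStream` and `coreToolScheduler`, replace the existing `withSpan('api.generateContentStream', …)` / `withSpan('tool.<name>', …)` with the new `startLLMRequestSpan` / `startToolSpan` (or have the existing `withSpan` path emit under the new names)
- Run it against a trace backend (Jaeger is enough) to visually confirm the hierarchy renders correctly, then add E2E assertions
Reproduction commands (for re-checking):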
rm -f /tmp/t.jsonl
QWEN_TELEMETRY_ENABLED=1 \
QWEN_TELEMETRY_OUTFILE=/tmp/t.jsonl \
QWEN_TELEMETRY_INCLUDE_SENSITIVE_SPAN_ATTRIBUTES=1 \
QWEN_TELEMETRY_TARGET=local \
node packages/cli/dist/index.js --prompt "what is 2+2" --max-session-turns 1
jq -c 'select(.name) | {name, sid:._spanContext.spanId, tid:._spanContext.traceId, pid:.parentSpanContext.spanId}' /tmp/t.jsonl
I lean toward Plan A: get the parts that are already done merged quickly first.
Thanks for the thorough review and E2E findings. Adopted Plan A in commit 97a960d:
Removed (dead code that wasn't wired into production):
Kept (what actually works at runtime):
Updated:
Re: the type error blocker — |
wenshao
left a comment
[Critical] Trace topology silently broken — startInteractionSpan (packages/core/src/telemetry/session-tracing.ts) uses getTracer().startSpan() but never activates the interaction span in the OTel context via trace.setSpan(context.active(), span). It only stores the span in AsyncLocalStorage. However, tracer.ts's getParentContext() checks context.active() (OTel context), not the ALS. Result: LLM/tool spans created by withSpan/startSpanWithContext become siblings of the interaction span (children of the session root), not its children. The previously-removed startLLMRequestSpan tests specifically asserted parent-child relationships — those assertions are gone with no replacement.
Fix: After getTracer().startSpan(SPAN_INTERACTION, ...), activate it: const ctx = trace.setSpan(context.active(), span); interactionContext.enterWith(spanContextObj, ctx);. Then verify the trace hierarchy in Honeycomb/OTel backend.
Other findings (details in inline comments):
- `coreToolScheduler.ts:1875`: `safeJsonStringify` crashes on `BigInt`, `isInternal` gating missing for tool spans
- `coreToolScheduler.ts:1913,2287`: `addToolResultAttributes` missing on pre-hook block and execution exception paths
- `loggingContentGenerator.ts:225`: span leak risk (attributes set before try/catch block)
- `loggingContentGenerator.ts:500`: streaming failure drops partially-accumulated model output
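On the `BigInt` finding: plain `JSON.stringify` throws a `TypeError` when it meets a `BigInt` value. A minimal sketch of a serializer that tolerates it is shown below; `stringifyToolArgs` is a hypothetical name, and the project's actual `safeJsonStringify` may be structured differently.

```typescript
// Hypothetical helper for illustration; not the project's safeJsonStringify.
function stringifyToolArgs(value: unknown): string {
  try {
    const text: string | undefined = JSON.stringify(value, (_key, v) =>
      // JSON has no BigInt type, so emit it as a decimal string.
      typeof v === "bigint" ? v.toString() : v,
    );
    // JSON.stringify returns undefined for inputs such as `undefined` itself.
    return text ?? "undefined";
  } catch {
    // Circular structures (and throwing toJSON methods) land here.
    return "[unserializable]";
  }
}

const toolArgs = { offset: BigInt("9007199254740993"), path: "/tmp/a" };
console.log(stringifyToolArgs(toolArgs));
// {"offset":"9007199254740993","path":"/tmp/a"}
```

Keeping the fallback inside the helper means a single odd tool argument cannot abort the span-attribute path.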
— DeepSeek/deepseek-v4-pro via Qwen Code /review
wenshao
left a comment
[Critical] Try-block cancellation path (~line 1989, inside try { const toolResult = await promise }) is missing addToolResultAttributes. The PR adds telemetry to 4 early-exit paths, but there are actually 5 — the signal.aborted check at line 1989 (try-block, before result processing) is structurally identical to the catch-block path that was fixed at ~line 2245, but was left uncovered. When a tool completes successfully but is cancelled during result processing, the span will be marked as cancelled with no content context — inconsistent with the other 5 paths that now correctly record tool_result.
Fix:
addToolResultAttributes(
this.config,
span,
toolName,
`CANCELLED: ${cancelMessage}`,
);
Adding this before setToolSpanCancelled(span) at line 2013 would close the gap.
— DeepSeek/deepseek-v4-pro via Qwen Code /review
Re: cross-cutting review notes Status of the 5 items raised:
Doc text now covers:
|
b562902 to 54679fa
wenshao
left a comment
Review Summary
3 Critical, 2 Suggestions. See inline comments for items (4) and (5).
(1) [Critical] Trace topology broken — packages/core/src/telemetry/session-tracing.ts:101
startInteractionSpan creates the qwen-code.interaction span via getTracer().startSpan() but never propagates it into the OTel context (no trace.setSpan() / context.with()). tracer.ts:77 getParentContext() resolves parent via context.active() which has no knowledge of the interactionContext ALS. Every tool span and API span created via withSpan lands as a sibling under the session root, not as a child of qwen-code.interaction. The span tree is completely flat.
Fix: Modify getParentContext() in tracer.ts to fall back to getActiveInteractionSpan() when context.active() has no span:
import { getActiveInteractionSpan } from './session-tracing.js';
function getParentContext(): Context {
const active = context.active();
if (trace.getSpan(active)) return active;
const interactionSpan = getActiveInteractionSpan();
if (interactionSpan) return trace.setSpan(active, interactionSpan);
return getSessionContext() ?? active;
}
(2) [Critical] Missing addToolResultAttributes on inner signal.aborted path: packages/core/src/core/coreToolScheduler.ts:~1990
Inside the try block (after await promise resolved but signal was aborted), only setToolSpanCancelled(span) is called. The toolResult is available (promise resolved) but not recorded. All other cancellation paths correctly include addToolResultAttributes.
Fix: Add addToolResultAttributes before the return:
if (signal.aborted) {
// ... existing hook code ...
const resultContent = safeJsonStringify({
llmContent: toolResult.llmContent,
error: toolResult.error,
});
addToolResultAttributes(
this.config, span, toolName,
`CANCELLED: ${resultContent}`,
);
setToolSpanCancelled(span);
return;
}
(3) [Critical] ensureCleanupInterval calls span.end() without try/catch: packages/core/src/telemetry/session-tracing.ts:64
If any OTel SDK throws from end(), the setInterval dies silently — permanently stopping all cleanup. activeSpans/strongSpans grow unbounded. The codebase's own tracer.ts uses safeEndSpan pattern for this exact reason.
Fix:
if (!ctx.ended) {
ctx.ended = true;
try {
ctx.span.end();
} catch {
// OTel errors in cleanup must not kill the interval
}
}
— DeepSeek/deepseek-v4-pro via Qwen Code /review
wenshao
left a comment
Additional findings (not on diff lines):
[Critical] packages/core/src/core/coreToolScheduler.ts ~L1993 — The try-block cancellation path (signal.aborted inside try { await promise }) calls setToolSpanCancelled(span) but never invokes addToolResultAttributes. The catch-block cancellation path correctly calls it at L2266. This means try-block cancellations produce spans with no tool_result: CANCELLED: attribute. Fix: hoist cancelMessage before the if/else, then add addToolResultAttributes before setToolSpanCancelled.
[Critical] packages/core/src/core/coreToolScheduler.test.ts — All mock Config objects lack getTelemetryIncludeSensitiveSpanAttributes method. The 6 new addToolInputAttributes/addToolResultAttributes call sites become silent no-ops in tests, leaving tool span attribute wiring with zero test coverage. Add getTelemetryIncludeSensitiveSpanAttributes: () => true to the mock Config.
[Critical] packages/core/src/core/loggingContentGenerator/loggingContentGenerator.test.ts — createConfig function lacks getTelemetryIncludeSensitiveSpanAttributes. The 6 new addSystemPromptAttributes/addToolSchemaAttributes/addModelOutputAttributes calls become no-ops, leaving LLM span attribute wiring with zero test coverage. Add getTelemetryIncludeSensitiveSpanAttributes: () => true to createConfig.
wenshao
left a comment
Additional findings not tied to specific diff lines:
[Suggestion] Missing test: getActiveInteractionSpan() should return undefined when SDK is not initialized. When sdkInitialized = false, startInteractionSpan is a no-op — the path where telemetry is disabled is untested.
[Suggestion] Missing test: getActiveInteractionSpan() should return undefined after clearSessionTracingForTesting(). This validates test isolation for the new function.
— DeepSeek/deepseek-v4-pro via Qwen Code /review
wenshao
left a comment
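I pulled head ceb134f37 locally and ran one real E2E (DeepSeek backend + OTel file exporter, single prompt, no tools). After Plan A the unit tests all pass, but the real data shows that the interaction span and the api span from the same turn land on two separate traces. Reproduction log:
// Same turn, same prompt_id, different traceId: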
{"name":"qwen-code.interaction","traceId":"62c7835d87f326220fc337d538d5db61","parentSpanContext":null,
"attributes":{"qwen-code.prompt_id":"76921c3994bd3","session.id":"92315704-…","interaction.sequence":1,"qwen-code.turn_status":"ok"}}
{"name":"api.generateContentStream","traceId":"baff3a59c40a5281e346919f59cb249c","parentSpanContext":{"spanId":"f3af5b47167b394d"},
"attributes":{"prompt_id":"76921c3994bd3","system_prompt_hash":"sp_cd566f5b1cb7","response.model_output":"HELLO"}}
f3af5b47167b394d is the session-root context span (synthetic, appearing only as a parent reference). Every child span that goes through withSpan / startSpanWithContext falls back to getSessionContext() via getParentContext(), so the api/tool spans all sit in the session trace.
qwen-code.interaction does not take this path. See packages/core/src/telemetry/session-tracing.ts:102-105:
const span = getTracer().startSpan(SPAN_INTERACTION, {
kind: SpanKind.INTERNAL,
attributes,
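});
Only two arguments are passed. The session context is not passed in as the parent, so the OTel SDK treats the span as a root span and starts a new trace. As a result, the same turn is split into two traces in the backend.
Consequences:
- The backend cannot use `traceId` to join the interaction metadata (`new_context` / `interaction.sequence` / `interaction.duration_ms` / `qwen-code.turn_status`) with that turn's api+tool spans
- The only fallback is correlating on the `prompt_id` string attribute, which undercuts the main value of the interaction span this PR introduces (providing a clean per-turn container)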
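A quick way to detect this split in exported data is to group spans by prompt ID and count distinct trace IDs. The sketch below assumes a simplified record shape (`SpanRecord`) and uses shortened trace IDs modeled on the log excerpt above; `findSplitTurns` is a hypothetical helper, not part of the PR.

```typescript
// Simplified, assumed shape; real exported spans carry many more fields.
interface SpanRecord {
  name: string;
  traceId: string;
  promptId: string;
}

// Sample records modeled on the reproduction log (trace IDs shortened).
const spans: SpanRecord[] = [
  { name: "qwen-code.interaction", traceId: "62c7835d", promptId: "76921c3994bd3" },
  { name: "api.generateContentStream", traceId: "baff3a59", promptId: "76921c3994bd3" },
];

// Returns the prompt IDs whose spans are spread over more than one trace.
function findSplitTurns(records: SpanRecord[]): string[] {
  const tracesByPrompt = new Map<string, Set<string>>();
  for (const record of records) {
    const traces = tracesByPrompt.get(record.promptId) ?? new Set<string>();
    traces.add(record.traceId);
    tracesByPrompt.set(record.promptId, traces);
  }
  return Array.from(tracesByPrompt.entries())
    .filter(([, traces]) => traces.size > 1)
    .map(([promptId]) => promptId);
}

console.log(findSplitTurns(spans)); // one entry: "76921c3994bd3"
```

Once the interaction span shares a trace with its turn's api/tool spans, the same check should return an empty list.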
Suggested fix (one line)
At session-tracing.ts:102, explicitly pass the session context as the parent:
import { context } from '@opentelemetry/api';
import { getSessionContext } from './session-context.js';
const parentCtx = getSessionContext() ?? context.active();
const span = getTracer().startSpan(
SPAN_INTERACTION,
{ kind: SpanKind.INTERNAL, attributes },
parentCtx,
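);
After this change, qwen-code.interaction shares a trace with its turn's api/tool spans. Whether to additionally use context.with(...) to place the interaction span into the active context (so that api/tool spans parent directly to the interaction) is outside Plan A and optional.
Unit test blind spot
The mock in the existing session-tracing.test.ts only verifies the attrs and status passed to tracer.startSpan and does not assert on the parent context argument, so this omission slipped past 12/12 unit tests. I suggest adding an assertion:
expect(mockTracer.lastStartSpanArgs?.[2]).toBe(<expected session context>);
Otherwise a later refactor could easily regress this again.
A minor finding in passing
The internal prompt set in packages/core/src/utils/internalPromptIds.ts does not include managed-auto-memory-extractor-*, so when includeSensitiveSpanAttributes is enabled, the system_prompt / response.model_output of auto-memory extraction calls are also written to OTel. This PR did not introduce the problem, but it significantly amplifies it (previously this was only a log; now it is a native span attr). It can go into a follow-up.
Once the startSpan issue above is fixed, I will take one more pass and then it can merge.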
Layer detailed content attributes onto the existing hierarchical spans (qwen-code.interaction / qwen-code.llm_request / qwen-code.tool) gated by includeSensitiveSpanAttributes:
- Interaction span: user prompt (new_context)
- LLM request span: system prompt + hash + preview + length (full text deduped per session via SHA-256), tool schemas (per-tool tool_schema events, also hash-deduped), model output
- Tool span: tool input, tool result on every exit path (success + pre-hook block + post-hook stop + tool error + try-block cancel + catch-block cancel + execution exception)
All large content truncated at 60KB with *_truncated and *_original_length metadata. Heavy serialization (safeJsonStringify on tool I/O, partToString on user prompt) is guarded by the sensitive flag at the call site so it doesn't run when telemetry is off.
Also adds:
- getActiveInteractionSpan() helper for client.ts to attach prompt attributes to the interaction span.
- Updated config schema description and docs (telemetry.md + settings.md) to reflect expanded scope and add security/cost notes.
- 28 unit tests for detailed-span-attributes, 4 tests for getActiveInteractionSpan, integration mocks updated.
ceb134f to d4063fa
…hierarchical-spans
Summary
Adds a top-level `qwen-code.interaction` span per user-driven turn and, when `includeSensitiveSpanAttributes` is enabled, attaches rich content attributes (user prompt, system prompt, tool I/O, model output) to existing LLM and tool spans. Aligns with Claude Code's beta tracing capability.
What's recorded
| Span | Attributes |
|---|---|
| `qwen-code.interaction` | `new_context` (user prompt), plus baseline `session.id`, `prompt_id`, `message_type`, `model`, `approval_mode`, `interaction.sequence`, `interaction.duration_ms`, `qwen-code.turn_status` |
| `api.generateContent*` | `system_prompt` + `system_prompt_hash` + `system_prompt_preview` + `system_prompt_length` (full text deduped per session via SHA-256), `tools` summary + `tools_count` + per-tool `tool_schema` events (also hash-deduped), `response.model_output` |
| `tool.<name>` | `tool_input`, `tool_result` (incl. error path) |
All large content is truncated at 60KB with `*_truncated` / `*_original_length` metadata. The flag defaults to `false`; when off, none of the sensitive attributes are written.
Files changed
| File | Change |
|---|---|
| `detailed-span-attributes.ts` | |
| `detailed-span-attributes.test.ts` | |
| `session-tracing.ts` | `qwen-code.interaction` span (`startInteractionSpan` / `endInteractionSpan` / `getActiveInteractionSpan`) |
| `index.ts` | |
| `client.ts` | `addUserPromptAttributes` after `startInteractionSpan` |
| `loggingContentGenerator.ts` | `addSystemPromptAttributes` / `addToolSchemaAttributes` / `addModelOutputAttributes` |
| `coreToolScheduler.ts` | `addToolInputAttributes` / `addToolResultAttributes` (success + error paths) |
| `settingsSchema.ts` + `settings.schema.json` | `includeSensitiveSpanAttributes` description with scope and data-exposure warning |
Test plan
- `vitest run src/telemetry/detailed-span-attributes.test.ts`: 27 pass
- `vitest run src/telemetry/session-tracing.test.ts`: 8 pass
- `vitest run src/telemetry/log-to-span-processor.test.ts`: 30 pass
- `tsc --noEmit` clean
Manual verification
git checkout feat/session-tracing-hierarchical-spans
npm run build
rm -f /tmp/t.jsonl
QWEN_TELEMETRY_ENABLED=1 \
QWEN_TELEMETRY_OUTFILE=/tmp/t.jsonl \
QWEN_TELEMETRY_INCLUDE_SENSITIVE_SPAN_ATTRIBUTES=1 \
node packages/cli/dist/index.js --prompt "list files in current directory" --max-session-turns 2
Inspect spans:
Expected:
{"name":"qwen-code.interaction","new_context":"[USER PROMPT]\nlist files in current directory"}
{"name":"api.generateContentStream","hash":"sp_<hex>","length":51649,"has_full":true}
{"name":"api.generateContentStream","hash":"sp_<hex>","length":51649,"has_full":false} ← deduped
{"name":"api.generateContentStream","output":"..."}
{"name":"tool.run_shell_command","tool_input":"[TOOL INPUT: run_shell_command]\n{\"command\":\"ls\"...}"}
{"name":"tool.run_shell_command","tool_result":"[TOOL RESULT: run_shell_command]\n..."}
With the flag unset, none of the `new_context` / `system_prompt*` / `tool_input` / `tool_result` / `response.model_output` attributes appear.
🤖 Generated with Qwen Code