Skip to content

feat(diagnostics): attach trace context to otel logs#70961

Merged
vincentkoc merged 2 commits intomainfrom
feat/diagnostic-log-trace-context
Apr 24, 2026
Merged

feat(diagnostics): attach trace context to otel logs#70961
vincentkoc merged 2 commits intomainfrom
feat/diagnostic-log-trace-context

Conversation

@vincentkoc
Copy link
Copy Markdown
Member

@vincentkoc vincentkoc commented Apr 24, 2026

Summary

  • Problem: OTEL log export accepted OpenClaw log records but ignored diagnostic trace context carried in log metadata.
  • Why it matters: logs produced inside traced work could not correlate with future spans even when callers passed a trace context explicitly.
  • What changed: diagnostics-otel now extracts DiagnosticTraceContext from log bindings/metadata, exports explicit trace attributes, and attaches an OTEL log context when a span id is present.
  • What did NOT change (scope boundary): no global trace registry, no retained trace maps, no automatic process-wide OTEL SDK state in core, no hook wiring yet.

Change Type

  • Feature
  • Changelog update

Validation

  • pnpm test extensions/diagnostics-otel/src/service.test.ts
  • pnpm test extensions/diagnostics-otel
  • pnpm format:check -- CHANGELOG.md extensions/diagnostics-otel/src/service.ts extensions/diagnostics-otel/src/service.test.ts
  • pnpm exec oxlint --tsconfig tsconfig.oxlint.extensions.json extensions/diagnostics-otel/src/service.ts extensions/diagnostics-otel/src/service.test.ts

Risks and Mitigations

  • Risk: log metadata can contain arbitrary objects.
  • Mitigation: trace context extraction validates trace/span ids and flags with SDK helpers before emitting OTEL trace fields.
  • Risk: OTEL log context must not retain process-wide trace state.
  • Mitigation: log records receive per-record context from validated metadata only; no global trace registry or retained maps were added.

@aisle-research-bot
Copy link
Copy Markdown

aisle-research-bot Bot commented Apr 24, 2026

🔒 Aisle Security Analysis

We found 1 potential security issue(s) in this PR:

# Severity Title
1 🟡 Medium Telemetry poisoning via untrusted trace context injection into OTEL log records
1. 🟡 Telemetry poisoning via untrusted trace context injection into OTEL log records
Property Value
Severity Medium
CWE CWE-345
Location extensions/diagnostics-otel/src/service.ts:391-458

Description

diagnostics-otel now discovers a W3C-like trace context inside arbitrary log bindings / arguments and uses it to set an OpenTelemetry span context on exported log records.

  • Input: bindings are parsed from the first log argument if it looks like JSON, and numericArgs include all remaining log arguments (registerLogTransport receives everything logged by the process).
  • Propagation: findLogTraceContext() scans bindings and every numeric arg for an object containing { traceId, spanId, traceFlags } (or nested under .trace).
  • Sink: when spanId is present, the code calls trace.setSpanContext(..., { traceId, spanId, traceFlags, isRemote: true }) and attaches the resulting context to logRecord.context.

This allows any code path that logs user-influenced objects (e.g., request payloads, webhook bodies, headers, plugin inputs) to inject attacker-chosen traceId/spanId into exported telemetry. In multi-tenant or shared OTEL backends, this can:

  • Correlate attacker logs into unrelated traces (cross-tenant / cross-request correlation)
  • Poison traces/log-based alerts and dashboards by fabricating parentage
  • Mislead incident response by forging trace relationships

Vulnerable code:

const traceContext = findLogTraceContext(bindings, numericArgs);
...
if (traceContext?.spanId) {
  logRecord.context = trace.setSpanContext(otelContextApi.active(), {
    traceId: traceContext.traceId,
    spanId: traceContext.spanId,
    traceFlags: traceFlagsToOtel(traceContext.traceFlags),
    isRemote: true,
  });
}

Recommendation

Treat trace context as trusted-only and do not infer it from arbitrary log arguments.

Options (choose one):

  1. Require an explicit, internal-only field (set by OpenClaw code, not by application payloads), and ignore all others:
// Only accept from known binding key that OpenClaw sets (not from numeric args)
const traceContext = normalizeTraceContext(bindings?.__openclawTrace);
  1. Gate on subsystem/logger (e.g., only when bindings.subsystem === "diagnostic") and only read from bindings, not arbitrary args.

  2. If the intent is just to record IDs for debugging, only add attributes (openclaw.traceId, etc.) and do not set logRecord.context / isRemote.

Additionally, consider a config flag to disable this behavior in shared/multi-tenant deployments.


Analyzed PR: #70961 at commit 312565f

Last updated on: 2026-04-24T06:39:14Z

@vincentkoc vincentkoc self-assigned this Apr 24, 2026
@openclaw-barnacle openclaw-barnacle Bot added extensions: diagnostics-otel Extension: diagnostics-otel size: S maintainer Maintainer-authored PR labels Apr 24, 2026
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Apr 24, 2026

Greptile Summary

This PR wires DiagnosticTraceContext from log bindings/metadata into exported OTEL log records: explicit openclaw.traceId/spanId/parentSpanId/traceFlags attributes are appended via addTraceAttributes, and a non-recording remote span context is attached via trace.setSpanContext when a spanId is present. Validation reuses the SDK helpers (isValidDiagnosticTrace*) before touching any OTEL primitives, keeping the scope narrow and safe.

Confidence Score: 5/5

Safe to merge; all remaining findings are minor style or test-coverage suggestions.

No P0/P1 issues found. The validation logic is correct, the bitmask in traceFlagsToOtel is sound for the current W3C spec, and the truthiness checks on hex strings behave as intended. The three comments are P2 (attribute consistency, clarifying comment, and additional test coverage).

No files require special attention.

Prompt To Fix All With AI
This is a comment left during a code review.
Path: extensions/diagnostics-otel/src/service.ts
Line: 156-158

Comment:
**`openclaw.traceFlags` silently absent when flags are undefined**

When `traceContext.traceFlags` is `undefined` (valid — the field is optional in `DiagnosticTraceContext`), `addTraceAttributes` omits the `openclaw.traceFlags` attribute, but the OTEL log context is still attached below with a default of `TraceFlags.NONE`. An attribute consumer has no way to infer the effective sampling state from the attributes alone.

Consider emitting the attribute whenever a span context is attached, using the same default that `traceFlagsToOtel` already applies:

```typescript
if (traceContext.spanId) {
  attributes["openclaw.traceFlags"] = traceContext.traceFlags ?? "00";
} else if (traceContext.traceFlags) {
  attributes["openclaw.traceFlags"] = traceContext.traceFlags;
}
```

Or more simply, always write it (conditionally on spanId presence) so attributes stay consistent with the attached OTEL span context.

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: extensions/diagnostics-otel/src/service.ts
Line: 135-140

Comment:
**`traceFlagsToOtel` bitmask can silently drop additional W3C flag bits**

`(parsed & TraceFlags.SAMPLED) === TraceFlags.SAMPLED ? TraceFlags.SAMPLED : TraceFlags.NONE` strips every bit except the sampled bit and returns a value that may not equal the original numeric flags. For the current W3C spec this is harmless (only bit 0 is defined), but it differs from the semantics of just casting with `parsed as TraceFlags`. A brief comment explaining the intentional bitmask behaviour (rather than a direct cast) would prevent future readers from "correcting" this into `parsed as TraceFlags` and silently forwarding unknown bits.

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: extensions/diagnostics-otel/src/service.test.ts
Line: 348-388

Comment:
**No test coverage for trace context embedded directly in bindings or for unsampled flags**

The test exercises trace context nested under a `trace` key on a positional argument (`numericArgs[1].trace`). Two branches of `normalizeTraceContext`/`findLogTraceContext` aren't exercised:

1. Trace context supplied at the root of a positional arg (direct form: `{ traceId, spanId, traceFlags }` without the `trace` wrapper).
2. `traceFlags: "00"` (unsampled) — worth verifying that `openclaw.traceFlags` is still emitted (the non-empty string `"00"` is truthy so it should be, but it is a non-obvious path).

Not blocking, but these cases are easy to add alongside the existing test.

How can I resolve this? If you propose a fix, please make it concise.

Reviews (1): Last reviewed commit: "feat(diagnostics): attach trace context ..." | Re-trigger Greptile

Comment on lines +156 to +158
if (traceContext.traceFlags) {
attributes["openclaw.traceFlags"] = traceContext.traceFlags;
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 openclaw.traceFlags silently absent when flags are undefined

When traceContext.traceFlags is undefined (valid — the field is optional in DiagnosticTraceContext), addTraceAttributes omits the openclaw.traceFlags attribute, but the OTEL log context is still attached below with a default of TraceFlags.NONE. An attribute consumer has no way to infer the effective sampling state from the attributes alone.

Consider emitting the attribute whenever a span context is attached, using the same default that traceFlagsToOtel already applies:

if (traceContext.spanId) {
  attributes["openclaw.traceFlags"] = traceContext.traceFlags ?? "00";
} else if (traceContext.traceFlags) {
  attributes["openclaw.traceFlags"] = traceContext.traceFlags;
}

Or more simply, always write it (conditionally on spanId presence) so attributes stay consistent with the attached OTEL span context.

Prompt To Fix With AI
This is a comment left during a code review.
Path: extensions/diagnostics-otel/src/service.ts
Line: 156-158

Comment:
**`openclaw.traceFlags` silently absent when flags are undefined**

When `traceContext.traceFlags` is `undefined` (valid — the field is optional in `DiagnosticTraceContext`), `addTraceAttributes` omits the `openclaw.traceFlags` attribute, but the OTEL log context is still attached below with a default of `TraceFlags.NONE`. An attribute consumer has no way to infer the effective sampling state from the attributes alone.

Consider emitting the attribute whenever a span context is attached, using the same default that `traceFlagsToOtel` already applies:

```typescript
if (traceContext.spanId) {
  attributes["openclaw.traceFlags"] = traceContext.traceFlags ?? "00";
} else if (traceContext.traceFlags) {
  attributes["openclaw.traceFlags"] = traceContext.traceFlags;
}
```

Or more simply, always write it (conditionally on spanId presence) so attributes stay consistent with the attached OTEL span context.

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +135 to +140
function traceFlagsToOtel(traceFlags: string | undefined): TraceFlags {
const parsed = Number.parseInt(traceFlags ?? "00", 16);
return (parsed & TraceFlags.SAMPLED) === TraceFlags.SAMPLED
? TraceFlags.SAMPLED
: TraceFlags.NONE;
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 traceFlagsToOtel bitmask can silently drop additional W3C flag bits

(parsed & TraceFlags.SAMPLED) === TraceFlags.SAMPLED ? TraceFlags.SAMPLED : TraceFlags.NONE strips every bit except the sampled bit and returns a value that may not equal the original numeric flags. For the current W3C spec this is harmless (only bit 0 is defined), but it differs from the semantics of just casting with parsed as TraceFlags. A brief comment explaining the intentional bitmask behaviour (rather than a direct cast) would prevent future readers from "correcting" this into parsed as TraceFlags and silently forwarding unknown bits.

Prompt To Fix With AI
This is a comment left during a code review.
Path: extensions/diagnostics-otel/src/service.ts
Line: 135-140

Comment:
**`traceFlagsToOtel` bitmask can silently drop additional W3C flag bits**

`(parsed & TraceFlags.SAMPLED) === TraceFlags.SAMPLED ? TraceFlags.SAMPLED : TraceFlags.NONE` strips every bit except the sampled bit and returns a value that may not equal the original numeric flags. For the current W3C spec this is harmless (only bit 0 is defined), but it differs from the semantics of just casting with `parsed as TraceFlags`. A brief comment explaining the intentional bitmask behaviour (rather than a direct cast) would prevent future readers from "correcting" this into `parsed as TraceFlags` and silently forwarding unknown bits.

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines 348 to 388
}
});

test("attaches diagnostic trace context to exported logs", async () => {
const emitCall = await emitAndCaptureLog({
0: '{"subsystem":"diagnostic"}',
1: {
trace: {
traceId: TRACE_ID,
spanId: SPAN_ID,
traceFlags: "01",
},
},
2: "traceable log",
_meta: { logLevelName: "INFO", date: new Date() },
});

expect(emitCall?.attributes).toMatchObject({
"openclaw.traceId": TRACE_ID,
"openclaw.spanId": SPAN_ID,
"openclaw.traceFlags": "01",
});
expect(telemetryState.tracer.setSpanContext).toHaveBeenCalledWith(
expect.anything(),
expect.objectContaining({
traceId: TRACE_ID,
spanId: SPAN_ID,
traceFlags: 1,
isRemote: true,
}),
);
expect(emitCall?.context).toEqual({
spanContext: expect.objectContaining({
traceId: TRACE_ID,
spanId: SPAN_ID,
}),
});
});

test("redacts sensitive reason in session.state metric attributes", async () => {
const service = createDiagnosticsOtelService();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 No test coverage for trace context embedded directly in bindings or for unsampled flags

The test exercises trace context nested under a trace key on a positional argument (numericArgs[1].trace). Two branches of normalizeTraceContext/findLogTraceContext aren't exercised:

  1. Trace context supplied at the root of a positional arg (direct form: { traceId, spanId, traceFlags } without the trace wrapper).
  2. traceFlags: "00" (unsampled) — worth verifying that openclaw.traceFlags is still emitted (the non-empty string "00" is truthy so it should be, but it is a non-obvious path).

Not blocking, but these cases are easy to add alongside the existing test.

Prompt To Fix With AI
This is a comment left during a code review.
Path: extensions/diagnostics-otel/src/service.test.ts
Line: 348-388

Comment:
**No test coverage for trace context embedded directly in bindings or for unsampled flags**

The test exercises trace context nested under a `trace` key on a positional argument (`numericArgs[1].trace`). Two branches of `normalizeTraceContext`/`findLogTraceContext` aren't exercised:

1. Trace context supplied at the root of a positional arg (direct form: `{ traceId, spanId, traceFlags }` without the `trace` wrapper).
2. `traceFlags: "00"` (unsampled) — worth verifying that `openclaw.traceFlags` is still emitted (the non-empty string `"00"` is truthy so it should be, but it is a non-obvious path).

Not blocking, but these cases are easy to add alongside the existing test.

How can I resolve this? If you propose a fix, please make it concise.

@openclaw-barnacle openclaw-barnacle Bot added channel: whatsapp-web Channel integration: whatsapp-web size: M and removed size: S labels Apr 24, 2026
@vincentkoc vincentkoc force-pushed the feat/diagnostic-log-trace-context branch from 5e7af80 to 312565f Compare April 24, 2026 06:32
@openclaw-barnacle openclaw-barnacle Bot added size: S and removed channel: whatsapp-web Channel integration: whatsapp-web size: M labels Apr 24, 2026
@vincentkoc vincentkoc merged commit 8fade9d into main Apr 24, 2026
66 checks passed
@vincentkoc vincentkoc deleted the feat/diagnostic-log-trace-context branch April 24, 2026 06:40
zhongmairen pushed a commit to agencyos-cn/openclaw-bad that referenced this pull request Apr 24, 2026
* 'main' of https://github.com/openclaw/openclaw: (521 commits)
  perf: slim reset model selection imports
  perf: slim directive parse test imports
  ci(release): clarify npm telegram approval label
  test: relax live tail readiness checks
  ci(release): use release approval for npm telegram e2e
  Project Codex hook notifications into agent events (openclaw#70969)
  fix: match codex verbose tool logs to pi (openclaw#70966)
  feat(diagnostics): attach trace context to otel logs (openclaw#70961)
  docs(faq): tighten FAQ stub pointers and Related cards
  docs(help): rewrite help index to match new tab structure, drop redundant troubleshooting H1
  refactor(release): distill npm telegram docker runner
  ci(release): gate npm telegram e2e by release team
  ci(release): harden npm telegram beta e2e
  test(release): address npm telegram e2e review
  test(release): avoid docker argv secret values
  ci(release): add manual npm telegram beta e2e
  test(release): support convex npm telegram credentials
  docs: clarify private ws node setup
  fix: filter gateway node list locally
  feat(plugins): move Bonjour discovery into bundled plugin
  ...
Angfr95 pushed a commit to Angfr95/openclaw that referenced this pull request Apr 25, 2026
* feat(diagnostics): attach trace context to otel logs

* fix(diagnostics): satisfy trace flags lint
ogt-redknie pushed a commit to ogt-redknie/OPENX that referenced this pull request May 2, 2026
* feat(diagnostics): attach trace context to otel logs

* fix(diagnostics): satisfy trace flags lint
github-actions Bot pushed a commit to Desicool/openclaw that referenced this pull request May 9, 2026
* feat(diagnostics): attach trace context to otel logs

* fix(diagnostics): satisfy trace flags lint
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

extensions: diagnostics-otel Extension: diagnostics-otel maintainer Maintainer-authored PR size: S

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant