feat(diagnostics-otel): add trace context propagation and GenAI semantic conventions#18901
feat(diagnostics-otel): add trace context propagation and GenAI semantic conventions#18901sergical wants to merge 10 commits intoopenclaw:mainfrom
Conversation
| @@ -162,6 +185,8 @@ export function logMessageProcessed(params: { | |||
| outcome: params.outcome, | |||
| reason: params.reason, | |||
| error: params.error, | |||
| traceId: state.traceId, | |||
| parentSpanId: spanId, | |||
There was a problem hiding this comment.
message.processed span becomes a child of itself
The spanId generated on line 174 is stored as state.currentSpanId (for future child spans to reference) but is also emitted as parentSpanId on the message.processed event itself. In the OTel plugin, spanWithDuration uses evt.parentSpanId to construct a parent SpanContext, so the message.processed span is created as a child of the same spanId it just generated. This creates a phantom self-referential parent rather than the intended hierarchy.
Per the PR description, the intended hierarchy is message.processed as the parent span, with chat model-name as child spans. For that to work, message.processed should be a root span within its trace (no parentSpanId), and only downstream model.usage events should carry parentSpanId equal to state.currentSpanId.
Consider removing parentSpanId: spanId from this event emission so that the OTel plugin creates message.processed as a root span within the trace. Then, when model.usage events are wired up to pass parentSpanId: state.currentSpanId, they will correctly become children of the message.processed span.
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/logging/diagnostic.ts
Line: 174:189
Comment:
**`message.processed` span becomes a child of itself**
The `spanId` generated on line 174 is stored as `state.currentSpanId` (for future child spans to reference) but is also emitted as `parentSpanId` on the `message.processed` event itself. In the OTel plugin, `spanWithDuration` uses `evt.parentSpanId` to construct a parent `SpanContext`, so the `message.processed` span is created as a child of the same `spanId` it just generated. This creates a phantom self-referential parent rather than the intended hierarchy.
Per the PR description, the intended hierarchy is `message.processed` as the parent span, with `chat model-name` as child spans. For that to work, `message.processed` should be a root span within its trace (no `parentSpanId`), and only downstream `model.usage` events should carry `parentSpanId` equal to `state.currentSpanId`.
Consider removing `parentSpanId: spanId` from this event emission so that the OTel plugin creates `message.processed` as a root span within the trace. Then, when model.usage events are wired up to pass `parentSpanId: state.currentSpanId`, they will correctly become children of the message.processed span.
How can I resolve this? If you propose a fix, please make it concise.- Fix TypeScript errors in service.test.ts (proper tuple type assertions) - Remove parentSpanId from message.processed events — message spans are root spans in their trace, only model.usage spans are children - Update tests to verify correct hierarchy: message.processed has no parentSpanId, model.usage has parentSpanId pointing to message span
- Add non-null assertions for spanAttrs access (TS18048) - Use 'as unknown as' for mock.calls tuple casts (2→3 elements) - Make 3rd tuple element optional (| undefined)
0fabcf1 to
a1e8f1f
Compare
spanWithDuration always passes context.active() as the 3rd arg to startSpan, even without a parent trace. So message.processed's 3rd arg is never undefined — instead verify __parentSpanContext is absent.
Without this, message.processed events (which have traceId but no parentSpanId) get a fake parent with spanId '0000000000000000'. Now message.processed is a true root span — parent context is only created when both traceId AND parentSpanId exist.
message.processed is a root span — the gateway never emits parentSpanId for it (removed in diagnostic.ts). The test was still passing it, so spanWithDuration created a parent context where none should exist.
|
Thanks for pushing this forward. I’m closing this as a duplicate of #28166. That PR resolves the same trace-context + diagnostics-otel scope and is the consolidated, mergeable path. Your work in #18901 is preserved and will be attributed. If this split is off or you think this should be done another way, say the word and I’ll reopen review right away. |
feat(diagnostics-otel): Add trace context propagation and GenAI semantic conventions
Summary
This PR adds two related improvements to the diagnostics-otel plugin:
Trace context propagation — Diagnostic events now carry
traceIdandparentSpanIdfields, enabling the OTel plugin to create proper parent-child span relationships instead of disconnected root spans.GenAI semantic convention attributes — Model usage spans now include standardized
gen_ai.*attributes alongside existingopenclaw.*attributes, following the OTel GenAI semantic conventions.Trace Hierarchy
Before (all root spans, unlinked):
After (linked parent-child traces):
The trace context lifecycle:
logWebhookReceivedgenerates a 32-hex-chartraceId(UUID without dashes)logMessageQueuedstores thetraceIdon the session state when providedlogMessageProcessedreads the session'straceId, generates a 16-hex-charspanId, and emits both on the eventtraceIdandparentSpanIdfrom the session statetrace.setSpanContext()to create child spans under the proper parentGenAI Convention Attributes Added
On
model.usagespans (alongside existingopenclaw.*attributes):gen_ai.operation.name"chat"gen_ai.systemevt.providergen_ai.request.modelevt.modelgen_ai.usage.input_tokensevt.usage.inputgen_ai.usage.output_tokensevt.usage.outputSpan names updated for GenAI conventions:
chat ${model}(e.g.,chat claude-opus-4-6)openclaw.message.processed)Files Changed
src/infra/diagnostic-events.tstraceIdandparentSpanIdtoDiagnosticBaseEventsrc/logging/diagnostic-session-state.tstraceIdandcurrentSpanIdtoSessionStatesrc/logging/diagnostic.tsextensions/diagnostics-otel/src/service.tssrc/infra/diagnostic-events.test.tsextensions/diagnostics-otel/src/service.test.tsBackwards Compatibility
traceId?: string,parentSpanId?: string)openclaw.*attributes remain on all spans —gen_ai.*attributes are added alongsideDiagnosticEventInputomitstsandseqas before; the new fields are optional in the intersection typestraceIdis present, spans are created as root spans (existing behavior)logMessageQueuedfunction'straceIdparameter is optional; existing callers don't need changesHow to Test
Unit tests:
Manual verification with a collector:
diagnostics.otel.endpointto point at a local OTLP collector or Jaegeropenclaw.message.processedandchat <model>spans share the sametraceIdand have a parent-child relationshipgen_ai.*attributes appear on model usage spansBackwards compat check:
openclaw.*attributes still appear on all spansGreptile Summary
This PR adds trace context propagation and GenAI semantic convention attributes to the diagnostics-otel plugin. It introduces
traceId/parentSpanIdfields on diagnostic events, stores trace context on session state, and addsgen_ai.*attributes alongside existingopenclaw.*attributes on model usage spans. Span names for model usage are updated to follow GenAI conventions (chat <model>).DiagnosticBaseEventgains optionaltraceIdandparentSpanId;SessionStategainstraceIdandcurrentSpanId. The OTel plugin'sspanWithDurationnow accepts parent context and usestrace.setSpanContextto establish parent-child links.gen_ai.operation.name,gen_ai.system,gen_ai.request.model,gen_ai.usage.input_tokens, andgen_ai.usage.output_tokensper the OTel GenAI spec.logMessageProcessedemits its own generatedspanIdasparentSpanId, causing the OTel plugin to create a self-referential parent span rather than the intended root span. See inline comment for details.logMessageQueuedpasstraceId, andmodel.usageemitters don't pass trace context, so the propagation chain is incomplete in practice. This may be intentional as groundwork for a follow-up PR.Confidence Score: 3/5
src/logging/diagnostic.ts— theparentSpanIdemitted on message.processed events creates a self-referential span parent instead of the intended trace hierarchy.Last reviewed commit: 52ad825
(4/5) You can add custom instructions or style guidelines for the agent here!