
Feature: Enhanced OpenTelemetry trace context for tool call correlation #8249

@evancohen-openclaw

OpenClaw Feature Request: Enhanced OpenTelemetry Trace Context

Summary

Add proper trace context propagation so that tool call spans are children of model usage spans, enabling hierarchical trace visualization in Jaeger and similar trace viewers.

Current Behavior

  • openclaw.model.usage spans are created independently
  • openclaw.tool.call spans (via plugin hooks) are created independently
  • No shared trace ID or parent-child relationships
  • Plugin hooks don't have access to runId

Desired Behavior

1. Expose runId in plugin hook context

The internal agent event system already has a runId that stays consistent for each agent turn. Exposing it in hook contexts would let plugins correlate spans:

// Current
export type PluginHookToolContext = {
  agentId?: string;
  sessionKey?: string;
  toolName: string;
};

// Proposed
export type PluginHookToolContext = {
  agentId?: string;
  sessionKey?: string;
  toolName: string;
  runId?: string;  // NEW
};
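
As a rough illustration of what the extra field would enable, here is a minimal sketch of a plugin-side helper that groups tool calls by runId. The `toolCallsByRun` map and the `recordToolCall` helper are hypothetical names invented for this example, and the fallback to sessionKey is an assumption about how a plugin might behave when runId is absent.

```typescript
// Proposed context shape from this issue.
type PluginHookToolContext = {
  agentId?: string;
  sessionKey?: string;
  toolName: string;
  runId?: string;
};

// Hypothetical in-memory index: correlation key -> tool names seen for that key.
const toolCallsByRun = new Map<string, string[]>();

// Hypothetical helper a plugin hook could call for each tool invocation.
// Prefers runId; falls back to sessionKey when runId is not provided.
function recordToolCall(ctx: PluginHookToolContext): string {
  const key = ctx.runId ?? `session:${ctx.sessionKey ?? "unknown"}`;
  const calls = toolCallsByRun.get(key) ?? [];
  calls.push(ctx.toolName);
  toolCallsByRun.set(key, calls);
  return key;
}
```

With runId available, every tool call in one agent turn lands under the same key, so a plugin could attach it as a span attribute without guessing from timestamps.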

2. Proper trace context propagation

Start a root span for each agent run and make model/tool spans children:

openclaw.run (root span)
├── openclaw.model.usage (thinking)
│   ├── openclaw.tool.call (web_search)
│   ├── openclaw.tool.call (exec)
│   └── openclaw.tool.call (read)
└── openclaw.model.usage (response)

This requires:

  1. Creating a root span when a run starts (e.g., in pi-embedded-runner)
  2. Storing trace context in run state
  3. Passing context to model usage span creation
  4. Exposing context to plugin hooks for tool spans
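
The four steps above can be sketched as follows. This is a self-contained model rather than real OpenClaw or OpenTelemetry code: `SpanRecord`, `RunState`, `startSpan`, `startRun`, `startModelUsage`, and `startToolCall` are all hypothetical names, and a real implementation would presumably use an OpenTelemetry SDK instead of the hand-rolled ID generation shown here. The ID sizes (16-byte trace IDs, 8-byte span IDs) follow the W3C Trace Context format.

```typescript
// Minimal stand-in for a recorded span (hypothetical, not an OpenTelemetry type).
interface SpanRecord {
  name: string;
  traceId: string;
  spanId: string;
  parentSpanId: string | undefined;
}

// The part of a span that children need in order to attach to it.
interface TraceContext {
  traceId: string;
  spanId: string;
}

// Step 2: trace context stored in run state (hypothetical shape).
interface RunState {
  runId: string;
  rootContext: TraceContext;
  currentModelContext?: TraceContext;
}

const spans: SpanRecord[] = [];

// Random lowercase hex string of the given byte length.
function randomHex(byteCount: number): string {
  let out = "";
  for (let i = 0; i < byteCount; i++) {
    out += ("0" + Math.floor(Math.random() * 256).toString(16)).slice(-2);
  }
  return out;
}

// Children inherit the parent's traceId; a span without a parent starts a new trace.
function startSpan(name: string, parent?: TraceContext): TraceContext {
  const ctx: TraceContext = {
    traceId: parent ? parent.traceId : randomHex(16),
    spanId: randomHex(8),
  };
  spans.push({ name, traceId: ctx.traceId, spanId: ctx.spanId, parentSpanId: parent?.spanId });
  return ctx;
}

// Step 1: create a root span when a run starts.
function startRun(runId: string): RunState {
  return { runId, rootContext: startSpan("openclaw.run") };
}

// Step 3: model usage spans are children of the root span.
function startModelUsage(run: RunState): TraceContext {
  run.currentModelContext = startSpan("openclaw.model.usage", run.rootContext);
  return run.currentModelContext;
}

// Step 4: tool spans attach to the active model span, or to the root if none is active.
function startToolCall(run: RunState, toolName: string): TraceContext {
  return startSpan(`openclaw.tool.call (${toolName})`, run.currentModelContext ?? run.rootContext);
}
```

Because every span in a run shares one traceId and carries its parent's spanId, a trace viewer can render the hierarchy shown in the diagram above.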

3. Wire up after_tool_call hook

Currently runAfterToolCall is exported from hooks.js but never called. The hook exists in the type system but isn't wired up. Either:

  • Call it after tool execution completes, OR
  • Remove it from the types to avoid confusion
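
For the first option, a minimal sketch of what the call site might look like is shown below. The actual signature of runAfterToolCall is not given in this issue, so the hook is modeled here as a hypothetical `AfterToolCallHook` callback, and `executeToolWithHook` and `AfterToolCallEvent` are names invented for illustration.

```typescript
// Hypothetical event payload; the real hook's arguments may differ.
type AfterToolCallEvent = {
  toolName: string;
  durationMs: number;
  error: string | undefined;
};

// Hypothetical stand-in for the after_tool_call hook.
type AfterToolCallHook = (event: AfterToolCallEvent) => void | Promise<void>;

// Runs a tool and invokes the hook afterwards, whether the tool succeeded or threw.
async function executeToolWithHook<T>(
  toolName: string,
  execute: () => Promise<T>,
  afterToolCall: AfterToolCallHook,
): Promise<T> {
  const startedAt = Date.now();
  let error: string | undefined;
  try {
    return await execute();
  } catch (err) {
    error = err instanceof Error ? err.message : String(err);
    throw err;
  } finally {
    try {
      await afterToolCall({ toolName, durationMs: Date.now() - startedAt, error });
    } catch {
      // Design choice for this sketch: a failing hook must not mask the tool's own outcome.
    }
  }
}
```

Calling the hook from a finally block means failed tool calls are reported too, which matters if a plugin opens a span before the tool runs and needs a reliable place to end it.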

Use Case

Observability for agent runs: understanding the sequence and timing of model reasoning and tool execution. Today this requires manually correlating spans by sessionKey and timestamp; a proper trace hierarchy would make debugging much easier.

Workaround

Use the tool_result_persist hook instead of after_tool_call, and correlate spans by their sessionKey attribute in Jaeger queries.
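
To show what the manual correlation amounts to, here is a small sketch that groups already-exported spans by sessionKey and orders each group by start time. The `ExportedSpan` shape and its field names are assumptions made for this example, not the format Jaeger or the diagnostics-otel extension actually emits.

```typescript
// Hypothetical shape of an exported span; field names are assumptions.
interface ExportedSpan {
  name: string;
  sessionKey: string;
  startTimeMs: number;
}

// Groups spans by sessionKey, then sorts each group chronologically.
function groupBySession(spans: ExportedSpan[]): Map<string, ExportedSpan[]> {
  const bySession = new Map<string, ExportedSpan[]>();
  for (const span of spans) {
    const group = bySession.get(span.sessionKey) ?? [];
    group.push(span);
    bySession.set(span.sessionKey, group);
  }
  bySession.forEach((group) => group.sort((a, b) => a.startTimeMs - b.startTimeMs));
  return bySession;
}
```

This recovers ordering but not parentage: nothing in the grouped output says which model usage span triggered which tool call, which is the gap the proposed trace hierarchy would close.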


Filed from local patch work on the diagnostics-otel extension.

Labels: enhancement