feat(openclaw): add OTEL diagnostics setup for conversation latency traces #4368

@cv

Problem Statement

Users need a supported way to diagnose slow agent response times during OpenClaw conversations running inside NemoClaw. NemoClaw now has opt-in onboard/setup profiling traces via #3769 and #4094, but those traces do not break down latency for runtime conversations.

OpenClaw already documents OTLP export through its diagnostics-otel plugin, so NemoClaw should make that path easy to use from the sandbox instead of reimplementing conversation tracing in NemoClaw.

Proposed Design

Add an opt-in OpenTelemetry diagnostics path for OpenClaw conversation traces in NemoClaw sandboxes:

  • Bake or install clawhub:@openclaw/diagnostics-otel into the OpenClaw sandbox path.
  • Provide documented NemoClaw configuration or environment variables for the OTLP/HTTP collector endpoint, including host, port, and path as needed.
  • Add a NemoClaw network policy preset that permits export to the configured collector host and port with the minimum required method/path scope.
  • Document a local visualization flow that starts with Jaeger because it is the shortest path to viewing spans.
  • Keep Tempo/Grafana or other persistent observability stacks out of the initial scope unless team observability becomes a product requirement.
  • Keep tracing disabled by default and make collector failures non-fatal for normal agent usage.
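
The configuration and network policy bullets above could fit together roughly as in the sketch below. It is only an illustration under stated assumptions: the environment variable names (`NEMOCLAW_OTEL_ENABLED`, `NEMOCLAW_OTEL_ENDPOINT`) and the shape of the policy preset are hypothetical and are not existing NemoClaw or OpenClaw settings. The defaults of port 4318 and path `/v1/traces` follow the usual OTLP/HTTP conventions, and `POST` is the method OTLP/HTTP uses for export.

```python
from dataclasses import dataclass
from typing import Mapping, Optional
from urllib.parse import urlsplit


@dataclass(frozen=True)
class OtlpEndpoint:
    scheme: str
    host: str
    port: int
    path: str


def resolve_endpoint(env: Mapping[str, str]) -> Optional[OtlpEndpoint]:
    """Return the collector endpoint, or None when tracing is not enabled.

    Both variable names are hypothetical placeholders for this sketch.
    """
    # Tracing stays off unless the user opts in explicitly.
    if env.get("NEMOCLAW_OTEL_ENABLED", "").lower() not in ("1", "true"):
        return None

    raw = env.get("NEMOCLAW_OTEL_ENDPOINT", "http://localhost:4318/v1/traces")
    parts = urlsplit(raw)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError(f"unsupported OTLP/HTTP endpoint: {raw!r}")

    # Fall back to the scheme's standard port when the URL omits one.
    port = parts.port or (443 if parts.scheme == "https" else 80)
    return OtlpEndpoint(parts.scheme, parts.hostname, port, parts.path or "/v1/traces")


def policy_preset(endpoint: OtlpEndpoint) -> dict:
    """Build a hypothetical egress preset scoped to one host, port, method, and path."""
    return {
        "name": "otel-collector",
        "egress": [
            {
                "host": endpoint.host,
                "port": endpoint.port,
                "methods": ["POST"],
                "paths": [endpoint.path],
            }
        ],
    }
```

Deriving the preset from the resolved endpoint keeps the allowed host, port, and path in sync with the user's configuration, so outbound access is never wider than the single collector route.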

The goal is that a user can inspect spans for a conversation turn and identify where a long response spent time, similar to a Phoenix-style timeline or stack trace with time intervals.
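
To make "where a long response spent time" concrete, here is a small sketch that ranks the spans of one conversation turn by self time, meaning a span's own duration minus the time covered by its direct children. The `Span` shape and the span names in the usage note are hypothetical; the actual names and attributes emitted by diagnostics-otel may differ. The sketch also assumes sibling spans do not overlap.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Span:
    span_id: str
    parent_id: Optional[str]
    name: str
    start_ms: int
    end_ms: int


def rank_by_self_time(spans: List[Span]) -> List[Tuple[str, int]]:
    """Return (name, self_time_ms) pairs, slowest first."""
    # Sum the duration of each span's direct children.
    child_total = defaultdict(int)
    for span in spans:
        if span.parent_id is not None:
            child_total[span.parent_id] += span.end_ms - span.start_ms

    rows = []
    for span in spans:
        duration = span.end_ms - span.start_ms
        # Clamp at zero in case children overlap and exceed the parent.
        rows.append((span.name, max(0, duration - child_total[span.span_id])))
    return sorted(rows, key=lambda row: row[1], reverse=True)
```

For a turn with a 10 s root span, a 7 s model request, and a 2.5 s tool call, the ranking puts the model request first and leaves 500 ms of self time on the root span, which is the kind of breakdown a Jaeger timeline shows visually.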

Alternatives Considered

  • Extend NemoClaw trace artifacts to cover conversation runtime events. This risks duplicating OpenClaw OTel support and would likely miss lower-level OpenClaw spans.
  • Build a first-party trace visualization UI in NemoClaw. Useful later, but not necessary for the first diagnostic path if Jaeger can show the exported spans.
  • Track this only under "Observability plugin support for external telemetry adapters" (#3915). That issue is broader and covers a backend-agnostic NemoClaw observability plugin API. This issue is narrower: enable OpenClaw's existing OTel diagnostics inside NemoClaw sandboxes.

Related Work

Acceptance Criteria

  • NemoClaw can enable clawhub:@openclaw/diagnostics-otel for OpenClaw in the sandbox without manual shell edits.
  • Users can configure an OTLP/HTTP collector endpoint through documented NemoClaw configuration or environment variables.
  • A network policy preset allows the configured collector endpoint without broadly opening outbound access.
  • A conversation turn emits spans to an OTLP collector when tracing is enabled.
  • Documentation shows a minimal local Jaeger flow for viewing conversation latency spans.
  • Tracing remains opt-in and disabled by default.
  • Failure to reach the collector does not break normal agent conversations.
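
The last two criteria (opt-in tracing and non-fatal collector failures) can be illustrated with the sketch below. It is not how diagnostics-otel is implemented; `handle_turn` and `export` are hypothetical callables standing in for the agent turn and the span export, and the point is only that an export error is logged and swallowed while the reply is still returned.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger("otel-sketch")


def run_turn(
    handle_turn: Callable[[str], str],
    prompt: str,
    export: Optional[Callable[[str, str], None]] = None,
) -> str:
    """Run one conversation turn; export is optional and must never break it."""
    reply = handle_turn(prompt)

    # Tracing is opt-in: with no exporter configured, nothing else happens.
    if export is None:
        return reply

    try:
        export(prompt, reply)
    except Exception as exc:  # collector down, blocked by policy, timeout, ...
        logger.warning("span export failed, continuing without traces: %s", exc)
    return reply
```

A check for the "does not break normal agent conversations" criterion can then pass an exporter that always raises and assert that the reply is unchanged.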

Category

enhancement: feature

Checklist

  • I searched existing issues and this is not a duplicate
  • This is a design proposal, not a "please build this" request

Metadata

Labels

  • area: integrations (Third-party service integration behavior)
  • area: observability (Logging, metrics, tracing, diagnostics, or debug output)
  • area: performance (Latency, throughput, resource use, benchmarks, or scaling)
  • area: sandbox (OpenShell sandbox lifecycle, runtime, config, or recovery)
  • integration: openclaw (OpenClaw integration behavior)