
[Bug]: diagnostics-otel plugin: OTLP log export doesn't work — module isolation between gateway and plugin-sdk #39156

@gilmichlin

Description


Bug type

Behavior bug (incorrect output/state without crash)

Summary


The diagnostics-otel plugin's log export feature (diagnostics.otel.logs: true) doesn't work. The plugin loads successfully and reports "logs exporter enabled (OTLP/Protobuf)" but no log records are ever sent to the OTLP endpoint. Metrics and traces work fine.
Root Cause
registerLogTransport in the plugin-sdk (plugin-sdk/subsystem-QV9R1a2-.js) maintains its own externalTransports Set and loggingState singleton. The gateway's actual logger lives in a separate bundle (daemon-cli.js) with its own independent copies of both.

When the plugin calls registerLogTransport(callback):

  1. The callback is added to plugin-sdk's externalTransports ✅
  2. It tries to attach to loggingState.cachedLogger — but that's the plugin-sdk's logger instance, which is null (the plugin-sdk never calls getLogger()) ❌
  3. The gateway's logger in daemon-cli.js has its own externalTransports Set that the plugin never touches ❌

Result: the transport is registered in the wrong module instance and never receives log events.
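The failure mode can be reproduced in miniature. The sketch below simulates two bundles that each received their own copy of the module-level state (names mirror the report; the real bundles are generated, so this is an illustration, not the actual code):

```typescript
type Transport = (line: string) => void;

// Each bundle gets its own private copy of the "singleton" module state.
function makeLoggingModule() {
  const externalTransports = new Set<Transport>();
  const loggingState = {
    cachedLogger: null as { attachTransport(t: Transport): void } | null,
  };
  return {
    registerLogTransport(t: Transport) {
      externalTransports.add(t);                     // step 1: added locally ✅
      loggingState.cachedLogger?.attachTransport(t); // step 2: no-op, cachedLogger is null ❌
    },
    emit(line: string) {
      for (const t of externalTransports) t(line);   // iterates only THIS copy's Set
    },
  };
}

const gatewayBundle = makeLoggingModule();   // stands in for daemon-cli.js
const pluginSdkBundle = makeLoggingModule(); // stands in for subsystem-QV9R1a2-.js

let received = 0;
pluginSdkBundle.registerLogTransport(() => { received++; }); // plugin registers via the SDK copy

gatewayBundle.emit("hello"); // gateway logs via its own copy
console.log(received);       // → 0: the transport never sees gateway logs
```

Running this prints 0: the registration succeeds, no error is raised, and the transport is simply invisible to the bundle that actually emits logs.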
Evidence

Gateway logger — daemon-cli.js

grep -c "cachedLogger" dist/daemon-cli.js

→ 6 matches (has its own loggingState)

Plugin SDK — subsystem-QV9R1a2-.js

grep -c "cachedLogger" dist/plugin-sdk/subsystem-QV9R1a2-.js

→ 6 matches (has its own, separate loggingState)

Gateway main bundle does NOT reference plugin-sdk logging

grep -c "plugin-sdk" dist/subsystem-kl-vrkYi.js

→ 0

Meanwhile, onDiagnosticEvent (used for metrics/traces) works because it's event-based — the gateway emits diagnostic events that the plugin subscribes to. The log transport uses a different mechanism (attachTransport on the logger instance) which requires a shared singleton.
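The distinction can be sketched as follows (onDiagnosticEvent is the real API name from the report; the wiring below is an assumed simplification of it):

```typescript
import { EventEmitter } from "node:events";

// The gateway owns ONE emitter instance and hands it to each plugin at load
// time, so both sides share the same object no matter how the modules were
// bundled. No module-level singleton needs to line up across bundles.
const diagnosticBus = new EventEmitter();

let eventsSeen = 0;
function loadPlugin(bus: EventEmitter) {
  // metrics/traces path: subscribe on the shared instance — works
  bus.on("diagnostic", () => { eventsSeen++; });
}

loadPlugin(diagnosticBus);
diagnosticBus.emit("diagnostic", { type: "model.usage" });
console.log(eventsSeen); // → 1: the subscription crosses the bundle boundary

// The log path instead resolves loggingState at import time, and the plugin
// imports the plugin-sdk's copy — which is why it silently misses everything.
```

The key difference: the event bus is passed by reference at runtime, while attachTransport depends on import-time module identity, which bundling breaks.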

Steps to reproduce

  1. Enable diagnostics-otel with logs: true
  2. Confirm OTLP endpoint accepts logs (send a test JSON log record — it works)
  3. Observe: metrics and traces flow to collector, but zero log records arrive
  4. Check Loki/collector: no openclaw-gateway log streams
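Step 1's configuration looks roughly like the following (the nesting of the diagnostics.otel keys is inferred from the dotted path in the summary; the exact schema is an assumption):

```jsonc
{
  "diagnostics": {
    "otel": {
      "logs": true,    // the broken switch: accepted and reported, but inert
      "metrics": true, // works
      "traces": true   // works
    }
  }
}
```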

Environment

• OpenClaw: 2026.2.19-2 (gateway)
• Plugin: @openclaw/diagnostics-otel 2026.3.2
• Node: v24.13.1
• Stack: grafana/otel-lgtm (OTel Collector + Loki + Tempo + Mimir)

Expected behavior

Log records are exported via OTLP to the configured collector endpoint, alongside metrics and traces.

Actual behavior

No OTLP log export — the plugin reports the logs exporter as enabled, but no log records are ever sent.

OpenClaw version

2026.2.19-2 (gateway)

Operating system

Ubuntu 24.04.1

Install method

npm global

Logs, screenshots, and evidence

# 1. Plugin loads and reports logs enabled
[2026-03-07T10:21:16.320-08:00] INFO [plugins] diagnostics-otel: logs exporter enabled (OTLP/Protobuf)

# 2. Metrics confirmed flowing — openclaw_* metrics present in Mimir
$ curl -s 'http://localhost:3000/api/datasources/proxy/uid/prometheus/api/v1/query?query=openclaw_session_state_total' -u admin:***
{"status":"success","data":{"resultType":"vector","result":[
{"metric":{"openclaw_reason":"message_start","openclaw_state":"processing"},"value":[...,"12"]},
{"metric":{"openclaw_reason":"run_started","openclaw_state":"processing"},"value":[...,"12"]}
]}}

# 3. Traces confirmed flowing — spans present in Tempo
$ curl -s 'http://localhost:3000/api/datasources/proxy/uid/tempo/api/search?limit=5' -u admin:***
{"traces":[
{"rootServiceName":"openclaw-gateway","rootTraceName":"openclaw.message.processed","durationMs":84},
{"rootServiceName":"openclaw-gateway","rootTraceName":"openclaw.model.usage","durationMs":22306},
{"rootServiceName":"openclaw-gateway","rootTraceName":"openclaw.message.processed","durationMs":23238}
]}

# 4. Loki: zero log streams from openclaw-gateway (empty after 40+ minutes of activity)
$ curl -s 'http://localhost:3000/api/datasources/proxy/uid/loki/loki/api/v1/query_range' -u admin:*** \
--data-urlencode 'query={service_name="openclaw-gateway"}' ...
{"status":"success","data":{"result":[]}} # 0 streams

# 5. Proof the pipeline works — manual test log lands in Loki immediately
$ curl -X POST http://localhost:4318/v1/logs -H "Content-Type: application/json" \
-d '{"resourceLogs":[{"resource":{"attributes":[{"key":"service.name","value":{"stringValue":"test-service"}}]},"scopeLogs":[{"logRecords":[{"timeUnixNano":"...","severityText":"INFO","body":{"stringValue":"test log from manual"}}]}]}]}'
# → HTTP 200, appears in Loki within seconds:
{"status":"success","data":{"result":[{"stream":{"service_name":"test-service"},"values":[["...","test log from manual"]]}]}}

# 6. OTel Collector config confirms logs pipeline is wired
# /otel-lgtm/otelcol-config.yaml:
# service.pipelines.logs:
# receivers: [otlp]
# processors: [batch]
# exporters: [otlphttp/logs] → http://127.0.0.1:3100/otlp (Loki)

# 7. Root cause — two isolated module instances
$ grep -c "cachedLogger" dist/daemon-cli.js
6 # gateway has its own loggingState

$ grep -c "cachedLogger" dist/plugin-sdk/subsystem-QV9R1a2-.js
6 # plugin-sdk has a SEPARATE loggingState

$ grep -c "registerLogTransport\|externalTransports" dist/daemon-cli.js
5 # gateway has its own externalTransports Set

$ grep -c "registerLogTransport\|externalTransports" dist/plugin-sdk/subsystem-QV9R1a2-.js
5 # plugin-sdk has a SEPARATE externalTransports Set

# The plugin calls registerLogTransport from plugin-sdk → registers into plugin-sdk's Set
# The gateway logger in daemon-cli.js iterates its OWN Set → never sees the plugin's transport

Impact and severity

• Affected users: All users enabling diagnostics.otel.logs: true — the feature silently does nothing. Metrics and traces users are unaffected.
• Severity: Moderate — doesn't block workflows or cause data loss, but the feature is advertised as working and silently fails with no error. Users will spend time debugging their collector/Loki config before realizing it's an upstream issue.
• Frequency: Always. 100% reproducible on any install. The module duplication is baked into the bundle output.
• Consequence:
• OTLP log export is non-functional despite config and plugin reporting success
• Users must fall back to tailing JSONL file logs (/tmp/openclaw/*.log), losing centralized log aggregation
• No error or warning is emitted — the plugin says "logs exporter enabled" even though no logs will ever be sent
• Time wasted diagnosing (we spent ~20 minutes tracing this from Loki → collector config → OTLP endpoint → plugin code → bundler output before finding the root cause)

Additional information

Suggested Fix

Either:

  1. Share the logger singleton — export the gateway's loggingState and import it in the plugin-sdk (or use a shared module)
  2. Bridge via the plugin context — pass the gateway's logger instance to the plugin via ctx so the plugin can call ctx.logger.attachTransport() directly
  3. Use diagnostic events for logs — emit log records as diagnostic events (like model.usage) instead of relying on attachTransport
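Option 2 can be sketched like this (ctx.logger and its shape are hypothetical here — the actual plugin context API may differ):

```typescript
type Transport = (line: string) => void;

interface GatewayLogger {
  attachTransport(t: Transport): void;
  info(line: string): void;
}

// Gateway side: one live logger instance, owned by the gateway bundle.
function makeGatewayLogger(): GatewayLogger {
  const transports = new Set<Transport>();
  return {
    attachTransport(t) { transports.add(t); },
    info(line) { for (const t of transports) t(line); },
  };
}

const logger = makeGatewayLogger();
const ctx = { logger }; // passed to the plugin at load time

// Plugin side: attach directly to the instance it was handed —
// no shared module-level singleton required, so bundling can't break it.
const exported: string[] = [];
ctx.logger.attachTransport((line) => exported.push(line)); // e.g. forward to OTLP

logger.info("gateway log line");
console.log(exported.length); // → 1: the transport receives gateway logs
```

Because the logger is passed by reference through the plugin context rather than resolved from module state, it works regardless of how the bundler splits or duplicates modules — the same property that already makes onDiagnosticEvent work for metrics and traces.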

Metadata

Labels

bug — Something isn't working
bug:behavior — Incorrect behavior without a crash
