Correlate Slack/channel message diagnostics into a single trace

## Summary

OpenClaw's diagnostics/logging docs describe request-level trace correlation across gateway handling, diagnostic events, agent runs, model usage, and model calls. In a Slack-agent run on OpenClaw 2026.5.28, the relevant diagnostics arrived in Tempo as separate root traces instead of one correlated trace waterfall.

This makes it hard to answer the basic lifecycle question: "How long did one inbound Slack message take from receipt to reply, and where was the time spent?"

## Expected behavior

`docs/logging.md` describes trace correlation as follows:

- Gateway HTTP requests and WebSocket frames establish an internal request trace scope.
- Logs and diagnostic events in that async scope inherit the request trace if no explicit context is provided.
- Agent-run and model-call traces should be children of the active request trace.
- Local logs, diagnostic snapshots, OTel spans, and provider trace headers should be joinable by `traceId`.
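
As a point of reference, the inheritance rule described above can be sketched with Node's `AsyncLocalStorage`. This is only an illustration under assumptions: `requestScope`, `emitDiagnostic`, and the event shape are hypothetical names, not OpenClaw's actual API.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";
import { randomBytes } from "node:crypto";

// Hypothetical shapes; OpenClaw's real trace scope and event types may differ.
interface RequestTraceScope {
  traceId: string;
}

interface DiagnosticEvent {
  name: string;
  traceId: string | undefined;
}

const requestScope = new AsyncLocalStorage<RequestTraceScope>();
const emitted: DiagnosticEvent[] = [];

// Records an event. Without an explicit trace ID, the event inherits the
// trace ID of the active scope, if there is one.
function emitDiagnostic(name: string, explicitTraceId?: string): void {
  const traceId = explicitTraceId ?? requestScope.getStore()?.traceId;
  emitted.push({ name, traceId });
}

function runModelCall(): void {
  emitDiagnostic("openclaw.model.call");
}

function handleRequest(): void {
  emitDiagnostic("openclaw.message.processed");
  runModelCall(); // nested calls see the same scope
}

// 16 random bytes give a 32-character hex trace ID.
const requestTraceId = randomBytes(16).toString("hex");
requestScope.run({ traceId: requestTraceId }, handleRequest);

// Emitted outside any scope, so there is nothing to inherit.
emitDiagnostic("emitted.outside.scope");
```

`AsyncLocalStorage` also carries the store across `await` points and timers started inside `run`, which is why work that starts outside any such scope would surface as separate root traces.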

For one inbound Slack message, I would therefore expect a single trace containing spans/events such as:

- `openclaw.message.processed`
- `openclaw.harness.run`
- `openclaw.model.usage`
- `openclaw.model.call`

## Observed behavior

Environment, sanitized:

- OpenClaw version: `2026.5.28`
- Build/revision observed: `e932160`
- Runtime shape: OpenClaw agent with Slack channel enabled and `@openclaw/diagnostics-otel` enabled
- OTel exporter: OTLP/HTTP to Tempo
- OTel settings: traces enabled, metrics enabled, logs disabled, sample rate `1.0`

After sending one Slack message and receiving a reply, Tempo accepted new spans and the expected OpenClaw span names were present. However, the lifecycle spans appeared as separate root traces rather than one parented trace.

Observed spans from the same Slack interaction:

| Span name | Duration | Key attributes |
| --- | ---: | --- |
| `openclaw.message.processed` | `5.009s` | `openclaw.channel=slack`, `openclaw.outcome=completed` |
| `openclaw.model.usage` | `4.441s` | `openclaw.channel=slack`, `openclaw.provider=openai-codex`, `openclaw.model=gpt-5.5`, token counts present |
| `openclaw.harness.run` | `4.212s` | `openclaw.harness.id=codex`, `openclaw.harness.plugin=codex`, `openclaw.outcome=completed` |
| `openclaw.model.call` | `2.837s` | `openclaw.api=openai-codex-responses`, `openclaw.transport=stdio`, `openclaw.model=gpt-5.5` |
| `openclaw.message.processed` | `6ms` | `openclaw.channel=slack`, `openclaw.outcome=skipped`, `openclaw.reason=duplicate` |

Additional records:

- The completed `openclaw.message.processed` span began at approximately `2026-05-31T18:24:21.587-04:00` and lasted about `5.009s`.
- The gateway delivered the Slack reply at approximately `2026-05-31T18:24:26-04:00`, consistent with the span's computed end time (start plus `5.009s`, about `18:24:26.6`).
- `openclaw.model.usage` began within the message-processing window and lasted about `4.441s`.
- `openclaw.harness.run` began within the message-processing window and lasted about `4.212s`.
- `openclaw.model.call` began within the harness/model window and lasted about `2.837s`.
- Tempo search tags included the expected values for `openclaw.channel=slack`, `openclaw.provider`, `openclaw.model`, `openclaw.harness.*`, `gen_ai.*`, and the span names above.

The data is internally consistent as one Slack turn, but trace parentage/correlation is missing.
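
Until parentage exists, the only way to estimate where the time went is manual arithmetic on the durations. The sketch below does that with the values from the table. It assumes each span ran entirely inside the completed `openclaw.message.processed` window, which the capture suggests but does not prove; `observedMs` and `uncoveredMs` are names made up for this example.

```typescript
// Durations copied from the table above, converted to milliseconds.
const observedMs = {
  "openclaw.message.processed": 5009,
  "openclaw.model.usage": 4441,
  "openclaw.harness.run": 4212,
  "openclaw.model.call": 2837,
} as const;

type SpanName = keyof typeof observedMs;

const messageTotalMs = observedMs["openclaw.message.processed"];

// Message-processing time not covered by the given span.
// Assumption: the span lies fully inside the message-processing window.
function uncoveredMs(span: SpanName): number {
  return messageTotalMs - observedMs[span];
}

console.log(uncoveredMs("openclaw.model.usage")); // 568
console.log(uncoveredMs("openclaw.harness.run")); // 797
console.log(uncoveredMs("openclaw.model.call")); // 2172
```

These differences are upper-level estimates only. A correlated trace would show the actual start offsets and nesting instead of requiring this inference.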

## Why this matters

Without trace correlation, operators can see isolated spans but cannot reliably inspect one message lifecycle as a single waterfall from receipt through dispatch, harness/model execution, and reply delivery.

## Likely area to investigate

This may be specific to Slack/channel message ingestion rather than HTTP request handling. Slack socket-mode callbacks and other long-lived channel callbacks may not naturally run inside the same gateway HTTP/WebSocket request trace scope described in `docs/logging.md`.

Potential implementation shape:

- Create or preserve a per-inbound-message trace context at `message.received` / dispatch start.
- Run message dispatch, session turn creation, harness execution, model usage, model calls, and reply delivery inside that active diagnostic trace context.
- Ensure emitted diagnostic events carry `traceId`, `spanId`, and `parentSpanId` consistently.
- Verify the OTel diagnostics plugin preserves parentage when converting diagnostic events to spans.
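
One possible shape for the steps above, as a minimal sketch rather than a proposal for the actual code: a root context is created when the channel callback fires, and every nested stage derives a child context from whatever is active. `withSpan`, `activeTrace`, and `onInboundMessage` are hypothetical names, and the nesting order of the spans is an assumption.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";
import { randomBytes } from "node:crypto";

// Hypothetical context shape, mirroring the fields named in the list above.
interface TraceContext {
  traceId: string;
  spanId: string;
  parentSpanId: string | undefined;
}

interface SpanRecord extends TraceContext {
  name: string;
}

const activeTrace = new AsyncLocalStorage<TraceContext>();
const recorded: SpanRecord[] = [];

const newId = (bytes: number): string => randomBytes(bytes).toString("hex");

// Runs fn inside a span. The span becomes a child of the active context
// when one exists; otherwise it starts a new root trace.
function withSpan<T>(name: string, fn: () => T): T {
  const parent = activeTrace.getStore();
  const ctx: TraceContext = {
    traceId: parent?.traceId ?? newId(16),
    spanId: newId(8),
    parentSpanId: parent?.spanId,
  };
  recorded.push({ name, ...ctx });
  return activeTrace.run(ctx, fn);
}

// Stand-in for a long-lived channel callback (for example socket mode).
// No gateway request scope is active here, so the root is created per message.
function onInboundMessage(): void {
  withSpan("openclaw.message.processed", () => {
    withSpan("openclaw.harness.run", () => {
      withSpan("openclaw.model.call", () => {
        // The model call would happen here.
      });
    });
  });
}

onInboundMessage();
```

Each call to `onInboundMessage` yields a fresh `traceId`, so separate Slack messages stay in separate traces while the stages of one message share a single trace.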

Potential files/areas:

- `src/infra/diagnostic-events.ts`
- `src/infra/diagnostic-trace-context.ts`
- gateway HTTP/WebSocket request scope setup
- Slack/channel monitor or dispatch code
- auto-reply/session turn creation path
- embedded agent runner model diagnostics
- diagnostics OTel exporter span conversion

## Success criteria

- A single inbound Slack message produces one OTel trace containing the message lifecycle, harness, model usage, and model call spans.
- `openclaw.message.processed`, `openclaw.harness.run`, `openclaw.model.usage`, and `openclaw.model.call` share the same `traceId`.
- Child spans have meaningful `parentSpanId` relationships instead of appearing as separate roots.
- The duplicate/skipped message event is either clearly parented to the same inbound-message trace or intentionally documented as a separate trace.
- File logs emitted during the same async lifecycle include the same top-level `traceId` where diagnostic trace context is available.
- A regression test covers the channel/Slack-style inbound message path and asserts trace correlation across lifecycle and model diagnostics.
- Live verification with `@openclaw/diagnostics-otel` and Tempo shows one trace waterfall for one Slack reply lifecycle.
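
The core assertion of such a regression test could look roughly like the following. It is a sketch under assumptions: `ExportedSpan` and `checkCorrelation` are hypothetical, the IDs are placeholders rather than values from the Tempo capture, and the parent/child layout in the `correlated` fixture is one plausible arrangement, not a confirmed design.

```typescript
// Hypothetical minimal span shape; the real exporter output has more fields.
interface ExportedSpan {
  name: string;
  traceId: string;
  spanId: string;
  parentSpanId?: string;
}

interface CorrelationReport {
  traceIds: string[];
  rootSpans: string[];
  orphanSpans: string[];
  correlated: boolean;
}

// Correlated means: one trace ID, exactly one root span, and every
// parentSpanId pointing at a span that is present in the set.
function checkCorrelation(spans: ExportedSpan[]): CorrelationReport {
  const traceIds = Array.from(new Set(spans.map((s) => s.traceId)));
  const spanIds = new Set(spans.map((s) => s.spanId));
  const rootSpans = spans
    .filter((s) => s.parentSpanId === undefined)
    .map((s) => s.name);
  const orphanSpans = spans
    .filter((s) => s.parentSpanId !== undefined && !spanIds.has(s.parentSpanId))
    .map((s) => s.name);
  const correlated =
    traceIds.length === 1 && rootSpans.length === 1 && orphanSpans.length === 0;
  return { traceIds, rootSpans, orphanSpans, correlated };
}

// Placeholder fixture with the shape of the reported symptom: four roots.
const uncorrelated: ExportedSpan[] = [
  { name: "openclaw.message.processed", traceId: "t1", spanId: "a" },
  { name: "openclaw.model.usage", traceId: "t2", spanId: "b" },
  { name: "openclaw.harness.run", traceId: "t3", spanId: "c" },
  { name: "openclaw.model.call", traceId: "t4", spanId: "d" },
];

// Placeholder fixture for the desired outcome; the tree layout is assumed.
const correlated: ExportedSpan[] = [
  { name: "openclaw.message.processed", traceId: "t1", spanId: "a" },
  { name: "openclaw.harness.run", traceId: "t1", spanId: "b", parentSpanId: "a" },
  { name: "openclaw.model.usage", traceId: "t1", spanId: "c", parentSpanId: "b" },
  { name: "openclaw.model.call", traceId: "t1", spanId: "d", parentSpanId: "b" },
];

console.log(checkCorrelation(uncorrelated).correlated); // false
console.log(checkCorrelation(correlated).correlated); // true
```

If the duplicate/skipped event is intentionally kept as a separate trace, the test would need to exclude it from the span set before running this check.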

