
[Feature]: Support OpenTelemetry GenAI Auto-Instrumentation (OpenLLMetry / IITM) #7312

Description

Summary

When building an OpenTelemetry observability plugin for OpenClaw, it is currently impossible to use standard GenAI auto-instrumentation libraries (OpenLLMetry / @traceloop/instrumentation-x) to produce spans such as anthropic.chat with full GenAI semantic-convention attributes. This is due to ESM module isolation and import-in-the-middle (IITM) conflicts with @mariozechner/pi-ai.

What We Tried to Achieve

We built an OpenClaw plugin (openclaw-observability-plugin) that exports traces, metrics, and logs via OTLP to an OpenTelemetry Collector. The plugin successfully produces:

  • Connected traces: openclaw.request → openclaw.agent.turn → tool.* (using api.on() hooks)
  • Token metrics: openclaw.llm.tokens.{prompt,completion,total} (extracted from the agent_end event; see the sketch after this list)
  • Tool spans, message counts, session events
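
A sketch of that token-metric extraction, using the hook and metric names above (the event and usage field shapes are assumptions):

// Sketch: token metrics from the agent_end hook.
// Metric names match the plugin's; the event shape is assumed for illustration.
const { metrics } = require("@opentelemetry/api");

const meter = metrics.getMeter("openclaw-observability-plugin");
const promptTokens = meter.createCounter("openclaw.llm.tokens.prompt");
const completionTokens = meter.createCounter("openclaw.llm.tokens.completion");
const totalTokens = meter.createCounter("openclaw.llm.tokens.total");

api.on("agent_end", (event) => {
  const usage = event?.usage ?? {};      // shape assumed for illustration
  const input = usage.input_tokens ?? 0;
  const output = usage.output_tokens ?? 0;
  promptTokens.add(input);
  completionTokens.add(output);
  totalTokens.add(input + output);
});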

However, we wanted automatic GenAI spans on the actual LLM SDK calls (anthropic.chat) — the standard approach in the OpenTelemetry ecosystem using OpenLLMetry (@traceloop/instrumentation-anthropic). These spans capture:

  • gen_ai.request.model, gen_ai.system
  • gen_ai.usage.input_tokens, gen_ai.usage.output_tokens
  • gen_ai.request.max_tokens, gen_ai.request.temperature
  • Request/response content (when enabled)
  • Per-LLM-call latency (separate from full agent turn duration)

This is the standard OTel GenAI semantic convention and what observability backends (Dynatrace, Grafana, etc.) expect for LLM observability dashboards.

Approach 1: Plugin-Side SDK Patching (Failed)

Attempt

Patch Anthropic.Messages.prototype.create from within the plugin code to wrap LLM calls with OTel spans.
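
The patch looked roughly like this (a sketch; attribute handling is simplified, and streaming calls would need extra handling):

// Sketch of the plugin-side patch (runs in jiti's CJS-like context).
const { createRequire } = require("node:module");
const { trace } = require("@opentelemetry/api");

const requireCjs = createRequire(__filename);
const Anthropic = requireCjs("@anthropic-ai/sdk"); // resolves the CJS entry point only

const tracer = trace.getTracer("openclaw-observability-plugin");
const originalCreate = Anthropic.Messages.prototype.create;

Anthropic.Messages.prototype.create = async function (params, options) {
  const span = tracer.startSpan("anthropic.chat", {
    attributes: {
      "gen_ai.system": "anthropic",
      "gen_ai.request.model": params?.model,
      "gen_ai.request.max_tokens": params?.max_tokens,
    },
  });
  try {
    const result = await originalCreate.call(this, params, options);
    span.setAttribute("gen_ai.usage.input_tokens", result?.usage?.input_tokens ?? 0);
    span.setAttribute("gen_ai.usage.output_tokens", result?.usage?.output_tokens ?? 0);
    return result;
  } catch (err) {
    span.recordException(err);
    throw err;
  } finally {
    span.end();
  }
};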

Why It Failed — ESM/CJS Module Isolation

OpenClaw's plugin loader uses jiti, which runs plugin code in a CJS-like context. The @anthropic-ai/sdk package has dual entry points:

  • ESM: @anthropic-ai/sdk/index.mjs (loaded by @mariozechner/pi-ai via import)
  • CJS: @anthropic-ai/sdk/index.js (loaded by plugin via createRequire())

These are completely separate module instances with different prototypes:

// From diagnostic logging:
ESM Anthropic === CJS Anthropic: false

Patching the CJS Messages.prototype.create has zero effect on the ESM instance that pi-ai actually uses. The plugin cannot access the ESM module instance.

Additional Constraint — jiti Blocks Dynamic Import

We tried using import() to access the ESM instance:

const sdk = await import("@anthropic-ai/sdk");

This fails in jiti's VM context:

ERR_VM_DYNAMIC_IMPORT_CALLBACK_MISSING

jiti converts import() to require() internally, making it impossible to access the ESM entry point from plugin code.

Approach 2: NODE_OPTIONS Preload with IITM (Failed)

Attempt

Use the standard OpenTelemetry ESM instrumentation pattern:

NODE_OPTIONS="--import ./instrumentation/preload.mjs"

The preload script:

  1. Imports @opentelemetry/instrumentation/hook.mjs (registers IITM ESM loader hooks)
  2. Creates a NodeSDK with AnthropicInstrumentation from @traceloop/instrumentation-anthropic
  3. Starts the SDK before the application loads

This is the officially recommended approach for instrumenting ESM applications with OpenTelemetry.
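
A sketch of preload.mjs along those lines (the exporter and endpoint configuration are illustrative):

// preload.mjs (sketch): OTLP endpoint and service name are illustrative.
import "@opentelemetry/instrumentation/hook.mjs"; // registers IITM ESM loader hooks
import { NodeSDK } from "@opentelemetry/sdk-node";
import { OTLPTraceExporter } from "@opentelemetry/exporter-trace-otlp-http";
import { AnthropicInstrumentation } from "@traceloop/instrumentation-anthropic";

const sdk = new NodeSDK({
  serviceName: "openclaw-gateway",
  traceExporter: new OTLPTraceExporter({ url: "http://localhost:4318/v1/traces" }),
  instrumentations: [new AnthropicInstrumentation()],
});
sdk.start(); // must run before any application module is imported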

Why It Failed — IITM Breaks pi-ai

import-in-the-middle (IITM) registers global ESM loader hooks that intercept ALL module imports. When it intercepts @mariozechner/pi-ai, it breaks the module's named exports:

SyntaxError: The requested module '@mariozechner/pi-ai' does not provide
an export named 'getEnvApiKey'
    at ModuleJob._instantiate (node:internal/modules/esm/module_job:226:21)

This crash-loops the gateway — the process exits immediately on startup, systemd restarts it, and it crashes again.

Root Cause

IITM wraps ESM modules by re-exporting them through a proxy. Some modules with complex export patterns (barrel files, re-exports from sub-modules) can break under this proxy. @mariozechner/pi-ai is one such module — its named exports become unavailable when IITM intercepts the module load.
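
For illustration only (this is not pi-ai's actual source), the kind of barrel pattern that has been known to break under IITM's proxying:

// index.js: illustrative barrel file, not pi-ai's actual source.
export * from "./env.js";             // getEnvApiKey would be re-exported here
export * from "./providers/index.js"; // nested barrel re-exporting sub-modules
// IITM replaces the module with a proxy that re-declares each named export;
// if it fails to enumerate exports behind wildcard re-exports, consumers hit
// "does not provide an export named ..." at module instantiation.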

This is not specific to our instrumentation — any IITM-based OTel instrumentation will trigger this crash because IITM intercepts all ESM modules globally, not just the targeted ones.

Environment

  • Node.js: v22.22.0
  • @opentelemetry/instrumentation: 0.203.0
  • import-in-the-middle: 1.15.0 (transitive via OTel)
  • @mariozechner/pi-ai: (bundled with OpenClaw)
  • @anthropic-ai/sdk: 0.71.2

Approach 3: Manual register() with IITM (Failed)

Attempt

Instead of --import hook.mjs, use register() from node:module to manually register IITM loader hooks, hoping for more selective interception:

import { register } from "node:module";
register("import-in-the-middle/hook.mjs", import.meta.url);

Result

Same crash. register() installs the same global loader hooks as hook.mjs. IITM does not support filtering which modules to intercept at the loader level.

Impact

Without GenAI auto-instrumentation, the plugin cannot produce per-LLM-call spans (anthropic.chat, openai.chat). We work around this by extracting token usage from the agent_end hook event, but this gives us:

| Capability | With OpenLLMetry | Current Workaround |
|---|---|---|
| Per-LLM-call spans | ✅ Individual anthropic.chat spans | ❌ Only aggregate openclaw.agent.turn |
| Token usage | ✅ Per-call gen_ai.usage.* attributes | ⚠️ Summed across all calls in a turn |
| Request/response content | ✅ gen_ai.content.prompt/completion | ❌ Not available |
| Model per call | ✅ Per-call gen_ai.request.model | ⚠️ Last model in turn only |
| Latency per LLM call | ✅ Individual call duration | ❌ Only full turn duration |
| Streaming vs non-streaming | ✅ Distinguished | ❌ Not visible |
| Multiple LLM calls per turn | ✅ Each call is a separate span | ❌ All merged into one span |
| Standard GenAI dashboards | ✅ Compatible | ❌ Custom dashboards required |

Suggested Solutions

Option A: Built-in OTel Hook Point in pi-ai

Add a hook/callback in @mariozechner/pi-ai's provider layer (e.g., streamAnthropic()) that fires before/after the actual SDK call. This would allow plugins to create spans around individual LLM calls without needing IITM:

// Pseudocode — in pi-ai's anthropic provider (e.g., streamAnthropic())
const callCtx = hookRunner?.onLLMCallStart?.({
  provider: "anthropic",
  model: model.id,
  params: sanitizedParams,
});

const stream = client.messages.stream({ ...params, stream: true });

// After completion:
hookRunner?.onLLMCallEnd?.(callCtx, { usage, stopReason, duration });

Option B: Fix IITM Compatibility with pi-ai

Investigate why IITM breaks @mariozechner/pi-ai's named exports. This might be:

  • A barrel file pattern that IITM doesn't handle correctly
  • A need for explicit IITM exclude patterns (currently not supported at loader level)
  • A Node.js 22 regression in IITM's ESM loader hooks

Option C: Expose LLM Call Events on the Plugin API

Similar to the existing agent_end event, emit events for individual LLM API calls:

api.on("llm_call_start", (event) => {
  // event: { provider, model, sessionKey, callId }
});

api.on("llm_call_end", (event) => {
  // event: { provider, model, usage, duration, stopReason, callId }
});

This would give plugins everything needed to create proper GenAI spans without any monkey-patching or loader hooks.
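
For example, a plugin could map these events onto spans roughly as follows (a sketch against the proposed event shapes above):

// Sketch: turning the proposed llm_call_* events into GenAI spans.
const { trace } = require("@opentelemetry/api");

const tracer = trace.getTracer("openclaw-observability-plugin");
const activeSpans = new Map(); // callId -> span

api.on("llm_call_start", ({ provider, model, callId }) => {
  const span = tracer.startSpan(`${provider}.chat`, {
    attributes: {
      "gen_ai.system": provider,
      "gen_ai.request.model": model,
    },
  });
  activeSpans.set(callId, span);
});

api.on("llm_call_end", ({ callId, usage, stopReason }) => {
  const span = activeSpans.get(callId);
  if (!span) return;
  span.setAttributes({
    "gen_ai.usage.input_tokens": usage?.input_tokens ?? 0,
    "gen_ai.usage.output_tokens": usage?.output_tokens ?? 0,
    "gen_ai.response.finish_reasons": [stopReason],
  });
  span.end();
  activeSpans.delete(callId);
});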

Option D: Native OTel Support in OpenClaw

Bundle OpenTelemetry instrumentation directly in OpenClaw, configured via openclaw.json (a hypothetical example follows the list below). Since OpenClaw controls the process startup, it could:

  1. Initialize OTel SDK before any imports
  2. Register instrumentations in a controlled way
  3. Avoid IITM conflicts by managing the loader hook lifecycle
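
A hypothetical openclaw.json stanza (all field names invented for illustration):

{
  "telemetry": {
    "enabled": true,
    "otlpEndpoint": "http://localhost:4318",
    "instrumentations": ["anthropic", "openai"]
  }
}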

Reproduction

# 1. Clone the plugin
git clone https://github.com/henrikrexed/openclaw-observability-plugin

# 2. Add preload to NODE_OPTIONS in systemd unit
# ~/.config/systemd/user/openclaw-gateway.service
Environment="NODE_OPTIONS=--import /path/to/openclaw-observability-plugin/instrumentation/preload.mjs"

# 3. Restart gateway
systemctl --user daemon-reload
systemctl --user restart openclaw-gateway

# 4. Observe crash loop
journalctl --user -u openclaw-gateway -f
# => SyntaxError: The requested module '@mariozechner/pi-ai' does not provide an export named 'getEnvApiKey'
