Plugin Hooks: Missing trace context for observability (messageId, runId in all hooks, parentSpanId)

# Plugin Hooks: Missing trace context fields for distributed tracing

## Problem

The current Plugin Hooks event/context data is **insufficient for building accurate distributed traces**, particularly in these scenarios:

1. **Concurrent messages in group chats**: When multiple users send messages at the same time, a plugin cannot tell which hook event belongs to which user
2. **Span parent-child relationships**: Cannot determine if LLM/Tool spans are inside agent.run
3. **Cross-hook correlation**: Some hooks lack key identifiers (e.g., `agent_end` has no `runId`)

This prevents observability plugins (like opik-openclaw) from building **accurate trace trees** based on plugin hooks.

---

## Scenario 1: Concurrent Group Chat Messages (Critical)

### Background
- Feishu group chat (groupId: `group-abc123`)
- User A and User B send messages at the same time

### Problem

```
[t0    ] User A sends: "Check the weather"
[t0+5ms] User B sends: "Write some code"

OpenClaw processing:
[t0+20 ] before_agent_start event
           sessionKey: "feishu:group:group-abc123"  ← Same for both users
           prompt: "Check the weather"
           ❌ No senderId
           ❌ No messageId
           ❌ No runId

[t0+25 ] before_agent_start event
           sessionKey: "feishu:group:group-abc123"  ← Same
           prompt: "Write some code"
           ❌ No senderId
           ❌ No messageId
           ❌ No runId

[t0+50 ] llm_input event (User A)
           sessionKey: "feishu:group:group-abc123"  ← Same
           runId: "???"          ← Critical! If same, cannot distinguish

[t0+55 ] llm_input event (User B)
           sessionKey: "feishu:group:group-abc123"  ← Same
           runId: "???"
```

### Impact
- ❌ Plugins cannot determine which `before_agent_start` corresponds to which `llm_input`
- ❌ Cannot build separate traces (User A and B's calls get mixed together)
- ❌ Observability completely fails in group chat scenarios
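
To make the collision concrete, here is a minimal sketch of a plugin that tracks one open trace per `sessionKey`, which is the only stable key available today. The type names, the handler name `naiveOnBeforeAgentStart`, and the `openTraces` map are hypothetical and not part of OpenClaw.

```typescript
// Hypothetical, simplified shapes: only sessionKey and prompt are available today.
type AgentStartCtx = { sessionKey: string };
type AgentStartEvent = { prompt: string };

// A naive plugin that tracks one open trace per sessionKey.
const openTraces = new Map<string, { prompt: string }>();

function naiveOnBeforeAgentStart(event: AgentStartEvent, ctx: AgentStartCtx): void {
  // A second concurrent message silently replaces the first one.
  openTraces.set(ctx.sessionKey, { prompt: event.prompt });
}

const groupSessionKey = "feishu:group:group-abc123";
naiveOnBeforeAgentStart({ prompt: "Check the weather" }, { sessionKey: groupSessionKey }); // User A
naiveOnBeforeAgentStart({ prompt: "Write some code" }, { sessionKey: groupSessionKey });   // User B

// Only one trace survives, and it belongs to User B.
const survivingPrompt = openTraces.get(groupSessionKey)?.prompt;
```

User A's trace is lost without any error, so the mix-up is invisible to the plugin itself.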

---

## Scenario 2: Missing Span Parent-Child Relationships

### Problem
Hook contexts carry no span identifiers, so a plugin cannot determine whether LLM/Tool spans belong inside `agent.run`.

### Current hook data

```typescript
// Context passed to before_agent_start today
const beforeAgentStartContext = {
  sessionKey: "session-123",
  // ❌ No spanId (nothing to pass on to subsequent hooks)
};

// Context passed to llm_input today
const llmInputContext = {
  sessionKey: "session-123",
  // ❌ No parentSpanId (cannot determine the parent)
  // ❌ No agentRunSpanId
};

### Ideal trace structure

```
openclaw.request.handle (root)
├─ openclaw.message.routing
├─ openclaw.agent.run            ← Should be a child span
│  ├─ openclaw.llm               ← Should nest under agent.run
│  └─ openclaw.tool.Read         ← Should nest
└─ openclaw.message.send
```

### Actual achievable structure

```
openclaw.request.handle (root)
├─ openclaw.message.routing      ← Flat
├─ openclaw.llm                  ← Flat (cannot determine if inside agent)
├─ openclaw.tool.Read            ← Flat
└─ openclaw.message.send         ← Flat
```

Without `parentSpanId`, plugins can only create flat span lists, not hierarchical trace trees.
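
For comparison, a minimal sketch of how a plugin could rebuild the nested structure once each span carries a `parentSpanId`. The `Span` shape and `buildTraceTree` helper are hypothetical, and the span IDs are made up for illustration.

```typescript
// Hypothetical span record; parentSpanId is the field this issue proposes.
type Span = { spanId: string; parentSpanId?: string; name: string };
type SpanNode = Span & { children: SpanNode[] };

function buildTraceTree(spans: Span[]): SpanNode[] {
  // Index every span by ID so parents can be looked up in constant time.
  const nodes = new Map<string, SpanNode>();
  spans.forEach((span) => nodes.set(span.spanId, { ...span, children: [] }));

  const roots: SpanNode[] = [];
  nodes.forEach((node) => {
    const parent = node.parentSpanId ? nodes.get(node.parentSpanId) : undefined;
    // Spans with a missing or unknown parent become roots.
    if (parent) parent.children.push(node);
    else roots.push(node);
  });
  return roots;
}

const tree = buildTraceTree([
  { spanId: "s1", name: "openclaw.request.handle" },
  { spanId: "s2", parentSpanId: "s1", name: "openclaw.agent.run" },
  { spanId: "s3", parentSpanId: "s2", name: "openclaw.llm" },
  { spanId: "s4", parentSpanId: "s2", name: "openclaw.tool.Read" },
]);
```

Without `parentSpanId`, every span falls into the "unknown parent" branch and ends up as a root, which is exactly the flat list shown above.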

---

## Scenario 3: Cross-Hook Correlation Failure

### Problem
`agent_end` has no `runId`

```typescript
// Event passed to llm_input
const llmInputEvent = {
  runId: "run-abc123",   // ✅ Has runId
  // ...
};

// Event passed to agent_end
const agentEndEvent = {
  success: true,
  durationMs: 2000,
  // ❌ No runId
};
```

### Impact
- ❌ Cannot correlate llm_input with agent_end via runId
- ❌ Cannot precisely determine agent boundaries
- ❌ Plugins must use heuristics (time windows, state tracking) which are unreliable
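
A small sketch of why such heuristics break down. It assumes a plugin that guesses "runs in a session finish in the order they started"; the names `onStart`, `guessRunForAgentEnd`, and `pendingBySession` are hypothetical, and it assumes `sessionKey` is available in the `agent_end` context.

```typescript
type Start = { sessionKey: string; prompt: string };

// Open runs per session, in arrival order.
const pendingBySession = new Map<string, Start[]>();

function onStart(start: Start): void {
  const queue = pendingBySession.get(start.sessionKey) ?? [];
  queue.push(start);
  pendingBySession.set(start.sessionKey, queue);
}

// Heuristic: assume runs in a session finish in the order they started.
function guessRunForAgentEnd(sessionKey: string): Start | undefined {
  return pendingBySession.get(sessionKey)?.shift();
}

const key = "feishu:group:group-abc123";
onStart({ sessionKey: key, prompt: "Check the weather" }); // User A
onStart({ sessionKey: key, prompt: "Write some code" });   // User B

// Suppose User B's run finishes first: the heuristic still returns User A's run.
const guessed = guessRunForAgentEnd(key);
```

Any ordering assumption can be violated by concurrency, so the duration and outcome of one run may be recorded against another.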

---

## Proposed Solution

### 1. Add common trace fields to all Hook Contexts

```typescript
export type PluginHookAgentContext = {
  agentId?: string;
  sessionKey?: string;
  sessionId?: string;

  // ✅ NEW: Unique message identifier
  messageId?: string;        // Links to specific message (from inbound_claim)
  senderId?: string;         // Distinguishes different users (group chat)

  // ✅ NEW: Unified runId
  runId?: string;            // Present in ALL hooks (including before_agent_start, agent_end)

  // ✅ NEW: Span nesting information
  parentSpanId?: string;     // Parent span of current operation
  currentSpanId?: string;    // Current span (for use by subsequent hooks)
  callDepth?: number;        // Call depth (0 = root, 1 = agent, 2 = subagent)

  // Existing fields
  workspaceDir?: string;
  messageProvider?: string;
  trigger?: string;
  channelId?: string;
};
```
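
As a rough illustration of how the span fields could be populated, the sketch below derives a child context from its parent. It is not how OpenClaw works today: `TraceContext`, `rootContext`, `childContext`, and the counter-based IDs are all assumptions made for this example, and `callDepth` is left out because its exact semantics would be up to the host.

```typescript
// Subset of the proposed context fields; the helpers are hypothetical.
type TraceContext = {
  runId: string;
  parentSpanId?: string;
  currentSpanId: string;
};

let nextSpan = 0;
// Counter-based IDs keep the sketch deterministic; a real host would likely use random IDs.
const newSpanId = (): string => `span-${++nextSpan}`;

function rootContext(runId: string): TraceContext {
  return { runId, currentSpanId: newSpanId() };
}

// The child's parent is whichever span was current when the child started.
function childContext(parent: TraceContext): TraceContext {
  return {
    runId: parent.runId,
    parentSpanId: parent.currentSpanId,
    currentSpanId: newSpanId(),
  };
}

const requestCtx = rootContext("run-abc"); // e.g. openclaw.request.handle
const agentCtx = childContext(requestCtx); // e.g. openclaw.agent.run
const llmCtx = childContext(agentCtx);     // e.g. openclaw.llm
```

With contexts derived this way, a plugin only needs to read `parentSpanId` and `currentSpanId` from each hook and never has to track nesting itself.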

---

### 2. Add runId to all Hook Events (minimum requirement)

**Hooks currently WITH runId**:
- ✅ llm_input
- ✅ llm_output
- ⚠️ before_tool_call (optional)
- ⚠️ after_tool_call (optional)

**Hooks currently WITHOUT runId**:
- ❌ before_agent_start
- ❌ agent_end
- ❌ before_compaction
- ❌ after_compaction
- ❌ All message hooks

**Suggestion**:
```typescript
export type PluginHookBeforeAgentStartEvent = {
  prompt: string;
  messages?: unknown[];
  runId: string;          // ✅ NEW: Required
};

export type PluginHookAgentEndEvent = {
  messages: unknown[];
  success: boolean;
  error?: string;
  durationMs?: number;
  runId: string;          // ✅ NEW: Required
};
```
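
With `runId` on every event, a plugin can key its state by run instead of by session. The following is a minimal sketch under that assumption; the handler names and the `RunTrace` shape are hypothetical and only loosely follow the types suggested above.

```typescript
// Hypothetical event shapes following the suggestion above (runId on every event).
type RunEvent = { runId: string };
type AgentEndEvent = RunEvent & { success: boolean; durationMs?: number };
type RunTrace = { runId: string; prompt: string; llmCalls: number; success?: boolean };

const activeRuns = new Map<string, RunTrace>();
const finishedRuns: RunTrace[] = [];

function onBeforeAgentStart(event: RunEvent & { prompt: string }): void {
  activeRuns.set(event.runId, { runId: event.runId, prompt: event.prompt, llmCalls: 0 });
}

function onLlmInput(event: RunEvent): void {
  const run = activeRuns.get(event.runId);
  if (run) run.llmCalls += 1;
}

function onAgentEnd(event: AgentEndEvent): void {
  const run = activeRuns.get(event.runId);
  if (!run) return; // Unknown run: nothing to close.
  activeRuns.delete(event.runId);
  finishedRuns.push({ ...run, success: event.success });
}

// Two interleaved runs in the same session stay separate.
onBeforeAgentStart({ runId: "uuid-aaa", prompt: "Check weather" });
onBeforeAgentStart({ runId: "uuid-bbb", prompt: "Write code" });
onLlmInput({ runId: "uuid-aaa" });
onLlmInput({ runId: "uuid-bbb" });
onLlmInput({ runId: "uuid-bbb" });
onAgentEnd({ runId: "uuid-bbb", success: true });
onAgentEnd({ runId: "uuid-aaa", success: true });
```

The runs finish out of order here, yet each `agent_end` closes the correct trace because the lookup never depends on timing.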

---

### 3. Add correlation fields to message hooks

```typescript
export type PluginHookMessageSendingEvent = {
  to: string;
  content: string;
  metadata?: Record<string, unknown>;

  // ✅ NEW: Correlation fields
  sessionKey?: string;    // Link to session
  runId?: string;         // Link to agent run
  messageId?: string;     // Link to inbound message
};
```
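
One thing these correlation fields would enable is linking a reply back to the message that triggered it. The sketch below computes reply latency per inbound message; the `Inbound` and `Outbound` shapes and the `receivedAtMs`/`sentAtMs` timestamps are assumptions made for illustration, not existing hook fields.

```typescript
// Hypothetical shapes: inbound data as captured by the plugin, outbound with the proposed messageId.
type Inbound = { messageId: string; senderId: string; receivedAtMs: number };
type Outbound = { to: string; content: string; messageId?: string; sentAtMs: number };

const inboundById = new Map<string, Inbound>();

function recordInbound(msg: Inbound): void {
  inboundById.set(msg.messageId, msg);
}

// Returns the reply latency in ms, or undefined when the reply cannot be linked.
function replyLatencyMs(out: Outbound): number | undefined {
  const inbound = out.messageId ? inboundById.get(out.messageId) : undefined;
  return inbound ? out.sentAtMs - inbound.receivedAtMs : undefined;
}

recordInbound({ messageId: "msg-A", senderId: "user-A", receivedAtMs: 1000 });
recordInbound({ messageId: "msg-B", senderId: "user-B", receivedAtMs: 1005 });

const latencyB = replyLatencyMs({ to: "group-123", content: "done", messageId: "msg-B", sentAtMs: 2505 });
const unlinked = replyLatencyMs({ to: "group-123", content: "hi", sentAtMs: 3000 });
```

The second call shows today's situation: without `messageId` on the outbound event there is nothing to join on, so the result is `undefined`.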

---

## Priority

### P0 (Critical - Correctness Issues)

1. ✅ Add `runId` to `before_agent_start`
2. ✅ Add `runId` to `agent_end`
3. ✅ Add `messageId` and `senderId` to all contexts

**Impact if missing**:
- Traces for concurrent group chat messages cannot be told apart at all
- Agent boundaries cannot be precisely determined

---

### P1 (Important - Usability Issues)

4. ✅ Add `parentSpanId` to contexts
5. ✅ Add `sessionKey` to message hooks

**Impact if missing**:
- Plugins cannot build nested trace trees
- `message_sending`/`message_sent` cannot be linked to sessions

---

### P2 (Enhancement - Improved Features)

6. ✅ Add `callDepth`
7. ✅ Add `currentSpanId`

**Impact if missing**:
- Nesting levels cannot be inferred automatically
- Plugins must maintain span context themselves

---

## Reproduction

### Test Code

```typescript
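// `emit` is a hypothetical stand-in for the host's hook dispatcher; here it only logs.
const emit = (hook: string, event: object, ctx: object): void =>
  console.log(hook, event, ctx);

const sessionKey = "feishu:group:group-123";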

// Simulate User A
emit('inbound_claim', {
  senderId: 'user-A',
  messageId: 'msg-A'
}, { channelId: 'feishu' });

emit('before_agent_start', {
  prompt: 'Check weather'
}, {
  sessionKey,  // ← Same
  // ❌ No senderId
  // ❌ No messageId
});

// Simulate User B (concurrent)
emit('inbound_claim', {
  senderId: 'user-B',
  messageId: 'msg-B'
}, { channelId: 'feishu' });

emit('before_agent_start', {
  prompt: 'Write code'
}, {
  sessionKey,  // ← Same
  // ❌ Cannot distinguish User A from User B
});
```

**Expected**: Two separate traces, one per user message

**Actual**: Plugins cannot tell the two runs apart and must fall back on heuristic correlation, which is error-prone

---

## Benefits of Adding These Fields

With these fields, plugins can:

✅ **Accurately distinguish concurrent group messages**
```typescript
// Proposed before_agent_start context
const proposedBeforeAgentStartContext = {
  sessionKey: "group-123",
  messageId: "msg-A",      // ✅ Links to specific message
  senderId: "user-A",      // ✅ Distinguishes users
  runId: "run-abc",        // ✅ Unique identifier
};
```

✅ **Build accurate nested traces**
```typescript
// Proposed llm_input context
const proposedLlmInputContext = {
  sessionKey: "session-123",
  parentSpanId: "span-agent-run",  // ✅ Explicit parent
  currentSpanId: "span-llm-1",     // ✅ For subsequent hooks
};
```

✅ **Precisely correlate across hooks**
```typescript
// Proposed agent_end event
const proposedAgentEndEvent = {
  success: true,
  runId: "run-abc",        // ✅ Same runId as llm_input
};
```

---

## Context

We're implementing an observability plugin (extending opik-openclaw) and discovered these limitations.

The diagnostics event system has some data but lacks tool call spans (see opik-openclaw issue #37), while plugin hooks provide fine-grained events but lack unique identifiers and nesting context.

**Ideally**: Plugin hooks should have both fine-grained events AND complete trace context, enabling third-party plugins to build complete, accurate distributed traces.

---

## Related

- opik-openclaw issue #37: Missing LLM spans for each tool call
- Diagnostics system limitations: Missing tool call spans
- Only subagent hooks currently have reliable parent/child relationships (childSessionKey + requesterSessionKey)

---

## 🔍 Code Evidence: runId is Generated but Not Passed to All Hooks

### Evidence 1: runId Generation (Per-Message)

**File**: `src/auto-reply/reply/agent-runner-execution.ts:110`

```typescript
export async function runAgentTurnWithFallback(params: {...}) {
  const runId = params.opts?.runId ?? crypto.randomUUID();
  //                                    ↑ New UUID per function call
  
  // Pass to agent runner
  const result = await runEmbeddedPiAgent({
    ...
    runId,  // ✅ runId is available
  });
}
```

**Call chain (per user message)**:
```
User message
  → dispatchReplyFromConfig()
    → getReplyFromConfig()
      → runPreparedReply()
        → runReplyAgent()
          → runAgentTurnWithFallback()  ← Generates runId HERE
            → crypto.randomUUID()
```

**Conclusion**: ✅ **runId is unique per message** (a fresh UUID is generated on each call unless the caller supplies `opts.runId`), even when group chat messages are processed concurrently.

---

### Evidence 2: runId is Available but Not Passed to before_agent_start

**File**: `src/agents/pi-embedded-runner/run/attempt.ts:1197`

```typescript
// before_agent_start hook trigger
hookRunner.runBeforeAgentStart(
  {
    prompt: params.prompt,
    messages: params.messages,
    // ❌ No runId in event (even though params.runId exists!)
  },
  params.hookCtx  // ❌ No runId in context either
)
```

**Compare with llm_input** (line 2485):
```typescript
hookRunner.runLlmInput(
  {
    runId: params.runId,        // ✅ Event has runId
    sessionId: params.sessionId,
    provider: params.provider,
    model: params.modelId,
    prompt: effectivePrompt,
    ...
  },
  { /* context */ }
)
```

**At this point**:
- ✅ `params.runId` exists (generated at line 110 of agent-runner-execution.ts)
- ✅ `before_agent_start` is triggered AFTER runId generation
- ❌ **But runId is not passed to the hook** (type definition doesn't include it)

---

### Evidence 3: agent_end Also Missing runId

The situation is the same here: `runId` is available throughout agent execution but is not passed to the `agent_end` hook.

---

### Impact on Group Chat Concurrent Messages

**Scenario**:
```
Feishu Group (groupId: "group-123")
[t0    ] User A: "Check weather"  → runId: "uuid-aaa"
[t0+5ms] User B: "Write code"     → runId: "uuid-bbb"  ← Different!

Both messages:
  sessionKey: "feishu:group:group-123"  ← Same
```

**Current problem**:
```typescript
// before_agent_start as seen by a plugin today (User A)
const userAStart = {
  sessionKey: "feishu:group:group-123",
  prompt: "Check weather",
  // ❌ No runId (cannot distinguish from User B)
};

// before_agent_start as seen by a plugin today (User B)
const userBStart = {
  sessionKey: "feishu:group:group-123",
  prompt: "Write code",
  // ❌ No runId (looks identical to plugins!)
};

**If runId were included**:
```typescript
// before_agent_start with runId included (User A)
const userAStartWithRunId = {
  sessionKey: "feishu:group:group-123",
  runId: "uuid-aaa",  // ✅ Unique!
  prompt: "Check weather",
};

// before_agent_start with runId included (User B)
const userBStartWithRunId = {
  sessionKey: "feishu:group:group-123",
  runId: "uuid-bbb",  // ✅ Different! Can distinguish!
  prompt: "Write code",
};

---

## 🎯 Summary

1. ✅ **runId already exists and is unique per message**
2. ✅ **runId is generated before before_agent_start**
3. ❌ **But runId is not included in the hook event type definitions, and is not passed at the call sites**
4. ✅ **This is an omission in the types and call sites, not a fundamental limitation**

The fix should be straightforward:
- Add `runId: string` to `PluginHookBeforeAgentStartEvent`
- Add `runId: string` to `PluginHookAgentEndEvent`
- Add `runId?: string` to message hook events
- Pass `params.runId` when triggering these hooks

---

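# Chinese version (translated)

## Problem Description

The current Plugin Hooks event/context data is **insufficient for building accurate distributed traces**, particularly in these scenarios:

1. **Concurrent group chat messages**: When multiple users send messages at the same time, a plugin cannot tell which hook event belongs to which user
2. **Span parent-child relationships**: Cannot determine whether LLM/Tool spans are inside agent.run
3. **Cross-hook correlation**: Some hooks lack key identifiers (e.g., `agent_end` has no `runId`)

As a result, observability plugins based on plugin hooks (such as opik-openclaw) **cannot build accurate trace trees**.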

---

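## Scenario 1: Concurrent Group Chat Messages (Critical)

### Background
- Feishu group chat (groupId: `group-abc123`)
- User A and User B send messages at the same time

### Problem

The group chat sessionKey is the same for all users (`feishu:group:group-abc123`), and the contexts of key hooks such as `before_agent_start` and `llm_input` have:
- ❌ No `senderId` (cannot distinguish users)
- ❌ No `messageId` (cannot link to a specific message)
- ❌ No `runId` in `before_agent_start`

As a result, concurrent messages from the two users are **completely indistinguishable**, and their traces get mixed together.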

---

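## Scenario 2: Missing Span Parent-Child Relationships

### Problem

Hook events have no `parentSpanId`, so the nesting of calls cannot be determined.

The ideal trace structure is nested (agent.run contains LLM/Tool spans), but currently only a flat structure is achievable (all spans side by side).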

---

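## Scenario 3: Cross-Hook Correlation Failure

`agent_end` has no `runId`, so it cannot be precisely correlated with `llm_input`, and the boundaries of agent processing cannot be determined.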

---

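## Proposed Solution

Add the following to all Hook Contexts:
- `messageId` - links to a specific message
- `senderId` - distinguishes different users (group chat scenarios)
- `runId` - unified run identifier (present in all hooks)
- `parentSpanId` - explicit parent span identifier
- `currentSpanId` - current span (for use by subsequent hooks)
- `callDepth` - call depth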

---

## Priority

### P0 (Critical)
- Add `runId` to `before_agent_start` and `agent_end`
- Add `messageId` and `senderId` to all contexts

### P1 (Important)
- Add `parentSpanId` to contexts
- Add `sessionKey` to message hooks

### P2 (Enhancement)
- Add `callDepth` and `currentSpanId`

---

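## Context

We discovered these limitations while implementing an observability plugin (extending opik-openclaw).

The diagnostics event system has some of the data but lacks tool call spans (see opik-openclaw issue #37), while plugin hooks provide fine-grained events but lack unique identifiers and nesting information.

**Ideally**: Plugin hooks would have both fine-grained events and complete trace context, so that third-party plugins can build complete, accurate distributed traces.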

---

## Related Links

- opik-openclaw issue #37: Missing LLM spans for each tool call
- Only subagent hooks currently have reliable parent/child relationships (childSessionKey + requesterSessionKey)

---

Plugin Hooks: Missing trace context for observability (messageId, runId in all hooks, parentSpanId) #50291
