
Plugin Hooks: Missing trace context for observability (messageId, runId in all hooks, parentSpanId) #50291

@Hello-World-0X


Plugin Hooks: Missing trace context fields for distributed tracing

Problem

The current Plugin Hooks event/context data is insufficient for building accurate distributed traces, particularly in these scenarios:

  1. Concurrent messages in group chats: when multiple users send messages simultaneously, plugins cannot tell which hook event belongs to which user
  2. Span parent-child relationships: Cannot determine if LLM/Tool spans are inside agent.run
  3. Cross-hook correlation: Some hooks lack key identifiers (e.g., agent_end has no runId)

This prevents observability plugins (like opik-openclaw) from building accurate trace trees based on plugin hooks.


Scenario 1: Concurrent Group Chat Messages (Critical)

Background

  • Feishu group chat (groupId: group-abc123)
  • User A and User B send messages at the same time

Problem

[t0    ] User A sends: "Check the weather"
[t0+5ms] User B sends: "Write some code"

OpenClaw processing:
[t0+20 ] before_agent_start event
           sessionKey: "feishu:group:group-abc123"  ← Same for both users
           prompt: "Check the weather"
           ❌ No senderId
           ❌ No messageId
           ❌ No runId

[t0+25 ] before_agent_start event
           sessionKey: "feishu:group:group-abc123"  ← Same
           prompt: "Write some code"
           ❌ No senderId
           ❌ No messageId
           ❌ No runId

[t0+50 ] llm_input event (User A)
           sessionKey: "feishu:group:group-abc123"  ← Same
           runId: "???"          ← Critical! If same, cannot distinguish

[t0+55 ] llm_input event (User B)
           sessionKey: "feishu:group:group-abc123"  ← Same
           runId: "???"

Impact

  • ❌ Plugins cannot determine which before_agent_start corresponds to which llm_input
  • ❌ Cannot build separate traces (User A and B's calls get mixed together)
  • ❌ Observability completely fails in group chat scenarios
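A minimal sketch of why sessionKey alone is not enough. A plugin that groups hook records by sessionKey merges both users into one trace, while grouping by a per-run runId separates them. The `HookRecord` shape and `groupBy` helper are hypothetical (not part of the OpenClaw plugin API), and the runId values on before_agent_start are assumed, since that hook does not carry one today.

```typescript
// Hypothetical, simplified record of one hook invocation as a plugin might store it.
interface HookRecord {
  hook: string;
  sessionKey: string;
  runId?: string; // present only if the host passes it
  prompt?: string;
}

// Group hook records into candidate traces using the given key.
function groupBy(
  records: HookRecord[],
  key: (r: HookRecord) => string,
): Map<string, HookRecord[]> {
  const traces = new Map<string, HookRecord[]>();
  for (const r of records) {
    const k = key(r);
    const bucket = traces.get(k) ?? [];
    bucket.push(r);
    traces.set(k, bucket);
  }
  return traces;
}

const sessionKey = "feishu:group:group-abc123";

// Assumed runIds for illustration; before_agent_start does not carry one today.
const records: HookRecord[] = [
  { hook: "before_agent_start", sessionKey, runId: "run-A", prompt: "Check the weather" },
  { hook: "before_agent_start", sessionKey, runId: "run-B", prompt: "Write some code" },
  { hook: "llm_input", sessionKey, runId: "run-A" },
  { hook: "llm_input", sessionKey, runId: "run-B" },
];

// Keyed by sessionKey: both users collapse into a single trace.
const bySession = groupBy(records, (r) => r.sessionKey);

// Keyed by runId: one trace per user message.
const byRun = groupBy(records, (r) => r.runId ?? "unknown");

console.log(bySession.size, byRun.size); // 1 2
```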

Scenario 2: Missing Span Parent-Child Relationships

Problem

Plugins cannot determine whether LLM/Tool spans are nested inside agent.run.

Current hook data

before_agent_start context: {
  sessionKey: "session-123",
  // ❌ No spanId (cannot pass to subsequent hooks)
}

llm_input context: {
  sessionKey: "session-123",
  // ❌ No parentSpanId (cannot determine parent)
  // ❌ No agentRunSpanId
}

Ideal trace structure

openclaw.request.handle (root)
├─ openclaw.message.routing
├─ openclaw.agent.run            ← Should be a child span
│  ├─ openclaw.llm               ← Should nest under agent.run
│  └─ openclaw.tool.Read         ← Should nest
└─ openclaw.message.send

Actual achievable structure

openclaw.request.handle (root)
├─ openclaw.message.routing      ← Flat
├─ openclaw.llm                  ← Flat (cannot determine if inside agent)
├─ openclaw.tool.Read            ← Flat
└─ openclaw.message.send         ← Flat

Without parentSpanId, plugins can only create flat span lists, not hierarchical trace trees.
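Assuming each span record carried the proposed parentSpanId, rebuilding the ideal tree is a simple parent lookup. This is a sketch only: `SpanRecord`, `buildTree`, and `render` are hypothetical plugin-side helpers, and the span IDs are invented to mirror the ideal structure above.

```typescript
// Hypothetical span record a plugin could build if hooks exposed the proposed
// currentSpanId / parentSpanId fields.
interface SpanRecord {
  spanId: string;
  parentSpanId?: string;
  name: string;
}

interface SpanNode {
  span: SpanRecord;
  children: SpanNode[];
}

// Attach each span to its parent; spans with no known parent become roots.
function buildTree(spans: SpanRecord[]): SpanNode[] {
  const nodes = new Map<string, SpanNode>();
  for (const s of spans) nodes.set(s.spanId, { span: s, children: [] });

  const roots: SpanNode[] = [];
  for (const s of spans) {
    const node = nodes.get(s.spanId)!;
    const parent = s.parentSpanId !== undefined ? nodes.get(s.parentSpanId) : undefined;
    if (parent) parent.children.push(node);
    else roots.push(node);
  }
  return roots;
}

// Render the tree as an indented outline (two spaces per nesting level).
function render(nodes: SpanNode[], depth = 0, out: string[] = []): string[] {
  for (const n of nodes) {
    out.push("  ".repeat(depth) + n.span.name);
    render(n.children, depth + 1, out);
  }
  return out;
}

const spans: SpanRecord[] = [
  { spanId: "s-root", name: "openclaw.request.handle" },
  { spanId: "s-route", parentSpanId: "s-root", name: "openclaw.message.routing" },
  { spanId: "s-agent", parentSpanId: "s-root", name: "openclaw.agent.run" },
  { spanId: "s-llm", parentSpanId: "s-agent", name: "openclaw.llm" },
  { spanId: "s-tool", parentSpanId: "s-agent", name: "openclaw.tool.Read" },
  { spanId: "s-send", parentSpanId: "s-root", name: "openclaw.message.send" },
];

const outline = render(buildTree(spans));
console.log(outline.join("\n"));
```

Without parentSpanId every record would fall into `roots`, which is exactly the flat structure shown above.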


Scenario 3: Cross-Hook Correlation Failure

Problem

agent_end has no runId

llm_input event: {
  runId: "run-abc123",   // ✅ Has runId
  ...
}

agent_end event: {
  success: true,
  durationMs: 2000,
  // ❌ No runId
}

Impact

  • ❌ Cannot correlate llm_input with agent_end via runId
  • ❌ Cannot precisely determine agent boundaries
  • ❌ Plugins must use heuristics (time windows, state tracking) which are unreliable
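If agent_end carried runId, a plugin could key per-run state by runId and close each run on its own agent_end, even when runs in the same session interleave. A minimal sketch; the handler names and the `RunState` shape are hypothetical, and `runId` on agent_end is the field proposed in this issue, not one that exists today.

```typescript
// Hypothetical per-run state; assumes every relevant hook carries runId.
interface RunState {
  llmCalls: number;
  ended: boolean;
  success?: boolean;
}

const runs = new Map<string, RunState>();

// llm_input already carries runId today.
function onLlmInput(event: { runId: string }): void {
  const state = runs.get(event.runId) ?? { llmCalls: 0, ended: false };
  state.llmCalls += 1;
  runs.set(event.runId, state);
}

// agent_end with runId is the proposed addition; without it this lookup is impossible.
function onAgentEnd(event: { runId: string; success: boolean }): void {
  const state = runs.get(event.runId);
  if (!state) return; // agent_end for a run this plugin never saw
  state.ended = true;
  state.success = event.success;
}

// Two runs in the same group session, with interleaved hook events.
onLlmInput({ runId: "run-A" });
onLlmInput({ runId: "run-B" });
onLlmInput({ runId: "run-A" });
onAgentEnd({ runId: "run-B", success: false });
onAgentEnd({ runId: "run-A", success: true });
```

A time-window heuristic would have to guess which agent_end belongs to which run here; keyed by runId, each run closes with its own outcome.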

Proposed Solution

1. Add common trace fields to all Hook Contexts

export type PluginHookAgentContext = {
  agentId?: string;
  sessionKey?: string;
  sessionId?: string;

  // ✅ NEW: Unique message identifier
  messageId?: string;        // Links to specific message (from inbound_claim)
  senderId?: string;         // Distinguishes different users (group chat)

  // ✅ NEW: Unified runId
  runId?: string;            // Present in ALL hooks (including before_agent_start, agent_end)

  // ✅ NEW: Span nesting information
  parentSpanId?: string;     // Parent span of current operation
  currentSpanId?: string;    // Current span (for use by subsequent hooks)
  callDepth?: number;        // Call depth (0 = root, 1 = agent, 2 = subagent)

  // Existing fields
  workspaceDir?: string;
  messageProvider?: string;
  trigger?: string;
  channelId?: string;
};
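One way a host could populate these fields when entering a nested operation: the new span's parent is the caller's currentSpanId, the identifying fields are inherited, and callDepth increases only when entering an agent or subagent (matching the 0/1/2 convention in the comment above). `childContext` and its `entersAgent` option are hypothetical; this sketches the intended semantics and is not OpenClaw code.

```typescript
// Trace-related subset of the proposed PluginHookAgentContext.
interface TraceContext {
  runId?: string;
  messageId?: string;
  senderId?: string;
  parentSpanId?: string;
  currentSpanId?: string;
  callDepth?: number; // 0 = root, 1 = agent, 2 = subagent (per the proposal)
}

// Hypothetical host-side helper: derive the context for a nested operation.
// Identifying fields are inherited; the caller's current span becomes the parent.
function childContext(
  parent: TraceContext,
  newSpanId: string,
  opts: { entersAgent?: boolean } = {},
): TraceContext {
  const depth = parent.callDepth ?? 0;
  return {
    ...parent,
    parentSpanId: parent.currentSpanId,
    currentSpanId: newSpanId,
    // Only agent/subagent boundaries deepen callDepth; LLM/tool spans do not.
    callDepth: opts.entersAgent ? depth + 1 : depth,
  };
}

const requestCtx: TraceContext = {
  runId: "run-abc",
  messageId: "msg-A",
  senderId: "user-A",
  currentSpanId: "span-request",
  callDepth: 0,
};

const agentCtx = childContext(requestCtx, "span-agent-run", { entersAgent: true });
const llmCtx = childContext(agentCtx, "span-llm-1");
```

Here llmCtx ends up with parentSpanId "span-agent-run" and currentSpanId "span-llm-1", the same shape as the llm_input context example later in this issue.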

2. Add runId to all Hook Events (minimum requirement)

Hooks currently WITH runId:

  • ✅ llm_input
  • ✅ llm_output
  • ⚠️ before_tool_call (runId optional)
  • ⚠️ after_tool_call (runId optional)

Hooks currently WITHOUT runId:

  • ❌ before_agent_start
  • ❌ agent_end
  • ❌ before_compaction
  • ❌ after_compaction
  • ❌ All message hooks

Suggestion:

export type PluginHookBeforeAgentStartEvent = {
  prompt: string;
  messages?: unknown[];
  runId: string;          // ✅ NEW: Required
};

export type PluginHookAgentEndEvent = {
  messages: unknown[];
  success: boolean;
  error?: string;
  durationMs?: number;
  runId: string;          // ✅ NEW: Required
};

3. Add correlation fields to message hooks

export type PluginHookMessageSendingEvent = {
  to: string;
  content: string;
  metadata?: Record<string, unknown>;

  // ✅ NEW: Correlation fields
  sessionKey?: string;    // Link to session
  runId?: string;         // Link to agent run
  messageId?: string;     // Link to inbound message
};
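With runId on message_sending (and messageId on the run's context), a plugin could link an outbound reply back to the inbound message that triggered it. A minimal sketch under those assumptions; `messageByRun`, `onBeforeAgentStart`, and `replyTarget` are hypothetical plugin-side names.

```typescript
// Hypothetical plugin-side index: runId -> inbound messageId that started the run.
// Assumes before_agent_start's context carries runId and messageId (proposed above).
const messageByRun = new Map<string, string>();

function onBeforeAgentStart(ctx: { runId: string; messageId: string }): void {
  messageByRun.set(ctx.runId, ctx.messageId);
}

// Proposed shape of PluginHookMessageSendingEvent, including the new correlation fields.
interface MessageSendingEvent {
  to: string;
  content: string;
  metadata?: Record<string, unknown>;
  sessionKey?: string;
  runId?: string;
  messageId?: string;
}

// Resolve which inbound message an outbound reply answers, if possible.
function replyTarget(event: MessageSendingEvent): string | undefined {
  if (event.messageId) return event.messageId; // direct link from the host
  if (event.runId) return messageByRun.get(event.runId); // link via the agent run
  return undefined; // current behavior: nothing to correlate on
}

onBeforeAgentStart({ runId: "run-A", messageId: "msg-A" });
const linked = replyTarget({ to: "group-abc123", content: "It is sunny.", runId: "run-A" });
const unlinked = replyTarget({ to: "group-abc123", content: "It is sunny." });
```

Either field alone is enough to link the reply; runId is the smaller change because the host already has it at send time.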

Priority

P0 (Critical - Correctness Issues)

  1. ✅ Add runId to before_agent_start
  2. ✅ Add runId to agent_end
  3. ✅ Add messageId and senderId to all contexts

Impact:

  • In concurrent group chat scenarios, traces from different users cannot be distinguished at all
  • Agent boundaries cannot be precisely determined

P1 (Important - Usability Issues)

  1. ✅ Add parentSpanId to contexts
  2. ✅ Add sessionKey to message hooks

Impact:

  • Cannot build nested trace trees
  • message_sending/sent cannot link to sessions

P2 (Enhancement - Improved Features)

  1. ✅ Add callDepth
  2. ✅ Add currentSpanId

Impact:

  • Cannot automatically infer nesting levels
  • Plugins must maintain span context themselves

Reproduction

Test Code

const sessionKey = "feishu:group:group-123";

// Simulate User A
emit('inbound_claim', {
  senderId: 'user-A',
  messageId: 'msg-A'
}, { channelId: 'feishu' });

emit('before_agent_start', {
  prompt: 'Check weather'
}, {
  sessionKey,  // ← Same
  // ❌ No senderId
  // ❌ No messageId
});

// Simulate User B (concurrent)
emit('inbound_claim', {
  senderId: 'user-B',
  messageId: 'msg-B'
}, { channelId: 'feishu' });

emit('before_agent_start', {
  prompt: 'Write code'
}, {
  sessionKey,  // ← Same
  // ❌ Cannot distinguish User A from User B
});

Expected: Two separate traces
Actual: Plugins cannot distinguish the two runs and must fall back on error-prone correlation guessing
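The reproduction above emits each claim immediately before its agent start, which a "last inbound_claim wins" heuristic happens to handle. The sketch below interleaves them (both claims arrive before either agent starts), which is where that heuristic breaks and an explicit messageId in the before_agent_start context does not. `on`, `emit`, and the plugin state are a hypothetical in-memory harness, not the OpenClaw test utilities.

```typescript
type HookCtx = { sessionKey?: string; channelId?: string; messageId?: string };
type Handler = (event: Record<string, unknown>, ctx: HookCtx) => void;

// Hypothetical in-memory stand-in for the host's hook dispatcher.
const handlers = new Map<string, Handler[]>();
function on(hook: string, handler: Handler): void {
  handlers.set(hook, [...(handlers.get(hook) ?? []), handler]);
}
function emit(hook: string, event: Record<string, unknown>, ctx: HookCtx): void {
  for (const handler of handlers.get(hook) ?? []) handler(event, ctx);
}

// Plugin state: sender per inbound messageId, plus the most recent sender (heuristic).
const senderByMessage = new Map<string, string>();
let lastSender: string | undefined;
const byHeuristic: string[] = [];
const byMessageId: string[] = [];

on("inbound_claim", (event) => {
  senderByMessage.set(String(event.messageId), String(event.senderId));
  lastSender = String(event.senderId);
});

on("before_agent_start", (_event, ctx) => {
  // Heuristic: assume the run belongs to whoever claimed most recently.
  byHeuristic.push(lastSender ?? "unknown");
  // Explicit: look up the sender via the proposed context.messageId.
  const sender = ctx.messageId ? senderByMessage.get(ctx.messageId) : undefined;
  byMessageId.push(sender ?? "unknown");
});

const sessionKey = "feishu:group:group-123";

// Interleaved arrival: both claims land before either agent starts.
emit("inbound_claim", { senderId: "user-A", messageId: "msg-A" }, { channelId: "feishu" });
emit("inbound_claim", { senderId: "user-B", messageId: "msg-B" }, { channelId: "feishu" });
emit("before_agent_start", { prompt: "Check weather" }, { sessionKey, messageId: "msg-A" });
emit("before_agent_start", { prompt: "Write code" }, { sessionKey, messageId: "msg-B" });
```

Here the heuristic attributes both runs to user-B, while the messageId lookup yields user-A and user-B.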


Benefits of Adding These Fields

With these fields, plugins can:

Accurately distinguish concurrent group messages

before_agent_start context: {
  sessionKey: "group-123",
  messageId: "msg-A",      // ✅ Links to specific message
  senderId: "user-A",      // ✅ Distinguishes users
  runId: "run-abc",        // ✅ Unique identifier
}

Build accurate nested traces

llm_input context: {
  sessionKey: "session-123",
  parentSpanId: "span-agent-run",  // ✅ Explicit parent
  currentSpanId: "span-llm-1",     // ✅ For subsequent hooks
}

Precisely correlate across hooks

agent_end event: {
  success: true,
  runId: "run-abc",        // ✅ Same runId as llm_input
}

Context

We're implementing an observability plugin (extending opik-openclaw) and discovered these limitations.

The diagnostics event system has some data but lacks tool call spans (see opik-openclaw issue #37), while plugin hooks provide fine-grained events but lack unique identifiers and nesting context.

Ideally: Plugin hooks should have both fine-grained events AND complete trace context, enabling third-party plugins to build complete, accurate distributed traces.


Related

  • opik-openclaw issue #37: Missing LLM spans for each tool call
  • Diagnostics system limitations: Missing tool call spans
  • Only subagent hooks currently have reliable parent/child relationships (childSessionKey + requesterSessionKey)

🔍 Code Evidence: runId is Generated but Not Passed to All Hooks

Evidence 1: runId Generation (Per-Message)

File: src/auto-reply/reply/agent-runner-execution.ts:110

export async function runAgentTurnWithFallback(params: {...}) {
  const runId = params.opts?.runId ?? crypto.randomUUID();
  //                                    ↑ New UUID per function call
  
  // Pass to agent runner
  const result = await runEmbeddedPiAgent({
    ...
    runId,  // ✅ runId is available
  });
}

Call chain (per user message):

User message
  → dispatchReplyFromConfig()
    → getReplyFromConfig()
      → runPreparedReply()
        → runReplyAgent()
          → runAgentTurnWithFallback()  ← Generates runId HERE
            → crypto.randomUUID()

Conclusion: ✅ runId is unique per call to runAgentTurnWithFallback (i.e., per message, unless the caller supplies opts.runId), even for concurrent group chat messages.


Evidence 2: runId is Available but Not Passed to before_agent_start

File: src/agents/pi-embedded-runner/run/attempt.ts:1197

// before_agent_start hook trigger
hookRunner.runBeforeAgentStart(
  {
    prompt: params.prompt,
    messages: params.messages,
    // ❌ No runId in event (even though params.runId exists!)
  },
  params.hookCtx  // ❌ No runId in context either
)

Compare with llm_input (line 2485):

hookRunner.runLlmInput(
  {
    runId: params.runId,        // ✅ Event has runId
    sessionId: params.sessionId,
    provider: params.provider,
    model: params.modelId,
    prompt: effectivePrompt,
    ...
  },
  { /* context */ }
)

At this point:

  • params.runId exists (generated at line 110 of agent-runner-execution.ts)
  • before_agent_start is triggered AFTER runId generation
  • But runId is not passed to the hook (type definition doesn't include it)

Evidence 3: agent_end Also Missing runId

Similarly, runId is available throughout agent execution but is not passed to the agent_end hook.


Impact on Group Chat Concurrent Messages

Scenario:

Feishu Group (groupId: "group-123")
[t0    ] User A: "Check weather"  → runId: "uuid-aaa"
[t0+5ms] User B: "Write code"     → runId: "uuid-bbb"  ← Different!

Both messages:
  sessionKey: "feishu:group:group-123"  ← Same

Current problem:

before_agent_start event (User A):
  sessionKey: "feishu:group:group-123"
  prompt: "Check weather"
  // ❌ No runId (cannot distinguish from User B)

before_agent_start event (User B):
  sessionKey: "feishu:group:group-123"
  prompt: "Write code"
  // ❌ No runId (looks identical to plugins!)

If runId were included:

before_agent_start event (User A):
  sessionKey: "feishu:group:group-123"
  runId: "uuid-aaa"  // ✅ Unique!
  prompt: "Check weather"

before_agent_start event (User B):
  sessionKey: "feishu:group:group-123"
  runId: "uuid-bbb"  // ✅ Different! Can distinguish!
  prompt: "Write code"

🎯 Summary

  1. runId already exists and is unique per message
  2. runId is generated before before_agent_start
  3. But runId is not included in hook event type definitions
  4. This is a plumbing issue (type definitions and hook trigger call sites), not a fundamental limitation

The fix should be straightforward:

  • Add runId: string to PluginHookBeforeAgentStartEvent
  • Add runId: string to PluginHookAgentEndEvent
  • Add runId?: string to message hook events
  • Pass params.runId when triggering these hooks
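A sketch of that plumbing, threading params.runId into both the event and the context at the before_agent_start and agent_end trigger sites. The event types follow this issue's proposal; `HookRunner`, `runAgentEnd`, and `runAttempt` are hypothetical stand-ins, and the real hookRunner signatures in attempt.ts may differ.

```typescript
// Proposed event types from this issue (runId now required).
interface PluginHookBeforeAgentStartEvent {
  prompt: string;
  messages?: unknown[];
  runId: string;
}
interface PluginHookAgentEndEvent {
  messages: unknown[];
  success: boolean;
  error?: string;
  durationMs?: number;
  runId: string;
}
type HookContext = { sessionKey?: string; runId?: string };

// Hypothetical stand-in for the real hookRunner; actual method names/signatures may differ.
interface HookRunner {
  runBeforeAgentStart(event: PluginHookBeforeAgentStartEvent, ctx: HookContext): Promise<void>;
  runAgentEnd(event: PluginHookAgentEndEvent, ctx: HookContext): Promise<void>;
}

// Sketch of trigger sites that thread params.runId into both event and context.
async function runAttempt(
  hookRunner: HookRunner,
  params: { runId: string; prompt: string; messages: unknown[]; hookCtx: HookContext },
): Promise<void> {
  const ctx: HookContext = { ...params.hookCtx, runId: params.runId };
  const startedAt = Date.now();
  await hookRunner.runBeforeAgentStart(
    { prompt: params.prompt, messages: params.messages, runId: params.runId },
    ctx,
  );
  // ... agent execution elided ...
  await hookRunner.runAgentEnd(
    { messages: params.messages, success: true, durationMs: Date.now() - startedAt, runId: params.runId },
    ctx,
  );
}
```

Putting runId in both the event and the context keeps plugins that only read one of them working.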


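Chinese Version (English translation)

Problem Description

The current Plugin Hooks event/context data is not sufficient for building accurate distributed traces, particularly in the following scenarios:

  1. Concurrent group chat messages: when multiple users send messages at the same time, there is no way to tell which hook event belongs to which user
  2. Span parent-child relationships: cannot determine whether LLM/Tool spans are inside agent.run
  3. Cross-hook correlation: some hooks lack key identifiers (e.g., agent_end has no runId)

As a result, observability plugins built on plugin hooks (such as opik-openclaw) cannot build accurate trace trees.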


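Scenario 1: Concurrent Group Chat Messages (Critical)

Background

  • Feishu group chat (groupId: group-abc123)
  • User A and User B send messages at the same time

Problem

In a group chat, the sessionKey is the same for all users (feishu:group:group-abc123), and the contexts of key hooks such as before_agent_start and llm_input have:

  • ❌ No senderId (cannot distinguish users)
  • ❌ No messageId (cannot link to a specific message)
  • ❌ No runId in before_agent_start

As a result, concurrent messages from the two users are completely indistinguishable, and their traces get mixed together.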


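Scenario 2: Missing Span Parent-Child Relationships

Problem

Hook events have no parentSpanId, so the nesting of calls cannot be determined.

The ideal trace structure is nested (agent.run contains the LLM/Tool spans), but currently only a flat structure is possible (all spans are siblings).


Scenario 3: Cross-Hook Correlation Failure

agent_end has no runId, so it cannot be precisely correlated with llm_input, and the boundaries of agent processing cannot be determined.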


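Proposed Solution

Add the following to all Hook Contexts:

  • messageId - links to the specific message
  • senderId - distinguishes different users (group chat scenarios)
  • runId - unified run identifier (present in all hooks)
  • parentSpanId - explicit parent span identifier
  • currentSpanId - current span (for use by subsequent hooks)
  • callDepth - call depth

Priority

P0 (Critical)

  • Add runId to before_agent_start and agent_end
  • Add messageId and senderId to all contexts

P1 (Important)

  • Add parentSpanId to contexts
  • Add sessionKey to message hooks

P2 (Enhancement)

  • Add callDepth and currentSpanId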

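Context

We discovered these limitations while implementing an observability plugin (extending opik-openclaw).

The diagnostics event system has some of the data but lacks tool call spans (see opik-openclaw issue #37), while plugin hooks provide fine-grained events but lack unique identifiers and nesting information.

Ideally, plugin hooks would provide both fine-grained events and complete trace context, so that third-party plugins can build complete, accurate distributed traces.


Related

  • opik-openclaw issue #37: Missing LLM spans for each tool call
  • Currently, only subagent hooks have reliable parent/child relationships (childSessionKey + requesterSessionKey)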
