[Feature]: add call-level model hooks for accurate tracing and debugging #57653

@totoyang

Summary

OpenClaw already provides call-level tool hooks such as before_tool_call / after_tool_call, but the model-side hooks appear to operate at a higher level. In practice, llm_input / llm_output seem to represent an overall prompt attempt rather than each real provider model call. For tool-using turns, this makes it hard to trace the actual execution sequence and payloads.

This request is to add non-breaking call-level model hooks, such as before_model_call / after_model_call, so that each real model invocation can be observed individually.

Problem to solve

A single agent turn may involve multiple real model calls:

  • one model call to decide which tools to use
  • one or more tool calls
  • another model call to produce the final answer

Today, tool calls can be observed individually, but model calls cannot. As a result:

  • multiple real model calls may be collapsed into one higher-level event
  • tool-loop sequencing is hard to reconstruct accurately
  • observed model input/output may differ from the actual provider-facing request/response
  • it is hard to distinguish “tool-selection” model calls from “final-answer” model calls

This makes accurate tracing, debugging, and observability integrations difficult.

Proposed solution

Add a new non-breaking pair of model call lifecycle hooks:

  • before_model_call
  • after_model_call

These would complement the existing:

  • before_tool_call
  • after_tool_call

The new hooks should ideally fire at the boundary where the real provider request is assembled and sent.

Suggested fields:

  • runId
  • sessionId
  • provider
  • model
  • api
  • callId
  • requestPayload
  • responsePayload
  • error
  • durationMs

The key goal is for each event to correspond to one real model invocation, not one overall prompt attempt.
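To make the proposal concrete, here is a minimal TypeScript sketch of what the event payloads and the firing point could look like. Everything in it is an assumption for illustration: the type names, the split of fields between the two events, and the callModelWithHooks helper are hypothetical and do not exist in OpenClaw today. Only the field names come from the list above.

```typescript
// Hypothetical event shapes built from the suggested fields.
// These types are illustrative only; they are not part of OpenClaw.
interface ModelCallBase {
  runId: string;
  sessionId: string;
  provider: string;
  model: string;
  api: string;
  callId: string; // assumed to be unique per real provider request
}

interface BeforeModelCallEvent extends ModelCallBase {
  requestPayload: unknown; // the payload as actually sent to the provider
}

interface AfterModelCallEvent extends ModelCallBase {
  requestPayload: unknown;
  responsePayload?: unknown; // absent when the call failed
  error?: string;            // absent when the call succeeded
  durationMs: number;
}

interface ModelCallHooks {
  before_model_call?: (e: BeforeModelCallEvent) => void | Promise<void>;
  after_model_call?: (e: AfterModelCallEvent) => void | Promise<void>;
}

// Hypothetical wrapper around a single provider request.
// Each hook fires exactly once per real call, whether it succeeds or throws.
async function callModelWithHooks<T>(
  base: ModelCallBase,
  requestPayload: unknown,
  send: (payload: unknown) => Promise<T>,
  hooks: ModelCallHooks,
): Promise<T> {
  await hooks.before_model_call?.({ ...base, requestPayload });
  const started = Date.now();
  let responsePayload: T | undefined;
  let error: string | undefined;
  try {
    const result = await send(requestPayload);
    responsePayload = result;
    return result;
  } catch (err) {
    error = err instanceof Error ? err.message : String(err);
    throw err; // the original failure still propagates to the caller
  } finally {
    await hooks.after_model_call?.({
      ...base,
      requestPayload,
      responsePayload,
      error,
      durationMs: Date.now() - started,
    });
  }
}
```

Putting the after hook in a finally block is one way to guarantee that failed calls are reported with error and durationMs, which matters for failure investigation. Whether hooks should be awaited, or allowed to modify the payload, is left open here.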

Alternatives considered

1. Reconstruct real model calls from llm_input / llm_output

This is not reliable because the final provider payloads may depend on transcript sanitization, turn validation, provider-specific formatting, and retry/repair logic.

2. Keep using only llm_input / llm_output

This is sufficient for coarse observability, but not for accurate tracing of multi-call tool loops.

3. Change the semantics of existing llm_input / llm_output

This seems riskier for backward compatibility and would make the current hook semantics less clear.

Impact

This would improve:

  • tracing of multi-step tool loops
  • provider-payload-level debugging
  • observability / telemetry integrations
  • failure investigation and replay
  • consistency between model-call and tool-call lifecycle tracing

It would also provide a cleaner and more symmetric observability model: tool calls already have call-level hooks, and model calls would have them too.
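As a rough illustration of what call-level events would enable, the sketch below merges model-call and tool-call events into one timeline and labels each model call by what follows it. The TraceEvent shape, the startedAt field, and the classifyModelCalls function are hypothetical names invented for this example. The ordering rule is only a heuristic: a model call followed by another model call (for example a retry) would also be labeled "final-answer".

```typescript
// Hypothetical, simplified events as a tracing integration might record them.
type TraceEvent =
  | { kind: "model"; callId: string; startedAt: number }
  | { kind: "tool"; toolName: string; startedAt: number };

type ModelCallRole = "tool-selection" | "final-answer";

// Labels each model call by looking at the next event in time order:
// if a tool call comes next, the model call is treated as tool selection;
// otherwise it is treated as producing the final answer.
function classifyModelCalls(events: TraceEvent[]): Map<string, ModelCallRole> {
  const ordered = [...events].sort((a, b) => a.startedAt - b.startedAt);
  const roles = new Map<string, ModelCallRole>();
  ordered.forEach((event, i) => {
    if (event.kind !== "model") return;
    const next = ordered[i + 1];
    const followedByTool = next !== undefined && next.kind === "tool";
    roles.set(event.callId, followedByTool ? "tool-selection" : "final-answer");
  });
  return roles;
}

// Example timeline: model call, then a tool call, then a second model call.
const roles = classifyModelCalls([
  { kind: "model", callId: "m1", startedAt: 0 },
  { kind: "tool", toolName: "search", startedAt: 1 },
  { kind: "model", callId: "m2", startedAt: 2 },
]);
```

With only llm_input / llm_output, the two model calls in this timeline may surface as a single event, so this kind of labeling is not possible.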

Evidence/examples

No response

Additional information

No response

Labels: enhancement (New feature or request)