embedded agent tool-calling is unreliable across volcengine-plan/kimi failover after successful toolResult #40742

@woodenxyz

Description

Summary

Tool-calling is unreliable in embedded agent runs across provider failover.

In my local setup on OpenClaw 2026.3.7, I can reproduce all of the following within the same session:

  1. volcengine-plan/ark-code-latest produces a real toolCall + toolResult
  2. the same run later gets aborted / times out before a stable final assistant response
  3. OpenClaw then attempts to continue via fallback / retry paths
  4. kimi-coding/k2p5 may then:
    • reply TOOL_UNAVAILABLE without attempting a tool call, or
    • answer from prior context instead of issuing a fresh tool call, or
    • hit provider rate limits / timeouts during continuation

So this does not look like the older "tools parameter is never sent" bug from #8923.
It looks more like a session / embedded-run / failover reliability problem after tool execution has already started or completed.

Environment

  • OpenClaw version: 2026.3.7
  • OS: macOS arm64
  • Primary model during repro: volcengine-plan/ark-code-latest
  • Fallback model during repro: kimi-coding/k2p5
  • Provider API types involved:
    • volcengine-plan: openai-completions
    • kimi-coding: anthropic-messages

Minimal Prompt Used

Use exactly one available tool to inspect the current working directory. Do not simulate a tool call or reuse prior results. If tool invocation is unavailable, reply with TOOL_UNAVAILABLE.

What I Observed

1. Volcengine first fails with a connection error

Session file recorded:

{"stopReason":"error","errorMessage":"Connection error."}

2. Volcengine then successfully emits a real tool call

Recorded in session JSONL:

  • assistant emits toolCall with name:"exec"
  • tool result is recorded immediately after

Excerpt:

{"type":"toolCall","name":"exec","arguments":{"command":"pwd && ls -la"}}

followed by:

{"role":"toolResult","toolName":"exec","isError":false}
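The "successful call, then failure" pattern is easy to confirm mechanically from the session JSONL. Below is a minimal sketch that tallies toolCall, toolResult, and prompt-error events; the field names (`type`, `role`, `customType`) are taken from the excerpts quoted in this report and may not cover every event shape OpenClaw writes.

```python
import json

# Hypothetical session lines, using the event shapes quoted in this report.
session_jsonl = """
{"type":"toolCall","name":"exec","arguments":{"command":"pwd && ls -la"}}
{"role":"toolResult","toolName":"exec","isError":false}
{"customType":"openclaw:prompt-error","data":{"error":"aborted"}}
""".strip()

def pair_tool_events(lines):
    """Count toolCall, toolResult, and prompt-error events so the
    'successful tool call followed by aborted run' pattern is easy to spot."""
    calls = results = errors = 0
    for line in lines.splitlines():
        event = json.loads(line)
        if event.get("type") == "toolCall":
            calls += 1
        elif event.get("role") == "toolResult":
            results += 1
        elif event.get("customType") == "openclaw:prompt-error":
            errors += 1
    return calls, results, errors

print(pair_tool_events(session_jsonl))  # (1, 1, 1): call + result, yet an error still follows
```

Running this over a real session file (one `json.loads` per line) would show calls == results while errors > 0, which is exactly the inconsistency described here.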

3. Later tagged repro ([RUN V1]) also succeeds at toolCall + toolResult

Again, Volcengine emitted a real exec call and OpenClaw recorded the corresponding toolResult.

4. But the embedded run still ends as aborted / timed out

After the successful tool result, the same run was still marked aborted / timed out:

{"customType":"openclaw:prompt-error","data":{"error":"aborted"}}

and gateway logs showed:

[agent/embedded] embedded run timeout ... timeoutMs=45000
FailoverError: LLM request timed out.
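One plausible mechanism for "aborted after a successful toolResult" is a single fixed deadline measured from run start, so time spent on a successful tool step still counts against the final completion. The sketch below illustrates that failure mode with asyncio; the names and structure are illustrative, not OpenClaw's actual internals.

```python
import asyncio

async def embedded_run(timeout_s, tool_step_s, final_step_s):
    """Illustrative model of an embedded run with ONE overall deadline.

    tool_step_s stands in for the time the toolCall + toolResult took;
    final_step_s stands in for the final assistant completion.
    """
    async def run():
        await asyncio.sleep(tool_step_s)   # tool step succeeds here
        await asyncio.sleep(final_step_s)  # final assistant response
        return "completed"
    try:
        return await asyncio.wait_for(run(), timeout=timeout_s)
    except asyncio.TimeoutError:
        return "aborted"  # matches the observed {"error":"aborted"}

# A fast-enough final step still aborts when the tool step ate the budget:
print(asyncio.run(embedded_run(0.2, 0.15, 0.15)))  # aborted
# The same final step completes if the clock effectively restarts after the tool step:
print(asyncio.run(embedded_run(0.2, 0.0, 0.1)))    # completed
```

If the real `timeoutMs=45000` budget works this way, extending or resetting the deadline after each recorded toolResult would be one candidate fix.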

5. Kimi fallback / continuation is inconsistent

In the same session history, after switching to / continuing with kimi-coding/k2p5, I observed:

  • one response that returned:
TOOL_UNAVAILABLE

with no fresh toolCall recorded for that turn

  • another response that answered from prior context instead of clearly issuing a fresh tool call
  • a provider-side 429 during continuation:
{"errorMessage":"429 {\"error\":{\"type\":\"rate_limit_error\",...}}"}
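Since the fallback behavior varies run to run, repeated repro attempts are easier to compare if each fallback turn is bucketed automatically. This is a hedged sketch of such a classifier; the event fields (`type`, `text`, `errorMessage`) follow the excerpts quoted in this report and are assumptions about the session format.

```python
def classify_fallback_turn(events):
    """Bucket a fallback turn into the behaviors observed in this report."""
    for e in events:
        if e.get("type") == "toolCall":
            return "fresh-tool-call"   # the desired behavior
        if "429" in str(e.get("errorMessage", "")):
            return "rate-limited"
        if e.get("text", "").strip() == "TOOL_UNAVAILABLE":
            return "tool-unavailable"
    return "answered-from-context"     # no tool call, no sentinel, no error

print(classify_fallback_turn([{"text": "TOOL_UNAVAILABLE"}]))          # tool-unavailable
print(classify_fallback_turn([{"type": "toolCall", "name": "exec"}]))  # fresh-tool-call
```

Tallying these buckets over, say, 20 runs would quantify how often the fallback actually issues a fresh tool call versus degrading into the other modes.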

Why This Seems Distinct From Existing Issues

Unlike #8923, where the tools parameter was never sent to the provider at all, here the primary model demonstrably emits a real toolCall and OpenClaw records a valid toolResult. The failure happens afterwards, in timeout handling and fallback continuation.

Expected Behavior

If a model successfully emits a tool call and OpenClaw records a valid tool result, then one of the following should happen deterministically:

  1. the same run completes cleanly with a final assistant response, or
  2. the run fails in a way that preserves coherent session state for the next retry / fallback attempt

Fallback / continuation should not degrade into:

  • aborted runs after successful tool execution
  • stale-context answers instead of fresh tool calls
  • TOOL_UNAVAILABLE from the fallback model when tools are in fact available in the session

Actual Behavior

Successful tool execution can still be followed by:

  • aborted
  • LLM request timed out
  • FailoverError: LLM request timed out
  • fallback continuation that no longer behaves consistently with available tools

Related Issues

  • #8923 — the older "tools parameter is never sent" bug (distinct from this report, as explained above)

Suggested Areas To Inspect

  • embedded run timeout behavior after a successful toolResult
  • failover / continuation serialization between provider adapters (openai-completions -> anthropic-messages)
  • whether tool availability / tool schema state is preserved correctly across aborted runs
  • whether continuation prompts after timeout are causing models to infer from context instead of issuing tool calls
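On the adapter-serialization point: the two providers use different tool-schema shapes (openai-completions nests the definition under `function`; anthropic-messages uses top-level `name` / `input_schema`), so a lossy conversion during failover would plausibly leave the fallback model without tools and explain the TOOL_UNAVAILABLE replies. The round-trip check below is an illustrative sketch, not OpenClaw's real adapter code.

```python
def openai_to_anthropic_tool(tool):
    """Convert an openai-completions tool definition to anthropic-messages form.

    If a conversion like this drops or mangles the schema during failover,
    the fallback model would reasonably report that no tools are available.
    """
    fn = tool["function"]
    return {
        "name": fn["name"],
        "description": fn.get("description", ""),
        "input_schema": fn["parameters"],
    }

openai_tool = {
    "type": "function",
    "function": {
        "name": "exec",
        "description": "Run a shell command",
        "parameters": {
            "type": "object",
            "properties": {"command": {"type": "string"}},
            "required": ["command"],
        },
    },
}

converted = openai_to_anthropic_tool(openai_tool)
assert converted["name"] == "exec"
assert "command" in converted["input_schema"]["properties"]
print(converted["name"])  # exec
```

A unit test asserting this round trip for every registered tool, run at the failover boundary, would rule this area in or out quickly.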

Local Evidence

I can provide the exact session JSONL / timestamps if helpful, but the key repro facts are already visible locally:

  • real toolCall + toolResult for volcengine-plan/ark-code-latest
  • later aborted for the same run
  • subsequent fallback / continuation instability with kimi-coding/k2p5
