fix: multi-tool-call results incorrectly dropped causing LLM API 400 #730

Merged
yinwm merged 1 commit into sipeed:main from winterfx:main on Feb 25, 2026

@winterfx (Contributor) commented Feb 24, 2026

📝 Description

When an assistant message contains multiple tool calls (e.g. ToolCalls: ["A", "B"]), sanitizeHistoryForProvider incorrectly dropped the second and all subsequent tool results. The old code checked only the immediately preceding message. For the second tool result, that predecessor is the first tool result rather than the assistant message, so the second result was treated as orphaned and removed. The assistant's tool calls then no longer matched the tool results, and the LLM API rejected the request with a 400 error.

The fix walks backwards over the preceding tool messages to find the nearest assistant message with ToolCalls. Unit tests covering the key edge cases are included.
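The backward walk can be sketched in Go as follows. This is a minimal illustration under stated assumptions, not the code merged in this PR: the Message type, its Role/ToolCalls/ToolCallID fields, the role strings, and the helper names hasOriginatingAssistant and sanitizeHistory are all hypothetical, and the real sanitizeHistoryForProvider handles more cases than orphaned tool results.

```go
package main

// Message is a hypothetical, simplified stand-in for picoclaw's message type;
// only the fields this sketch needs are included.
type Message struct {
	Role       string   // "user", "assistant", or "tool"
	ToolCalls  []string // tool call IDs requested by an assistant message
	ToolCallID string   // for a "tool" message: the call it answers
}

// hasOriginatingAssistant reports whether a tool result appended now would
// belong to an assistant message with ToolCalls. It walks backwards over tool
// results already kept for the same turn instead of inspecting only the last
// message.
func hasOriginatingAssistant(sanitized []Message) bool {
	for i := len(sanitized) - 1; i >= 0; i-- {
		switch sanitized[i].Role {
		case "tool":
			continue // sibling result from the same multi-tool-call turn
		case "assistant":
			return len(sanitized[i].ToolCalls) > 0
		default:
			return false // a user/system message breaks the chain
		}
	}
	return false
}

// sanitizeHistory is a hypothetical reduction of sanitizeHistoryForProvider
// that handles only orphaned tool results.
func sanitizeHistory(history []Message) []Message {
	sanitized := make([]Message, 0, len(history))
	for _, msg := range history {
		if msg.Role == "tool" && !hasOriginatingAssistant(sanitized) {
			continue // orphaned tool result: drop it
		}
		sanitized = append(sanitized, msg)
	}
	return sanitized
}
```

With the history [user, assistant(A,B), tool(A), tool(B)], all four messages survive, while a tool result that directly follows a user message is still dropped as orphaned.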

🗣️ Type of Change

  • 🐞 Bug fix (non-breaking change which fixes an issue)
  • ✨ New feature (non-breaking change which adds functionality)
  • 📖 Documentation update
  • ⚡ Code refactoring (no functional changes, no api changes)

🤖 AI Code Generation

  • 🤖 Fully AI-generated (100% AI, 0% Human)
  • 🛠️ Mostly AI-generated (AI draft, Human verified/modified)
  • 👨‍💻 Mostly Human-written (Human lead, AI assisted or none)

🔗 Related Issue

#704

📚 Technical Context (Skip for Docs)

  • Reference URL: N/A
  • Reasoning: sanitizeHistoryForProvider used last := sanitized[len(sanitized)-1] to check whether a tool result had a matching assistant message. With multiple tool calls, the tool results are consecutive ([assistant(A,B), tool(A), tool(B)]), so the immediate predecessor of tool(B) is tool(A), not the assistant. The fix uses a backward loop that skips over tool messages to locate the originating assistant.
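To make the reasoning concrete, the old and new checks can be compared on the prefix the sanitizer has already built when tool(B) arrives. This is a hedged sketch: oldCheck and newCheck are hypothetical names, and Message is a simplified stand-in for the real message type, keeping only a role string and the assistant's tool call IDs.

```go
package main

// Message is a hypothetical, simplified stand-in for the real message type.
type Message struct {
	Role      string   // "user", "assistant", or "tool"
	ToolCalls []string // tool call IDs on an assistant message
}

// oldCheck mirrors the pre-fix logic: only the immediate predecessor
// (last := sanitized[len(sanitized)-1]) is inspected.
func oldCheck(sanitized []Message) bool {
	if len(sanitized) == 0 {
		return false
	}
	last := sanitized[len(sanitized)-1]
	return last.Role == "assistant" && len(last.ToolCalls) > 0
}

// newCheck skips backwards over tool messages to reach the originating
// assistant message before testing it.
func newCheck(sanitized []Message) bool {
	i := len(sanitized) - 1
	for i >= 0 && sanitized[i].Role == "tool" {
		i--
	}
	return i >= 0 && sanitized[i].Role == "assistant" && len(sanitized[i].ToolCalls) > 0
}
```

For the prefix [assistant(A,B), tool(A)], oldCheck returns false, so tool(B) would be dropped, while newCheck returns true. For tool(A), whose prefix is just [assistant(A,B)], both checks agree.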

🧪 Test Environment

  • Hardware: Mac
  • OS: macOS 26
  • Model/Provider: OpenAI/Gemini/Moonshot AI
  • Channels: N/A

📸 Evidence (Optional)


☑️ Checklist

  • My code/docs follow the style of this project.
  • I have performed a self-review of my own changes.
  • I have updated the documentation accordingly.

…vider

Walk backwards over preceding tool messages to find the nearest assistant
with ToolCalls, instead of only checking the immediate predecessor. Add
unit tests for sanitizeHistoryForProvider covering key edge cases.
@winterfx (Contributor, Author) commented Feb 24, 2026

This bug is more severe than it appears. Once a multi-tool-call turn exists in the session history, every subsequent request on that workspace fails permanently, even after restarting picoclaw. The session data on disk is correct, but sanitizeHistoryForProvider corrupts the history on every load by dropping the second and later tool results. Restarting doesn't help because the same broken sanitization runs again.

The only recovery is manually deleting the session file. Users with no knowledge of the internal file structure are completely stuck.

The trigger condition is simple: any LLM response with two or more parallel tool_calls (e.g. web_search + web_fetch in one turn). The next message then fails with a 400 error on every provider: OpenAI and Gemini both reject the malformed history.

Reproduce

🦞 You: Do two things at once: 1) search "Rust vs Go 2024" 2) visit https://www.rust-lang.org
2026/02/24 18:17:00 [2026-02-24T10:17:00Z] [INFO] agent: Processing message from cli:cron: Do two things at once: 1) search "Rust vs Go 2024" 2) visit https://www.rust-lang.org {channel=cli, chat_id=direct, sender_id=cron, session_key=cli:default}
2026/02/24 18:17:00 [2026-02-24T10:17:00Z] [INFO] agent: Routed message {agent_id=main, session_key=agent:main:main, matched_by=default}
2026/02/24 18:17:02 [2026-02-24T10:17:02Z] [INFO] agent: LLM requested tool calls {agent_id=main, tools=[web_search web_fetch], count=2, iteration=1}
2026/02/24 18:17:02 [2026-02-24T10:17:02Z] [INFO] agent: Tool call: web_search({"count":5,"query":"Rust vs Go 2024"}) {iteration=1, agent_id=main, tool=web_search}
2026/02/24 18:17:02 [2026-02-24T10:17:02Z] [INFO] tool: Tool execution started {tool=web_search, args=map[count:5 query:Rust vs Go 2024]}
2026/02/24 18:17:03 [2026-02-24T10:17:03Z] [INFO] tool: Tool execution completed {tool=web_search, duration_ms=1329, result_length=2087}
2026/02/24 18:17:03 [2026-02-24T10:17:03Z] [INFO] agent: Tool call: web_fetch({"maxChars":4000,"url":"https://www.rust-lang.org"}) {agent_id=main, tool=web_fetch, iteration=1}
2026/02/24 18:17:03 [2026-02-24T10:17:03Z] [INFO] tool: Tool execution started {tool=web_fetch, args=map[maxChars:4000 url:https://www.rust-lang.org]}
2026/02/24 18:17:04 [2026-02-24T10:17:04Z] [INFO] tool: Tool execution completed {result_length=85, tool=web_fetch, duration_ms=1050}
2026/02/24 18:17:10 [2026-02-24T10:17:10Z] [INFO] agent: LLM response without tool calls (direct answer) {agent_id=main, iteration=2, content_chars=1226}
2026/02/24 18:17:10 [2026-02-24T10:17:10Z] [INFO] agent: Response: ### 1) Search “Rust vs Go 2024”
Top results (5):
1. GeeksforGeeks — *Golang vs Rust: Top Differences*  
   https://www.geeksfo... {agent_id=main, session_key=agent:main:main, iterations=2, final_length=1226}

🦞 ### 1) Search “Rust vs Go 2024”
Top results (5):
1. GeeksforGeeks — *Golang vs Rust: Top Differences*  
   https://www.geeksforgeeks.org/blogs/golang-vs-rust-top-differences/
2. LogRocket — *Go vs. Rust: When to use Rust and when to use Go*  
   https://blog.logrocket.com/go-vs-rust-when-use-rust-when-use-go/
3. JetBrains RustRover Blog (skews toward 2025/2026) — *Rust vs Go: Which One to Choose in 2025*  
   https://blog.jetbrains.com/rust/2025/06/12/rust-vs-go/
4. DevGenius (Medium) — *Rust vs. Go: A Comprehensive Technical Comparison (2024 Edition)*  
   https://blog.devgenius.io/rust-vs-go-a-comprehensive-technical-comparison-2024-edition-1ccf0490fdea
5. DEV Community — *Rust vs Go — Which Should You Learn…*  
   https://dev.to/thatcoolguy/rust-vs-go-which-should-you-choose-in-2024-50k5

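### 2) Visit https://www.rust-lang.org
Page highlights (summary of the homepage content):
- Rust's positioning: emphasizes **performance**, **reliability**, and **productivity**.
- Provides entry points for downloading and getting started (installing the Rust toolchain, documentation/learning resources, etc.).
- Introduces the ecosystem and use cases (systems programming, CLI, server-side, WebAssembly, etc.) and points to documentation and community resources.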

🦞 You: OK
2026/02/24 18:18:49 [2026-02-24T10:18:49Z] [INFO] agent: Processing message from cli:cron: OK {channel=cli, chat_id=direct, sender_id=cron, session_key=cli:default}
2026/02/24 18:18:49 [2026-02-24T10:18:49Z] [INFO] agent: Routed message {agent_id=main, session_key=agent:main:main, matched_by=default}
2026/02/24 18:18:51 [2026-02-24T10:18:51Z] [ERROR] agent: LLM call failed {agent_id=main, iteration=1, error=API request failed:
  Status: 400
  Body:   {"error":{"message":"litellm.BadRequestError: litellm.ContentPolicyViolationError: litellm.ContentPolicyViolationError: AzureException - An assistant message with 'tool_calls' must be followed by tool messages responding to each 'tool_call_id'. The following tool_call_ids did not have response messages: call_HnPG3wHn0cEKgKneUYEv4RMP\nmodel=gpt-5.2. content_policy_fallback=None. fallbacks=None.\n\nSet 'content_policy_fallback' - https://docs.litellm.ai/docs/routing#fallbacks. Received Model Group=gpt-5.2\nAvailable Model Group Fallbacks=None","type":null,"param":null,"code":"400","provider_specific_fields":{"innererror":null}}}}
Error: LLM call failed after retries: API request failed:
  Status: 400
  Body:   {"error":{"message":"litellm.BadRequestError: litellm.ContentPolicyViolationError: litellm.ContentPolicyViolationError: AzureException - An assistant message with 'tool_calls' must be followed by tool messages responding to each 'tool_call_id'. The following tool_call_ids did not have response messages: call_HnPG3wHn0cEKgKneUYEv4RMP\nmodel=gpt-5.2. content_policy_fallback=None. fallbacks=None.\n\nSet 'content_policy_fallback' - https://docs.litellm.ai/docs/routing#fallbacks. Received Model Group=gpt-5.2\nAvailable Model Group Fallbacks=None","type":null,"param":null,"code":"400","provider_specific_fields":{"innererror":null}}}
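The provider-side rule quoted in this error ("An assistant message with 'tool_calls' must be followed by tool messages responding to each 'tool_call_id'") can be approximated with a small check like the one below. This is a sketch under assumptions, not any provider's actual validator: the Message type and the missingToolResponses helper are hypothetical, and real providers may apply additional rules.

```go
package main

// Message is a hypothetical, simplified stand-in for a chat message.
type Message struct {
	Role       string   // "user", "assistant", or "tool"
	ToolCalls  []string // tool_call IDs on an assistant message
	ToolCallID string   // on a tool message: the tool_call_id it answers
}

// missingToolResponses approximates the rule from the error message: the tool
// messages directly following an assistant message with tool_calls must answer
// every tool_call_id. It returns the IDs that have no response.
func missingToolResponses(history []Message) []string {
	var missing []string
	for i, msg := range history {
		if msg.Role != "assistant" || len(msg.ToolCalls) == 0 {
			continue
		}
		// Collect the IDs answered by the run of tool messages after msg.
		answered := make(map[string]bool)
		for j := i + 1; j < len(history) && history[j].Role == "tool"; j++ {
			answered[history[j].ToolCallID] = true
		}
		for _, id := range msg.ToolCalls {
			if !answered[id] {
				missing = append(missing, id)
			}
		}
	}
	return missing
}
```

For the history the old sanitizer produced, [assistant(call_A, call_B), tool(call_A), user], this returns [call_B], the same kind of unanswered tool_call_id the 400 response reports.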

@Zhaoyikaiii (Collaborator)

This change fixes a critical edge case in sanitizeHistoryForProvider(): consecutive tool messages produced by multi-tool calls. This pattern is common across providers, and mishandling it can also break Anthropic/Claude validation.

@yinwm (Collaborator) left a comment

LGTM
