Replies: 7 comments 4 replies
-
|
Looks like there was a previous attempt to address this here which has not been merged: https://github.com/open-webui/open-webui/pull/19578/changes |
Beta Was this translation helpful? Give feedback.
-
|
I can confirm I am also seeing issues where the model after many turns with heavy tool calling starts hallucinating tool responses in it's assistant's response instead of calling the tool correctly. Some models will handle this better and some worse I guess. But it appears to me that it would be better to solve this. One side is the confusion of the model, and the other side is prompt caching - when the format of the tool response changes in the conversation history, the cache will miss and force it to re-process the tool response call once injected into the assistant's message. If the tool call response is very long, this could have meaningful impact on responsiveness and potentially costs. At least that is my speculation / current understanding. I have created a fork of main branch and started playing around with how to fix this, eventually finding out about and getting inspiration from the PR#19578. I managed to get to good working condition, but I will be testing it like this to see if the issues are fixed. Looking at the calls and preliminary testing, it all looks good to me. Feel free to test it out: |
Beta Was this translation helpful? Give feedback.
-
|
Tim is working on it guys. |
Beta Was this translation helpful? Give feedback.
-
|
Should be resolved by f2aca78 |
Beta Was this translation helpful? Give feedback.
-
|
yeah tested it this introduces the tool role and it works as it should work based on reported above |
Beta Was this translation helpful? Give feedback.
-
|
Great news @Classic298. Do you know when the next minor release will be? |
Beta Was this translation helpful? Give feedback.
-
|
@Classic298 @pfn Just confirming that I have tested and confirmed the fix using docker image built by the 0.8.2 release PR (e10e7d0), and it works perfectly - the API requests are now interleaved as expected between Note to avoid any confusion: The fix is for the completions API - completely unrelated to the /responses mode |
Beta Was this translation helpful? Give feedback.
Uh oh!
There was an error while loading. Please reload this page.
-
This discussion is a spin-off of the brief conversation which happened here: #20600 (comment)
Description of the issue
role: "assistant"messages with tool calls androle: "tool"messages with the tool outputsHere's an example of an initial message with multiple tool calls, as sent to litellm:
{ "model": "Claude Sonnet 4.5", "tools": [ ... ], "stream": true, "messages": [ { "role": "system", "content": "[...]" }, { "role": "user", "content": "How many customers do we have?" }, { "role": "assistant", "content": "I'll help you find out how many customers we have. Let me first check what data is available.", "tool_calls": [ { "id": "toolu_01PNNJfVpmqwa5p8RYGoA9UQ", "type": "function", "index": 0, "function": { "name": "list_views", "arguments": "{}" } } ] }, { "role": "tool", "content": "[...]", "tool_call_id": "toolu_01PNNJfVpmqwa5p8RYGoA9UQ" }, { "role": "assistant", "content": "", "tool_calls": [ { "id": "toolu_016StFZmbRmw9hCaotEV8Sug", "type": "function", "index": 0, "function": { "name": "get_view", "arguments": "{\"view_name\": \"chat__customers\"}" } } ] }, { "role": "tool", "content": "[...]", "tool_call_id": "toolu_016StFZmbRmw9hCaotEV8Sug" }, { "role": "assistant", "content": "", "tool_calls": [ { "id": "toolu_01AkAt7CtriPqU2YQWvic5qx", "type": "function", "index": 0, "function": { "name": "query", "arguments": "{\"measures\": [\"chat__customers.dim_customers_count\"]}" } } ] }, { "role": "tool", "content": "[...]", "tool_call_id": "toolu_01AkAt7CtriPqU2YQWvic5qx" } ], "max_tokens": 64000 }And here's what it's turned into when the user sends a followup message:
{ "model": "Claude Sonnet 4.5", "tools": [ ... ], "stream": true, "messages": [ { "role": "system", "content": "" }, { "role": "user", "content": "How many customers do we have?" }, { "role": "assistant", "content": "I'll help you find out how many customers we have. Let me first check what data is available.\n\""Available Views:\\n\\n- chat__customers\\n Title: Chat Customers\\n Customer-centric view for data analysis. Base entity is customers, so includes all customers even those who have never placed an order or subscription. ... (litellm_truncated skipped 29331 chars) ... ]\\n}"\"\n\""Rows 1-1 of 1:\\n\\nchat__customers.dim_customers_count\\r\\n1234\\n"\"\nPetDirect has **1,234 customers** in total." }, { "role": "user", "content": "How many new customers this year?" } ], "max_tokens": 64000 }Some things to note:
role: "assistant"messageImpact
The impact of this API misuse is debated.
Logically, it's not hard to see how injecting text into previous assistant messages would cause LLMs to produce similar text themselves in subsequent messages. After all, LLMs are fundamentally pattern matching engines, and the influence of "I did this before and it worked, I should keep doing it" will always be present.
My experience is that Claude 4.5 models start hallucinating injected tool outputs after 5-6 messages, while Gemini models seem more resistant.
The hallucination is just as easy to reproduce with the HTML escaping fix applied, so this IS NOT simply an issue with the escaping. In fact, while I don't have conclusive data to prove this, it seems that the LLM is slightly more easily swayed when the injected tool output is correctly unescaped.
I also tried alleviating the issue via the system prompt, explicitly explaining to the AI how the output injection works, what to expect, and how to behave. This was unsuccessful, and actually made the problem occur sooner.
Related Issues
I've previously observed a similar issue with preservation of thinking output. I'm currently unable to use Claude models with thinking enabled, because the thinking output is not correctly preserved for followup messages. The underlying cause is the same. I submitted a PR here which was rejected: #18478
Conclusion
This issue is pretty debilitating for use of Open WebUI in my org, as users have to be specifically trained in how to spot these hallucinations and how to deal with them.
I'm quite happy to get my hands dirty and help fix this issue myself on my employer's time, however I would want some guarantees that my PR wouldn't simply be rejected before I begin.
TL;DR: Open WebUI doesn't preserve the original OpenAI API structured messages across multiple sessions/user turns, leading to tool call hallucinations, and preventing use of thinking models
Beta Was this translation helpful? Give feedback.
All reactions