Open WebUI merges tool responses into assistant messages, causing hallucinations #21098

Bickio · 2026-02-02T00:31:34Z

This discussion is a spin-off of the brief conversation which happened here: #20600 (comment)

Description of the issue

During the processing of a single message, Open WebUI correctly uses the OpenAI API format: role: "assistant" messages carrying tool calls are interleaved with role: "tool" messages carrying the tool outputs.
However, once the user sends another message, Open WebUI incorrectly merges all consecutive assistant and tool messages into a single assistant message.

Here's an example of an initial message with multiple tool calls, as sent to litellm:

{
  "model": "Claude Sonnet 4.5",
  "tools": [
    ...
  ],
  "stream": true,
  "messages": [
    {
      "role": "system",
      "content": "[...]"
    },
    {
      "role": "user",
      "content": "How many customers do we have?"
    },
    {
      "role": "assistant",
      "content": "I'll help you find out how many customers we have. Let me first check what data is available.",
      "tool_calls": [
        {
          "id": "toolu_01PNNJfVpmqwa5p8RYGoA9UQ",
          "type": "function",
          "index": 0,
          "function": {
            "name": "list_views",
            "arguments": "{}"
          }
        }
      ]
    },
    {
      "role": "tool",
      "content": "[...]",
      "tool_call_id": "toolu_01PNNJfVpmqwa5p8RYGoA9UQ"
    },
    {
      "role": "assistant",
      "content": "",
      "tool_calls": [
        {
          "id": "toolu_016StFZmbRmw9hCaotEV8Sug",
          "type": "function",
          "index": 0,
          "function": {
            "name": "get_view",
            "arguments": "{\"view_name\": \"chat__customers\"}"
          }
        }
      ]
    },
    {
      "role": "tool",
      "content": "[...]",
      "tool_call_id": "toolu_016StFZmbRmw9hCaotEV8Sug"
    },
    {
      "role": "assistant",
      "content": "",
      "tool_calls": [
        {
          "id": "toolu_01AkAt7CtriPqU2YQWvic5qx",
          "type": "function",
          "index": 0,
          "function": {
            "name": "query",
            "arguments": "{\"measures\": [\"chat__customers.dim_customers_count\"]}"
          }
        }
      ]
    },
    {
      "role": "tool",
      "content": "[...]",
      "tool_call_id": "toolu_01AkAt7CtriPqU2YQWvic5qx"
    }
  ],
  "max_tokens": 64000
}

And here's what it's turned into when the user sends a followup message:

{
  "model": "Claude Sonnet 4.5",
  "tools": [
    ...
  ],
  "stream": true,
  "messages": [
    {
      "role": "system",
      "content": ""
    },
    {
      "role": "user",
      "content": "How many customers do we have?"
    },
    {
      "role": "assistant",
      "content": "I'll help you find out how many customers we have. Let me first check what data is available.\n\"&quot;Available Views:\\n\\n- chat__customers\\n  Title: Chat Customers\\n  Customer-centric view for data analysis. Base entity is customers, so includes all customers even those who have never placed an order or subscription. ... (litellm_truncated skipped 29331 chars) ...   ]\\n}&quot;\"\n\"&quot;Rows 1-1 of 1:\\n\\nchat__customers.dim_customers_count\\r\\n1234\\n&quot;\"\nPetDirect has **1,234 customers** in total."
    },
    {
      "role": "user",
      "content": "How many new customers this year?"
    }
  ],
  "max_tokens": 64000
}

Some things to note:

The interleaved assistant and tool messages have been condensed into a single role: "assistant" message.
Only the tool output is included; the tool calls themselves are entirely missing.
The tool output is HTML-escaped. This is being fixed here: fix: decode HTML entities in tool call results for multi-turn conversations #20755
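
To make the structural difference between the two requests easy to see, here is a minimal Python sketch that labels each message in a request by role. The function name role_sequence and the abbreviated payloads (including the id "call_1") are made up for illustration; this is not Open WebUI code.

```python
from typing import Any

Message = dict[str, Any]


def role_sequence(messages: list[Message]) -> list[str]:
    """Label each message by role, marking assistant messages that carry tool calls."""
    labels = []
    for message in messages:
        label = message["role"]
        if label == "assistant" and message.get("tool_calls"):
            label += "+tool_calls"
        labels.append(label)
    return labels


# Abbreviated, hypothetical versions of the two requests shown above.
first_turn = [
    {"role": "system", "content": "[...]"},
    {"role": "user", "content": "How many customers do we have?"},
    {
        "role": "assistant",
        "content": "",
        "tool_calls": [
            {
                "id": "call_1",  # hypothetical id
                "type": "function",
                "function": {"name": "list_views", "arguments": "{}"},
            }
        ],
    },
    {"role": "tool", "content": "[...]", "tool_call_id": "call_1"},
]

followup = [
    {"role": "system", "content": "[...]"},
    {"role": "user", "content": "How many customers do we have?"},
    # Tool output pasted into the assistant text; no tool_calls, no tool message.
    {"role": "assistant", "content": "Let me check.\n\"[tool output]\"\nAnswer."},
    {"role": "user", "content": "How many new customers this year?"},
]

print(role_sequence(first_turn))
print(role_sequence(followup))
```

In the first list the assistant entry is labelled assistant+tool_calls and is followed by a tool entry; in the second, both have disappeared and only a plain assistant entry remains.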

Impact

The impact of this API misuse is debated.

Is it worth implementing it? Nobody has issues with the current behaviour, otherwise someone would have flagged it, and it requires significant work for - ... for no performance gain and not to fix a bug either.

Logically, it's not hard to see how injecting text into previous assistant messages would cause LLMs to produce similar text themselves in subsequent messages. After all, LLMs are fundamentally pattern matching engines, and the influence of "I did this before and it worked, I should keep doing it" will always be present.

My experience is that Claude 4.5 models start hallucinating injected tool outputs after 5-6 messages, while Gemini models seem more resistant.

The hallucination is just as easy to reproduce with the HTML escaping fix applied, so this IS NOT simply an issue with the escaping. In fact, while I don't have conclusive data to prove this, it seems that the LLM is slightly more easily swayed when the injected tool output is correctly unescaped.
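
For reference, the escaping in question is the kind that turns a double quote into &quot;. A minimal sketch using Python's standard html.unescape shows what decoding looks like on a fragment of the payload above; this is only an illustration and is not necessarily how #20755 implements the fix.

```python
import html

# Fragment of the tool output as it appears inside the merged assistant message.
escaped = "&quot;Rows 1-1 of 1:\n\nchat__customers.dim_customers_count\r\n1234\n&quot;"

# html.unescape converts HTML entities such as &quot; back to their characters.
decoded = html.unescape(escaped)

print(decoded)
```

Even after decoding, the text still sits inside the assistant message rather than in a separate role: "tool" message, which is the problem described here.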

I also tried alleviating the issue via the system prompt, explicitly explaining to the AI how the output injection works, what to expect, and how to behave. This was unsuccessful, and actually made the problem occur sooner.

Related Issues

I've previously observed a similar issue with preservation of thinking output. I'm currently unable to use Claude models with thinking enabled, because the thinking output is not correctly preserved for followup messages. The underlying cause is the same. I submitted a PR here which was rejected: #18478

Conclusion

This issue is pretty debilitating for use of Open WebUI in my org, as users have to be specifically trained in how to spot these hallucinations and how to deal with them.

I'm quite happy to get my hands dirty and help fix this issue myself on my employer's time; however, I would want some guarantees that my PR wouldn't simply be rejected before I begin.

TL;DR: Open WebUI doesn't preserve the original OpenAI API structured messages across multiple sessions/user turns, leading to tool call hallucinations, and preventing use of thinking models

Bickio · 2026-02-02T03:00:42Z

Author

Looks like there was a previous attempt to address this here which has not been merged: https://github.com/open-webui/open-webui/pull/19578/changes

0 replies

Koumi460 · 2026-02-02T19:18:18Z

I can confirm I am also seeing this: after many turns with heavy tool calling, the model starts hallucinating tool responses in its assistant response instead of calling the tool correctly.

Some models will handle this better and some worse, I guess. But it appears to me that it would be better to solve this. One side is the confusion of the model, and the other side is prompt caching: when the format of the tool response changes in the conversation history, the cache will miss, and the tool response has to be re-processed once it has been injected into the assistant's message. If the tool response is very long, this could have a meaningful impact on responsiveness and potentially on costs. At least that is my speculation / current understanding.
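
The caching concern can be sketched with a toy Python comparison. It counts how many leading messages a follow-up request shares with the previous one, as a rough stand-in for prefix-based prompt caching. Real provider caches operate on tokens and have their own rules, so treat this only as an approximation; shared_prefix_length and the sample payloads are made up for illustration.

```python
from typing import Any

Message = dict[str, Any]


def shared_prefix_length(previous: list[Message], current: list[Message]) -> int:
    """Count the leading messages that are identical in both requests."""
    count = 0
    for old, new in zip(previous, current):
        if old != new:
            break
        count += 1
    return count


# Hypothetical first request: one tool call and its result.
first_turn = [
    {"role": "system", "content": "[...]"},
    {"role": "user", "content": "How many customers do we have?"},
    {
        "role": "assistant",
        "content": "",
        "tool_calls": [
            {
                "id": "call_1",  # hypothetical id
                "type": "function",
                "function": {"name": "query", "arguments": "{}"},
            }
        ],
    },
    {"role": "tool", "content": "[long tool output]", "tool_call_id": "call_1"},
]

next_user = {"role": "user", "content": "How many new customers this year?"}

# Follow-up with the history kept as-is, plus the final answer and new question.
preserved = first_turn + [{"role": "assistant", "content": "1,234."}, next_user]

# Follow-up with the assistant and tool messages merged into one assistant message.
flattened = first_turn[:2] + [
    {"role": "assistant", "content": "\"[long tool output]\"\n1,234."},
    next_user,
]

print(shared_prefix_length(first_turn, preserved))
print(shared_prefix_length(first_turn, flattened))
```

With the history preserved, the whole first request (including the long tool output) is a prefix of the follow-up. With the merged form, the shared prefix ends at the first user message, so the tool output would have to be processed again.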

I created a fork of the main branch and started experimenting with a fix, eventually finding and taking inspiration from PR #19578. I have it in good working condition, and I will keep testing it to see whether the issues are fixed. Judging by the API calls and preliminary testing, it all looks good to me. Feel free to test it out:
Fork with fixed tool call history and parallel call handling

2 replies

pfn Feb 15, 2026

Hopefully this gets more traction. This incorrect tool call handling is becoming unbearable, and I'm about to stop using Open WebUI in favor of finding something else.

Bickio Feb 15, 2026
Author

@pfn I find it near impossible to understand what's in scope, but from what I can make out, the new /responses mode is intended to fix the tool calling issue eventually. Unfortunately, when that mode is enabled, tool calling doesn't work at all, but I'm told (via having my issue closed as dupe) that this will be resolved here: #21340

Classic298 · 2026-02-15T22:05:33Z

Collaborator

Tim is working on it guys.

0 replies

Classic298 · 2026-02-15T22:38:04Z

Collaborator

Should be resolved by f2aca78

0 replies

Classic298 · 2026-02-15T22:50:55Z

Collaborator

Yeah, tested it. This introduces the tool role, and it works as it should based on what was reported above.

0 replies

Bickio · 2026-02-15T23:20:09Z

Author

Great news @Classic298. Do you know when the next minor release will be?

1 reply

Classic298 Feb 15, 2026
Collaborator

very soon

Bickio · 2026-02-16T02:23:03Z

Author

@Classic298 @pfn Just confirming that I have tested the fix using a Docker image built from the 0.8.2 release PR (e10e7d0), and it works perfectly: the API requests now interleave role: assistant messages carrying tool calls with role: tool messages carrying the output.

Note to avoid any confusion: the fix is for the completions API and is completely unrelated to the /responses mode.

1 reply

pfn Feb 16, 2026

awesome, I'm off to try 0.8.2
