issue: missing tokens when streaming on fast inference providers #15850

@kalebwalton

Description

Check Existing Issues

  • I have searched the existing issues and discussions.
  • I am using the latest version of Open WebUI.

Installation Method

Docker

Open WebUI Version

v0.6.16

Ollama Version (if applicable)

No response

Operating System

Windows Sequoia

Browser (if applicable)

No response

Confirmation

  • I have read and followed all instructions in README.md.
  • I am using the latest version of both Open WebUI and Ollama.
  • I have included the browser console logs.
  • I have included the Docker container logs.
  • I have clearly listed every relevant configuration, custom setting, environment variable, and command-line option that influences my setup (such as Docker Compose overrides, .env values, browser settings, authentication configurations, etc).
  • I have documented step-by-step reproduction instructions that are precise, sequential, and leave nothing to interpretation. My steps:
  • Start with the initial platform/version/OS and dependencies used,
  • Specify exact install/launch/configure commands,
  • List URLs visited, user input (incl. example values/emails/passwords if needed),
  • Describe all options and toggles enabled or changed,
  • Include any files or environmental changes,
  • Identify the expected and actual result at each stage,
  • Ensure any reasonably skilled user can follow and hit the same issue.

Expected Behavior

Streaming output provides all streamed content and does not miss any parts

Actual Behavior

Streaming output occasionally misses a stream chunk (a few characters). It is often unnoticeable, and you may assume it's a model issue or an inference provider issue; however, I have validated that the issue occurs with multiple models from multiple inference providers.

Steps to Reproduce

  1. Work around issue #15848 ("debug logging of streamed responses fails with TypeError: not all arguments converted during string formatting") by monkeypatching backend/open_webui/utils/middleware.py, replacing
    log.debug("Error: ", e)
    with log.debug(f"Error: {e}") (this lets debug logging print streaming errors properly)
  2. Run latest with docker run -d --name openwebui -p 3000:8080 -e GLOBAL_LOG_LEVEL=debug -v /path/to/monkeypatched_middleware.py:/app/backend/open_webui/utils/middleware.py -v openwebui-data:/app/backend/data --restart unless-stopped ghcr.io/open-webui/open-webui:latest
  3. Configure with any model like Cerebras qwen-3-235b-a22b or OpenAI gpt-4o-mini
  4. Run a prompt like 'print a bunch of stuff'
  5. Check the logs for errors like those indicated below
  6. If you don't see the error, rerun the prompt a few times (or try another prompt that outputs many tokens) and it will show up

**NOTE:** I believe this happens more frequently on faster streaming models like OpenAI gpt-4o-mini or Cerebras qwen-3-235b-a22b.
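The errors in the logs below are consistent with a single SSE `data:` event being delivered across two reads and each fragment being parsed as if it were a complete line. A minimal illustration of that failure mode (the event payload and split point here are made up, not taken from the actual middleware):

```python
import json

# Hypothetical SSE event whose JSON payload arrives split across two
# network chunks at an arbitrary byte boundary.
event = 'data: {"choices": [{"delta": {"content": "hello world"}}]}'
chunk_a, chunk_b = event[:45], event[45:]  # split lands mid-string

# A handler that treats each chunk as a complete line will fail on the
# first fragment with the same class of error seen in the logs below.
payload = chunk_a[len("data: "):]
try:
    json.loads(payload)
except json.JSONDecodeError as e:
    print(f"Error: {e}")  # e.g. "Unterminated string starting at: ..."
```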

Logs & Screenshots

2025-07-18 20:35:40.116 | DEBUG    | open_webui.utils.middleware:stream_body_handler:2058 - Error: Unterminated string starting at: line 1 column 139 (char 138) - {}
2025-07-18 20:35:40.137 | DEBUG    | open_webui.utils.middleware:stream_body_handler:2058 - Error: Unterminated string starting at: line 1 column 172 (char 171) - {}
2025-07-18 20:35:40.268 | DEBUG    | open_webui.utils.middleware:stream_body_handler:2058 - Error: Expecting ':' delimiter: line 1 column 171 (char 170) - {}
2025-07-18 20:35:40.306 | DEBUG    | open_webui.utils.middleware:stream_body_handler:2058 - Error: Unterminated string starting at: line 1 column 7 (char 6) - {}
2025-07-18 20:35:40.325 | DEBUG    | open_webui.utils.middleware:stream_body_handler:2058 - Error: Unterminated string starting at: line 1 column 173 (char 172) - {}

Additional Information

I have investigated this at length. If you add some debugging after line 1786, you'll find that (1) every so often a valid JSON event is split across two line iterations, where the first iteration contains part of the data including the beginning of the JSON string and the second contains the remainder, and (2) the individual lines do not contain line endings.
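One way such a split could be healed in the line loop is to accumulate fragments until the concatenated text parses as complete JSON. A hypothetical sketch, not the actual middleware code (the chunk values are invented):

```python
import json

def iter_events(chunks):
    """Hypothetical buffered parser: accumulate fragments until the
    concatenated buffer parses as complete JSON, then emit the event."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        try:
            yield json.loads(buffer)
            buffer = ""
        except json.JSONDecodeError:
            continue  # fragment: keep accumulating

# One event fragmented mid-string, followed by a whole event.
chunks = ['{"delta": {"content": "he', 'llo"}}',
          '{"delta": {"content": " world"}}']
events = list(iter_events(chunks))
print(events)  # [{'delta': {'content': 'hello'}}, {'delta': {'content': ' world'}}]
```

Note this simple version breaks if two complete events ever arrive concatenated in one chunk, since `json.loads` would then raise "Extra data" forever.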

I dug around and found the response-handling code, which seems to use aiohttp.ClientSession, but when I tried to follow it through I got a bit confused.

I don't know whether the correct solution is to buffer in Open WebUI's middleware.py where it processes the lines (which is awkward because the line endings never show up, so you can only key on something like }), or whether the fix belongs at a lower level so that SSE JSON lines are never fragmented in the first place.
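Since line endings can't be relied on, a middleware-side buffer could instead key on JSON completeness itself via `json.JSONDecoder.raw_decode`, which also copes with several complete events landing in one chunk. A sketch under those assumptions (the `feed` helper and chunk values are hypothetical, not Open WebUI code):

```python
import json

decoder = json.JSONDecoder()

def feed(buffer, chunk):
    """Hypothetical framer keyed on JSON completeness rather than line
    endings: returns (complete_events, leftover_buffer)."""
    buffer += chunk
    events = []
    while buffer:
        try:
            obj, end = decoder.raw_decode(buffer)
        except json.JSONDecodeError:
            break  # incomplete tail: wait for more data
        events.append(obj)
        buffer = buffer[end:].lstrip()  # raw_decode rejects leading whitespace
    return events, buffer

buf = ""
out = []
# Two events fused in one chunk, the second fragmented mid-string.
for chunk in ['{"a": 1}{"b": "x', 'y"}', '{"c": 3}']:
    events, buf = feed(buf, chunk)
    out.extend(events)
print(out)  # [{'a': 1}, {'b': 'xy'}, {'c': 3}]
```

`raw_decode` returns the first complete JSON value plus the index where it ended, so the loop drains every whole event and leaves any partial tail in the buffer for the next chunk.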

Metadata

Labels

    bug (Something isn't working)