[Bugfix] Fix bench_serve UTF-8 decode crash on split multi-byte chars #38732
ywang96 merged 1 commit into vllm-project:main from
Conversation
When streaming responses containing multi-byte UTF-8 characters
(e.g. Chinese text), HTTP chunk boundaries can split a character
across two chunks. The direct `bytes.decode("utf-8")` call crashes:

```
UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 65527-65528: unexpected end of data
```
Replace the one-shot decode with codecs.IncrementalDecoder which
buffers incomplete byte sequences across add_chunk() calls and
returns them once the next chunk completes the character.
Fixes vllm-project#38717
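The buffering behavior the fix relies on can be sketched with the standard-library incremental decoder; the sample text and split point here are illustrative, not taken from the PR:

```python
import codecs

# An incremental UTF-8 decoder buffers bytes that end mid-character
# instead of raising, and emits them once the character is complete.
decoder = codecs.getincrementaldecoder("utf-8")()

data = "你好".encode("utf-8")        # 6 bytes, 3 per character
first, second = data[:4], data[4:]  # split mid-character

# A one-shot decode of the first chunk would crash:
try:
    first.decode("utf-8")
except UnicodeDecodeError as e:
    print("one-shot decode fails:", e.reason)

# The incremental decoder buffers the trailing partial sequence.
print(repr(decoder.decode(first)))   # '你' — only the complete char
print(repr(decoder.decode(second)))  # '好' — buffered byte + rest
```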
Code Review
This pull request replaces direct UTF-8 decoding with an incremental decoder in the StreamedResponseHandler to correctly handle multi-byte characters split across chunks. However, the reviewer pointed out a potential data loss issue because the decoder is never finalized; a mechanism to flush the remaining buffer at the end of the stream is required.
| """Add a chunk of bytes to the buffer and return any complete | ||
| messages.""" | ||
| chunk_str = chunk_bytes.decode("utf-8") | ||
| chunk_str = self._decoder.decode(chunk_bytes) |
The IncrementalDecoder can buffer incomplete byte sequences. If the stream ends with such an incomplete sequence, it will remain in the decoder's buffer and will be lost because there is no final call to flush the decoder. This can lead to data loss.
To fix this, you should add a mechanism to finalize the decoding process after the last chunk has been processed. This typically involves calling self._decoder.decode(b'', final=True) to flush any buffered data.
This would likely require adding a new method to StreamedResponseHandler, for example finalize(), and calling it from the request handling functions (e.g., async_request_openai_completions) after the streaming loop is complete.
Example of a finalize method:
```python
def finalize(self) -> list[str]:
    """Flushes the decoder and processes any remaining buffered data."""
    final_chunk_str = self._decoder.decode(b"", final=True)
    if not final_chunk_str:
        return []
    self.buffer += final_chunk_str
    # It's best to refactor the message processing logic from add_chunk
    # into a private helper method to be reused here.
    messages = self._process_buffer()
    if self.buffer:
        # Handle or log any remaining incomplete message in the buffer
        # after final processing.
        pass
    return messages
```

The call sites in async_request_openai_completions, async_request_openai_chat_completions, and async_request_openai_audio would need to be updated to call this finalize method.
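One caveat worth noting for such a finalize method (a minimal sketch of standard `codecs` behavior, not part of the PR): with the default strict error handling, `decode(b"", final=True)` raises on a truncated trailing sequence rather than returning the partial bytes as text, so the flush should be prepared for that (or construct the decoder with `errors="replace"`).

```python
import codecs

decoder = codecs.getincrementaldecoder("utf-8")()

# Simulate a stream that ends mid-character: 2 of the 3 bytes of '好'.
assert decoder.decode("好".encode("utf-8")[:2]) == ""  # bytes stay buffered

# Flushing with final=True does not silently return the partial bytes:
# under strict error handling it raises UnicodeDecodeError instead.
try:
    decoder.decode(b"", final=True)
except UnicodeDecodeError as exc:
    print("truncated character at end of stream:", exc.reason)
```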
Actually @he-yufeng this looks like a valid point. Should we flush the incremental decoder?
ping — bench_serve crashes on CJK output, the fix is straightforward (IncrementalDecoder).
I think it's okay to have this in - merging
…vllm-project#38732) Signed-off-by: Avinash Singh <avinashsingh.rcoem@gmail.com>
Summary
`StreamedResponseHandler.add_chunk()` calls `chunk_bytes.decode("utf-8")` directly, which crashes when a multi-byte UTF-8 character (e.g. Chinese) is split across HTTP chunk boundaries. The reporter hit this at a ~0.4% rate (4/1000 requests) during bench_serve with H100x8.
Replace the one-shot decode with `codecs.IncrementalDecoder`, which buffers incomplete byte sequences across `add_chunk()` calls and decodes them once the next chunk completes the character. This is the standard Python approach for streaming byte-to-str conversion.

Fixes #38717
Test plan
- `ruff check` and `ruff format --check` pass
- `endpoint_request_func.py:32`
- `IncrementalDecoder` correctly handles split multi-byte sequences (tested locally with artificially split bytes of Chinese characters)
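The local check described above might look like this: split a multi-byte UTF-8 stream at every possible position and verify the incremental decoder reassembles the original text. This is an illustrative sketch, not the PR's actual test code:

```python
import codecs

text = "中文测试"  # sample CJK text; any multi-byte string works
data = text.encode("utf-8")

# Try every split point, including those inside a multi-byte character,
# and check the two-chunk incremental decode matches the original text.
for i in range(len(data) + 1):
    decoder = codecs.getincrementaldecoder("utf-8")()
    out = decoder.decode(data[:i]) + decoder.decode(data[i:], final=True)
    assert out == text, f"failed at split point {i}"
print("all split points decode correctly")
```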