Avoid streaming incomplete UTF-8 characters #727

Merged
cjpais merged 1 commit into mozilla-ai:main from corebonts:incomplete-utf on Mar 24, 2025

@corebonts
Contributor

Some characters, such as the Chinese character fù, are sometimes returned split across two tokens: "\u00e8\u00b5" and "\u008b" in this case.

Whether this happens depends on the model, but when it does (for example with DeepSeek R1), we have to wait until the character is complete and only then send it.

This resolves #722 and #646
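
A minimal sketch of the buffering idea described above, written in Python for illustration; it is not the code merged in this PR. It assumes each token arrives as raw bytes, so the example tokens correspond to the bytes `E8 B5` and `8B`, which together form one three-byte UTF-8 sequence. The names `complete_utf8_prefix_len` and `Utf8Streamer` are hypothetical.

```python
def complete_utf8_prefix_len(buf: bytes) -> int:
    """Return how many leading bytes can be sent without cutting a UTF-8 character.

    Hypothetical helper: it only inspects the tail of the buffer and passes
    malformed input through unchanged instead of validating it.
    """
    n = len(buf)
    i = n - 1
    back = 0
    # Walk backwards over trailing continuation bytes (10xxxxxx), at most 4 steps.
    while i >= 0 and back < 4:
        b = buf[i]
        if (b & 0xC0) == 0x80:
            i -= 1
            back += 1
            continue
        # Found a non-continuation byte: work out how long its sequence should be.
        if (b & 0x80) == 0x00:
            need = 1
        elif (b & 0xE0) == 0xC0:
            need = 2
        elif (b & 0xF0) == 0xE0:
            need = 3
        elif (b & 0xF8) == 0xF0:
            need = 4
        else:
            return n  # invalid lead byte, do not hold anything back
        have = n - i
        # Hold back the trailing sequence only if it is still incomplete.
        return i if have < need else n
    return n


class Utf8Streamer:
    """Buffers token bytes and releases text only at character boundaries."""

    def __init__(self) -> None:
        self._pending = b""

    def feed(self, token_bytes: bytes) -> str:
        data = self._pending + token_bytes
        cut = complete_utf8_prefix_len(data)
        self._pending = data[cut:]
        return data[:cut].decode("utf-8", errors="replace")


streamer = Utf8Streamer()
first = streamer.feed(b"\xe8\xb5")   # incomplete character, nothing is sent yet
second = streamer.feed(b"\x8b")      # character is now complete and is sent
```

With this approach the first call sends an empty string and the second sends the whole character, so a client never receives a partial byte sequence. A real server would also need to flush any bytes still pending when generation ends.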

@corebonts
Contributor Author

corebonts commented Mar 21, 2025

@cjpais
Collaborator

cjpais commented Mar 24, 2025

I've tested this and it works. We may want to modify the function slightly later, but it holds up in my basic testing, so I'm pulling it in.

Thanks so much for the contribution!

@cjpais cjpais merged commit a9658c7 into mozilla-ai:main Mar 24, 2025
1 check passed

Development

Successfully merging this pull request may close these issues.

Bug: Chinese coding error in Server V2

2 participants