fix TextDecoder streaming to ignore empty chunks by anonrig · Pull Request #6212 · cloudflare/workerd

anonrig · 2026-02-27T21:35:10Z

If I understood correctly, this should fix #6193

ask-bonk

Fix looks correct — short-circuiting empty non-flush chunks is the right workaround for the encoding_rs bug. One suggestion to strengthen the test coverage.

src/workerd/api/tests/encoding-test.js

ask-bonk · 2026-02-27T21:37:02Z

I'm Bonk, and I've done a quick review of your PR.

Summary: Workaround for an encoding_rs bug where empty chunks during streaming mode incorrectly flush decoder state, breaking multi-byte character decoding for legacy CJK encodings.

Findings:

Low — Test coverage only covers one of the affected encodings. The issue reports failures for big5, euc-kr, and shift_jis, but the test only validates shift_jis. I posted a suggestion adding big5 and euc-kr test cases to match the reported scope.

The C++ fix itself is correct and minimal — returning an empty string for empty non-flush buffers avoids tickling the encoding_rs bug while preserving flush semantics (where an empty buffer still needs to finalize pending state).

github run

ChALkeR · 2026-03-04T02:47:38Z

@anonrig I re-tested workerd, it now passes my streaming and snapshot tests!

Nice

It also passes the heavy tests
"mistakes" test was only an extract of that listed cases that tripped some engines
The actual one is pseudo-random / heavy

ChALkeR · 2026-03-04T02:53:46Z

... it somehow manages to be at least 1.5x slower than my js impl though, on workerd
but that's something that i will investigate separately

JS:

chalker@macbook-air bytes % ./node_modules/.bin/exodus-test --engine workerd:bundle tests/encoding/multi-byte.test.js
Engine: workerd:bundle, running on ./node_modules/.pnpm/workerd@1.20260301.1/node_modules/workerd/bin/workerd
Warning: workerd:bundle engine support is incomplete
# tests/encoding/multi-byte.test.js
✔ PASS legacy multi-byte encodings snapshot tests > Fresh instance > gbk
✔ PASS legacy multi-byte encodings snapshot tests > Fresh instance > gb18030
✔ PASS legacy multi-byte encodings snapshot tests > Fresh instance > big5
✔ PASS legacy multi-byte encodings snapshot tests > Fresh instance > euc-jp
✔ PASS legacy multi-byte encodings snapshot tests > Fresh instance > iso-2022-jp
✔ PASS legacy multi-byte encodings snapshot tests > Fresh instance > shift_jis
✔ PASS legacy multi-byte encodings snapshot tests > Fresh instance > euc-kr
✔ PASS legacy multi-byte encodings snapshot tests > Reuse instance > gbk
✔ PASS legacy multi-byte encodings snapshot tests > Reuse instance > gb18030
✔ PASS legacy multi-byte encodings snapshot tests > Reuse instance > big5
✔ PASS legacy multi-byte encodings snapshot tests > Reuse instance > euc-jp
✔ PASS legacy multi-byte encodings snapshot tests > Reuse instance > iso-2022-jp
✔ PASS legacy multi-byte encodings snapshot tests > Reuse instance > shift_jis
✔ PASS legacy multi-byte encodings snapshot tests > Reuse instance > euc-kr
workerd/server/server.c++:5928: debug: [ TEST ] main
workerd/server/server.c++:5942: debug: [ PASS ] main (9.142784s)
All 1 test suites passed
Total time: 10.027s

Native:

chalker@macbook-air bytes % ./node_modules/.bin/exodus-test --engine workerd:bundle tests/encoding/multi-byte.test.js
Engine: workerd:bundle, running on ./node_modules/.pnpm/workerd@1.20260301.1/node_modules/workerd/bin/workerd
Warning: workerd:bundle engine support is incomplete
# tests/encoding/multi-byte.test.js
✔ PASS legacy multi-byte encodings snapshot tests > Fresh instance > gbk
✔ PASS legacy multi-byte encodings snapshot tests > Fresh instance > gb18030
✔ PASS legacy multi-byte encodings snapshot tests > Fresh instance > big5
✔ PASS legacy multi-byte encodings snapshot tests > Fresh instance > euc-jp
✔ PASS legacy multi-byte encodings snapshot tests > Fresh instance > iso-2022-jp
✔ PASS legacy multi-byte encodings snapshot tests > Fresh instance > shift_jis
✔ PASS legacy multi-byte encodings snapshot tests > Fresh instance > euc-kr
✔ PASS legacy multi-byte encodings snapshot tests > Reuse instance > gbk
✔ PASS legacy multi-byte encodings snapshot tests > Reuse instance > gb18030
✔ PASS legacy multi-byte encodings snapshot tests > Reuse instance > big5
✔ PASS legacy multi-byte encodings snapshot tests > Reuse instance > euc-jp
✔ PASS legacy multi-byte encodings snapshot tests > Reuse instance > iso-2022-jp
✔ PASS legacy multi-byte encodings snapshot tests > Reuse instance > shift_jis
✔ PASS legacy multi-byte encodings snapshot tests > Reuse instance > euc-kr
workerd/server/server.c++:5928: debug: [ TEST ] main
workerd/server/server.c++:5942: debug: [ PASS ] main (14.529759s)
All 1 test suites passed
Total time: 15.559s

That includes the whole setup/snapshot logic etc, so it's actually more than 1.5x

ChALkeR · 2026-03-04T02:57:56Z

Performance is likely not that significant for legacy encodings though
What's most important is that the results are now consistent with the spec!
This is very good

fix TextDecoder streaming to ignore empty chunks

fadde35

anonrig requested a review from jasnell February 27, 2026 21:35

anonrig requested review from a team as code owners February 27, 2026 21:35

ask-bonk bot reviewed Feb 27, 2026

View reviewed changes

src/workerd/api/tests/encoding-test.js Show resolved Hide resolved

anonrig enabled auto-merge February 27, 2026 21:41

jasnell approved these changes Feb 27, 2026

View reviewed changes

anonrig merged commit c8cd8f1 into main Feb 27, 2026
29 checks passed

anonrig deleted the yagiz/fix-text-decoder-empty branch February 27, 2026 22:16

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix TextDecoder streaming to ignore empty chunks#6212

fix TextDecoder streaming to ignore empty chunks#6212
anonrig merged 1 commit intomainfrom
yagiz/fix-text-decoder-empty

anonrig commented Feb 27, 2026

Uh oh!

ask-bonk bot left a comment

Uh oh!

Uh oh!

ask-bonk bot commented Feb 27, 2026

Uh oh!

Uh oh!

ChALkeR commented Mar 4, 2026 •

edited

Loading

Uh oh!

ChALkeR commented Mar 4, 2026

Uh oh!

ChALkeR commented Mar 4, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

anonrig commented Feb 27, 2026

Uh oh!

ask-bonk bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

ask-bonk bot commented Feb 27, 2026

Uh oh!

Uh oh!

ChALkeR commented Mar 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ChALkeR commented Mar 4, 2026

Uh oh!

ChALkeR commented Mar 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

ChALkeR commented Mar 4, 2026 •

edited

Loading

ChALkeR commented Mar 4, 2026 •

edited

Loading