legacy multi-byte encodings in TextDecoder are wrong in streaming #6193

@ChALkeR

Extracting from #6038 (comment)

Using the same tests, which can be found at https://github.com/ExodusOSS/bytes/blob/main/tests/encoding/mistakes.test.js and web-platform-tests/wpt#56892

Even with the text_decoder_cjk_decoder flag, this happens:

✖ FAIL Common implementation mistakes > stream > big5
✖ FAIL Common implementation mistakes > stream > shift_jis
✖ FAIL Common implementation mistakes > stream > euc-kr
✖ FAIL Common implementation mistakes > fatal stream > iso-2022-jp

Streaming is wrong

Which means that the result of decoding a stream depends on the shape of the stream, not just on the bytes of the stream

Which means that the same data with the same hash/signature could be decoded to different results depending on what underlying chunks it is split into

That should not happen

It's definitely better than before though!
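The chunk-independence property described above can be checked with a small sketch like the following. It is not the linked test suite, just an illustration: `decodeWhole`, `decodeSplit`, and `findChunkingMismatches` are hypothetical helper names, and the only API assumed is the standard `TextDecoder` with its `stream` option.

```typescript
// Hypothetical helpers for illustration; only TextDecoder itself is a real API.

// Decode all bytes in a single call, with no streaming involved.
function decodeWhole(label: string, bytes: Uint8Array): string {
  return new TextDecoder(label).decode(bytes);
}

// Decode the same bytes as two chunks split at `splitAt`, using stream mode.
function decodeSplit(label: string, bytes: Uint8Array, splitAt: number): string {
  const decoder = new TextDecoder(label);
  let out = decoder.decode(bytes.subarray(0, splitAt), { stream: true });
  out += decoder.decode(bytes.subarray(splitAt), { stream: true });
  out += decoder.decode(); // final call without `stream` flushes pending state
  return out;
}

// Return every split position where the streamed result differs from the
// one-shot result. A correct streaming decoder yields an empty array.
function findChunkingMismatches(label: string, bytes: Uint8Array): number[] {
  const expected = decodeWhole(label, bytes);
  const mismatches: number[] = [];
  for (let i = 0; i <= bytes.length; i++) {
    if (decodeSplit(label, bytes, i) !== expected) mismatches.push(i);
  }
  return mismatches;
}
```

For example, `findChunkingMismatches("shift_jis", Uint8Array.of(0x82, 0xa0))` should return an empty array when streaming is correct, because a lead byte split from its trail byte has to be carried over to the next chunk.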

Streaming mistakes

✖ FAIL Common implementation mistakes > stream > big5
✖ FAIL Common implementation mistakes > stream > shift_jis
✖ FAIL Common implementation mistakes > stream > euc-kr

As per #6091 (comment), this fails in the exact place that I warned about in #6091 (comment) / #6091 (comment)

That review comment should not have been discarded

To fix non-fatal streaming, follow the suggestion there

This is an encoding_rs bug with a trivial workaround (see thread)

fatal:true streaming mistakes

✖ FAIL Common implementation mistakes > fatal stream > iso-2022-jp

That one fatal streaming error is much less significant, as no one should really continue streaming after a fatal-mode error at all
Ref: whatwg/encoding#358

And even if the results are misaligned with the spec in fatal streaming mode, I think that no behavior there except erroring makes sense anyway. You could fix it to match the spec, but other than following the spec to the letter there is little value in that

Feel free to ignore that one and close this issue once non-fatal streaming is fixed
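As a rough sketch of the "do not continue streaming after a fatal-mode error" point, a consumer can treat the first `TypeError` as terminal and drop the decoder. `decodeFatalStream` and `FatalResult` are hypothetical names for illustration; the only API assumed is the standard `TextDecoder` with `fatal: true`, which throws a `TypeError` on malformed input.

```typescript
// Hypothetical result type and helper for illustration only.
type FatalResult =
  | { ok: true; text: string }
  | { ok: false; decodedBeforeError: string };

// Decode chunks in fatal mode and stop at the first decoding error.
// The decoder is never reused after it throws.
function decodeFatalStream(label: string, chunks: Uint8Array[]): FatalResult {
  const decoder = new TextDecoder(label, { fatal: true });
  let text = "";
  try {
    for (const chunk of chunks) {
      text += decoder.decode(chunk, { stream: true });
    }
    text += decoder.decode(); // flush; throws if the input ends mid-sequence
    return { ok: true, text };
  } catch (err) {
    if (!(err instanceof TypeError)) throw err; // not a decoding error
    return { ok: false, decodedBeforeError: text };
  }
}
```

With this shape, whatever a decoder would return for chunks after the error never reaches the caller, which is why the spec mismatch in that state matters less in practice.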
