[Bug]: Definite String Decoder Requests Unlimited Bytes After UTF-8 Chunk Boundary #264

@tylzh97

Description

Things to check first

  • I have searched the existing issues and didn't find my bug already reported there

  • I have checked that my bug is still present in the latest release

cbor2 version

5.7.0

Python version

3.10.18

What happened?

Summary

decode_definite_long_string() leaves buffer_size at its old allocation size even after the leftover bytes of a split UTF-8 sequence have been consumed. The next loop iteration computes chunk_length = 65536 - buffer_size, which goes negative and is passed unchecked to the stream as read(-1).
As a result, any decoder reading from a stream can be driven into an unbounded read, followed by CBORDecodeEOF.
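
To illustrate why a negative chunk_length matters, here is a minimal sketch that uses only the formula quoted above. The value 65537 for buffer_size is an inference from the observed read(-1), not something read from the C source, and the 1000-byte stream is a stand-in for arbitrary input.

```python
import io

CHUNK = 65536

# Assumption: inferred from the observed read(-1), since
# 65536 - buffer_size == -1 implies buffer_size == 65537.
stale_buffer_size = 65537
chunk_length = CHUNK - stale_buffer_size

# Stand-in stream; a real input could be arbitrarily large.
stream = io.BytesIO(b"x" * 1000)

# For Python file-like objects, a negative size means "read until EOF",
# so the whole remaining stream is returned in one call.
data = stream.read(chunk_length)

print(chunk_length)  # -1
print(len(data))     # 1000
```

Because read(-1) drains the stream, the amount of data pulled into memory is bounded only by what the stream can deliver, not by the declared string length.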

How can we reproduce the bug?

Proof of concept

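import io
import cbor2
import _cbor2  # C extension module, imported directly so the C decoder is exercised

class LoggingReader(io.BytesIO):
    # Log every read size the decoder requests
    def read(self, n=-1):
        print(f"read({n})")
        return super().read(n)

# Each "€" is 3 bytes in UTF-8 and straddles a 65536-byte chunk boundary
payload = "a"*65535 + "€" + "b"*65533 + "€" + "d"*100

decoder = _cbor2.CBORDecoder(LoggingReader(cbor2.dumps(payload)))
decoder.decode()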

Traceback

read(1)
read(4)
read(65536)
read(65536)
read(-1)
Traceback (most recent call last):
  ...
_cbor2.CBORDecodeEOF: premature end of stream (expected to read -1 bytes, got 102 instead)
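
The numbers in this output can be cross-checked against the payload without running the decoder. The sketch below is only a byte-layout calculation, assuming the decoder reads the string body in 65536-byte chunks as the log suggests; the header bytes follow the standard CBOR encoding for a text string with a 4-byte length and are built by hand here rather than with cbor2.

```python
import struct

CHUNK = 65536

payload = "a" * 65535 + "€" + "b" * 65533 + "€" + "d" * 100
data = payload.encode("utf-8")
total = len(data)

# CBOR text string with a 32-bit length: initial byte 0x7a, then 4 length bytes.
# This matches the read(1) and read(4) calls at the top of the log.
header = bytes([0x7A]) + struct.pack(">I", total)

first_chunk = data[:CHUNK]
second_chunk = data[CHUNK:2 * CHUNK]
remaining = total - 2 * CHUNK

print(total)                  # 131174
print(hex(first_chunk[-1]))   # 0xe2, the first byte of "€" (E2 82 AC)
print(hex(second_chunk[-1]))  # 0xe2, so both boundaries split a "€"
print(remaining)              # 102
```

The 102 bytes left after two full chunks match the "got 102 instead" in the error message, which is consistent with read(-1) returning everything that remained in the stream.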
