Relax rectangular check in FlateDecode._decode_png_prediction #3241

@stefan6419846

Description

In pypdf/pypdf/filters.py (lines 182 to 183 at b7ae2e5),

    if len(data) % rowlength != 0:
        raise PdfReadError("Image data is not rectangular")

we check for strictly rectangular image data. In an ideal world this would be fine, but I regularly see PDF files that violate it.

I would therefore like to relax this check so that it only issues a warning and we do the necessary padding on our side:

            missing_bytes = b"\x00" * (rowlength - len(data) % rowlength)
            data += missing_bytes
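
As a minimal sketch of how the relaxed check could look in isolation: `pad_to_full_rows` and the wording of the warning are hypothetical and only illustrate the idea; this is not the actual pypdf code. It assumes `rowlength` is the number of bytes per row, including the PNG filter-type byte.

```python
import logging

logger = logging.getLogger(__name__)


def pad_to_full_rows(data: bytes, rowlength: int) -> bytes:
    """Pad data with zero bytes so its length is a multiple of rowlength.

    Hypothetical helper for illustration only; not part of pypdf.
    """
    if rowlength <= 0:
        raise ValueError("rowlength must be positive")
    remainder = len(data) % rowlength
    if remainder == 0:
        # Already rectangular, nothing to do.
        return data
    missing = rowlength - remainder
    # Warn instead of raising, then fill the incomplete last row with zeros.
    logger.warning("Image data is not rectangular. Adding %d padding byte(s).", missing)
    return data + b"\x00" * missing


padded = pad_to_full_rows(b"\x01\x02\x03\x04\x05", 3)
```

Guarding on the remainder matters here: without it, data that is already rectangular would gain a full extra row of zeros.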

Environment

Which environment were you using when you encountered the problem?

$ python -m platform
Linux-6.4.0-150600.23.42-default-x86_64-with-glibc2.38

$ python -c "import pypdf;print(pypdf._debug_versions)"
pypdf==5.4.0, crypt_provider=('cryptography', '44.0.0'), PIL=11.1.0

Code + PDF

This is a minimal, complete example that shows the issue:

from pypdf import PdfReader

reader = PdfReader('file.pdf')
for page in reader.pages:
    for name, image in page.images.items():
        print(name)

I currently do not have an example I could share publicly.
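
In the absence of a shareable file, the failing condition can be illustrated with synthetic data. This is only a sketch: the image parameters are made up, and it assumes that each row consists of one filter-type byte followed by the packed pixel bytes, as PNG predictors require. It does not call pypdf at all.

```python
import math
import zlib

# Hypothetical image parameters; not taken from a real file.
columns, colors, bits_per_component = 4, 1, 8

# One filter-type byte per row plus the packed pixel bytes.
rowlength = math.ceil(columns * colors * bits_per_component / 8) + 1

# Two complete rows (filter type 0 = None) followed by a truncated third row.
full_rows = bytes([0, 10, 20, 30, 40]) * 2
truncated = full_rows + bytes([0, 50, 60])

# Round-trip through zlib to mimic the content of a FlateDecode stream.
data = zlib.decompress(zlib.compress(truncated))

# A non-zero remainder is the condition the rectangular check rejects.
remainder = len(data) % rowlength
print(len(data), rowlength, remainder)  # prints "13 5 3"
```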

Traceback

This is the complete traceback I see:

Traceback (most recent call last):
  File "/home/stefan/tmp/pypdf/run.py", line 5, in <module>
    for name, image in page.images.items():
                       ^^^^^^^^^^^^^^^^^^^
  File "/home/stefan/tmp/pypdf/pypdf/_page.py", line 444, in items
    return [(x, self[x]) for x in self.ids_function()]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/stefan/tmp/pypdf/pypdf/_page.py", line 444, in <listcomp>
    return [(x, self[x]) for x in self.ids_function()]
                ~~~~^^^
  File "/home/stefan/tmp/pypdf/pypdf/_page.py", line 464, in __getitem__
    return self.get_function(index)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/stefan/tmp/pypdf/pypdf/_page.py", line 657, in _get_image
    imgd = _xobj_to_image(cast(DictionaryObject, xobjs[id]))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/stefan/tmp/pypdf/pypdf/filters.py", line 741, in _xobj_to_image
    data = x_object_obj.get_data()  # type: ignore
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/stefan/tmp/pypdf/pypdf/generic/_data_structures.py", line 1109, in get_data
    decoded.set_data(decode_stream_data(self))
                     ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/stefan/tmp/pypdf/pypdf/filters.py", line 657, in decode_stream_data
    data = FlateDecode.decode(data, params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/stefan/tmp/pypdf/pypdf/filters.py", line 172, in decode
    str_data = FlateDecode._decode_png_prediction(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/stefan/tmp/pypdf/pypdf/filters.py", line 183, in _decode_png_prediction
    raise PdfReadError("Image data is not rectangular")
pypdf.errors.PdfReadError: Image data is not rectangular

Labels

is-robustness-issue: From a user's perspective, this is about robustness
workflow-images: From a user's perspective, image handling is the affected feature/workflow