Relax rectangular check in FlateDecode._decode_png_prediction #3241

In https://github.com/py-pdf/pypdf/blob/b7ae2e5f8406bbfe2f50cc1ee425fa636619fe54/pypdf/filters.py#L182-L183 we require image data to be strictly rectangular, i.e. its length must be an exact multiple of the row length. In an ideal world every file would satisfy this, but I regularly see PDF files that violate it.

I therefore would like to relax this to only issue a warning and do the necessary padding on our side:

```python
if len(data) % rowlength != 0:
    # Pad the incomplete final row with zero bytes instead of raising.
    missing_bytes = b"\x00" * (rowlength - len(data) % rowlength)
    data += missing_bytes
```
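To make the intended behaviour concrete, here is a minimal standalone sketch of the padding step. The helper name `pad_to_full_rows` and the exact warning text are my own assumptions for illustration and are not existing pypdf API; the real change would live inside `_decode_png_prediction` and would probably use pypdf's own logging helper.

```python
import logging

logger = logging.getLogger(__name__)


def pad_to_full_rows(data: bytes, rowlength: int) -> bytes:
    """Hypothetical helper: zero-pad data to a multiple of rowlength."""
    remainder = len(data) % rowlength
    if remainder == 0:
        # Already rectangular, nothing to do.
        return data
    missing = rowlength - remainder
    # Warn instead of raising PdfReadError, then fill up the last row.
    logger.warning(
        "Image data is not rectangular. Adding %d padding byte(s).", missing
    )
    return data + b"\x00" * missing


# Example: 7 bytes with a row length of 4 need 1 padding byte.
padded = pad_to_full_rows(b"abcdefg", 4)
```

Data that is already a multiple of the row length is returned unchanged, so well-formed files would not be affected.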

## Environment

Which environment were you using when you encountered the problem?

```bash
$ python -m platform
Linux-6.4.0-150600.23.42-default-x86_64-with-glibc2.38

$ python -c "import pypdf;print(pypdf._debug_versions)"
pypdf==5.4.0, crypt_provider=('cryptography', '44.0.0'), PIL=11.1.0
```

## Code + PDF

This is a minimal, complete example that shows the issue:

```python
from pypdf import PdfReader

reader = PdfReader('file.pdf')
for page in reader.pages:
    for name, image in page.images.items():
        print(name)
```

I currently do not have an example I could share publicly.

## Traceback

This is the complete traceback I see:

```
Traceback (most recent call last):
  File "/home/stefan/tmp/pypdf/run.py", line 5, in <module>
    for name, image in page.images.items():
                       ^^^^^^^^^^^^^^^^^^^
  File "/home/stefan/tmp/pypdf/pypdf/_page.py", line 444, in items
    return [(x, self[x]) for x in self.ids_function()]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/stefan/tmp/pypdf/pypdf/_page.py", line 444, in <listcomp>
    return [(x, self[x]) for x in self.ids_function()]
                ~~~~^^^
  File "/home/stefan/tmp/pypdf/pypdf/_page.py", line 464, in __getitem__
    return self.get_function(index)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/stefan/tmp/pypdf/pypdf/_page.py", line 657, in _get_image
    imgd = _xobj_to_image(cast(DictionaryObject, xobjs[id]))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/stefan/tmp/pypdf/pypdf/filters.py", line 741, in _xobj_to_image
    data = x_object_obj.get_data()  # type: ignore
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/stefan/tmp/pypdf/pypdf/generic/_data_structures.py", line 1109, in get_data
    decoded.set_data(decode_stream_data(self))
                     ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/stefan/tmp/pypdf/pypdf/filters.py", line 657, in decode_stream_data
    data = FlateDecode.decode(data, params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/stefan/tmp/pypdf/pypdf/filters.py", line 172, in decode
    str_data = FlateDecode._decode_png_prediction(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/stefan/tmp/pypdf/pypdf/filters.py", line 183, in _decode_png_prediction
    raise PdfReadError("Image data is not rectangular")
pypdf.errors.PdfReadError: Image data is not rectangular
```


For reference, the check linked above currently reads:

    if len(data) % rowlength != 0:
        raise PdfReadError("Image data is not rectangular")
