FlateDecode will fail if columns are an IndirectObject #2158

@stefan6419846

Description

Retrieving the images from a PDF file can fail in the FlateDecode filter when the decode parameters (/DecodeParms) store /Columns as an indirect reference instead of a direct number, for example:

{'/Colors': 3, '/Columns': IndirectObject(33, 0, 140029965619504), '/Predictor': 15}
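The PDF format lets an entry such as /Columns point to another object, so the decoder has to resolve that reference before it does any arithmetic with it. The sketch below shows that resolution step in isolation. It relies on two hypothetical stand-ins, Ref for an indirect reference and a plain dict for the object table. These are not pypdf's real IndirectObject or reader types. The value 100 stored in object 33 is also made up for illustration, because the issue does not say what that object contains.

```python
from dataclasses import dataclass
from typing import Any, Dict, Tuple


# Hypothetical stand-in for an indirect reference like IndirectObject(33, 0, ...).
@dataclass(frozen=True)
class Ref:
    idnum: int
    generation: int


# Hypothetical object table standing in for the PDF's cross-reference lookup.
ObjectTable = Dict[Tuple[int, int], Any]


def resolve(value: Any, objects: ObjectTable) -> Any:
    """Follow indirect references until a direct value is reached."""
    seen = set()
    while isinstance(value, Ref):
        key = (value.idnum, value.generation)
        if key in seen:
            # A reference chain that loops back on itself can never resolve.
            raise ValueError(f"reference cycle at {key}")
        seen.add(key)
        value = objects[key]
    return value


def resolve_decode_parms(parms: Dict[str, Any], objects: ObjectTable) -> Dict[str, Any]:
    """Return a copy of the decode parameters with every top-level value resolved."""
    return {name: resolve(value, objects) for name, value in parms.items()}


objects = {(33, 0): 100}  # hypothetical: object 33 holds the column count
parms = {"/Colors": 3, "/Columns": Ref(33, 0), "/Predictor": 15}
print(resolve_decode_parms(parms, objects))
# {'/Colors': 3, '/Columns': 100, '/Predictor': 15}
```

Resolving every parameter up front keeps the later predictor code free of type checks. The cycle guard prevents an endless loop on malformed files.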

Environment

Which environment were you using when you encountered the problem?

$ python -m platform
Linux-5.14.21-150400.24.81-default-x86_64-with-glibc2.31

$ python -c "import pypdf;print(pypdf._debug_versions)"
pypdf==3.15.5, crypt_provider=('pycryptodome', '3.18.0'), PIL=10.0.0

Code + PDF

This is a minimal, complete example that shows the issue:

from pypdf import PdfReader


PATH = 'file.pdf'
for index, page in enumerate(PdfReader(PATH).pages):
    print(index, page)
    for key in page.images.keys():
        print(key)
        print(page.images[key].indirect_reference)
        page.images[key].image.convert('RGB').save(key[1:] + '.png')

Traceback

This is the complete traceback I see:

Traceback (most recent call last):
  File "/home/stefan/temp/run.py", line 9, in <module>
    print(page.images[key].indirect_reference)
  File "/home/stefan/temp/venv/lib/python3.9/site-packages/pypdf/_page.py", line 2637, in __getitem__
    return self.get_function(index)
  File "/home/stefan/temp/venv/lib/python3.9/site-packages/pypdf/_page.py", line 544, in _get_image
    imgd = _xobj_to_image(cast(DictionaryObject, xobjs[id]))
  File "/home/stefan/temp/venv/lib/python3.9/site-packages/pypdf/filters.py", line 988, in _xobj_to_image
    data = x_object_obj.get_data()  # type: ignore
  File "/home/stefan/temp/venv/lib/python3.9/site-packages/pypdf/generic/_data_structures.py", line 953, in get_data
    decoded.set_data(b_(decode_stream_data(self)))
  File "/home/stefan/temp/venv/lib/python3.9/site-packages/pypdf/filters.py", line 696, in decode_stream_data
    data = FlateDecode.decode(data, params)
  File "/home/stefan/temp/venv/lib/python3.9/site-packages/pypdf/filters.py", line 166, in decode
    math.ceil(columns * colors * bits_per_component / 8) + 1
TypeError: unsupported operand type(s) for *: 'IndirectObject' and 'NumberObject'
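The failing expression in FlateDecode.decode computes the length of one PNG-predicted row: ceil(columns * colors * bits_per_component / 8) data bytes plus one filter-type byte. That multiplication only works once /Columns is a number. The sketch below shows the same computation with each parameter resolved first. The resolve hook is a hypothetical parameter, not part of pypdf. The defaults (/Columns 1, /Colors 1, /BitsPerComponent 8) follow the PDF specification's FlateDecode parameter table.

```python
import math
from typing import Any, Callable, Mapping


def png_row_length(
    parms: Mapping[str, Any],
    resolve: Callable[[Any], Any] = lambda v: v,  # hypothetical hook for indirect refs
) -> int:
    """Bytes per predicted row: ceil(columns * colors * bpc / 8) plus one filter byte."""
    # Resolve each value before doing arithmetic, so an indirect reference
    # never reaches the multiplication below.
    columns = int(resolve(parms.get("/Columns", 1)))
    colors = int(resolve(parms.get("/Colors", 1)))
    bits_per_component = int(resolve(parms.get("/BitsPerComponent", 8)))
    return math.ceil(columns * colors * bits_per_component / 8) + 1


# Direct values: 100 columns of 8-bit RGB -> 300 data bytes + 1 filter byte.
print(png_row_length({"/Colors": 3, "/Columns": 100, "/Predictor": 15}))  # 301

# A stand-in "reference" resolved through the hook (hypothetical lookup table).
table = {"ref-33": 100}
print(png_row_length({"/Columns": "ref-33"}, resolve=lambda v: table.get(v, v)))  # 101
```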