Image loading errors might be dropped silently #3220

@stefan6419846

Description

I tried to load a rather large image (16708x12811 pixels). Without any warning or error, image.image turned out to be None.

Environment

Which environment were you using when you encountered the problem?

$ python -m platform
Linux-6.4.0-150600.23.42-default-x86_64-with-glibc2.2.5

$ python -c "import pypdf;print(pypdf._debug_versions)"
pypdf==5.4.0, crypt_provider=('cryptography', '44.0.0'), PIL=11.1.0

Code + PDF

This is a minimal, complete example that shows the issue:

from pypdf import PdfReader

reader = PdfReader('file.pdf')
for page in reader.pages:
    print(page)
    for name, image in page.images.items():
        print(name)
        print(image.image.width)

I cannot share the PDF file here for privacy reasons.

Traceback

This is the complete traceback I see:

AttributeError: 'NoneType' object has no attribute 'width'

Printing the exception in

pypdf/pypdf/filters.py

Lines 841 to 844 in 24b81eb

try:  # temporary try/except until other fixes of images
    img = Image.open(BytesIO(data))
except Exception:
    img = None  # type: ignore
shows why:

Image size (214046188 pixels) exceeds limit of 178956970 pixels, could be decompression bomb DOS attack.
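
For reference, the numbers in this message can be reproduced without the PDF. The sketch below assumes Pillow's default Image.MAX_IMAGE_PIXELS and its documented behaviour of warning above that value and raising an error above twice that value. The function name classify_image_size is hypothetical and only mirrors that check for illustration; it is not part of Pillow or pypdf.

```python
# Pillow's default limit, as computed in PIL.Image:
# MAX_IMAGE_PIXELS = int(1024 * 1024 * 1024 // 4 // 3)
MAX_IMAGE_PIXELS = 1024 * 1024 * 1024 // 4 // 3


def classify_image_size(width, height, max_pixels=MAX_IMAGE_PIXELS):
    """Hypothetical helper mirroring Pillow's decompression bomb check."""
    pixels = width * height
    if pixels > 2 * max_pixels:
        # Pillow raises DecompressionBombError in this range.
        return "error"
    if pixels > max_pixels:
        # Pillow only emits DecompressionBombWarning in this range.
        return "warning"
    return "ok"


# The image from this report: 16708 * 12811 = 214046188 pixels,
# which is above the error threshold of 2 * 89478485 = 178956970.
print(16708 * 12811, 2 * MAX_IMAGE_PIXELS, classify_image_size(16708, 12811))
```

For trusted input, raising PIL.Image.MAX_IMAGE_PIXELS (or setting it to None to disable the check) before reading the images should avoid the error, at the cost of losing the decompression bomb protection.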

Because the exception is swallowed, the actual reason for the failure is lost. We should at least issue a logger_warning here.
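
A minimal sketch of what such a warning could look like, using only the standard logging module. The helper name open_image_or_warn and the opener parameter are hypothetical and exist only to keep the example self-contained; in pypdf the opener would correspond to Image.open, and the actual fix might use pypdf's own logger_warning helper instead.

```python
import logging
from io import BytesIO

logger = logging.getLogger(__name__)


def open_image_or_warn(data, opener):
    """Hypothetical helper: try to open image bytes, warn on failure.

    `opener` stands in for PIL.Image.open so the sketch runs without Pillow.
    """
    try:
        return opener(BytesIO(data))
    except Exception as exc:
        # Keep the current fallback (None), but record why loading failed.
        logger.warning("Failed to load image: %s", exc)
        return None
```

With this shape, the caller still receives None for an unreadable image, but the decompression bomb message from Pillow would show up in the log instead of being dropped.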


    Labels

    is-bug: From a users perspective, this is a bug - a violation of the expected behavior with a compliant PDF
    workflow-images: From a users perspective, image handling is the affected feature/workflow
