Return read data instead of throwing "Unexpected EOD in RunLengthDecode/ASCIIHexDecode"? #2303

@stefan6419846

Description

I am currently running into "Unexpected EOD in RunLengthDecode" errors when extracting images from some PDF files. Is there a reason to raise a hard exception there instead of calling logger_warning and returning the data read so far?

In one specific case, PDFBox Debugger, MuPDF and Evince are all able to extract the image correctly. Replacing the exception with a return value in pypdf.filters.RunLengthDecode.decode also yields an image, although its colors appear to be wrong.
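
To make the proposal concrete, here is a minimal sketch of what a lenient decoder could look like. It is not pypdf's actual code: the function name run_length_decode_lenient is hypothetical, the standard logging module stands in for logger_warning, and the byte rules follow my reading of the RunLengthDecode filter in the PDF specification (length 0 to 127 copies the next length + 1 bytes, 129 to 255 repeats the next byte 257 - length times, 128 marks end of data).

```python
import logging

logger = logging.getLogger(__name__)


def run_length_decode_lenient(data: bytes) -> bytes:
    """Decode RunLengthDecode data; on a missing EOD marker, warn and
    return whatever was decoded so far instead of raising."""
    out = bytearray()
    index = 0
    while index < len(data):
        length = data[index]
        index += 1
        if length == 128:
            # Regular end-of-data marker: decoding finished cleanly.
            return bytes(out)
        if length < 128:
            # Literal run: copy the next length + 1 bytes as they are.
            count = length + 1
            out += data[index:index + count]  # may be short if truncated
            index += count
        else:
            # Repeated run: the next byte is repeated 257 - length times.
            if index >= len(data):
                break  # the byte to repeat is missing
            out += data[index:index + 1] * (257 - length)
            index += 1
    # Input ended without the 128 marker: warn instead of raising.
    logger.warning(
        "Unexpected EOD in RunLengthDecode; returning %d decoded bytes",
        len(out),
    )
    return bytes(out)
```

With this behavior a stream that merely lacks the trailing EOD byte decodes to the same bytes as a well-formed one, while a truly truncated stream yields a partial result that the caller still has to validate.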

Environment

$ python -m platform
Linux-5.14.21-150400.24.97-default-x86_64-with-glibc2.31

$ python -c "import pypdf;print(pypdf._debug_versions)"
pypdf==3.17.1, crypt_provider=('pycryptodome', '3.18.0'), PIL=10.0.0

Code + PDF

This is a minimal, complete example that shows the issue:

from pypdf import PdfReader

for index, page in enumerate(PdfReader('out1.pdf').pages):
    print(index, page)
    for key in page.images.keys():
        print(key)
        print(page.images[key].indirect_reference)

out1.pdf

Traceback

This is the complete traceback I see:

Traceback (most recent call last):
  File "/home/stefan/temp/run.py", line 7, in <module>
    print(page.images[key].indirect_reference)
  File "/home/stefan/temp/venv/lib/python3.9/site-packages/pypdf/_page.py", line 2704, in __getitem__
    return self.get_function(index)
  File "/home/stefan/temp/venv/lib/python3.9/site-packages/pypdf/_page.py", line 547, in _get_image
    imgd = _xobj_to_image(cast(DictionaryObject, xobjs[id]))
  File "/home/stefan/temp/venv/lib/python3.9/site-packages/pypdf/filters.py", line 747, in _xobj_to_image
    data = x_object_obj.get_data()  # type: ignore
  File "/home/stefan/pdf/venv/lib/python3.9/site-packages/pypdf/generic/_data_structures.py", line 969, in get_data
    decoded.set_data(b_(decode_stream_data(self)))
  File "/home/stefan/pdf/venv/lib/python3.9/site-packages/pypdf/filters.py", line 686, in decode_stream_data
    data = RunLengthDecode.decode(data)
  File "/home/stefan/temp/venv/lib/python3.9/site-packages/pypdf/filters.py", line 343, in decode
    raise PdfStreamError("Unexpected EOD in RunLengthDecode")
pypdf.errors.PdfStreamError: Unexpected EOD in RunLengthDecode
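
Until the behavior changes, a possible caller-side workaround is to catch the error per image so that one broken stream does not abort the whole extraction. The helper below is only a sketch: collect_readable_images is a hypothetical name, and it relies solely on the keys() and item access already used in the example above. The pypdf usage is left as comments, since it needs the attached out1.pdf.

```python
def collect_readable_images(images, errors=(Exception,)):
    """Split a mapping-like object into items that load and keys that fail.

    `images` only needs keys() and item access, which is how page.images
    is used in the example above. `errors` lists the exception types that
    should be tolerated.
    """
    readable = {}
    failed = {}
    for key in images.keys():
        try:
            readable[key] = images[key]  # item access triggers decoding
        except errors as exc:
            failed[key] = exc  # remember why this image could not be read
    return readable, failed


# Usage with pypdf (assumes out1.pdf from this issue is available):
#
# from pypdf import PdfReader
# from pypdf.errors import PdfStreamError
#
# for page in PdfReader("out1.pdf").pages:
#     readable, failed = collect_readable_images(
#         page.images, errors=(PdfStreamError,)
#     )
#     print(list(readable), failed)
```

This only skips the affected images; it does not recover their data, so it is not a replacement for returning the partially decoded stream.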

Labels: Has MCVE, workflow-images
