Return read data instead of throwing "Unexpected EOD in RunLengthDecode/ASCIIHexDecode"? #2303
Closed
Labels
Has MCVE: A minimal, complete and verifiable example helps a lot to debug / understand feature requests
workflow-images: From a users perspective, image handling is the affected feature/workflow
Description
I am running into "Unexpected EOD in RunLengthDecode" errors when extracting images from some PDF files. Is there a reason to raise a hard exception there instead of calling logger_warning and returning the data read so far?
In one specific case, PDFBox Debugger, MuPDF and Evince all extract the image correctly. Replacing the exception with a return value in pypdf.filters.RunLengthDecode.decode produces an image whose only apparent defect is wrong colors.
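To illustrate the behavior being asked for, here is a minimal standalone sketch of a lenient run-length decoder. It is not pypdf's actual implementation: the function name run_length_decode_lenient is hypothetical, and the standard logging module stands in for pypdf's logger_warning. It follows the RunLengthDecode rules from the PDF specification (a length byte of 0 to 127 copies the next length + 1 bytes, 129 to 255 repeats the next byte 257 - length times, 128 marks EOD) and returns whatever was decoded when the EOD marker is missing.

```python
import logging

logger = logging.getLogger(__name__)


def run_length_decode_lenient(data: bytes) -> bytes:
    """Decode RunLengthDecode data; hypothetical helper, not part of pypdf.

    If the stream ends without the EOD marker (128), log a warning and
    return the bytes decoded so far instead of raising an exception.
    """
    out = bytearray()
    i = 0
    while i < len(data):
        length = data[i]
        i += 1
        if length == 128:
            # Regular end-of-data marker: stop here.
            return bytes(out)
        if length < 128:
            # Literal run: copy the next length + 1 bytes (or what is left).
            count = length + 1
            out += data[i : i + count]
            i += count
        else:
            # Repeated run: the next byte is repeated 257 - length times.
            if i >= len(data):
                break  # the byte to repeat is missing
            out += data[i : i + 1] * (257 - length)
            i += 1
    logger.warning(
        "Unexpected EOD in RunLengthDecode; returning %d decoded bytes", len(out)
    )
    return bytes(out)
```

With this sketch, a stream that simply lacks its final EOD byte decodes to the same bytes as a well-formed one, and a truncated stream yields a shorter result. Whether the recovered bytes form a usable image still depends on the rest of the image dictionary.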
Environment
$ python -m platform
Linux-5.14.21-150400.24.97-default-x86_64-with-glibc2.31
$ python -c "import pypdf;print(pypdf._debug_versions)"
pypdf==3.17.1, crypt_provider=('pycryptodome', '3.18.0'), PIL=10.0.0
Code + PDF
This is a minimal, complete example that shows the issue:
from pypdf import PdfReader

for index, page in enumerate(PdfReader('out1.pdf').pages):
    print(index, page)
    for key in page.images.keys():
        print(key)
        print(page.images[key].indirect_reference)
Traceback
This is the complete traceback I see:
Traceback (most recent call last):
  File "/home/stefan/temp/run.py", line 7, in <module>
    print(page.images[key].indirect_reference)
  File "/home/stefan/temp/venv/lib/python3.9/site-packages/pypdf/_page.py", line 2704, in __getitem__
    return self.get_function(index)
  File "/home/stefan/temp/venv/lib/python3.9/site-packages/pypdf/_page.py", line 547, in _get_image
    imgd = _xobj_to_image(cast(DictionaryObject, xobjs[id]))
  File "/home/stefan/temp/venv/lib/python3.9/site-packages/pypdf/filters.py", line 747, in _xobj_to_image
    data = x_object_obj.get_data() # type: ignore
  File "/home/stefan/pdf/venv/lib/python3.9/site-packages/pypdf/generic/_data_structures.py", line 969, in get_data
    decoded.set_data(b_(decode_stream_data(self)))
  File "/home/stefan/pdf/venv/lib/python3.9/site-packages/pypdf/filters.py", line 686, in decode_stream_data
    data = RunLengthDecode.decode(data)
  File "/home/stefan/temp/venv/lib/python3.9/site-packages/pypdf/filters.py", line 343, in decode
    raise PdfStreamError("Unexpected EOD in RunLengthDecode")
pypdf.errors.PdfStreamError: Unexpected EOD in RunLengthDecode