Broken image extraction if no filters and CMYK colorspace #2522
Closed
Labels
is-bug: From a user's perspective, this is a bug - a violation of the expected behavior with a compliant PDF
workflow-images: From a user's perspective, image handling is the affected feature/workflow
Description
Image extraction is broken when isinstance(lfilters, NullObject) and mode == "CMYK", in lines 818 to 826 at commit 0106904:
else:
    if mode == "":
        raise PdfReadError(f"ColorSpace field not found in {x_object_obj}")
    img, image_format, extension, invert_color = (
        Image.frombytes(mode, size, data),
        "PNG",
        ".png",
        False,
    )
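The branch above always picks PNG, which Pillow cannot write for CMYK data. One possible way around this, shown here only as a minimal sketch and not as pypdf's actual fix, is to choose a container that supports CMYK (TIFF in this example) when the mode is CMYK. The helper name encode_raw_image is hypothetical and exists only for illustration.

```python
from io import BytesIO

from PIL import Image


def encode_raw_image(mode, size, data):
    """Hypothetical helper: encode unfiltered pixel data, avoiding PNG for CMYK.

    Assumption: TIFF is an acceptable output container for CMYK images.
    """
    img = Image.frombytes(mode, size, data)
    if mode == "CMYK":
        # PNG has no CMYK mode, so fall back to a format that does.
        image_format, extension = "TIFF", ".tif"
    else:
        image_format, extension = "PNG", ".png"
    buffer = BytesIO()
    img.save(buffer, format=image_format)
    return buffer.getvalue(), extension


# A 2x2 CMYK image needs 4 bytes per pixel, so 16 bytes in total.
encoded, ext = encode_raw_image("CMYK", (2, 2), bytes(16))
print(ext, len(encoded) > 0)
```

Converting to RGB with img.convert("RGB") before saving as PNG would be another option, at the cost of changing the color data.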
Environment
Which environment were you using when you encountered the problem?
$ python -m platform
Linux-5.14.21-150400.24.100-default-x86_64-with-glibc2.31
$ python -c "import pypdf;print(pypdf._debug_versions)"
pypdf==4.1.0, crypt_provider=('local_crypt_fallback', '0.0.0'), PIL=10.2.0
Code + PDF
This is a minimal, complete example that shows the issue:
from pypdf import PdfReader

reader = PdfReader('file.pdf')
for page in reader.pages:
    print(page)
    for key in page.images.keys():
        print(key)
        print(page.images[key])

An anonymized version of the file is out3.pdf.
Traceback
This is the complete traceback I see:
Traceback (most recent call last):
  File "/home/stefan/tmp/venv/lib64/python3.9/site-packages/PIL/PngImagePlugin.py", line 1279, in _save
    rawmode, mode = _OUTMODES[mode]
KeyError: 'CMYK'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/stefan/tmp/venv/lib/python3.9/site-packages/pypdf/filters.py", line 876, in _xobj_to_image
    img.save(img_byte_arr, format=image_format)
  File "/home/stefan/tmp/venv/lib64/python3.9/site-packages/PIL/Image.py", line 2439, in save
    save_handler(self, fp, filename)
  File "/home/stefan/tmp/venv/lib64/python3.9/site-packages/PIL/PngImagePlugin.py", line 1282, in _save
    raise OSError(msg) from e
OSError: cannot write mode CMYK as PNG

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/stefan/tmp/venv/lib64/python3.9/site-packages/PIL/PngImagePlugin.py", line 1279, in _save
    rawmode, mode = _OUTMODES[mode]
KeyError: 'CMYK'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/stefan/tmp/run.py", line 9, in <module>
    print(page.images[key])
  File "/home/stefan/tmp/venv/lib/python3.9/site-packages/pypdf/_page.py", line 2420, in __getitem__
    return self.get_function(index)
  File "/home/stefan/tmp/venv/lib/python3.9/site-packages/pypdf/_page.py", line 501, in _get_image
    imgd = _xobj_to_image(cast(DictionaryObject, xobjs[id]))
  File "/home/stefan/tmp/venv/lib/python3.9/site-packages/pypdf/filters.py", line 880, in _xobj_to_image
    img.save(img_byte_arr, format=image_format)
  File "/home/stefan/tmp/venv/lib64/python3.9/site-packages/PIL/Image.py", line 2439, in save
    save_handler(self, fp, filename)
  File "/home/stefan/tmp/venv/lib64/python3.9/site-packages/PIL/PngImagePlugin.py", line 1282, in _save
    raise OSError(msg) from e
OSError: cannot write mode CMYK as PNG
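The innermost error in the traceback can likely be reproduced with Pillow alone, without any PDF involved. The following sketch builds a tiny CMYK image from raw bytes and tries to save it as PNG; the function name try_save_cmyk_as_png is made up for this example, and the exact error text may differ between Pillow versions.

```python
from io import BytesIO

from PIL import Image


def try_save_cmyk_as_png():
    """Return the error message raised when saving CMYK as PNG, or None."""
    # A 1x1 CMYK image needs exactly 4 bytes of pixel data.
    img = Image.frombytes("CMYK", (1, 1), bytes(4))
    try:
        img.save(BytesIO(), format="PNG")
    except OSError as exc:
        return str(exc)
    return None


print(try_save_cmyk_as_png())
```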