Cannot interpret colorspace NullObject #2061

@stefan6419846

I am trying to extract the images from a PDF file, but pypdf raises an error for xref object 16, while MuPDF is able to extract that image.

Environment

Which environment were you using when you encountered the problem?

$ python -m platform
Linux-5.14.21-150400.24.69-default-x86_64-with-glibc2.3.4

$ python -c "import pypdf;print(pypdf.__version__)"
3.14.0

Code + PDF

This is a minimal, complete example that shows the issue:

from pypdf import PdfReader

reader = PdfReader('ARK_HV_Batteriesystem_Datenblatt_DE_202304.pdf')
for page in reader.pages:
    # Images are exposed per page (page.images), not on the reader itself.
    for key in page.images.keys():
        print(key)
        print(page.images[key])

A publicly available PDF file which reproduces this issue for me is https://de.growatt.com/upload/file/ARK_HV_Batteriesystem_Datenblatt_DE_202304.pdf

Traceback

This is the complete Traceback I see:

Traceback (most recent call last):
  File "<stdin>", line 4, in <module>
  File "/home/stefan/pdf/pypdf/venv/lib/python3.6/site-packages/pypdf/_page.py", line 2620, in __getitem__
    return self.get_function(index)
  File "/home/stefan/pdf/pypdf/venv/lib/python3.6/site-packages/pypdf/_page.py", line 534, in _get_image
    imgd = _xobj_to_image(cast(DictionaryObject, xobjs[id]))
  File "/home/stefan/pdf/pypdf/venv/lib/python3.6/site-packages/pypdf/filters.py", line 957, in _xobj_to_image
    "",
  File "/home/stefan/pdf/pypdf/venv/lib/python3.6/site-packages/pypdf/filters.py", line 726, in _get_imagemode
    "can not interprete colorspace", color_space
pypdf.errors.PdfReadError: ('can not interprete colorspace', NullObject)
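
Until this is fixed, one possible workaround is to look up each image separately and record the failures, so the remaining images are still extracted. This is only a minimal sketch. It assumes that only the `page.images[key]` lookup raises (as in the traceback above) and that `page.images.keys()` itself succeeds. `collect_images` and `report` are hypothetical helper names, not part of pypdf.

```python
from typing import Any, List, Tuple


def collect_images(images: Any) -> Tuple[List[Tuple[Any, Any]], List[Tuple[Any, Exception]]]:
    """Look up every key of `images` separately so one failure does not abort the rest."""
    extracted: List[Tuple[Any, Any]] = []
    failed: List[Tuple[Any, Exception]] = []
    for key in images.keys():
        try:
            extracted.append((key, images[key]))
        except Exception as exc:  # in this report: pypdf.errors.PdfReadError
            failed.append((key, exc))
    return extracted, failed


def report(pdf_path: str) -> None:
    """Print how many images per page could be decoded (requires pypdf)."""
    from pypdf import PdfReader  # imported lazily; collect_images has no dependency

    reader = PdfReader(pdf_path)
    for number, page in enumerate(reader.pages, start=1):
        extracted, failed = collect_images(page.images)
        print(f"page {number}: {len(extracted)} decoded, {len(failed)} failed")
        for key, exc in failed:
            print(f"  {key}: {exc!r}")
```

Calling `report('ARK_HV_Batteriesystem_Datenblatt_DE_202304.pdf')` should then list the image with the `NullObject` colorspace as failed instead of aborting the whole run. Results are kept in lists rather than a dict because image keys are not guaranteed to be hashable.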

Labels: is-bug (From a user's perspective, this is a bug: a violation of the expected behavior with a compliant PDF)
