"/Pages" might be undefined #2516
Closed
Labels
PdfReader: The PdfReader component is affected
key-error: Could be a bug, but also a robustness issue
Description
I am trying to get the pages of a PDF file whose catalog does not have the /Pages object. Instead, the root object is itself the page tree node:
self.root_object == {'/Type': '/Pages', '/Kids': [IndirectObject(3, 0, 140441093721392)], '/Count': 1}
and
catalog["/Kids"][0].get_object().get_data() == b'q 595.20 0 0 840.96 0.00 0.00 cm 1 g /Im1 Do Q\r'
This does not seem to be properly supported at the moment.
Environment
Which environment were you using when you encountered the problem?
$ python -m platform
Linux-5.14.21-150400.24.100-default-x86_64-with-glibc2.3.4
$ python -c "import pypdf;print(pypdf._debug_versions)"
pypdf==latest Git main, crypt_provider=('local_crypt_fallback', '0.0.0'), PIL=8.4.0
Code + PDF
This is a minimal, complete example that shows the issue:
from pypdf import PdfReader, PdfWriter

reader = PdfReader('file.pdf')
for page in reader.pages:
    print(page)
I cannot disclose the PDF file here because the referenced image holds sensitive information, but I hope it is possible to generate a basic example PDF file from the values provided above.
Traceback
This is the complete traceback I see:
Traceback (most recent call last):
  File "run.py", line 5, in <module>
    for page in reader.pages:
  File "/home/stefan/temp/pypdf/pypdf/_page.py", line 2275, in __iter__
    for i in range(len(self)):
  File "/home/stefan/pypdf/pypdf/_page.py", line 2206, in __len__
    return self.length_function()
  File "/home/stefan/pypdf/pypdf/_reader.py", line 463, in _get_num_pages
    self._flatten()
  File "/home/stefan/pypdf/pypdf/_reader.py", line 1146, in _flatten
    pages = cast(DictionaryObject, catalog["/Pages"].get_object())
  File "/home/stefan/pypdf/pypdf/generic/_data_structures.py", line 385, in __getitem__
    return dict.__getitem__(self, key).get_object()
KeyError: '/Pages'
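The failing line assumes that the root object always carries a /Pages entry. One possible way to tolerate the structure described above is sketched below with plain dicts standing in for pypdf's dictionary objects. `find_page_tree_root` is a hypothetical helper for illustration, not part of pypdf, and the fallback rule is an assumption about what a robust reader might do.

```python
from typing import Any, Dict


def find_page_tree_root(catalog: Dict[str, Any]) -> Dict[str, Any]:
    """Return the page tree root, tolerating a root object without /Pages."""
    pages = catalog.get("/Pages")
    if pages is not None:
        # Regular case: the catalog references the page tree.
        return pages
    if catalog.get("/Type") == "/Pages":
        # Case from this report: the root object is the page tree node itself.
        return catalog
    raise ValueError("Root object has no /Pages entry and is not a /Pages node")


# Plain-dict stand-ins for the two layouts.
regular = {"/Type": "/Catalog", "/Pages": {"/Type": "/Pages", "/Kids": [], "/Count": 0}}
reported = {"/Type": "/Pages", "/Kids": ["<IndirectObject 3 0>"], "/Count": 1}

print(find_page_tree_root(regular)["/Count"])
print(find_page_tree_root(reported)["/Count"])
```

A root that matches neither layout still raises, but with a descriptive error instead of a bare KeyError.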