Description
I am using the code below (from https://stackoverflow.com/questions/54303318/read-all-bookmarks-from-a-pdf-document-and-create-a-dictionary-with-pagenumber-a) as a starting point, and it crashes on several PDFs (see an example here: https://easyupload.io/7fsipz).
Apparently the PDF itself has some structural errors, but pypdf raises an exception instead of ignoring them.
The output:
ValueError: not enough values to unpack (expected 3, got 1)
C:\Users\XXXXX\PycharmProjects\pythonProject\venv\Scripts\python.exe "C:\Google Drive\python\projects\Get bookmarks.py"
Traceback (most recent call last):
  File "C:\Google Drive\python\projects\Get bookmarks.py", line 24, in <module>
    bms = bookmark_dict(reader.outline, use_labels=False)
  File "C:\Users\XXXXX\PycharmProjects\pythonProject\venv\lib\site-packages\pypdf\_reader.py", line 844, in outline
    return self._get_outline()
  File "C:\Users\XXXXX\PycharmProjects\pythonProject\venv\lib\site-packages\pypdf\_reader.py", line 880, in _get_outline
    outline_obj = self._build_outline_item(node)
  File "C:\Users\XXXXX\PycharmProjects\pythonProject\venv\lib\site-packages\pypdf\_reader.py", line 1054, in _build_outline_item
    outline_item = self._build_destination(title, dest)
  File "C:\Users\XXXXX\PycharmProjects\pythonProject\venv\lib\site-packages\pypdf\_reader.py", line 1018, in _build_destination
    return Destination(title, page, Fit(fit_type=typ, fit_args=array))  # type: ignore
  File "C:\Users\XXXXX\PycharmProjects\pythonProject\venv\lib\site-packages\pypdf\generic\_data_structures.py", line 1495, in __init__
    (
ValueError: not enough values to unpack (expected 3, got 2)

Process finished with exit code 1
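For reference, here is a minimal sketch of what I think is going on, in plain Python and without pypdf. It assumes the bookmark's destination array carries only two arguments where three are expected. The function names and the values `0` and `792` are made up for illustration, and this is not pypdf's actual code.

```python
from typing import List, Optional, Tuple

Coords = Tuple[Optional[float], Optional[float], Optional[float]]


def unpack_xyz(args: List[Optional[float]]) -> Coords:
    """Strict unpacking: raises ValueError unless exactly three values are given."""
    left, top, zoom = args
    return left, top, zoom


def unpack_xyz_lenient(args: List[Optional[float]]) -> Coords:
    """Hypothetical tolerant variant: pad missing values with None, drop extras."""
    padded = (list(args) + [None, None, None])[:3]
    left, top, zoom = padded
    return left, top, zoom


try:
    unpack_xyz([0, 792])  # only two arguments, like the malformed bookmark
except ValueError as exc:
    print(exc)  # not enough values to unpack (expected 3, got 2)

print(unpack_xyz_lenient([0, 792]))  # (0, 792, None)
```

Something along the lines of the lenient variant is the behaviour I was hoping for when a file is slightly malformed.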
The code (taken directly from the thread mentioned above):
from typing import Dict, Union

from pypdf import PdfReader


def bookmark_dict(
    bookmark_list, use_labels: bool = False
) -> Dict[Union[str, int], str]:
    # Maps page index (or page label) to bookmark title; uses the global `reader`.
    result = {}
    for item in bookmark_list:
        if isinstance(item, list):
            # A nested list holds child bookmarks; recurse with the same setting.
            result.update(bookmark_dict(item, use_labels))
        else:
            page_index = reader.get_destination_page_number(item)
            page_label = reader.page_labels[page_index]
            if use_labels:
                result[page_label] = item.title
            else:
                result[page_index] = item.title
    return result


if __name__ == "__main__":
    folder = "x:\\"
    file = "TestPDF.pdf"
    reader = PdfReader(folder + file)
    # The crash happens here, while reader.outline is being built.
    bms = bookmark_dict(reader.outline, use_labels=False)
    for page_nb, title in sorted(bms.items(), key=lambda n: f"{str(n[0]):>5}"):
        print(f"{page_nb:>3}: {title}")
The PDF file that is giving me an error can be found here:
Thanks guys!