
Not able to deal with errors in the bookmark structure #2236

@PAlvesLancs

I am using the code below (taken from https://stackoverflow.com/questions/54303318/read-all-bookmarks-from-a-pdf-document-and-create-a-dictionary-with-pagenumber-a) as a starting point, and it crashes on several PDFs (see an example here: https://easyupload.io/7fsipz).

Apparently the PDF itself has some structural errors in its bookmarks, but pypdf raises an exception instead of ignoring them.
The output:

"ValueError: not enough values to unpack (expected 3, got 1)"
C:\Users\XXXXX\PycharmProjects\pythonProject\venv\Scripts\python.exe "C:\Google Drive\python\projects\Get bookmarks.py"
Traceback (most recent call last):
  File "C:\Google Drive\python\projects\Get bookmarks.py", line 24, in <module>
    bms = bookmark_dict(reader.outline, use_labels=False)
  File "C:\Users\XXXXX\PycharmProjects\pythonProject\venv\lib\site-packages\pypdf\_reader.py", line 844, in outline
    return self._get_outline()
  File "C:\Users\XXXXX\PycharmProjects\pythonProject\venv\lib\site-packages\pypdf\_reader.py", line 880, in _get_outline
    outline_obj = self._build_outline_item(node)
  File "C:\Users\XXXXX\PycharmProjects\pythonProject\venv\lib\site-packages\pypdf\_reader.py", line 1054, in _build_outline_item
    outline_item = self._build_destination(title, dest)
  File "C:\Users\XXXXX\PycharmProjects\pythonProject\venv\lib\site-packages\pypdf\_reader.py", line 1018, in _build_destination
    return Destination(title, page, Fit(fit_type=typ, fit_args=array)) # type: ignore
  File "C:\Users\XXXXX\PycharmProjects\pythonProject\venv\lib\site-packages\pypdf\generic\_data_structures.py", line 1495, in __init__
    (
ValueError: not enough values to unpack (expected 3, got 2)

Process finished with exit code 1
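
The last frame looks like a plain tuple-unpacking failure: three targets, but the destination array in the PDF supplies fewer values. Below is a minimal sketch of that failure mode and of one tolerant alternative. It does not use pypdf and is not pypdf's actual code; the function names `unpack_strict` and `unpack_padded` are hypothetical, and the assumption that the three values are left, top and zoom is mine.

```python
from typing import Optional, Sequence, Tuple

Number = Optional[float]


def unpack_strict(args: Sequence[float]) -> Tuple[float, float, float]:
    # Raises ValueError unless args holds exactly three values.
    left, top, zoom = args
    return left, top, zoom


def unpack_padded(args: Sequence[float]) -> Tuple[Number, Number, Number]:
    # Pad short arrays with None and drop extras, so malformed input
    # yields missing values instead of an exception.
    padded = (list(args) + [None, None, None])[:3]
    return padded[0], padded[1], padded[2]


message = ""
try:
    unpack_strict([0, 792])  # only two values, as in the traceback
except ValueError as exc:
    message = str(exc)

print(message)
print(unpack_padded([0, 792]))
```

Padding is only one possible policy; a library could equally well skip the malformed bookmark and log a warning.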

The code (a direct use of the thread mentioned above).

from typing import Dict, Union
from pypdf import PdfReader

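def bookmark_dict(
        bookmark_list, use_labels: bool = False
) -> Dict[Union[str, int], str]:
    """Map each bookmark's page index (or page label) to its title."""
    result = {}
    for item in bookmark_list:
        if isinstance(item, list):
            # Nested lists hold child bookmarks; pass use_labels down as well.
            result.update(bookmark_dict(item, use_labels))
        else:
            # Relies on the module-level `reader` created in the main block.
            page_index = reader.get_destination_page_number(item)
            if use_labels:
                # Look up the label only when it is actually needed.
                result[reader.page_labels[page_index]] = item.title
            else:
                result[page_index] = item.title
    return result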

if __name__ == "__main__":
    folder = "x:\\"
    file = "TestPDF.pdf"
    reader = PdfReader(folder + file)
    bms = bookmark_dict(reader.outline, use_labels=False)
    for page_nb, title in sorted(bms.items(), key=lambda n: f"{str(n[0]):>5}"):
        print(f"{page_nb:>3}: {title}")
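
As a stopgap on the calling side, the script could catch the exception so that one damaged PDF does not abort a whole batch. This is only a sketch: `safe_outline` is a hypothetical helper, not part of pypdf, and it assumes only that the reader object exposes an `outline` attribute. It gives up on all bookmarks of the affected file rather than skipping just the broken one, so it does not solve the underlying problem.

```python
from typing import Any, List


def safe_outline(reader: Any) -> List[Any]:
    """Return reader.outline, or an empty list if building it raises ValueError."""
    try:
        return reader.outline
    except ValueError as exc:
        # The outline is built lazily, so the error surfaces on attribute access.
        print(f"Could not read outline: {exc}")
        return []
```

With this in place, the main block would call `bookmark_dict(safe_outline(reader), use_labels=False)` instead of reading `reader.outline` directly.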

The PDF file that is giving me an error can be found here:

Thanks guys!
