Attempting to read document outline fails with KeyError #3268

@larsga

Description

Reading PdfReader.outline raises KeyError.

Environment

macOS-12.7.1-x86_64-i386-64bit
pypdf==5.4.0, crypt_provider=('cryptography', '42.0.8'), PIL=10.3.0

Code + PDF

from pypdf import PdfReader

reader = PdfReader('nek_en_iec_60034-2-2/NEK_EN_IEC_60034-2-2.pdf')
print(reader.outline)

Unfortunately, I don't know yet whether I can share the input PDF. I'm checking, and may create a test PDF instead if I am able to. I'd like to avoid building one from scratch, as it would probably be a lot of work.

Traceback

Traceback (most recent call last):
  File "/Users/larsga/data/realta/tmp/SD-2759/tst.py", line 5, in <module>
    print(reader.outline)
          ^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/pypdf/_doc_common.py", line 848, in outline
    return self._get_outline()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/pypdf/_doc_common.py", line 874, in _get_outline
    outline_obj = self._build_outline_item(node)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/pypdf/_doc_common.py", line 994, in _build_outline_item
    dest = action[GoToActionArguments.D]
           ~~~~~~^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/pypdf/generic/_data_structures.py", line 478, in __getitem__
    return dict.__getitem__(self, key).get_object()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
KeyError: '/D'
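
The failure can be illustrated without the PDF. This is only a sketch: it uses plain Python dicts as a stand-in for pypdf's DictionaryObject, copies the /A and /Title entries from the outline node shown further down in this report, and leaves out the indirect references.

```python
# Plain-dict stand-in for the outline node reported in this issue
# (pypdf object types and indirect references are omitted).
node = {
    "/A": {"/S": "/GoTo"},  # GoTo action without a /D entry
    "/Title": "NEK EN IEC 60034-2-2:2024",
}

action = node["/A"]
missing_key = None
try:
    dest = action["/D"]  # same kind of unguarded lookup as in the traceback
except KeyError as exc:
    missing_key = exc.args[0]
    print(f"KeyError: {missing_key!r}")
```

The lookup fails for the same reason as in the traceback: the action dictionary has an /S entry but no /D entry.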

Possible fix

The line that fails is line 994 in _doc_common.py, the last line here:

        if "/A" in node:
            # Action, PDFv1.7 Section 12.6 (only type GoTo supported)
            action = cast(DictionaryObject, node["/A"])
            action_type = cast(NameObject, action[GoToActionArguments.S])
            if action_type == "/GoTo":
                dest = action[GoToActionArguments.D]

It fails because /D isn't in action. Further down we find the following:

        elif dest is None:
            # outline item not required to have destination or action
            # PDFv1.7 Table 153
            outline_item = self._build_destination(title, dest)

So the code further down already handles a missing destination, even though PDF v1.7 (p. 418, section 12.6.4.2, table 199) is very clear that /D is required for a GoTo action. It looks to me like the code should tolerate the missing key and only raise in strict mode:

        if "/A" in node:
            # Action, PDFv1.7 Section 12.6 (only type GoTo supported)
            action = cast(DictionaryObject, node["/A"])
            action_type = cast(NameObject, action[GoToActionArguments.S])
            if action_type == "/GoTo":
                if GoToActionArguments.D in action:
                    dest = action[GoToActionArguments.D]
                elif self.strict:
                    raise PdfReadError(f"Outline Action Missing /D attribute: {node!r}")

I'm happy to make a PR with this change, if desired.
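
To make the intended behaviour concrete, here is a standalone sketch of the same logic. It is not the pypdf code: goto_destination is a hypothetical helper, the nodes are plain dicts, and PdfReadError is a local stand-in for pypdf.errors.PdfReadError so the snippet runs without pypdf installed.

```python
from typing import Any, Optional


class PdfReadError(Exception):
    """Local stand-in for pypdf.errors.PdfReadError."""


def goto_destination(node: dict, strict: bool) -> Optional[Any]:
    """Return the /D entry of a GoTo action, or None when it is absent.

    Hypothetical helper mirroring the proposed change: a missing /D is
    tolerated in non-strict mode and raises PdfReadError in strict mode.
    """
    dest = None
    if "/A" in node:
        action = node["/A"]
        if action["/S"] == "/GoTo":
            if "/D" in action:
                dest = action["/D"]
            elif strict:
                raise PdfReadError(
                    f"Outline Action Missing /D attribute: {node!r}"
                )
    return dest


# Illustrative nodes; "chapter-1" is a made-up destination name.
broken = {"/A": {"/S": "/GoTo"}, "/Title": "NEK EN IEC 60034-2-2:2024"}
valid = {"/A": {"/S": "/GoTo", "/D": "chapter-1"}, "/Title": "Chapter 1"}

print(goto_destination(valid, strict=False))   # chapter-1
print(goto_destination(broken, strict=False))  # None

try:
    goto_destination(broken, strict=True)
except PdfReadError as exc:
    print(f"PdfReadError: {exc}")
```

In non-strict mode the broken node simply yields no destination, which lets the existing "dest is None" branch take over; in strict mode the invalid action is reported instead of being silently accepted.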

When I make this change to the pypdf source code and run the test program, it prints an outline that looks the same as the one Adobe Acrobat shows. macOS Preview does not show any outline at all (probably because the outline is not valid).

If I set reader.strict = True, I get the error for this outline node:

{'/A': {'/S': '/GoTo'}, '/C': [0, 0, 0], '/Count': 14, '/First': IndirectObject(216, 0, 4444283264), '/Last': IndirectObject(217, 0, 4444283264), '/Parent': IndirectObject(214, 0, 4444283264), '/Title': 'NEK EN IEC 60034-2-2:2024'}

Adobe Acrobat does not display this bookmark, or any bookmarks after it.

If I had a test input file I could share, I could add two tests: one demonstrating that the file can be read without problems, and a second showing that reading fails with PdfReadError when strict is enabled. I'm waiting to see what I can do about a test file.

Labels: is-robustness-issue (from a user's perspective, this is about robustness)