Description
Reading PdfReader.outline crashes with a KeyError.
Environment
macOS-12.7.1-x86_64-i386-64bit
pypdf==5.4.0, crypt_provider=('cryptography', '42.0.8'), PIL=10.3.0
Code + PDF
from pypdf import PdfReader
reader = PdfReader('nek_en_iec_60034-2-2/NEK_EN_IEC_60034-2-2.pdf')
print(reader.outline)
Unfortunately, I don't know yet whether I can share the input PDF. I'm checking, and if I can't share it I may create a test PDF instead, although I'd rather avoid that since it would probably be a lot of work.
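Building a test PDF by hand may be less work than expected. The following is a minimal sketch, using only the Python standard library, that writes a one-page PDF whose single outline item has a /GoTo action with no /D entry. The function name build_pdf_with_broken_outline is hypothetical, and it is an assumption that this file exercises the same code path as the real one. Unlike the bookmark in the real file, this item has no children (/First, /Last, /Count).

```python
def build_pdf_with_broken_outline() -> bytes:
    """Hypothetical helper: return a tiny PDF whose outline item lacks /D."""
    # Object bodies; object numbers are 1..N in list order.
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R /Outlines 4 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 200 200] /Resources << >> >>",
        b"<< /Type /Outlines /First 5 0 R /Last 5 0 R /Count 1 >>",
        # The GoTo action deliberately omits the required /D entry.
        b"<< /Title (Broken bookmark) /Parent 4 0 R /A << /S /GoTo >> >>",
    ]
    out = bytearray(b"%PDF-1.7\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of "N 0 obj" for the xref table
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"

    xref_offset = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # every xref entry is exactly 20 bytes
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_offset
    return bytes(out)
```

Writing the returned bytes to a file and then evaluating PdfReader(path).outline should, if the assumption above holds, raise the same KeyError: '/D' on pypdf 5.4.0.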
Traceback
Traceback (most recent call last):
  File "/Users/larsga/data/realta/tmp/SD-2759/tst.py", line 5, in <module>
    print(reader.outline)
          ^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/pypdf/_doc_common.py", line 848, in outline
    return self._get_outline()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/pypdf/_doc_common.py", line 874, in _get_outline
    outline_obj = self._build_outline_item(node)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/pypdf/_doc_common.py", line 994, in _build_outline_item
    dest = action[GoToActionArguments.D]
           ~~~~~~^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/pypdf/generic/_data_structures.py", line 478, in __getitem__
    return dict.__getitem__(self, key).get_object()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
KeyError: '/D'
Possible fix
The line that fails is line 994 in _doc_common.py, the last line here:
if "/A" in node:
    # Action, PDFv1.7 Section 12.6 (only type GoTo supported)
    action = cast(DictionaryObject, node["/A"])
    action_type = cast(NameObject, action[GoToActionArguments.S])
    if action_type == "/GoTo":
        dest = action[GoToActionArguments.D]
It fails because the action dictionary has no /D entry. Further down, the same function contains the following:
elif dest is None:
    # outline item not required to have destination or action
    # PDFv1.7 Table 153
    outline_item = self._build_destination(title, dest)
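So the code already copes with a missing destination, but the PDF 1.7 specification (p. 418, section 12.6.4.2, Table 199) is very clear that /D is required in a GoTo action. It therefore looks to me like the code should raise an error in strict mode and otherwise carry on without a destination: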
if "/A" in node:
    # Action, PDFv1.7 Section 12.6 (only type GoTo supported)
    action = cast(DictionaryObject, node["/A"])
    action_type = cast(NameObject, action[GoToActionArguments.S])
    if action_type == "/GoTo":
        # /D is required (PDFv1.7 Table 199); when it is missing and we are
        # not in strict mode, dest stays None, like an item with no destination
        if GoToActionArguments.D in action:
            dest = action[GoToActionArguments.D]
        elif self.strict:
            raise PdfReadError(f"Outline Action Missing /D attribute: {node!r}")
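To illustrate the proposed behaviour outside pypdf, here is a minimal, self-contained sketch of just the /GoTo branch, operating on plain dicts. PdfReadError is a local stand-in for pypdf.errors.PdfReadError, and goto_destination is a hypothetical helper; it deliberately ignores the /Dest branch and everything that happens after the destination lookup.

```python
from typing import Any, Optional


class PdfReadError(Exception):
    """Stand-in (assumption) for pypdf.errors.PdfReadError."""


def goto_destination(node: dict, strict: bool) -> Optional[Any]:
    """Hypothetical helper: return /D of the node's /GoTo action, or None."""
    action = node.get("/A")
    if action is None or action.get("/S") != "/GoTo":
        # No action, or an action type other than GoTo: no destination here.
        return None
    if "/D" in action:
        return action["/D"]
    if strict:
        # /D is required for GoTo actions, so strict mode rejects the node.
        raise PdfReadError(f"Outline Action Missing /D attribute: {node!r}")
    # Non-strict mode: treat the item as having no destination.
    return None
```

For a node shaped like {'/A': {'/S': '/GoTo'}, '/Title': 'NEK EN IEC 60034-2-2:2024'}, the helper returns None when strict is False and raises PdfReadError when strict is True.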
I'm happy to make a PR with this change, if desired.
With this change applied to the pypdf source, the test program prints an outline that looks the same as the one Adobe Acrobat shows. Preview on macOS shows no outline at all (probably because the outline is invalid).
If I set reader.strict = True, I get this error instead:
{'/A': {'/S': '/GoTo'}, '/C': [0, 0, 0], '/Count': 14, '/First': IndirectObject(216, 0, 4444283264), '/Last': IndirectObject(217, 0, 4444283264), '/Parent': IndirectObject(214, 0, 4444283264), '/Title': 'NEK EN IEC 60034-2-2:2024'}
Adobe Acrobat does not display this bookmark, or any bookmarks after it.
If I had a test file I could share, I could add two tests: one showing that the file can be read without problems, and one showing that reading it fails with PdfReadError when strict mode is enabled. I'm waiting to see what I can do about a test file.