Description
UTF-8 named destinations are missing because instead of being TextStringObjects, they are ByteStringObjects and thus silently ignored when constructing named_destinations:
Lines 498 to 501 in b185ab3:
key = cast(str, names[i].get_object())
i += 1
if not isinstance(key, str):
    continue
This seems to be similar to #2929 and #2930.
I don't really care whether such UTF-8 destination names are decoded to str or just kept as bytes. I just want them to not be missing from named_destinations. Currently they are skipped and it's literally impossible to look them up.
In particular, it's fine to be bytes because the /D field also has bytes. The content is irrelevant, but it should be possible to just look up the key without caring. Right now it appears via pypdf as if the PDF is broken, when it's not.
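The skipping behaviour can be illustrated without pypdf. The sketch below is a simplified stand-in, not pypdf's actual code: `collect_strict` and `collect_tolerant` are hypothetical names, and the flat `names` list only imitates the alternating key/value layout of a PDF name tree. It shows how a `str`-only check drops a bytes key, and how accepting both types keeps it available for lookup.

```python
from typing import Any, Dict, List, Union

Key = Union[str, bytes]


def collect_strict(names: List[Any]) -> Dict[str, Any]:
    """Imitates the quoted check: pairs whose key is not a str are dropped."""
    result: Dict[str, Any] = {}
    # names alternates key, value, key, value, ...
    for key, value in zip(names[0::2], names[1::2]):
        if not isinstance(key, str):
            continue  # a bytes key is silently skipped here
        result[key] = value
    return result


def collect_tolerant(names: List[Any]) -> Dict[Key, Any]:
    """Hypothetical alternative: keep bytes keys unchanged instead of skipping."""
    result: Dict[Key, Any] = {}
    for key, value in zip(names[0::2], names[1::2]):
        if not isinstance(key, (str, bytes)):
            continue
        result[key] = value
    return result


# Placeholder destinations; real entries would be destination objects.
names = ["intro", "dest-1", b"cite.dac\xc3\xadk", "dest-2"]
strict = collect_strict(names)
tolerant = collect_tolerant(names)
```

With the tolerant variant, the bytes value taken from `/D` can be used directly as the lookup key, which is all this issue asks for.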
Environment
Which environment were you using when you encountered the problem?
$ python -m platform
Linux-6.8.0-55-generic-x86_64-with-glibc2.39
$ python -c "import pypdf;print(pypdf._debug_versions)"
pypdf==5.3.0, crypt_provider=('local_crypt_fallback', '0.0.0'), PIL=none
Code + PDF
This iterates over hyperlinks in the PDF and looks up their named destinations:
from pypdf import PdfReader
reader = PdfReader("pypdf_issue.pdf")
for page in reader.pages:
    for annotation in page.annotations:
        if annotation["/Subtype"] == "/Link":
            action = annotation["/A"]
            action_type = action["/S"]
            if action_type == "/GoTo":
                named_destination = action["/D"]
                print(reader.named_destinations[named_destination])  # KeyError: b'cite.dac\xc3\xadk2025racerflightweightstaticdata'
The problem occurs, for example, with this PDF: pypdf_issue.pdf.
The link works in PDF viewers.
Traceback
This is the complete traceback I see:
File "/home/simmo/dev/pdfflow/pdfflow/outline.py", line 21, in from_named_destination
named_destination: Destination = reader.named_destinations[name]
~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^
KeyError: b'cite.0@dac\xc3\xadk2025racerflightweightstaticdata'
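As a quick sanity check, the key from the traceback can be decoded with plain Python to confirm that it is valid UTF-8. This is only an illustration of the byte content; it does not involve pypdf, and the variable names are made up for the example.

```python
# Key copied from the KeyError in the traceback above.
raw_key = b'cite.0@dac\xc3\xadk2025racerflightweightstaticdata'

# A strict decode raises UnicodeDecodeError if the bytes are not valid UTF-8.
decoded_key = raw_key.decode("utf-8")
print(decoded_key)
```

The byte pair `\xc3\xad` is the UTF-8 encoding of U+00ED (í), so the decode succeeds and the name contains a single non-ASCII character.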