UTF-8 named destinations missing #3259

@sim642

Description

UTF-8 named destinations are missing from named_destinations: they are parsed as ByteStringObjects rather than TextStringObjects, so they are silently skipped when named_destinations is constructed:

pypdf/pypdf/_doc_common.py

Lines 498 to 501 in b185ab3

key = cast(str, names[i].get_object())
i += 1
if not isinstance(key, str):
    continue

This seems to be similar to #2929 and #2930.

I don't mind whether such UTF-8 destination names are decoded to str or kept as bytes; I just want them to be present in named_destinations. Currently they are skipped, so there is no way to look them up.
Keeping them as bytes would be fine, because the /D field of the link action also holds bytes. The content of the key is irrelevant; it should simply be possible to look it up as-is. Right now pypdf makes it appear as if the PDF is broken, when it is not.
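
To illustrate the difference, here is a minimal sketch that does not use pypdf at all. It is not pypdf's actual implementation: collect_strict only mimics the isinstance(key, str) check quoted above on a flat [key, value, key, value, ...] array, and collect_tolerant is a hypothetical variant that keeps bytes keys unchanged. The function names and the sample data are made up for this example.

```python
from typing import Any, Dict, List, Union

Key = Union[str, bytes]


def collect_strict(names: List[Any]) -> Dict[str, Any]:
    """Mimic the quoted check: pairs whose key is not a str are dropped."""
    result: Dict[str, Any] = {}
    for i in range(0, len(names) - 1, 2):
        key, value = names[i], names[i + 1]
        if not isinstance(key, str):
            continue  # a bytes key is silently skipped here
        result[key] = value
    return result


def collect_tolerant(names: List[Any]) -> Dict[Key, Any]:
    """Hypothetical alternative: keep bytes keys instead of skipping them."""
    result: Dict[Key, Any] = {}
    for i in range(0, len(names) - 1, 2):
        key, value = names[i], names[i + 1]
        if isinstance(key, (str, bytes)):
            result[key] = value
    return result


# Made-up sample: one plain-text key and one UTF-8 encoded bytes key.
names = ["cite.plain", "dest-1", "cite.dac\u00edk".encode("utf-8"), "dest-2"]
strict = collect_strict(names)
tolerant = collect_tolerant(names)
```

With the tolerant variant, a lookup using the raw bytes from a /D field would succeed, which is all I am asking for.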

Environment

Which environment were you using when you encountered the problem?

$ python -m platform
Linux-6.8.0-55-generic-x86_64-with-glibc2.39


$ python -c "import pypdf;print(pypdf._debug_versions)"
pypdf==5.3.0, crypt_provider=('local_crypt_fallback', '0.0.0'), PIL=none

Code + PDF

This iterates over hyperlinks in the PDF and looks up their named destinations:

from pypdf import PdfReader

reader = PdfReader("pypdf_issue.pdf")

for page in reader.pages:
    for annotation in page.annotations:
        if annotation["/Subtype"] == "/Link":
            action = annotation["/A"]
            action_type = action["/S"]
            if action_type == "/GoTo":
                named_destination = action["/D"]
                print(reader.named_destinations[named_destination])   # KeyError: b'cite.dac\xc3\xadk2025racerflightweightstaticdata'

The problem occurs, for example, with this PDF: pypdf_issue.pdf.
The link works in PDF viewers.
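
As a possible workaround until this is fixed, the name tree can be walked by hand. The following is only a sketch under assumptions: it assumes the destinations live in a name tree made of /Names and /Kids entries (for a real file, presumably under /Root, /Names, /Dests), that indirect references expose a get_object() method, and that the tree has no cycles. The helpers _resolve and iter_name_tree are hypothetical names, and the sample tree uses plain dicts and lists rather than real pypdf objects.

```python
from typing import Any, Iterator, Tuple


def _resolve(obj: Any) -> Any:
    """Follow an indirect reference if the object offers get_object()."""
    getter = getattr(obj, "get_object", None)
    return getter() if callable(getter) else obj


def iter_name_tree(node: Any) -> Iterator[Tuple[Any, Any]]:
    """Yield (key, value) pairs from a name tree, keeping str and bytes keys."""
    node = _resolve(node)
    names = _resolve(node.get("/Names", []))
    # Leaf entries are stored as a flat [key, value, key, value, ...] array.
    for i in range(0, len(names) - 1, 2):
        yield _resolve(names[i]), _resolve(names[i + 1])
    # Intermediate nodes list their children under /Kids.
    for kid in _resolve(node.get("/Kids", [])):
        yield from iter_name_tree(kid)


# Made-up stand-in for a /Dests name tree with one bytes key and one str key.
tree = {
    "/Kids": [
        {"/Names": [b"cite.dac\xc3\xadk", ["page-a", "/Fit"]]},
        {"/Names": ["plain", ["page-b", "/XYZ"]]},
    ]
}
dests = dict(iter_name_tree(tree))

# Assumed usage with pypdf (not executed here):
#   reader = PdfReader("pypdf_issue.pdf")
#   dests = dict(iter_name_tree(reader.trailer["/Root"]["/Names"]["/Dests"]))
```

The resulting dictionary holds the raw destination values rather than Destination objects, so this only confirms that the key exists in the file; it does not replace named_destinations.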

Traceback

This is the complete traceback I see:

  File "/home/simmo/dev/pdfflow/pdfflow/outline.py", line 21, in from_named_destination
    named_destination: Destination = reader.named_destinations[name]
                                     ~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^
KeyError: b'cite.0@dac\xc3\xadk2025racerflightweightstaticdata'

Labels: is-bug (From a user's perspective, this is a bug - a violation of the expected behavior with a compliant PDF)
