
DocumentInformation.title sometimes returns bytes instead of str #2929

@reformy

Description

I am reading a PDF file from:
https://www.ms-ad-hd.com/en/ir/ir_event/event/presentation/main/01111119/teaserItems1/00/linkList/00/link/20220810Tranverse%20QA%20Summary.pdf
The "title" of this document is returned as bytes instead of str, although the property should always return str.

Environment

Python 3.11

$ python -m platform
macOS-14.4.1-arm64-arm-64bit

$ python -c "import pypdf;print(pypdf._debug_versions)"
pypdf==4.0.1, crypt_provider=('cryptography', '41.0.7'), PIL=9.5.0

Code + PDF

import io

import pypdf
import requests

# Download the PDF and parse it from memory.
response = requests.get('https://www.ms-ad-hd.com/en/ir/ir_event/event/presentation/main/01111119/teaserItems1/00/linkList/00/link/20220810Tranverse%20QA%20Summary.pdf')
response.raise_for_status()
pdf_reader = pypdf.PdfReader(io.BytesIO(response.content))

# Expected a str; for this file a bytes object is printed instead.
print(pdf_reader.metadata.title)
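
Until the behaviour described above is fixed, a caller could normalise the value defensively. The following is only a sketch of a possible workaround, not pypdf's own decoding logic: coerce_title is a hypothetical helper name, and the fallback order (UTF-16 with a byte-order mark, then UTF-8, then Latin-1 as a rough stand-in for PDFDocEncoding) is an assumption that may not match how this particular file encodes its title.

```python
from typing import Optional, Union


def coerce_title(value: Union[str, bytes, None]) -> Optional[str]:
    """Hypothetical helper: return a metadata value as str (or None)."""
    # Already in the expected form, nothing to do.
    if value is None or isinstance(value, str):
        return value

    # A UTF-16 byte-order mark is how PDF text strings usually signal Unicode.
    if value.startswith((b"\xfe\xff", b"\xff\xfe")):
        return value.decode("utf-16", errors="replace")

    try:
        # "utf-8-sig" also strips a leading UTF-8 byte-order mark if present.
        return value.decode("utf-8-sig")
    except UnicodeDecodeError:
        # Assumption: Latin-1 as an approximation of PDFDocEncoding.
        # It never raises, but some characters may be mapped incorrectly.
        return value.decode("latin-1")
```

With the reproduction code above, this would be called as coerce_title(pdf_reader.metadata.title). Falling back to Latin-1 trades correctness for robustness: the result is always a str, but it may not be the intended text.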

Labels

Has MCVE (a minimal, complete and verifiable example helps a lot to debug / understand feature requests)
