Unspecific type hints for reader.metadata #1222

@adamchainz

Description

Take the below example file:

from PyPDF2 import PdfReader

with open("example.pdf", "rb") as fp:
    reader = PdfReader(fp)
    metadata = reader.metadata
    assert metadata is not None
    date_str = metadata["/CreationDate"]
    date_str = date_str.removeprefix("D:").replace("'", "")
    print(date_str)

It runs fine:

$ python example.py
20220415093243+0200

but Mypy complains about using removeprefix() on date_str:

$ mypy example.py
example.py:8: error: "PdfObject" has no attribute "removeprefix"  [attr-defined]
Found 1 error in 1 file (checked 1 source file)

This is due to DocumentInformation being a subclass of DictionaryObject, and thus only guaranteeing that the values returned are PdfObjects. In practice they seem to only be TextStringObjects, which subclass str. If they're always TextStringObjects, the types in DocumentInformation should be adjusted accordingly.
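Until the annotations change, one possible workaround is to narrow the value with isinstance() before calling str methods. The sketch below does not import PyPDF2: PdfObject, TextStringObject and DocumentInformation are hypothetical stand-ins that only mirror the class relationships described above. The helper name creation_date_digits and the raw date value are also assumptions, the latter inferred from the printed output. It needs Python 3.9+ for removeprefix().

```python
from typing import Optional


class PdfObject:
    """Hypothetical stand-in for PyPDF2's PdfObject base class."""


class TextStringObject(str, PdfObject):
    """Hypothetical stand-in: a str subclass, as described in this issue."""


class DocumentInformation(dict[str, PdfObject]):
    """Hypothetical stand-in: values are only typed as PdfObject."""


def creation_date_digits(metadata: DocumentInformation) -> Optional[str]:
    """Return /CreationDate without the "D:" prefix and apostrophes."""
    value = metadata.get("/CreationDate")
    # The isinstance check narrows PdfObject to str for the type checker,
    # so the str methods below no longer trigger attr-defined.
    if not isinstance(value, str):
        return None
    return value.removeprefix("D:").replace("'", "")


# Assumed raw value, reconstructed from the output shown earlier.
info = DocumentInformation()
info["/CreationDate"] = TextStringObject("D:20220415093243+02'00'")
print(creation_date_digits(info))  # 20220415093243+0200
```

The narrowing costs a runtime check on every access. If the values really are always TextStringObjects, tightening the types in DocumentInformation would make that check unnecessary.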

Environment

$ python -m platform
macOS-12.5-arm64-arm-64bit

$ python -c "import PyPDF2;print(PyPDF2.__version__)"
2.9.0

Code + PDF

See the example above; it was run against metadata.pdf from the PyPDF2 resources.

Labels: is-maintenance (anything that is just internal: simplifying code, syntax changes, updating docs, speed improvements)
