Unexpected xml.parsers.expat.ExpatError on malformed PDF #585

@Google-Autofuzz

Description

Running the following code with the latest PyPI version of PyPDF2 on the attached input raises an unexpected xml.parsers.expat.ExpatError:

MCVE: Code + PDF

Example document: test.pdf

from PyPDF2 import PdfReader

reader = PdfReader("test.pdf")
reader.xmp_metadata

Traceback

Traceback (most recent call last):
  File "foo.py", line 5, in <module>
    reader.xmp_metadata
  File "/home/moose/Github/py-pdf/PyPDF2/PyPDF2/_reader.py", line 318, in xmp_metadata
    return self.trailer[TK.ROOT].xmp_metadata  # type: ignore
  File "/home/moose/Github/py-pdf/PyPDF2/PyPDF2/generic.py", line 671, in xmp_metadata
    metadata = XmpInformation(metadata)
  File "/home/moose/Github/py-pdf/PyPDF2/PyPDF2/xmp.py", line 206, in __init__
    doc_root: Document = parseString(self.stream.get_data())
  File "/home/moose/.pyenv/versions/3.6.15/lib/python3.6/xml/dom/minidom.py", line 1968, in parseString
    return expatbuilder.parseString(string)
  File "/home/moose/.pyenv/versions/3.6.15/lib/python3.6/xml/dom/expatbuilder.py", line 925, in parseString
    return builder.parseString(string)
  File "/home/moose/.pyenv/versions/3.6.15/lib/python3.6/xml/dom/expatbuilder.py", line 223, in parseString
    parser.Parse(string, True)
xml.parsers.expat.ExpatError: not well-formed (invalid token): line 53, column 15
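
The last frames of the traceback show `xml.dom.minidom.parseString` raising on XMP bytes that are not well-formed XML. The sketch below reproduces that failure mode with the standard library alone and shows one way a caller might guard against it. The helper name `parse_xmp_or_none` and the sample byte strings are hypothetical and are not part of PyPDF2. This is an illustration of the error, not the library's fix.

```python
from typing import Optional
from xml.dom.minidom import Document, parseString
from xml.parsers.expat import ExpatError


def parse_xmp_or_none(raw: bytes) -> Optional[Document]:
    """Hypothetical helper: parse XML bytes, returning None if they are malformed."""
    try:
        return parseString(raw)
    except ExpatError:
        # Malformed metadata is treated as "no metadata" instead of crashing.
        return None


# Stand-ins for the XMP stream contents (assumed data, not taken from test.pdf).
well_formed = b"<a>ok</a>"
malformed = b"<a>&</a>"  # a bare '&' is not well-formed XML

doc = parse_xmp_or_none(well_formed)
broken = parse_xmp_or_none(malformed)
print(doc is not None, broken is None)
```

Until the library handles this case, a caller could wrap the `reader.xmp_metadata` access in the same `try`/`except ExpatError` pattern.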

Environment

$ python -c "import PyPDF2; print(PyPDF2.__version__)"
2.3.1-dev

Metadata

Labels

Has MCVE: A minimal, complete and verifiable example helps a lot to debug / understand feature requests
is-bug: From a users perspective, this is a bug - a violation of the expected behavior with a compliant PDF
is-robustness-issue: From a users perspective, this is about robustness
