
[PyPDF2-2.10.6] ValueError: invalid literal for int() with base 16 #1340

@sietzeberends

Description


Merging one of the PDFs raises a ValueError. This has been happening since the release of PyPDF2-2.10.6.

The PDFs being merged are OWASP Dependency-Check reports. Only one of them causes the exception. Note: this also happens when I merge only this specific PDF, without the other, working PDFs.

Note: this is my first issue raised for PyPDF2. I've tried to include all the information required but am not familiar with the process. If anything's missing or invalid, let me know.

Environment
Python 3.9

Code:

from PyPDF2 import PdfFileMerger

if __name__ == '__main__':
    pdf_file = 'path-to-file-that-exists.pdf'
    merger = PdfFileMerger()  # create the merger before appending to it
    merger.append(pdf_file)

Code in PyPDF2 where it breaks:

@staticmethod
def unnumber(sin: str) -> str:
    i = sin.find("#")
    while i >= 0:
        sin = sin[:i] + chr(int(sin[i + 1 : i + 3], 16)) + sin[i + 3 :]
        i = sin.find("#")
    return sin

Value of sin at moment of exception:
'/ñjÔª\x0cÎ\x87´³°#J#86#fe#2a#b2jYJ#94'

Exception raised:

PyPDF2.errors.PdfReadError: ValueError("invalid literal for int() with base 16: 'J#'")
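
The message shows that the first "#" in the name is followed by 'J#', which is not a two-digit hexadecimal number, so int(..., 16) fails. Below is a minimal sketch of a more lenient decoder, assuming the desired behaviour is to leave such a "#" untouched. The helper name unnumber_lenient is hypothetical and this is not necessarily the fix PyPDF2 would adopt.

```python
import re

# Hypothetical helper, not part of PyPDF2: decode "#xx" only when "xx" is
# exactly two hexadecimal digits, and leave any other "#" as it is.
_HEX_ESCAPE = re.compile(r"#([0-9A-Fa-f]{2})")


def unnumber_lenient(sin: str) -> str:
    # Single pass over the string, so a "#" produced by decoding "#23"
    # is not interpreted as the start of another escape.
    return _HEX_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), sin)


# Value of sin reported in this issue.
name = '/ñjÔª\x0cÎ\x87´³°#J#86#fe#2a#b2jYJ#94'
decoded = unnumber_lenient(name)
print(repr(decoded))
```

Unlike the loop quoted above, this sketch scans the string once instead of searching again from the start after every replacement. Whether a stray "#" should be kept or rejected is a design choice for the maintainers.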

Stacktrace:

Traceback (most recent call last):
  File "C:\Users\MG78YH\PycharmProjects\pythonProject\venv\lib\site-packages\PyPDF2\generic\_data_structures.py", line 266, in read_from_stream
    key = read_object(stream, pdf)
  File "C:\Users\MG78YH\PycharmProjects\pythonProject\venv\lib\site-packages\PyPDF2\generic\_data_structures.py", line 824, in read_object
    return NameObject.read_from_stream(stream, pdf)
  File "C:\Users\MG78YH\PycharmProjects\pythonProject\venv\lib\site-packages\PyPDF2\generic\_base.py", line 444, in read_from_stream
    ret = NameObject.unnumber(ret)
  File "C:\Users\MG78YH\PycharmProjects\pythonProject\venv\lib\site-packages\PyPDF2\generic\_base.py", line 427, in unnumber
    sin = sin[:i] + chr(int(sin[i + 1 : i + 3], 16)) + sin[i + 3 :]
ValueError: invalid literal for int() with base 16: 'J#'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\MG78YH\PycharmProjects\pythonProject\main.py", line 37, in <module>
    merge_files(args.depcheck, 'depcheck')
  File "C:\Users\MG78YH\PycharmProjects\pythonProject\main.py", line 27, in merge_files
    merger.append(pdf_file)
  File "C:\Users\MG78YH\PycharmProjects\pythonProject\venv\lib\site-packages\PyPDF2\_utils.py", line 389, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\MG78YH\PycharmProjects\pythonProject\venv\lib\site-packages\PyPDF2\_merger.py", line 283, in append
    self.merge(len(self.pages), fileobj, outline_item, pages, import_outline)
  File "C:\Users\MG78YH\PycharmProjects\pythonProject\venv\lib\site-packages\PyPDF2\_utils.py", line 389, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\MG78YH\PycharmProjects\pythonProject\venv\lib\site-packages\PyPDF2\_merger.py", line 184, in merge
    outline = reader.outline
  File "C:\Users\MG78YH\PycharmProjects\pythonProject\venv\lib\site-packages\PyPDF2\_reader.py", line 696, in outline
    return self._get_outline()
  File "C:\Users\MG78YH\PycharmProjects\pythonProject\venv\lib\site-packages\PyPDF2\_reader.py", line 725, in _get_outline
    self._namedDests = self._get_named_destinations()
  File "C:\Users\MG78YH\PycharmProjects\pythonProject\venv\lib\site-packages\PyPDF2\_reader.py", line 643, in _get_named_destinations
    tree = cast(TreeObject, catalog[CA.DESTS])
  File "C:\Users\MG78YH\PycharmProjects\pythonProject\venv\lib\site-packages\PyPDF2\generic\_data_structures.py", line 149, in __getitem__
    return dict.__getitem__(self, key).get_object()
  File "C:\Users\MG78YH\PycharmProjects\pythonProject\venv\lib\site-packages\PyPDF2\generic\_base.py", line 163, in get_object
    obj = self.pdf.get_object(self)
  File "C:\Users\MG78YH\PycharmProjects\pythonProject\venv\lib\site-packages\PyPDF2\_reader.py", line 1179, in get_object
    retval = read_object(self.stream, self)  # type: ignore
  File "C:\Users\MG78YH\PycharmProjects\pythonProject\venv\lib\site-packages\PyPDF2\generic\_data_structures.py", line 831, in read_object
    return DictionaryObject.read_from_stream(stream, pdf, forced_encoding)
  File "C:\Users\MG78YH\PycharmProjects\pythonProject\venv\lib\site-packages\PyPDF2\generic\_data_structures.py", line 272, in read_from_stream
    raise PdfReadError(exc.__repr__())
PyPDF2.errors.PdfReadError: ValueError("invalid literal for int() with base 16: 'J#'")

Pdf:
Unfortunately I'm not allowed to share the file (yet). I've tried to include as much information as possible to reproduce this. Please let me know if it's enough, otherwise I can check again with my manager if I can share some redacted version of the pdf that causes the exception to be raised.
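
In the meantime, the failure appears reproducible without the PDF by calling a copy of the unnumber loop on the value reported above. This is a sketch that assumes the quoted snippet matches the installed 2.10.6 code. The function is copied here as a plain function so that the example does not depend on PyPDF2 being installed.

```python
def unnumber(sin: str) -> str:
    # Copy of the loop quoted in this issue, as a standalone function.
    i = sin.find("#")
    while i >= 0:
        # Assumes the two characters after "#" are always hex digits.
        sin = sin[:i] + chr(int(sin[i + 1 : i + 3], 16)) + sin[i + 3 :]
        i = sin.find("#")
    return sin


# Value of sin at the moment of the exception, as reported above.
reported = '/ñjÔª\x0cÎ\x87´³°#J#86#fe#2a#b2jYJ#94'

try:
    unnumber(reported)
    error = None
except ValueError as exc:
    error = str(exc)

print(error)
```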

Labels: PdfMerger (the PdfMerger component is affected)