Merging one of the PDFs raises a ValueError. This has been happening since the release of PyPDF2 2.10.6.
The PDFs being merged are OWASP Dependency-Check reports. Only one of them causes the exception. Note: this also happens when I merge only this specific PDF, without the other (working) PDFs.
Note: this is my first issue raised for PyPDF2. I've tried to include all the required information, but I'm not familiar with the process. If anything is missing or invalid, let me know.
Environment
Python 3.9
Code:
from PyPDF2 import PdfFileMerger

if __name__ == '__main__':
    merger = PdfFileMerger()
    pdf_file = 'path-to-file-that-exists.pdf'
    # Raises PdfReadError for the affected report
    merger.append(pdf_file)
Code in PyPDF2 where it breaks (NameObject.unnumber in PyPDF2/generic/_base.py):
@staticmethod
def unnumber(sin: str) -> str:
    i = sin.find("#")
    while i >= 0:
        sin = sin[:i] + chr(int(sin[i + 1 : i + 3], 16)) + sin[i + 3 :]
        i = sin.find("#")
    return sin
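In a PDF name object, a # followed by two hexadecimal digits stands for a single character code (ISO 32000-1, section 7.3.5). The loop above assumes that every # begins such an escape, so any # followed by something other than two hex digits makes int() fail. As an illustration only, here is a sketch of a more tolerant decoder, assuming that malformed # sequences should be kept literally rather than rejected; unnumber_lenient is a hypothetical name, not PyPDF2's API and not necessarily how the library should handle this.

```python
import string

# Characters accepted as hex digits: 0-9, a-f, A-F.
HEX_DIGITS = set(string.hexdigits)


def unnumber_lenient(sin: str) -> str:
    """Decode #XX escapes in a PDF name, keeping malformed '#' sequences as-is.

    Hypothetical helper for illustration; not part of PyPDF2.
    """
    out = []
    i = 0
    while i < len(sin):
        pair = sin[i + 1 : i + 3]
        # Only treat '#' as an escape when exactly two hex digits follow it.
        if sin[i] == "#" and len(pair) == 2 and all(c in HEX_DIGITS for c in pair):
            out.append(chr(int(pair, 16)))
            i += 3
        else:
            # Keep the character (including a stray '#') unchanged.
            out.append(sin[i])
            i += 1
    return "".join(out)


print(unnumber_lenient("/A#20B"))   # /A B
print(unnumber_lenient("/x#J#41"))  # /x#JA
```

Because it scans the string once from left to right, it also avoids re-reading a # produced by decoding #23, which the original loop would try to decode again.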
Value of sin at the moment of the exception:
'/ñjÔª\x0cÎ\x87´³°#J#86#fe#2a#b2jYJ#94'
Exception raised:
PyPDF2.errors.PdfReadError: ValueError("invalid literal for int() with base 16: 'J#'")
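The exception can be reproduced without PyPDF2 using the value of sin reported above: the first # is immediately followed by J, so the two-character slice that unnumber passes to int() is 'J#'. A minimal stdlib-only sketch of that single step:

```python
# Value of sin reported in this issue at the moment of the exception.
sin = "/ñjÔª\x0cÎ\x87´³°#J#86#fe#2a#b2jYJ#94"

i = sin.find("#")          # index of the first '#'
pair = sin[i + 1 : i + 3]  # the two characters unnumber() tries to parse as hex
print(repr(pair))          # 'J#'

try:
    int(pair, 16)          # same call as in NameObject.unnumber
except ValueError as exc:
    print(exc)             # invalid literal for int() with base 16: 'J#'
```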
Stack trace:
Traceback (most recent call last):
  File "C:\Users\MG78YH\PycharmProjects\pythonProject\venv\lib\site-packages\PyPDF2\generic\_data_structures.py", line 266, in read_from_stream
    key = read_object(stream, pdf)
  File "C:\Users\MG78YH\PycharmProjects\pythonProject\venv\lib\site-packages\PyPDF2\generic\_data_structures.py", line 824, in read_object
    return NameObject.read_from_stream(stream, pdf)
  File "C:\Users\MG78YH\PycharmProjects\pythonProject\venv\lib\site-packages\PyPDF2\generic\_base.py", line 444, in read_from_stream
    ret = NameObject.unnumber(ret)
  File "C:\Users\MG78YH\PycharmProjects\pythonProject\venv\lib\site-packages\PyPDF2\generic\_base.py", line 427, in unnumber
    sin = sin[:i] + chr(int(sin[i + 1 : i + 3], 16)) + sin[i + 3 :]
ValueError: invalid literal for int() with base 16: 'J#'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\MG78YH\PycharmProjects\pythonProject\main.py", line 37, in <module>
    merge_files(args.depcheck, 'depcheck')
  File "C:\Users\MG78YH\PycharmProjects\pythonProject\main.py", line 27, in merge_files
    merger.append(pdf_file)
  File "C:\Users\MG78YH\PycharmProjects\pythonProject\venv\lib\site-packages\PyPDF2\_utils.py", line 389, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\MG78YH\PycharmProjects\pythonProject\venv\lib\site-packages\PyPDF2\_merger.py", line 283, in append
    self.merge(len(self.pages), fileobj, outline_item, pages, import_outline)
  File "C:\Users\MG78YH\PycharmProjects\pythonProject\venv\lib\site-packages\PyPDF2\_utils.py", line 389, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\MG78YH\PycharmProjects\pythonProject\venv\lib\site-packages\PyPDF2\_merger.py", line 184, in merge
    outline = reader.outline
  File "C:\Users\MG78YH\PycharmProjects\pythonProject\venv\lib\site-packages\PyPDF2\_reader.py", line 696, in outline
    return self._get_outline()
  File "C:\Users\MG78YH\PycharmProjects\pythonProject\venv\lib\site-packages\PyPDF2\_reader.py", line 725, in _get_outline
    self._namedDests = self._get_named_destinations()
  File "C:\Users\MG78YH\PycharmProjects\pythonProject\venv\lib\site-packages\PyPDF2\_reader.py", line 643, in _get_named_destinations
    tree = cast(TreeObject, catalog[CA.DESTS])
  File "C:\Users\MG78YH\PycharmProjects\pythonProject\venv\lib\site-packages\PyPDF2\generic\_data_structures.py", line 149, in __getitem__
    return dict.__getitem__(self, key).get_object()
  File "C:\Users\MG78YH\PycharmProjects\pythonProject\venv\lib\site-packages\PyPDF2\generic\_base.py", line 163, in get_object
    obj = self.pdf.get_object(self)
  File "C:\Users\MG78YH\PycharmProjects\pythonProject\venv\lib\site-packages\PyPDF2\_reader.py", line 1179, in get_object
    retval = read_object(self.stream, self) # type: ignore
  File "C:\Users\MG78YH\PycharmProjects\pythonProject\venv\lib\site-packages\PyPDF2\generic\_data_structures.py", line 831, in read_object
    return DictionaryObject.read_from_stream(stream, pdf, forced_encoding)
  File "C:\Users\MG78YH\PycharmProjects\pythonProject\venv\lib\site-packages\PyPDF2\generic\_data_structures.py", line 272, in read_from_stream
    raise PdfReadError(exc.__repr__())
PyPDF2.errors.PdfReadError: ValueError("invalid literal for int() with base 16: 'J#'")
PDF:
Unfortunately, I'm not allowed to share the file (yet). I've tried to include as much information as possible to reproduce this. Please let me know if this is enough; otherwise, I can check with my manager whether I can share a redacted version of the PDF that triggers the exception.