TypeError handling incorrect xref size: '<' not supported between instances of 'int' and 'NameObject' #2575
Description
Our system received a PDF from a user in which the xref size reported in the PDF was 1 greater than the actual number of xref entries.
The resulting error was originally reported here, but the root cause was unknown at the time. This issue has steps to reproduce with a sample file based on a PDF from the sample-files sub-module.
I am not certain that any change is necessary, given the helpful WARNING log output: entry 14 in Xref table invalid; object not found. I am re-opening the conversation here in case the issue is worth another look now that it can be reproduced.
Environment
This does not appear to be environment specific.
- We observed this in our application with pypdf version 3.17.0
- I have reproduced it on the latest pypdf main / 4.1.0
Both on Python 3.12.1.
Code + PDF
This is a minimal, complete example that shows the issue using the attached PDF:
from pypdf import PdfReader
PdfReader('./sample-files/002-trivial-libre-office-writer/002-trivial-libre-office-writer-broken.pdf')
Attached file: 002-trivial-libre-office-writer-broken.pdf
The PDF file attached is a slightly modified version of the sample-files 002-trivial-libre-office-writer.pdf such that the xref size is 1 greater than the number of xref entries.
Original sample-file xref begins:
xref
0 14
Attached "broken" xref begins:
xref
0 15
(I also took a couple of liberties with whitespace bytes in the trailer to ensure the error matched the original report, but that is not necessary to trigger it.)
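For anyone who wants to check a file for the same mismatch without going through pypdf, the comparison described above can be sketched in plain Python. This is only an illustration under stated assumptions: the helper name `xref_declared_vs_actual` is hypothetical, it only looks at the first classic (non-stream) xref subsection, and it finds the `xref` keyword with a naive byte search that could be fooled by binary stream content.

```python
import re

# Subsection header: "xref", then first object number and declared entry count.
HEADER = re.compile(rb"xref\s+(\d+)\s+(\d+)\s+")
# One classic xref entry: 10-digit offset, 5-digit generation, 'n' or 'f'.
ENTRY = re.compile(rb"(\d{10}) (\d{5}) ([nf])\s*")


def xref_declared_vs_actual(data: bytes) -> tuple[int, int]:
    """Return (declared size, entries found) for the first xref subsection.

    Hypothetical helper for illustration; not part of pypdf.
    """
    header = HEADER.search(data)
    if header is None:
        raise ValueError("no classic xref table found")
    declared = int(header.group(2))

    # Count consecutive well-formed entries directly after the header.
    pos = header.end()
    actual = 0
    while (entry := ENTRY.match(data, pos)) is not None:
        actual += 1
        pos = entry.end()
    return declared, actual


# Tiny synthetic xref section (not a complete PDF) with three entries.
sample = (
    b"xref\n0 3\n"
    b"0000000000 65535 f \n"
    b"0000000017 00000 n \n"
    b"0000000081 00000 n \n"
    b"trailer\n<< /Size 3 >>\nstartxref\n120\n%%EOF\n"
)
# Same bytes with the declared size bumped by one, mirroring the "broken" file.
broken = sample.replace(b"0 3", b"0 4", 1)

print(xref_declared_vs_actual(sample))
print(xref_declared_vs_actual(broken))
```

On the attached file, a check like this would be expected to report a declared size of 15 against 14 entries, matching the two xref headers shown above.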
I am not able to share the original user's PDF, and I do not have information about which tool was used to create or modify it before it was uploaded to our system.
Traceback
This is the complete traceback I see:
entry 14 in Xref table invalid; object not found
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/tsclausing/OpenSource/pypdf/pypdf/_reader.py", line 127, in __init__
    self.read(stream)
  File "/Users/tsclausing/OpenSource/pypdf/pypdf/_reader.py", line 550, in read
    self._read_xref_tables_and_trailers(stream, startxref, xref_issue_nr)
  File "/Users/tsclausing/OpenSource/pypdf/pypdf/_reader.py", line 781, in _read_xref_tables_and_trailers
    startxref = self._read_xref(stream)
                ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/tsclausing/OpenSource/pypdf/pypdf/_reader.py", line 817, in _read_xref
    self._read_standard_xref_table(stream)
  File "/Users/tsclausing/OpenSource/pypdf/pypdf/_reader.py", line 683, in _read_standard_xref_table
    while cnt < size:
          ^^^^^^^^^^
TypeError: '<' not supported between instances of 'int' and 'NameObject'
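The last line of the traceback suggests that `size` was not an integer by the time the loop condition ran. How it came to hold a name is not established here; one plausible reading, stated as an assumption, is that the parser read past the last real entry and picked up a token from the trailer. The sketch below only reproduces the shape of the error, using a hypothetical stand-in class `FakeName` (a `str` subclass, not pypdf's `NameObject`) and a hypothetical function `loop_with_bad_size`.

```python
class FakeName(str):
    """Hypothetical stand-in for a non-integer PDF object; not pypdf's NameObject."""


def loop_with_bad_size(size) -> str:
    """Run a `while cnt < size` loop and report any TypeError it raises."""
    cnt = 0
    try:
        while cnt < size:  # same comparison shape as in the traceback
            cnt += 1
    except TypeError as exc:
        return str(exc)
    return "no error"


# An int size loops normally; a str-like size fails on the first comparison.
print(loop_with_bad_size(3))
print(loop_with_bad_size(FakeName("/Size")))
```

A type check on `size` before entering the loop would turn this into a clearer parse error, though whether that belongs in the library is a decision for the maintainers.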