TypeError handling incorrect xref size: '<' not supported between instances of 'int' and 'NameObject' #2575

@tsclausing

Our system received a PDF from a user in which the xref size reported in the PDF was 1 greater than the actual number of xref entries.

The resulting error was originally reported here, but the root cause was unknown at the time. This issue has steps to reproduce with a sample file based on a PDF from the sample-files sub-module.

I am not certain that any change is necessary, given the helpful WARNING log output: "entry 14 in Xref table invalid; object not found". I am re-opening the conversation here in case the issue is worth another look now that it can be reproduced.

Environment

This does not appear to be environment specific.

  • We observed this in our application with pypdf version 3.17.0
  • I have reproduced on the latest pypdf main / 4.1.0

Both on Python 3.12.1

Code + PDF

This is a minimal, complete example that shows the issue using the attached PDF:

from pypdf import PdfReader

PdfReader('./sample-files/002-trivial-libre-office-writer/002-trivial-libre-office-writer-broken.pdf')

002-trivial-libre-office-writer-broken.pdf

The PDF file attached is a slightly modified version of the sample-files 002-trivial-libre-office-writer.pdf such that the xref size is 1 greater than the number of xref entries.

Original sample-file xref begins:

xref
0 14

Attached "broken" xref begins:

xref
0 15

(I also took a couple of liberties with whitespace bytes in the trailer to ensure the error matched the original report, but that is not necessary.)
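For anyone who wants to check their own files for the same mismatch, here is a minimal, standard-library-only sketch that compares the entry count declared in the first classic xref subsection header with the number of well-formed entries that follow it. The function name `check_first_xref_subsection` and the regular expressions are my own assumptions for illustration, not part of pypdf, and the sketch ignores xref streams, multiple subsections, and incremental updates.

```python
import re

# A classic xref entry: 10-digit offset, 5-digit generation, then "n" or "f".
ENTRY_RE = re.compile(rb"\d{10} \d{5} [nf]")
# The "xref" keyword (not "startxref") followed by "<first object> <count>".
HEADER_RE = re.compile(rb"(?<!start)xref\s+(\d+)\s+(\d+)\s+")


def check_first_xref_subsection(data: bytes) -> tuple[int, int]:
    """Return (declared_count, actual_count) for the first xref subsection."""
    header = HEADER_RE.search(data)
    if header is None:
        raise ValueError("no classic xref table found")
    declared = int(header.group(2))
    pos = header.end()
    actual = 0
    # Count consecutive well-formed entries, skipping end-of-line whitespace.
    while True:
        entry = ENTRY_RE.match(data, pos)
        if entry is None:
            break
        actual += 1
        pos = entry.end()
        while pos < len(data) and data[pos:pos + 1] in b" \r\n":
            pos += 1
    return declared, actual


# Synthetic three-entry table, plus a copy whose header claims four entries.
sample = (
    b"xref\n0 3\n"
    b"0000000000 65535 f \n"
    b"0000000017 00000 n \n"
    b"0000000081 00000 n \n"
    b"trailer\n<< /Size 3 >>\n"
)
broken = sample.replace(b"0 3\n", b"0 4\n")
```

Here `check_first_xref_subsection(sample)` yields matching counts, while the `broken` variant reports a declared count one greater than the actual count, which is the same shape of discrepancy as in the attached file.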

I am not able to share the original user's PDF, and I do not have information about which tool was used to create or modify it before it was uploaded to our system.

Traceback

This is the complete traceback I see:

entry 14 in Xref table invalid; object not found
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/tsclausing/OpenSource/pypdf/pypdf/_reader.py", line 127, in __init__
    self.read(stream)
  File "/Users/tsclausing/OpenSource/pypdf/pypdf/_reader.py", line 550, in read
    self._read_xref_tables_and_trailers(stream, startxref, xref_issue_nr)
  File "/Users/tsclausing/OpenSource/pypdf/pypdf/_reader.py", line 781, in _read_xref_tables_and_trailers
    startxref = self._read_xref(stream)
                ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/tsclausing/OpenSource/pypdf/pypdf/_reader.py", line 817, in _read_xref
    self._read_standard_xref_table(stream)
  File "/Users/tsclausing/OpenSource/pypdf/pypdf/_reader.py", line 683, in _read_standard_xref_table
    while cnt < size:
          ^^^^^^^^^^
TypeError: '<' not supported between instances of 'int' and 'NameObject'
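To illustrate the final line of the traceback, here is a small sketch of how the comparison fails when a name token ends up where an integer size is expected, along with one possible defensive check. The `NameObject` class below is only a stand-in (a plain `str` subclass), the value "/Size" is a hypothetical example, and `count_entries` is a made-up helper; none of this is pypdf's actual code or a proposed patch.

```python
class NameObject(str):
    """Stand-in for pypdf's NameObject, used here only for illustration."""


def count_entries(size: object) -> int:
    """Mimic a `while cnt < size` loop, but validate `size` first."""
    if not isinstance(size, int):
        raise ValueError(
            f"xref size must be an integer, got {type(size).__name__}"
        )
    cnt = 0
    while cnt < size:
        cnt += 1
    return cnt


# Reproduce the message from the traceback with the stand-in type.
try:
    0 < NameObject("/Size")
except TypeError as exc:
    message = str(exc)
```

With a check like this, a malformed size would surface as a descriptive error instead of a bare TypeError from the comparison.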
