
BUG: Fix PdfReadError when xref table contains comments before trailer#3710

Merged
stefan6419846 merged 1 commit into py-pdf:main from rassie:fix-xref-comment-parsing
Apr 7, 2026

rassie (Contributor) commented Apr 1, 2026

Closes #3709.

Summary

  • Skip PDF comments (`%` to end of line) between the xref table entries and the `trailer` keyword in `_read_standard_xref_table`
  • Some PDF producers (e.g. Vectorizer.AI) insert comments at this position, which is legal per PDF spec §7.2.3
  • Previously this caused `PdfReadError: Could not read Boolean object`: `read_non_whitespace()` stops at `%`, so the check for `trailer` fails and the parser goes on to misinterpret `trailer` as a boolean token (it begins with `t`, as `true` does)
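The failure mode can be modelled in a few lines of plain Python. The helpers below are simplified stand-ins written for illustration, not pypdf's code: `read_non_whitespace` returns the first non-whitespace byte, `peek_trailer_tag` mimics a check that reads seven bytes and compares them to `trailer`, and `read_boolean` is a toy boolean reader assumed to behave like the one that raises the reported error.

```python
import io

# PDF whitespace bytes: NUL, TAB, LF, FF, CR, SPACE.
PDF_WHITESPACE = b"\x00\t\n\x0c\r "


def read_non_whitespace(stream: io.BytesIO) -> bytes:
    """Simplified stand-in: consume whitespace, return the first other byte."""
    tok = stream.read(1)
    while tok and tok in PDF_WHITESPACE:
        tok = stream.read(1)
    return tok


def peek_trailer_tag(stream: io.BytesIO) -> bytes:
    """Mimic the pre-fix check: skip whitespace, then read 7 bytes."""
    read_non_whitespace(stream)
    stream.seek(-1, 1)  # step back onto the non-whitespace byte
    return stream.read(7)


def read_boolean(stream: io.BytesIO) -> bool:
    """Toy boolean reader (assumed behaviour, for illustration only)."""
    word = stream.read(4)
    if word == b"true":
        return True
    if word == b"fals":
        stream.read(1)  # consume the trailing "e"
        return False
    raise ValueError("Could not read Boolean object")


# Bytes that follow the last xref entry, without and with a comment line.
clean = io.BytesIO(b"\ntrailer\n<< /Size 1 >>\n")
commented = io.BytesIO(b"\n% made by some tool\ntrailer\n<< /Size 1 >>\n")

print(peek_trailer_tag(clean))      # b'trailer'
print(peek_trailer_tag(commented))  # b'% made '

# Once the comment has been consumed elsewhere, a reader that dispatches on
# the leading "t" reads b"trai" instead of b"true" and gives up.
try:
    read_boolean(io.BytesIO(b"trailer"))
except ValueError as exc:
    print(exc)  # Could not read Boolean object
```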

Changes

  • `pypdf/_reader.py`: Added a loop after reading the xref entries that calls `skip_over_comment()` to consume any comment lines before checking for the `trailer` tag
  • `tests/test_reader.py`: Added `test_xref_table_with_comments_before_trailer`, with an inline minimal PDF reproducer containing comment lines between the xref entries and `trailer`
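The shape of that change can be sketched in plain Python. `skip_over_comment` below is a hypothetical re-implementation written for this example (the PR names a pypdf helper of that name, but its exact behaviour is not shown here), and `at_trailer` is an illustrative name for the check that runs after the xref entries; the real code in `pypdf/_reader.py` may differ.

```python
import io

# PDF whitespace bytes: NUL, TAB, LF, FF, CR, SPACE.
PDF_WHITESPACE = b"\x00\t\n\x0c\r "


def read_non_whitespace(stream: io.BytesIO) -> bytes:
    """Consume whitespace and return the first other byte (b"" at EOF)."""
    tok = stream.read(1)
    while tok and tok in PDF_WHITESPACE:
        tok = stream.read(1)
    return tok


def skip_over_comment(stream: io.BytesIO) -> None:
    """Hypothetical helper: if the next byte is '%', consume through the EOL byte."""
    tok = stream.read(1)
    if tok != b"%":
        if tok:
            stream.seek(-1, 1)  # not a comment: put the byte back
        return
    while tok not in (b"\n", b"\r", b""):
        tok = stream.read(1)


def at_trailer(stream: io.BytesIO) -> bool:
    """Skip whitespace and comment lines, then test for the 'trailer' keyword.

    Returns True with the stream positioned just after 'trailer'; otherwise
    rewinds to the next token so further xref subsections can be read.
    """
    while True:
        tok = read_non_whitespace(stream)
        if tok != b"%":
            break
        stream.seek(-1, 1)
        skip_over_comment(stream)  # consume one comment line, then look again
    if tok:
        stream.seek(-1, 1)  # step back onto the first byte of the token
    start = stream.tell()
    if stream.read(7) == b"trailer":
        return True
    stream.seek(start)
    return False


tail = io.BytesIO(b"\n% made by some tool\n% another comment\ntrailer\n<< /Size 1 >>\n")
print(at_trailer(tail))  # True
print(tail.read(1))      # b'\n'
```

Rewinding to `start` rather than seeking back a fixed seven bytes keeps the position correct even when fewer than seven bytes remain in the stream.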

Test plan

  • New test `test_xref_table_with_comments_before_trailer` passes
  • All 13 existing xref-related tests pass without regression
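The PR's test embeds a minimal PDF reproducer inline. The sketch below shows one way such a file can be assembled so that the byte offsets in the xref table and after `startxref` stay correct while comment lines sit between the last entry and `trailer`. `build_pdf_with_xref_comments` is a hypothetical helper written for this example, not the test's actual code.

```python
def build_pdf_with_xref_comments(comments: list[bytes]) -> bytes:
    """Assemble a one-page PDF whose xref entries are followed by comment lines."""
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] >>",
    ]
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of "N 0 obj"
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"

    xref_offset = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # each entry is exactly 20 bytes
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset

    # The case from the report: comment lines before the trailer keyword.
    for comment in comments:
        out += b"%" + comment + b"\n"

    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_offset  # "%%%%" renders as "%%"
    return bytes(out)


data = build_pdf_with_xref_comments([b" generated by some tool"])
print(data[data.index(b"xref"):].decode("ascii"))
```

Per the release notes further down, the fix ships in pypdf 6.10.0; with that version, `pypdf.PdfReader(io.BytesIO(data))` is expected to open the result, while the linked issue reports `PdfReadError` for input of this shape on earlier versions.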

Some PDF producers (e.g. Vectorizer.AI) insert legal PDF comments
(% to end of line) between the last xref table entry and the
`trailer` keyword.  The `_read_standard_xref_table` method did not
skip comments at this position, causing it to misparse the `%`
character and ultimately raise `PdfReadError: Could not read Boolean
object`.

The fix adds a loop after reading xref entries that calls
`skip_over_comment()` to consume any comment lines before checking
for the `trailer` tag.  This is consistent with PDF spec §7.2.3
which allows comments anywhere except inside strings or streams.
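The spec rule cited above means a `%` only starts a comment outside strings and streams, so a parser cannot simply drop everything after every `%`. The function below is an illustrative, simplified comment stripper (hypothetical, not part of pypdf) that leaves `%` inside literal `( ... )` strings alone, including nested parentheses and backslash escapes; it deliberately does not handle stream bodies, which the rule also exempts.

```python
def strip_comments(data: bytes) -> bytes:
    """Remove PDF comments, keeping '%' inside literal strings untouched.

    Simplified: handles (...) strings with nesting and backslash escapes,
    but not stream bodies.
    """
    out = bytearray()
    depth = 0  # nesting level of literal-string parentheses
    i = 0
    while i < len(data):
        ch = data[i:i + 1]
        if depth:
            out += ch
            if ch == b"\\":
                out += data[i + 1:i + 2]  # keep the escaped byte verbatim
                i += 2
                continue
            if ch == b"(":
                depth += 1
            elif ch == b")":
                depth -= 1
        elif ch == b"(":
            depth = 1
            out += ch
        elif ch == b"%":
            # A comment runs to the end of the line; the EOL byte itself is kept.
            while i < len(data) and data[i:i + 1] not in (b"\r", b"\n"):
                i += 1
            continue
        else:
            out += ch
        i += 1
    return bytes(out)


print(strip_comments(b"/Title (50% off) % set by producer\n"))  # b'/Title (50% off) \n'
```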
rassie changed the title from "Fix PdfReadError when xref table contains comments before trailer" to "BUG: Fix PdfReadError when xref table contains comments before trailer" on Apr 1, 2026
codecov bot commented Apr 1, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 97.43%. Comparing base (f3f501b) to head (27c3014).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
```diff
@@           Coverage Diff           @@
##             main    #3710   +/-   ##
=======================================
  Coverage   97.43%   97.43%           
=======================================
  Files          55       55           
  Lines       10016    10022    +6     
  Branches     1841     1842    +1     
=======================================
+ Hits         9759     9765    +6     
  Misses        149      149           
  Partials      108      108           
```


stefan6419846 (Collaborator) left a comment

Thanks.

stefan6419846 merged commit bd95bd8 into py-pdf:main on Apr 7, 2026
40 of 44 checks passed
stefan6419846 added a commit that referenced this pull request Apr 10, 2026
## What's new

### Security (SEC)
- Disallow custom XML entity declarations for XMP metadata (#3724) by @stefan6419846

### New Features (ENH)
- Skip MD5 key derivation for AES-256 encrypted PDFs (#3694) by @Ygnas

### Bug Fixes (BUG)
- Use remove_orphans in compress_identical_objects (#3310) by @j-t-1
- Fix PdfReadError when xref table contains comments before trailer (#3710) by @rassie
- Correctly verify AES padding during decryption (#3699) by @stefan6419846
- Fix stale object cache from non-authoritative object streams (#3698) by @astahlman
- Fix extract_links pairing when annotations include non-links (#3687) by @ReinerBRO

### Documentation (DOC)
- Add AI policy (#3717) by @stefan6419846

[Full Changelog](6.9.2...6.10.0)

Development

Successfully merging this pull request may close these issues.

PdfReadError when xref table contains comments before trailer keyword
