BUG: Fix PdfReadError when xref table contains comments before trailer by rassie · Pull Request #3710 · py-pdf/pypdf

rassie · 2026-04-01T15:21:29Z

Closes #3709.

Summary

- Skip PDF comments (`%` to EOL) between xref table entries and the `trailer` keyword in `_read_standard_xref_table`.
- Some PDF producers (e.g. Vectorizer.AI) insert comments at this position, which is legal per PDF spec §7.2.3.
- Previously this caused `PdfReadError: Could not read Boolean object` because `read_non_whitespace()` stops at `%` and the parser misinterprets `trailer` as a boolean token.
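
The failure position can be illustrated with a standalone sketch that does not use pypdf. Here `next_non_whitespace` is a hypothetical stand-in for the behaviour attributed to `read_non_whitespace()` above, and `XREF_TAIL` is an invented xref section, not the file from the linked issue.

```python
import io

# Invented xref section: two entries, then comment lines, then the trailer keyword.
XREF_TAIL = (
    b"xref\n"
    b"0 2\n"
    b"0000000000 65535 f \n"
    b"0000000009 00000 n \n"
    b"% comment written by the producer\n"
    b"% another comment\n"
    b"trailer\n"
    b"<< /Size 2 >>\n"
)

PDF_WHITESPACE = b" \n\r\t\x00\x0c"


def next_non_whitespace(stream):
    """Return the next byte that is not PDF whitespace (b"" at EOF)."""
    byte = stream.read(1)
    while byte and byte in PDF_WHITESPACE:
        byte = stream.read(1)
    return byte


stream = io.BytesIO(XREF_TAIL)
# Start right after the last xref entry, where the parser looks for "trailer".
stream.seek(XREF_TAIL.index(b"% comment"))
first = next_non_whitespace(stream)
print(first)  # the comment marker, not the "t" of "trailer"
```

A reader that only skips whitespace lands on `%` here, so whatever check follows never sees the `trailer` keyword it expects.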

Changes

- `pypdf/_reader.py`: Added a loop after reading xref entries that calls `skip_over_comment()` to consume any comment lines before checking for the `trailer` tag.
- `tests/test_reader.py`: Added `test_xref_table_with_comments_before_trailer` with an inline minimal PDF reproducer containing comment lines between xref entries and `trailer`.
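
A minimal sketch of the comment-skipping idea, written against a plain `io.BytesIO` stream. `skip_comments_and_whitespace` is a hypothetical helper for illustration only; it is not the code added to `pypdf/_reader.py`, which per the description above relies on pypdf's own `skip_over_comment()`.

```python
import io

PDF_WHITESPACE = b" \n\r\t\x00\x0c"


def skip_comments_and_whitespace(stream):
    """Advance past whitespace and %-comments, stopping at the next token."""
    while True:
        byte = stream.read(1)
        if not byte:
            return  # reached EOF without finding a token
        if byte in PDF_WHITESPACE:
            continue
        if byte == b"%":
            # A comment runs up to (and including) the end-of-line marker.
            while byte and byte not in b"\r\n":
                byte = stream.read(1)
            continue
        # First byte of a real token: step back so the caller can read it.
        stream.seek(-1, io.SEEK_CUR)
        return


data = b"% producer comment\n  % second comment\r\ntrailer\n<< /Size 2 >>\n"
stream = io.BytesIO(data)
skip_comments_and_whitespace(stream)
keyword = stream.read(7)
print(keyword)
```

Looping until a non-comment, non-whitespace byte appears handles any number of consecutive comment lines, and input without comments passes through unchanged.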

Test plan

- New test `test_xref_table_with_comments_before_trailer` passes.
- All 13 existing xref-related tests pass without regression.
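
For anyone who wants to reproduce the input shape locally, the following sketch builds a small PDF with comment lines between the last xref entry and `trailer`. It is an assumption-laden illustration, not the reproducer from `tests/test_reader.py`: the function name `build_pdf_with_xref_comments` and the object contents are made up for this example.

```python
def build_pdf_with_xref_comments():
    """Build a tiny one-page PDF whose xref table is followed by comment lines."""
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 72 72] >>",
    ]
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset where "N 0 obj" starts
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"

    xref_offset = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # entry for the free object 0
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset  # each entry is exactly 20 bytes

    # The lines that triggered the reported error: comments before "trailer".
    out += b"% comment between the last xref entry and the trailer\n"
    out += b"% a second comment line\n"

    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_offset
    return bytes(out), offsets, xref_offset


pdf_bytes, offsets, xref_offset = build_pdf_with_xref_comments()
```

With pypdf installed, passing `io.BytesIO(pdf_bytes)` to `PdfReader` should exercise the code path this PR changes; that step is left out of the block so it runs with the standard library alone.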

Some PDF producers (e.g. Vectorizer.AI) insert legal PDF comments (`%` to end of line) between the last xref table entry and the `trailer` keyword. The `_read_standard_xref_table` method did not skip comments at this position, so it misparsed the `%` character and ultimately raised `PdfReadError: Could not read Boolean object`. The fix adds a loop after reading xref entries that calls `skip_over_comment()` to consume any comment lines before checking for the `trailer` tag. This is consistent with PDF spec §7.2.3, which allows comments anywhere except inside strings or streams.

codecov · 2026-04-01T15:29:41Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 97.43%. Comparing base (f3f501b) to head (27c3014).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #3710   +/-   ##
=======================================
  Coverage   97.43%   97.43%           
=======================================
  Files          55       55           
  Lines       10016    10022    +6     
  Branches     1841     1842    +1     
=======================================
+ Hits         9759     9765    +6     
  Misses        149      149           
  Partials      108      108


stefan6419846

Thanks.

@stefan6419846

## What's new

### Security (SEC)
- Disallow custom XML entity declarations for XMP metadata (#3724) by @stefan6419846

### New Features (ENH)
- Skip MD5 key derivation for AES-256 encrypted PDFs (#3694) by @Ygnas

### Bug Fixes (BUG)
- Use remove_orphans in compress_identical_objects (#3310) by @j-t-1
- Fix PdfReadError when xref table contains comments before trailer (#3710) by @rassie
- Correctly verify AES padding during decryption (#3699) by @stefan6419846
- Fix stale object cache from non-authoritative object streams (#3698) by @astahlman
- Fix extract_links pairing when annotations include non-links (#3687) by @ReinerBRO

### Documentation (DOC)
- Add AI policy (#3717) by @stefan6419846

[Full Changelog](6.9.2...6.10.0)

rassie changed the title ~~Fix PdfReadError when xref table contains comments before trailer~~ BUG: Fix PdfReadError when xref table contains comments before trailer Apr 1, 2026

stefan6419846 approved these changes Apr 7, 2026


stefan6419846 merged commit bd95bd8 into py-pdf:main Apr 7, 2026
40 of 44 checks passed

annaswims mentioned this pull request Apr 10, 2026

PPF-1678: fix nightly test chinapandaman/PyPDFForm#1679

Merged

stefan6419846 merged 1 commit into py-pdf:main from rassie:fix-xref-comment-parsing
