BUG: Use remove_orphans in compress_identical_objects by j-t-1 · Pull Request #3310 · py-pdf/pypdf

j-t-1 · 2025-06-11T18:09:48Z

Issue #3306: PdfWriter.compress_identical_objects ignored remove_orphans. Correct for this. Also deprecate_with_replacement remove_orphans to remove_unreferenced.

Issue py-pdf#3306: PdfWriter.compress_identical_objects ignored remove_orphans. Correct for this. Also deprecate_with_replacement remove_orphans to remove_unreferenced.

codecov · 2025-06-11T18:18:09Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 97.43%. Comparing base (018a52e) to head (e684a13).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #3310   +/-   ##
=======================================
  Coverage   97.43%   97.43%           
=======================================
  Files          55       55           
  Lines       10005    10016   +11     
  Branches     1837     1841    +4     
=======================================
+ Hits         9748     9759   +11     
  Misses        149      149           
  Partials      108      108

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

j-t-1 · 2026-02-26T18:40:20Z

Fixed deprecate_with_replacement. Ready for code review.

stefan6419846 · 2026-03-23T07:30:17Z

Could you please elaborate if we have any test in place to properly check that the remove_orphans parameter is properly respected and where? #3691 has been opened recently and I want to avoid merging this first when your PR already covers all relevant aspects.

j-t-1 · 2026-03-23T09:00:26Z

Could you please elaborate if we have any test in place to properly check that the remove_orphans parameter is properly respected and where?

Lines 2578 to 2585 do this check.

stefan6419846 · 2026-03-24T09:25:10Z

Why is this test in a section which checks that warnings are emitted? As far as I can see, this always sets remove_orphans, which actually is the inverse of what I am asking for.

Additionally, from the code it is not directly obvious what is checked here due to some "arbitrary" length checks - could you please try to improve it to make it more obvious what happened? Maybe there are some unreferenced dummy objects we could check for explicitly?

j-t-1 · 2026-03-24T11:39:58Z

Why is this test in a section which checks that warnings are emitted? As far as I can see, this always sets remove_orphans, which actually is the inverse of what I am asking for.

It does the warning and then removes duplicates. Sorry I do not understand what you mean. Do you mean split it in to two, one just tests a warning and then one tests deduplication?

Additionally, from the code it is not directly obvious what is checked here due to some "arbitrary" length checks - could you please try to improve it to make it more obvious what happened? Maybe there are some unreferenced dummy objects we could check for explicitly?

This is a good idea. How about loop over writer._objects, take the id()of each and then count using a dict? Do this before and after and then ensure any values that were greater than one are now equal to one.

stefan6419846 · 2026-03-25T15:30:03Z

Do you mean split it in to two, one just tests a warning and then one tests deduplication?

Yes, because I usually tend to remove all warning code where possible when preparing a new major release, thus having this separated making it easier to maintain.

Regarding the proposed test, doing something like this should be sufficient in theory:

def test_remove_orphans():
    writer = PdfWriter(clone_from=RESOURCE_ROOT / "crazyones.pdf")
    writer._add_object(DictionaryObject({}))
    dictionary_object = DictionaryObject({NameObject("/Testing"): NameObject("/UniqueNameForTesting")})
    reference = writer._add_object(dictionary_object)

    writer.compress_identical_objects(remove_orphans=False)
    assert writer.get_object(reference) == dictionary_object

    writer.compress_identical_objects(remove_orphans=True)
    with pytest.raises(AssertionError):
        writer.get_object(reference)

The AssertionError might be replaced by a proper exception here - mypy is the wrong justification here.

j-t-1 · 2026-03-25T16:33:48Z

Is test_remove_orphans the test of the deprecation? If yes, it should have the warnings, if not it should use the renamed values.

Can you clarify the names of the test functions that test deprecation and test compression?

stefan6419846 · 2026-03-25T16:42:42Z

No, this would be the one for the new way. I just used the old names as this is what my local testing environment for quick verification has.

Co-authored-by: Stefan <96178532+stefan6419846@users.noreply.github.com>

@stefan6419846

## What's new ### Security (SEC) - Disallow custom XML entity declarations for XMP metadata (#3724) by @stefan6419846 ### New Features (ENH) - Skip MD5 key derivation for AES-256 encrypted PDFs (#3694) by @Ygnas ### Bug Fixes (BUG) - Use remove_orphans in compress_identical_objects (#3310) by @j-t-1 - Fix PdfReadError when xref table contains comments before trailer (#3710) by @rassie - Correctly verify AES padding during decryption (#3699) by @stefan6419846 - Fix stale object cache from non-authoritative object streams (#3698) by @astahlman - Fix extract_links pairing when annotations include non-links (#3687) by @ReinerBRO ### Documentation (DOC) - Add AI policy (#3717) by @stefan6419846 [Full Changelog](6.9.2...6.10.0)

j-t-1 added 4 commits June 10, 2025 18:39

BUG: Use remove_orphans in compress_identical_objects

1a5a7b6

Issue py-pdf#3306: PdfWriter.compress_identical_objects ignored remove_orphans. Correct for this. Also deprecate_with_replacement remove_orphans to remove_unreferenced.

Update _writer.py

77829b9

Update test_workflows.py

12a27b5

Update test_writer.py

ba65fd9

stefan6419846 reviewed Jun 11, 2025

View reviewed changes

Comment thread pypdf/_writer.py Outdated

stefan6419846 added the needs-change The PR/issue cannot be handled as issue and needs to be improved label Jul 31, 2025

j-t-1 added 6 commits February 17, 2026 16:26

Merge branch 'main' into compress

774c2cb

Merge branch 'main' into compress

d10cd8a

Add import

719a501

Fix erros

300d091

Use remove_unreferenced

15a6351

Fix tests

e507b42

j-t-1 added 3 commits February 28, 2026 10:10

Use different approach

203bd7f

Fix NameError

64b811e

Remove trailing whitespace

86acbb0

stefan6419846 reviewed Mar 2, 2026

View reviewed changes

Comment thread pypdf/_writer.py Outdated

stefan6419846 reviewed Mar 2, 2026

View reviewed changes

Comment thread pypdf/_writer.py Outdated

j-t-1 added 10 commits March 3, 2026 14:04

Increase coverage

0de2ae2

Fix call to deprecate_with_replacement

33dc9c1

Modify tests to use new arguments

a72c475

Use remove_duplicates

ea98758

Remove trailing whitespace

23d0399

Merge branch 'main' into compress

43dbf83

Fix errors

59035e6

Rename argument

ac19299

Merge branch 'main' into compress

2be736a

Merge branch 'main' into compress

c09fb58

stefan6419846 mentioned this pull request Mar 24, 2026

BUG: Fix compress_identical_objects ignoring remove_orphans parameter #3691

Closed

Incorporate code review

3c12070

Fix test

4f2816c

stefan6419846 reviewed Mar 26, 2026

View reviewed changes

Comment thread pypdf/_writer.py Outdated

stefan6419846 reviewed Mar 26, 2026

View reviewed changes

Comment thread pypdf/_writer.py Outdated

stefan6419846 reviewed Mar 26, 2026

View reviewed changes

Comment thread pypdf/_writer.py Outdated

fix docs

e164145

Co-authored-by: Stefan <96178532+stefan6419846@users.noreply.github.com>

stefan6419846 requested changes Mar 26, 2026

View reviewed changes

Comment thread tests/test_workflows.py Outdated

Comment thread tests/test_writer.py Outdated

Comment thread tests/test_writer.py Outdated

Comment thread tests/test_writer.py Outdated

Comment thread tests/test_writer.py Outdated

stefan6419846 removed the needs-change The PR/issue cannot be handled as issue and needs to be improved label Mar 26, 2026

j-t-1 and others added 4 commits March 26, 2026 13:14

Update tests/test_writer.py

7ebb533

Co-authored-by: Stefan <96178532+stefan6419846@users.noreply.github.com>

Update test_writer.py

8c6c32f

Update test_workflows.py

f93719b

Fix E501 Line too long

6039ac7

j-t-1 requested a review from stefan6419846 March 26, 2026 14:01

stefan6419846 requested changes Mar 26, 2026

View reviewed changes

Comment thread tests/test_writer.py Outdated

Update match string

98fe23b

j-t-1 requested a review from stefan6419846 March 26, 2026 15:48

stefan6419846 added 3 commits March 27, 2026 11:17

use proper exception

5c3249d

use proper exception

4ceda8a

Merge branch 'main' into compress

e684a13

stefan6419846 approved these changes Mar 27, 2026

View reviewed changes

stefan6419846 merged commit 6f10e02 into py-pdf:main Mar 27, 2026
24 of 32 checks passed

j-t-1 deleted the compress branch March 27, 2026 10:52

Conversation

j-t-1 commented Jun 11, 2025

Uh oh!

codecov bot commented Jun 11, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Uh oh!

j-t-1 commented Feb 26, 2026

Uh oh!

Uh oh!

Uh oh!

stefan6419846 commented Mar 23, 2026

Uh oh!

j-t-1 commented Mar 23, 2026

Uh oh!

stefan6419846 commented Mar 24, 2026

Uh oh!

j-t-1 commented Mar 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

stefan6419846 commented Mar 25, 2026

Uh oh!

j-t-1 commented Mar 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

stefan6419846 commented Mar 25, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

codecov bot commented Jun 11, 2025 •

edited

Loading

j-t-1 commented Mar 24, 2026 •

edited

Loading

j-t-1 commented Mar 25, 2026 •

edited

Loading