BUG: Use remove_orphans in compress_identical_objects #3310
stefan6419846 merged 34 commits into py-pdf:main
Fixes issue py-pdf#3306: PdfWriter.compress_identical_objects ignored its remove_orphans argument. This PR corrects that and also uses deprecate_with_replacement to deprecate remove_orphans in favor of remove_unreferenced.
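The fix described above can be sketched as follows. This is a minimal, self-contained illustration, not pypdf's actual code. The stand-in compress_identical_objects function, its dict return value, and the warning text are assumptions, and the standard-library warnings module takes the place of pypdf's deprecate_with_replacement helper. It shows the two parts of the change: the old keyword is forwarded instead of ignored, and it emits a DeprecationWarning that names the replacement.

```python
import warnings
from typing import Optional


def compress_identical_objects(
    remove_identicals: bool = True,
    remove_unreferenced: bool = True,
    remove_orphans: Optional[bool] = None,  # deprecated alias, kept for compatibility
) -> dict:
    """Hypothetical stand-in that reports which options take effect."""
    if remove_orphans is not None:
        # Tell callers about the new name (stacklevel=2 points at their call site).
        warnings.warn(
            "remove_orphans is deprecated, use remove_unreferenced instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        # Honor the old argument instead of silently dropping it.
        remove_unreferenced = remove_orphans
    return {
        "remove_identicals": remove_identicals,
        "remove_unreferenced": remove_unreferenced,
    }
```

Calling compress_identical_objects(remove_orphans=False) returns {'remove_identicals': True, 'remove_unreferenced': False} and emits one DeprecationWarning. In this sketch the deprecated keyword wins if a caller passes both names.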
Codecov Report: ✅ All modified and coverable lines are covered by tests.

| Coverage Diff | main | #3310 | +/- |
|---|---|---|---|
| Coverage | 97.43% | 97.43% | |
| Files | 55 | 55 | |
| Lines | 10005 | 10016 | +11 |
| Branches | 1837 | 1841 | +4 |
| Hits | 9748 | 9759 | +11 |
| Misses | 149 | 149 | |
| Partials | 108 | 108 | |
Fixed.
Could you please elaborate on whether we have any test in place to properly check that the

Lines 2578 to 2585 do this check.
Why is this test in a section that checks that warnings are emitted? As far as I can see, this always sets … Additionally, because of some "arbitrary" length checks, it is not directly obvious from the code what is being checked here. Could you please try to make it more obvious what happened? Maybe there are some unreferenced dummy objects we could check for explicitly?
It emits the warning and then removes duplicates. Sorry, I do not understand what you mean. Do you mean splitting it into two tests, one that only checks the warning and one that checks deduplication?
This is a good idea. How about looping over
Yes. When preparing a new major release, I usually remove as much warning-related code as possible, so keeping these tests separate makes them easier to maintain. Regarding the proposed test, something like this should be sufficient in theory:

```python
from pathlib import Path

import pytest

from pypdf import PdfWriter
from pypdf.generic import DictionaryObject, NameObject

RESOURCE_ROOT = Path(__file__).parent.parent / "resources"  # assumes the repository's tests/ layout


def test_remove_orphans():
    writer = PdfWriter(clone_from=RESOURCE_ROOT / "crazyones.pdf")
    writer._add_object(DictionaryObject({}))  # unreferenced dummy object
    dictionary_object = DictionaryObject({NameObject("/Testing"): NameObject("/UniqueNameForTesting")})
    reference = writer._add_object(dictionary_object)  # also unreferenced, but unique
    writer.compress_identical_objects(remove_orphans=False)
    assert writer.get_object(reference) == dictionary_object  # kept when removal is off
    writer.compress_identical_objects(remove_orphans=True)
    with pytest.raises(AssertionError):  # removed: the object's slot is now empty
        writer.get_object(reference)
```
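The test hinges on what counts as unreferenced. As a simplified model, assuming each object is reduced to the list of object numbers it references and that reachability starts from a set of root objects (for a PDF, the entries reachable from the trailer), an object is unreferenced when no chain of references leads to it from a root. The helpers find_unreferenced and remove_unreferenced below are hypothetical names, not pypdf functions. Blanking a slot with None instead of deleting it keeps the remaining object numbers stable, which is consistent with the test expecting get_object to fail for the removed reference.

```python
from typing import Dict, List, Optional, Set

# Toy object table: object number -> object numbers it references (None = removed slot).
ObjectTable = Dict[int, Optional[List[int]]]


def find_unreferenced(objects: ObjectTable, roots: List[int]) -> Set[int]:
    """Return live object numbers not reachable from `roots` (iterative depth-first search)."""
    reachable: Set[int] = set()
    stack = list(roots)
    while stack:
        num = stack.pop()
        # Skip already-visited objects, removed slots, and dangling references.
        if num in reachable or objects.get(num) is None:
            continue
        reachable.add(num)
        stack.extend(objects[num])
    return {num for num, refs in objects.items() if refs is not None and num not in reachable}


def remove_unreferenced(objects: ObjectTable, roots: List[int]) -> None:
    """Blank out unreachable slots in place so object numbers do not shift."""
    for num in find_unreferenced(objects, roots):
        objects[num] = None
```

For example, with {1: [2], 2: [3], 3: [], 4: [], 5: [4], 6: [7], 7: [6]} and roots [1], objects 4 through 7 are reported: 5 still points at 4, and 6 and 7 point at each other, but none of them is reachable from object 1. A plain "does anything point at me" check would miss 4, 6 and 7.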
Is … Can you clarify the names of the test functions that test the deprecation and the compression?
No, this would be the one for the new way. I just used the old names because that is what my local environment for quick verification has.
Co-authored-by: Stefan <96178532+stefan6419846@users.noreply.github.com>
## What's new

### Security (SEC)
- Disallow custom XML entity declarations for XMP metadata (#3724) by @stefan6419846

### New Features (ENH)
- Skip MD5 key derivation for AES-256 encrypted PDFs (#3694) by @Ygnas

### Bug Fixes (BUG)
- Use remove_orphans in compress_identical_objects (#3310) by @j-t-1
- Fix PdfReadError when xref table contains comments before trailer (#3710) by @rassie
- Correctly verify AES padding during decryption (#3699) by @stefan6419846
- Fix stale object cache from non-authoritative object streams (#3698) by @astahlman
- Fix extract_links pairing when annotations include non-links (#3687) by @ReinerBRO

### Documentation (DOC)
- Add AI policy (#3717) by @stefan6419846

[Full Changelog](6.9.2...6.10.0)