TST: Compare extracted images against ground truth #2072
|
You should have a look at test_filters.py: I did some image comparison there using Pillow. |
|
You should use ImageOps: this avoids having to look at the "encoded" image.
Codecov Report
Patch and project coverage have no change.
Additional details and impacted files:
Coverage Diff
            main      #2072     +/-
Coverage    94.23%    94.23%
Files       41        41
Lines       7340      7340
Branches    1445      1445
Hits        6917      6917
Misses      263       263
Partials    160       160
|
|
@pubpub-zz Now it's ready for review :-) I undid the moving of test code. I don't know yet where exactly this code should be (test_images.py vs test_filters.py vs test_page.py). Only when we have a rather clear and helpful rule where to put which test, we can move stuff around. |
I personally don't care and have no recommendation. At that time there were already some tests in test_filters.py; that's why I continued in the same way. |
A function image_similarity was introduced, which quantifies the visual similarity of two images via Mean Squared Error (MSE). This way we can compare the extracted images with what we expect. We cannot make a byte-wise comparison, as updates to PIL can change the representation.
The new function helps us to ensure that updates to the pypdf code don't break image extraction.
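
To illustrate the idea, here is a minimal sketch of an MSE-based comparison using Pillow. It is not the code from this PR: the name mse_similarity, the conversion of both images to RGB, and the choice to return 1 - MSE (so that 1.0 means identical) are assumptions made for this example.

```python
from PIL import Image, ImageChops


def mse_similarity(img_a, img_b):
    """Hypothetical helper: return a similarity score in [0, 1], 1.0 = identical pixels."""
    # Images of different dimensions are treated as completely different.
    if img_a.size != img_b.size:
        return 0.0

    # Assumption: normalise both images to RGB so the channels line up.
    a = img_a.convert("RGB")
    b = img_b.convert("RGB")

    # Per-channel absolute difference of the decoded pixels (not the encoded bytes).
    diff = ImageChops.difference(a, b)
    raw = diff.tobytes()  # one byte (0..255) per channel value in RGB mode

    # Mean squared error over all channel values, scaled to the range 0..1.
    mse = sum((value / 255.0) ** 2 for value in raw) / len(raw)
    return 1.0 - mse


# Small in-memory images, so the example needs no files on disk.
red = Image.new("RGB", (4, 4), (255, 0, 0))
almost_red = Image.new("RGB", (4, 4), (250, 0, 0))
black = Image.new("RGB", (4, 4), (0, 0, 0))
wide = Image.new("RGB", (8, 4), (255, 0, 0))

same_score = mse_similarity(red, red)
close_score = mse_similarity(red, almost_red)
far_score = mse_similarity(red, black)
size_score = mse_similarity(red, wide)
```

In a test, the extracted image would be compared with the expected one against a threshold (for example, asserting that the score is above 0.99) instead of requiring byte-identical files.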