
TST: Compare extracted images against ground truth#2072

Merged
MartinThoma merged 11 commits into main from more-image-tests
Aug 9, 2023

Conversation

@MartinThoma
Member

@MartinThoma MartinThoma commented Aug 8, 2023

This PR introduces a function image_similarity, which quantifies the visual similarity of two images via the Mean Squared Error (MSE). This way we can compare the extracted images with what we expect.
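For readers unfamiliar with the idea, here is a minimal sketch of an MSE-based similarity score. It is not the code from this PR: the names `mse_similarity` and `image_similarity_sketch`, the normalization to the range 0 to 1, the conversion to RGB, and the choice to return 0.0 for differently sized images are all assumptions made for illustration.

```python
from typing import Sequence


def mse_similarity(pixels_a: Sequence[int], pixels_b: Sequence[int],
                   max_value: int = 255) -> float:
    """Return 1 - MSE for two flat sequences of channel values.

    Hypothetical helper: each difference is scaled by max_value, so the
    result is 1.0 for identical data and 0.0 for maximally different data.
    """
    if len(pixels_a) != len(pixels_b):
        raise ValueError("pixel sequences must have the same length")
    if len(pixels_a) == 0:
        return 1.0  # two empty images are trivially identical
    squared_errors = (
        ((a - b) / max_value) ** 2 for a, b in zip(pixels_a, pixels_b)
    )
    return 1.0 - sum(squared_errors) / len(pixels_a)


def image_similarity_sketch(path_a: str, path_b: str) -> float:
    """Compare two image files by decoded pixel content (assumes Pillow)."""
    from PIL import Image  # imported lazily so the helper above needs no Pillow

    with Image.open(path_a) as img_a, Image.open(path_b) as img_b:
        if img_a.size != img_b.size:
            return 0.0  # assumption: different dimensions count as dissimilar
        # Convert both to RGB so the raw buffers have the same layout.
        data_a = img_a.convert("RGB").tobytes()
        data_b = img_b.convert("RGB").tobytes()
    # Iterating over bytes yields one int (0..255) per channel value.
    return mse_similarity(data_a, data_b)
```

A test could then assert something like `image_similarity_sketch(extracted, expected) > 0.99`, where the threshold is again only an illustrative value.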

We cannot make a byte-wise comparison as updates to PIL can change the representation.
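To illustrate why identical pixels do not imply identical bytes, here is a small sketch that uses `zlib` as a stand-in for an image encoder (PNG, for example, stores DEFLATE-compressed data). The pixel buffer and the compression levels are made up for the example; it does not reproduce any specific PIL change.

```python
import zlib

# Hypothetical raw pixel buffer: a 4x4 grayscale gradient, one byte per pixel.
raw_pixels = bytes(range(0, 256, 16))

# Encode the same pixels with two different encoder settings.
encoded_fast = zlib.compress(raw_pixels, 1)
encoded_best = zlib.compress(raw_pixels, 9)

# The encoded byte streams differ (the zlib header records the level) ...
bytes_equal = encoded_fast == encoded_best

# ... although both decode to exactly the same pixel data.
pixels_equal = zlib.decompress(encoded_fast) == zlib.decompress(encoded_best)

print(f"encoded bytes equal: {bytes_equal}")
print(f"decoded pixels equal: {pixels_equal}")
```

Comparing decoded pixel data instead of file bytes makes a test insensitive to such encoder differences.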

The new function helps us to ensure that updates to the pypdf code don't break image extraction.

@pubpub-zz
Collaborator

You should have a look at test_filters.py: I did some image comparison there using Pillow.

@pubpub-zz
Collaborator

You should use ImageOps: this avoids having to look at the "encoded" image.

@codecov

codecov bot commented Aug 8, 2023

Codecov Report

Patch and project coverage have no change.

Comparison is base (aad26dd) 94.23% compared to head (8a15677) 94.23%.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2072   +/-   ##
=======================================
  Coverage   94.23%   94.23%           
=======================================
  Files          41       41           
  Lines        7340     7340           
  Branches     1445     1445           
=======================================
  Hits         6917     6917           
  Misses        263      263           
  Partials      160      160           


@MartinThoma MartinThoma requested a review from pubpub-zz August 8, 2023 20:26
@MartinThoma
Member Author

@pubpub-zz Now it's ready for review :-)

I undid the moving of test code. I don't know yet where exactly this code should be (test_images.py vs test_filters.py vs test_page.py). Only when we have a rather clear and helpful rule where to put which test, we can move stuff around.

@pubpub-zz
Collaborator

> @pubpub-zz Now it's ready for review :-)
>
> I undid the moving of test code. I don't know yet where exactly this code should be (test_images.py vs test_filters.py vs test_page.py). Only when we have a rather clear and helpful rule where to put which test, we can move stuff around.

I personally don't care and have no recommendation. At that time there were already some tests in test_filters.py, which is why I continued in the same way.

@MartinThoma MartinThoma merged commit 82e8681 into main Aug 9, 2023
@MartinThoma MartinThoma deleted the more-image-tests branch August 9, 2023 11:20