BUG: invalid cm/tm in visitor functions#2206
Conversation
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## main #2206 +/- ##
==========================================
+ Coverage 94.40% 94.44% +0.04%
==========================================
Files 43 43
Lines 7599 7621 +22
Branches 1501 1502 +1
==========================================
+ Hits 7174 7198 +24
+ Misses 262 261 -1
+ Partials 163 162 -1
☔ View full report in Codecov by Sentry. |
|
@troethe, |
| delta_y = m[5] - m_prev[5] | ||
| k = math.sqrt(abs(m[0] * m[3]) + abs(m[1] * m[2])) | ||
| f = font_size * k | ||
| cm_prev = m |
There was a problem hiding this comment.
This doesn't look quite right. I think this line can however be safely removed, because cm_prev doesn't get accessed until it is being assigned to again at the end of this function (i.e. this is currently a noop).
There was a problem hiding this comment.
agree.
Good point to have noticed it . Actually I've forgot to pass the multiplied cm_matrix to the visitor
pypdf/_page.py
Outdated
| 0.0, | ||
| 0.0, | ||
| ] # will store previous tm_matrix | ||
| ] # will store cm_matrix * tm_matrix |
There was a problem hiding this comment.
I think the old comment was more correct, right? This change was probably a side effect of the revert.
There was a problem hiding this comment.
Correct we store inhere only tm_matrix
|
@troethe @srogmann |
|
@pubpub-zz I also have no example of cm_matrix without tm_matrix at hand. PDF-reference, "5.3.3 Text Space Details", uses the product of cm_matrix and tm_matrix you mentioned. An argument to stay at cm_matrix instead of the product might be the performance if the product is computed but not used. In that case the publish of a mult()-function might be useful. In text-extraction-test 3 of tests/test_page.py I used visitor_operand_after to verify the Td-operation (see visitor_td). The return of the operator by visit_operand_before() can support patches or studies like the one in #1389 (comment). Can or should the operand-list be modified inside visit_operand_before() by using append(...), clear(...), ...? Otherwise it would be better to return the tuple (operator, operands). |
@MartinThoma |
|
I don't have an opinion on this topic. To be honest, I haven't read the specs so I need to trust you on those changes. If two of you agree that this PR is good (and CI is green), then I would merge :-) |
|
@MartinThoma / @srogmann |
|
Plus stdby |
I take this as "please standby", so I should not merge? |
Correct interpretation |
|
@MartinThoma |
Co-authored-by: Martin Thoma <info@martin-thoma.de>
Co-authored-by: Martin Thoma <info@martin-thoma.de>
|
The cm/tm parts of the visitor functions are really hard to get right. Thank you for taking care of it @pubpub-zz 🙏 |
## What's new ### Bug Fixes (BUG) - invalid cm/tm in visitor functions (#2206) by @pubpub-zz - Encrypt / decrypt Stream object dictionaries (#2228) by @pilotandy - Support nested color spaces for the /DeviceN color space (#2241) by @Stefan - images property fails if NullObject in list (#2215) by @pubpub-zz ### Robustness (ROB) - XYZ destination to cope with missing left and top param (#2237) by @pubpub-zz ### Documentation (DOC) - Add pilotandy for #2228 as a contributor by @MartinThoma - Link to CONTRIBUTING.md in README.md (#2232) by @markpfeifle - Changelog by @MartinThoma ### Developer Experience (DEV) - Exclude tests from mypy checks by @MartinThoma - Unify mypy options and warn redundant workarounds (#2223) by @exiledkingcc - Stabilize Pillow test with Pillow missing (#2229) by @Stefan ### Maintenance (MAINT) - Update pinned packages (#2243) by @MartinThoma ### Testing (TST) - Regression test against non-deterministic accidental object reuse (#2244) by @MartinThoma [Full Changelog](3.16.2...3.16.3)
closes #2200
closes #2075
reworks and is still valid to close #2059