BUG: Format floats using their intrinsic decimal precision #1267
MartinThoma merged 3 commits into py-pdf:main
Codecov Report
Base: 94.63% // Head: 94.63% // Increases project coverage by

Coverage diff, main vs. #1267:

              main     #1267    +/-
Coverage     94.63%   94.63%
Files            30       30
Lines          5140     5141     +1
Branches       1058     1058
Hits           4864     4865     +1
Misses          164      164
Partials        112      112
…ng to 5 decimal places Explicitly format floats in outline color test so they can be compared
rather than adding a precision property to FloatObject
Rebased this PR so tests are passing, and believe all the changes requested in the last review have been addressed. Could you take another look, @MasterOdin?
MartinThoma
left a comment
It looks good from my side - thank you for writing a unit test 🤗

@MasterOdin You were a lot more involved in this PR than I was. What do you think?

Looks good. Thanks for the work here @programmarchy! 👍

Thank you both for all the work you put in it 🙏 I've just merged the PR and I will release it today to PyPI :-)

@programmarchy If you want, I can add you to https://pypdf2.readthedocs.io/en/latest/meta/CONTRIBUTORS.html :-)
New Features (ENH):
- Add rotation property and transfer_rotate_to_content (#1348)

Performance Improvements (PI):
- Avoid string concatenation with large embedded base64-encoded images (#1350)

Bug Fixes (BUG):
- Format floats using their intrinsic decimal precision (#1267)

Robustness (ROB):
- Fix merge_page for pages without resources (#1349)

Full Changelog: 2.10.8...2.10.9
That would be very cool, thank you @MartinThoma!

When you use
Interesting, would you mind sharing how you came to find out 20 digits is the tipping point for Acrobat, @mrknwk? One way I was thinking of to make this configurable would be to adopt context vars, as implemented in decimal.Context for example. The context provides sane defaults with a central point for changing behavior. It would allow us to write something like:

    # Proposed API sketch
    import PyPDF2
    from PyPDF2 import PdfReader, PdfWriter
    from PyPDF2.context import Context, StripExtraTrailingZeros, QuantizeInteger

    ctx = Context()
    ctx.max_prec = 5  # specify maximum precision
    ctx.flags = [
        StripExtraTrailingZeros,
        QuantizeInteger,
    ]  # could also specify additional format flags
    PyPDF2.setcontext(ctx)

    reader = PdfReader("./path/to/file.pdf")
    reader.pages[0].scale_by(0.5)
    writer = PdfWriter()
    writer.add_page(reader.pages[0])
    ...

Or like this:

    with PyPDF2.localcontext() as ctx:
        ctx.max_prec = 5  # specify maximum precision
        ctx.flags = [
            StripExtraTrailingZeros,
            QuantizeInteger,
        ]  # could also specify additional format flags
        ...

Or maybe this:

    PyPDF2.setcontext(AdobeAcrobatContext)

Might make sense to open a separate issue to discuss further.
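
For readers unfamiliar with the pattern referenced in this comment, here is a minimal sketch of how the standard library's `decimal` module scopes settings with a context. It only uses `decimal` itself; nothing here is a PyPDF2 API, and the variable names are illustrative.

```python
from decimal import Decimal, getcontext, localcontext

# Remember the precision of the current (thread-local) context.
prec_before = getcontext().prec

# Division is rounded to the precision of the active context.
one_third_default = Decimal(1) / Decimal(3)

# localcontext() copies the active context and activates the copy;
# changes made to it are undone when the block exits.
with localcontext() as ctx:
    ctx.prec = 5  # five significant digits inside this block only
    one_third_local = Decimal(1) / Decimal(3)

# Outside the block the previous precision applies again.
one_third_after = Decimal(1) / Decimal(3)
prec_after = getcontext().prec

print(one_third_local)
print(one_third_after == one_third_default)
```

The appeal of this design is that callers who never touch the context get the defaults, while callers with special needs can change behavior in one place, either globally or for a limited scope.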

@programmarchy It really was just trial and error. 😊 But 20 digits is also the limit that one of the maintainers of PDF Arranger found in a test. He contacted Adobe about it and apparently it is an Acrobat "implementation level limitation". So maybe the third option would be a nice way to go then.

@mrknwk I'd be happy to take a stab at implementing the above. Could you please create a GitHub issue with a corresponding sample PDF, and tag me?

Since FloatObject is represented as a decimal, format numbers using their intrinsic precision, instead of reducing the precision to 5 decimal places.

This fixes rendering issues for PDFs that contain coordinates, transformations, etc. with real numbers containing more than 5 decimal places of precision. For example, PDFs exported from Microsoft PowerPoint contain numbers with up to 11 decimal places.

Fixes: #1266
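
To illustrate the difference the description refers to, the following is a minimal sketch using only Python's `decimal` module. It is not the code from this PR; `format_fixed` and `format_intrinsic` are hypothetical helper names chosen for the example.

```python
from decimal import Decimal


def format_fixed(value: Decimal, places: int = 5) -> str:
    """Round to a fixed number of decimal places (lossy for long fractions)."""
    return f"{value:.{places}f}"


def format_intrinsic(value: Decimal) -> str:
    """Keep every digit the Decimal carries, in plain (non-exponent) notation."""
    return format(value, "f")


# A coordinate with 11 decimal places, like the PowerPoint case described above.
x = Decimal("0.12345678901")

print(format_fixed(x))      # 0.12346
print(format_intrinsic(x))  # 0.12345678901
```

With the fixed-width format, the last six digits are rounded away, which is enough to shift coordinates or transformation matrices when values are small or are multiplied together.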