Comparing changes
base repository: py-pdf/pypdf
base: 1.27.8
head repository: py-pdf/pypdf
compare: 1.27.9
- 19 commits
- 28 files changed
- 6 contributors
Commits on Apr 21, 2022
DEV: Add Benchmark for Performance Testing (#781)
We want to track performance over time only for what actually is in main. Closes #761
Commit f0f1fa3
Commits on Apr 22, 2022
Commit 668869f
Commits on Apr 23, 2022
Commit ffb2084
Commit 40df4d7
ROB: Handle recursion error (#804)
This doesn't solve the issue, but it might make it less severe.
See #520
See #268
See virantha/pypdfocr#59
sfneal@3558a69
Co-authored-by: danniesim <geemee@gmail.com>
Commit 3d65938
Commit 9941099
MAINT: Quadratic runtime while parsing reduced to linear (#808)
When PdfFileReader tries to find the xref marker, the readNextEndLine method builds a so-called line by reading byte by byte. Every time a new byte is read, it is concatenated with the line read so far. Because Python strings (including byte strings) are immutable and must be copied on every concatenation, this leads to quadratic O(n²) runtime, where n is the size of the file. For files where the xref marker cannot be found at the end, this takes an enormous amount of time:
* 1 MB of zeros at the end: 45.54 seconds
* 2 MB of zeros at the end: 357.04 seconds
(measured on a laptop made in 2015)
This pull request changes the relevant section of the code to run in linear time O(n), leading to a run time of less than a second for both cases mentioned above. Furthermore, this PR adds a regression test.
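A small sketch of the pattern this commit message describes, not the code changed in #808. The function names are hypothetical, and an in-memory `io.BytesIO` stands in for the PDF stream. The first function copies the whole line on every byte read; the second collects the bytes in a list and joins them once.

```python
import io


def read_last_line_quadratic(stream: io.BytesIO) -> bytes:
    """Hypothetical helper: scan backwards, prepending one byte at a time.

    Each `byte + line` builds a new bytes object, so the total work grows
    quadratically with the length of the line.
    """
    line = b""
    stream.seek(0, io.SEEK_END)
    pos = stream.tell()
    while pos > 0:
        pos -= 1
        stream.seek(pos)
        byte = stream.read(1)
        if byte in (b"\r", b"\n"):
            break
        line = byte + line  # copies everything read so far
    return line


def read_last_line_linear(stream: io.BytesIO) -> bytes:
    """Hypothetical helper: same scan, but bytes are joined only once."""
    parts = []
    stream.seek(0, io.SEEK_END)
    pos = stream.tell()
    while pos > 0:
        pos -= 1
        stream.seek(pos)
        byte = stream.read(1)
        if byte in (b"\r", b"\n"):
            break
        parts.append(byte)  # amortized constant time per byte
    parts.reverse()  # bytes were collected back to front
    return b"".join(parts)
```

Both helpers return the same bytes; only the cost of building the result differs, which is why the slowdown shows up mainly on files with a long run of bytes and no line break near the end.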
Commit c6c56f5
Commit d4c8cab
BUG: Improve spacing for text extraction (#806)
PyPDF2 now takes positive / negative spaces between text blocks into account. Not very elegant, but the result looks way better than before.
Commit d1be80d
Commits on Apr 24, 2022
Commit b3247e8
Commit 7541047
Commit 6729b80
MAINT: Make PdfFileMerger.addBookmark() behave like PdfFileWriter's (#339)
People stumbled over this inconsistency:
* #40
* https://stackoverflow.com/a/42991101/562769
This was also tested with: https://stackoverflow.com/questions/42941742/pypdf2-nested-bookmarks-with-same-name-not-working/42991101#comment73249244_42991101
Commit 07848e5
Commit f48b4ac
ROB: Use null ID when encrypted but no ID given (#812)
If no '/ID' key is present in self.trailer, an array of two empty bytestrings is used in place of an '/ID'. This is how Apache PDFBox handles this case. This makes PyPDF2 more robust to malformed PDFs.
Closes #608
Closes #610
Full credit for this one to Richard Millson - Martin Thoma only fixed a merge conflict
Co-authored-by: Richard Millson <8217613+richardmillson@users.noreply.github.com>
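The fallback described above can be sketched roughly as follows. This is an illustration only, not the code from #812: `get_document_id` is a hypothetical name, and a plain `dict` stands in for the reader's trailer object.

```python
def get_document_id(trailer: dict) -> list:
    """Hypothetical helper: return the trailer's /ID array, or a null ID.

    When the key is missing, two empty byte strings are used instead,
    as the commit message describes.
    """
    if "/ID" in trailer:
        return trailer["/ID"]
    return [b"", b""]  # null ID for malformed files without /ID


# Usage: a trailer without /ID no longer causes a lookup failure.
print(get_document_id({"/Encrypt": "..."}))
```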
Commit 663ca98
BUG: TypeError in xmp._converter_date (#813)
Fix: Convert decimal to int before passing it to datetime Closes #774
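A minimal sketch of the kind of conversion this fix refers to, assuming the seconds value arrives as a `decimal.Decimal`. The function name `build_datetime` is hypothetical and this is not the actual `xmp._converter_date` code; `datetime.datetime` expects integer arguments, so the Decimal is split into whole seconds and microseconds first.

```python
import datetime
from decimal import Decimal


def build_datetime(year: int, month: int, day: int,
                   hour: int, minute: int, seconds: Decimal) -> datetime.datetime:
    """Hypothetical helper: convert Decimal seconds to ints for datetime."""
    whole_seconds = int(seconds)  # truncate to whole seconds
    # The fractional remainder becomes integer microseconds.
    microseconds = int((seconds - whole_seconds) * 1_000_000)
    return datetime.datetime(year, month, day, hour, minute,
                             whole_seconds, microseconds)
```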
Commit 63b4c91
Commit 5bc7219
Commit e673a6e
A change I would like to highlight is the performance improvement for large PDF files (#808) 🎉

New Features (ENH):
- Add papersizes (#800)
- Allow setting permission flags when encrypting (#803)
- Allow setting form field flags (#802)

Bug Fixes (BUG):
- TypeError in xmp._converter_date (#813)
- Improve spacing for text extraction (#806)
- Fix PDFDocEncoding Character Set (#809)

Robustness (ROB):
- Use null ID when encrypted but no ID given (#812)
- Handle recursion error (#804)

Documentation (DOC):
- CMaps (#811)
- The PDF Format + commit prefixes (#810)
- Add compression example (#792)

Developer Experience (DEV):
- Add Benchmark for Performance Testing (#781)

Maintenance (MAINT):
- Validate PDF magic byte in strict mode (#814)
- Make PdfFileMerger.addBookmark() behave like PdfFileWriter's (#339)
- Quadratic runtime while parsing reduced to linear (#808)

Testing (TST):
- Newlines in text extraction (#807)

Full Changelog: 1.27.8...1.27.9
Commit 22033d7