Comparing changes

base repository: py-pdf/pypdf
base: 1.27.8
head repository: py-pdf/pypdf
compare: 1.27.9
  • 19 commits
  • 28 files changed
  • 6 contributors

Commits on Apr 21, 2022

  1. DEV: Add Benchmark for Performance Testing (#781)

    We want to track performance over time only for what actually
    is in main.
    
    Closes #761
    MartinThoma authored Apr 21, 2022
    f0f1fa3

Commits on Apr 22, 2022

  1. 668869f

Commits on Apr 23, 2022

  1. ENH: Allow setting form field flags (#802)

    Closes #574
    Closes #801
    
    Co-authored-by: Craig Jones <craig@k6nnl.com>
    MartinThoma and polyglot-jones authored Apr 23, 2022
    ffb2084
  2. 40df4d7
  3. ROB: Handle recursion error (#804)

    This doesn't solve the issue, but it might make it less severe.
    
    See #520
    See #268
    See virantha/pypdfocr#59
    
    sfneal@3558a69
    
    Co-authored-by: danniesim <geemee@gmail.com>
    MartinThoma and danniesim authored Apr 23, 2022
    3d65938
  4. 9941099
  5. MAINT: Quadratic runtime while parsing reduced to linear (#808)

    When PdfFileReader tries to find the xref marker, the readNextEndLine method builds a so-called line by reading byte by byte. Every time a new byte is read, it is concatenated with the line read so far. Because Python strings (including byte strings) are immutable, each concatenation copies the line, which leads to quadratic runtime, O(n²), where n is the size of the file.
    For files where the xref marker cannot be found at the end, this takes an enormous amount of time:
    
    * 1 MB of zeros at the end: 45.54 seconds
    * 2 MB of zeros at the end: 357.04 seconds
    (measured on a laptop made in 2015)
    
    This pull request changes the relevant section of the code to run in linear time, O(n), leading to a runtime of less than a second for both cases mentioned above. Furthermore, this PR adds a regression test.
    dsk7 authored Apr 23, 2022
    c6c56f5
  6. d4c8cab
  7. BUG: Improve spacing for text extraction (#806)

    PyPDF2 now takes positive / negative spaces between text blocks into account. Not very elegant, but the result looks way better than before.
    MartinThoma authored Apr 23, 2022
    d1be80d
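
The quadratic-versus-linear difference described in #808 above can be illustrated with a small sketch. This is not the PyPDF2 code: the function names `read_last_line_quadratic` and `read_last_line_linear` are hypothetical, and the sketch only assumes a seekable binary stream that is read backwards one byte at a time until an end-of-line byte is found.

```python
import io

EOL = (b"\n", b"\r")


def read_last_line_quadratic(stream):
    """Prepend each byte to an immutable bytes object.

    Every prepend copies the line built so far, so the total work
    grows quadratically with the length of the line.
    """
    line = b""
    stream.seek(0, io.SEEK_END)
    pos = stream.tell()
    while pos > 0:
        pos -= 1
        stream.seek(pos)
        byte = stream.read(1)
        if byte in EOL:
            break
        line = byte + line  # copies the whole line on every iteration
    return line


def read_last_line_linear(stream):
    """Collect the bytes in a list and join them once at the end."""
    parts = []
    stream.seek(0, io.SEEK_END)
    pos = stream.tell()
    while pos > 0:
        pos -= 1
        stream.seek(pos)
        byte = stream.read(1)
        if byte in EOL:
            break
        parts.append(byte)  # amortized constant time per byte
    parts.reverse()  # bytes were collected back to front
    return b"".join(parts)
```

Both functions return the same bytes. Only the way the line is accumulated differs, which is where a long run of trailing zeros with no line break becomes expensive in the first variant.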

Commits on Apr 24, 2022

  1. b3247e8
  2. 7541047
  3. DOC: CMaps (#811)

    MartinThoma authored Apr 24, 2022
    6729b80
  4. 07848e5
  5. f48b4ac
  6. ROB: Use null ID when encrypted but no ID given (#812)

    If no '/ID' key is present in self.trailer, an array of two empty byte strings is used in its place. This is how Apache PDFBox handles this case.
    
    This makes PyPDF2 more robust to malformed PDFs.
    
    Closes #608
    Closes #610
    
    Full credit for this one to Richard Millson - Martin Thoma only fixed a merge conflict
    
    Co-authored-by: Richard Millson <8217613+richardmillson@users.noreply.github.com>
    MartinThoma and richardmillson authored Apr 24, 2022
    663ca98
  7. BUG: TypeError in xmp._converter_date (#813)

    Fix: Convert decimal to int before passing it to datetime
    
    Closes #774
    MartinThoma authored Apr 24, 2022
    63b4c91
  8. 5bc7219
  9. e673a6e
  10. REL: 1.27.9

    A change I would like to highlight is the performance improvement for
    large PDF files (#808) 🎉
    
    New Features (ENH):
    -  Add papersizes (#800)
    -  Allow setting permission flags when encrypting (#803)
    -  Allow setting form field flags (#802)
    
    Bug Fixes (BUG):
    -  TypeError in xmp._converter_date (#813)
    -  Improve spacing for text extraction (#806)
    -  Fix PDFDocEncoding Character Set (#809)
    
    Robustness (ROB):
    -  Use null ID when encrypted but no ID given (#812)
    -  Handle recursion error (#804)
    
    Documentation (DOC):
    -  CMaps (#811)
    -  The PDF Format + commit prefixes (#810)
    -  Add compression example (#792)
    
    Developer Experience (DEV):
    -  Add Benchmark for Performance Testing (#781)
    
    Maintenance (MAINT):
    -  Validate PDF magic byte in strict mode (#814)
    -  Make PdfFileMerger.addBookmark() behave like PdfFileWriter's (#339)
    -  Quadratic runtime while parsing reduced to linear (#808)
    
    Testing (TST):
    -  Newlines in text extraction (#807)
    
    Full Changelog: 1.27.8...1.27.9
    MartinThoma committed Apr 24, 2022
    22033d7
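
The fallback described in #812 above (an array of two empty byte strings when the trailer has no '/ID' key) might look roughly like the following. This is a minimal sketch, not the PyPDF2 implementation: `get_document_id` is a hypothetical helper, and the trailer is modeled as a plain dict for illustration.

```python
from typing import Mapping, Sequence, Tuple


def get_document_id(trailer: Mapping[str, Sequence[bytes]]) -> Tuple[bytes, bytes]:
    """Return the two-part document ID, or two empty byte strings if absent.

    `trailer` stands in for the PDF trailer dictionary (an assumption made
    for this sketch; the real object is not a plain dict).
    """
    id_entry = trailer.get("/ID")
    if id_entry is None:
        # Malformed but encrypted PDF: fall back to a "null" ID.
        return (b"", b"")
    return (bytes(id_entry[0]), bytes(id_entry[1]))
```

With this shape, code that derives the encryption key from the first ID element can proceed on files that omit '/ID' instead of failing on a missing key.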
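
The fix named in #813 above (convert decimal to int before passing it to datetime) can be sketched as follows. The helper `build_datetime` and its parameters are hypothetical and do not mirror xmp._converter_date; the sketch only assumes that the seconds field arrives as a decimal string with an optional fractional part.

```python
import datetime
from decimal import Decimal


def build_datetime(year, month, day, hour, minute, second_text):
    """Build a datetime from a seconds value given as a decimal string."""
    seconds = Decimal(second_text)
    # datetime.datetime expects integers, so split the Decimal into
    # whole seconds and microseconds and convert both explicitly.
    whole_seconds = int(seconds)
    microseconds = int((seconds - whole_seconds) * 1_000_000)
    return datetime.datetime(
        year, month, day, hour, minute, whole_seconds, microseconds
    )
```

For example, `build_datetime(2022, 4, 24, 12, 30, "15.5")` yields a datetime with 15 seconds and 500000 microseconds.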