
PERF: Use bytearray instead of b"" in encode_pdfdocencoding#2325

Merged
MartinThoma merged 1 commit into py-pdf:main from zuypt:patch-1 on Dec 4, 2023

Conversation

@zuypt
Contributor

@zuypt zuypt commented Dec 4, 2023

Since b"" is immutable, every += in the for loop makes Python allocate a new bytes object and copy everything accumulated so far, then deallocate the old one. With very large strings this repeated allocation and copying causes hangs or very long runtimes, for example when using add_js to add a very large piece of JavaScript code.

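To illustrate the pattern this PR changes, here is a minimal sketch of a per-character encoder written both ways. It is not pypdf's actual encode_pdfdocencoding: HYPOTHETICAL_TABLE is an assumed Latin-1-style stand-in (code point i maps to byte i for i < 256), not the real PDFDocEncoding table, and unmapped characters simply raise KeyError here.

```python
from typing import Dict

# Hypothetical stand-in for a character -> byte lookup table.
# Maps code points 0-255 to the same byte value (Latin-1 style);
# the real PDFDocEncoding table differs in several ranges.
HYPOTHETICAL_TABLE: Dict[str, int] = {chr(i): i for i in range(256)}


def encode_with_bytes(text: str) -> bytes:
    """Slow pattern: bytes is immutable, so every += allocates a new
    object and copies everything accumulated so far."""
    result = b""
    for ch in text:
        result += bytes([HYPOTHETICAL_TABLE[ch]])
    return result


def encode_with_bytearray(text: str) -> bytes:
    """Fast pattern: bytearray is mutable and over-allocates, so each
    append is amortized O(1) and the whole loop is linear."""
    result = bytearray()
    for ch in text:
        result.append(HYPOTHETICAL_TABLE[ch])
    return bytes(result)


if __name__ == "__main__":
    sample = "var x = 1; // café"
    # Both versions produce identical output; only the cost differs.
    assert encode_with_bytes(sample) == encode_with_bytearray(sample)
    print(encode_with_bytearray(sample))
```

Converting back with bytes(result) at the end keeps the return type the same as the b"" version, so callers are unaffected by the switch.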
@codecov

codecov bot commented Dec 4, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (40e25ec) 94.37% compared to head (e3ec6cc) 94.37%.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2325   +/-   ##
=======================================
  Coverage   94.37%   94.37%           
=======================================
  Files          43       43           
  Lines        7660     7660           
  Branches     1515     1515           
=======================================
  Hits         7229     7229           
  Misses        267      267           
  Partials      164      164           


@stefan6419846
Collaborator

Could you please update the title to use the recommended naming scheme? https://pypdf.readthedocs.io/en/latest/dev/intro.html#commit-messages

@MartinThoma MartinThoma changed the title "Update _base.py" to "PERF: Update _base.py" on Dec 4, 2023
@MartinThoma MartinThoma changed the title "PERF: Update _base.py" to "PERF: Use bytearray instead of b"" in encode_pdfdocencoding" on Dec 4, 2023
@MartinThoma
Member

@zuypt Do you have an example that shows the difference? (It could be a toy-example - I'm just curious :-) )

@zuypt
Contributor Author

zuypt commented Dec 4, 2023

> Could you please update the title to use the recommended naming scheme? https://pypdf.readthedocs.io/en/latest/dev/intro.html#commit-messages

I'm too lazy; if someone has permission, please help.

@zuypt
Contributor Author

zuypt commented Dec 4, 2023

> @zuypt Do you have an example that shows the difference? (It could be a toy-example - I'm just curious :-) )

Just create a PdfWriter, then call add_js with a very large string and you will see. This is a pretty common Python programming error.
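Since this is a general Python pitfall rather than something specific to pypdf, it may help to see the two other common linear-time idioms besides bytearray: collect chunks in a list and join once, or write into an io.BytesIO buffer. This is a minimal sketch; the function names are hypothetical and not part of pypdf.

```python
import io
from typing import Iterable


def build_with_join(chunks: Iterable[bytes]) -> bytes:
    # Collect the pieces in a list (mirroring the original loop shape)
    # and concatenate once at the end, so the result is built in one pass.
    parts = []
    for chunk in chunks:
        parts.append(chunk)
    return b"".join(parts)


def build_with_bytesio(chunks: Iterable[bytes]) -> bytes:
    # io.BytesIO is a growable in-memory buffer; write() appends in place.
    buffer = io.BytesIO()
    for chunk in chunks:
        buffer.write(chunk)
    return buffer.getvalue()


if __name__ == "__main__":
    data = [b"a"] * 100_000
    assert build_with_join(data) == build_with_bytesio(data) == b"a" * 100_000
```

Either approach avoids recopying the accumulated result on every iteration, which is what makes the b"" += loop slow.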

@MartinThoma
Member

I've already adjusted the title

@MartinThoma
Member

MartinThoma commented Dec 4, 2023

import timeit

def benchmark_empty_bytes_literal():
    # bytes is immutable: each += allocates a new object and copies the old contents
    result = b""
    for _ in range(100000):
        result += b"a"

def benchmark_bytearray():
    # bytearray is mutable: += extends the existing buffer in place
    result = bytearray()
    for _ in range(100000):
        result += b"a"

if __name__ == "__main__":
    # Run each function 100 times; timeit returns the total time in seconds
    empty_bytes_literal_time = timeit.timeit(benchmark_empty_bytes_literal, number=100)
    bytearray_time = timeit.timeit(benchmark_bytearray, number=100)

    print(f"Empty Bytes Literal Time: {empty_bytes_literal_time:.1f}")
    print(f"bytearray Time: {bytearray_time:.1f}")

shows (total seconds for 100 runs of each function):

Empty Bytes Literal Time: 21.4
bytearray Time: 0.5
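The benchmark above measures a single input size (100,000 appends, repeated 100 times). A small variation can show how each approach scales as the input doubles. This is a sketch that uses time.perf_counter instead of timeit, with arbitrarily chosen sizes. On CPython you would expect the b"" column to grow roughly quadratically and the bytearray column roughly linearly, though the exact numbers depend on the machine.

```python
import time
from typing import Callable


def append_to_bytes(n: int) -> bytes:
    # Immutable bytes: each += creates a new object and copies the old one.
    result = b""
    for _ in range(n):
        result += b"a"
    return result


def append_to_bytearray(n: int) -> bytes:
    # Mutable bytearray: += extends the buffer in place.
    result = bytearray()
    for _ in range(n):
        result += b"a"
    return bytes(result)


def elapsed(fn: Callable[[int], bytes], n: int) -> float:
    # Wall-clock seconds for a single call of fn(n).
    start = time.perf_counter()
    fn(n)
    return time.perf_counter() - start


if __name__ == "__main__":
    for n in (25_000, 50_000, 100_000, 200_000):
        print(f"n={n:>7}  bytes: {elapsed(append_to_bytes, n):.3f}s  "
              f"bytearray: {elapsed(append_to_bytearray, n):.3f}s")
```

If each doubling of n roughly quadruples the b"" time but only doubles the bytearray time, that is the quadratic-versus-linear behaviour described in the PR.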

@MartinThoma MartinThoma merged commit 6cb5343 into py-pdf:main Dec 4, 2023
@MartinThoma
Member

@zuypt Thanks for your contribution! If you want, I can add you to https://pypdf.readthedocs.io/en/latest/meta/CONTRIBUTORS.html

@MartinThoma
Member

It will be part of the next release on Sunday.

@zuypt
Contributor Author

zuypt commented Dec 6, 2023

> @zuypt Thanks for your contribution! If you want, I can add you to https://pypdf.readthedocs.io/en/latest/meta/CONTRIBUTORS.html

Sure, thanks for the recognition!

MartinThoma added a commit that referenced this pull request Dec 10, 2023
## What's new

### Bug Fixes (BUG)
-  Cope with deflated images with CMYK Black Only (#2322) by @pubpub-zz
-  Handle indirect objects as parameters for CCITTFaxDecode (#2307) by @stefan6419846
-  check words length in _cmap type1_alternative function (#2310) by @Takher

### Robustness (ROB)
-  Relax flate decoding for too many lookup values (#2331) by @stefan6419846
-  Let _build_destination skip in case of missing /D key (#2018) by @nickryand

### Documentation (DOC)
-  Note in reading form data (#2338) by @MartinThoma
-  Pull Request prefixes and size by @MartinThoma
-  Add https://github.com/zuypt for #2325 as a contributor by @MartinThoma
-  Fix docstring for RunLengthDecode.decode (#2302) by @stefan6419846

### Maintenance (MAINT)
-  Enable `disallow_any_generics` and add missing generics (#2278) by @nilehmann

### Testing (TST)
-  Centralize file downloads (#2324) by @MartinThoma

### Code Style (STY)
-  Fix typo "steam" → "stream" (#2327) by @stefan6419846
-  Run black by @MartinThoma
-  Make Traceback in bug report template uppercase (#2304) by @stefan6419846

[Full Changelog](3.17.1...3.17.2)