ENH: Accelerate image list keys generation by pubpub-zz · Pull Request #2014 · py-pdf/pypdf

pubpub-zz · 2023-07-25T18:28:50Z

closes #1987


pubpub-zz · 2023-07-25T18:31:26Z

@MartinThoma
I've got 2 mypy errors I do not understand. Can you have a look, please? 😘

codecov · 2023-07-26T18:56:34Z

Codecov Report

Patch coverage: 100.00% and project coverage change: -0.02% ⚠️

Comparison is base (890c93a) 94.03% compared to head (88c8bb2) 94.01%.
Report is 6 commits behind head on main.

❗ Current head 88c8bb2 differs from pull request most recent head c756267. Consider uploading reports for the commit c756267 to get more accurate results.

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main    #2014      +/-   ##
==========================================
- Coverage   94.03%   94.01%   -0.02%     
==========================================
  Files          33       33              
  Lines        7076     7090      +14     
  Branches     1413     1418       +5     
==========================================
+ Hits         6654     6666      +12     
- Misses        263      264       +1     
- Partials      159      160       +1
```

| Files Changed | Coverage Δ | |
|---|---|---|
| pypdf/_page.py | `93.61% <100.00%> (-0.15%)` | ⬇️ |
| pypdf/_utils.py | `99.17% <100.00%> (+<0.01%)` | ⬆️ |

... and 1 file with indirect coverage changes
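The percentages in the report can be reproduced from the diff. Codecov treats hits, misses and partials together as the lines to cover, so coverage is hits / (hits + misses + partials). The sketch below checks this with integer arithmetic to avoid float rounding. The head value only comes out as 94.01% if the result is truncated to two decimals (rounding to nearest would give 94.02%); truncation is assumed here to be Codecov's default `round: down` setting, which the report itself does not state.

```python
def coverage_bp(hits: int, misses: int, partials: int) -> int:
    """Coverage in basis points (hundredths of a percent), rounded down."""
    lines = hits + misses + partials  # partials count as not covered
    return hits * 10_000 // lines  # integer floor division truncates


def fmt(bp: int) -> str:
    # Format a non-negative basis-point value as a percentage string.
    return f"{bp // 100}.{bp % 100:02d}%"


base = coverage_bp(hits=6654, misses=263, partials=159)  # main: 7076 lines
head = coverage_bp(hits=6666, misses=264, partials=160)  # PR head: 7090 lines
print(fmt(base), fmt(head), (head - base) / 100)  # 94.03% 94.01% -0.02
```

The line totals also match the diff: 6654 + 263 + 159 = 7076 and 6666 + 264 + 160 = 7090.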


pubpub-zz · 2023-07-26T19:08:18Z

@MartinThoma
I've found a nice fix. Now it's all yours 😀

MartinThoma · 2023-07-28T11:58:10Z

I've tested it with https://github.com/py-pdf/pypdf/files/12160419/table_redacted.pdf and now I get:

```
Traceback (most recent call last):
  File "/home/moose/Github/py-pdf/pypdf/sample-files/foo.py", line 20, in <module>
    run("table_redacted.pdf")
  File "/home/moose/Github/py-pdf/pypdf/sample-files/foo.py", line 13, in run
    for image in page.images:
  File "/home/moose/Github/py-pdf/pypdf/pypdf/_page.py", line 2633, in __iter__
    yield self[i]
          ~~~~^^^
  File "/home/moose/Github/py-pdf/pypdf/pypdf/_page.py", line 2629, in __getitem__
    return self.get_function(lst[index])
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/moose/Github/py-pdf/pypdf/pypdf/_page.py", line 532, in _get_image
    return self.inline_images[id]
           ~~~~~~~~~~~~~~~~~~^^^^
KeyError: '~0~'
```
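The traceback shows the shape of the lazily evaluated image list: `__iter__` yields `self[i]`, `__getitem__` calls `self.get_function(lst[index])`, and only then is a single image fetched. The following is a hypothetical reconstruction of that pattern, not pypdf's actual class: `LazyImageList`, `list_ids` and `decode` are illustrative names, and the key list is assumed to be cheap to compute while decoding is the expensive step.

```python
from typing import Callable, Iterator, List


class LazyImageList:
    """Hypothetical sketch: image keys are listed eagerly, images are built on access."""

    def __init__(
        self,
        ids_function: Callable[[], List[str]],
        get_function: Callable[[str], bytes],
    ) -> None:
        self.ids_function = ids_function  # cheap: returns names only
        self.get_function = get_function  # expensive: builds one image

    def keys(self) -> List[str]:
        return self.ids_function()

    def __len__(self) -> int:
        return len(self.ids_function())

    def __getitem__(self, index: int) -> bytes:
        lst = self.ids_function()
        return self.get_function(lst[index])  # decode only the requested item

    def __iter__(self) -> Iterator[bytes]:
        for i in range(len(self)):
            yield self[i]


decode_calls: List[str] = []


def list_ids() -> List[str]:
    # Illustrative names: two XObject images and one inline image.
    return ["/Im1", "/Im2", "~0~"]


def decode(name: str) -> bytes:
    decode_calls.append(name)  # record each (pretend) expensive decode
    return name.encode()


images = LazyImageList(list_ids, decode)
print(len(images), images.keys())  # 3 ['/Im1', '/Im2', '~0~']
print(decode_calls)  # []
first = images[0]
print(decode_calls)  # ['/Im1']
```

In this design, listing keys and calling `len()` never decode anything; an image is only decoded when it is actually accessed.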

stefan6419846 · 2023-07-28T12:00:41Z

Which code did you use for testing? Did you remove `page.inline_images = dict()`?

MartinThoma · 2023-07-28T12:11:30Z

Thank you - I forgot that 🙈

MartinThoma · 2023-07-28T12:14:30Z

Before (current main):

4.24s: 009-pdflatex-geotopo/GeoTopo.pdf
2.88s: 009-pdflatex-geotopo/GeoTopo-komprimiert.pdf

With this PR:

2.01s: 009-pdflatex-geotopo/GeoTopo.pdf
0.44s: 009-pdflatex-geotopo/GeoTopo-komprimiert.pdf

Good work 🎉
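The speedup implied by those timings can be worked out directly. This is plain arithmetic on the four numbers above, not a new measurement.

```python
timings = {
    "009-pdflatex-geotopo/GeoTopo.pdf": (4.24, 2.01),  # (before, after) in seconds
    "009-pdflatex-geotopo/GeoTopo-komprimiert.pdf": (2.88, 0.44),
}

speedup = {name: before / after for name, (before, after) in timings.items()}
for name, (before, after) in timings.items():
    saved = 1 - after / before  # fraction of the original runtime saved
    print(f"{name}: {speedup[name]:.1f}x faster, {saved:.0%} less time")
```

This works out to roughly 2.1x (53% less time) for `GeoTopo.pdf` and 6.5x (85% less time) for the compressed variant.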

pubpub-zz · 2023-07-28T13:04:05Z

Can you clarify the code you are using? `page.inline_images = dict()` is normally not required.

MartinThoma · 2023-07-28T14:02:04Z

Yes, I was adding `page.inline_images = dict()`. That was causing the error. However, I would not consider this a blocker, as that line modifies pypdf's behavior in an unexpected way.

I want to have a final look after work, but so far it seems like a great improvement. I'll likely merge it as it is :-)
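One way such a `KeyError` can arise is a key list that is computed once and remembered, while the dictionary it indexes into is a public attribute that caller code can replace. The following is a hypothetical sketch of that failure mode, not pypdf's implementation: `SketchPage`, `image_ids`, `get_image` and the `None` "not parsed yet" sentinel are assumptions; only the `inline_images` attribute name and the `'~0~'` key format come from the traceback.

```python
from typing import Dict, List, Optional


class SketchPage:
    """Hypothetical stand-in for a page object; NOT pypdf's PageObject."""

    def __init__(self, n_inline: int) -> None:
        self._n_inline = n_inline
        self._ids_cache: Optional[List[str]] = None
        # Assumed sentinel: None means "content stream not parsed yet".
        self.inline_images: Optional[Dict[str, bytes]] = None

    def _parse_inline_images(self) -> Dict[str, bytes]:
        # Stands in for scanning the content stream for inline images.
        return {f"~{i}~": b"pixels" for i in range(self._n_inline)}

    def image_ids(self) -> List[str]:
        if self._ids_cache is None:
            if self.inline_images is None:
                self.inline_images = self._parse_inline_images()
            self._ids_cache = list(self.inline_images)
        return self._ids_cache

    def get_image(self, name: str) -> bytes:
        assert self.inline_images is not None, "call image_ids() first"
        return self.inline_images[name]  # KeyError if the dict was replaced


page = SketchPage(n_inline=1)
ids = page.image_ids()  # ['~0~']; the key list is now cached
page.inline_images = dict()  # caller code overwrites the internal dict

error: Optional[KeyError] = None
try:
    page.get_image(ids[0])
except KeyError as exc:
    error = exc
print(repr(error))  # KeyError('~0~')
```

Without the reassignment the lookup succeeds, which is consistent with the point in the thread that `page.inline_images = dict()` is not needed in caller code.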


## What's new

### New Features (ENH)
- Accelerate image list keys generation (#2014)
- Use `cryptography` for encryption/decryption as a fallback for PyCryptodome (#2000)
- Extract LaTeX characters (#2016)
- ASCIIHexDecode.decode now returns bytes instead of str (#1994)

### Bug Fixes (BUG)
- Add RunLengthDecode filter (#2012)
- Process /Separation ColorSpace (#2007)
- Handle single element ColorSpace list (#2026)
- Process lookup decoded as TextStringObjects (#2008)

### Robustness (ROB)
- Cope with garbage collector during cloning (#1841)

### Maintenance (MAINT)
- Cleanup of annotations (#1745)

[Full Changelog](3.13.0...3.14.0)

ENH : accelerate image list keys generation

30dacb9

closes py-pdf#1987

mypy

88c8bb2

MartinThoma changed the title ~~ENH : accelerate image list keys generation~~ ENH: Accelerate image list keys generation Jul 28, 2023

MartinThoma reviewed Jul 28, 2023


pypdf/_page.py (outdated, resolved)

Update pypdf/_page.py

c756267

MartinThoma merged commit 94f23f9 into py-pdf:main Jul 28, 2023

pubpub-zz deleted the iss1987 branch September 2, 2023 09:45

MartinThoma merged 3 commits into py-pdf:main from pubpub-zz:iss1987

pubpub-zz commented Jul 25, 2023

Uh oh!

pubpub-zz commented Jul 25, 2023

Uh oh!

codecov bot commented Jul 26, 2023 •

edited

Loading

Uh oh!

pubpub-zz commented Jul 26, 2023

Uh oh!

MartinThoma commented Jul 28, 2023

Uh oh!

stefan6419846 commented Jul 28, 2023

Uh oh!

MartinThoma commented Jul 28, 2023

Uh oh!

MartinThoma commented Jul 28, 2023

Uh oh!

pubpub-zz commented Jul 28, 2023

Uh oh!

MartinThoma commented Jul 28, 2023

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

pubpub-zz commented Jul 25, 2023

Uh oh!

pubpub-zz commented Jul 25, 2023

Uh oh!

codecov bot commented Jul 26, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

pubpub-zz commented Jul 26, 2023

Uh oh!

MartinThoma commented Jul 28, 2023

Uh oh!

stefan6419846 commented Jul 28, 2023

Uh oh!

MartinThoma commented Jul 28, 2023

Uh oh!

MartinThoma commented Jul 28, 2023

Uh oh!

pubpub-zz commented Jul 28, 2023

Uh oh!

MartinThoma commented Jul 28, 2023

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

codecov bot commented Jul 26, 2023 •

edited

Loading