Skip to content

PI: Iterative _sweepIndirectReferences#351

Closed
jvalenzuela wants to merge 15 commits intopy-pdf:mainfrom
jvalenzuela:ref-sweep-recursion
Closed

PI: Iterative _sweepIndirectReferences#351
jvalenzuela wants to merge 15 commits intopy-pdf:mainfrom
jvalenzuela:ref-sweep-recursion

Conversation

@jvalenzuela
Copy link
Copy Markdown

Changes the way indirect references are swept during the write process, handling nested data structures iteratively instead of recursively. Addresses problems with hitting Python's recursion limit when processing files with large data structures, e.g. thousands of pages with a bookmark on each.

@MartinThoma MartinThoma added the is-bug From a users perspective, this is a bug - a violation of the expected behavior with a compliant PDF label Apr 6, 2022
@MartinThoma
Copy link
Copy Markdown
Member

I like that you've added a test! I'll need some time to come back to this one, though

@MartinThoma MartinThoma added nf-performance Non-functional change: Performance Large labels Apr 16, 2022
@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Apr 16, 2022

Codecov Report

Merging #351 (c8062fd) into main (663ca98) will increase coverage by 0.11%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main     #351      +/-   ##
==========================================
+ Coverage   74.96%   75.08%   +0.11%     
==========================================
  Files          12       12              
  Lines        3555     3564       +9     
  Branches      820      819       -1     
==========================================
+ Hits         2665     2676      +11     
+ Misses        674      673       -1     
+ Partials      216      215       -1     
Impacted Files Coverage Δ
PyPDF2/pdf.py 82.16% <100.00%> (+0.22%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 663ca98...c8062fd. Read the comment docs.

@MartinThoma MartinThoma added the soon PRs that are almost ready to be merged, issues that get solved pretty soon label Apr 16, 2022
@MartinThoma MartinThoma removed the soon PRs that are almost ready to be merged, issues that get solved pretty soon label May 1, 2022
@MartinThoma MartinThoma changed the title Indirect reference sweep recursion PI: Indirect reference sweep recursion Jun 25, 2022
@MartinThoma MartinThoma added needs-rebase This PR cannot be merged as the main branch is too different. You need to rebase or merge main. PdfWriter The PdfWriter component is affected labels Jun 25, 2022
@MartinThoma MartinThoma changed the title PI: Indirect reference sweep recursion PI: Iterative _sweepIndirectReferences Jul 5, 2022
@MartinThoma
Copy link
Copy Markdown
Member

MartinThoma commented Jul 10, 2022

@jvalenzuela Thank you very much for this contribution ❤️

I'm sorry I didn't manage to merge your PR earlier. It's been a long time and PyPDF2 made a lot of progress since April. As #1072 has the similar changes and no conflicts with main, I close this PR now.

MartinThoma pushed a commit that referenced this pull request Jul 10, 2022
* Recursive Depth-first search (DFS) was changed to iterative DFS
* Removed PdfWriter.external_reference_map and calculate hash from every referred object and use that to detect duplicate objects.
* In several cases, the warning "Unable to resolve .*, returning NullObject instead" is no longer necessary.
* Bugfix: Recalculate all parents hashes when a dictionary or array object value changes

Closes #351
Closes #1036
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

is-bug From a users perspective, this is a bug - a violation of the expected behavior with a compliant PDF Large needs-rebase This PR cannot be merged as the main branch is too different. You need to rebase or merge main. nf-performance Non-functional change: Performance PdfWriter The PdfWriter component is affected

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants