BUG: Missing new line in extract_text with cm operations by pubpub-zz · Pull Request #2142 · py-pdf/pypdf

pubpub-zz · 2023-09-03T07:45:24Z

pubpub-zz · 2023-09-03T08:11:00Z

@MartinThoma
I've met again the random issue with test_watermarking_speed. I've proposed an alternative way which sounds more robust.

thread_time not available in python 3.6

codecov · 2023-09-03T08:30:29Z

Codecov Report

Patch coverage: 100.00% and no project coverage change.

Comparison is base (b899d3d) 94.36% compared to head (c5cc158) 94.36%.
Report is 1 commits behind head on main.

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #2142   +/-   ##
=======================================
  Coverage   94.36%   94.36%           
=======================================
  Files          43       43           
  Lines        7572     7574    +2     
  Branches     1488     1488           
=======================================
+ Hits         7145     7147    +2     
  Misses        263      263           
  Partials      164      164

Files Changed	Coverage Δ
pypdf/_page.py	`94.07% <100.00%> (+<0.01%)`	⬆️
pypdf/_text_extraction/__init__.py	`93.33% <100.00%> (+0.05%)`	⬆️

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

pubpub-zz · 2023-09-03T08:32:05Z

@MartinThoma all yours. a rerun of the extract_text benchmark could be interesting

stefan6419846 · 2023-09-03T08:35:44Z

Re-running the CI should be as easy as going to the corresponding CI run at https://github.com/py-pdf/pypdf/actions and clicking "Re-run all jobs" in the upper right corner, given the correct repository permissions.

pubpub-zz · 2023-09-03T08:44:36Z

This permission seems to be granted only to a limited of people.🫤

MartinThoma · 2023-09-03T14:01:11Z

tests/test_writer.py



 @pytest.mark.enable_socket()
-@pytest.mark.timeout(4)  # this was a lot slower before PR #2086


I think that one is the cleaner option. If we change it, it should be a separate PR (but I'm uncertain if we should change it at all).

You mentioned that you had issues with that test. Was it local or in CI? What exactly was the issue / the error message?

I agree it is not the good place, but wanted to discuss about.

The issue during the checks was just it failed with the timeout. It is not the first time this is why I've proposed this change.
I consider the time is ok but for me it is just because We are measuring too many instructions. That' why I've focused on the specif code of the merge.
If you like it just give me a go and I will generate a dedicated PR dealing with this change only

pubpub-zz · 2023-09-11T17:07:21Z

@MartinThoma
Can you clarify what you expect to merge it ?

MartinThoma · 2023-09-15T19:13:36Z

@pubpub-zz Sorry for the delay.

The test_writer.py is a change I'm uncertain about in two directions: On the one hand, I'm uncertain if we should do it like this. I have never used thread_time and I'm uncertain about that. On the other hand, we might even want to use it in multiple directions.

Could you please remove the changes from test_writer.py? I would like to have that discussion in #2193 and decide there how we want to do it.

MartinThoma · 2023-09-15T19:14:05Z

Besides test_writer.py, I'm very happy with the PR :-)

This reverts commit c53e614.

pubpub-zz · 2023-09-16T13:32:28Z

@MartinThoma
reverted

MartinThoma · 2023-09-17T10:39:17Z

Thank you 🙏

I noticed that when I have a lot to do at work, I have a harder time taking decisions in my private life (including pypdf). Keeping PRs about one issue makes it easier for me 🙏

@pubpub-zz

## What's new ### Bug Fixes (BUG) - Missing new line in extract_text with cm operations (#2142) by @pubpub-zz - _get_fonts not processing properly CIDFonts and annotations (#2194) by @pubpub-zz ### Documentation (DOC) - Sort list of contributors by @MartinThoma ### Developer Experience (DEV) - Give attribution in release notes (#2196) by @MartinThoma ### Maintenance (MAINT) - Update packages (#2195) by @MartinThoma - Rename PdfWriter.create_viewer_preferences to PdfWriter.create_viewer_preferences (#2190) by @marcstober - Mark `cryptography` as default (#2186) by @exiledkingcc ### Testing (TST) - Issue with merging pdfkit (#2191) by @MartinThoma ### Code Style (STY) - clean-up overriden variable (#2189) by @pubpub-zz [Full Changelog](3.16.0...3.16.1)

pubpub-zz added 3 commits September 3, 2023 09:29

BUG : Missing new line in extract_text with cm operations

1f1ffbf

closes py-pdf#2138

add Test

0bc15c7

BUG : robustify test_watermarking_speed

1831a51

compatbility in time measurement in py3.6

c53e614

thread_time not available in python 3.6

MartinThoma mentioned this pull request Sep 3, 2023

Line returns missing in text_extraction() #2138

Closed

MartinThoma reviewed Sep 3, 2023

View reviewed changes

Merge branch 'main' into iss2138

9db0bd5

pubpub-zz added 2 commits September 16, 2023 13:56

Revert "compatbility in time measurement in py3.6"

7dec536

This reverts commit c53e614.

Merge remote-tracking branch 'py-pdf/main' into iss2138

f11e4df

pubpub-zz force-pushed the iss2138 branch from 125b4ae to f11e4df Compare September 16, 2023 11:59

revert test_writer.py

c5cc158

MartinThoma changed the title ~~BUG : Missing new line in extract_text with cm operations~~ BUG: Missing new line in extract_text with cm operations Sep 17, 2023

MartinThoma merged commit 5b45785 into py-pdf:main Sep 17, 2023

pubpub-zz deleted the iss2138 branch August 9, 2024 08:26

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

BUG: Missing new line in extract_text with cm operations#2142

BUG: Missing new line in extract_text with cm operations#2142
MartinThoma merged 8 commits intopy-pdf:mainfrom
pubpub-zz:iss2138

pubpub-zz commented Sep 3, 2023

Uh oh!

pubpub-zz commented Sep 3, 2023

Uh oh!

codecov bot commented Sep 3, 2023 •

edited

Loading

Uh oh!

pubpub-zz commented Sep 3, 2023

Uh oh!

stefan6419846 commented Sep 3, 2023

Uh oh!

pubpub-zz commented Sep 3, 2023

Uh oh!

MartinThoma Sep 3, 2023

Uh oh!

pubpub-zz Sep 3, 2023

Uh oh!

pubpub-zz commented Sep 11, 2023

Uh oh!

MartinThoma commented Sep 15, 2023

Uh oh!

MartinThoma commented Sep 15, 2023

Uh oh!

pubpub-zz commented Sep 16, 2023

Uh oh!

MartinThoma commented Sep 17, 2023

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants



		@pytest.mark.enable_socket()
		@pytest.mark.timeout(4) # this was a lot slower before PR #2086

Conversation

pubpub-zz commented Sep 3, 2023

Uh oh!

pubpub-zz commented Sep 3, 2023

Uh oh!

codecov bot commented Sep 3, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

pubpub-zz commented Sep 3, 2023

Uh oh!

stefan6419846 commented Sep 3, 2023

Uh oh!

pubpub-zz commented Sep 3, 2023

Uh oh!

MartinThoma Sep 3, 2023

Choose a reason for hiding this comment

Uh oh!

pubpub-zz Sep 3, 2023

Choose a reason for hiding this comment

Uh oh!

pubpub-zz commented Sep 11, 2023

Uh oh!

MartinThoma commented Sep 15, 2023

Uh oh!

MartinThoma commented Sep 15, 2023

Uh oh!

pubpub-zz commented Sep 16, 2023

Uh oh!

MartinThoma commented Sep 17, 2023

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

codecov bot commented Sep 3, 2023 •

edited

Loading