ENH : auto detect RTL for text extraction #1309

Merged
MartinThoma merged 2 commits into py-pdf:main from pubpub-zz:arabicRTL2
Aug 31, 2022

Conversation

@pubpub-zz
Collaborator

Will fix #1296.
Includes some customization capabilities to extend RTL detection.
Replaces #1305.
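
For readers new to the problem behind #1296: PDFs often store right-to-left text in visual order, so a naive extraction returns Arabic or Hebrew characters reversed. The block below is only a minimal sketch of the general idea (detect RTL characters, then reverse each run of them), assuming Unicode bidirectional classes are a good enough signal. It is not the code merged in this PR; `is_rtl_char`, `reorder_visual_runs`, and the `extra_ranges` parameter are hypothetical names chosen for illustration, with `extra_ranges` standing in for the kind of customization mentioned above.

```python
import unicodedata


def is_rtl_char(ch, extra_ranges=()):
    """Return True if ch is right-to-left per Unicode or lies in a caller-supplied range.

    extra_ranges is a hypothetical hook: an iterable of (low, high) code point pairs.
    """
    # "R" = right-to-left (e.g. Hebrew), "AL" = right-to-left Arabic letter.
    if unicodedata.bidirectional(ch) in ("R", "AL"):
        return True
    code_point = ord(ch)
    return any(low <= code_point <= high for low, high in extra_ranges)


def reorder_visual_runs(text, extra_ranges=()):
    """Reverse each maximal run of RTL characters; leave other characters in place."""
    out = []
    rtl_run = []
    for ch in text:
        if is_rtl_char(ch, extra_ranges):
            rtl_run.append(ch)
        else:
            # A non-RTL character ends the current run: emit the run reversed.
            out.extend(reversed(rtl_run))
            rtl_run.clear()
            out.append(ch)
    out.extend(reversed(rtl_run))
    return "".join(out)
```

Note that in this sketch spaces and digits end a run, so a multi-word RTL phrase has each word reversed but keeps its word order; a real implementation needs to handle neutral characters as well.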

@codecov

codecov bot commented Aug 31, 2022

Codecov Report

Merging #1309 (8540e4c) into main (c696192) will decrease coverage by 0.00%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main    #1309      +/-   ##
==========================================
- Coverage   95.02%   95.02%   -0.01%     
==========================================
  Files          30       30              
  Lines        4988     5024      +36     
  Branches     1026     1037      +11     
==========================================
+ Hits         4740     4774      +34     
  Misses        141      141              
- Partials      107      109       +2     
Impacted Files Coverage Δ
PyPDF2/_page.py 94.36% <100.00%> (+<0.01%) ⬆️

@pubpub-zz
Collaborator Author

@MartinThoma
I don't think we will get any further: it's ready.

Comment thread PyPDF2/_page.py
@MartinThoma MartinThoma merged commit 7a95708 into py-pdf:main Aug 31, 2022
@MartinThoma
Member

Thank you for all the great work you put into this 🙏

@pubpub-zz pubpub-zz deleted the arabicRTL2 branch August 31, 2022 21:47
MartinThoma added a commit that referenced this pull request Sep 4, 2022
Version 2.10.5, 2022-09-04
--------------------------

New Features (ENH):
-  Process XRefStm (#1297)
-  Auto-detect RTL for text extraction (#1309)

Bug Fixes (BUG):
-  Avoid scaling cropbox twice (#1314)

Robustness (ROB):
-  Fix offset correction in revised PDF (#1318)
-  Crop data of /U and /O in encryption dictionary to 48 bytes (#1317)
-  MultiLine bfrange in cmap (#1299)
-  Cope with 2 digit codes in bfchar (#1310)
-  Accept '/annn' charset as ASCII code (#1316)
-  Log errors during Float / NumberObject initialization (#1315)
-  Cope with corrupted entries in xref table (#1300)

Documentation (DOC):
-  Migration guide (PyPDF2 1.x ➔ 2.x) (#1324)
-  Creating a coverage report (#1319)
-  Fix AnnotationBuilder.free_text example (#1311)
-  Fix usage of page.scale by replacing it with page.scale_by (#1313)

Developer Experience (DEV):
-  Only run coverage for PyPDF2

Maintenance (MAINT):
-  PdfReaderProtocol (#1303)
-  Throw PdfReadError if Trailer can't be read (#1298)
-  Remove catching OverflowException (#1302)

Full Changelog: 2.10.4...2.10.5
@MasterOdin MasterOdin mentioned this pull request Nov 10, 2022
pubpub-zz added a commit to pubpub-zz/pypdf that referenced this pull request Nov 12, 2022
Also includes the reintroduction of py-pdf#1303, wrongly cancelled in py-pdf#1309.

Development

Successfully merging this pull request may close these issues.

Arabic text is extracted in the wrong order

2 participants