BUG: Extract text in layout mode without finding resources #2555

Merged: stefan6419846 merged 1 commit into py-pdf:main from pubpub-zz:iss2533 on Mar 29, 2024

@pubpub-zz
Collaborator

closes #2533

@pubpub-zz changed the title from "FIX: extract text in layout without finding resources" to "BUG: extract text in layout without finding resources" on Mar 29, 2024

codecov bot commented Mar 29, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 94.71%. Comparing base (253cde4) to head (76f2fb5).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2555      +/-   ##
==========================================
+ Coverage   94.67%   94.71%   +0.04%     
==========================================
  Files          50       50              
  Lines        8231     8237       +6     
  Branches     1646     1646              
==========================================
+ Hits         7793     7802       +9     
+ Misses        268      267       -1     
+ Partials      170      168       -2     


@stefan6419846 changed the title from "BUG: extract text in layout without finding resources" to "BUG: Extract text in layout mode without finding resources" on Mar 29, 2024
@stefan6419846 merged commit e35df5a into py-pdf:main on Mar 29, 2024
stefan6419846 added a commit that referenced this pull request Apr 7, 2024
REL: 4.2.0

## What's new

### New Features (ENH)
- Allow multiple charsets for NameObject.read_from_stream (#2585) by @pubpub-zz
- Add support for /Kids in page labels (#2562) by @stefan6419846
- Allow to update fields on many pages (#2571) by @pubpub-zz
- Tolerate PDF with invalid xref pointed objects (#2335) by @pubpub-zz
- Add Enforce from PDF2.0 in viewer_preferences (#2511) by @pubpub-zz
- Add += and -= operators to ArrayObject (#2510) by @pubpub-zz

### Bug Fixes (BUG)
- Fix merge_page sometimes generating unknown operator 'QQ' (#2588) by @rfotino
- Fix fields update where annotations are kids of field (#2570) by @pubpub-zz
- Process CMYK images without a filter correctly (#2557) by @pubpub-zz
- Extract text in layout mode without finding resources (#2555) by @pubpub-zz
- Prevent recursive loop in some PDF files (#2505) by @pubpub-zz

### Robustness (ROB)
- Tolerate "truncated" xref (#2580) by @pubpub-zz
- Replace error by warning for EOD in RunLengthDecode/ASCIIHexDecode (#2334) by @pubpub-zz
- Rebuild xref table if one entry is invalid (#2528) by @pubpub-zz
- Robustify stream extraction (#2526) by @pubpub-zz

### Documentation (DOC)
- Update release process for latest changes (#2564) by @stefan6419846
- Encryption/decryption: Clone document instead of copying all pages (#2546) by @redfast00
- Minor improvements (#2542) by @j-t-1
- Update annotation list (#2534) by @j-t-1
- Update references and formatting (#2529) by @j-t-1
- Correct threads reference, plus minor changes (#2521) by @j-t-1
- Minor readability increases (#2515) by @j-t-1
- Simplify PaperSize examples (#2504) by @j-t-1
- Minor improvements (#2501) by @j-t-1

### Developer Experience (DEV)
- Remove unused dependencies (#2572) by @stefan6419846
- Remove page labels PR link from message (#2561) by @stefan6419846
- Fix changelog generator regarding whitespace and handling of "Other" group (#2492) by @stefan6419846
- Add REL to known PR prefixes (#2554) by @stefan6419846
- Release using the REL commit instead of git tag (#2500) by @MartinThoma
- Unify code between PdfReader and PdfWriter (#2497) by @pubpub-zz
- Bump softprops/action-gh-release from 1 to 2 (#2514) by @dependabot[bot]

### Maintenance (MAINT)
- Ressources → Resources (and internal name childs) (#2550) by @pubpub-zz
- Fix typos found by codespell (#2549) by @stefan6419846
- Update Read the Docs configuration (#2538) by @j-t-1
- Add root_object, _info and _ID to PdfReader (#2495) by @pubpub-zz

### Testing (TST)
- Allow loading truncated images if required (#2586) by @stefan6419846
- Fix download issues from #2562 (#2578) by @pubpub-zz
- Improve test_get_contents_from_nullobject to show real use-case (#2524) by @stefan6419846
- Add missing test annotations (#2507) by @stefan6419846

[Full Changelog](4.1.0...4.2.0)
Successfully merging this pull request may close these issues.

Raise KeyError: /Parent when attempting to extract the text from an empty page using layout mode
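
For readers unfamiliar with this failure mode, here is a minimal sketch of how a lookup for inherited `/Resources` can raise `KeyError: '/Parent'` on an empty page, and how a tolerant lookup avoids it. It is not pypdf's code. Plain Python dicts stand in for PDF dictionary objects, and the function names (`find_resources_strict`, `find_resources_tolerant`, `font_names`) are hypothetical. That the original error came from a walk of this shape is an assumption inferred from the issue and PR titles.

```python
from typing import Any, Dict, List, Optional

# Hypothetical stand-in for a PDF dictionary object (not a pypdf type).
PdfDict = Dict[str, Any]


def find_resources_strict(page: PdfDict) -> PdfDict:
    """Walk /Parent links until a /Resources entry is found.

    Raises KeyError('/Parent') when the chain ends without any /Resources,
    which is the situation an empty page can produce.
    """
    node = page
    while "/Resources" not in node:
        node = node["/Parent"]  # fails here if there is no ancestor
    return node["/Resources"]


def find_resources_tolerant(page: PdfDict) -> Optional[PdfDict]:
    """Same walk, but return None instead of raising when nothing is found."""
    node: Optional[PdfDict] = page
    seen = set()  # guard against cyclic /Parent links in malformed files
    while node is not None and id(node) not in seen:
        seen.add(id(node))
        if "/Resources" in node:
            return node["/Resources"]
        node = node.get("/Parent")
    return None


def font_names(page: PdfDict) -> List[str]:
    """List font resource names, treating missing resources as 'no fonts'."""
    resources = find_resources_tolerant(page) or {}
    return sorted(resources.get("/Font", {}))


if __name__ == "__main__":
    empty_page: PdfDict = {"/Type": "/Page"}  # no /Resources, no /Parent
    print(font_names(empty_page))
```

With the tolerant variant, a caller that only needs fonts for text extraction can treat a page without resources as a page without text instead of failing.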