ENH: Add Jupyter Notebook integration for PdfReader #2375

Merged
MartinThoma merged 2 commits into main from jupyter-notebook-integration on Dec 28, 2023

Conversation

@MartinThoma
Member

@MartinThoma MartinThoma commented Dec 28, 2023

@MartinThoma
Member Author

I could not find any documentation regarding the include / exclude parameters, but ChatGPT thinks it should be used like this (which sounds reasonable):

def _repr_mimebundle_(self, include=None, exclude=None):
    data = {
        'text/plain': 'This is a plain text representation.',
        'text/html': '<strong>This is an HTML representation.</strong>',
        'application/json': '{"key": "value"}'
    }

    if include is not None:
        # Filter representations based on include list
        data = {k: v for k, v in data.items() if k in include}

    if exclude is not None:
        # Remove representations based on exclude list
        data = {k: v for k, v in data.items() if k not in exclude}

    return data
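
The filtering in the snippet above can be exercised outside a notebook by calling the method directly. The following is a minimal sketch, not the code merged in this PR: the `Demo` class and its payloads are hypothetical stand-ins, and treating `include` / `exclude` as collections of MIME-type strings is the assumption taken from the suggestion above.

```python
class Demo:
    """Hypothetical stand-in object; not a pypdf class."""

    def _repr_mimebundle_(self, include=None, exclude=None):
        # Map each MIME type to one representation of the object.
        data = {
            "text/plain": "This is a plain text representation.",
            "text/html": "<strong>This is an HTML representation.</strong>",
        }
        if include is not None:
            # Assumption: keep only the MIME types the caller asked for.
            data = {k: v for k, v in data.items() if k in include}
        if exclude is not None:
            # Assumption: drop the MIME types the caller rejected.
            data = {k: v for k, v in data.items() if k not in exclude}
        return data


demo = Demo()
full = demo._repr_mimebundle_()
only_plain = demo._repr_mimebundle_(include={"text/plain"})
no_plain = demo._repr_mimebundle_(exclude={"text/plain"})

print(sorted(full))
print(sorted(only_plain))
print(sorted(no_plain))
```

With both arguments left as `None`, the full dictionary is returned unchanged, which is the case a plain cell display would most likely hit.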

@MartinThoma MartinThoma added the is-feature A feature request label Dec 28, 2023
@MartinThoma
Member Author

We could add something similar for PdfWriter and PageObject. Maybe even for annotations (creating a reader and a blank page + adding the annotation + rendering it)
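
One way the same hook could be shared across several classes is a small mixin that owns the include / exclude filtering and asks each class only for its own representations. This is a rough sketch under stated assumptions: `MimeBundleMixin`, `_mime_data`, `FakeWriter`, and `FakePage` are hypothetical names used for illustration and are not pypdf APIs.

```python
class MimeBundleMixin:
    """Hypothetical helper that centralises the include/exclude filtering."""

    def _mime_data(self):
        # Each subclass returns its own {mime_type: representation} mapping.
        raise NotImplementedError

    def _repr_mimebundle_(self, include=None, exclude=None):
        data = self._mime_data()
        if include is not None:
            data = {k: v for k, v in data.items() if k in include}
        if exclude is not None:
            data = {k: v for k, v in data.items() if k not in exclude}
        return data


class FakeWriter(MimeBundleMixin):
    """Illustrative stand-in for a writer-like object."""

    def _mime_data(self):
        return {"text/plain": "<FakeWriter with 0 pages>"}


class FakePage(MimeBundleMixin):
    """Illustrative stand-in for a page-like object."""

    def _mime_data(self):
        return {"text/plain": "<FakePage>", "text/html": "<em>FakePage</em>"}


writer_bundle = FakeWriter()._repr_mimebundle_()
page_bundle = FakePage()._repr_mimebundle_(exclude=["text/html"])
print(writer_bundle)
print(page_bundle)
```

Whether a mixin, a free helper function, or per-class copies fits the code base best is a design choice this sketch does not settle.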

@codecov

codecov bot commented Dec 28, 2023

Codecov Report

Attention: 8 lines in your changes are missing coverage. Please review.

Comparison is base (195d82e) 94.45% compared to head (33a627d) 94.35%.

Files | Patch % | Lines
pypdf/_reader.py | 11.11% | 8 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2375      +/-   ##
==========================================
- Coverage   94.45%   94.35%   -0.10%     
==========================================
  Files          43       43              
  Lines        7575     7584       +9     
  Branches     1515     1519       +4     
==========================================
+ Hits         7155     7156       +1     
- Misses        257      265       +8     
  Partials      163      163              

@stefan6419846
Collaborator

stefan6419846 commented Dec 28, 2023

@MartinThoma
Member Author

Thanks! Then the ChatGPT code is perfect 🎉

@MartinThoma MartinThoma merged commit a91e9f6 into main Dec 28, 2023
@MartinThoma MartinThoma deleted the jupyter-notebook-integration branch December 28, 2023 17:55
MartinThoma added a commit that referenced this pull request Jan 19, 2024
## What's new

pypdf==4.0.0 is a big milestone forward:

* We finally have a layout-mode text extraction.
  This enables users who want to detect / extract tables
  with heuristics to give it a try.
* We deprecated a lot of the old PyPDF2 API that was either
  not following PEP8 naming styles or was not using a
  property. Users coming from PyPDF2 might want to switch
  first to pypdf<4.0.0 to get helpful error messages
  that show the new API in their specific cases.

A big 'Thank you!' to the whole pypdf community for your
work. Thanks to you, pypdf is better than ever.

Kudos to @shartzog who added the layout-mode with his first
contribution!

### Deprecations (DEP)
-  Drop Python 3.6 support (#2369) by @MartinThoma
-  Remove deprecated code (#2367) by @MartinThoma
-  Remove deprecated XMP properties (#2386) by @stefan6419846

### New Features (ENH)
-  Add "layout" mode for text extraction (#2388) by @shartzog
-  Add Jupyter Notebook integration for PdfReader (#2375) by @MartinThoma
-  Improve/rewrite PDF permission retrieval (#2400) by @stefan6419846

### Bug Fixes (BUG)
-  PdfWriter.add_uri was setting the wrong type (#2406) by @pmiller66
-  Add support for GBK2K cmaps (#2385) by @stefan6419846

### Documentation (DOC)
-  Add pmiller66 for #2406 as a contributor by @MartinThoma
-  Add missing expand parameter (#2393) by @Atomnp
-  Resolve build warnings (#2380) by @stefan6419846
-  Fix testing prerequisites (#2381) by @stefan6419846
-  Improve formatting of contributors page (#2383) by @stefan6419846
-  Add Tobeabellwether as a contributor for #2341 by @MartinThoma

### Developer Experience (DEV)
-  Make dependabot aware of our PR prefixes (#2415) by @stefan6419846
-  Fail on Sphinx issues (#2405) by @stefan6419846
-  Move title check to own workflow (#2384) by @MasterOdin
-  Write to temporary files instead of the working directory (#2379) by @stefan6419846
-  Ensure that the PR titles have the correct format (#2378) by @stefan6419846

### Maintenance (MAINT)
-  Complete FileSpecificationDictionaryEntries constants (#2416) by @MartinThoma
-  Return None instead of -1 when page is not attached (#2376) by @MartinThoma
-  Replace warning with logging.error (#2377) by @MartinThoma

### Testing (TST)
-  Add missing pytest.mark.samples annotations (#2412) by @kitterma
-  Correctly close temporary files (#2396) by @stefan6419846
-  Fix side effect #2379 (#2395) by @pubpub-zz
-  Add test for layout extraction mode (#2390) by @MartinThoma

### Code Style (STY)
-  Use the UserAccessPermissions enum (#2398) by @MartinThoma
-  Run black (#2370) by @MartinThoma

[Full Changelog](3.17.4...4.0.0)