ENH: Add Jupyter Notebook integration for PdfReader by MartinThoma · Pull Request #2375 · py-pdf/pypdf

MartinThoma · 2023-12-28T08:45:45Z


Without this PR

With this PR

See:

* https://ipython.readthedocs.io/en/stable/config/integrating.html#MyObject._repr_mimebundle_
* https://discourse.jupyter.org/t/what-are-include-exclude-parameter-in-repr-mimebundle-for/23125

MartinThoma · 2023-12-28T08:48:47Z

I could not find any documentation regarding the include / exclude parameters, but ChatGPT thinks it should be used like this (which sounds reasonable):

def _repr_mimebundle_(self, include=None, exclude=None):
    data = {
        'text/plain': 'This is a plain text representation.',
        'text/html': '<strong>This is an HTML representation.</strong>',
        'application/json': '{"key": "value"}'
    }

    if include is not None:
        # Filter representations based on include list
        data = {k: v for k, v in data.items() if k in include}

    if exclude is not None:
        # Remove representations based on exclude list
        data = {k: v for k, v in data.items() if k not in exclude}

    return data
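For anyone who wants to try the filtering outside a notebook, here is a minimal sketch that puts the method above on a class and calls it directly. The class name `Demo` is hypothetical and exists only for this illustration; it is not part of pypdf.

```python
class Demo:
    """Hypothetical class, used only to exercise the method shown above."""

    def _repr_mimebundle_(self, include=None, exclude=None):
        data = {
            "text/plain": "This is a plain text representation.",
            "text/html": "<strong>This is an HTML representation.</strong>",
            "application/json": '{"key": "value"}',
        }

        if include is not None:
            # Keep only the MIME types that were explicitly requested.
            data = {k: v for k, v in data.items() if k in include}

        if exclude is not None:
            # Drop the MIME types that were explicitly rejected.
            data = {k: v for k, v in data.items() if k not in exclude}

        return data


demo = Demo()
bundle_all = demo._repr_mimebundle_()
bundle_plain = demo._repr_mimebundle_(include={"text/plain"})
bundle_no_json = demo._repr_mimebundle_(exclude={"application/json"})
```

With no arguments all three representations come back; `include` narrows the bundle to the listed MIME types and `exclude` removes the listed ones.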

MartinThoma · 2023-12-28T08:50:12Z

We could add something similar for PdfWriter and PageObject, and maybe even for annotations (by creating a reader and a blank page, adding the annotation, and rendering it).
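A rough sketch of how the idea could be shared between several classes, assuming each of them can produce its PDF content as bytes and that the bundle uses the `application/pdf` MIME type. `PdfBundleMixin`, `FakeWriter`, and `_pdf_bytes` are hypothetical names invented for this illustration; none of them is pypdf API, and this is not necessarily how the PR implements it.

```python
import io


class PdfBundleMixin:
    """Hypothetical mixin (not part of pypdf) that builds a MIME bundle."""

    def _pdf_bytes(self) -> bytes:
        # Subclasses decide how to serialize themselves to PDF bytes.
        raise NotImplementedError

    def _repr_mimebundle_(self, include=None, exclude=None):
        data = {"application/pdf": self._pdf_bytes()}
        if include is not None:
            data = {k: v for k, v in data.items() if k in include}
        if exclude is not None:
            data = {k: v for k, v in data.items() if k not in exclude}
        return data


class FakeWriter(PdfBundleMixin):
    """Stand-in for a writer-like object that can write itself to a stream."""

    def __init__(self, payload: bytes) -> None:
        self._payload = payload

    def write(self, stream) -> None:
        stream.write(self._payload)

    def _pdf_bytes(self) -> bytes:
        # Serialize into memory instead of touching the file system.
        buffer = io.BytesIO()
        self.write(buffer)
        return buffer.getvalue()


writer = FakeWriter(b"%PDF-1.4\n%%EOF\n")
bundle = writer._repr_mimebundle_()
```

Whether a given notebook front end actually renders `application/pdf` depends on the front end, so this would need to be checked in the target environment.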

codecov · 2023-12-28T08:51:48Z

Codecov Report

Attention: 8 lines in your changes are missing coverage. Please review.

Comparison is base (195d82e) 94.45% compared to head (33a627d) 94.35%.

Files	Patch %	Lines
pypdf/_reader.py	11.11%	8 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2375      +/-   ##
==========================================
- Coverage   94.45%   94.35%   -0.10%     
==========================================
  Files          43       43              
  Lines        7575     7584       +9     
  Branches     1515     1519       +4     
==========================================
+ Hits         7155     7156       +1     
- Misses        257      265       +8     
  Partials      163      163
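The percentages in the report can be reproduced from the hit and line counts. The sketch below assumes the values are truncated (rounded down) to two decimal places, which matches the numbers shown here, and that the patch consists of 9 lines of which 1 is covered (inferred from "8 Missing" and 11.11%). The helper name `truncated_percent` is made up for this example.

```python
def truncated_percent(hits: int, lines: int) -> str:
    """Return hits/lines as a percentage, truncated to two decimals."""
    # Integer arithmetic avoids floating-point rounding surprises.
    basis_points = hits * 10_000 // lines
    return f"{basis_points // 100}.{basis_points % 100:02d}%"


base = truncated_percent(7155, 7575)   # base (195d82e)
head = truncated_percent(7156, 7584)   # head (33a627d)
patch = truncated_percent(1, 9)        # 1 covered line out of 9 in the patch

print(base, head, patch)  # prints: 94.45% 94.35% 11.11%
```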


stefan6419846 · 2023-12-28T09:08:02Z

Upstream implementation of includes/excludes: https://github.com/ipython/ipython/blob/d0e254420445c2204a2b39d28948cd6127717fb1/IPython/core/formatters.py#L151-L157

MartinThoma · 2023-12-28T09:53:34Z

Thanks! Then the ChatGPT code is perfect 🎉
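To illustrate the caller side, here is a simplified sketch of what a display formatter might do with `include` and `exclude`: pass them to the object and then filter the result again in case the object ignores them. `format_for_display` and `IgnoresFilters` are hypothetical names; this is a stand-in for the idea and not a copy of the IPython code linked above, which also handles metadata and other formatters.

```python
def format_for_display(obj, include=None, exclude=None):
    """Hypothetical, simplified stand-in for a display formatter."""
    method = getattr(obj, "_repr_mimebundle_", None)
    if method is None:
        # Fall back to the plain repr for objects without a bundle method.
        return {"text/plain": repr(obj)}

    bundle = method(include=include, exclude=exclude)

    # Filter on the caller side too, so objects that ignore the
    # parameters still yield only the requested MIME types.
    if include:
        bundle = {k: v for k, v in bundle.items() if k in include}
    if exclude:
        bundle = {k: v for k, v in bundle.items() if k not in exclude}
    return bundle


class IgnoresFilters:
    """Returns every representation, regardless of include/exclude."""

    def _repr_mimebundle_(self, include=None, exclude=None):
        return {"text/plain": "x", "text/html": "<b>x</b>"}


only_plain = format_for_display(IgnoresFilters(), include={"text/plain"})
no_html = format_for_display(IgnoresFilters(), exclude={"text/html"})
fallback = format_for_display(42)
```

Filtering in both places means the object can skip building expensive representations nobody asked for, while the caller still gets a correct result from objects that do not bother.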

@shartzog

## What's new

pypdf==4.0.0 is a big milestone forward:

* We finally have a layout-mode text extraction. This enables users who want to detect / extract tables with heuristics to give it a try.
* We deprecated a lot of the old PyPDF2 API that was either not following PEP8 naming styles or was not using a property. Users coming from PyPDF2 might want to switch first to pypdf<4.0.0 to get helpful error messages that show the new API in their specific cases.

A big 'Thank you!' to the whole pypdf community for your work. Thanks to you, pypdf is better than ever. Kudos to @shartzog who added the layout-mode with his first contribution!

### Deprecations (DEP)

- Drop Python 3.6 support (#2369) by @MartinThoma
- Remove deprecated code (#2367) by @MartinThoma
- Remove deprecated XMP properties (#2386) by @stefan6419846

### New Features (ENH)

- Add "layout" mode for text extraction (#2388) by @shartzog
- Add Jupyter Notebook integration for PdfReader (#2375) by @MartinThoma
- Improve/rewrite PDF permission retrieval (#2400) by @stefan6419846

### Bug Fixes (BUG)

- PdfWriter.add_uri was setting the wrong type (#2406) by @pmiller66
- Add support for GBK2K cmaps (#2385) by @stefan6419846

### Documentation (DOC)

- Add pmiller66 for #2406 as a contributor by @MartinThoma
- Add missing expand parameter (#2393) by @Atomnp
- Resolve build warnings (#2380) by @stefan6419846
- Fix testing prerequisites (#2381) by @stefan6419846
- Improve formatting of contributors page (#2383) by @stefan6419846
- Add Tobeabellwether as a contributor for #2341 by @MartinThoma

### Developer Experience (DEV)

- Make dependabot aware of our PR prefixes (#2415) by @stefan6419846
- Fail on Sphinx issues (#2405) by @stefan6419846
- Move title check to own workflow (#2384) by @MasterOdin
- Write to temporary files instead of the working directory (#2379) by @stefan6419846
- Ensure that the PR titles have the correct format (#2378) by @stefan6419846

### Maintenance (MAINT)

- Complete FileSpecificationDictionaryEntries constants (#2416) by @MartinThoma
- Return None instead of -1 when page is not attached (#2376) by @MartinThoma
- Replace warning with logging.error (#2377) by @MartinThoma

### Testing (TST)

- Add missing pytest.mark.samples annotations (#2412) by @kitterma
- Correctly close temporary files (#2396) by @stefan6419846
- Fix side effect #2379 (#2395) by @pubpub-zz
- Add test for layout extraction mode (#2390) by @MartinThoma

### Code Style (STY)

- Use the UserAccessPermissions enum (#2398) by @MartinThoma
- Run black (#2370) by @MartinThoma

[Full Changelog](3.17.4...4.0.0)

ENH: Add Jupyter Notebook integration

98e3ee6


MartinThoma added the is-feature (A feature request) label on Dec 28, 2023

include/exclude

33a627d

MartinThoma merged commit a91e9f6 into main Dec 28, 2023

MartinThoma deleted the jupyter-notebook-integration branch December 28, 2023 17:55

MartinThoma merged 2 commits into main from jupyter-notebook-integration


2 participants
