Provide public interface for skipping inline page images #1987

@stefan6419846

Explanation

I want to extract all images from a page but skip inline images, since they are not useful in my case and only add overhead. For one page with a dotted table that has 24643 inline images and no "real" images, extraction takes 2 ms without inline images and 29 s with them.

Code Example

For now, I am basically exploiting the `if self.inline_images is None:` check, which does not seem to be a clean solution:

from pypdf import PdfReader

path = "example.pdf"  # Placeholder: replace with the PDF to inspect.

reader = PdfReader(path)
for page in reader.pages:
    page.inline_images = {}  # Workaround: avoid loading inline images.
    for image in page.images:
        print(image)
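
To illustrate why pre-assigning an empty dict has this effect, here is a minimal sketch, assuming the loader is a lazy cache guarded by the `is None` check quoted above. `FakePage`, `parse_calls` and `_parse_inline_images` are hypothetical names made up for this illustration; this is not pypdf's actual implementation.

```python
class FakePage:
    """Hypothetical stand-in for a page object (not pypdf's real class)."""

    def __init__(self, inline_count):
        self._inline_count = inline_count
        self.inline_images = None  # None means "not parsed yet".
        self.parse_calls = 0       # Counts how often the costly scan runs.

    def _parse_inline_images(self):
        # Stands in for the expensive scan of the page content stream.
        self.parse_calls += 1
        return {f"inline_{i}": b"" for i in range(self._inline_count)}

    @property
    def images(self):
        # The lazy guard: the scan only runs while the cache is still None.
        if self.inline_images is None:
            self.inline_images = self._parse_inline_images()
        return list(self.inline_images)


untouched = FakePage(inline_count=3)
patched = FakePage(inline_count=3)
patched.inline_images = {}  # Same trick as the workaround: cache is no longer None.

print(len(untouched.images), untouched.parse_calls)  # 3 1
print(len(patched.images), patched.parse_calls)      # 0 0
```

In this sketch the empty dict is indistinguishable from "already parsed, nothing found", which is why the trick depends on an internal detail rather than on a documented option.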

Labels: needs-pdf (The issue needs a PDF file to show the problem)
