ENH: Compression #1910

@MartinThoma

Explanation

PDF file size matters, especially when you need to send documents by e-mail or when recipients have a slow internet connection. Smaller files are easier to handle.

Evidence that people care about this topic:

What this is NOT about:

  • Image size reductions done by the user: We describe that in https://pypdf.readthedocs.io/en/latest/user/file-size.html#removing-images. In the context of this enhancement proposal, we can also look at image formats and other possibilities for size reduction, but they should require little user knowledge. For example, I could imagine that we automatically apply better compression algorithms.

Goals

  • Simpler file size reduction for users
  • Greater file size reduction

Code Example

Similar to what we have already:

from pypdf import PdfReader, PdfWriter

reader = PdfReader("example.pdf")
writer = PdfWriter()

for page in reader.pages:
    writer.add_page(page)

# This is the new part (proposed API, it does not exist yet):
# - some flag to indicate whether lossy or lossless compression is desired
# - maybe a flag that steers CPU usage? Compression can be rather time-intensive
writer.compress(image_quality=85)

with open("out.pdf", "wb") as f:
    writer.write(f)
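
To make the "flag that steers CPU usage" idea more concrete, here is a minimal sketch using only the standard library. It assumes such a flag could map to a zlib compression level, since DEFLATE is the algorithm behind PDF's /FlateDecode filter. The names `EFFORT_TO_LEVEL` and `deflate_stream` are hypothetical and not part of pypdf; this is an illustration of the trade-off, not a proposed implementation.

```python
import zlib

# Hypothetical mapping from a user-facing "effort" flag to a zlib level.
# Level 1 is the fastest, level 9 spends the most CPU time searching for matches.
EFFORT_TO_LEVEL = {"fast": 1, "balanced": 6, "max": 9}


def deflate_stream(data: bytes, effort: str = "balanced") -> bytes:
    """Losslessly compress raw stream bytes with DEFLATE at the chosen effort."""
    return zlib.compress(data, EFFORT_TO_LEVEL[effort])


# A repetitive payload that loosely resembles a page content stream.
sample = b"BT /F1 12 Tf 72 720 Td (Hello World) Tj ET\n" * 500

for effort in EFFORT_TO_LEVEL:
    packed = deflate_stream(sample, effort)
    # Lossless: decompressing always yields the original bytes.
    assert zlib.decompress(packed) == sample
    print(f"{effort}: {len(sample)} -> {len(packed)} bytes")
```

A lossy option such as `image_quality` would be a separate knob, because it changes the decoded content, while the effort level above only trades CPU time against output size.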
