BUG: PDF size increases because of too high float writing precision#2213

Merged
MartinThoma merged 1 commit into py-pdf:main from pubpub-zz:adj_float_len
Sep 24, 2023
Conversation

@pubpub-zz
Collaborator

closes #1910
addresses the regression from #2203

@pubpub-zz pubpub-zz marked this pull request as ready for review September 24, 2023 16:21
@codecov

codecov bot commented Sep 24, 2023

Codecov Report

Patch coverage: 100.00% and no project coverage change.

Comparison is base (91b6dcd) 94.38% compared to head (d076f76) 94.38%.
Report is 1 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2213   +/-   ##
=======================================
  Coverage   94.38%   94.38%           
=======================================
  Files          43       43           
  Lines        7588     7589    +1     
  Branches     1497     1497           
=======================================
+ Hits         7162     7163    +1     
  Misses        262      262           
  Partials      164      164           
Files Changed Coverage Δ
pypdf/generic/_base.py 100.00% <100.00%> (ø)


@MartinThoma
Member

I just made a little test:

import pikepdf
from os import stat
import pypdf.generic._base

INPUT_PDF = "pypdf/sample-files/009-pdflatex-geotopo/GeoTopo.pdf"
OUTPUT_PDF = "output_example.pdf"

pypdf.generic._base.FLOAT_WRITE_PRECISION = 1


def test_filesize(INPUT_PDF, OUTPUT_PDF):
    reader = pypdf.PdfReader(INPUT_PDF)

    writer = pypdf.PdfWriter(clone_from=reader)

    for page in writer.pages:
        page.compress_content_streams(level=9)

    with open(OUTPUT_PDF, 'wb') as f:
        writer.write(f)

    orig_size = stat(INPUT_PDF).st_size / (1024)
    pypdf_size = stat(OUTPUT_PDF).st_size / (1024)

    pikepdf.settings.set_flate_compression_level(9)

    with pikepdf.Pdf.open(INPUT_PDF,
                          allow_overwriting_input=True,
                          suppress_warnings=True) as pdf:

        pdf.save(OUTPUT_PDF,
                 object_stream_mode=pikepdf.ObjectStreamMode.generate,
                 compress_streams=True,
                 stream_decode_level=pikepdf.StreamDecodeLevel.specialized)

    pikepdf_size = stat(OUTPUT_PDF).st_size / (1024)

    print(f'{"input file:":<15}{INPUT_PDF}')
    print(f'{"original size:":<15}{orig_size:.4f} KB')
    print(f'{"pypdf size:":<15}{pypdf_size:.4f} KB')
    print(f'{"pikepdf size:":<15}{pikepdf_size:.4f} KB')

test_filesize(INPUT_PDF, OUTPUT_PDF)

The output is:

input file:    pypdf/sample-files/009-pdflatex-geotopo/GeoTopo.pdf
original size: 5196.4180 KB
pypdf size:    5574.2656 KB
pikepdf size:  5185.7314 KB

For comparison:

* pypdf before this PR                       : 5702.3506 KB
* pypdf with a precision of 8 (as in this PR): 5594.0645 KB
* Smallpdf                                   : 1793.479 KB

Most interesting is that I wasn't able to spot a difference.
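
To put these numbers in perspective, the following sketch computes each result's size change relative to the original file. The figures are copied from the measurements above; the helper name `relative_change` is made up for this illustration. By this arithmetic, pypdf's output was about 9.7 % larger than the input before this PR and about 7.7 % larger with a precision of 8.

```python
# Sizes in KB, copied from the measurements reported above.
ORIGINAL_KB = 5196.4180
sizes_kb = {
    "pypdf before this PR": 5702.3506,
    "pypdf, precision 8 (this PR)": 5594.0645,
    "pypdf, precision 1 (test above)": 5574.2656,
    "pikepdf": 5185.7314,
    "Smallpdf": 1793.479,
}


def relative_change(size_kb: float, baseline_kb: float = ORIGINAL_KB) -> float:
    """Return the size change relative to the baseline, in percent."""
    return (size_kb - baseline_kb) / baseline_kb * 100


for label, size in sizes_kb.items():
    print(f"{label:<34}{relative_change(size):+.2f} %")
```

Going from precision 8 to precision 1 only saves a further fraction of a percent on this file, so most of the gain comes from no longer writing floats at full precision.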

@pubpub-zz
Collaborator Author

From my analysis, you should not use a value lower than 5: some color spaces use FloatObject values, and too coarse a rounding there will alter the colors, although the change may not be visible in most PDF viewers.
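
The trade-off can be illustrated with a small, self-contained sketch. It is not pypdf's actual writing code: `write_float` is a hypothetical helper that formats a value with a fixed number of decimals and strips trailing zeros, and the value 1/3 merely stands in for a color component. It shows how the written length grows with the precision while the rounding error shrinks.

```python
def write_float(value: float, decimals: int) -> str:
    """Hypothetical helper: format a float with a fixed number of decimals.

    Trailing zeros and a dangling decimal point are stripped, as they
    carry no information in the written file.
    """
    text = f"{value:.{decimals}f}".rstrip("0").rstrip(".")
    return text or "0"


component = 1 / 3  # stand-in for a color component (assumption)

for decimals in (1, 5, 8, 16):
    text = write_float(component, decimals)
    error = abs(float(text) - component)
    print(f"decimals={decimals:<2} text={text:<18} bytes={len(text):<2} error={error:.1e}")
```

With one decimal the value is stored in 3 bytes but is off by about 0.03, which is a large share of a component in the 0 to 1 range; with five decimals the error drops below 1e-5 at a cost of 7 bytes.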

@MartinThoma MartinThoma changed the title BUG : pdf size increases because of float writing precision BUG: PDF size increases because of too high float writing precision Sep 24, 2023
@MartinThoma MartinThoma merged commit e3f60c1 into py-pdf:main Sep 24, 2023
return IndirectObject.read_from_stream(stream, pdf)


FLOAT_WRITE_PRECISION = 8 # shall be min 5 digits max, allow user adj
Collaborator

Some part of the comment seems to have got lost.

MartinThoma added a commit that referenced this pull request Sep 24, 2023
## What's new

### Bug Fixes (BUG)
-  PDF size increases because of too high float writing precision (#2213) by @pubpub-zz
-  Fix test_watermarking_reportlab_rendering() (#2203) by @LucasCimon

### Documentation (DOC)
-  Fix typos and add a paragraph to ViewerPreferences docs (#2199) by @marcstober
-  How to install pypi from any branch (#2209) by @pubpub-zz
-  Update copyright footer in docs (#2207) by @marcstober

### Developer Experience (DEV)
-  Let dependabot update Github Actions by @MartinThoma

### Maintenance (MAINT)
-  Update .pre-commit-config.yaml by @MartinThoma

[Full Changelog](3.16.1...3.16.2)
Development

Successfully merging this pull request may close these issues.

ENH: Compression
