Some content being lost on appending and scaling pages #3680

@tigger0jk

Description

I have some existing code that appends pages from a reader into a writer, and scales the new pages.

In the output, some PDFs lose part of their content. It may be that only the form fields are retained and everything else is lost, but I have not confirmed this. Only some pages are affected.

If this line is removed, there is no problem:

        new_page.scale_by(1)
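
For context on why this is surprising: geometrically, scaling by a factor of 1 is the identity transformation, so it should not change anything visible. The sketch below illustrates this with the standard PDF-style matrix `(a, b, c, d, e, f)`. The helper names `scale_matrix` and `transform_point` are hypothetical and only illustrate the math; they are not pypdf functions and say nothing about how pypdf implements `scale_by` internally.

```python
from typing import Tuple

Matrix = Tuple[float, float, float, float, float, float]


def scale_matrix(factor: float) -> Matrix:
    """Uniform scaling as a PDF-style matrix (a, b, c, d, e, f)."""
    return (factor, 0.0, 0.0, factor, 0.0, 0.0)


def transform_point(m: Matrix, x: float, y: float) -> Tuple[float, float]:
    """Apply the matrix to a point: x' = a*x + c*y + e, y' = b*x + d*y + f."""
    a, b, c, d, e, f = m
    return (a * x + c * y + e, b * x + d * y + f)


# A factor of 1 maps every point onto itself (here: a US Letter corner).
print(transform_point(scale_matrix(1), 612, 792))
```

With a factor of 1 every coordinate maps to itself, so any visible difference in the output points at how the page content is rewritten rather than at the geometry.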

Environment

Which environment were you using when you encountered the problem?

$ uv run python -m platform
macOS-14.6.1-arm64-arm-64bit
$ uv run python -c "import pypdf;print(pypdf._debug_versions)"
pypdf==6.7.2, crypt_provider=('local_crypt_fallback', '0.0.0'), PIL=none

Code + PDF

This is a minimal, complete example that shows the issue:

import os
from pypdf import PdfReader, PdfWriter

input_path = "./sample.pdf"
output_path = "./output.pdf"

def write_to_form(input_name: str, output_name: str):
    reader = PdfReader(input_name, strict=False)
    writer = PdfWriter()

    for i, page in enumerate(reader.pages):
        print(f"DEBUG - Adding page: {i}")
        writer.append(fileobj=reader, pages=[i], import_outline=False)
        new_page = writer.pages[i]
        #  print(f"DEBUG - New page: {new_page}")

        # In the real flow, this call is meant to fix some PDFs before a later
        # scaling operation that would otherwise break them (I think because
        # they don't have a mediabox?).
        # If you remove this line, it works fine.
        new_page.scale_by(1)
        #  print(f"DEBUG - Scaled page: {new_page}")

    os.makedirs(os.path.dirname(output_name), exist_ok=True)
    with open(output_name, "wb") as output_stream:
        writer.write(output_stream)

    print(f"Wrote to {output_name}")

write_to_form(input_path, output_path)

I will direct message a sample PDF to @stefan6419846

There are no errors or warnings in the terminal; the output is simply incorrect. I have primarily been using Adobe Acrobat to view the documents.
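
Since nothing is reported in the terminal, one way to narrow down which pages are affected without relying on a viewer is to compare the input and output page by page. The following is a rough diagnostic sketch, not a confirmed method: `summarize_pages` and `pages_that_shrank` are hypothetical helper names, and the "content got smaller" check is only a heuristic that assumes lost content shows up as a shorter decoded content stream.

```python
from typing import Dict, List


def summarize_pages(path: str) -> List[Dict[str, int]]:
    """Return decoded content-stream size and annotation count per page."""
    # Imported lazily so the pure helper below also works without pypdf.
    from pypdf import PdfReader

    summaries = []
    for page in PdfReader(path, strict=False).pages:
        contents = page.get_contents()  # None when the page has no content
        annots = page["/Annots"] if "/Annots" in page else []
        summaries.append({
            "content_bytes": len(contents.get_data()) if contents is not None else 0,
            "annotations": len(annots),
        })
    return summaries


def pages_that_shrank(before: List[Dict[str, int]],
                      after: List[Dict[str, int]]) -> List[int]:
    """Indices of pages whose decoded content stream got smaller (heuristic)."""
    return [
        i for i, (b, a) in enumerate(zip(before, after))
        if a["content_bytes"] < b["content_bytes"]
    ]


# Intended use with the files from the example above:
#   before = summarize_pages("./sample.pdf")
#   after = summarize_pages("./output.pdf")
#   print(pages_that_shrank(before, after))
```

Comparing the `annotations` counts in the same way would help check the guess that form fields survive while the rest of the page content is lost.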

Labels: is-bug (From a user's perspective, this is a bug: a violation of the expected behavior with a compliant PDF)