PdfReadError: Could not find object #997

@DmitriyReztsov

Description

While merging two or more files, I ran into this error: PdfReadError: Could not find object. The merge is done programmatically, with each PDF loaded into a BytesIO buffer.

Environment

Which environment were you using when you encountered the problem?

$ python -m platform
# Linux-5.4.72-microsoft-standard-WSL2-x86_64-with-glibc2.29

$ python -c "import PyPDF2;print(PyPDF2.__version__)"
# 2.0.0

Code

This is a minimal, complete example that shows the issue:

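import logging
from io import BytesIO
from typing import List, Optional, Tuple

import requests
from django.conf import settings
from django.core.files import File as DjangoFile  # presumably what DjangoFile refers to
from PyPDF2 import PdfFileMerger, PdfFileReader
from PyPDF2.errors import PyPdfError

# User and File are models from the reporter's Django project (imports not shown).
logger = logging.getLogger(__name__)

def get_request(url: str) -> Tuple[bytes, bool]: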
    resp = requests.get(url, allow_redirects=True)
    if not resp.ok:
        logger.error(f"Could not download file from {url}, server responded {resp.status_code}: {resp.reason}")
        return b"", False
    return resp.content, True

def merge_to_pdf(files: List, created_by: User, merged_name: str) -> Optional[File]:
    buffer = BytesIO()
    merger = PdfFileMerger()
    merged_filename = f"{merged_name}.pdf"

    if settings.BUILD_ENV == "local":
        for file in files:
            try:
                merger.append(file)  # the error is raised here in the local env
            except PyPdfError:
                return None
    else:
        for file in files:
            pdf_content, ok = get_request(file)
            if not ok:
                return None
            pdf_buffer = BytesIO(pdf_content)
            try:
                merger.append(PdfFileReader(pdf_buffer))  # or here in a non-local env
            except PyPdfError:
                return None

    merger.write(buffer)
    merger.close()
    buffer.seek(0)
    merged_file = File.objects.create(
        file=DjangoFile(buffer, merged_filename),
        name=merged_filename,
        created_by=created_by,
    )
    return merged_file
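
Because `merge_to_pdf` returns `None` on any `PyPdfError` without recording which input failed, it may help to test each file against a fresh merger first. The sketch below is one way to do that. `find_failing_inputs` and `append_to_fresh_merger` are hypothetical helper names, not part of PyPDF2, and the code assumes each PDF is already available as bytes.

```python
from io import BytesIO
from typing import Callable, Iterable, List, Tuple


def append_to_fresh_merger(data: bytes) -> None:
    """Append one PDF to a new, empty merger so failures are isolated."""
    # Imported lazily so find_failing_inputs can be used with another checker.
    from PyPDF2 import PdfFileMerger

    merger = PdfFileMerger()
    try:
        merger.append(BytesIO(data))
    finally:
        merger.close()


def find_failing_inputs(
    named_pdfs: Iterable[Tuple[str, bytes]],
    check: Callable[[bytes], None] = append_to_fresh_merger,
) -> List[Tuple[str, str]]:
    """Return (name, error description) for every input that check() rejects."""
    failures = []
    for name, data in named_pdfs:
        try:
            check(data)
        except Exception as exc:  # broad on purpose: this is a diagnostic pass
            failures.append((name, f"{type(exc).__name__}: {exc}"))
    return failures
```

Calling `find_failing_inputs([(url, content), ...])` with the downloaded bytes would list only the inputs that cannot be appended on their own, which narrows the report down to a single file.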

PDF

Exhibit_A-2_930_Enterprise_Zone_Tax_Credits_final.pdf

Traceback

Traceback (most recent call last):
  File "/home/dy/Et/Cor/backend_django/try.py", line 25, in <module>
    test_issue793()
  File "/home/dy/Et/Cor/backend_django/try.py", line 14, in test_issue793
    merger.append(pdf2_path)
  File "/home/dy/.local/lib/python3.10/site-packages/PyPDF2/_merger.py", line 254, in append
    self.merge(len(self.pages), fileobj, bookmark, pages, import_bookmarks)
  File "/home/dy/.local/lib/python3.10/site-packages/PyPDF2/_merger.py", line 152, in merge
    outline = self._trim_outline(reader, outline, pages)
  File "/home/dy/.local/lib/python3.10/site-packages/PyPDF2/_merger.py", line 440, in _trim_outline
    if pdf.pages[j].get_object() == o["/Page"].get_object():
  File "/home/dy/.local/lib/python3.10/site-packages/PyPDF2/generic.py", line 607, in __getitem__
    return dict.__getitem__(self, key).get_object()
  File "/home/dy/.local/lib/python3.10/site-packages/PyPDF2/generic.py", line 208, in get_object
    return self.pdf.get_object(self).get_object()
  File "/home/dy/.local/lib/python3.10/site-packages/PyPDF2/_reader.py", line 1063, in get_object
    raise PdfReadError("Could not find object.")
PyPDF2.errors.PdfReadError: Could not find object.
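
The traceback ends in `_trim_outline`, which `merge` reaches with the `import_bookmarks` flag forwarded from `append`. A possible workaround, sketched here under the assumption that the source files' outlines (bookmarks) do not need to survive the merge, is to pass `import_bookmarks=False` so the outline step is skipped. This is untested against the attached PDF and is not a confirmed fix. `merge_without_outlines` and its `merger_factory` parameter are hypothetical names introduced for illustration; the factory exists only so the helper can be exercised without PyPDF2 installed.

```python
from io import BytesIO
from typing import Any, Callable, Iterable, Optional


def merge_without_outlines(
    pdf_blobs: Iterable[bytes],
    merger_factory: Optional[Callable[[], Any]] = None,
) -> bytes:
    """Merge in-memory PDFs, asking the merger not to import outlines."""
    if merger_factory is None:
        # Imported lazily so the helper can be tested with a stand-in merger.
        from PyPDF2 import PdfFileMerger

        merger_factory = PdfFileMerger

    merger = merger_factory()
    try:
        for blob in pdf_blobs:
            # import_bookmarks is the keyword visible in the traceback's call
            # to merge(); assumption: False bypasses _trim_outline.
            merger.append(BytesIO(blob), import_bookmarks=False)
        out = BytesIO()
        merger.write(out)
        return out.getvalue()
    finally:
        # Release any streams the merger holds, even if append() raised.
        merger.close()
```

The merged file would lose its bookmarks with this approach, so it only suits cases where the outline is not needed downstream.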

Labels

PdfMerger: The PdfMerger component is affected
PdfReader: The PdfReader component is affected
is-robustness-issue: From a users perspective, this is about robustness