Closed
Labels
PdfMerger: The PdfMerger component is affected
PdfReader: The PdfReader component is affected
is-robustness-issue: From a user's perspective, this is about robustness
Description
While merging two or more files I ran into this error: PdfReadError: Could not find object. The merge is done programmatically, with each PDF opened into a BytesIO buffer.
Environment
Which environment were you using when you encountered the problem?
$ python -m platform
# Linux-5.4.72-microsoft-standard-WSL2-x86_64-with-glibc2.29
$ python -c "import PyPDF2;print(PyPDF2.__version__)"
# 2.0.0
Code
This is a minimal, complete example that shows the issue:
def get_request(url: str) -> Tuple[bytes, bool]:
    resp = requests.get(url, allow_redirects=True)
    if not resp.ok:
        logger.error(f"Could not download file from {url}, server responded {resp.status_code}: {resp.reason}")
        return b"", False
    return resp.content, True

def merge_to_pdf(files: List, created_by: User, merged_name: str) -> Optional[File]:
    buffer = BytesIO()
    merger = PdfFileMerger()
    merged_filename = f"{merged_name}.pdf"
    if settings.BUILD_ENV == "local":
        for file in files:
            try:
                merger.append(file)  # here the error raises
            except PyPdfError:
                return None
    else:
        for file in files:
            pdf_content, ok = get_request(file)
            if not ok:
                return None
            pdf_buffer = BytesIO(pdf_content)
            try:
                merger.append(PdfFileReader(pdf_buffer))  # or here in non-local env
            except PyPdfError:
                return None
    merger.write(buffer)
    merger.close()
    buffer.seek(0)
    merged_file = File.objects.create(
        file=DjangoFile(buffer, merged_filename),
        name=merged_filename,
        created_by=created_by,
    )
    return merged_file

Exhibit_A-2_930_Enterprise_Zone_Tax_Credits_final.pdf
Share here the PDF file(s) that cause the issue. The smaller they are, the
better. Let us know if we may add them to our tests!
Traceback
Traceback (most recent call last):
  File "/home/dy/Et/Cor/backend_django/try.py", line 25, in <module>
    test_issue793()
  File "/home/dy/Et/Cor/backend_django/try.py", line 14, in test_issue793
    merger.append(pdf2_path)
  File "/home/dy/.local/lib/python3.10/site-packages/PyPDF2/_merger.py", line 254, in append
    self.merge(len(self.pages), fileobj, bookmark, pages, import_bookmarks)
  File "/home/dy/.local/lib/python3.10/site-packages/PyPDF2/_merger.py", line 152, in merge
    outline = self._trim_outline(reader, outline, pages)
  File "/home/dy/.local/lib/python3.10/site-packages/PyPDF2/_merger.py", line 440, in _trim_outline
    if pdf.pages[j].get_object() == o["/Page"].get_object():
  File "/home/dy/.local/lib/python3.10/site-packages/PyPDF2/generic.py", line 607, in __getitem__
    return dict.__getitem__(self, key).get_object()
  File "/home/dy/.local/lib/python3.10/site-packages/PyPDF2/generic.py", line 208, in get_object
    return self.pdf.get_object(self).get_object()
  File "/home/dy/.local/lib/python3.10/site-packages/PyPDF2/_reader.py", line 1063, in get_object
    raise PdfReadError("Could not find object.")
PyPDF2.errors.PdfReadError: Could not find object.
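The traceback fails inside _trim_outline, which append reaches by passing its import_bookmarks argument on to merge. A possible workaround, sketched below and not confirmed as a fix, is to retry a failing file with import_bookmarks=False so the outline is skipped. The helper name append_all and its errors parameter are hypothetical, and the sketch assumes that a failed append leaves the merger unchanged, which I have not verified. It is duck-typed, so it only needs an object with an append(src, import_bookmarks=...) method.

```python
from typing import Any, Iterable, List, Tuple, Type


def append_all(
    merger: Any,
    sources: Iterable[Any],
    errors: Tuple[Type[BaseException], ...] = (Exception,),
) -> List[Any]:
    """Append every source to `merger`, retrying without outlines on failure.

    Hypothetical helper: returns the sources that needed the fallback so the
    caller can log which merged files lost their bookmarks.
    """
    retried: List[Any] = []
    for src in sources:
        try:
            # First attempt uses the merger's default outline handling.
            merger.append(src)
        except errors:
            # Assumption: the failure came from outline handling, so skip it.
            # A second failure here is not caught and propagates to the caller.
            merger.append(src, import_bookmarks=False)
            retried.append(src)
    return retried
```

With PyPDF2 2.0.0 this would presumably be called as append_all(PdfFileMerger(), files, errors=(PdfReadError,)), where PdfReadError comes from PyPDF2.errors as in the traceback above. The merged file then lacks the bookmarks of any source that needed the retry.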