PDF from Google Sheet doesn't merge with PdfMerger when import_bookmarks is True #1034

@Hatell

A PDF exported from Google Sheets fails to merge with PdfMerger when import_bookmarks is True. With import_bookmarks=False, the merge works.

It seems that the stream is not in the correct state for reading an object header from the PDF.

Environment

Which environment were you using when you encountered the problem?

$ python -m platform
Linux-5.17.12-200.fc35.x86_64-x86_64-with-glibc2.34

$ python -c "import PyPDF2;print(PyPDF2.__version__)"
2.4.0

Code + PDF

This is a minimal, complete example that shows the issue:

#!/usr/bin/env python
# vi: et sw=4 fileencoding=utf-8

import sys

from PyPDF2 import PdfMerger, PdfReader

out_pdf = PdfMerger()

# Appending without bookmarks succeeds.
print("This is OK")
out_pdf.append(PdfReader(sys.argv[1]), import_bookmarks=False)

# Appending the same file with bookmarks raises ValueError.
print("This crashes")
out_pdf.append(PdfReader(sys.argv[1]), import_bookmarks=True)

# Not reached while the bug is present.
with open(sys.argv[2], "wb") as out_file:
    out_pdf.write(out_file)
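
Since the append succeeds with import_bookmarks=False, one possible stopgap is to retry without bookmarks when the bookmark import fails. The following is a minimal sketch, not a fix. `append_with_fallback` and `open_source` are hypothetical names, the sketch assumes the failure surfaces as the ValueError reported in this issue, and it has not been verified that the merger is left in a clean state after the failed first attempt.

```python
def append_with_fallback(merger, open_source):
    """Append a document, dropping its bookmarks if importing them fails.

    `merger` is expected to behave like PyPDF2's PdfMerger.
    `open_source` is a hypothetical zero-argument callable that returns a
    fresh source (for example a new PdfReader) for each attempt, mirroring
    the reproduction script, which builds a new reader per append.

    Returns True if bookmarks were imported, False if the fallback was used.
    """
    try:
        merger.append(open_source(), import_bookmarks=True)
        return True
    except ValueError:
        # Assumption: the bookmark import failed before any pages were added.
        merger.append(open_source(), import_bookmarks=False)
        return False


# Possible usage with the names from the script above:
#   append_with_fallback(out_pdf, lambda: PdfReader(sys.argv[1]))
```

The merged output produced this way loses the source document's bookmarks, so it only helps when the outline is not needed.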

Sample PDF file:

Traceback

This is the complete Traceback I see:

Traceback (most recent call last):
  File "/home/hate/git/PyPDF2/sample-files/003-pdflatex-image/bug_report.py", line 18, in <module>
    out_pdf.append(PdfReader(sys.argv[1]), import_bookmarks=True)
  File "/home/hate/git/PyPDF2/PyPDF2/_merger.py", line 252, in append
    self.merge(len(self.pages), fileobj, bookmark, pages, import_bookmarks)
  File "/home/hate/git/PyPDF2/PyPDF2/_merger.py", line 152, in merge
    outline = reader.outlines
  File "/home/hate/git/PyPDF2/PyPDF2/_reader.py", line 665, in outlines
    return self._get_outlines()
  File "/home/hate/git/PyPDF2/PyPDF2/_reader.py", line 677, in _get_outlines
    lines = cast(DictionaryObject, catalog[CO.OUTLINES])
  File "/home/hate/git/PyPDF2/PyPDF2/generic.py", line 666, in __getitem__
    return dict.__getitem__(self, key).get_object()
  File "/home/hate/git/PyPDF2/PyPDF2/generic.py", line 237, in get_object
    obj = self.pdf.get_object(self)
  File "/home/hate/git/PyPDF2/PyPDF2/_reader.py", line 1051, in get_object
    idnum, generation = self.read_object_header(self.stream)
  File "/home/hate/git/PyPDF2/PyPDF2/_reader.py", line 1133, in read_object_header
    return int(idnum), int(generation)
ValueError: invalid literal for int() with base 10: b'F-1.4'
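
The value b'F-1.4' in the error looks like a fragment of the file's "%PDF-1.4" version line, which would fit the stream being positioned near the start of the file instead of at the object's offset. The sketch below illustrates that failure mode with a toy parser. It is not PyPDF2's code: `read_token` and this `read_object_header` are simplified stand-ins, the PDF bytes are made up, and the offset 3 is an assumption chosen only to reproduce the same fragment.

```python
import io


def read_token(stream):
    """Read bytes up to the next whitespace byte (simplified stand-in)."""
    token = b""
    while True:
        byte = stream.read(1)
        if not byte or byte.isspace():
            return token
        token += byte


def read_object_header(stream):
    """Parse '<idnum> <generation>' at the current stream position."""
    idnum = read_token(stream)
    generation = read_token(stream)
    return int(idnum), int(generation)


# Made-up minimal PDF fragment for illustration.
pdf = b"%PDF-1.4\n1 0 obj\n<< /Type /Catalog >>\nendobj\n"
stream = io.BytesIO(pdf)

# Correct position: the stream points at the start of "1 0 obj".
stream.seek(pdf.index(b"1 0 obj"))
header_ok = read_object_header(stream)

# Wrong position: the stream points inside the "%PDF-1.4" version line.
stream.seek(3)
try:
    read_object_header(stream)
    error = None
except ValueError as exc:
    error = str(exc)

print(header_ok)
print(error)
```

With the stream at the wrong offset, the first token is a piece of the version line, and converting it with int() raises the same kind of ValueError as in the traceback.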

Labels

Has MCVE: A minimal, complete and verifiable example helps a lot to debug / understand feature requests
PdfMerger: The PdfMerger component is affected
is-bug: From a user's perspective, this is a bug - a violation of the expected behavior with a compliant PDF