Closed
Labels
Has MCVE: A minimal, complete and verifiable example helps a lot to debug / understand feature requests
is-robustness-issue: From a user's perspective, this is about robustness
Description
When trying to extract the text from a PDF, I get a KeyError: '/Root' exception.
Environment
$ python -m platform
Linux-5.4.0-113-generic-x86_64-with-glibc2.31
$ python -c "import PyPDF2;print(PyPDF2.__version__)"
2.2.0
MCVE: PDF + Code
Using this PDF: https://corpora.tika.apache.org/base/docs/govdocs1/989/989691.pdf
from PyPDF2 import PdfReader
reader = PdfReader("pdf/989691.pdf") # PdfReadWarning: incorrect startxref pointer(1)
reader.pages[0].extract_text()
I get this traceback:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/moose/Github/py-pdf/PyPDF2/PyPDF2/_page.py", line 1462, in __getitem__
    len_self = len(self)
  File "/home/moose/Github/py-pdf/PyPDF2/PyPDF2/_page.py", line 1453, in __len__
    return self.length_function()
  File "/home/moose/Github/py-pdf/PyPDF2/PyPDF2/_reader.py", line 362, in _get_num_pages
    self._flatten()
  File "/home/moose/Github/py-pdf/PyPDF2/PyPDF2/_reader.py", line 929, in _flatten
    catalog = self.trailer[TK.ROOT].get_object()
  File "/home/moose/Github/py-pdf/PyPDF2/PyPDF2/generic.py", line 623, in __getitem__
    return dict.__getitem__(self, key).get_object()
KeyError: '/Root'