Closed
Labels
is-bug: From a users perspective, this is a bug - a violation of the expected behavior with a compliant PDF
is-robustness-issue: From a users perspective, this is about robustness
workflow-encryption: From a users perspective, encryption is the affected feature/workflow
Description
Bug report
Some PDFs (e.g. encrypted_doc_no_id.pdf) are encrypted but do not contain an 'ID' value in their trailer, which causes decryption to fail. This also affects pdfminer.six, where I've opened an issue as well.
Steps to reproduce
from PyPDF2 import PdfFileReader

with open('encrypted_doc_no_id.pdf', 'rb') as fp:
    reader = PdfFileReader(fp)
    reader.decrypt('')

raises a KeyError: '/ID'.
Solution
As Apache PDFBox does, if no 'ID' is specified in the trailer then supply an array with two empty byte strings in its place.
from PyPDF2 import PdfFileReader
from PyPDF2.generic import ArrayObject, ByteStringObject, NameObject

with open('encrypted_doc_no_id.pdf', 'rb') as fp:
    reader = PdfFileReader(fp)
    print(reader.trailer)
    # Supply an '/ID' array of two empty byte strings, since the trailer has none.
    reader.trailer[NameObject('/ID')] = ArrayObject([ByteStringObject(b''), ByteStringObject(b'')])
    print(reader.trailer)
    # Decrypt with the empty user password now that '/ID' is present.
    reader.decrypt('')
    print(reader.getDocumentInfo())
    page = reader.getPage(1)
    print(page.extractText())

produces
{'/Size': 16, '/Root': IndirectObject(9, 0), '/Info': IndirectObject(8, 0), '/Encrypt': IndirectObject(10, 0)}
{'/Size': 16, '/Root': IndirectObject(9, 0), '/Info': IndirectObject(8, 0), '/Encrypt': IndirectObject(10, 0), '/ID': [b'', b'']}
{'/Producer': 'European Patent Office'}
and successfully decrypts the PDF.
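The defaulting rule behind this workaround can be sketched independently of PyPDF2. In the minimal example below, a plain dict stands in for the trailer and plain bytes stand in for ByteStringObject; the helper name with_default_id is hypothetical and is not part of PyPDF2 or PDFBox. It only illustrates the rule "keep an existing '/ID', otherwise use two empty byte strings".

```python
def with_default_id(trailer):
    """Return a copy of `trailer` that always has an '/ID' entry.

    `trailer` is a plain dict standing in for a PDF trailer dictionary
    (an assumption made for illustration; PyPDF2 uses its own object types).
    """
    patched = dict(trailer)
    # Only fill in the default when '/ID' is absent; an existing ID is kept.
    patched.setdefault('/ID', [b'', b''])
    return patched


# A trailer without '/ID', similar in shape to the one in this report.
trailer = {'/Size': 16, '/Encrypt': 'IndirectObject(10, 0)'}
patched = with_default_id(trailer)
print(patched['/ID'])
```

Returning a copy rather than mutating the input keeps the original trailer available for comparison. The snippet in this report mutates reader.trailer in place instead, which is what the decryption code needs.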
Next steps
If this project is still actively maintained, I can open a PR. Otherwise, I'll leave this issue here for other users who may encounter the same KeyError: '/ID' and wonder how to fix it.