
Cannot decrypt PDF missing 'ID' in trailer #608

@richardmillson

Bug report

Some PDFs (e.g. encrypted_doc_no_id.pdf) are encrypted but do not contain an 'ID' entry in their trailer, which causes decryption to fail. This also affects pdfminer.six, where I've opened this issue.

Steps to reproduce

from PyPDF2 import PdfFileReader

with open('encrypted_doc_no_id.pdf', 'rb') as fp:
    reader = PdfFileReader(fp)
    reader.decrypt('')

raises a KeyError: '/ID'.

Solution

Following what Apache PDFBox does, if the trailer has no 'ID' entry, supply an array of two empty byte strings in its place.

from PyPDF2 import PdfFileReader
from PyPDF2.generic import ArrayObject, ByteStringObject, NameObject

with open('encrypted_doc_no_id.pdf', 'rb') as fp:
    reader = PdfFileReader(fp)
    print(reader.trailer)
    reader.trailer[NameObject('/ID')] = ArrayObject([ByteStringObject(b''), ByteStringObject(b'')])
    print(reader.trailer)
    reader.decrypt('')
    print(reader.getDocumentInfo())
    page = reader.getPage(1)
    print(page.extractText())

produces

{'/Size': 16, '/Root': IndirectObject(9, 0), '/Info': IndirectObject(8, 0), '/Encrypt': IndirectObject(10, 0)}
{'/Size': 16, '/Root': IndirectObject(9, 0), '/Info': IndirectObject(8, 0), '/Encrypt': IndirectObject(10, 0), '/ID': [b'', b'']}
{'/Producer': 'European Patent Office'}

and successfully decrypts the PDF.
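
For anyone wondering why an empty 'ID' is a reasonable stand-in: in the Standard security handler, the first element of the 'ID' array is only used as one more input to the MD5 hash that derives the encryption key. The sketch below is my own illustration of that key derivation for revisions 2 and 3, not PyPDF2's code; `derive_key` is a hypothetical name, the owner entry and permissions in the usage lines are made-up placeholder values, and the revision 4 metadata flag is left out.

```python
import hashlib
import struct

# Standard 32-byte password padding string defined by the PDF specification.
PAD = bytes([
    0x28, 0xBF, 0x4E, 0x5E, 0x4E, 0x75, 0x8A, 0x41,
    0x64, 0x00, 0x4E, 0x56, 0xFF, 0xFA, 0x01, 0x08,
    0x2E, 0x2E, 0x00, 0xB6, 0xD0, 0x68, 0x3E, 0x80,
    0x2F, 0x0C, 0xA9, 0xFE, 0x64, 0x53, 0x69, 0x7A,
])


def derive_key(password, owner_entry, permissions,
               first_id=b"", revision=2, key_length=5):
    """Hypothetical helper: derive the file encryption key (revisions 2 and 3)."""
    # Pad or truncate the password to exactly 32 bytes.
    padded = (password + PAD)[:32]
    md5 = hashlib.md5()
    md5.update(padded)
    md5.update(owner_entry)  # the /O entry of the encryption dictionary
    # /P is hashed as a 4-byte little-endian value; mask so negatives pack cleanly.
    md5.update(struct.pack("<I", permissions & 0xFFFFFFFF))
    # First element of the trailer /ID array; an empty string adds nothing.
    md5.update(first_id)
    digest = md5.digest()
    if revision >= 3:
        # Revision 3 re-hashes the first key_length bytes 50 more times.
        for _ in range(50):
            digest = hashlib.md5(digest[:key_length]).digest()
    return digest[:key_length]


# Placeholder inputs, NOT taken from encrypted_doc_no_id.pdf.
placeholder_owner = bytes(32)
key_without_id = derive_key(b"", placeholder_owner, -4)
key_with_empty_id = derive_key(b"", placeholder_owner, -4, first_id=b"")
key_with_some_id = derive_key(b"", placeholder_owner, -4, first_id=b"abc")
```

Here `key_without_id` and `key_with_empty_id` are the same bytes, while `key_with_some_id` differs. So if the producer hashed nothing for the 'ID' when encrypting, substituting empty byte strings should reproduce the same key, which matches the successful decryption above.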

Next steps

If this project is still actively maintained, I can open a PR. Otherwise I'll leave this issue here for other users who may encounter the same KeyError: '/ID' and wonder how to fix it.

Metadata


    Labels

    is-bug: From a user's perspective, this is a bug - a violation of the expected behavior with a compliant PDF
    is-robustness-issue: From a user's perspective, this is about robustness
    workflow-encryption: From a user's perspective, encryption is the affected feature/workflow
