IndexError: index out of range when encountering a digital certificate/signature

I'll start by saying I'm very new to Python and PyPDF2. I'm trying to collect all of the form fields in a PDF into a dataframe. Eventually I want to process thousands of PDFs that all have the same structure (form) as the baseline and place their fields into the dataframe. This code works great on a PDF without a digital certificate/signature. However, when I run it on a PDF with a digital certificate/signature, I get an error.

I don't really need the digital signature/certificate spot of the document so I think the easiest way to do this is to just skip that field of the PDF. However, I don't know how to do that since the PyPDF2 package looks at every field.

I was able to get around the error by wrapping the call in try/except, but then nothing was captured from the PDF (i.e. the result was blank).

## Environment

Plotly Dash Workspace

```bash
$ python -m platform
# Linux-3.10.0-1160.49.1.el7.x86_64-x86_64-with-debian-buster-sid

$ python -c "import PyPDF2;print(PyPDF2.__version__)"
# 2.10.0
```

## Code + PDF

```python
import os

import pandas as pd
import PyPDF2 as pypdf

directory = 'files'
df = pd.DataFrame()  # assumed starting point: one row of form fields per PDF

for filename in os.listdir(directory):
    f = os.path.join(directory, filename)
    if os.path.isfile(f):
        print(f)
        pdf = pypdf.PdfFileReader(f, strict=False)
        print(pdf)
        # information = pdf.getFormTextFields()
        information = pdf.getFields()  # raises IndexError on the signed PDF
        print(information)
        output = pd.DataFrame([information])
        df = pd.concat([df, output], ignore_index=True)
```

I'll have to play around with the PDF to see if I can post it, as it has PII.

## Traceback

```
Traceback (most recent call last):
  File "/workspace/app.py", line 77, in <module>
    information = pdf.getFields()
  File "/app/.heroku/python/lib/python3.7/site-packages/PyPDF2/_reader.py", line 526, in getFields
    return self.get_fields(tree, retval, fileobj)
  File "/app/.heroku/python/lib/python3.7/site-packages/PyPDF2/_reader.py", line 510, in get_fields
    self._build_field(field, retval, fileobj, field_attributes)
  File "/app/.heroku/python/lib/python3.7/site-packages/PyPDF2/_reader.py", line 535, in _build_field
    self._check_kids(field, retval, fileobj)
  File "/app/.heroku/python/lib/python3.7/site-packages/PyPDF2/_reader.py", line 555, in _check_kids
    self.get_fields(kid.get_object(), retval, fileobj)
  File "/app/.heroku/python/lib/python3.7/site-packages/PyPDF2/_reader.py", line 499, in get_fields
    self._check_kids(tree, retval, fileobj)
  File "/app/.heroku/python/lib/python3.7/site-packages/PyPDF2/_reader.py", line 555, in _check_kids
    self.get_fields(kid.get_object(), retval, fileobj)
  File "/app/.heroku/python/lib/python3.7/site-packages/PyPDF2/_reader.py", line 503, in get_fields
    self._build_field(tree, retval, fileobj, field_attributes)
  File "/app/.heroku/python/lib/python3.7/site-packages/PyPDF2/_reader.py", line 547, in _build_field
    retval[key] = Field(field)
  File "/app/.heroku/python/lib/python3.7/site-packages/PyPDF2/generic.py", line 1626, in __init__
    self[NameObject(attr)] = data[attr]
  File "/app/.heroku/python/lib/python3.7/site-packages/PyPDF2/generic.py", line 679, in __getitem__
    return dict.__getitem__(self, key).get_object()
  File "/app/.heroku/python/lib/python3.7/site-packages/PyPDF2/generic.py", line 251, in get_object
    obj = self.pdf.get_object(self)
  File "/app/.heroku/python/lib/python3.7/site-packages/PyPDF2/_reader.py", line 1167, in get_object
    retval, indirect_reference.idnum, indirect_reference.generation
  File "/app/.heroku/python/lib/python3.7/site-packages/PyPDF2/_encryption.py", line 741, in decrypt_object
    return cf.decrypt_object(obj)
  File "/app/.heroku/python/lib/python3.7/site-packages/PyPDF2/_encryption.py", line 182, in decrypt_object
    obj[dictkey] = self.decrypt_object(value)
  File "/app/.heroku/python/lib/python3.7/site-packages/PyPDF2/_encryption.py", line 185, in decrypt_object
    obj[i] = self.decrypt_object(obj[i])
  File "/app/.heroku/python/lib/python3.7/site-packages/PyPDF2/_encryption.py", line 182, in decrypt_object
    obj[dictkey] = self.decrypt_object(value)
  File "/app/.heroku/python/lib/python3.7/site-packages/PyPDF2/_encryption.py", line 176, in decrypt_object
    data = self.strCrypt.decrypt(obj.original_bytes)
  File "/app/.heroku/python/lib/python3.7/site-packages/PyPDF2/_encryption.py", line 88, in decrypt
    return d[: -d[-1]]
IndexError: index out of range
```
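
Reading the last frame: `d[: -d[-1]]` treats the final byte as the padding length, and `IndexError: index out of range` is what Python raises when you index empty `bytes`. So it looks like one of the strings being decrypted came back empty. Below is a minimal sketch of that failure next to a guarded variant. The function names `strip_padding` and `strip_padding_guarded` are made up for illustration and are not PyPDF2 functions; I'm only assuming the expression from the traceback.

```python
def strip_padding(d: bytes) -> bytes:
    # Same expression as the last traceback frame: the final byte is
    # read as the number of padding bytes to drop.
    return d[: -d[-1]]


def strip_padding_guarded(d: bytes) -> bytes:
    # Hypothetical variant: empty input has no padding byte to read,
    # so return it unchanged instead of indexing into it.
    if not d:
        return d
    return d[: -d[-1]]


padded = b"hello" + bytes([3, 3, 3])
print(strip_padding(padded))

try:
    strip_padding(b"")
except IndexError as exc:
    print("IndexError:", exc)

print(strip_padding_guarded(b""))
```

I can't confirm from my side that an empty string inside the signature data is the actual cause, only that it would match this traceback.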

I believe the best solution would be for getFields() or getFormTextFields() to skip a field when it turns out to be a digital signature/certificate.
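
To make the idea concrete, here is a rough sketch of the skipping behaviour I have in mind. It uses plain Python dicts as stand-ins for the PDF field dictionaries, so it runs without PyPDF2. The keys `/T` (name), `/FT` (field type), `/V` (value), `/Kids` and the type `/Sig` come from the PDF format; the function `collect_fields` and the sample data are hypothetical. I have not tested whether checking `/FT` first is enough to avoid the decryption step inside PyPDF2.

```python
from typing import Any, Dict, Iterable


def collect_fields(fields: Iterable[Dict[str, Any]], prefix: str = "") -> Dict[str, Any]:
    """Return {qualified field name: value}, skipping signature fields."""
    result: Dict[str, Any] = {}
    for field in fields:
        if field.get("/FT") == "/Sig":
            continue  # skip the signature field without touching its /V
        name = field.get("/T", "")
        qualified = f"{prefix}.{name}" if prefix and name else prefix or name
        # Kids that carry a /T are sub-fields; kids without one are treated
        # as widgets of this field, so the field itself holds the value.
        kids = [kid for kid in field.get("/Kids", []) if "/T" in kid]
        if kids:
            result.update(collect_fields(kids, qualified))
        elif qualified:
            result[qualified] = field.get("/V")
    return result


# Hypothetical form: two text fields, one nested group, one signature.
form = [
    {"/T": "name", "/FT": "/Tx", "/V": "Ada"},
    {"/T": "address", "/Kids": [
        {"/T": "city", "/FT": "/Tx", "/V": "London"},
        {"/T": "zip", "/FT": "/Tx", "/V": "N1"},
    ]},
    {"/T": "Signature1", "/FT": "/Sig", "/V": {"/Contents": b""}},
]

print(collect_fields(form))
```

The resulting dict has one entry per non-signature field, which is the shape I would then pass to `pd.DataFrame([...])`.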
