AttributeError: 'NoneType' object has no attribute 'get_object' #1295

@DL6ER

See #1269 for further details. This reports another issue I've come across.

Environment

Which environment were you using when you encountered the problem?

$ python -m platform
Linux-5.4.0-122-generic-x86_64-with-glibc2.29

$ python -c "import PyPDF2;print(PyPDF2.__version__)"
2.10.3

Code + PDF

This is a minimal, complete example that shows the issue:

import PyPDF2
with open("Segmentation & Activation Lab.pdf", "rb") as f:
    pdfreader = PyPDF2.PdfFileReader(f, strict=False)
    full_content = " ".join([page.extractText() for page in pdfreader.pages])

PDF used above: Segmentation & Activation Lab.pdf

Traceback

This is the complete Traceback I see:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/PyPDF2/_page.py", line 1538, in extractText
    return self.extract_text()
  File "/usr/local/lib/python3.8/dist-packages/PyPDF2/_page.py", line 1510, in extract_text
    return self._extract_text(
  File "/usr/local/lib/python3.8/dist-packages/PyPDF2/_page.py", line 1146, in _extract_text
    cmaps[f] = build_char_map(f, space_width, obj)
  File "/usr/local/lib/python3.8/dist-packages/PyPDF2/_cmap.py", line 21, in build_char_map
    encoding, space_code = parse_encoding(ft, space_code)
  File "/usr/local/lib/python3.8/dist-packages/PyPDF2/_cmap.py", line 124, in parse_encoding
    enc: Union(str, DictionaryObject) = ft["/Encoding"].get_object()  # type: ignore
AttributeError: 'NoneType' object has no attribute 'get_object'
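
The last frame shows that ft["/Encoding"] evaluated to None before .get_object() was called on it. As a rough sketch of that failure pattern and of one possible guard (not the actual PyPDF2 code or fix; FakeIndirect and resolve_encoding are hypothetical names and a plain dict stands in for the font dictionary):

```python
from typing import Any, Mapping, Optional


class FakeIndirect:
    """Stand-in for an object exposing get_object(); not a PyPDF2 class."""

    def __init__(self, target: Any) -> None:
        self._target = target

    def get_object(self) -> Any:
        return self._target


def resolve_encoding(font: Mapping[str, Any]) -> Optional[Any]:
    """Return the resolved /Encoding entry, or None if it is missing or null."""
    entry = font.get("/Encoding")
    if entry is None:
        # Nothing to resolve; the caller has to fall back to some default.
        return None
    return entry.get_object()


# Unguarded access on a null entry fails the same way as in the traceback.
unguarded_error = None
try:
    {"/Encoding": None}["/Encoding"].get_object()
except AttributeError as exc:
    unguarded_error = str(exc)

print(unguarded_error)
print(resolve_encoding({"/Encoding": None}))
print(resolve_encoding({"/Encoding": FakeIndirect("/WinAnsiEncoding")}))
```

What the library should fall back to when the entry is null is a separate question that this sketch does not try to answer.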

The PDF can be read using a normal PDF viewer and the PDF even comes from Adobe.
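
Until this is fixed, a possible workaround on the caller side is to extract page by page and skip pages that raise, so one bad font does not abort the whole document. This is only a sketch: extract_text_tolerant is a hypothetical helper, it assumes each page object has an extract_text() method (as the traceback above shows for PyPDF2 pages), and it hides the bug rather than fixing it.

```python
from typing import Any, Callable, Iterable, List, Tuple


def extract_text_tolerant(
    pages: Iterable[Any],
    extract: Callable[[Any], str] = lambda page: page.extract_text(),
) -> Tuple[str, List[Tuple[int, str]]]:
    """Join the text of all pages, skipping pages whose extraction fails.

    Returns the joined text and a list of (page_index, error_message)
    for every page that raised AttributeError.
    """
    texts: List[str] = []
    failures: List[Tuple[int, str]] = []
    for index, page in enumerate(pages):
        try:
            texts.append(extract(page))
        except AttributeError as exc:
            # Record the failure instead of aborting the whole document.
            failures.append((index, str(exc)))
    return " ".join(texts), failures
```

With the reader from the example above, this would be called as extract_text_tolerant(pdfreader.pages) inside the with block. The failures list then shows which pages were affected.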

Another example:
