
AttributeError: 'int' object has no attribute 'isspace' #1983

@michelcrypt4d4mus

Description

I tried to extract text from the attached PDF.

Environment

Which environment were you using when you encountered the problem?

$ python -m platform
macOS-13.4.1-arm64-arm-64bit

$ python -c "import pypdf;print(pypdf.__version__)"
3.12.1
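
The two commands above can also be collected in one script when filing a report. This is a minimal sketch that uses only the standard library. `environment_report` is a hypothetical helper name, and it reads the installed version through `importlib.metadata` rather than `pypdf.__version__`, so it still runs when pypdf cannot be imported.

```python
import platform
from importlib.metadata import PackageNotFoundError, version
from typing import Dict


def environment_report(package: str = "pypdf") -> Dict[str, str]:
    """Gather the facts requested in the bug-report template."""
    try:
        package_version = version(package)  # installed distribution version
    except PackageNotFoundError:
        package_version = "not installed"
    return {
        # aliased=True mirrors the default output of `python -m platform`
        "platform": platform.platform(aliased=True),
        "python": platform.python_version(),
        package: package_version,
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```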

Code + PDF

The code is here

The PDF is attached. It is public and can be used for tests, etc.
New Jersey Coinbase staking securities charges 2023-0606_Coinbase-Penalty-and-C-D.pdf
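
The traceback below shows the exception escaping from the loop over `page.images` (`pdf_file.py`, line 50), so a single malformed inline image aborts text extraction for the whole file. Until the decoder is fixed, a caller could guard that loop. This sketch assumes only that a page object exposes an `images` attribute, as pypdf's `PageObject` does; `images_or_empty` is a hypothetical helper, not part of pypdf.

```python
import logging
from typing import Any, List

logger = logging.getLogger(__name__)


def images_or_empty(page: Any) -> List[Any]:
    """Return the page's images as a list, or [] if decoding them fails.

    Hypothetical stopgap for the AttributeError raised from
    ASCIIHexDecode.decode in pypdf 3.12.1.
    """
    try:
        # Attribute access and iteration both sit inside the try, because the
        # traceback shows the error surfacing from page.images.__iter__/__len__.
        return list(page.images)
    except AttributeError as exc:
        logger.warning("Skipping images on this page: %s", exc)
        return []
```

In the failing loop this would read `for image_number, image in enumerate(images_or_empty(page), start=1):`. Catching `AttributeError` this broadly can also hide unrelated bugs, so it is best treated as a temporary workaround tied to the affected pypdf version.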

Traceback

Traceback (most recent call last):
  File "/Users/uzor/Library/Caches/pypoetry/virtualenvs/clown-sort-zLqmJuxs-py3.11/bin/sort_screenshots", line 6, in <module>
    sys.exit(sort_screenshots())
             ^^^^^^^^^^^^^^^^^^
  File "/Users/uzor/workspace/clown_sort/clown_sort/__init__.py", line 43, in sort_screenshots
    file_to_sort.sort_file()
  File "/Users/uzor/workspace/clown_sort/clown_sort/files/sortable_file.py", line 60, in sort_file
    search_text = self.basename_without_ext + ' ' + (self.extracted_text() or '')
                                                     ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/uzor/workspace/clown_sort/clown_sort/files/pdf_file.py", line 50, in extracted_text
    for image_number, image in enumerate(page.images, start=1):
  File "/Users/uzor/Library/Caches/pypoetry/virtualenvs/clown-sort-zLqmJuxs-py3.11/lib/python3.11/site-packages/pypdf/_page.py", line 2603, in __iter__
    for i in range(len(self)):
                   ^^^^^^^^^
  File "/Users/uzor/Library/Caches/pypoetry/virtualenvs/clown-sort-zLqmJuxs-py3.11/lib/python3.11/site-packages/pypdf/_page.py", line 2565, in __len__
    return len(self.ids_function())
               ^^^^^^^^^^^^^^^^^^^
  File "/Users/uzor/Library/Caches/pypoetry/virtualenvs/clown-sort-zLqmJuxs-py3.11/lib/python3.11/site-packages/pypdf/_page.py", line 479, in _get_ids_image
    self.inline_images = self._get_inline_images()
                         ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/uzor/Library/Caches/pypoetry/virtualenvs/clown-sort-zLqmJuxs-py3.11/lib/python3.11/site-packages/pypdf/_page.py", line 662, in _get_inline_images
    extension, byte_stream, img = _xobj_to_image(ii["object"])
                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/uzor/Library/Caches/pypoetry/virtualenvs/clown-sort-zLqmJuxs-py3.11/lib/python3.11/site-packages/pypdf/filters.py", line 814, in _xobj_to_image
    data = x_object_obj.get_data()  # type: ignore
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/uzor/Library/Caches/pypoetry/virtualenvs/clown-sort-zLqmJuxs-py3.11/lib/python3.11/site-packages/pypdf/generic/_data_structures.py", line 919, in get_data
    decoded._data = decode_stream_data(self)
                    ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/uzor/Library/Caches/pypoetry/virtualenvs/clown-sort-zLqmJuxs-py3.11/lib/python3.11/site-packages/pypdf/filters.py", line 613, in decode_stream_data
    data = ASCIIHexDecode.decode(data)  # type: ignore
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/uzor/Library/Caches/pypoetry/virtualenvs/clown-sort-zLqmJuxs-py3.11/lib/python3.11/site-packages/pypdf/filters.py", line 280, in decode
    elif char.isspace():
         ^^^^^^^^^^^^
AttributeError: 'int' object has no attribute 'isspace'
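
The last frame shows `char.isspace()` being called on an `int`. That is what happens when code iterates over a `bytes` object in Python 3: each element is an integer, not a one-character string. Below is a minimal sketch of a bytes-safe decoder written from the `ASCIIHexDecode` rules in the PDF specification (white space is ignored, `>` marks end of data, an odd final digit is treated as if followed by `0`). `ascii_hex_decode` is a hypothetical function, not pypdf's code or its actual fix.

```python
import string
from typing import Union

HEX_DIGITS = frozenset(string.hexdigits)
# PDF white-space characters: NUL, TAB, LF, FF, CR, SPACE.
PDF_WHITESPACE = frozenset("\x00\t\n\x0c\r ")


def ascii_hex_decode(data: Union[bytes, str]) -> bytes:
    """Decode ASCIIHex-encoded stream data given as bytes or str."""
    if isinstance(data, bytes):
        # list(b"4A") == [52, 65]: iterating bytes yields ints, so convert to
        # str first. latin-1 maps every byte to exactly one character.
        data = data.decode("latin-1")
    digits = []
    for char in data:
        if char == ">":  # end-of-data marker; anything after it is ignored
            break
        if char in PDF_WHITESPACE:
            continue
        if char not in HEX_DIGITS:
            raise ValueError(f"invalid character in ASCIIHex data: {char!r}")
        digits.append(char)
    if len(digits) % 2:
        digits.append("0")  # odd number of digits: behave as if a 0 followed
    return bytes.fromhex("".join(digits))
```

Converting to `str` up front keeps one loop for both input types; comparing raw byte values without decoding would work equally well.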
