AttributeError: 'TextStringObject' object has no attribute 'get_data' #1982

@michelcrypt4d4mus

Description

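I tried to extract text from the attached PDF. pypdf raises an AttributeError while my code iterates over page.images (full traceback below).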

Environment

Which environment were you using when you encountered the problem?

$ python -m platform
macOS-13.4.1-arm64-arm-64bit

$ python -c "import pypdf;print(pypdf.__version__)"
3.12.1

Code + PDF

The code is here

PDF is attached. It's public and can be used for tests etc.
2023 USDC_Circle Examination Report May 2023.pdf

Traceback

Traceback (most recent call last):
  File "/Users/uzor/Library/Caches/pypoetry/virtualenvs/clown-sort-zLqmJuxs-py3.11/bin/sort_screenshots", line 6, in <module>
    sys.exit(sort_screenshots())
             ^^^^^^^^^^^^^^^^^^
  File "/Users/uzor/workspace/clown_sort/clown_sort/__init__.py", line 43, in sort_screenshots
    file_to_sort.sort_file()
  File "/Users/uzor/workspace/clown_sort/clown_sort/files/sortable_file.py", line 60, in sort_file
    search_text = self.basename_without_ext + ' ' + (self.extracted_text() or '')
                                                     ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/uzor/workspace/clown_sort/clown_sort/files/pdf_file.py", line 50, in extracted_text
    for image_number, image in enumerate(page.images, start=1):
  File "/Users/uzor/Library/Caches/pypoetry/virtualenvs/clown-sort-zLqmJuxs-py3.11/lib/python3.11/site-packages/pypdf/_page.py", line 2604, in __iter__
    yield self[i]
          ~~~~^^^
  File "/Users/uzor/Library/Caches/pypoetry/virtualenvs/clown-sort-zLqmJuxs-py3.11/lib/python3.11/site-packages/pypdf/_page.py", line 2600, in __getitem__
    return self.get_function(lst[index])
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/uzor/Library/Caches/pypoetry/virtualenvs/clown-sort-zLqmJuxs-py3.11/lib/python3.11/site-packages/pypdf/_page.py", line 522, in _get_image
    imgd = _xobj_to_image(cast(DictionaryObject, xobjs[id]))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/uzor/Library/Caches/pypoetry/virtualenvs/clown-sort-zLqmJuxs-py3.11/lib/python3.11/site-packages/pypdf/filters.py", line 844, in _xobj_to_image
    img, image_format, extension = _handle_flate(
                                   ^^^^^^^^^^^^^^
  File "/Users/uzor/Library/Caches/pypoetry/virtualenvs/clown-sort-zLqmJuxs-py3.11/lib/python3.11/site-packages/pypdf/filters.py", line 762, in _handle_flate
    img.putpalette(lookup.get_data())
                   ^^^^^^^^^^^^^^^
AttributeError: 'TextStringObject' object has no attribute 'get_data'
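
Reading the last frame, `_handle_flate` calls `lookup.get_data()` on the palette of what looks like an indexed colour space, but here `lookup` is a `TextStringObject` (a `str` subclass) rather than a stream. As far as I understand the PDF spec, the lookup table of an Indexed colour space may be either a stream or a byte string, so both cases seem legal. Below is a minimal sketch of the kind of normalisation that would avoid the error. It is not pypdf's code: `palette_bytes` and `FakeStream` are hypothetical names for illustration, and encoding text strings as latin-1 is an assumption that only holds when every character maps to a single byte.

```python
def palette_bytes(lookup):
    """Return raw palette bytes for a stream-like, bytes, or str lookup."""
    # Stream-like objects (what the failing line expects) expose get_data().
    if hasattr(lookup, "get_data"):
        return bytes(lookup.get_data())
    # Byte strings can be used as they are.
    if isinstance(lookup, (bytes, bytearray)):
        return bytes(lookup)
    # Text strings: ASSUMPTION that each character is one byte (latin-1).
    # This raises UnicodeEncodeError for code points above 255.
    if isinstance(lookup, str):
        return lookup.encode("latin-1")
    raise TypeError(f"unsupported palette lookup type: {type(lookup).__name__}")


class FakeStream:
    """Hypothetical stand-in for a stream object; not a pypdf class."""

    def __init__(self, data):
        self._data = data

    def get_data(self):
        return self._data


# Both representations of the same two-colour RGB palette give the same bytes.
from_stream = palette_bytes(FakeStream(b"\x00\x00\x00\xff\xff\xff"))
from_string = palette_bytes("\x00\x00\x00\xff\xff\xff")
print(from_stream == from_string)
```

The resulting bytes are what would be handed to `img.putpalette(...)` in place of the direct `lookup.get_data()` call.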

Labels

Has MCVE: a minimal, complete and verifiable example helps a lot to debug / understand feature requests
is-bug: from a user's perspective, this is a bug, a violation of the expected behavior with a compliant PDF
