Closed
Labels
Has MCVE: A minimal, complete and verifiable example helps a lot to debug / understand feature requests
is-bug: From a user's perspective, this is a bug - a violation of the expected behavior with a compliant PDF
Description
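I tried to extract text from the attached PDF with pypdf. Iterating over page.images raised the AttributeError shown in the traceback below.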
Environment
Which environment were you using when you encountered the problem?
$ python -m platform
macOS-13.4.1-arm64-arm-64bit
$ python -c "import pypdf;print(pypdf.__version__)"
3.12.1
Code + PDF
The code is here
PDF is attached. It's public and can be used for tests etc.
2023 USDC_Circle Examination Report May 2023.pdf
Traceback
Traceback (most recent call last):
  File "/Users/uzor/Library/Caches/pypoetry/virtualenvs/clown-sort-zLqmJuxs-py3.11/bin/sort_screenshots", line 6, in <module>
    sys.exit(sort_screenshots())
             ^^^^^^^^^^^^^^^^^^
  File "/Users/uzor/workspace/clown_sort/clown_sort/__init__.py", line 43, in sort_screenshots
    file_to_sort.sort_file()
  File "/Users/uzor/workspace/clown_sort/clown_sort/files/sortable_file.py", line 60, in sort_file
    search_text = self.basename_without_ext + ' ' + (self.extracted_text() or '')
                                                     ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/uzor/workspace/clown_sort/clown_sort/files/pdf_file.py", line 50, in extracted_text
    for image_number, image in enumerate(page.images, start=1):
  File "/Users/uzor/Library/Caches/pypoetry/virtualenvs/clown-sort-zLqmJuxs-py3.11/lib/python3.11/site-packages/pypdf/_page.py", line 2604, in __iter__
    yield self[i]
          ~~~~^^^
  File "/Users/uzor/Library/Caches/pypoetry/virtualenvs/clown-sort-zLqmJuxs-py3.11/lib/python3.11/site-packages/pypdf/_page.py", line 2600, in __getitem__
    return self.get_function(lst[index])
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/uzor/Library/Caches/pypoetry/virtualenvs/clown-sort-zLqmJuxs-py3.11/lib/python3.11/site-packages/pypdf/_page.py", line 522, in _get_image
    imgd = _xobj_to_image(cast(DictionaryObject, xobjs[id]))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/uzor/Library/Caches/pypoetry/virtualenvs/clown-sort-zLqmJuxs-py3.11/lib/python3.11/site-packages/pypdf/filters.py", line 844, in _xobj_to_image
    img, image_format, extension = _handle_flate(
                                   ^^^^^^^^^^^^^^
  File "/Users/uzor/Library/Caches/pypoetry/virtualenvs/clown-sort-zLqmJuxs-py3.11/lib/python3.11/site-packages/pypdf/filters.py", line 762, in _handle_flate
    img.putpalette(lookup.get_data())
                   ^^^^^^^^^^^^^^^
AttributeError: 'TextStringObject' object has no attribute 'get_data'
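The last frame suggests that the palette lookup of an indexed image in this PDF is stored as a string object rather than a stream, so it has no get_data() method. Below is a minimal sketch of how such a value could be normalized to bytes before it is handed to putpalette(). The helper name palette_bytes is hypothetical and not part of pypdf, the latin-1 fallback is an assumption, and this is not necessarily how the library resolved the issue.

```python
def palette_bytes(lookup):
    """Return the raw palette bytes for a lookup that may be a stream or a string.

    Hypothetical helper for illustration; not part of pypdf.
    """
    # Stream-like objects expose their decoded content via get_data().
    if hasattr(lookup, "get_data"):
        return bytes(lookup.get_data())

    # Assumption: string objects keep their undecoded content in an
    # ``original_bytes`` attribute (pypdf's TextStringObject does).
    original = getattr(lookup, "original_bytes", None)
    if original is not None:
        return bytes(original)

    # Plain str fallback. Assumption: each character maps to one byte
    # (latin-1); this raises UnicodeEncodeError for code points above 255.
    if isinstance(lookup, str):
        return lookup.encode("latin-1")

    # bytes, bytearray, or anything else bytes() can convert.
    return bytes(lookup)
```

With a helper like this, the failing call would read img.putpalette(palette_bytes(lookup)) instead of img.putpalette(lookup.get_data()).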