Closed
Labels
is-robustness-issue: From a user's perspective, this is about robustness
workflow-text-extraction: From a user's perspective, text extraction is the affected feature/workflow
Description
I tried to use the extract_text() function:
self.reader.pages[page_num].extract_text()

Environment
With pypdf 5.3.0 on Python 3.11, I got the following error. With pypdf==4.3.1, everything works as expected.
File "/Users/ivanovcinnikov/PycharmProjects/teams-rag/jobs/readers/pypdf_section_reader.py", line 39, in extract_text
text=self.reader.pages[page_num].extract_text(),
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/ivanovcinnikov/PycharmProjects/teams-rag/.venv/lib/python3.11/site-packages/pypdf/_page.py", line 2378, in extract_text
return self._extract_text(
^^^^^^^^^^^^^^^^^^^
File "/Users/ivanovcinnikov/PycharmProjects/teams-rag/.venv/lib/python3.11/site-packages/pypdf/_page.py", line 2091, in _extract_text
process_operation(b"Tj", [op])
File "/Users/ivanovcinnikov/PycharmProjects/teams-rag/.venv/lib/python3.11/site-packages/pypdf/_page.py", line 2035, in process_operation
text, rtl_dir, _actual_str_size = self._handle_tj(
^^^^^^^^^^^^^^^^
File "/Users/ivanovcinnikov/PycharmProjects/teams-rag/.venv/lib/python3.11/site-packages/pypdf/_page.py", line 1809, in _handle_tj
self._get_actual_font_widths(cmap, text_operands, font_size, space_width))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/ivanovcinnikov/PycharmProjects/teams-rag/.venv/lib/python3.11/site-packages/pypdf/_page.py", line 1775, in _get_actual_font_widths
font_widths += compute_font_width(font_width_map, char)
TypeError: unsupported operand type(s) for +=: 'int' and 'DictionaryObject'
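The last frame shows font_widths += compute_font_width(font_width_map, char) receiving a DictionaryObject where a number was expected. The sketch below is plain Python with no pypdf import. It reproduces that error shape with an ordinary dict (pypdf's DictionaryObject subclasses dict). It also adds a hypothetical per-page wrapper, extract_text_safely, that catches the TypeError so the remaining pages can still be processed. This is a stopgap under those assumptions, not a fix inside pypdf. Pinning pypdf==4.3.1, as reported above, is the other option.

```python
from typing import Callable, Optional


def demonstrate_failure() -> str:
    """Reproduce the shape of the reported error: int += dict-like object.

    A plain dict stands in for pypdf's DictionaryObject (a dict subclass).
    The width entry below is hypothetical; it only mimics a width lookup
    that returned a dictionary instead of a number.
    """
    font_widths = 0
    width_entry = {"/W": 500}  # hypothetical dictionary-valued width entry
    try:
        font_widths += width_entry  # type: ignore[operator]
    except TypeError as exc:
        return str(exc)
    return "no error"


def extract_text_safely(
    page, on_error: Optional[Callable[[TypeError], None]] = None
) -> str:
    """Hypothetical workaround: call page.extract_text(), but return "" on TypeError.

    `page` is any object with an extract_text() method, such as an item of
    PdfReader(...).pages. `on_error` optionally receives the exception so
    the caller can log which pages were skipped.
    """
    try:
        return page.extract_text()
    except TypeError as exc:
        if on_error is not None:
            on_error(exc)
        return ""


# Usage with pypdf (not executed here; the file name is hypothetical):
# from pypdf import PdfReader
# reader = PdfReader("example.pdf")
# texts = [extract_text_safely(p, print) for p in reader.pages]
```

Catching only TypeError keeps other failures (for example, a corrupt file) visible instead of silently swallowing them.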