Text extraction throws IndexError on some PDFs #2290

I recently ran into a particular kind of PDF file from which I cannot extract text: `page.extract_text()` raises an `IndexError`.

## Environment

Which environment were you using when you encountered the problem?

```bash
$ python -m platform
Windows-10-10.0.22621-SP0

$ python -c "import pypdf;print(pypdf._debug_versions)"
pypdf==3.17.0, crypt_provider=('local_crypt_fallback', '0.0.0'), PIL=none
```

## Code + PDF

This is a minimal, complete example that shows the issue:

```python
from pypdf import PdfReader

reader = PdfReader("example.pdf")
number_of_pages = len(reader.pages)
page = reader.pages[0]
text = page.extract_text()
print(text)
```

The sample PDF file can be found here:
[example.pdf](https://github.com/py-pdf/pypdf/files/13452885/example.pdf)
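
To check whether the failure is limited to the first page, the extraction can be wrapped per page. This is only a minimal sketch: `extract_text_per_page` is a hypothetical helper, not part of pypdf, and it assumes nothing beyond each page object having an `extract_text()` method.

```python
def extract_text_per_page(pages):
    """Hypothetical helper: return (texts, failures) for an iterable of pages."""
    texts = []
    failures = []
    for index, page in enumerate(pages):
        try:
            texts.append(page.extract_text())
        except IndexError as exc:
            # Record the failing page instead of aborting the whole run.
            texts.append(None)
            failures.append((index, str(exc)))
    return texts, failures


# Usage with pypdf (assumes example.pdf is in the working directory):
#   from pypdf import PdfReader
#   texts, failures = extract_text_per_page(PdfReader("example.pdf").pages)
#   print(failures)
```

Each entry in `failures` is a `(page_index, message)` pair, which makes it easy to report exactly which pages are affected.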

## Traceback

This is the complete Traceback I see:

```text
Traceback (most recent call last):
    File "...\prueba_pdf\test.py", line 6, in <module>
        text = page.extract_text()
    File "...\prueba_pdf\venv\lib\site-packages\pypdf\_page.py", line 2284, in extract_text
        return self._extract_text(
    File "...\prueba_pdf\venv\lib\site-packages\pypdf\_page.py", line 1903, in _extract_text
        cmaps[f] = build_char_map(f, space_width, obj)
    File "...\prueba_pdf\venv\lib\site-packages\pypdf\_cmap.py", line 29, in build_char_map
        font_subtype, font_halfspace, font_encoding, font_map = build_char_map_from_dict(
    File "...\prueba_pdf\venv\lib\site-packages\pypdf\_cmap.py", line 54, in build_char_map_from_dict
        map_dict, space_code, int_entry = parse_to_unicode(ft, space_code)
    File "...\prueba_pdf\venv\lib\site-packages\pypdf\_cmap.py", line 224, in parse_to_unicode
        return type1_alternative(ft, map_dict, space_code, int_entry)
    File "...\prueba_pdf\venv\lib\site-packages\pypdf\_cmap.py", line 481, in type1_alternative
        if words[3] != b"put":
IndexError: list index out of range
```
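
The last frame suggests that a line was split into fewer than four tokens before `words[3]` was read. The snippet below is a sketch of that failure mode and one possible guard, not pypdf's actual code. `parse_encoding_line` and the sample lines are hypothetical, and it is an assumption that the parser expects Type1-style `dup <code> /<name> put` entries. Only the `words[3] != b"put"` comparison is taken from the traceback.

```python
def parse_encoding_line(line: bytes):
    """Hypothetical helper: parse one 'dup <code> /<name> put' entry.

    Returns (code, glyph_name), or None if the line does not match.
    """
    words = line.split()
    # Without the length check, words[3] raises IndexError on short lines.
    if len(words) < 4 or words[3] != b"put":
        return None
    code = int(words[1].decode("ascii"))
    glyph_name = words[2].decode("ascii").lstrip("/")
    return code, glyph_name


well_formed = b"dup 65 /A put"
truncated = b"dup 65 /A"  # only three tokens, as the traceback implies

print(parse_encoding_line(well_formed))
print(parse_encoding_line(truncated))
```

With the guard in place, the truncated line is skipped instead of raising. Whether skipping is the right behavior for the real parser is a question for the maintainers.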

