Insufficient handling of inline images containing `EI ` sequences

*pypdf* currently cannot correctly handle inline images whose image data itself contains the byte sequence `EI `. Parsing the content stream fails for such images, which also breaks text extraction.

## Environment

Which environment were you using when you encountered the problem?

```bash
$ python -m platform
Linux-6.4.0-150600.23.33-default-x86_64-with-glibc2.38

$ python -c "import pypdf;print(pypdf._debug_versions)"
pypdf==5.2.0, crypt_provider=('cryptography', '41.0.7'), PIL=10.1.0
```

## Code + PDF

This is a minimal, complete example that shows the issue:

```python
from pypdf import PdfReader

reader = PdfReader('file.pdf')
reader.pages[1].extract_text()
```

I currently do not have a file free of personal data that I could share.

Excerpt of the relevant section (`...` marks redacted content):

```
...
BI
/IM true
/W 41
/H 41
/BPC 1
/D[1
0]
/F/CCF
/DP<</K -1
/Columns 41>>
ID >...EI E...
EI Q
q
...
```
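To illustrate what the excerpt suggests is going wrong, here is a minimal sketch on made-up bytes. It is not pypdf's actual parser and not the real file: the image data and the surrounding operators are synthetic stand-ins, and the "intended" end marker is located with a shortcut that only works because this example is known in advance. It shows how stopping at the first `EI ` after `ID` truncates the image and leaves binary data to be parsed as operands.

```python
# Synthetic stand-in for the redacted stream; the image bytes are invented.
image_data = b">\x01\x02EI E\x0e\x1e\x0b"
content = (
    b"q\nBI\n/IM true\n/W 41\n/H 41\n/BPC 1\n/F/CCF\nID "
    + image_data
    + b"\nEI Q\nq\n"
)

# Image data starts right after the "ID " operator.
start = content.index(b"ID ") + len(b"ID ")

# Naive approach: treat the first "EI " after the data start as the end marker.
naive_end = content.index(b"EI ", start)
naive_data = content[start:naive_end]

# Intended end in this synthetic example: the "EI" that is followed by "Q".
real_end = content.index(b"\nEI Q", start)
real_data = content[start:real_end]

# Bytes the naive approach would hand back to the operand parser.
leftover = content[naive_end + len(b"EI "):]

print("naive image data:", naive_data)
print("intended image data:", real_data)
print("leftover parsed as operands:", leftover)
```

In this sketch the leftover begins with `E` followed by binary bytes, which resembles the data shown in the error message in the traceback.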

## Traceback

This is the complete traceback I see (`...` marks redacted content):

```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/stefan/pdf/pypdf/pypdf/_page.py", line 2378, in extract_text
    return self._extract_text(
  File "/home/stefan/pdf/pypdf/pypdf/_page.py", line 2073, in _extract_text
    for operands, operator in content.operations:
  File "/home/stefan/pdf/pypdf/pypdf/generic/_data_structures.py", line 1423, in operations
    self._parse_content_stream(BytesIO(self._data))
  File "/home/stefan/pdf/pypdf/pypdf/generic/_data_structures.py", line 1325, in _parse_content_stream
    operands.append(read_object(stream, None, self.forced_encoding))
  File "/home/stefan/pdf/pypdf/pypdf/generic/_data_structures.py", line 1496, in read_object
    raise PdfReadError(
pypdf.errors.PdfReadError: Invalid Elementary Object starting with b'\x0b' @1495: b'I E\x0e\x1e\x8a\...\xe0\xc7\x0b$;...'
```

