BUG: 'IndexError: index out of range' when using extract_text

I started using PyPDF2 less than 24 hours ago, so the problem may be on my side.
This report covers three things: an error raised by `extract_text`, a suggestion for `extract_text`, and a mistake in the documentation.

# Environment
```
(PDFProcess) E:\pyProject\PDFProcess>python -m platform
Windows-10-10.0.19041-SP0 (Windows Home, Chinese edition)
(PDFProcess) E:\pyProject\PDFProcess>python -c "import PyPDF2,sys;print(PyPDF2.__version__,sys.version,sep='###')"
2.10.9###3.9.1 (tags/v3.9.1:1e5d33e, Dec  7 2020, 17:08:21) [MSC v.1927 64 bit (AMD64)]
```

**PDF**-> [UnicodeCharts](https://www.unicode.org/Public/15.0.0/charts/CodeCharts.pdf)

```python
from PyPDF2 import PdfReader

reader = PdfReader("Unicode/CodeCharts_15.0.0.pdf")
page_0 = reader.pages[0]
page_0.extract_text() 
```

## Bug 
**Location**: PyPDF2/generic/_data_structures.py --> class `ContentStream(DecodedStreamObject)::__init__`
At approximately line 690, the code `if data[-1] != b"\n":` raises `IndexError` when `data == b""`.
It could be changed to an if-elif statement:
```
if len(data) == 0:
    pass
elif data[-1] != b"\n":
    data += b"\n"
```
or just change to:
```
if len(data) == 0 or data[-1] != b"\n":
    data += b"\n"
```
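
For illustration, here is a minimal sketch of a variant that avoids indexing altogether. The helper name `ensure_trailing_newline` is hypothetical and not part of PyPDF2; it only shows the idea. Note also that in Python 3 indexing a `bytes` object returns an `int`, so `data[-1] != b"\n"` is true for any non-empty `data`; `bytes.endswith` sidesteps both that and the empty-input case.

```python
def ensure_trailing_newline(data: bytes) -> bytes:
    """Return data with exactly the content given plus a final newline if missing.

    Hypothetical helper (not a PyPDF2 function). endswith() is safe on
    empty input, whereas data[-1] raises IndexError when data == b"".
    """
    if not data.endswith(b"\n"):
        data += b"\n"
    return data
```

With this approach an empty stream becomes `b"\n"`, which matches the behaviour of the second proposal above.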

## Suggestion
**Location**: PyPDF2/_page.py --> class `PageObject(DictionaryObject)::_extract_text` --> function `process_operation` --> `elif operator == b"Tj":`
At approximately line 1514 (I am not sure yet).
When I call `page_num.extract_text()` (with the fix from the Bug section applied), I get a string with no separator such as '\n' to break or split lines.
I tried adding a line between `# fmt: on` and `else: return None`:
```
                # fmt: on
    text+="*LineBreak*"
else:
    return None
```
This works on pages of plain text, but gives poor results with other layouts such as tables.
I do not know enough to tell where the right place to add line breaks is.
So I think it would be useful to add a new argument, e.g. `def extract_text(sep: str = ""):`, and implement it.
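
To make the idea concrete, here is a rough sketch of what such a `sep` argument could do. It assumes the text shown by each `Tj` operator is available as a separate string; `join_fragments` is a hypothetical stand-in, not an existing PyPDF2 function.

```python
from typing import Iterable


def join_fragments(fragments: Iterable[str], sep: str = "") -> str:
    """Join text fragments (one per Tj operator) with a caller-chosen separator.

    Hypothetical illustration only. The default sep="" keeps the current
    behaviour of concatenating fragments with nothing in between.
    """
    return sep.join(fragments)
```

A caller who wants one fragment per line would pass `sep="\n"`; for tables a different separator, or none, may work better.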

## Documentation

https://pypdf2.readthedocs.io/en/latest/user/reading-pdf-annotations.html#attachments

**Location**: docs/user/reading-pdf-annotations.md --> Attachments
The example code raises a `NameError`:
```python
attachments = {}
for page in reader.pages:
    if "/Annots" in page:
        for annotation in page["/Annots"]:
            subtype = annot.get_object()["/Subtype"]
            if subtype == "/FileAttachment":
                fileobj = annotobj["/FS"]
                attachments[fileobj["/F"]] = fileobj["/EF"]["/F"].get_data()
```
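The loop variable is `annotation`, but the body then uses `annot` (`subtype = annot...`) and `annotobj` (`fileobj = annotobj...`), neither of which is defined.
The variable names in this example should be made consistent.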
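
One possible corrected version is sketched below. Wrapping the loop in a function called `collect_attachments` and naming the resolved annotation `annot_obj` are my own choices, not the wording of the documentation; the dictionary keys are the ones used in the original example.

```python
def collect_attachments(pages):
    """Map attachment names to their data for all /FileAttachment annotations.

    `pages` is expected to be something like `reader.pages`.
    """
    attachments = {}
    for page in pages:
        if "/Annots" in page:
            for annotation in page["/Annots"]:
                # Resolve the indirect reference once and reuse the result.
                annot_obj = annotation.get_object()
                if annot_obj["/Subtype"] == "/FileAttachment":
                    fileobj = annot_obj["/FS"]
                    attachments[fileobj["/F"]] = fileobj["/EF"]["/F"].get_data()
    return attachments
```

Usage would then be `attachments = collect_attachments(reader.pages)`.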
