ValueError: Ascii85 encoded byte sequences must end with b'~>' #2996

Extracting text from the attached document with `page.extract_text()` raises the `ValueError` above.

## Environment

Which environment were you using when you encountered the problem?

```bash
$ python -m platform
Windows-11-10.0.22631-SP0

$ python -c "import pypdf;print(pypdf._debug_versions)"
pypdf==5.1.0, crypt_provider=('local_crypt_fallback', '0.0.0'), PIL=11.0.0
```

## Code + PDF

This is a minimal, complete example that shows the issue:

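```python
from pypdf import PdfReader

# The PDF attached to this issue; every page is processed
pdf_reader = PdfReader("1af7d56a-5c8c-4914-85b3-b2536a5525cd.pdf")
range_of_pages = range(len(pdf_reader.pages))
page_texts = []
page_num_without_text = []  # 1-based numbers of pages with no extractable text

for page_num in range_of_pages:
    page = pdf_reader.pages[page_num]
    page_text = page.extract_text()  # raises the ValueError shown in the traceback
    page_text = page_text.strip()
    if not page_text:
        page_num_without_text.append(page_num + 1)
    page_texts.append(page_text)
```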
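If the aim is only to keep processing the rest of a document while this is open, the loop above can catch the error per page instead of aborting on the first failure. This is a minimal sketch; `extract_texts_skipping_errors` is a hypothetical helper, not pypdf API, and it works with anything that has an `extract_text()` method, such as the pages in `PdfReader(...).pages`.

```python
from typing import Iterable, List, Tuple


def extract_texts_skipping_errors(pages: Iterable) -> Tuple[List[str], List[Tuple[int, str]]]:
    """Hypothetical helper: extract text page by page, recording pages that
    raise ValueError (such as the Ascii85 '~>' error) instead of aborting."""
    page_texts: List[str] = []
    failed_pages: List[Tuple[int, str]] = []  # (1-based page number, error message)
    for page_num, page in enumerate(pages, start=1):
        try:
            page_texts.append(page.extract_text().strip())
        except ValueError as exc:
            failed_pages.append((page_num, str(exc)))
            page_texts.append("")  # keep exactly one entry per page
    return page_texts, failed_pages
```

Usage: `texts, failed = extract_texts_skipping_errors(pdf_reader.pages)`. Catching `ValueError` this broadly also hides unrelated `ValueError`s, which is why the message is kept alongside each failing page number.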

Share the PDF file(s) that cause the issue here. The smaller they are, the
better. Let us know if we may add them to our tests!

[1af7d56a-5c8c-4914-85b3-b2536a5525cd.pdf](https://github.com/user-attachments/files/18050808/1af7d56a-5c8c-4914-85b3-b2536a5525cd.pdf)


## Traceback

This is the complete traceback I see:

```
File "common\fast_pdf_util.py", line 138, in get_pdf_info
    page_text = page.extract_text()
                ^^^^^^^^^^^^^^^^^^^
  File "venv\Lib\site-packages\pypdf\_page.py", line 2398, in extract_text
    return self._extract_text(
           ^^^^^^^^^^^^^^^^^^^
  File "venv\Lib\site-packages\pypdf\_page.py", line 1868, in _extract_text
    cmaps[f] = build_char_map(f, space_width, obj)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\git\pi-embedding\venv\Lib\site-packages\pypdf\_cmap.py", line 33, in build_char_map
    font_subtype, font_halfspace, font_encoding, font_map = build_char_map_from_dict(
                                                            ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "venv\Lib\site-packages\pypdf\_cmap.py", line 56, in build_char_map_from_dict
    encoding, map_dict = get_encoding(ft)
                         ^^^^^^^^^^^^^^^^
  File "venv\Lib\site-packages\pypdf\_cmap.py", line 129, in get_encoding
    map_dict, int_entry = _parse_to_unicode(ft)
                          ^^^^^^^^^^^^^^^^^^^^^
  File "venv\Lib\site-packages\pypdf\_cmap.py", line 220, in _parse_to_unicode
    cm = prepare_cm(ft)
         ^^^^^^^^^^^^^^
  File "venv\Lib\site-packages\pypdf\_cmap.py", line 250, in prepare_cm
    cm = cast(DecodedStreamObject, ft["/ToUnicode"]).get_data()
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\git\pi-embedding\venv\Lib\site-packages\pypdf\generic\_data_structures.py", line 1113, in get_data
    decoded.set_data(decode_stream_data(self))
                     ^^^^^^^^^^^^^^^^^^^^^^^^
  File "venv\Lib\site-packages\pypdf\filters.py", line 638, in decode_stream_data
    data = ASCII85Decode.decode(data)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "venv\Lib\site-packages\pypdf\filters.py", line 449, in decode
    return a85decode(data, adobe=True, ignorechars=WHITESPACES_AS_BYTES)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "Python312\Lib\base64.py", line 388, in a85decode
    raise ValueError(
ValueError: Ascii85 encoded byte sequences must end with b'~>'
```
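The last two frames show pypdf handing the stream data to the standard library as `a85decode(data, adobe=True, ignorechars=WHITESPACES_AS_BYTES)`. In that mode `base64.a85decode` rejects any input that does not end with the `~>` end-of-data marker, so the `/ToUnicode` stream in this PDF presumably lacks the marker (or has trailing bytes after it). The behaviour can be reproduced without pypdf or the attached file; the sample bytes below are made up for illustration.

```python
import base64

sample = b"sample ToUnicode data"
# ASCII85 digits only, no <~ ... ~> delimiters (like a stream missing its EOD marker)
body = base64.a85encode(sample)

try:
    base64.a85decode(body, adobe=True)  # same mode pypdf uses
    strict_error = None
except ValueError as exc:
    strict_error = str(exc)  # same kind of ValueError as in the traceback

# Appending the end-of-data marker lets the same digits decode cleanly
recovered = base64.a85decode(body + b"~>", adobe=True)
print(strict_error)
print(recovered)
```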


## Workaround to get past error

```diff
diff --git a/pypdf/filters.py b/pypdf/filters.py
index 517d6aa..5ea158d 100644
--- a/pypdf/filters.py
+++ b/pypdf/filters.py
@@ -635,7 +635,11 @@ def decode_stream_data(stream: Any) -> bytes:  # utils.StreamObject
             elif filter_type in (FT.LZW_DECODE, FTA.LZW):
                 data = LZWDecode._decodeb(data, params)
             elif filter_type in (FT.ASCII_85_DECODE, FTA.A85):
-                data = ASCII85Decode.decode(data)
+                try:
+                    data = ASCII85Decode.decode(data)
+                except ValueError:
+                    # ignore the error for now as workaround
+                    pass
             elif filter_type == FT.DCT_DECODE:
                 data = DCTDecode.decode(data)
             elif filter_type == FT.JPX_DECODE:
```
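With `except ValueError: pass`, `data` keeps its still-ASCII85-encoded bytes, so the `/ToUnicode` CMap parser receives encoded text instead of a CMap and that font's character mapping is presumably lost. A possible alternative, sketched under the assumption that the stream is otherwise intact and only the `~>` marker is missing, is to supply the marker before decoding. `lenient_a85decode` and `PDF_WHITESPACE` are hypothetical names, not part of pypdf.

```python
import base64

# PDF white-space characters (NUL, HT, LF, FF, CR, SP), per the PDF specification
PDF_WHITESPACE = b"\x00\t\n\x0c\r "


def lenient_a85decode(data: bytes) -> bytes:
    """Hypothetical helper: decode ASCII85 stream data, tolerating a missing
    or half-written '~>' end-of-data marker."""
    data = data.strip(PDF_WHITESPACE)
    if data.endswith(b"~"):
        data += b">"  # only the '>' of the marker is missing
    elif not data.endswith(b"~>"):
        data += b"~>"  # the whole marker is missing
    return base64.a85decode(data, adobe=True, ignorechars=PDF_WHITESPACE)
```

In the diff above this would replace `ASCII85Decode.decode(data)` with `lenient_a85decode(data)`. Unlike `pass`, it still raises on genuinely invalid ASCII85 digits, so real corruption is not silently passed on.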
