When I read certain PDF files (possibly non-standard ones), an exception is
thrown. I would like to be able to read them normally.
Environment
Which environment were you using when you encountered the problem?
$ python -m platform
macOS-10.16-x86_64-i386-64bit
$ python -c "import pypdf;print(pypdf.__version__)"
3.2.1
Code + PDF
This is a minimal, complete example that shows the issue:
from io import BytesIO

import requests
from pypdf import PdfReader

if __name__ == '__main__':
    url = "https://gz-gov-open-doc.oss-cn-gz-ysgzlt-d01-a.ltops.gzdata.com.cn/1024FPA/open/37560735-0101-4199-8dd6-c9015c438f13.pdf"
    res = requests.get(url)
    print(res.text)
    PdfReader(BytesIO(res.content))
Share here the PDF file(s) that cause the issue. The smaller they are, the
better. Let us know if we may add them to our tests!
小型企业声明函.pdf (in English: "Small Enterprise Declaration Letter")
Traceback
This is the complete Traceback I see:
Traceback (most recent call last):
  File "/Users/rccpony/Library/Caches/pypoetry/virtualenvs/KPnmHDqb-py3.8/lib/python3.8/site-packages/pypdf/_reader.py", line 1713, in _read_xref_tables_and_trailers
    xrefstream = self._read_pdf15_xref_stream(stream)
  File "/Users/rccpony/Library/Caches/pypoetry/virtualenvs/KPnmHDqb-py3.8/lib/python3.8/site-packages/pypdf/_reader.py", line 1842, in _read_pdf15_xref_stream
    self._read_xref_subsections(idx_pairs, get_entry, used_before)
  File "/Users/rccpony/Library/Caches/pypoetry/virtualenvs/KPnmHDqb-py3.8/lib/python3.8/site-packages/pypdf/_reader.py", line 1907, in _read_xref_subsections
    assert start >= last_end
AssertionError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/rccpony/PycharmProjects/pdf.py", line 10, in <module>
    PdfReader(BytesIO(res.content))
  File "/Users/rccpony/Library/Caches/pypoetry/virtualenvs/KPnmHDqb-py3.8/lib/python3.8/site-packages/pypdf/_reader.py", line 319, in __init__
    self.read(stream)
  File "/Users/rccpony/Library/Caches/pypoetry/virtualenvs/KPnmHDqb-py3.8/lib/python3.8/site-packages/pypdf/_reader.py", line 1508, in read
    self._read_xref_tables_and_trailers(stream, startxref, xref_issue_nr)
  File "/Users/rccpony/Library/Caches/pypoetry/virtualenvs/KPnmHDqb-py3.8/lib/python3.8/site-packages/pypdf/_reader.py", line 1722, in _read_xref_tables_and_trailers
    raise PdfReadError(f"trailer can not be read {e.args}")
pypdf.errors.PdfReadError: trailer can not be read ()
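
For context, the failing `assert start >= last_end` seems to require that the subsections of the cross-reference stream are listed in increasing order and do not overlap. The sketch below is not pypdf's code. `find_overlapping_subsections` is a hypothetical helper that applies the same check to a flat list of integers, assuming the list is laid out like the stream's `/Index` array (first object number, entry count, repeated), and reports which pairs would trip the assertion.

```python
from typing import Iterable, List, Tuple


def find_overlapping_subsections(index: Iterable[int]) -> List[Tuple[int, int]]:
    """Return the (start, size) pairs that begin before the previous pair ended.

    Hypothetical helper for illustration only; it mirrors the condition
    `start >= last_end` seen in the traceback, not pypdf's implementation.
    """
    values = list(index)
    if len(values) % 2 != 0:
        raise ValueError("expected an even number of integers (start, size pairs)")

    bad = []
    last_end = 0
    # Walk the flat list two items at a time: (start, size), (start, size), ...
    for start, size in zip(values[0::2], values[1::2]):
        if start < last_end:
            # This is the case in which `assert start >= last_end` would fail.
            bad.append((start, size))
        last_end = start + size
    return bad
```

For example, `find_overlapping_subsections([0, 5, 3, 4])` returns `[(3, 4)]`, because the second subsection starts at object 3 while the first one already covers objects 0 to 4. If the attached file has an `/Index` like this, that may be why the reader gives up on the trailer.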