Welcome to pdf2docx

pdf2docx is a Python library that extracts data from PDF files with PyMuPDF, parses the page layout with rules, and generates docx files with python-docx.
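The extract, parse, and generate pipeline described above can be run in a few lines. The following is a minimal sketch, assuming pdf2docx is installed and that its `Converter` class is used with default settings. The function names `default_docx_path` and `convert_pdf` are hypothetical helpers introduced here for illustration and are not part of the library.

```python
from pathlib import Path


def default_docx_path(pdf_path):
    """Return the input path with its suffix swapped to .docx (hypothetical helper)."""
    return str(Path(pdf_path).with_suffix(".docx"))


def convert_pdf(pdf_path, docx_path=None):
    """Convert a PDF file to docx using pdf2docx's Converter class."""
    # Imported lazily so this module still loads when pdf2docx is not installed.
    from pdf2docx import Converter

    docx_path = docx_path or default_docx_path(pdf_path)
    cv = Converter(pdf_path)
    try:
        # With no page arguments, all pages are converted.
        cv.convert(docx_path)
    finally:
        # Release the underlying PDF document even if conversion fails.
        cv.close()
    return docx_path
```

Under these assumptions, calling `convert_pdf("sample.pdf")` would write the result next to the input as `sample.docx`; the file name is a placeholder.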

pdf2docx is hosted on GitHub and registered on PyPI.


_images/intro.png

API DOCUMENTATION

Indices and tables