Welcome to pdf2docx

pdf2docx is a Python library that extracts data from PDF files with PyMuPDF, parses the layout with rules, and generates docx files with python-docx.
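As a rough illustration of that extract, parse, and generate pipeline, the following is a minimal sketch of a conversion using the library's `Converter` class. The helper name `pdf_to_docx` and its parameters are hypothetical and introduced only for this example; it assumes pdf2docx is installed (for instance from PyPI) and that the input PDF exists.

```python
def pdf_to_docx(pdf_path, docx_path):
    """Convert a PDF file to a docx file (hypothetical helper, not part of pdf2docx)."""
    # Imported inside the function so that defining the helper
    # does not require pdf2docx to be installed.
    from pdf2docx import Converter

    cv = Converter(pdf_path)      # opens the PDF (read with PyMuPDF)
    try:
        cv.convert(docx_path)     # parses the layout and writes the docx file
    finally:
        cv.close()                # release the underlying PDF document
```

Calling `pdf_to_docx("input.pdf", "output.docx")`, where both file names are placeholders, would be expected to write the converted document next to the source file. The `try`/`finally` is a design choice of this sketch so that the PDF is closed even if conversion fails.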

pdf2docx is hosted on GitHub and registered on PyPI.