
I'm attempting to read some maths questions from a number of PDFs I have, for example:

PDF File

Currently I'm using Textract's process(), but it gives me output like the following (written to an HTML file with charset=utf-8):

Textract Output
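For reference, the setup described above might look roughly like the sketch below. It assumes the Python textract package, whose process() call returns the extracted text as bytes. The function name pdf_text_to_html and the extract parameter are hypothetical, added only so the HTML-writing step can be tried without textract installed.

```python
import html
from pathlib import Path


def pdf_text_to_html(pdf_path, html_path, extract=None):
    """Extract text from a PDF and write it to a UTF-8 HTML page.

    `extract` is a hypothetical hook: any callable that takes a file path
    and returns bytes. It defaults to textract.process.
    """
    if extract is None:
        # Assumption: the third-party `textract` package is installed.
        import textract
        extract = textract.process  # returns the extracted text as bytes

    raw = extract(str(pdf_path))
    # Decode explicitly so math symbols are not mangled by a default codec.
    text = raw.decode("utf-8", errors="replace")

    page = (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8"></head>\n'
        "<body><pre>" + html.escape(text) + "</pre></body></html>\n"
    )
    Path(html_path).write_text(page, encoding="utf-8")
    return text
```

This only reproduces the plain-text extraction step. It does not recover equation layout (fractions, superscripts, radicals), which is the part I'm asking about.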

My question is whether it's possible to extract the equations in a better format: one that is readable, or at least reliably convertible to a readable format. Text would be preferable, but images could also work. Cheers!

  • This might help you: github.com/pdfminer/pdfminer.six Commented Feb 2, 2018 at 16:58
  • Closer, but I'm still having issues when using pdfminer: i.imgur.com/F7Qtszq.png (real PDF on right, pdfminer output on left) Commented Feb 2, 2018 at 17:35
