Python - copying maths questions including formulae/symbols out of a PDF?

I'm attempting to read some maths questions from a number of PDFs I have, for example:

Currently I'm using textract's process function (textract.process), but it gives me output like the following (written to an HTML file with charset=utf-8):

My question is whether it's possible to extract the equations in a better format: one that is readable, or at least reliably convertible to a readable format. Text would be preferable, but images could also work. Cheers!
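For reference, the setup described above could look roughly like the sketch below. It assumes the third-party textract package is installed; the function names `extract_text` and `wrap_as_html` and the file name `questions.pdf` are hypothetical and only serve the illustration. It does not fix the garbled symbols, it only reproduces the pipeline so the output can be inspected.

```python
import html


def extract_text(pdf_path):
    """Return the text textract finds in the PDF, decoded as UTF-8."""
    # Imported lazily so the HTML helper below works without textract installed.
    import textract

    raw = textract.process(pdf_path)  # textract.process returns bytes
    return raw.decode("utf-8")


def wrap_as_html(text):
    """Wrap extracted text in a minimal HTML page that declares UTF-8."""
    # html.escape keeps characters such as '<' in inequalities from being
    # parsed as markup; non-ASCII maths symbols pass through unchanged.
    return (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8"></head>\n'
        "<body><pre>" + html.escape(text) + "</pre></body></html>\n"
    )


# Example usage (hypothetical file names):
# page = wrap_as_html(extract_text("questions.pdf"))
# with open("questions.html", "w", encoding="utf-8") as fh:
#     fh.write(page)
```

Writing the file with an explicit `encoding="utf-8"` matters here: if the file encoding and the declared charset disagree, correctly extracted symbols can still display as garbage.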

asked Feb 2, 2018 at 16:49

Luke

119 reputation, 6 bronze badges

This might help you. github.com/pdfminer/pdfminer.six

– Austin
Commented Feb 2, 2018 at 16:58

Although closer, I'm still having issues when using pdfminer: i.imgur.com/F7Qtszq.png (real PDF on right, pdfminer output on left)

– Luke
Commented Feb 2, 2018 at 17:35
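
Following the pdfminer.six suggestion in the comments, a minimal sketch of that route might look like this. It assumes pdfminer.six is installed; `extract_with_pdfminer` and `count_unmapped_glyphs` are hypothetical helper names. The second helper relies on the assumption that pdfminer.six writes placeholders of the form `(cid:12)` when it cannot map a glyph to Unicode, which is a common symptom with maths fonts, so a high count suggests the text layer itself is unusable and an image-based approach may be needed instead.

```python
import re

# Placeholder pdfminer.six is assumed to emit for glyphs without a Unicode mapping.
CID_PLACEHOLDER = re.compile(r"\(cid:\d+\)")


def extract_with_pdfminer(pdf_path):
    """Return the text of the whole PDF using pdfminer.six's high-level API."""
    # Imported lazily so the checker below works without pdfminer.six installed.
    from pdfminer.high_level import extract_text

    return extract_text(pdf_path)


def count_unmapped_glyphs(text):
    """Count '(cid:N)' placeholders, i.e. glyphs that were not mapped to Unicode."""
    return len(CID_PLACEHOLDER.findall(text))


# Example usage (hypothetical file name):
# text = extract_with_pdfminer("questions.pdf")
# print(count_unmapped_glyphs(text), "unmapped glyphs")
```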
