Replace the PDF parsing code with a large language model (already trained)

## Description

We currently have a very basic "predict" method here: https://github.com/harmonydata/harmony/blob/main/src/harmony/parsing/pdf_parser.py#L44

Recently we ran a competition to fine-tune an LLM in Hugging Face to better extract questions (and response options) from PDFs. The competition is [here](https://doxaai.com/competition/harmony-parsing) and was won by Aashvin.

![Image](https://github.com/user-attachments/assets/ef052acc-82d8-4fbd-841b-164a6d556916)

Aashvin's winning model is at https://fastdatascience.z33.web.core.windows.net/submission-5a83e434-58bc-492d-9852-37cd9128cd7e.tar.gz

I am not sure of the best way to get Aashvin's model into Harmony. One option is to load it directly from the URL above. Another is to upload it to our Hugging Face account, https://huggingface.co/harmonydata, so that Harmony loads it from the Hugging Face Hub. Jay Dugad (you can find him on the [Discord](https://discord.gg/harmonydata)) could give you access to upload the model to Hugging Face.
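As a starting point for the "load from this URL" option, here is a minimal sketch using only the Python standard library. The function name `download_and_extract`, the cache behaviour, and the target directory are assumptions for illustration, not existing Harmony code. The layout of the files inside the archive is not described in this issue, so the sketch only unpacks the archive and leaves model loading to the caller.

```python
import shutil
import tarfile
import tempfile
import urllib.request
from pathlib import Path

# URL of the winning submission, as given in this issue.
MODEL_URL = (
    "https://fastdatascience.z33.web.core.windows.net/"
    "submission-5a83e434-58bc-492d-9852-37cd9128cd7e.tar.gz"
)


def download_and_extract(url: str, target_dir: Path) -> Path:
    """Download a .tar.gz archive and unpack it into target_dir.

    Hypothetical helper: if target_dir already contains files, the
    download is skipped and the existing directory is returned.
    """
    target_dir = Path(target_dir)
    if target_dir.is_dir() and any(target_dir.iterdir()):
        return target_dir  # treat a non-empty directory as a cache hit

    target_dir.mkdir(parents=True, exist_ok=True)
    with tempfile.TemporaryDirectory() as tmp:
        archive_path = Path(tmp) / "model.tar.gz"
        # Stream the response to disk instead of holding it in memory.
        with urllib.request.urlopen(url) as response, open(archive_path, "wb") as out:
            shutil.copyfileobj(response, out)
        with tarfile.open(archive_path, mode="r:gz") as archive:
            if hasattr(tarfile, "data_filter"):
                # Safer extraction where the Python version supports it.
                archive.extractall(target_dir, filter="data")
            else:
                archive.extractall(target_dir)
    return target_dir
```

A call might look like `download_and_extract(MODEL_URL, Path.home() / ".harmony" / "pdf_model")`, where the cache path is an assumption. If the model is uploaded to the Hugging Face Hub instead, `huggingface_hub.snapshot_download(repo_id=...)` could take the place of this helper; the repository id would depend on what is created under the `harmonydata` account.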

You would need to replace the `predict` method in `pdf_parser.py` https://github.com/harmonydata/harmony/blob/main/src/harmony/parsing/pdf_parser.py#L44 with the code that runs the model.

One difference is that the current PDF extraction model only gets question texts, e.g. "How often do you feel anxious", but not response options ("Somewhat" / "Very often" etc.). The new model extracts both.
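To illustrate how a replacement `predict` could return both question texts and response options, here is a rough sketch. This issue does not describe the model's output format or the current `predict` signature, so everything here is an assumption: the `(label, text)` span format, the `"question"` / `"answer"` labels, the `ExtractedQuestion` class, and the `tag_spans` callable that stands in for the real model call.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical model output: (label, text) pairs in document order,
# where label is either "question" or "answer".
Span = Tuple[str, str]


@dataclass
class ExtractedQuestion:
    """A question text together with the response options that follow it."""
    question_text: str
    options: List[str] = field(default_factory=list)


def group_spans(spans: List[Span]) -> List[ExtractedQuestion]:
    """Attach each "answer" span to the most recent "question" span."""
    questions: List[ExtractedQuestion] = []
    for label, text in spans:
        text = text.strip()
        if not text:
            continue
        if label == "question":
            questions.append(ExtractedQuestion(text))
        elif label == "answer" and questions:
            questions[-1].options.append(text)
        # Answer spans seen before any question are dropped in this sketch.
    return questions


def predict(text: str, tag_spans: Callable[[str], List[Span]]) -> List[ExtractedQuestion]:
    """Run the (injected) model on the PDF text and group its output."""
    return group_spans(tag_spans(text))


def fake_tagger(text: str) -> List[Span]:
    """Stand-in for the real model, used only to demonstrate the grouping."""
    return [
        ("question", "How often do you feel anxious"),
        ("answer", "Somewhat"),
        ("answer", "Very often"),
    ]


result = predict("raw text extracted from the PDF", fake_tagger)
```

Passing the model in as a callable keeps the grouping logic testable without downloading the model; in Harmony the real tagger would wrap whatever inference code ships with Aashvin's submission.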

## Rationale

Harmony often extracts the wrong text from a PDF, so its parsing is quite inaccurate. Replacing the current model with the fine-tuned one should improve extraction accuracy.
