Arxflix

Video
Markdown
synthetic dataset and models

Inspiration

Research is tedious, and finding the right papers to read is even harder. We built Arxflix to make research fun again.

What it does

Arxflix converts research papers into two-minute video summaries that present all the key information visually.

How we built it

Starting from an arXiv link, we generate a unified Markdown version of the paper using arXiv's experimental HTML feature. We retrieve all the important paper content, such as figures, equations, and titles.
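
As a rough illustration of the first step, the sketch below turns an arXiv link into the address of its HTML rendering. It is not the project's actual code: the function name `arxiv_html_url` is hypothetical, and the `https://arxiv.org/html/` prefix is an assumption about where the experimental HTML version is served.

```python
import re

# Assumption: arXiv's experimental HTML rendering is served under this prefix.
ARXIV_HTML_BASE = "https://arxiv.org/html/"

# Matches new-style arXiv identifiers such as 2405.12345 or 2405.12345v2.
ARXIV_ID_PATTERN = re.compile(r"(\d{4}\.\d{4,5})(v\d+)?")


def arxiv_html_url(link: str) -> str:
    """Turn an arXiv abs/pdf link (or a bare ID) into the HTML rendering URL."""
    match = ARXIV_ID_PATTERN.search(link)
    if match is None:
        raise ValueError(f"No arXiv identifier found in: {link!r}")
    # group(0) keeps the optional version suffix (for example "v2").
    return ARXIV_HTML_BASE + match.group(0)
```

The returned URL could then be fetched and converted to Markdown with whatever HTML parser the pipeline uses; that part is left out here because the original conversion code is not described.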

Then we generate a video script using a custom fine-tune of Mistral 7B v0.3. The script follows a specific format that can include "Rich Content" such as figures, equations, and headlines, which are then included in the generated video.

Mistral selects the most relevant figures and equations to include in the script.

To include Rich Content such as figures, equations, and headlines, we define a custom format for the script with a dedicated field for each content type.
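
To give a feel for what such a format might look like, here is a minimal parser sketch. The field names (`Headline`, `Text`, `Figure`, `Equation`) and the `\Field: content` line syntax are hypothetical, chosen for illustration only; the actual Arxflix script fields are not listed in this write-up.

```python
from dataclasses import dataclass

# Hypothetical field names; the real Arxflix script fields are not listed here.
KNOWN_FIELDS = ("Headline", "Text", "Figure", "Equation")


@dataclass
class ScriptItem:
    kind: str     # one of KNOWN_FIELDS
    content: str  # narration text, figure URL, LaTeX source, or headline


def parse_script(script: str) -> list[ScriptItem]:
    """Parse lines of the form '\\Kind: content' into structured items."""
    items = []
    for raw_line in script.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines between items
        if not line.startswith("\\") or ":" not in line:
            raise ValueError(f"Malformed script line: {line!r}")
        # Split on the first colon only, so URLs in the content stay intact.
        kind, content = line[1:].split(":", 1)
        kind = kind.strip()
        if kind not in KNOWN_FIELDS:
            raise ValueError(f"Unknown field: {kind!r}")
        items.append(ScriptItem(kind, content.strip()))
    return items
```

A strict parser like this fails loudly on any line the model produces outside the expected format, which is useful when the downstream video renderer cannot tolerate malformed input.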

The model is fine-tuned on a synthetic dataset that follows the script format and tone required by our video framework.

We generate the audio with ElevenLabs TTS and use Whisper to extract word-level timestamps, which let us synchronize the captions with the audio in the video.
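
The caption synchronization can be sketched as grouping word timestamps into short caption chunks. This is an illustrative sketch, not the project's code: it assumes each word arrives as a dict with `word`, `start`, and `end` keys (times in seconds), and the function names and the `max_words` and `max_gap` thresholds are hypothetical.

```python
def _flush(chunk):
    """Collapse a list of word dicts into one caption with start/end times."""
    return {
        "text": " ".join(w["word"].strip() for w in chunk),
        "start": chunk[0]["start"],
        "end": chunk[-1]["end"],
    }


def group_captions(words, max_words=4, max_gap=0.6):
    """Group word timestamps into caption chunks.

    A new caption starts when the current one already holds `max_words`
    words or when the pause before the next word exceeds `max_gap` seconds.
    """
    captions = []
    current = []
    for word in words:
        if current and (
            len(current) >= max_words
            or word["start"] - current[-1]["end"] > max_gap
        ):
            captions.append(_flush(current))
            current = []
        current.append(word)
    if current:
        captions.append(_flush(current))
    return captions
```

Each caption carries its own start and end time, so a renderer can show it exactly while the matching audio plays.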

Finally, the video is generated from the script using Remotion, a React library for creating videos programmatically.

Challenges we ran into

Curating the dataset: fetching and formatting arXiv papers in a unified format, and enforcing a specific tone and format to generate our synthetic dataset.

Fine-tuning a small model on a custom format.

Dealing with hallucinations in a critical environment: some hallucinations, particularly in figure references, can break the whole pipeline.

Generating the video in a procedural way.
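
One way to guard against the hallucinated figure references mentioned above is to check every figure the script cites against the figures actually extracted from the paper. The sketch below is an assumption about how such a check could look, not the project's implementation; the function name and the use of plain strings as figure identifiers are hypothetical.

```python
def check_figure_references(script_figures, paper_figures):
    """Split the figures cited in a script into valid and hallucinated ones.

    `script_figures` are the identifiers (for example URLs) cited by the
    generated script; `paper_figures` are those extracted from the paper.
    """
    known = set(paper_figures)
    valid = [fig for fig in script_figures if fig in known]
    missing = [fig for fig in script_figures if fig not in known]
    return valid, missing
```

Running this before rendering lets the pipeline drop or regenerate the offending script items instead of failing partway through video generation.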

Accomplishments that we're proud of

We are proud of how much we accomplished during the two days of the hackathon: we built a working pipeline that can be very useful.

What we learned

We learned a lot during the process, for example how to make the LLM output data in the Markdown syntax we required.

What's next for Arxflix

We plan to keep building, fixing bugs, and making the pipeline more robust.

Built With

axolotl
huggingface
mistral
nextjs
node.js
python
pytorch
tailwind

Submitted to

Mistral AI Paris Hackathon
- Winner Model Fine-tune Track - 2nd

Created by

I worked on fine-tuning Mistral models. I made 28 fine-tunes in total, selected 12 candidate models and evaluated them all.
In the process, I also assembled 7 datasets of different sizes from different sources to help with fine-tuning.

Maziyar Panahi
Tunji Abioye
Blanchon Blanchon
David K.