brAInstorm


Inspiration

As regular readers of books, articles, and blog posts, and as avid writers ourselves, we know how messy it is to brainstorm eloquent, coherent, and impactful ideas. To streamline that process and give writers a space to think and compose, we combined generative AI tools with full-stack web development to create brAInstorm.

What it does

brAInstorm is a full-stack app that lets users add snippets of text and audio to a blank whiteboard. It uses Whisper to transcribe the audio snippets and Perplexity AI to generate idea summaries and bits of inspiration, bringing some order to the challenging, unorganized process of brainstorming.

How we built it

We used Vite and React to build the frontend of our full-stack web app. The backend runs on the FastAPI framework, where we implemented a RAG pipeline that sends the text snippets and a prompt-engineered query to a Perplexity AI agent, and used the speech-to-text model Whisper to transcribe audio.
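To make the retrieve-then-prompt idea concrete, here is a minimal Python sketch of how whiteboard snippets and a query might be combined into one prompt. It is an illustration under assumptions, not our actual pipeline: the word-overlap scoring, the prompt wording, and the function names `retrieve` and `build_prompt` are all hypothetical, and the call to Perplexity AI is left out.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase a string and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def retrieve(snippets: list[str], query: str, top_k: int = 3) -> list[str]:
    """Rank snippets by word overlap with the query (a stand-in for real retrieval)."""
    query_words = tokenize(query)
    scored = [(len(tokenize(s) & query_words), i, s) for i, s in enumerate(snippets)]
    # Highest overlap first; ties keep the original whiteboard order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [s for _, _, s in scored[:top_k]]


def build_prompt(snippets: list[str], query: str, top_k: int = 3) -> str:
    """Assemble the retrieved snippets and the user query into one prompt string."""
    context = "\n".join(f"- {s}" for s in retrieve(snippets, query, top_k))
    return (
        "You are helping a writer brainstorm.\n"
        f"Whiteboard snippets:\n{context}\n\n"
        f"Task: {query}\n"
        "Summarize the main ideas and suggest two new directions."
    )


if __name__ == "__main__":
    board = [
        "A lighthouse keeper who collects lost letters",
        "Grocery list: eggs, flour",
        "Letters wash ashore after every storm",
    ]
    print(build_prompt(board, "story about letters and a storm", top_k=2))
```

A production setup would more likely rank snippets with embeddings than with raw word overlap, but the shape is the same: select the relevant snippets, then wrap them and the query in a fixed prompt template.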

We used one of Intel’s AI PCs at DubHacks to fine-tune Whisper on a dataset of human speech and evaluated the model on reading-related audio snippets. We also quantized the Whisper model for faster inference.
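For readers new to quantization, the core arithmetic can be sketched in a few lines of plain Python. This is a simplified illustration of symmetric int8 quantization on a single list of weights, assuming one scale for the whole tensor. It is not what our toolchain does internally, where calibration data and per-layer or per-channel scales are typically involved, and the function names are hypothetical.

```python
def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    # One shared scale; fall back to 1.0 so an all-zero tensor does not divide by zero.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized: list[int], scale: float) -> list[float]:
    """Recover approximate float values from the stored integers."""
    return [q * scale for q in quantized]


if __name__ == "__main__":
    weights = [0.42, -1.27, 0.05, 0.90]
    q, scale = quantize_int8(weights)
    restored = dequantize(q, scale)
    worst = max(abs(a - b) for a, b in zip(weights, restored))
    print(f"int8 values: {q}, scale: {scale:.4f}, worst error: {worst:.5f}")
```

Each weight is stored in one byte instead of four, and the rounding error per weight is at most half the scale. That trade of a little precision for smaller, faster arithmetic is where the inference speedup comes from.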

Challenges we ran into

The large size of WAV audio files made it unexpectedly hard to pass data between the React frontend and the FastAPI backend for model inference. We experimented with JSON file conversion and local storage, but ultimately combined FormData file uploads with asynchronous FastAPI endpoints to pass the data.
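One reason JSON is a poor fit for audio can be shown with the standard library alone. The sketch below assumes the audio would be embedded in JSON as base64 text, which is a common approach but an assumption here, not a record of what we actually tried. It builds a silent WAV clip in memory and compares its raw size with the size of a JSON body carrying the same bytes.

```python
import base64
import io
import json
import wave


def make_silent_wav(seconds: float = 1.0, sample_rate: int = 16000) -> bytes:
    """Build an in-memory mono 16-bit WAV file filled with silence."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(b"\x00\x00" * int(seconds * sample_rate))
    return buffer.getvalue()


def json_payload_size(audio: bytes) -> int:
    """Size in bytes of a JSON body that carries the audio as base64 text."""
    body = json.dumps({"audio": base64.b64encode(audio).decode("ascii")})
    return len(body.encode("utf-8"))


if __name__ == "__main__":
    wav_bytes = make_silent_wav(seconds=10.0)
    raw, as_json = len(wav_bytes), json_payload_size(wav_bytes)
    print(f"raw WAV: {raw} bytes, JSON/base64: {as_json} bytes ({as_json / raw:.2f}x)")
```

Base64 spends four text characters on every three bytes of audio, so the JSON body is roughly a third larger than the file itself, and it has to be encoded and decoded in full on each side. A multipart FormData upload sends the file bytes as they are, with only a small boundary overhead.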

Integrating speech-to-text models into our full-stack web application proved challenging. Initially we tried running the speech-to-text model directly in JavaScript, but model accuracy suffered considerably during the conversion process, so we settled on performing model inference on the FastAPI backend.

Converting quantized models into a format that fit our full-stack application also proved challenging. After scouring research papers and other use cases, we figured out that we needed both the .bin and .xml files for the model (in OpenVINO's format, the .xml file describes the network and the .bin file holds the weights).
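A small guard like the following could catch a missing file before the backend tries to load the model. It is a hedged sketch using only the standard library: the helper name `find_ir_pair` and the example file names are hypothetical, and it assumes the .bin file sits next to the .xml file with the same base name.

```python
from pathlib import Path


def find_ir_pair(xml_path: str) -> tuple[Path, Path]:
    """Return the (.xml, .bin) pair for an IR model, or raise if it is incomplete."""
    xml_file = Path(xml_path)
    if xml_file.suffix != ".xml":
        raise ValueError(f"expected an .xml file, got {xml_file.name}")
    # Assumption: the weights file shares the base name and directory of the .xml file.
    bin_file = xml_file.with_suffix(".bin")
    missing = [p.name for p in (xml_file, bin_file) if not p.is_file()]
    if missing:
        raise FileNotFoundError(f"incomplete IR model, missing: {', '.join(missing)}")
    return xml_file, bin_file


if __name__ == "__main__":
    try:
        print(find_ir_pair("models/whisper_encoder.xml"))  # hypothetical path
    except FileNotFoundError as error:
        print(error)
```

Failing early with a message that names the missing file is easier to debug than a load error raised deep inside the inference code.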

Accomplishments that we're proud of

We are proud of successfully integrating models we were not initially familiar with, such as the speech recognition model Whisper.

We are proud of optimizing models to run faster on Intel AI PCs and of connecting the frontend and backend components seamlessly.

Accomplishing all of this within 24 hours was an eye-opening experience that we thoroughly enjoyed.

What we learned

We learned techniques like post-training quantization and conversion to ONNX and OpenVINO formats for optimized performance.

We learned how to use Remote Tunnels in VS Code to SSH into Intel Tiber AI Cloud, and how to perform inference and fine-tuning on models like Whisper and work with RAG pipelines. We also learned how to configure FastAPI endpoints to pass data between the frontend and backend.

What's next for brAInstorm

One of the many things we would like to add is image input and image generation, to reach a wider audience and support more applications. We hope to bring Stable Diffusion models into brAInstorm to create idea-inspiring visuals, such as plot points and scene visualizations, that help bring ideas to life.

We also believe that writing is a collaborative process. We would love to turn this into a collaborative app where many writers working together can contribute to a single brainstorming whiteboard simultaneously.

Built With

distil-whisper
fastapi
github
intel-ai-pc
intel-tiber
intel-xion
openai-whisper
openvino
perplexity
python
pytorch
rag
react
scikit-learn
vite

Submitted to

DubHacks '24

Created by

I focused primarily on the ML side: fine-tuning the speech-to-text model on various datasets on the AI PC and Intel Tiber AI Cloud, optimizing models, and integrating the model into the full-stack application.

Samuel Wang
I worked on implementing the overall FastAPI backend interface for this project. I also worked on integrating the LLM workflow (speech-to-text Whisper model and Perplexity AI) and query processing into our full-stack web app.

Sriram K.
I worked on the frontend for this project. I used Vite, React, and TypeScript to build all of the UI. I took care of converting the text and audio snippets into parsable JSON and sending it to the FastAPI backend.

Isaac Wu

Updates

Isaac Wu started this project — Oct 13, 2024 01:09 AM EDT
