Inspiration
I was inspired to create Transcribi by the need to streamline transcription for researchers, professionals, and students. While doing research for projects, I was often frustrated by having to search through long videos for one specific piece of information. I wanted a tool that could accurately transcribe audio and video content, making it easier to extract information and insights from long recordings.
What it does
Transcribi is a transcription tool that converts audio and video files into text. It uses OpenAI Whisper for speech recognition to keep transcription accuracy high. Users upload a file and receive the transcribed text, saving significant time and effort. They can then read through the transcript and turn on live subtitles while viewing the recording. Users can also chat with GPT, with the transcript of the audio provided as context.
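One possible way to drive subtitles from a transcript is to convert timed segments into a WebVTT file that a video player can load. The sketch below is only an illustration, not necessarily how Transcribi does it. It assumes each segment is a dict with `start`, `end` (in seconds), and `text` keys, which is the shape of the `segments` list returned by the open-source Whisper package. The helper names `vtt_timestamp` and `segments_to_vtt` are hypothetical.

```python
def vtt_timestamp(seconds):
    """Format a time in seconds as a WebVTT timestamp (hh:mm:ss.ttt)."""
    ms = int(round(seconds * 1000))
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def segments_to_vtt(segments):
    """Build a WebVTT document from timed transcript segments.

    Assumes each segment has "start", "end", and "text" keys.
    """
    lines = ["WEBVTT", ""]
    for seg in segments:
        lines.append(f"{vtt_timestamp(seg['start'])} --> {vtt_timestamp(seg['end'])}")
        lines.append(seg["text"].strip())
        lines.append("")  # a blank line ends each cue
    return "\n".join(lines)
```

The resulting text could be served to the browser and attached to a video element as a subtitle track.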
How we built it
Transcribi was built with React for the front end and Python with FastAPI for the backend server. It uses OpenAI, Chroma, and LangChain to work with the GPT API and the OpenAI Whisper model.
Challenges we ran into
One of the main challenges I encountered was optimizing the accuracy of the transcription process while streaming the results in real time to the front-end client. It was also difficult to format the transcript when sending it from the backend to the front end.
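To illustrate the formatting problem, here is a minimal sketch of one way to turn transcript segments into small JSON messages that a client can render one at a time. The message shape (`type`, `index`, `start`, `end`, `text`) and the function names are assumptions made for this example, not the format Transcribi actually uses.

```python
import json


def format_segment(segment, index):
    """Reduce a transcript segment to the fields a client needs (assumed shape)."""
    return {
        "type": "segment",
        "index": index,
        "start": round(float(segment["start"]), 2),
        "end": round(float(segment["end"]), 2),
        "text": segment["text"].strip(),
    }


def to_messages(segments):
    """Yield one JSON string per segment, then a final "done" message."""
    count = 0
    for index, segment in enumerate(segments):
        yield json.dumps(format_segment(segment, index))
        count += 1
    yield json.dumps({"type": "done", "count": count})
```

Each yielded string could be sent over a WebSocket connection as it becomes available, so the client can append segments to the transcript without waiting for the whole file to finish.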
Accomplishments that we're proud of
I am proud of developing a transcription tool with high accuracy, even in challenging audio conditions. The system can transcribe various languages and handle different accents effectively. I also developed an intuitive, user-friendly interface that simplifies the transcription process and lets users easily understand and analyze content with the help of AI.
What we learned
I learned how to use WebSockets with Python and React, as well as how to use the OpenAI Whisper model to transcribe audio. I also learned how to use FFmpeg to split audio files into chunks at detected silences so that audio can be processed much faster.
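The silence-based chunking idea can be sketched roughly as follows, using FFmpeg's `silencedetect` audio filter, which logs `silence_start` and `silence_end` times to stderr. This is an illustrative sketch, not Transcribi's actual code: the thresholds (`-30dB`, `0.5` seconds), the choice to cut at the midpoint of each silence, and all function names are assumptions.

```python
import re
import subprocess

SILENCE_START = re.compile(r"silence_start:\s*(-?\d+(?:\.\d+)?)")
SILENCE_END = re.compile(r"silence_end:\s*(-?\d+(?:\.\d+)?)")


def run_silencedetect(path, noise_db=-30, min_silence=0.5):
    """Run ffmpeg's silencedetect filter and return its stderr log."""
    cmd = [
        "ffmpeg", "-hide_banner", "-i", path,
        "-af", f"silencedetect=noise={noise_db}dB:d={min_silence}",
        "-f", "null", "-",
    ]
    return subprocess.run(cmd, capture_output=True, text=True).stderr


def parse_silences(log):
    """Extract (start, end) pairs of silent intervals from the ffmpeg log."""
    starts = [float(s) for s in SILENCE_START.findall(log)]
    ends = [float(e) for e in SILENCE_END.findall(log)]
    return list(zip(starts, ends))


def chunk_boundaries(silences, duration, min_chunk=1.0):
    """Cut at the midpoint of each silence, skipping cuts that would
    leave a chunk shorter than min_chunk seconds."""
    cuts = [0.0]
    for start, end in silences:
        mid = (start + end) / 2
        if mid - cuts[-1] >= min_chunk and duration - mid >= min_chunk:
            cuts.append(mid)
    cuts.append(duration)
    return list(zip(cuts[:-1], cuts[1:]))
```

Each `(start, end)` pair could then be extracted with FFmpeg and transcribed independently, which is what makes it possible to process chunks in parallel. Cutting inside silences rather than at fixed intervals avoids splitting a word across two chunks.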
What's next for Transcribi
I am planning to add an in-app note-taking component as well as a way to export transcripts. I am also considering connecting to a SQL database so that transcripts can be saved and loaded later.
Built With
- fastapi
- javascript
- langchain
- openai
- python
- react