Gallery: Logo, Home Page, Loading Page, Dashboard Page, Team Picture, Behind the Scenes

Inspiration

As a group of multilingual high school friends, we have all watched our grandparents, parents, and peers struggle to follow a video because of a language barrier. Whether the content is animation, news, movies, or educational video, making it accessible to foreign audiences remains a major challenge. Dubbing is complex and expensive, often costing over $300 for a 30-minute video, which makes the problem worse. Traditional dubbing also lacks personalization: the dubbed voices do not match the original speaker, which diminishes the overall experience.

What it does

ReVoice enables anyone to easily and affordably dub videos into 30 different languages without compromising the original speaker's identity or paying exorbitant fees. With ReVoice, users can ensure that their content reaches a global audience, providing accurate and natural-sounding dubs.

How we built it

Frontend

Flask: Serves as the web framework, managing the server-side logic and routing.
DaisyUI with Tailwind CSS: Used for rapid UI development and styling, giving us a customizable, responsive interface without extensive custom CSS.
Pydub: A core tool for audio manipulation, handling tasks such as splicing, fading, and layering of music and voice-over tracks.

Backend

ElevenLabs API: ReVoice integrates with the ElevenLabs API to generate high-quality text-to-speech voices in multiple languages. This allows us to maintain voice authenticity while creating personalized voiceovers.
NLTK (Natural Language Toolkit): Employed for natural language processing tasks such as tokenizing sentences and distributing them across multiple AI-generated voices, providing a conversational dynamic between speakers.
Whisper AI (OpenAI): Whisper's speech recognition model transcribes the original audio before the text is translated into other languages. Its transcripts are accurate enough to give the translation step reliable text and context to work from.
FFmpeg: Used for audio extraction, mixing, and final stitching. FFmpeg merges the generated dubs into the video files while preserving audio-video synchronization.
DeepTranslator: Used to translate the transcribed content into over 30 supported languages. These translations are the text from which the multilingual dubs are generated.
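
The NLTK step in the list above (tokenize the script into sentences, then spread them across several voices) can be sketched as follows. This is only an illustration under stated assumptions: the project does not say how sentences are assigned, so a simple round-robin is assumed here; a small regex splitter stands in for NLTK's sentence tokenizer so the example runs with the standard library alone; and the voice IDs are hypothetical placeholders, not real ElevenLabs voices.

```python
import re
from itertools import cycle


def split_sentences(text):
    """Naive sentence splitter (assumption: a stand-in for NLTK's tokenizer)."""
    # Split on whitespace that follows ., ! or ?
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def assign_voices(text, voice_ids):
    """Pair each sentence with a voice, cycling through voice_ids in order.

    Round-robin assignment is an assumption made for this sketch.
    """
    if not voice_ids:
        raise ValueError("at least one voice id is required")
    # zip stops when the sentences run out, so cycle() never loops forever.
    return list(zip(cycle(voice_ids), split_sentences(text)))


if __name__ == "__main__":
    script = "Hello there. How are you? I am fine!"
    # "voice_a" and "voice_b" are hypothetical placeholder ids.
    for voice, sentence in assign_voices(script, ["voice_a", "voice_b"]):
        print(f"{voice}: {sentence}")
```

Each (voice, sentence) pair could then be sent to the text-to-speech step one at a time, which keeps the speaker alternation independent of how the audio is generated.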

Challenges we ran into

Managing real-time processing for multiple video formats while preserving audio quality proved to be a challenge. Integrating Whisper AI for speech recognition with ElevenLabs' voice synthesis was complex due to the differing input and output formats. Ensuring low latency across a multi-cloud infrastructure required optimization of resource allocation and load balancing.
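
One way to cope with many input formats is to let FFmpeg normalize every upload to the same audio format up front, and to copy the video stream untouched when the dub is added back. The sketch below only builds the command lines. The function names, file paths, and the 16 kHz mono target are assumptions chosen for illustration and are not taken from the ReVoice code.

```python
import subprocess


def extract_audio_cmd(video_path, wav_path, sample_rate=16000):
    """Build an FFmpeg command that writes the audio track as mono PCM WAV."""
    return [
        "ffmpeg", "-y",            # overwrite the output file if it exists
        "-i", video_path,          # any container FFmpeg can read
        "-vn",                     # drop the video stream
        "-ac", "1",                # downmix to a single channel
        "-ar", str(sample_rate),   # resample (16 kHz is an assumed target)
        "-c:a", "pcm_s16le",       # uncompressed 16-bit PCM
        wav_path,
    ]


def mux_dub_cmd(video_path, dub_path, out_path):
    """Build an FFmpeg command that swaps the audio track for the dub."""
    return [
        "ffmpeg", "-y",
        "-i", video_path,
        "-i", dub_path,
        "-map", "0:v:0",           # video from the first input
        "-map", "1:a:0",           # audio from the second input
        "-c:v", "copy",            # copy video as-is, no re-encode
        out_path,
    ]


def run(cmd):
    """Run a command and raise if FFmpeg exits with an error."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical file names; print the commands instead of running them.
    print(" ".join(extract_audio_cmd("input.mp4", "speech.wav")))
    print(" ".join(mux_dub_cmd("input.mp4", "dub.wav", "dubbed.mp4")))
```

Copying the video stream avoids a second lossy encode of the picture, so only the audio is re-encoded when the dubbed file is written.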

Accomplishments that we're proud of

We successfully integrated Whisper AI, ElevenLabs, and NLTK to produce high-quality multilingual dubs in a user-friendly web app. The dynamic voice assignments between AI-generated characters enhanced the overall viewing experience, making it feel natural and personalized. We’re especially proud of our audio synchronization, which maintains lip sync despite the varying length of translations.
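
This write-up does not describe how the synchronization is done. One common approach, shown here purely as an assumed sketch, is to time-stretch each synthesized clip so it fits the time slot of the original speech segment. The helper below computes the speed factor and expresses it as a chain of FFmpeg `atempo` filters, keeping each stage between 0.5 and 2.0. All function names are hypothetical.

```python
def tempo_factor(clip_seconds, slot_seconds):
    """Speed multiplier that makes a clip last exactly slot_seconds."""
    if clip_seconds <= 0 or slot_seconds <= 0:
        raise ValueError("durations must be positive")
    # A 3 s clip in a 2 s slot must play 1.5x faster.
    return clip_seconds / slot_seconds


def atempo_chain(factor, lo=0.5, hi=2.0):
    """Split a speed factor into atempo stages that each stay in [lo, hi]."""
    stages = []
    while factor > hi:       # too fast for one stage: peel off a 2.0x stage
        stages.append(hi)
        factor /= hi
    while factor < lo:       # too slow for one stage: peel off a 0.5x stage
        stages.append(lo)
        factor /= lo
    stages.append(factor)    # remainder now fits in a single stage
    return ",".join(f"atempo={s:.4f}" for s in stages)


if __name__ == "__main__":
    # A translated line that runs 3.0 s but must fit a 2.0 s slot.
    factor = tempo_factor(3.0, 2.0)
    print(atempo_chain(factor))
```

The resulting string could be passed to FFmpeg's `-filter:a` option. Large speed changes sound unnatural, so a practical pipeline would likely cap the factor or shorten the translated text instead.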

What we learned

Throughout this project, we gained a deep understanding of multilingual speech synthesis and audio-video processing. We also learned how to optimize microservice architectures to handle large-scale media files and how to maintain performance across geographically distributed cloud resources.

What's next for ReVoice

We plan to integrate lip-sync technology more deeply by expanding our use of Wav2Lip, which should enhance viewer engagement. Our next milestone is expanding language support to include more niche languages and dialects. We’re also aiming to improve real-time dubbing for live-streamed events and virtual classrooms, bridging the gap for international learners.

Regarding Demo

We are unfortunately not hosting a live demo of the project because of ElevenLabs costs and security concerns. Please feel free to clone the repository and try it out yourself!

Built With

daisyui
elevenlabs
flask
html
javascript
python
tailwindcss
whisper

Submitted to

Soario: AI Apps for Impact High School Hackathon
- Winner 1st Place: Soario Internship
- Winner 2nd & 3rd Place: Studywise Subscription
- Winner Top 3 Winners: Cash Prize

Created by

I worked on the Frontend using HTML, TailwindCSS, DaisyUI, and JS, as well as designing the logo. I loved how the project turned out and can't wait to work on more in the future.

Aryan Jain
Andrew Zhang
Priansh -
Hi! My name is Priansh and I am a backend dev (I dip my toe in frontend sometimes). I mainly work in Python and JS/TS. I'm a senior in high school.
Sebastian Alexis

Updates

Andrew Zhang started this project — Sep 30, 2024 01:43 AM EDT
