ESL Speaks Loud

Inspiration

As former ESL students, we have always found language to be one of the biggest barriers in our learning journey. Around 7 million students worldwide face the same challenge. For many, presentations are a source of extra anxiety, not just because of public speaking, but because they can’t always find the right words as quickly, or express their ideas as smoothly, as native speakers. This is especially true when others ask questions. We want to use AI to help us speak more naturally and fluently, so we can share our thoughts with the same confidence as local students.

What it does

ESL Speaks Loud is a tool that helps ESL students present with more confidence. When giving an online presentation, users can easily switch between their real camera feed and a virtual video. The virtual video continues the presentation smoothly, delivering the script in fluent English at the level of a native speaker. If someone asks a question, ESL Speaks Loud can also generate a natural-sounding response video, so students don’t have to panic about sudden interruptions.

Behind the scenes, ESL Speaks Loud takes in the user’s voice sample, a short face video, and their presentation slides. It then uses Cohere’s API to turn each slide into a speech script, and TopView to generate a video for each slide. This way, students can focus on their ideas instead of worrying about language barriers.
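A minimal sketch of that slide-to-script-to-video flow, written only to illustrate the shape of the pipeline. The names `SlideClip`, `build_presentation`, `fake_write_script`, and `fake_render_video` are hypothetical, and the real Cohere and TopView requests are replaced by placeholder functions so the example runs offline.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SlideClip:
    slide_number: int
    script: str
    video_ref: str  # whatever handle the video service returns (URL, job id, ...)


def build_presentation(
    slide_texts: List[str],
    write_script: Callable[[str], str],
    render_video: Callable[[str], str],
) -> List[SlideClip]:
    """Turn each slide's text into a script, then into a video reference."""
    clips = []
    for number, text in enumerate(slide_texts, start=1):
        script = write_script(text)       # stand-in for the script-writing call
        video_ref = render_video(script)  # stand-in for the avatar-video call
        clips.append(SlideClip(number, script, video_ref))
    return clips


# Placeholder implementations (assumptions, not real API calls).
def fake_write_script(slide_text: str) -> str:
    return f"On this slide: {slide_text.strip()}."


def fake_render_video(script: str) -> str:
    return f"video-for-{len(script)}-chars"
```

Passing the two steps in as functions keeps the loop independent of any particular provider, which is convenient when one service has to be swapped for another.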

How we built it

We built our frontend with Nuxt.js (based on Vue.js) and our backend with FastAPI (Python), and integrated several external APIs, including Cohere, TopView, and Google Cloud.

Challenges we ran into

Throughout Hack the North, we encountered several challenges: unstable APIs that slowed our testing, delays in avatar generation (which we solved by switching from HeyGen to the faster TopView API), and coordinating a rapidly expanding codebase across both frontend and backend. Another significant headache was converting uploaded files into accurate speech drafts, which required careful handling of parsing and formatting.
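To give a sense of the parsing step, here is a rough sketch that pulls plain text out of a slide deck using only the Python standard library. It assumes the upload is a `.pptx` file (a zip archive whose slides live under `ppt/slides/`); the function name `extract_slide_texts` is hypothetical, and this is not necessarily how our parser works.

```python
import re
import zipfile
from typing import BinaryIO, List, Union
from xml.etree import ElementTree

# DrawingML namespace used for paragraphs and text runs in slide XML.
A_NS = "{http://schemas.openxmlformats.org/drawingml/2006/main}"
SLIDE_PATH = re.compile(r"ppt/slides/slide(\d+)\.xml$")


def extract_slide_texts(pptx: Union[str, BinaryIO]) -> List[str]:
    """Return one whitespace-normalized string per slide.

    Slides are ordered by file number, which usually, but not always,
    matches the order shown in the presentation.
    """
    with zipfile.ZipFile(pptx) as archive:
        numbered = []
        for name in archive.namelist():
            match = SLIDE_PATH.match(name)
            if match:
                numbered.append((int(match.group(1)), name))

        texts = []
        for _, name in sorted(numbered):
            root = ElementTree.fromstring(archive.read(name))
            paragraphs = []
            for para in root.iter(f"{A_NS}p"):
                runs = [t.text or "" for t in para.iter(f"{A_NS}t")]
                line = " ".join("".join(runs).split())  # collapse stray whitespace
                if line:
                    paragraphs.append(line)
            texts.append("\n".join(paragraphs))
        return texts
```

Sorting on the numeric part of the file name avoids the `slide10` before `slide2` ordering that a plain string sort would produce.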

Accomplishments that we're proud of

We’re proud that we actually turned an idea into something real. ESL started from our own struggles as non-native speakers, and now it’s a working prototype that can take a PPT, write a speech, and even switch between a real camera and a virtual speaker.

What we learned

We expanded our skills in:

Cohere AI for dynamic, conversational speech
Cloudflare STT for real-time transcription during live Q&A
FastAPI for async backend orchestration
TypeScript frontend for scene switching & hotkeys
TopView API for the avatar generation pipeline
Google Cloud for secure storage of video, audio, and PPT files
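
As an illustration of the async orchestration idea in the list above, the sketch below renders several slide scripts concurrently while capping the number of requests in flight. It uses only `asyncio` from the standard library; `render_all`, `fake_render`, and the `max_concurrent` default are assumptions made for the example, and the real video-generation request is replaced by a short sleep.

```python
import asyncio
from typing import Awaitable, Callable, List


async def render_all(
    scripts: List[str],
    render_one: Callable[[str], Awaitable[str]],
    max_concurrent: int = 3,
) -> List[str]:
    """Render every script concurrently, at most `max_concurrent` at a time."""
    limiter = asyncio.Semaphore(max_concurrent)

    async def guarded(script: str) -> str:
        async with limiter:
            return await render_one(script)

    # gather preserves input order, so result i belongs to script i.
    return list(await asyncio.gather(*(guarded(s) for s in scripts)))


async def fake_render(script: str) -> str:
    await asyncio.sleep(0.01)  # stand-in for a slow video-generation request
    return f"clip:{script}"
```

It could be driven with `asyncio.run(render_all(scripts, fake_render))`. The semaphore is there because external services often limit concurrent requests, so an unbounded `gather` over every slide may be rejected.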

What's next

Looking ahead, our top priority is to make video generation significantly faster by integrating state-of-the-art models, since this remains the greatest bottleneck in our workflow. We also plan to improve the reliability and scalability of the backend, refine the speech draft conversion pipeline for better accuracy, and continue exploring more responsive APIs for smoother media generation. Beyond performance, we aim to expand ESL with features like interactive feedback, personalized practice modules, and broader device accessibility, ultimately evolving it into a robust and user-friendly platform for diverse language learners.