SpeakAble

Inspiration

We were inspired by the challenges faced by the more than 70 million people worldwide with speech impairments, and by the many others who struggle with language barriers in a globalized world. We believe that communication is a fundamental human right. Existing communication aids rarely offer natural, real-time, context-aware interaction, and that gap drove us to use current AI services to build a tool that helps people connect. Our mission is to make communication accessible to everyone, everywhere.

What it does

SpeakAble is an AI-powered communication platform designed to empower speech-impaired individuals and bridge language gaps. It acts as a real-time communication aid built around two core features:

Speech Assistance: Uses ElevenLabs technology for advanced, natural-sounding voice synthesis, effectively giving a voice to those who cannot speak.
Gemini AI Integration: Utilizes Google's Gemini AI for intelligent context understanding and processing, ensuring the resulting communication is natural, accurate, and relevant to the conversation.

How we built it

We built SpeakAble by combining two powerful AI services:

Input & Processing: The user's message (spoken or typed) is first sent to the Gemini AI model. Gemini handles the intelligent processing: it interprets the message, works out the context and intent, and, when translation is needed, identifies the target language and prepares the translated content. This keeps the output relevant to the conversation.
Output Generation: The processed and, where applicable, translated text is then passed to the ElevenLabs API. ElevenLabs converts the text into natural-sounding speech, which is delivered in real time to the other participants in the conversation.
Front-End: We used React.js for a responsive user interface that supports both text and speech input to accommodate different needs.
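
The flow described above (interpret with Gemini, then synthesize with ElevenLabs) can be sketched roughly as follows. This is a minimal illustration, not our actual code: the names `Interpreted`, `Services`, and `relayMessage` are hypothetical, and the two services are passed in as plain functions so the sketch makes no assumptions about the real Gemini or ElevenLabs SDK calls.

```typescript
// Hypothetical result of the interpretation step (the Gemini stage).
interface Interpreted {
  text: string;            // the processed, possibly translated, message
  targetLanguage?: string; // present only when translation was requested
}

// Both services are injected as functions. In a real app these would wrap
// the Gemini and ElevenLabs APIs; their exact signatures are assumptions.
interface Services {
  interpret: (message: string, history: readonly string[]) => Promise<Interpreted>;
  synthesize: (text: string) => Promise<Uint8Array>; // returns audio bytes
}

// Runs one message through the two-stage pipeline and returns the audio.
async function relayMessage(
  message: string,
  history: readonly string[],
  services: Services
): Promise<Uint8Array> {
  const trimmed = message.trim();
  if (trimmed.length === 0) {
    throw new Error("message is empty");
  }
  // Stage 1: context-aware interpretation (and translation, if requested).
  const interpreted = await services.interpret(trimmed, history);
  // Stage 2: text-to-speech on the processed text.
  return services.synthesize(interpreted.text);
}
```

Keeping the two stages behind a small interface like this also makes it possible to test the front-end with stub services, without calling either API.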

Accomplishments that we're proud of

We are most proud of integrating two AI technologies, Gemini and ElevenLabs, into a single, seamless application. The natural quality of the synthesized voice is a major accomplishment: it makes the communication feel less robotic and more human. We also take great pride in getting a working, end-to-end communication loop that addresses both speech and language barriers within the hackathon timeframe. SpeakAble is, at its core, a social impact project.

What we learned

We gained significant experience working with generative AI APIs, particularly in prompt engineering for Gemini so that it handles complex contextual interpretation. We also learned how important a robust, accessibility-focused user interface is. This project showed us how much technology can matter when it is built for a clear social purpose.

What's next for SpeakAble

For the future of SpeakAble, we plan to:

Expand Customization: Implement features for users to select and potentially clone their own unique voice using ElevenLabs for a more personalized experience.
Offline Mode: Develop a lighter version of the AI model for on-device processing to enable core functionality even without an internet connection.
Broader Accessibility Features: Integrate support for sign language interpretation via video input and expand input methods for users with limited motor skills.