Echo.io

Calibrating for your environment
Generating your own voice with animated pictures

Inspiration

A lot of people want to speak online but don’t feel safe doing it. Between cyberbullying and privacy concerns, many stay silent. We wanted to give people a way to express themselves without exposing their identity.

What it does

Echo.io lets you speak naturally while an AI character speaks for you. It blends your words, pacing, and emotion into a VTuber-style voice so listeners hear how you feel without hearing your real voice.

How we built it

We stream speech through ElevenLabs and process it in real time. We track timing, speed, and tone to estimate emotion, then clean and re-express the speech through an AI character while keeping it natural.
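As a rough illustration of the timing-based part of this step, the sketch below estimates speaking rate from word timestamps and maps it to a coarse mood label. It is not our production code: the `Word` type, the function names, the mood labels, and the thresholds (110 and 170 words per minute, a 1-second pause) are all assumptions chosen for the example, and it assumes the speech-to-text stage provides per-word start and end times.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One transcribed word with timestamps in seconds (hypothetical shape)."""
    text: str
    start: float
    end: float


def words_per_minute(words):
    """Speaking rate over the span from the first word's start to the last word's end."""
    if not words:
        return 0.0
    duration = words[-1].end - words[0].start
    if duration <= 0:
        return 0.0
    return len(words) * 60.0 / duration


def longest_pause(words):
    """Longest silent gap between consecutive words, in seconds."""
    return max((b.start - a.end for a, b in zip(words, words[1:])), default=0.0)


def estimate_mood(words, slow_wpm=110.0, fast_wpm=170.0, long_pause_s=1.0):
    """Map pace and pauses to a coarse label. Thresholds are illustrative guesses."""
    wpm = words_per_minute(words)
    if wpm == 0.0:
        return "neutral"
    if wpm >= fast_wpm:
        return "excited"
    if wpm <= slow_wpm or longest_pause(words) >= long_pause_s:
        return "hesitant"
    return "neutral"
```

A label like this could then be passed along as a style hint when the character re-expresses the sentence. Pace alone is a weak signal, so in practice it would need to be combined with tone and context.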

Challenges we ran into

Live speech is messy. Transcriptions constantly change, words repeat, and AI likes to hallucinate. Making everything feel smooth in real time without breaking character was hard.
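One common way to tame changing partial transcripts is to forward only the words that two consecutive partial results agree on, and to collapse immediate word repeats. The sketch below shows that idea under simplifying assumptions. `TranscriptStabilizer` and `collapse_repeats` are hypothetical names for this example, not part of any library, and the input is assumed to be the full partial transcript text each time it updates.

```python
def common_prefix_len(a, b):
    """Number of leading words that two word lists share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def collapse_repeats(words):
    """Drop immediate case-insensitive repeats such as 'I I want want'."""
    out = []
    for w in words:
        if not out or out[-1].lower() != w.lower():
            out.append(w)
    return out


class TranscriptStabilizer:
    """Emit a word only after two consecutive partial transcripts agree on it."""

    def __init__(self):
        self.previous = []   # words from the last partial transcript
        self.committed = 0   # count of words already emitted downstream

    def update(self, partial: str) -> list[str]:
        words = partial.split()
        stable = common_prefix_len(self.previous, words)
        self.previous = words
        if stable <= self.committed:
            return []        # nothing new is stable yet
        new_words = words[self.committed:stable]
        self.committed = stable
        return new_words
```

For example, feeding the partials "I want", "I want to", "I want two go", "I want to go home", and "I want to go home now" emits "I want" after the second update and "to go home" after the last, skipping the unstable "two". The trade-off is added latency of one update, and words that were already emitted cannot be revised later. `collapse_repeats` also removes legitimate doubled words, so it is a blunt tool.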

Emotions also did not translate well from the speaker to the AI character's voice.

Accomplishments that we're proud of

Even with multiple AI layers, the system stays surprisingly stable. The speaking avatar also works better than we initially expected.

What we learned

So many tools already exist that make this kind of project easier. We learned a lot, and we found that we can move much faster as long as we understand the basics of the tools we are using.

What's next for Echo.io

We want to find a more reliable way to stream data and improve the efficiency of the speech-to-text and text-to-speech pipeline. We also plan to offer more speech model options and to infer mood more accurately from words per minute (wpm) and context.