Inspiration

We were inspired by AI voice assistants and translation systems.

What it does

You speak in one language (or style), and the app translates your words and generates new speech in a different language, cloned in your own voice.

How we built it

Almost all of the AI models are hosted on a Nebius GPU server (the Mistral LLM being the exception). We use a streaming server implementation of the Whisper model called WhisperLive for low-latency voice transcription. The transcription is then translated into a different language or style using our prompts, and the translated text is fed to an XTTS-v2 model, also running on the Nebius server, to generate the new audio.
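The pipeline above is a simple three-stage chain. Here is a minimal sketch of how the stages compose; the class and function names are illustrative, and the stubs stand in for the real WhisperLive, Mistral, and XTTS-v2 calls, which are not shown:

```python
from typing import Callable


class InterpreterPipeline:
    """Chains transcription -> translation -> speech synthesis.

    The three stages are injected as callables so the sketch stays
    self-contained; in the real system they would call WhisperLive,
    the Mistral LLM, and XTTS-v2 respectively.
    """

    def __init__(self,
                 transcribe: Callable[[bytes], str],
                 translate: Callable[[str], str],
                 synthesize: Callable[[str], bytes]):
        self.transcribe = transcribe
        self.translate = translate
        self.synthesize = synthesize

    def process_utterance(self, audio: bytes) -> bytes:
        text = self.transcribe(audio)       # speech -> source-language text
        translated = self.translate(text)   # source text -> target language/style
        return self.synthesize(translated)  # target text -> cloned-voice audio


# Example with stub stages standing in for the hosted models:
pipeline = InterpreterPipeline(
    transcribe=lambda audio: "hello",
    translate=lambda text: text.upper(),  # stand-in for the LLM call
    synthesize=lambda text: text.encode(),
)
result = pipeline.process_utterance(b"\x00\x01")
```

Injecting the stages as callables keeps each model swappable, which matters when some models run on Nebius and others (Mistral) are reached over a separate API.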

Challenges we ran into

Correctly determining when an utterance is finished is a big challenge. Because the system is supposed to work with minimal latency, we use WebSockets, but building a server that supports all of these features in a short period of time is hard.
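One common way to detect the end of an utterance is simple energy-based endpointing: declare the utterance finished after enough consecutive low-energy frames. This is a generic sketch, not the project's actual logic, and the threshold and frame-count values are made-up placeholders:

```python
class EndpointDetector:
    """Flags end-of-utterance after N consecutive silent audio frames.

    silence_threshold and silence_frames are illustrative defaults,
    not tuned values from the project.
    """

    def __init__(self, silence_threshold: float = 500.0, silence_frames: int = 25):
        self.silence_threshold = silence_threshold
        self.silence_frames = silence_frames
        self.silent = 0  # consecutive silent frames seen so far

    def feed(self, frame: list[int]) -> bool:
        """Feed one frame of PCM samples; return True when the utterance ends."""
        rms = (sum(s * s for s in frame) / len(frame)) ** 0.5
        if rms < self.silence_threshold:
            self.silent += 1
        else:
            self.silent = 0  # speech resumed; reset the silence run
        return self.silent >= self.silence_frames
```

In a streaming setup, each WebSocket audio message would be fed to the detector, and a True result would trigger the translation and synthesis stages. Real systems often replace the RMS check with a trained voice-activity-detection model, since background noise easily fools a fixed threshold.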

Accomplishments that we're proud of

It is able to clone my voice, and end-to-end latency is 4-5 seconds, which is not a lot for a first attempt.

What we learned

How to run modern models as APIs, and how to build low-latency AI applications. Also: don't mess with WebSockets unless you have a lot of experience with them.

What's next for C-3PO (AI Interpreter)
