Inspiration
A year ago, as a break from my major, I joined a Mandarin class. Mandarin was fun to learn, but it was frustrating not to have a single place to practice with my materials. I had to retype my notes word for word, character by character, to digitize them; I struggled to get ChatGPT to act as a dialogue practice partner; and I kept repeating the same concepts without ever testing the full extent of my knowledge. Out of these frustrations, we decided to build a program that solves them.
What it does
PolyPix is an easily accessible website with three modes: Script, Chat, and Translate. Script mode takes a picture or PDF of a practice dialogue and reads it aloud to the user. Chat mode uses AI so the user can practice conversation in the chosen language through voice or text input. Finally, Translate mode also uses AI, but to generate phrases in the practice language that test the user's translation skills.
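Chat mode's core loop is sending the running dialogue to the model and reading back one reply. The sketch below is a minimal illustration, assuming Google Gemini's REST `generateContent` endpoint (Gemini is the model PolyPix uses for this mode). The helper names (`ChatTurn`, `buildChatRequest`, `extractReply`), the prompt wording, and the default model name are hypothetical choices for illustration, not PolyPix's actual code.

```typescript
// Hypothetical types for a Chat-mode turn; not taken from the PolyPix source.
type Speaker = "user" | "model";

interface ChatTurn {
  speaker: Speaker;
  text: string;
}

// Plain request shape so the builder can be tested without network access.
interface HttpRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Builds a Gemini generateContent request that sends the whole dialogue as
// context and pins the reply language with a system instruction.
function buildChatRequest(
  history: ChatTurn[],
  language: string,
  apiKey: string,
  model = "gemini-2.0-flash" // assumed model name; substitute the one you deploy
): HttpRequest {
  const url =
    "https://generativelanguage.googleapis.com/v1beta/models/" +
    `${model}:generateContent`;
  const body = {
    systemInstruction: {
      parts: [
        {
          // Hypothetical tutoring prompt; tune wording to the learner's level.
          text:
            `You are a friendly conversation partner. Reply only in ${language}, ` +
            "in one or two short sentences, and end with a question.",
        },
      ],
    },
    // Gemini expects alternating "user" / "model" roles, one entry per turn.
    contents: history.map((turn) => ({
      role: turn.speaker,
      parts: [{ text: turn.text }],
    })),
  };
  return {
    url,
    method: "POST",
    headers: { "Content-Type": "application/json", "x-goog-api-key": apiKey },
    body: JSON.stringify(body),
  };
}

// Pulls the first reply text out of a generateContent response, if present.
function extractReply(response: unknown): string | undefined {
  const r = response as {
    candidates?: { content?: { parts?: { text?: string }[] } }[];
  };
  return r.candidates?.[0]?.content?.parts?.[0]?.text;
}
```

In a Next.js app, one reasonable choice is to build and send this from a server route (for example with `fetch(req.url, req)`) so the API key never reaches the browser; the reply is then appended to `history` as a `"model"` turn before the next user message.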
How we built it
For Script mode, we use the Mathpix OCR API to scan a given file (image or PDF) for text and characters and transcribe it. We then use the ElevenLabs multilingual TTS model to turn that text into audio. For Chat and Translate modes, we use Google Gemini to generate the conversations and phrases the user practices with. Chat mode also uses ElevenLabs' STT to transcribe the user's speech into text input.
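The Script-mode pipeline is two chained API calls: OCR first, then TTS on the transcript. The sketch below is a minimal illustration under stated assumptions: it targets Mathpix's `v3/text` endpoint with a single image sent as a data URL (PDFs go through Mathpix's separate PDF processing flow, not shown) and ElevenLabs' `text-to-speech/{voice_id}` endpoint with the `eleven_multilingual_v2` model. The helper names, the `Credentials` shape, and the injected `send` transport are hypothetical, chosen so the pipeline can be exercised without network access.

```typescript
// Plain request shape so each step can be inspected and tested offline.
interface HttpRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Transport is injected; in production it could wrap fetch (hypothetical design).
type Send = (req: HttpRequest) => Promise<unknown>;

interface Credentials {
  mathpixAppId: string;
  mathpixAppKey: string;
  elevenLabsKey: string;
  voiceId: string; // any ElevenLabs voice ID; the value is up to the app
}

// Step 1: ask Mathpix to OCR one image, passed as a base64 data URL.
function buildOcrRequest(imageDataUrl: string, creds: Credentials): HttpRequest {
  return {
    url: "https://api.mathpix.com/v3/text",
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      app_id: creds.mathpixAppId,
      app_key: creds.mathpixAppKey,
    },
    body: JSON.stringify({ src: imageDataUrl, formats: ["text"] }),
  };
}

// Step 2: ask ElevenLabs to speak the transcript with its multilingual model.
function buildTtsRequest(text: string, creds: Credentials): HttpRequest {
  return {
    url: `https://api.elevenlabs.io/v1/text-to-speech/${encodeURIComponent(creds.voiceId)}`,
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "xi-api-key": creds.elevenLabsKey,
      Accept: "audio/mpeg",
    },
    body: JSON.stringify({ text, model_id: "eleven_multilingual_v2" }),
  };
}

// Runs OCR, then TTS; returns the transcript plus whatever the transport
// yields for the audio call (e.g. an ArrayBuffer of MP3 bytes).
async function scriptToSpeech(
  imageDataUrl: string,
  creds: Credentials,
  send: Send
): Promise<{ transcript: string; audio: unknown }> {
  const ocr = (await send(buildOcrRequest(imageDataUrl, creds))) as {
    text?: string;
    error?: string;
  };
  if (!ocr.text) {
    throw new Error(`OCR returned no text${ocr.error ? `: ${ocr.error}` : ""}`);
  }
  const audio = await send(buildTtsRequest(ocr.text, creds));
  return { transcript: ocr.text, audio };
}
```

A real `send` would call `fetch(req.url, req)` and return `res.json()` for the Mathpix step but `res.arrayBuffer()` for the audio step. Keeping the transcript alongside the audio also lets the page display the recognized text while it plays.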
Challenges we ran into
At first, our TTS output had an odd accent and poor pronunciation in most languages. Moving to a multilingual model fixed this, and the speech now works across the various languages we support. Another challenge was that one of our key features, speech-to-text, works on a local machine but not on the deployed live site. We have not solved this yet, but it is a problem we would love to fix in the future!
Accomplishments that we're proud of
We are proud of how much we accomplished as a team in the short time we had together, and of having a product we can present that we know will help people learn and refine their language skills.
What we learned
As a team, we learned about teamwork and about collaborating to shape a single idea into one product. On the technical side, we all learned more about APIs, OCR, TTS, STT, and AI models, and how to integrate them to bring our vision to life.
What's next for PolyPix
In the future, we would love to add language-level detection so users can build on their current proficiency. We also plan to make transcriptions more detailed and better formatted for easier reading.
Built With
- css
- elevenlabs
- gemini
- html
- mathpix
- nextjs
- ocr
- typescript
