Inspiration

Many people use LLMs to learn about and get guidance on the topics they are studying. We wanted to flip the script: instead of being taught, you learn by teaching others, a phenomenon known as the Protégé Effect. Studies have shown that those who actively taught concepts ‘significantly outperformed those who prepared to teach’ (Nestojko et al., 2014), as teaching strengthens metacognitive strategies such as extracting key concepts, filling in knowledge gaps, and reinforcing long-term retention.

Hence, as the Roman philosopher Seneca put it: “While we teach, we learn.”

What it does

Protege transforms learning into an interactive teaching experience. Users start by typing a topic and choosing a persona, such as a curious 5-year-old, a knowledgeable professor, or a historical figure. They then enter a live session to explain their topic, with the option to use a virtual whiteboard. The persona actively listens, asks follow-up questions, and challenges the user to clarify and adapt their explanation in a way the persona would understand. At the end of the session, users receive a score and personalized feedback based on how clearly they communicated and how well the persona understood.

How we built it

Protege is a ReactJS-based web application that uses the Gemini 2.0 Flash Live API to enable natural voice interactions with each of the curious personas. Each persona uses a specifically crafted prompt to simulate a distinct communication style, such as the inquisitive questioning of a child or the logical probing of a professor, so users must adapt their explanations to their audience. We built a dynamic persona customization module that supports unique persona flavours and voice attributes, streams multimodal audio and video input, and relies on Voice Activity Detection and client events for end-of-turn signaling and model interruption, all implemented in TypeScript over WebSockets.
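The persona customization module can be sketched as a small prompt builder: each persona is a config object compiled into a system prompt for the live session. The `Persona` shape and `buildSystemPrompt` helper below are illustrative names, not our actual schema.

```typescript
// Illustrative sketch: a persona config compiled into a system prompt.
// Field names and the example persona are hypothetical.
interface Persona {
  name: string;             // e.g. "Curious Child"
  role: string;             // e.g. "five-year-old" or "tenured professor"
  questioningStyle: string; // how the persona probes the explanation
  voice: { speed: "slow" | "normal" | "fast"; tone: string };
}

function buildSystemPrompt(p: Persona): string {
  return [
    `You are ${p.name}, a ${p.role} listening to a live explanation.`,
    `Ask follow-up questions in this style: ${p.questioningStyle}.`,
    `Speak at a ${p.voice.speed} pace with a ${p.voice.tone} tone.`,
    `Never lecture; only ask questions a ${p.role} would ask.`,
  ].join("\n");
}

const curiousChild: Persona = {
  name: "Curious Child",
  role: "five-year-old",
  questioningStyle: "simple 'why?' and 'what is that?' questions",
  voice: { speed: "fast", tone: "excited" },
};

console.log(buildSystemPrompt(curiousChild));
```

Keeping personas as data rather than hand-written prompts is what makes it cheap to add new "flavours" and wire voice attributes into the session config.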

Challenges we ran into

One challenge we faced was creating a live avatar for each persona. Tools such as CharacterGPT were standalone and could not be integrated, while ReadyPlayerMe and D-ID were either too complex or behind paywalls. We briefly considered rigging and animating the personas ourselves in Blender before realising it would consume too much time. Instead, we implemented a workaround: we generated a 3D avatar of each persona using ChatGPT and then used a separate AI tool to convert the avatar into looping videos, similar to RPG sprites. The persona has different states (such as idle or talking), with the corresponding video playing based on the current state.
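The sprite-style workaround above boils down to a mapping from persona state to a pre-rendered looping clip; on a state change, the player simply swaps video sources. A minimal sketch, with state names and file paths that are illustrative rather than our actual asset layout:

```typescript
// Sketch of the avatar workaround: each state maps to a looping video clip.
// State names and file paths are hypothetical examples.
type AvatarState = "idle" | "talking";

const clips: Record<AvatarState, string> = {
  idle: "/avatars/professor/idle.mp4",
  talking: "/avatars/professor/talking.mp4",
};

// Returns the clip to loop for the avatar's current state; the React
// component swaps the <video> src whenever the state transitions.
function clipFor(state: AvatarState): string {
  return clips[state];
}

console.log(clipFor("talking"));
```

Because each clip loops seamlessly, the avatar stays "alive" between transitions without any runtime 3D rendering or rigging.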

Another challenge was the limited range of voices available for Gemini's responses. If we had been able to incorporate ElevenLabs's API for voices, the experience would have been even more special: Gemini's speech was incredibly polished, just lacking in vocal variety.

A weakness of Gemini 2.0 Flash Live was its restriction to a single response modality, either audio or text, which limited our ability to enhance the player experience with a progress bar, badges, and model-initiated summarized feedback. A way to combine the best of both worlds would have allowed our app to be even more feature-dense.

Accomplishments that we're proud of

We are proud that our app concretely demonstrates the value of the realtime streaming capabilities of Gemini 2.0 Flash Live. From interruptibility, to configurable Voice Activity Detection, to dynamic voice customization, to streaming multimodal input, to different output languages, to function calling that grounds the personas in Google Search, every one of these features proved essential.

What we learned

This experience stretched our imagination of how LLMs can be utilised. While many of us typically use LLMs to consume information, we realised that by viewing this common use case from a different angle, we could flip the model: using LLMs not to teach us, but to challenge us to teach others.

We also learnt the importance of prioritization. While we initially aimed for more advanced technical features, such as streaming Gemini's responses via WebSockets or integrating a fully live 3D model for the personas, we realised that what mattered most was delivering a compelling, engaging experience that sells the core idea. Sometimes it is better to build the version that tells the story best.

What's next for Protege

We see practical opportunities for Protege to expand into new use cases. For example, one could use Protege as a:

  • 🛎️ Customer Service Simulator: Train support teams by practicing responses to a wide range of customer personas, from confused first-time users to demanding power users, improving adaptability and communication under pressure.
  • 🧒 Teaching Aid for Young Children: Help young learners grasp important concepts like online safety, emotional regulation, and communication through dynamic, persona-based explanations.
  • 🎤 Pitch/Speech Reviewer: Practice delivering a pitch to personas of famous entrepreneurs for a simulated and unique virtual Shark Tank experience.
  • 🧠 Knowledge Transfer Trainer: Help teams explain complex technical or company processes to newly onboarded hires with personas of varying levels of expertise.
  • 🎯 Interview Preparation Tool: Simulate interviews across different personalities and industries to help prepare interviewees for every scenario imaginable.

With such diverse applications, Protege is well positioned as a platform for professional and personal training, strengthening learning through the act of teaching.

We are also looking into multi-persona audiences. For example, a user could teach a concept to a class of kindergarteners, or an entrepreneur could pitch their ideas to a circle of the world's top founders.
