Owlspeak

Mission

As AI continues to democratize access to information and labor, effective communication is becoming increasingly essential. Many aspiring scientists and engineers dedicate significant time to solving complex problems but often lack the opportunity or confidence to present their work to a broader audience. Verbal communication skills—especially under pressure, such as in presentations and interviews—are often underdeveloped due to limited practice.

We aim to change that.

Our application empowers users to communicate with greater confidence in professional settings by providing a safe, accessible environment to practice articulating their experiences and achievements.

Introducing Owlspeak—a voice chat platform designed to enhance business communication skills. By simulating interview-like experiences through conversational AI agents, Owlspeak offers interactive question-and-answer sessions that help users refine their ability to express themselves clearly and effectively.

Key Features

Owlspeak lets users run mock interviews with voice agents. Each session moves through four stages:

  • Initial greeting
  • Introductions
  • Behavioral Q&A
  • Closing statements

What Makes Owlspeak Different

Unlike voice-enabled LLM chat applications, Owlspeak maintains a structured conversational flow tailored for mock interviews. The agents guide the user through each interview stage, preventing the conversation from drifting into a typical open-ended chatbot exchange. This preserves the interviewer-interviewee dynamic throughout the session.

Unlike traditional interview prep tools, Owlspeak requires users to respond verbally and spontaneously, without the luxury of reading questions in advance or drafting written responses. This mimics the pressure of real-life professional scenarios, helping users practice not just what to say—but how to say it.

User Flow

  1. On the Owlspeak site, users upload their resume and the job description relevant to the interview.
  2. Upon clicking the record button, the app validates the content and initiates the interview.
  3. The AI agent begins the session and guides the user through the agenda listed above.
  4. Progression through interview stages is determined by spoken content and, as a fallback, a timer.
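
The progression rule in step 4 can be sketched as a small state machine. This is only an illustration under stated assumptions, not the code Owlspeak runs: the names `Stage`, `InterviewState`, and `STAGE_TIMEOUTS`, as well as the timeout values, are hypothetical, and the `content_says_done` flag stands in for whatever signal the app derives from the spoken content.

```python
from dataclasses import dataclass
from enum import Enum


class Stage(Enum):
    GREETING = 0
    INTRODUCTIONS = 1
    BEHAVIORAL_QA = 2
    CLOSING = 3
    DONE = 4


# Hypothetical per-stage time budgets in seconds (assumed values).
STAGE_TIMEOUTS = {
    Stage.GREETING: 30.0,
    Stage.INTRODUCTIONS: 120.0,
    Stage.BEHAVIORAL_QA: 600.0,
    Stage.CLOSING: 60.0,
}


@dataclass
class InterviewState:
    stage: Stage = Stage.GREETING
    stage_started_at: float = 0.0

    def advance(self, now: float) -> None:
        """Move to the next stage and restart the stage clock."""
        if self.stage is not Stage.DONE:
            self.stage = Stage(self.stage.value + 1)
            self.stage_started_at = now

    def update(self, content_says_done: bool, now: float) -> Stage:
        """Advance on a content signal; fall back to the timer otherwise."""
        if self.stage is Stage.DONE:
            return self.stage
        elapsed = now - self.stage_started_at
        timed_out = elapsed >= STAGE_TIMEOUTS[self.stage]
        if content_says_done or timed_out:
            self.advance(now)
        return self.stage
```

Passing the clock value in as `now` keeps the sketch easy to test. A running session would more likely call `update` with `time.monotonic()` after each user turn.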

Technologies Used

Frontend

  • Remix
  • React
  • TypeScript

Backend

  • Python
  • Google ADK
  • Google GenAI
  • FastAPI

Deployment

  • Docker
  • Google Cloud (gcloud)
  • Google Cloud Run

Findings and Learnings

Owlspeak is an interactive voice application, requiring smooth and reliable audio data transfer between the frontend, backend, and AI agents. Documentation and third-party resources—especially from the ADK ecosystem—were essential in helping us build the initial working prototype, offering valuable insight into socket and event system integration.

Google ADK is a fast-evolving platform. Our use case required the live_run capability, an experimental feature that came with some expected challenges. Initially, we developed and tested our agent logic in text-only mode for ease and speed. However, transferring that logic to audio agents proved nontrivial.

Text and audio agents don't map one-to-one in terms of agent flow, callbacks, and control. Our first attempt placed flow logic inside tool call functions, but this diluted the real-time voice experience.

We ultimately designed a dual-system architecture:

  • A live audio agent that manages real-time speech interaction.
  • A text-only agent that handles internal reasoning and manages the interview flow.

This hybrid setup enabled us to maintain topic-focused, responsive voice interactions aligned with the structure of a professional interview.
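
As a rough illustration of this split, the sketch below runs two cooperating loops: one answers each utterance immediately, while the other reviews transcripts and updates the interview stage. It deliberately uses no ADK calls. `VoiceAgent`, `FlowController`, `run_session`, and the word-count rule are hypothetical stand-ins, and plain strings take the place of audio.

```python
import asyncio
from dataclasses import dataclass, field

STAGES = ["greeting", "introductions", "behavioral_qa", "closing"]


@dataclass
class FlowController:
    """Stand-in for the text-only agent that owns the interview flow."""
    index: int = 0

    def review(self, transcript: str) -> str:
        # Hypothetical rule: a substantial answer moves the interview on.
        # A real text agent would reason over the transcript instead.
        if len(transcript.split()) >= 5 and self.index < len(STAGES) - 1:
            self.index += 1
        return STAGES[self.index]


@dataclass
class VoiceAgent:
    """Stand-in for the live audio agent; replies right away."""
    stage: str = STAGES[0]
    spoken: list = field(default_factory=list)

    def respond(self, transcript: str) -> str:
        reply = f"[{self.stage}] acknowledged: {transcript}"
        self.spoken.append(reply)
        return reply


async def voice_loop(agent, utterances, to_flow):
    for text in utterances:
        agent.respond(text)      # real-time path, no flow reasoning here
        await to_flow.put(text)  # hand the transcript to the flow side
        await asyncio.sleep(0)   # yield so the flow loop can run
    await to_flow.put(None)      # sentinel: session over


async def flow_loop(controller, agent, to_flow):
    while (text := await to_flow.get()) is not None:
        # The stage decision steers the voice agent's next reply.
        agent.stage = controller.review(text)


async def run_session(utterances):
    agent, controller = VoiceAgent(), FlowController()
    to_flow = asyncio.Queue()
    await asyncio.gather(
        voice_loop(agent, utterances, to_flow),
        flow_loop(controller, agent, to_flow),
    )
    return agent.spoken
```

Calling `asyncio.run(run_session(["hi", "I am a backend engineer with five years of experience", "thanks"]))` returns one reply per utterance. The point of the design is that the voice path never blocks on flow reasoning: a stage change only shapes the replies that come after it.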

Going forward

We plan to expand Owlspeak into a more robust platform for professional communication training. To better understand user behavior and improve the product, we will implement structured monitoring and analytics. Introducing user authentication will allow us to manage access, personalize experiences, and track usage against compute costs, which matters because voice-based interactions can be resource-intensive. We also aim to broaden the range of supported business scenarios beyond interviews, such as performance reviews, client meetings, and presentations, to make the platform valuable for a wider audience. Finally, we will continue refining the conversational experience to make interactions feel more natural, responsive, and aligned with real-world professional dynamics. These improvements will help us deliver on our mission to make communication practice more accessible and effective for users at all stages of their careers.
