Inspiration

The inspiration for FluentForm came from a personal frustration with the lack of accessible and personalized pronunciation tools. Many language learners struggle with specific sounds and often receive generic advice. We wanted to create a solution that leverages AI to provide detailed feedback and targeted practice, empowering users to achieve clearer and more confident speech. The rise of powerful cloud-based speech APIs and large language models made this vision a reality.

What it does

FluentForm is a web application that helps users improve their pronunciation using AI. Users record a sentence in their browser, and the recording is processed by the Azure Pronunciation Assessment API. The application then shows detailed feedback on each phoneme, highlighting areas for improvement. Based on this analysis, the app generates personalized practice sentences and uses OpenAI's GPT-3.5-turbo to provide tailored pronunciation advice and practice exercises.
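
As a rough illustration of the analysis and advice steps, the sketch below picks the lowest-scoring phonemes from an assessment result and builds a chat request for GPT-3.5-turbo. The result shape (`NBest[0].Words[].Phonemes[]`, with an `AccuracyScore` either directly on the phoneme or nested under `PronunciationAssessment`) is an assumption about the detailed output Azure returns. The function names, the threshold of 60, the sample scores, and the prompt wording are hypothetical and not taken from FluentForm's code.

```javascript
// Hypothetical helper: collect phonemes scoring below a threshold.
// Assumes the assessment JSON looks like NBest[0].Words[].Phonemes[].
function findWeakPhonemes(result, threshold = 60) {
  const words = result?.NBest?.[0]?.Words ?? [];
  const weak = [];
  for (const word of words) {
    for (const p of word.Phonemes ?? []) {
      // The score may be nested or flat depending on how the API was called.
      const score = p.PronunciationAssessment?.AccuracyScore ?? p.AccuracyScore;
      if (typeof score === "number" && score < threshold) {
        weak.push({ word: word.Word, phoneme: p.Phoneme, score });
      }
    }
  }
  // Worst sounds first, so the advice focuses on them.
  return weak.sort((a, b) => a.score - b.score);
}

// Hypothetical helper: build the JSON body for a chat completion request.
function buildAdviceRequest(sentence, weakPhonemes) {
  const summary = weakPhonemes
    .map((w) => `/${w.phoneme}/ in "${w.word}" (score ${w.score})`)
    .join("; ");
  return {
    model: "gpt-3.5-turbo",
    messages: [
      { role: "system", content: "You are a friendly pronunciation coach." },
      {
        role: "user",
        content:
          `I read the sentence "${sentence}". ` +
          `My weakest sounds were: ${summary || "none"}. ` +
          "Give me short advice and three practice sentences.",
      },
    ],
  };
}

// Example with made-up scores.
const sampleResult = {
  NBest: [{
    Words: [
      { Word: "think", Phonemes: [
        { Phoneme: "th", PronunciationAssessment: { AccuracyScore: 35 } },
        { Phoneme: "ih", PronunciationAssessment: { AccuracyScore: 92 } },
      ] },
      { Word: "red", Phonemes: [{ Phoneme: "r", AccuracyScore: 55 }] },
    ],
  }],
};
const weak = findWeakPhonemes(sampleResult);
const requestBody = buildAdviceRequest("I think it is red", weak);
console.log(JSON.stringify(requestBody, null, 2));
```

Sorting the weak phonemes by score keeps the prompt focused on the sounds that need the most work.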

How we built it

The project was built using a combination of frontend and backend technologies:

  • Frontend: HTML and CSS were used to create a clean and intuitive user interface. The browser's MediaRecorder API was used to capture audio recordings as WebM files.
  • Backend: Node.js and JavaScript were used to build the server-side logic.
  • Audio Processing: The WebM audio recordings were converted to WAV files using a tool such as ffmpeg. This was necessary for compatibility with the Azure Pronunciation Assessment API.
  • Azure Pronunciation Assessment API: This API was used to analyze the WAV files and provide detailed pronunciation scores for each phoneme.
  • OpenAI API (GPT-3.5-turbo): The phoneme scores were fed into GPT-3.5-turbo to generate personalized pronunciation advice and practice exercises.
  • Server: Node.js Express was used to create the API endpoints that connect the frontend to the Azure and OpenAI APIs.
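
The conversion and assessment steps above might look roughly like the following sketch. It assumes ffmpeg is invoked as a command-line tool and that Azure is called through its REST endpoint for short audio, which takes its settings in a base64-encoded `Pronunciation-Assessment` header; FluentForm may have used the Speech SDK instead. The function names, file names, and the 16 kHz mono target are illustrative assumptions.

```javascript
// Hypothetical helper: arguments for converting a browser recording to
// 16 kHz, mono, 16-bit PCM WAV (a format the Azure speech service accepts).
function buildFfmpegArgs(inputPath, outputPath) {
  return [
    "-y",                 // overwrite the output file if it exists
    "-i", inputPath,      // WebM recording from MediaRecorder
    "-ac", "1",           // one audio channel (mono)
    "-ar", "16000",       // 16 kHz sample rate
    "-c:a", "pcm_s16le",  // 16-bit little-endian PCM
    outputPath,
  ];
}

// Hypothetical helper: the Pronunciation-Assessment header value, which is
// a base64-encoded JSON object describing how to score the audio.
function buildAssessmentHeader(referenceText) {
  const params = {
    ReferenceText: referenceText,
    GradingSystem: "HundredMark",
    Granularity: "Phoneme",
    Dimension: "Comprehensive",
  };
  return Buffer.from(JSON.stringify(params), "utf8").toString("base64");
}

const ffmpegArgs = buildFfmpegArgs("recording.webm", "recording.wav");
const assessmentHeader = buildAssessmentHeader("The quick brown fox");
console.log("ffmpeg " + ffmpegArgs.join(" "));
console.log("Pronunciation-Assessment: " + assessmentHeader);
```

In a Node.js server, the arguments could be passed to `child_process.spawn("ffmpeg", ffmpegArgs)`, and the header sent along with the WAV bytes in the request to the speech endpoint.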

Challenges we ran into

  • Audio Format Conversion: Ensuring consistent and reliable audio format conversion across different browsers and operating systems was a significant challenge. We had to experiment with different libraries and configurations to achieve optimal results.
  • API Integration: Integrating the Azure Pronunciation Assessment API and the OpenAI API required careful handling of asynchronous requests and data parsing.
  • Phoneme Interpretation: Accurately interpreting the phoneme scores and translating them into actionable advice for users required a deep understanding of phonetics and language learning principles.
  • User Experience: Creating a seamless and intuitive user experience, especially when dealing with complex audio processing and AI-generated content, was a constant challenge.
  • Latency: The delay between recording audio, sending it to the APIs, and receiving the results had to be minimized.
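
One way to approach the audio format challenge is to ask the browser which recording format it supports before creating the recorder, and to pass that format on to the server so the conversion step knows what it is receiving. The sketch below is an assumption about how this could be done, not a description of FluentForm's code. `pickRecordingMimeType` and the candidate list are hypothetical, while `MediaRecorder.isTypeSupported` is the standard browser check.

```javascript
// Candidate formats in order of preference (an illustrative list).
const CANDIDATE_TYPES = [
  "audio/webm;codecs=opus",
  "audio/webm",
  "audio/ogg;codecs=opus",
  "audio/mp4",
];

// Hypothetical helper: return the first format the browser can record,
// or an empty string to let the browser fall back to its default.
// `isSupported` is passed in so the function also runs outside a browser.
function pickRecordingMimeType(isSupported, candidates = CANDIDATE_TYPES) {
  for (const type of candidates) {
    if (isSupported(type)) return type;
  }
  return "";
}

// In the browser this would be called as:
//   const mimeType = pickRecordingMimeType((t) => MediaRecorder.isTypeSupported(t));
//   const recorder = new MediaRecorder(stream, mimeType ? { mimeType } : undefined);

// Simulated browsers for demonstration.
const webmOnly = (t) => t.startsWith("audio/webm");
const mp4Only = (t) => t === "audio/mp4";
console.log(pickRecordingMimeType(webmOnly));
console.log(pickRecordingMimeType(mp4Only));
```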

Accomplishments that we're proud of

  • Successfully integrated two powerful AI APIs to create a unique and valuable tool.
  • Developed a user-friendly interface that makes pronunciation practice engaging and accessible.
  • Implemented a robust audio processing pipeline that ensures accurate and reliable results.
  • Created a system that delivers personalized feedback and exercises.

What we learned

  • The power of combining different AI APIs to create innovative solutions.
  • The importance of careful audio processing and format conversion in speech applications.
  • The challenges and rewards of building a full-stack web application.
  • How to effectively utilize large language models for personalized learning.
  • How to effectively handle asynchronous operations.

What's next for FluentForm

  • Expanded Language Support: Adding support for more languages beyond English.
  • User Profiles and Progress Tracking: Allowing users to track their progress and set personalized goals.
  • Mobile Optimization: Making the application accessible on mobile devices.
  • Community Features: Adding community features to allow users to share their progress and provide feedback to each other.
  • AWS Hosting: Allowing anyone to access their speech therapist at any time and from anywhere.