AI-powered communication coaching platform that helps users practice and improve their speaking skills through data-driven feedback.
Fluency Lab is a comprehensive web application designed to help users improve their communication skills through three main practice modes: Job Interview, Elevator Pitch, and Presentation. The platform records user video and audio, transcribes speech, analyzes communication metrics, and provides actionable feedback using advanced AI analysis.
It is often challenging to judge yourself when preparing for an important presentation, interview, or pitch. Practicing alone rarely provides objective or actionable feedback, especially on delivery, pacing, and confidence. Fluency Lab addresses this by providing clear, data-driven insights into speaking habits, helping users improve their communication skills with each practice session.
- Job Interview Mode: Practice answering 1-5 behavioral interview questions with detailed feedback on clarity, pacing, structure, and confidence
- Elevator Pitch Mode: Master concise introductions under strict time constraints (60 seconds) with constructive feedback
- Presentation Mode: Upload slides or scripts and receive feedback on delivery, engagement, and overall presentation effectiveness
- Video & Audio Recording: Integrated webcam interface for seamless recording
- Speech Transcription: Automatic speech-to-text using ElevenLabs API
- Communication Analytics: Extract metrics including:
- Speaking rate and pacing
- Pause patterns
- Filler word usage
- Fluency indicators
- AI-Powered Feedback: Google Gemini analyzes transcripts and metrics to generate structured, actionable feedback on:
- Tone
- Fluency
- Vocabulary
- Pronunciation
- Engagement
- Confidence
- Progress Tracking: Dashboard with session history and performance trends
- User Authentication: Secure JWT-based authentication system
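The communication metrics listed above (speaking rate, pauses, filler words) can be derived from word-level timestamps. A minimal sketch, assuming a simplified word-timestamp shape — the actual ElevenLabs response format and the app's real thresholds may differ:

```typescript
// Sketch only: a simplified word-timestamp shape, not the actual
// ElevenLabs response format used by the app.
interface TimedWord {
  text: string;
  start: number; // seconds
  end: number;   // seconds
}

// Illustrative filler list; the app's list is not documented here.
const FILLERS = new Set(["um", "uh", "like", "you know"]);

function analyzeTranscript(words: TimedWord[]) {
  if (words.length === 0) return { wpm: 0, pauses: 0, fillerCount: 0 };

  const duration = words[words.length - 1].end - words[0].start;
  // Speaking rate in words per minute.
  const wpm = duration > 0 ? (words.length / duration) * 60 : 0;

  // Count gaps longer than 0.5 s between consecutive words as pauses
  // (the 0.5 s threshold is an assumption for illustration).
  let pauses = 0;
  for (let i = 1; i < words.length; i++) {
    if (words[i].start - words[i - 1].end > 0.5) pauses++;
  }

  const fillerCount = words.filter((w) =>
    FILLERS.has(w.text.toLowerCase())
  ).length;

  return { wpm: Math.round(wpm), pauses, fillerCount };
}
```

The same per-word timestamps that power transcription display are enough to recover all three metric families without re-processing the audio.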
- Next.js 14 - React framework with App Router
- TypeScript - Type-safe development
- Tailwind CSS - Utility-first styling
- Radix UI - Accessible component primitives
- Recharts - Data visualization for progress tracking
- Lucide React - Icon library
- Next.js API Routes - Serverless API endpoints
- MongoDB - Database for users, recordings, transcriptions, and feedback
- JWT - Authentication tokens
- bcryptjs - Password hashing
- Google Gemini API - AI-powered feedback generation and question generation
- ElevenLabs API - Speech-to-text transcription
- FFmpeg - Video/audio processing (via VM service for production)
- AWS S3 - Video and document storage
- Vercel - Deployment platform (with VM service for FFmpeg operations)
- Node.js 18+ and npm
- MongoDB database (local or MongoDB Atlas)
- AWS S3 bucket (for production/storage)
- API keys:
- Google Gemini API key
- ElevenLabs API key
- (Optional) VM with FFmpeg for production deployment
1. Clone the repository

   ```bash
   git clone <repository-url>
   cd DeltaHacks12
   ```

2. Install dependencies

   ```bash
   npm install
   ```

3. Set up environment variables

   Create a `.env.local` file in the root directory:

   ```env
   # Database
   MONGODB_URI=mongodb://localhost:27017/esl-coaching
   # or for MongoDB Atlas:
   # MONGODB_URI=mongodb+srv://username:password@cluster.mongodb.net/esl-coaching

   # Authentication
   NEXTAUTH_SECRET=your-secret-key-here

   # AI Services
   GOOGLE_GEMINI_API_KEY=your-gemini-api-key
   ELEVENLABS_API_KEY=your-elevenlabs-api-key

   # AWS S3 (required for production)
   AWS_REGION=us-east-1
   AWS_ACCESS_KEY_ID=your-access-key
   AWS_SECRET_ACCESS_KEY=your-secret-key
   AWS_S3_BUCKET=your-bucket-name

   # FFmpeg VM Service (optional, for production)
   FFMPEG_VM_URL=http://your-vm-url:3001
   FFMPEG_API_KEY=your-ffmpeg-api-key
   ```

4. Run the development server

   ```bash
   npm run dev
   ```

5. Open your browser

   Navigate to http://localhost:3000
```
DeltaHacks12/
├── app/                      # Next.js App Router
│   ├── (auth)/               # Authentication routes
│   │   ├── login/
│   │   └── register/
│   ├── api/                  # API routes
│   │   ├── auth/             # Authentication endpoints
│   │   ├── feedback/         # Feedback generation
│   │   ├── gemini/           # Gemini AI integration
│   │   ├── process/          # Video processing
│   │   ├── recordings/       # Recording management
│   │   ├── transcriptions/   # Transcription endpoints
│   │   ├── upload/           # Video upload
│   │   ├── upload-reference/ # Document upload
│   │   ├── videos/           # Video serving
│   │   └── whisper/          # Transcription trigger
│   ├── dashboard/            # User dashboard
│   ├── feedback/             # Feedback viewing
│   ├── practice/             # Practice modes
│   │   ├── job-interview/
│   │   ├── elevator-pitch/
│   │   └── presentation/
│   └── page.tsx              # Landing page
├── components/               # React components
│   ├── feedback/             # Feedback display components
│   ├── recording/            # Video recording component
│   └── ui/                   # UI components
├── lib/                      # Utility libraries
│   ├── auth.ts               # Authentication utilities
│   ├── db/                   # Database connection
│   ├── gemini/               # Gemini AI integration
│   ├── elevenlabs/           # ElevenLabs transcription
│   ├── s3/                   # AWS S3 integration
│   └── vm-ffmpeg/            # FFmpeg VM service client
├── types/                    # TypeScript type definitions
├── public/                   # Static assets
└── vm-ffmpeg-service/        # FFmpeg microservice (for production)
```
| Variable | Description | Required |
|---|---|---|
| `MONGODB_URI` | MongoDB connection string | Yes |
| `NEXTAUTH_SECRET` | Secret key for JWT tokens | Yes |
| `GOOGLE_GEMINI_API_KEY` | Google Gemini API key | Yes |
| `ELEVENLABS_API_KEY` | ElevenLabs API key | Yes |
| `AWS_REGION` | AWS region for S3 | Production |
| `AWS_ACCESS_KEY_ID` | AWS access key | Production |
| `AWS_SECRET_ACCESS_KEY` | AWS secret key | Production |
| `AWS_S3_BUCKET` | S3 bucket name | Production |
| `FFMPEG_VM_URL` | FFmpeg service URL | Production |
| `FFMPEG_API_KEY` | FFmpeg service API key | Production |
For production deployments on Vercel, video processing is handled by a separate VM service to avoid timeout limitations. See `vm-ffmpeg-service/README.md` for setup instructions.
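A minimal sketch of how the Next.js backend might hand a job to that VM service. The `/process` path and request body fields here are illustrative assumptions, not the service's documented API; only the `FFMPEG_VM_URL` and `FFMPEG_API_KEY` variables come from the configuration above:

```typescript
// Sketch: builds the HTTP request the backend could send to the
// FFmpeg VM service. The "/process" path and body shape are
// assumptions for illustration, not the actual service API.
function buildVmRequest(videoKey: string) {
  const base = process.env.FFMPEG_VM_URL ?? "http://localhost:3001";
  return {
    url: `${base}/process`,
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      // API-key auth, matching the FFMPEG_API_KEY env var.
      "x-api-key": process.env.FFMPEG_API_KEY ?? "",
    },
    body: JSON.stringify({ videoKey }),
  };
}

// Usage:
// const req = buildVmRequest("recordings/abc.webm");
// await fetch(req.url, { method: req.method, headers: req.headers, body: req.body });
```

Offloading the long-running FFmpeg work this way keeps each serverless invocation short, which is the point of the split architecture.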
- Register/Login: Create an account or log in to access the platform
- Choose Practice Mode: Select from Job Interview, Elevator Pitch, or Presentation
- Record: Use your webcam to record your practice session
- Review Feedback: Get detailed AI-powered feedback on your performance
- Track Progress: View your improvement over time on the dashboard
- Select number of questions (1-5)
- Answer behavioral interview questions
- Receive feedback on clarity, structure, and confidence
- Practice a 60-second introduction
- Get feedback on conciseness and impact
- Upload slides or script (optional)
- Deliver your presentation
- Receive comprehensive feedback on delivery and engagement
- Recording: User records video/audio via webcam
- Upload: Video uploaded to S3 (or local storage in dev)
- Transcription: Audio extracted and transcribed via ElevenLabs
- Analysis: Transcription and metrics analyzed by Google Gemini
- Feedback: Structured feedback generated and stored
- Display: User views feedback with visualizations
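The steps above can be sketched as one orchestration function. The stage implementations here are stubs standing in for the real S3, ElevenLabs, and Gemini calls, and the type shapes are assumptions:

```typescript
// Stub types standing in for the real pipeline data (assumed shapes).
type Transcript = { text: string; words: number };
type Feedback = { summary: string; metrics: { wpm: number } };

// Each stage is a placeholder for the real integration.
async function uploadToStorage(video: Buffer): Promise<string> {
  return `recordings/${video.length}.webm`; // would return an S3 key
}

async function transcribe(key: string): Promise<Transcript> {
  return { text: "hello world", words: 2 }; // would call ElevenLabs
}

async function generateFeedback(t: Transcript): Promise<Feedback> {
  // Would prompt Gemini with the transcript plus computed metrics.
  return { summary: `Feedback on ${t.words} words`, metrics: { wpm: 120 } };
}

// Chains the stages in the order the pipeline describes:
// upload -> transcription -> analysis/feedback.
async function runPipeline(video: Buffer): Promise<Feedback> {
  const key = await uploadToStorage(video);
  const transcript = await transcribe(key);
  return generateFeedback(transcript);
}
```

Each stage only depends on the previous stage's output, so failures can be retried per step rather than rerunning the whole recording.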
- `users` - User accounts and authentication
- `recordings` - Video recording metadata
- `transcriptions` - Speech transcription data
- `feedbackReports` - AI-generated feedback reports
- `referenceDocuments` - Uploaded slides/scripts
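A hedged sketch of what documents in these collections might look like. Only the collection names come from this README; every field name below is an assumption for illustration:

```typescript
// Illustrative document shapes; field names are assumptions,
// only the collection names appear in the README.
interface UserDoc {
  _id: string;
  email: string;
  passwordHash: string; // bcryptjs hash
}

interface RecordingDoc {
  _id: string;
  userId: string;
  mode: "job-interview" | "elevator-pitch" | "presentation";
  s3Key: string;
  createdAt: Date;
}

interface TranscriptionDoc {
  _id: string;
  recordingId: string;
  text: string;
}

interface FeedbackReportDoc {
  _id: string;
  recordingId: string;
  scores: Record<string, number>; // e.g. tone, fluency, vocabulary
  summary: string;
}

const example: RecordingDoc = {
  _id: "r1",
  userId: "u1",
  mode: "elevator-pitch",
  s3Key: "recordings/r1.webm",
  createdAt: new Date(0),
};
```

Keeping recordings, transcriptions, and feedback in separate collections lets each pipeline stage write its own output independently.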
- Push code to GitHub
- Import project in Vercel
- Configure environment variables
- Deploy
Note: For production, you'll need to set up the FFmpeg VM service separately. See vm-ffmpeg-service/DEPLOYMENT.md for details.
The app can run locally with:
- Local MongoDB instance
- Local file storage (no S3 required)
- FFmpeg installed locally (via the `ffmpeg-static` package)
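Locally, audio extraction for transcription might look like the following. `ffmpeg-static` exports the path to a bundled ffmpeg binary; the flags here are one common way to pull mono 16 kHz WAV audio from a recording, an assumption rather than the app's exact command line:

```typescript
// Builds ffmpeg arguments to extract mono 16 kHz WAV audio from a
// recording. Flags are illustrative; the app's actual flags may differ.
function extractAudioArgs(input: string, output: string): string[] {
  return [
    "-i", input,
    "-vn",          // drop the video stream
    "-ac", "1",     // downmix to mono
    "-ar", "16000", // 16 kHz sample rate, typical for speech models
    "-y",           // overwrite output if it exists
    output,
  ];
}

// Usage (ffmpeg-static's default export is the binary path):
// import ffmpegPath from "ffmpeg-static";
// import { spawn } from "node:child_process";
// spawn(ffmpegPath!, extractAudioArgs("in.webm", "out.wav"));
```

Building the argument list as an array and passing it to `spawn` avoids shell-quoting issues with user-derived file names.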
```bash
# Install dependencies
npm install

# Run development server
npm run dev

# Build for production
npm run build

# Start production server
npm start

# Lint code
npm run lint
```

- Reliable Speech Metrics: Extracting meaningful fluency metrics from speech data in real-time
- Balanced Feedback: Designing prompts that provide specific, encouraging feedback without being overly generic or critical
- Feature Scope: Balancing feature completeness with polish within the hackathon timeframe
- Video Processing: Handling large video files and processing on serverless platforms
- Robust Transcription Pipeline: Using ElevenLabs for accurate transcription with word-level timestamps
- Careful Prompt Engineering: Structured prompts that generate balanced, actionable feedback
- Modular Architecture: Clean separation of concerns for maintainability
- VM Service: Separate FFmpeg service for production video processing
- ✅ Complete end-to-end system with immediate, actionable feedback
- ✅ Multiple practice modes (Interview, Elevator Pitch, Presentation)
- ✅ Data-driven insights from raw speech data
- ✅ Strong MVP delivered within hackathon timeline
- ✅ Polished user interface with progress tracking
- Value of combining speech analytics with LLMs for skill development
- Building reliable speech-processing pipelines
- Designing structured AI outputs for consistent feedback
- Creating user-focused feedback systems under time constraints
- Serverless architecture considerations for media processing
Future enhancements planned:
- Real-time Feedback: Live feedback during practice sessions
- Progress Tracking: Long-term progress analytics and trends
- Additional Scenarios: Team meetings, impromptu speaking practice
- Enhanced Scoring: Refined fluency scoring algorithms
- Visual Feedback: Timeline visualizations for confidence/engagement
- Personalization: Improved personalization and adaptive learning
This project was created for the DeltaHacks 12 hackathon.
Built with ❤️ by the Fluency Lab team.
For questions or issues, please open an issue on the repository.