Record yourself speaking and hear a cleaned-up version played back in your own cloned voice — filler words removed, phrasing tightened.
The demo moment: A volunteer speaks for 60 seconds. Within 30 seconds they hear themselves back, clean and confident.
- Voice Sample — Record a 30-second reading to clone your voice
- Main Recording — Record your actual speech (up to 5 minutes)
- Configure — Pick target audience and style (optional)
- Process — ElevenLabs Scribe transcribes → GPT-4o cleans → ElevenLabs speaks in your voice
- Results — Side-by-side original vs. cleaned audio, transcript diff, filler stats
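
The Process step above chains three stages in a fixed order. As a rough illustration only, the sketch below wires them together with stand-in functions so it runs without API keys. `run_pipeline`, `PipelineResult`, and the `fake_*` stages are hypothetical names, not taken from the backend; the real stages would call ElevenLabs Scribe, GPT-4o, and ElevenLabs TTS.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class PipelineResult:
    raw_transcript: str
    cleaned_transcript: str
    audio: bytes


def run_pipeline(
    audio: bytes,
    voice_id: str,
    transcribe: Callable[[bytes], str],
    clean: Callable[[str], str],
    synthesize: Callable[[str, str], bytes],
) -> PipelineResult:
    """Run transcribe -> clean -> synthesize, in that order."""
    raw = transcribe(audio)
    cleaned = clean(raw)
    speech = synthesize(cleaned, voice_id)
    return PipelineResult(raw, cleaned, speech)


# Stand-in stages (assumptions for illustration, no network calls).
def fake_transcribe(audio: bytes) -> str:
    return "um so this is, like, our demo"


def fake_clean(text: str) -> str:
    return "This is our demo."


def fake_synthesize(text: str, voice_id: str) -> bytes:
    return f"{voice_id}:{text}".encode()


result = run_pipeline(
    b"raw-audio-bytes", "voice-123", fake_transcribe, fake_clean, fake_synthesize
)
print(result.cleaned_transcript)
```

Passing the stages in as callables keeps the ordering logic testable on its own; swapping a fake for a real client should not change `run_pipeline`.
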
| Layer | Choice |
|---|---|
| Frontend | Next.js 16 + Tailwind CSS |
| Backend | Python FastAPI |
| Transcription | ElevenLabs Scribe (scribe_v1) |
| LLM Cleaning | GPT-4o |
| Voice Clone + TTS | ElevenLabs Instant Voice Clone |
- Node.js 20.9+ (Next.js 16 no longer supports Node.js 18)
- Python 3.12+
- OpenAI API key
- ElevenLabs API key
cd backend
cp .env.example .env
# Fill in OPENAI_API_KEY and ELEVENLABS_API_KEY in .env
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install -r requirements.txt
uvicorn main:app --reload --port 8000

# In a second terminal, from the repository root:
cd frontend
npm install
npm run dev

Open http://localhost:3000.
├── backend/
│   ├── main.py               # FastAPI app: /health, /clone, /analyze
│   ├── stt_client.py         # ElevenLabs Scribe transcription
│   ├── gpt_client.py         # GPT-4o filler removal
│   ├── elevenlabs_client.py  # Voice clone + TTS
│   └── requirements.txt
└── frontend/
    ├── app/
    │   ├── page.tsx              # 3-step recording + config flow
    │   └── results/page.tsx      # Stats, dual audio players, transcript diff
    └── components/
        ├── Recorder.tsx          # MediaRecorder with MIME detection
        ├── AudioPlayer.tsx       # Labeled audio player with download
        └── TranscriptDiff.tsx    # Filler word highlighting
| Method | Endpoint | Description |
|---|---|---|
| GET | /health | Health check |
| POST | /clone | Upload voice sample, returns voice_id |
| POST | /analyze | Transcribe + clean + synthesize speech |
Form fields for POST /analyze: audio, voice_id, audience (General, Investors, or Technical), style (Neutral, More Confident, or Add Humor), duration
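
Since audience and style each accept a fixed set of values, a client may want to check them before uploading. The helper below is a minimal sketch of that check. `build_analyze_form` is a hypothetical name, the defaults of General and Neutral are assumptions (the README only says configuration is optional), and treating duration as seconds is also an assumption. The audio file itself would be attached separately as a multipart upload.

```python
# Allowed values as listed in the form-field description above.
AUDIENCES = ("General", "Investors", "Technical")
STYLES = ("Neutral", "More Confident", "Add Humor")


def build_analyze_form(
    voice_id: str,
    audience: str = "General",   # assumed default
    style: str = "Neutral",      # assumed default
    duration: float = 0.0,       # assumed to be seconds
) -> dict[str, str]:
    """Validate the text fields for /analyze and return them as form data."""
    if not voice_id:
        raise ValueError("voice_id is required (returned by POST /clone)")
    if audience not in AUDIENCES:
        raise ValueError(f"audience must be one of {AUDIENCES}")
    if style not in STYLES:
        raise ValueError(f"style must be one of {STYLES}")
    if duration < 0:
        raise ValueError("duration must not be negative")
    # Multipart form values are sent as strings.
    return {
        "voice_id": voice_id,
        "audience": audience,
        "style": style,
        "duration": str(duration),
    }


form = build_analyze_form("voice-123", "Investors", "More Confident", 62.5)
print(form)
```
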
Response:
{
  "raw_transcript": "...",
  "cleaned_transcript": "...",
  "fillers": [{ "word": "um", "count": 12 }],
  "total_fillers": 20,
  "original_wpm": 145,
  "cleaned_wpm": 132,
  "audio_url": "/audio/abc123.mp3"
}
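
To make the stats fields concrete, here is one way values shaped like `fillers`, `total_fillers`, and `original_wpm` could be derived from a transcript and a duration. This is a naive sketch, not the backend's actual method: the `FILLERS` list and both function names are hypothetical, and simple word matching will also count legitimate uses of words such as "like".

```python
import re
from collections import Counter

# Hypothetical filler list; the real backend may detect fillers differently.
FILLERS = ("um", "uh", "like", "you know")


def count_fillers(transcript: str) -> list[dict]:
    """Count whole-word filler matches, most frequent first."""
    text = transcript.lower()
    counts: Counter[str] = Counter()
    for filler in FILLERS:
        pattern = r"\b" + re.escape(filler) + r"\b"
        hits = len(re.findall(pattern, text))
        if hits:
            counts[filler] = hits
    return [{"word": w, "count": c} for w, c in counts.most_common()]


def words_per_minute(transcript: str, duration_seconds: float) -> int:
    """Whitespace-delimited words divided by duration in minutes."""
    if duration_seconds <= 0:
        return 0
    return round(len(transcript.split()) / (duration_seconds / 60))


raw = "um so like we um built this you know"
fillers = count_fillers(raw)
total_fillers = sum(item["count"] for item in fillers)
original_wpm = words_per_minute(raw, duration_seconds=6)
print(fillers, total_fillers, original_wpm)
```

Note that `total_fillers` sums every filler type, which is why it can exceed the count of any single entry in `fillers`, as in the sample response.
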