Alto

Record yourself speaking and hear a cleaned-up version played back in your own cloned voice — filler words removed, phrasing tightened.

The demo moment: A volunteer speaks for 60 seconds. Within 30 seconds they hear themselves back, clean and confident.

How It Works

1. Voice Sample: Record a 30-second reading to clone your voice.
2. Main Recording: Record your actual speech (up to 5 minutes).
3. Configure: Pick a target audience and style (optional).
4. Process: ElevenLabs Scribe transcribes the recording, GPT-4o removes fillers and tightens the phrasing, then ElevenLabs speaks the cleaned text in your cloned voice.
5. Results: Original and cleaned audio side by side, a transcript diff, and filler stats.
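The Process step is a fixed three-stage chain in which each stage only needs the previous stage's output. Below is a minimal sketch of that orchestration. The three service calls are injected as plain functions, so you can exercise the flow without API keys. `run_pipeline`, `PipelineResult` and the step signatures are hypothetical and are not the repo's actual code. The `General`/`Neutral` defaults come from the `/analyze` form options.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical step signatures. The real clients live in stt_client.py,
# gpt_client.py and elevenlabs_client.py; their actual APIs may differ.
Transcribe = Callable[[bytes], str]       # audio bytes -> raw transcript
Clean = Callable[[str, str, str], str]    # transcript, audience, style -> cleaned text
Synthesize = Callable[[str, str], bytes]  # cleaned text, voice_id -> audio bytes


@dataclass
class PipelineResult:
    raw_transcript: str
    cleaned_transcript: str
    audio: bytes


def run_pipeline(
    audio: bytes,
    voice_id: str,
    transcribe: Transcribe,
    clean: Clean,
    synthesize: Synthesize,
    audience: str = "General",
    style: str = "Neutral",
) -> PipelineResult:
    """Speech-to-text, then LLM cleanup, then TTS in the cloned voice."""
    raw = transcribe(audio)
    cleaned = clean(raw, audience, style)
    speech = synthesize(cleaned, voice_id)
    return PipelineResult(raw, cleaned, speech)


# Usage with stand-in steps (no network calls):
result = run_pipeline(
    b"fake-audio",
    "voice-123",
    transcribe=lambda audio: "um so we uh ship it",
    clean=lambda text, audience, style: "So we ship it.",
    synthesize=lambda text, voice_id: text.encode(),
)
print(result.cleaned_transcript)  # So we ship it.
```

Keeping the steps injectable also means a real client can be swapped for a fake in tests without patching modules.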

Tech Stack

| Layer | Choice |
|---|---|
| Frontend | Next.js 16 + Tailwind CSS |
| Backend | Python FastAPI |
| Transcription | ElevenLabs Scribe (`scribe_v1`) |
| LLM Cleaning | GPT-4o |
| Voice Clone + TTS | ElevenLabs Instant Voice Clone |

Setup

Prerequisites

- Node.js 18+
- Python 3.12+
- OpenAI API key
- ElevenLabs API key

Backend

```bash
cd backend
cp .env.example .env
# Fill in OPENAI_API_KEY and ELEVENLABS_API_KEY in .env

python -m venv .venv
source .venv/bin/activate   # Windows: .venv\Scripts\activate
pip install -r requirements.txt
uvicorn main:app --reload --port 8000
```
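Before you start uvicorn, it helps to confirm that both keys in `.env` are actually filled in. Here is a hypothetical preflight helper that is not part of the repo. It assumes plain `KEY=VALUE` lines and does not attempt full `.env` syntax (quotes, `export`, interpolation). The backend itself may load `.env` differently, for example with python-dotenv.

```python
from collections.abc import Mapping

# The two keys the setup step asks you to fill in.
REQUIRED_KEYS = ("OPENAI_API_KEY", "ELEVENLABS_API_KEY")


def parse_env_file(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments.

    Deliberately minimal: no quoting rules, no variable expansion.
    """
    values: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(env: Mapping[str, str]) -> list[str]:
    """Return the required keys that are absent or left empty."""
    return [key for key in REQUIRED_KEYS if not env.get(key)]
```

For example, running `missing_keys(parse_env_file(Path(".env").read_text()))` from `backend/` (with `from pathlib import Path`) returns an empty list once both keys are set.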

Frontend

```bash
cd frontend
npm install
npm run dev
```

Open http://localhost:3000.

Project Structure

```text
├── backend/
│   ├── main.py               # FastAPI app — /health, /clone, /analyze
│   ├── stt_client.py         # ElevenLabs Scribe transcription
│   ├── gpt_client.py         # GPT-4o filler removal
│   ├── elevenlabs_client.py  # Voice clone + TTS
│   └── requirements.txt
└── frontend/
    ├── app/
    │   ├── page.tsx           # 3-step recording + config flow
    │   └── results/page.tsx   # Stats, dual audio players, transcript diff
    └── components/
        ├── Recorder.tsx        # MediaRecorder with MIME detection
        ├── AudioPlayer.tsx     # Labeled audio player with download
        └── TranscriptDiff.tsx  # Filler word highlighting
```
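`TranscriptDiff.tsx` highlights what the cleanup removed. The component is written in TypeScript and its logic isn't shown here. As a language-neutral illustration, the Python sketch below finds the dropped words by aligning raw and cleaned word tokens with `difflib.SequenceMatcher`. `removed_words` is a hypothetical helper, and treating reworded spans as removed is a design assumption.

```python
import difflib


def removed_words(raw: str, cleaned: str) -> list[str]:
    """Return words present in the raw transcript but dropped from the cleaned one.

    Tokens are lowercased with surrounding punctuation stripped, so "Um,"
    and "um" compare equal.
    """
    def tokens(text: str) -> list[str]:
        stripped = (w.strip(".,!?;:").lower() for w in text.split())
        return [t for t in stripped if t]

    raw_words, cleaned_words = tokens(raw), tokens(cleaned)
    matcher = difflib.SequenceMatcher(a=raw_words, b=cleaned_words, autojunk=False)
    removed: list[str] = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        # "delete": dropped outright; "replace": reworded, so the raw side is gone too.
        if tag in ("delete", "replace"):
            removed.extend(raw_words[i1:i2])
    return removed


print(removed_words("Um, so we, uh, ship it.", "So we ship it."))  # ['um', 'uh']
```

The alignment is done on words rather than characters, so a removed "uh" does not show up as a scatter of single-letter edits.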

API

| Method | Endpoint | Description |
|---|---|---|
| GET | `/health` | Health check |
| POST | `/clone` | Upload voice sample, returns `voice_id` |
| POST | `/analyze` | Transcribe + clean + synthesize speech |

POST /analyze

Form fields: `audio`, `voice_id`, `audience` (`General` / `Investors` / `Technical`), `style` (`Neutral` / `More Confident` / `Add Humor`), `duration`
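The sketch below builds the multipart request by hand with the standard library, following the form fields listed above. Several details are assumptions that the API description does not confirm. `duration` is taken to be in seconds. The audio part is sent as `application/octet-stream`, and the filename `recording.webm` is arbitrary. `build_analyze_request` is hypothetical and only builds the request, so you can inspect it without a running server.

```python
import urllib.request
import uuid

# Options listed for the audience and style form fields.
AUDIENCES = ("General", "Investors", "Technical")
STYLES = ("Neutral", "More Confident", "Add Humor")


def build_analyze_request(
    base_url: str,
    audio: bytes,
    voice_id: str,
    duration: float,
    audience: str = "General",
    style: str = "Neutral",
    filename: str = "recording.webm",
) -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data POST to /analyze."""
    if audience not in AUDIENCES or style not in STYLES:
        raise ValueError(f"unsupported audience/style: {audience!r}, {style!r}")
    boundary = uuid.uuid4().hex
    text_fields = {
        "voice_id": voice_id,  # as returned by POST /clone
        "audience": audience,
        "style": style,
        "duration": str(duration),  # assumed to be seconds
    }
    parts = [
        f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n{value}\r\n'.encode()
        for name, value in text_fields.items()
    ]
    # The audio file part: part headers, raw bytes, then CRLF.
    file_header = (
        f'--{boundary}\r\nContent-Disposition: form-data; name="audio"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    )
    parts.append(file_header.encode() + audio + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())  # closing boundary
    return urllib.request.Request(
        base_url.rstrip("/") + "/analyze",
        data=b"".join(parts),
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


req = build_analyze_request("http://localhost:8000", b"fake-audio", "abc123", duration=60)
print(req.full_url, req.get_method())  # http://localhost:8000/analyze POST
```

To send the request to the local backend, use `with urllib.request.urlopen(req) as resp: data = json.load(resp)`, with `json` imported. A library such as `requests` or `httpx` would handle the multipart encoding for you.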

Response:

```json
{
  "raw_transcript": "...",
  "cleaned_transcript": "...",
  "fillers": [{ "word": "um", "count": 12 }],
  "total_fillers": 20,
  "original_wpm": 145,
  "cleaned_wpm": 132,
  "audio_url": "/audio/abc123.mp3"
}
```
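The `fillers`, `total_fillers` and `*_wpm` fields can be approximated as shown below. This sketch assumes a small fixed list of single-word fillers and a duration in seconds. The backend's actual filler list and WPM formula are not documented here, so treat `FILLERS`, `filler_stats` and `words_per_minute` as hypothetical.

```python
import re
from collections import Counter

# Assumed filler list; the real backend may use a different one.
FILLERS = {"um", "uh", "erm", "ah", "like"}


def filler_stats(transcript: str) -> tuple[list[dict[str, object]], int]:
    """Return ([{"word": w, "count": n}, ...] sorted by count, total filler count)."""
    words = re.findall(r"[a-z']+", transcript.lower())
    counts = Counter(w for w in words if w in FILLERS)
    fillers = [{"word": w, "count": n} for w, n in counts.most_common()]
    return fillers, sum(counts.values())


def words_per_minute(transcript: str, duration_seconds: float) -> int:
    """Whitespace-delimited word count scaled to a 60-second minute."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    return round(len(transcript.split()) * 60 / duration_seconds)


fillers, total = filler_stats("Um, so, um, we uh ship it")
print(fillers, total)  # [{'word': 'um', 'count': 2}, {'word': 'uh', 'count': 1}] 3
```

Single-word matching flags every "like", including non-filler uses, and it misses multi-word fillers such as "you know".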
