Clone any voice in 15 seconds.
Drop in a 10-second clip and get a usable voice in seconds. Have a sitting president narrate your dating app, stage a tech-billionaire launch for your worst idea, or build a fake-panel podcast. No booth, no impressionist on retainer.
Built for speed, shipped without the wait
Ten seconds of audio. A usable voice in seconds. No long studio sessions, no training queues, no premium tier gate.
10-second clone
One short clip is enough. No 30-minute studio session, no premium tier required.
Studio-grade fidelity
Captures timbre, cadence, and micro-prosody on the first pass — even from noisy field recordings.
Ready in seconds
Instant turnaround. No multi-hour training queue between you and a usable voice.
Zero-shot in 13 languages
Clone once, speak everywhere. No separate multilingual model, no extra training, no re-recording.
Emotion that survives the clone
Anger, irony, hesitation — the small things that make a voice recognisable carry through every sentence.
Open-source S2, API-ready
Self-host the model, hit our sub-300ms streaming endpoint, or ship voices into your agents and apps.
Why Fish Audio S2
Fast cloning, open deployment, global voices, and streaming built for production.
Reference audio: 10 seconds is enough
Time to clone-ready: seconds, not hours
Cross-lingual: zero-shot in 13 languages
Streaming latency: sub-300ms end-to-end
Model openness: S2 is open source and self-hostable
Free tier: start free, no card required
What creators actually use it for
Sketches, takes, and crossovers built for feeds — not boardrooms.
Sketch & impression reels
Drop a populist rant onto your dating-app meltdown, voice-act a tech-billionaire product launch for your worst startup idea, or run a weekly impression bit. No booth, no impressionist on retainer — record the joke, ship the clip.
24/7 takes channels
Spin up a hot-take channel that reacts to today's news before bedtime, stack a fake-panel podcast where every cohost is someone you'd never get on Zoom, or feed a daily news bit into an AI host that never burns out.
Memes that travel
Take an English impression and ship the same delivery in Spanish, Japanese, or Arabic the same afternoon. One joke, every region — your algorithm doesn't care which timezone you're farming.
Clone a voice that moves rooms
10 seconds of audio. One API call. Voices ready for comedy clips, reaction channels, parody podcasts, and multilingual memes.
Frequently asked questions
What is Fish Audio S2?
Fish Audio S2 clones a voice from a 10-second sample, streams with sub-300ms latency, and produces zero-shot cross-lingual output across 13 languages. The model itself is open source. Try it on the voice cloning page.
How much audio do I need to clone a voice?
Ten seconds of clean speech is enough. Longer samples can help with very expressive voices, but most public-figure clips, podcast cuts, and phone-quality recordings work on the first try.
Can a cloned voice speak other languages?
Yes. S2 is zero-shot cross-lingual across 13 languages. Clone a voice once from English speech and ship it in Spanish, Japanese, Arabic, or any other supported language without retraining.
Can I clone a public figure's voice?
You are responsible for confirming you have the rights, consents, and disclosures required for any voice you clone, and for complying with applicable laws, including those covering name, likeness, and AI-generated content in your region. Fish Audio does not pre-clear individual use cases and may remove content or accounts that violate our terms or applicable law.
Can I use cloned voices commercially?
Yes. Paid plans include commercial rights, and the streaming API serves cloned voices directly into your apps, agents, and dubbing pipelines. See pricing for tier details.