Voice Cloning · Fish Audio S2

Clone any voice in 15 seconds.

Drop a 10-second clip in, get a voice ready in seconds. Have a sitting president narrate your dating app, run a tech-billionaire launch for your worst idea, or build a fake-panel podcast — no booth, no impressionist on retainer.

10s sample is enough · Sub-300ms streaming · Open-source S2 model · Free tier, no card

Powered by Fish Audio S2 Pro

Built for speed, shipped without the wait

Ten seconds of audio. A usable voice in seconds. No long studio sessions, no training queues, no premium tier gate.

10-second clone

One short clip is enough. No 30-minute studio session, no premium tier required.

Studio-grade fidelity

Captures timbre, cadence, and micro-prosody on the first pass — even from noisy field recordings.

Ready in seconds

Instant turnaround. No multi-hour training queue between you and a usable voice.

Zero-shot in 13 languages

Clone once, speak everywhere. No separate multilingual model, no extra training, no re-recording.

Emotion that survives the clone

Anger, irony, hesitation — the small things that make a voice recognisable carry through every sentence.

Open-source S2, API-ready

Self-host the model, hit our sub-300ms streaming endpoint, or ship voices into your agents and apps.

Why Fish Audio S2

Fast cloning, open deployment, global voices, and streaming built for production.

Reference audio

10 seconds is enough

Time to clone-ready

Seconds, not hours

Cross-lingual

Zero-shot in 13 languages

Streaming latency

Sub-300ms end-to-end

Model openness

S2 open-source, self-hostable

Free tier

Start free, no card required

What creators actually use it for

Sketches, takes, and crossovers built for feeds — not boardrooms.

Sketch & impression reels

Drop a populist rant onto your dating-app meltdown, voice-act a tech-billionaire product launch for your worst startup idea, or run a weekly impression bit. No booth, no impressionist on retainer — record the joke, ship the clip.

24/7 takes channels

Spin up a hot-take channel that reacts to today's news before bedtime, stack a fake-panel podcast where every cohost is someone you'd never get on Zoom, or feed a daily news bit into an AI host that never burns out.

Memes that travel

Take an English impression and ship the same delivery in Spanish, Japanese, or Arabic the same afternoon. One joke, every region — your algorithm doesn't care which timezone you're farming.

Clone a voice that moves rooms

10 seconds of audio. One API call. Voices ready for comedy clips, reaction channels, parody podcasts, and multilingual memes.

Free tier, no card · 10-second reference is enough · Open-source S2 model

Frequently asked questions

Fish Audio S2 clones from a 10-second sample, ships sub-300ms streaming, and produces zero-shot cross-lingual output across 13 languages — and the model itself is open-source. Try it on the voice cloning page.

Ten seconds of clean speech is enough. Longer samples can help with very expressive voices, but most public-figure clips, podcast cuts, or phone-quality recordings work on the first try.

S2 clones across languages zero-shot, covering 13 languages. Clone a voice from English speech once and ship the same voice in Spanish, Japanese, Arabic, or any other supported language without retraining.

You are responsible for confirming you have the rights, consents, and disclosures required for any voice you clone, and for complying with applicable laws — including those covering name, likeness, and AI-generated content in your region. Fish Audio does not pre-clear individual use cases and may remove content or accounts that violate our terms or applicable law.

Cloned voices can be used commercially: paid plans include commercial rights, and the streaming API serves cloned voices directly into your apps, agents, and dubbing pipelines. See pricing for tier details.