Gradium is out of stealth to solve voice. We raised $70M and after only 3 months we’re releasing our transcription and synthesis products to power the next generation of voice AI.
We upgraded Gradium TTS for the cases voice agents can't get wrong: phone numbers, codes, and email addresses, read back right the first time. A couple of examples. English: 97% on emails, top of the field. French: leads every competitor we benchmarked. Samples + methodology →
In this joint work with @kyutai_labs, we design a reward model for conversational dynamics to teach full-duplex models how a human behaves in conversation, using cues to know when to interrupt, backchannel or stay silent.
New paper: Multi-Faceted Interactivity Alignment in Full-Duplex Speech Models
We use RL to post-train speech models (Moshi and PersonaPlex) to talk more like a human: to know when to respond, when to wait, and when to nod along with “yeah”s and “okay”s when listening.
We'll be at @VivaTech next week showcasing our models. Come find us at Booth 7.2 | 2F13 with @awscloud all week, and on the @LaFrenchTech booth on Wednesday.
@neilzegh is giving two talks: Wed 17th, 5:20pm, @nvidia Stage 1 and on Fri, 10am, Théâtre AWS
Learn how to build an audiobook voice agent using Gradium and @pipecat_ai
Gradium's TTS handles the narration and Pipecat's built-in WebRTC transport delivers the audio to the browser.
Reasoning LLMs typically take 2-3 seconds to start emitting tokens. In a voice agent, that's 2-3 seconds of silence after the user finishes speaking.
The @MiniMax_AI team just shipped a community contribution to Gradbot with two models running in parallel. MiniMax-M2-her
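To make the silence problem concrete, here is a minimal sketch of one common way to hide reasoning latency: start the slow model in the background and speak a short acknowledgement while it works. This is an illustration only, not a description of the Gradbot contribution. The function names `reasoning_reply`, `quick_ack` and `respond` are hypothetical, and the models are simulated with `asyncio.sleep`.

```python
import asyncio

async def reasoning_reply(prompt: str, delay_s: float = 0.2) -> str:
    # Stand-in for a slow reasoning LLM (assumption: simulated with a sleep).
    await asyncio.sleep(delay_s)
    return f"Answer to: {prompt}"

async def quick_ack(prompt: str) -> str:
    # Stand-in for a fast model that produces a short acknowledgement.
    await asyncio.sleep(0.01)
    return "Let me check that for you."

async def respond(prompt: str) -> list[str]:
    spoken: list[str] = []
    # Start the slow model first so it runs while the acknowledgement is spoken.
    slow = asyncio.create_task(reasoning_reply(prompt))
    spoken.append(await quick_ack(prompt))
    # The full answer follows as soon as the slow model finishes.
    spoken.append(await slow)
    return spoken

if __name__ == "__main__":
    for utterance in asyncio.run(respond("When is my flight?")):
        print(utterance)
```

In a real agent each string would be streamed to TTS as soon as it is available, so the user hears the acknowledgement instead of silence.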
A full house at the @joinhexa office in Paris yesterday.
Our CTO @olivierteboul joined the discussion by sharing why low latency matters for voice agents and how Gradium models support enterprise use cases for voice AI.
"I'd like to cancel my flight from Boston to..." You pause to check a date. The agent cuts in: "Got it, where to?" Now you're talking over it to finish your own sentence.
That's acoustic turn detection. Semantic VAD waits because it knows you're not done: gradium.ai/blog/semantic-…
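The difference can be sketched in a few lines. This is a toy illustration, not Gradium's semantic VAD: a real system would use a learned model, while the sketch assumes a hand-written word list (`TRAILING_WORDS`) and made-up thresholds to decide whether a sentence sounds finished.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    text: str        # words transcribed so far
    silence_ms: int  # silence observed after the last word

# Hypothetical cue: a sentence ending on one of these words is probably unfinished.
TRAILING_WORDS = {"to", "from", "and", "or", "the", "a", "on", "at", "for", "with", "my"}

def acoustic_end_of_turn(seg: Segment, threshold_ms: int = 500) -> bool:
    # Acoustic turn detection: any pause longer than the threshold ends the turn.
    return seg.silence_ms >= threshold_ms

def looks_complete(text: str) -> bool:
    # Toy stand-in for a semantic model: check the last word of the utterance.
    words = text.lower().split()
    if not words:
        return False
    last = words[-1].strip(".,!?…")
    return last not in TRAILING_WORDS

def semantic_end_of_turn(seg: Segment, short_ms: int = 500, long_ms: int = 3000) -> bool:
    # Wait much longer when the sentence sounds unfinished.
    required = short_ms if looks_complete(seg.text) else long_ms
    return seg.silence_ms >= required

if __name__ == "__main__":
    paused = Segment("I'd like to cancel my flight from Boston to...", 800)
    print(acoustic_end_of_turn(paused))   # True: the agent cuts in
    print(semantic_end_of_turn(paused))   # False: the agent keeps listening
```

With the same 800 ms pause, the acoustic rule ends the turn while the semantic rule keeps waiting because the sentence stops on "to".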
👉 slator.ch/Conversational…
At SlatorCon London, we discussed voice #AI capabilities and deployments, and how voice AI 🗣️🤖 is shifting the operational infrastructure ⚙️ of enterprises with Neil Zeghidour, Co-Founder and CEO at @GradiumAI, Arkadiusz Kwapiszewski, Head of Agent
Berlin, what's up? Tavily is in town now! We're here with @GradiumAI showing off our new voice integration and hosting a hackathon alongside @nebiusai and @cursor_ai. You won't want to miss this one.