xAI just launched their STT model, and you can now run a complete voice agent pipeline on xAI through LiveKit Inference. That means xAI STT, Grok as your LLM, and xAI TTS, all wired together under a single LiveKit API key. The video below walks through why a cascaded pipeline still wins over a realtime model when you need control, debuggability, and visibility at every stage. It also shows how easy it is to swap components and use mature tool calling. The demo runs Grok's built-in tools in a live agent, including web_search, x_search for live X results, and code_interpreter. Check it out to see the full stack in action.
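If you want a feel for how little wiring this takes, here's a minimal sketch of the cascaded pipeline in the Python Agents SDK using LiveKit Inference model descriptors. The "xai/..." identifier strings are placeholders for this sketch, not confirmed ids; take the real ones (and the setup for Grok's built-in tools) from the LiveKit Inference docs.

```python
from livekit import agents
from livekit.agents import Agent, AgentSession

async def entrypoint(ctx: agents.JobContext):
    # One LiveKit API key covers all three stages via Inference.
    # The "xai/..." descriptors below are placeholders; Grok's built-in
    # tools (web_search, x_search, code_interpreter) are enabled per the
    # Inference docs rather than shown here.
    session = AgentSession(
        stt="xai/stt-1",        # placeholder id for xAI STT
        llm="xai/grok-4-fast",  # placeholder id for Grok
        tts="xai/tts-1",        # placeholder id for xAI TTS
    )
    await session.start(
        room=ctx.room,
        agent=Agent(instructions="You are a helpful voice agent."),
    )
    await ctx.connect()

if __name__ == "__main__":
    agents.cli.run_app(agents.WorkerOptions(entrypoint_fnc=entrypoint))
```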
LiveKit
Technology, Information and Internet
Build applications that can see, hear, and speak with an end-to-end developer platform for voice, video, and physical AI
About us
LiveKit offers open source frameworks and a cloud platform for building voice, video, and physical AI agents.
- Website
- https://livekit.com
- Industry
- Technology, Information and Internet
- Company size
- 51-200 employees
- Type
- Privately Held
- Founded
- 2021
Updates
-
Wake words are the short spoken phrases like "Hey Siri" or "Alexa" that activate a voice-enabled device or agent. They sound simple but are surprisingly hard to get right. We launched livekit-wakeword, an open-source library that lets you train a custom wake word model from scratch with a single command. Give it a word or phrase, and it generates synthetic training data using TTS, augments it with background noise, and trains a lightweight model you can deploy anywhere with no ML experience or labeled datasets required. Compared to openWakeWord, it delivers 100x fewer false positives per hour, a 60x lower detection error rate, and 86% recall versus 69%. The exported models are fully compatible with openWakeWord's ONNX format, so they drop right into Home Assistant or any existing integration with no changes required. It’s designed for LiveKit agents and works anywhere you need hands-free voice activation, like noisy environments, kiosks, in-car systems, and accessibility devices. Give it a try and let us know what you're building.
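Because the exports use openWakeWord's ONNX format, you can score them with the stock openWakeWord runtime. A minimal sketch, assuming a hypothetical model file named "hey_livekit.onnx" and an application-tuned threshold:

```python
import numpy as np
from openwakeword.model import Model

# Load a livekit-wakeword export with the standard openWakeWord runtime.
# The filename (and the prediction key derived from it) are hypothetical.
oww = Model(wakeword_models=["hey_livekit.onnx"], inference_framework="onnx")

def on_audio_frame(frame: np.ndarray) -> bool:
    # frame: 16 kHz, 16-bit PCM samples, fed in ~80 ms chunks (1280 samples)
    scores = oww.predict(frame)
    # Threshold is application-tuned; raise it for noisy environments.
    return scores["hey_livekit"] > 0.5
```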
-
livekit/agents just hit 10k stars on GitHub. We released version 1.0 of our Python Agents SDK a year ago with the goal of making it easier to build realtime voice, video, and physical AI. Today, our customers are building agents for healthcare, finance, insurance, education, robotics, and more. It's been amazing to see this community grow over the past year. From bug reports to feature requests to product feedback, every contribution has helped shape what our Agents SDKs are today. Thank you to everyone building with us.
-
We put together a demo with Keyframe Labs avatars running on the LiveKit Agents Framework. We wanted to see how far we could push the experience when you combine emotionally expressive avatars with realtime agent infrastructure. The avatar picks up on emotional context and reacts on its face in real time. These aren’t scripted expressions. It reads the tone of the conversation and shifts accordingly, all handled automatically through the Keyframe plugin for LiveKit Agents. Halfway through the conversation, the avatar transfers to a completely different agent without dropping the session or restarting. Conversation context carries over and the new agent picks up where the last one left off. That's the multi-agent handoff pattern in our Agents framework. The new agent makes tool calls that fire RPCs to the frontend over LiveKit's data channel, so things like a travel itinerary update on screen instantly without polling. The same pattern works for dashboards, booking systems, form fills, anything where an agent needs to push state to the client. Give it a try and let us know what you're building.
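Here's the shape of both patterns in the Python Agents SDK: one tool fires an RPC to the frontend over the data channel, another returns a new agent to trigger the handoff. The RPC method name, the participant identity, and SupportAgent are made-up names for this sketch, not part of the demo's code.

```python
import json

from livekit.agents import Agent, RunContext, function_tool, get_job_context

class SupportAgent(Agent):
    def __init__(self):
        # Conversation context from the previous agent carries over on handoff.
        super().__init__(instructions="You handle bookings and support requests.")

class TravelAgent(Agent):
    def __init__(self):
        super().__init__(instructions="You help the user plan a trip.")

    @function_tool()
    async def update_itinerary(self, context: RunContext, day: str, activity: str) -> str:
        # Push state to the client without polling; the frontend registers
        # a handler for this (hypothetical) RPC method name.
        room = get_job_context().room
        await room.local_participant.perform_rpc(
            destination_identity="web-client",  # hypothetical identity
            method="client.updateItinerary",
            payload=json.dumps({"day": day, "activity": activity}),
        )
        return "The itinerary has been updated on screen."

    @function_tool()
    async def transfer_to_support(self, context: RunContext):
        # Returning a new Agent from a tool hands the session off to it
        # without dropping the session or restarting.
        return SupportAgent(), "Transferring you to our support specialist."
```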
-
Most developers building voice agents don't need another place to chat with their agent. They need a fast way to answer specific technical questions: why is my agent slow on this turn but not the last one? Why did it interrupt the user here? Why did the tool call block the response? Standard logs usually don't tell you enough. We just shipped Agent Console, a realtime debugging surface that gives you a live view of your voice agent session across audio waveforms, events, latency, tool calls, transcripts, participant state, RPC traffic, DTMF, and usage. You can inspect the system as it runs instead of reconstructing it from partial evidence after the fact. It works for agents built with Agent Builder, agents written in Python or TypeScript with our Agent SDKs, and agents running locally or deployed on LiveKit Cloud. Check it out in the LiveKit Cloud dashboard and let us know what you think.
-
"How can I improve my agent's latency?" is the question we hear more than any other. The tricky part is that it's never just one thing. Network hops, model choice, tool calls, geography, turn detection. They all add up, and improving one often means trading off another. We put together a comprehensive guide that walks through every major source of voice agent latency, how much each one matters, and what you can do about it. The short version? Monitor your pipeline to find the real bottleneck. Co-locate your agent with your models. Evaluate faster models in the stage that dominates. And keep your tooling clean. Check out the full playbook here: https://lnkd.in/geyT4b5C
-
One of the fastest ways to break trust in a voice agent is bad pronunciation. Imagine calling your doctor's office and the AI nurse can't say your medication name. That's not a minor UX issue. That's a dealbreaker. Rime just launched Mist v3, and it solves this with a feature called phonetic brackets. You define the exact pronunciation for any word using Rime's phonetic alphabet, and the model reproduces it deterministically. No guessing. No variation. The same correct result every time. We built a demo where a voice agent acts as a nurse reviewing prescriptions and lab results. Out of the box, it stumbled on words like "Lisinopril" and "gastroesophageal." After adding phonetic brackets, it nailed every single one. It's also fast. Mist v3 delivers time to first byte as low as 100ms, which makes conversations feel natural instead of laggy. It's live today through LiveKit Inference. If you're building voice agents in healthcare, finance, legal, or any domain with specialized terminology, this is worth checking out.
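In agent code, the brackets simply travel inside the text handed to the TTS stage. A sketch, with the caveat that the "rime/mist-v3" Inference descriptor and the bracketed phoneme string are placeholders to replace from the Rime and LiveKit Inference docs:

```python
from livekit.agents import Agent, AgentSession

# "rime/mist-v3" is an assumed Inference descriptor; take the real id and
# the phonetic spellings from Rime's phonetic alphabet documentation.
session = AgentSession(
    stt="deepgram/nova-3",      # any STT works here
    llm="openai/gpt-4.1-mini",  # any LLM works here
    tts="rime/mist-v3",
)

nurse = Agent(
    instructions=(
        "You are a nurse reviewing prescriptions. Always write Lisinopril "
        "with its bracketed phonetic spelling, e.g. {placeholder_phonemes}, "
        "so the TTS pronounces it deterministically."  # placeholder phonemes
    ),
)
```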
-
Data Tracks extend LiveKit's realtime track infrastructure beyond audio and video. You can now publish binary data over the same SFU infrastructure that handles media, with the same low-latency, selective forwarding model. The SFU only forwards to participants who are subscribed, so bandwidth stays proportional to actual demand.
Use cases this opens up:
- Teleoperation: send control commands to remotely operate robots, drones, and other devices
- Sensor data: publish readings from IMUs, LiDARs, RGBD cameras, or other sources at high frequency
- Telemetry: stream application-specific metrics and logs in realtime
- Non-standard media: stream formats not natively supported by WebRTC, like MJPEG from edge devices
A few things worth knowing:
- No practical limit on concurrent tracks (theoretical max is 65,535), so one track per sensor or actuator is realistic
- Frames support custom 64-bit timestamps, useful for measuring end-to-end latency (see the sketch after this list)
- End-to-end encryption is supported if enabled for the room
- Available now in JS, Rust, Python, C++, and Unity
Full details in the blog post and video linked in the comments.
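On the latency point, the arithmetic is simple once a 64-bit timestamp rides along with each frame. A self-contained sketch using only the standard library; the header layout here is our own convention rather than a LiveKit-defined format, and it assumes roughly synchronized sender/receiver clocks:

```python
import struct
import time

def make_frame(payload: bytes) -> bytes:
    # Prepend the sender's clock as an unsigned 64-bit microsecond value.
    ts_us = time.time_ns() // 1_000
    return struct.pack("<Q", ts_us) + payload

def read_frame(frame: bytes) -> tuple[bytes, float]:
    # Recover the timestamp and compute one-way latency in milliseconds.
    (ts_us,) = struct.unpack_from("<Q", frame)
    latency_ms = (time.time_ns() // 1_000 - ts_us) / 1_000
    return frame[8:], latency_ms
```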
-
LiveKit reposted this
❗Just added an exciting new batch of speakers to our lineup for the Cerebral Valley Voice Summit. They are:
• Russ d'Sa, founder & CEO, LiveKit
• Scott Stephenson, founder & CEO, Deepgram
• Karan Goel, founder & CEO, Cartesia
• Olivia Moore, partner, Andreessen Horowitz
• Jeffery L., founder & co-CEO, Assort Health
• Justin Uberti, head of realtime AI, OpenAI
• Grace Isford, partner, Lux Capital
• Tanay Kothari, founder & CEO, Wispr Flow
Learn more, and join us on May 6 in San Francisco, at the link in comments.
-
LiveKit reposted this
The cloud was not built for AI agents. At the recent Daytona Compute Conference, Russ d'Sa, co-founder & CEO of LiveKit, sat down with Matt Turck, VC at FirstMark to break down why stateful, long-running agent sessions cannot be deployed and scaled the same way as traditional web applications. 📹 Link to the full talk in the comments.