Meet Moshiko and Moshika, the open source Moshi models 📖🟢. Moshi is a 7B text-audio model, capable of doing full-duplex conversations: it can listen and speak at any time. Plus, its inner text monologue improves the generation 💬 All on device🧑💻
🔎kyutai.org/Moshi.pdf
00:00
Today, we release several Moshi artifacts: a long technical report with all the details behind our model, weights for Moshi and its Mimi codec, along with streaming inference code in Pytorch, Rust and MLX. More details below 🧵 ⬇️
Paper: kyutai.org/Moshi.pdf
Repo:









