Inspiration

Video calls, VoIP, phone calls, and digital radio all rely on transmitting audio waveforms, which require tens of thousands of bits per second. Drop below that, and audio becomes choppy, delayed, or disappears entirely. At sea, during natural disasters, or in low-infrastructure environments, networks can fail, leaving us without effective means of communication. Janus is our answer, allowing first responders, remote communities, and anyone with a degraded connection to still hear each other’s voices, even when the network can’t carry any audio.

What it does

Janus is a semantic audio codec for ultra-low-bandwidth networks. It lets people communicate naturally using voice, even when the connection is far too weak to carry audio itself. Janus listens to the speaker’s live voice, extracts the text (content), pitch and loudness (tone), and emotion cues (intent), and compresses this information into a tiny MessagePack payload, keeping the stream under 300 bits per second. Instead of transmitting audio, it sends only the meaning and tone. On the receiving end, a generative TTS model reconstructs the message using a clear, neutral voice shaped by the sender’s real-time prosody. The result is natural, expressive speech transmitted over a connection that is too weak for any existing audio codec, including state-of-the-art systems like Google Lyra.
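To give a feel for why this fits in so few bits, here is a minimal sketch of what a semantic payload might look like. It uses Python’s stdlib `struct` as a stand-in for MessagePack, and the field names, layout, and quantization are our illustration, not the actual Janus wire format:

```python
import struct

# Hypothetical semantic payload: mode, pitch (Hz), loudness (dB), emotion id,
# plus the transcribed text. Layout is illustrative only -- the real codec
# serializes with MessagePack.
MODE_SEMANTIC_VOICE = 0

def pack_payload(mode: int, pitch_hz: float, loudness_db: float,
                 emotion: int, text: str) -> bytes:
    text_bytes = text.encode("utf-8")
    # 1 byte mode, 2 bytes pitch quantized to 0.1 Hz, 1 signed byte loudness,
    # 1 byte emotion id, then length-prefixed UTF-8 text.
    header = struct.pack("!BHbB", mode, int(pitch_hz * 10),
                         int(loudness_db), emotion)
    return header + bytes([len(text_bytes)]) + text_bytes

payload = pack_payload(MODE_SEMANTIC_VOICE, 185.0, -23.0, 2, "all clear here")
# 20 bytes = 160 bits; sent once per ~4 s utterance, that is ~40 bps,
# well under the 300 bps budget.
print(len(payload), len(payload) * 8)
```

Even with a longer transcript, the text dominates the payload, and text is cheap: a full sentence costs a few hundred bits, versus tens of thousands of bits per second for waveform audio.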

How we built it

We built Janus as an end-to-end pipeline with a sender, a semantic compression protocol, and a receiver. On the sender side, a Python “Ear” module captures live microphone audio, transcribes it with Faster-Whisper, and extracts pitch and loudness using Aubio, packaging these features into a compact semantic payload. This payload is serialized with MessagePack and kept under ~300 bps using different modes (semantic voice, text-only, or emotion override). On the receiving end, a “Mouthpiece” module reads the packets and regenerates the message using Fish Audio’s generative TTS, applying the sender’s real-time prosody. We also built a Next.js demo interface that logs transmitted packets and provides a push-to-talk control surface for interacting with the system.
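The mode fallback described above (semantic voice, text-only, or emotion override) can be thought of as a bitrate gate: pick the richest mode whose payload still fits the budget for the current window. This is a hedged sketch of that idea; the budget constant, mode ordering, and function names are our assumptions, not the exact Janus internals:

```python
# Illustrative bitrate gate for the transmission modes described above.
# BUDGET_BPS matches the ~300 bps target; the richness ordering of the
# modes is an assumption for this sketch.
BUDGET_BPS = 300

def choose_mode(payload_sizes: dict, window_s: float) -> str:
    """Pick the richest mode whose payload fits the bit budget."""
    for mode in ("semantic_voice", "emotion_override", "text_only"):
        bits = payload_sizes[mode] * 8
        if bits / window_s <= BUDGET_BPS:
            return mode
    return "text_only"  # degrade gracefully; always send at least the words

# A 90-byte semantic-voice payload in a 2 s window is 360 bps -- over
# budget -- so the gate falls back to a smaller mode.
sizes = {"semantic_voice": 90, "emotion_override": 40, "text_only": 20}
print(choose_mode(sizes, 2.0))
```

Gating on bits per window rather than bytes per packet keeps the link honest even when the speaker talks quickly and packets arrive closer together.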

Challenges we ran into

One of the biggest challenges we faced was getting the system to behave consistently across different machines. Components that worked perfectly on one laptop would break on another, due to environment differences, version mismatches, or OS-level audio handling.

Accomplishments that we're proud of

We built a fully functional semantic audio codec in just 24 hours and successfully transmitted expressive speech over a 300-bps simulated link. Janus was able to reconstruct a clear, natural-sounding voice with pitch, tone, and emotional cues preserved, demonstrating that semantic communication can work even when audio cannot. We implemented a complete technical pipeline inspired by cutting-edge research like SemantiCodec and delivered an integrated end-to-end demo, including the sender, protocol layer, receiver, and a real-time UI, to show the system operating live.

What we learned

We learned that waveform audio is surprisingly fragile, while semantic information is remarkably resilient. Understanding what someone said requires only a fraction of the bandwidth needed to transmit raw sound waves. Building Janus made this clear: semantic communication works even when audio transmission completely fails. This project showed us firsthand that semantic codecs aren’t just an academic idea; they represent the future of low-bandwidth communication and offer a practical path toward more reliable, efficient voice systems in constrained or degraded networks.

What's next for Janus

We plan to make Janus a deployable service by packaging the pipeline into a production-ready, Dockerized backend with a documented API that can plug into mesh networks, satellite links, and other low-bandwidth systems. We also want to build companion iOS and Android apps that provide a simple push-to-talk interface on real devices.
