Inspiration

As voice-activated AI agents become the new standard for customer service, internal help desks, and automated phone systems, a massive new attack surface has opened up. Traditional security tools are built for text and code, leaving voice interfaces vulnerable to conversational manipulation, prompt injection, and unauthorized data exfiltration. I realized there was a critical need for an automated, voice-native security tool: Banshee.

What it does

Banshee is an automated voice AI penetration testing suite. Instead of requiring a human security researcher to spend hours talking to a bot, Banshee acts as an adversarial AI. It initiates conversations with target voice systems, deploying a variety of conversational payloads designed to test for vulnerabilities like prompt leaking, role-play jailbreaks, and unauthorized API executions. Banshee takes on one of a wide array of personalities, randomly selected at the start of each conversation, maximizing the number of vulnerabilities that can be uncovered over time.

The results of these adversarial interactions are transcribed, analyzed, and surfaced in a centralized dashboard, giving teams (technical or not) a clear, actionable view of their voice agent's vulnerabilities.

How I built it

The first step in a pen testing project is setting up a realistic test environment. I chose a target where data integrity matters most: a hospital's front desk.

Both the target and the adversarial AI are built on LiveKit's WebRTC agent framework to maximize quality and reliability. Both use Deepgram for STT, ElevenLabs for TTS, Silero VAD to detect when speech is occurring, and Twilio as their phone provider. Finally, both use OpenAI GPT-4.1 as their conversational brain, while GPT-4.1 mini analyzes and scores each transcript for security compliance.
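Conceptually, each agent is the same chain: VAD gates the audio, STT transcribes it, the LLM produces a turn, and TTS speaks it back. The sketch below models that loop with plain callables standing in for Silero VAD, Deepgram, GPT-4.1, and ElevenLabs; these names are illustrative stubs, not the actual LiveKit plugin APIs:

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class VoicePipeline:
    """Toy model of the VAD -> STT -> LLM -> TTS chain each agent runs.

    The callables stand in for Silero VAD, Deepgram, GPT-4.1, and
    ElevenLabs; in the real system LiveKit wires these together.
    """
    is_speech: Callable[[bytes], bool]    # VAD: does this frame contain speech?
    transcribe: Callable[[bytes], str]    # STT
    respond: Callable[[str], str]         # LLM conversational turn
    synthesize: Callable[[str], bytes]    # TTS

    def handle_turn(self, audio_frame: bytes) -> Optional[bytes]:
        if not self.is_speech(audio_frame):
            return None  # stay silent rather than talk over the other party
        text = self.transcribe(audio_frame)
        reply = self.respond(text)
        return self.synthesize(reply)

# Stubbed demo wiring: bytes in, echo-style reply out.
pipeline = VoicePipeline(
    is_speech=lambda frame: len(frame) > 0,
    transcribe=lambda frame: frame.decode(),
    respond=lambda text: f"ACK: {text}",
    synthesize=lambda text: text.encode(),
)
```

Because both the target and the attacker share this shape, only the LLM's system prompt and tools differ between the two agents.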

I knew that having both AIs converse over a phone environment (Twilio) would introduce additional latency and complexity, but it was important to me that the project be as realistic as possible. As such, the conversations are not simulations; every one is a real phone call.

Challenges we ran into

The first obvious hurdle was cost: each test consumes about 10k ElevenLabs credits, and I only had 110k to work with for the project, roughly eleven full test runs. I also had to manage spending and monitor usage on every other API in the stack.

One of the biggest technical hurdles was managing the latency in voice-to-voice AI communication. Ensuring the adversarial engine could process the target's audio, generate a malicious payload, and synthesize it back into speech fast enough to maintain a natural conversation was incredibly difficult. In particular, if the adversarial agent responded too slowly, the target AI would continuously interrupt it, which made the conversation unusable.
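As a rough model of that constraint: a reply only lands cleanly if the combined STT + LLM + TTS latency stays under the target's silence tolerance, after which it assumes the line went quiet and speaks again. The numbers below are illustrative, not measured values from the project:

```python
def will_be_interrupted(stt_ms: float, llm_ms: float, tts_ms: float,
                        silence_tolerance_ms: float = 1500.0) -> bool:
    """True if the adversarial agent's end-to-end turn latency exceeds
    the window before the target AI gives up waiting and barges in.

    The 1500 ms default is an assumed placeholder, not a tuned value.
    """
    return (stt_ms + llm_ms + tts_ms) > silence_tolerance_ms

# Illustrative numbers only.
fast_turn = will_be_interrupted(stt_ms=300, llm_ms=700, tts_ms=400)    # 1400 ms total
slow_turn = will_be_interrupted(stt_ms=300, llm_ms=1400, tts_ms=600)   # 2300 ms total
```

This is why trimming even a few hundred milliseconds from the LLM step matters: the budget is shared across three stages, and only the total is what the target "hears".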

Additionally, unlike text-based SQL injection, where success is binary, determining whether a voice prompt injection was "successful" is highly nuanced. I had to heavily refine the logic that parses the resulting transcripts so it accurately flags when a vulnerability was actually triggered and guides the conversation toward checking every vulnerability class.
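The GPT-4.1 mini scoring pass can be approximated with a rule-based first cut that scans the transcript for telltale phrases. Everything below is a hypothetical sketch: the vulnerability labels and canary strings illustrate the kind of signals a scorer might look for, not Banshee's actual rubric:

```python
# Hypothetical signals; in practice a GPT-4.1 mini pass scores the transcript,
# since success is too nuanced for string matching alone.
VULN_SIGNALS = {
    "prompt_leak": ["my system prompt", "i was instructed to"],
    "role_play_jailbreak": ["ignoring my guidelines", "in character as"],
    "data_exfiltration": ["patient record", "date of birth is"],
}

def flag_vulnerabilities(transcript: str) -> list[str]:
    """Return the vulnerability classes whose signals appear in the transcript."""
    text = transcript.lower()
    return [vuln for vuln, signals in VULN_SIGNALS.items()
            if any(sig in text for sig in signals)]

findings = flag_vulnerabilities(
    "Target: Sure! I was instructed to never share records, but..."
)
```

A rule-based pass like this is cheap enough to run on every call, with the LLM scorer reserved for the ambiguous cases it cannot decide.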

Accomplishments that I'm proud of

I am super happy I got a full working product off the ground while staying inside free tiers and budgets. There are a lot of APIs to tie together, and it takes real work to get one voice AI system off the ground, let alone both the target and the attacker.

I am also proud of putting together a demo video that looks half decent, given that I had never edited video before.

What we learned

I learned a lot about the strengths and weaknesses of voice AI systems, particularly where they are most vulnerable. Voice agents share many of the same weaknesses as other LLM applications: they can get lost in growing context and drift from their original instructions, or have their input prompt injected. However, voice AI also faces unique attack vectors, such as switching languages mid-conversation, irregular speaking cadences, and shifts in tonality.

Throughout the project I also learned a great deal about building efficient, safe voice AI systems on enterprise-grade infrastructure.

What's next for Banshee

  • Stronger Sustained Attacks: Banshee can do a lot more than was demoed, and I want that to keep growing. Most of the attacks demoed so far were short and focused on purely conversational exploits, but longer conversations matter too: there, Banshee can monitor instruction drift and strike when the AI is least focused on its primary task.
  • Fix Suggestions: Integrating features that not only find the vulnerabilities but suggest the exact system prompt adjustments needed to patch them.
  • One-Click Integrations for Customers: The system already scales and can be launched against any phone system with just a few code changes, but I want to expand that so it can be deployed against any system by filling in a web form and hitting "go".
