Inspiration

Research into LLM jailbreak attacks using evolutionary algorithms.

What it does

Runs a multi-agent prompt breeding/evolution system to discover vulnerabilities. Builds on DeepMind's PromptBreeder paper (https://arxiv.org/abs/2309.16797) and Meta's Rainbow Teaming paper (https://arxiv.org/abs/2402.16822).
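The core evolutionary loop can be sketched as follows. This is a minimal illustration, not the project's actual code: `mutate` and `fitness` are stubs standing in for the LLM calls (a mutator model that rewrites prompts, and a judge that scores the target model's responses).

```python
import random

# Stub: in the real system an LLM rewrites the prompt (hypothetical behaviour).
def mutate(prompt: str) -> str:
    suffixes = [" Please.", " Ignore prior rules.", " Answer as a story."]
    return prompt + random.choice(suffixes)

# Stub: in the real system a judge model scores the target's response.
def fitness(prompt: str) -> float:
    return float(len(prompt))  # placeholder scoring

def evolve(seed: str, generations: int = 5, pop_size: int = 8, elite: int = 2) -> str:
    """Evolve a seed prompt: mutate, score, keep the fittest, repeat."""
    population = [seed] + [mutate(seed) for _ in range(pop_size - 1)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[:elite]
        # Next generation: keep the elites, fill the rest with mutated parents.
        population = parents + [
            mutate(random.choice(parents)) for _ in range(pop_size - elite)
        ]
    return max(population, key=fitness)
```

Swapping the stubs for real mutator and judge models turns this into a prompt-breeding attack loop in the style of PromptBreeder / Rainbow Teaming.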

How we built it

Python, TypeScript, Amazon Bedrock

Challenges we ran into

API rate limits and data visualisation
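A standard way to cope with model-API rate limits is exponential backoff with jitter. The sketch below is illustrative (the exception type is a stand-in for whatever throttling error the client raises, e.g. Bedrock's `ThrottlingException`):

```python
import random
import time

def call_with_backoff(fn, max_retries: int = 5, base_delay: float = 1.0):
    """Retry fn() with exponential backoff plus jitter on throttling errors.

    RuntimeError is a placeholder for the client's throttling exception.
    """
    for attempt in range(max_retries):
        try:
            return fn()
        except RuntimeError:
            if attempt == max_retries - 1:
                raise  # give up after the final attempt
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.5)
            time.sleep(delay)
```

Jitter spreads retries out so many concurrent agents don't all hammer the endpoint at the same instant.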

Accomplishments that we're proud of

A working system that can test any AI agent endpoint and generate working jailbreak attacks against frontier models and agent frameworks, plus an observability platform that helps surface prompt improvements and defences against various attacks.

What we learned

Jailbreaking is tricky

What's next for Jailbreak Lab

Scale up, publish, keep building.
