Inspiration
Research into LLM jailbreak attacks using evolutionary algorithms.
What it does
Runs a prompt breeding/evolution multi-agent system to find vulnerabilities. Builds on the DeepMind PromptBreeder paper and RainbowTeaming. https://arxiv.org/abs/2402.16822 and https://arxiv.org/abs/2309.16797
How we built it
Python, Typescript, Bedrock
Challenges we ran into
Rate limits, data visualisation
Accomplishments that we're proud of
Working system that can test any AI Agent endpoint and creates working jailbreak attacks against frontier models and agent frameworks. Observability platform that is able to help expose prompt improvements and defences to various attacks.
What we learned
Jailbreaking is tricky
What's next for Jailbreak Lab
Scale up, publish, keep building.
Built With
- bedrock
- python
- typescript
Log in or sign up for Devpost to join the conversation.