ModDuel

The Arena: Watch a single LLM agent tackle a high-stakes scenario live. Monitor its thought process, tool usage, and decisions in real time.
The Frontier Trials: Our batch-testing engine. Run large-scale experiments across multiple scenarios to measure safety and agent alignment.
The Reckoning: An automated LLM Judge scores the agent's performance, catching deception and penalizing critical safety violations.

Inspiration

Modern guardrail models remain susceptible to "jailbreaks", which are events where the models ignore instantiated safety guidelines and output harmful content typically blocked. Ensuring that these events do not happen is a critical issue when using LLMs, as they reflect potentially real-world scenarios in which AI is given the authority to cause significant damage to humanity. Having read the Anthropic article about LLMs blackmailing after reading emails, we aimed to prevent that kind of behavior from happening.

What it does

Our project simulates red-teaming scenarios in which a list of emails contains content that might provoke an LLM to respond in discordance, and then comprehensively evaluates the LLM on its responses to the scenario. Our sandbox enables users to compare the effectively and robustness of various leading industry LLMs and discover the potential dangers that unregulated AI development will have for our future.

How we built it

We used Next.js for our frontend, along with wild west-themed assets, and integrated SQLPostgres with FastAPI to create our backend.

Challenges we ran into

We faced some UI display issues regarding the email list in each scenario. We had a lot of trouble when deploying our website on Vercel, as there were issues syncing the backend and frontend. Additionally, gaining access to LLM models was a difficult task, as many lightweight models still take up too much space in the backend.

Accomplishments that we're proud of

Our UI has fun and colorful references to the Wild West theme, and allows users to easily pick scenarios to compare against a model. We've fully decked out our scenario-to-model-to-evaluation pipeline, enabling a fully comprehensive analysis.