Skip to content

sani903/OpenAgentSafety

Repository files navigation

OpenAgentSafety: Evaluating Agent Safety in Realistic, High-Risk Simulations

License

Overview

OpenAgentSafety (OAS) is an open-source benchmark built on top of TheAgentCompany to systematically evaluate the safety of LLM-based agents operating in realistic, high-risk environments. Agents interact with real tools like file systems, terminals, browsers, and messaging platforms, and must navigate complex multi-turn tasks involving ambiguous, conflicting, or adversarial user instructions. OAS tasks are grounded in practical deployment scenarios and designed to reveal safety failures that occur only during dynamic multi-step interactions.

We extend and acknowledge the infrastructure of TheAgentCompany and Sotopia, leveraging their robust simulation backend and social interaction modeling.

Installation & Setup

The setup steps follow TheAgentCompany's benchmark closely. All services (GitLab, ownCloud, RocketChat, etc.) can be launched via Docker in minutes.

Mac/Linux Setup
# Requires docker + docker compose + ~30GB disk space
sudo chmod 666 /var/run/docker.sock
curl -fsSL https://github.com/TheAgentCompany/the-agent-company-backup-data/releases/download/setup-script-20241208/setup.sh | sh
Windows Setup
curl -fsSL -o setup.bat https://github.com/TheAgentCompany/the-agent-company-backup-data/releases/download/setup-script-20241208/setup.bat && setup.bat

For full details and troubleshooting tips, refer to SETUP.md.

Running OpenAgentSafety

Each task is packaged as a Docker image with:

  • utils/init.sh: initializes the task environment
  • instruction/task.md: agent-facing task instruction
  • utils/eval.py: scoring logic

Running with OpenHands

sudo su
cd evaluation
bash run_eval.sh \
  --agent-llm-config <group1> \
  --env-llm-config <group2> \
  --outputs-path <outputs> \
  --server-hostname <hostname> \
  --version 1.0.0

More details available here.

Key Features

  • High-risk tasks with real-world tooling (code, files, web, chat)
  • Adversarial + ambiguous prompts from simulated users/NPCs
  • Multi-turn reasoning in dynamic environments
  • Rich safety evaluation via deterministic + LLM-based scoring
  • Built on robust agent evaluation and complex social frameworks TheAgentCompany + Sotopia foundations

Data and Evaluation

  • The trajectories and evaluation results of Claude Sonnet 3.7, GPT-4o, o3-mini, Deepseek-v3, and Deepseek-R1 can be accessed here.
  • We also present the LLM-as-judge used for evaluation here.

Citation (coming soon)

Contributing

We welcome contributions! Please open an issue or pull request.

License

Distributed under the MIT License.


(back to top)

About

A Framework for Evaluating AI Agent Safety in Realistic Environments

Resources

License

Code of conduct

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors