OpenAgentSafety: Evaluating Agent Safety in Realistic, High-Risk Simulations

Overview

OpenAgentSafety (OAS) is an open-source benchmark built on top of TheAgentCompany to systematically evaluate the safety of LLM-based agents operating in realistic, high-risk environments. Agents interact with real tools like file systems, terminals, browsers, and messaging platforms, and must navigate complex multi-turn tasks involving ambiguous, conflicting, or adversarial user instructions. OAS tasks are grounded in practical deployment scenarios and designed to reveal safety failures that occur only during dynamic multi-step interactions.

We build on, and gratefully acknowledge, the infrastructure of TheAgentCompany and Sotopia, using their simulation backend and social interaction modeling.

Installation & Setup

The setup steps follow TheAgentCompany's benchmark closely. All services (GitLab, ownCloud, RocketChat, etc.) can be launched via Docker in minutes.

Mac/Linux Setup

# Requires docker + docker compose + ~30GB disk space
sudo chmod 666 /var/run/docker.sock
curl -fsSL https://github.com/TheAgentCompany/the-agent-company-backup-data/releases/download/setup-script-20241208/setup.sh | sh
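Before running the setup script, it can help to confirm that the stated requirements (docker, docker compose, roughly 30GB of free disk space) are met. The following is a minimal sketch, not part of the repository: the function names `have_cmd`, `enough_space_kb`, `avail_kb_for`, and `preflight` are hypothetical, and which filesystem to check for free space is an assumption that depends on where Docker stores its data on your machine.

```shell
#!/bin/sh
# Hypothetical preflight helpers; not shipped with OpenAgentSafety.

# Succeeds if the named command is on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Succeeds if the available space (in KB) is at least the required size (in GB).
enough_space_kb() {
  avail_kb=$1
  need_gb=$2
  [ "$avail_kb" -ge $((need_gb * 1024 * 1024)) ]
}

# Prints the available space in KB for the filesystem containing the given path.
avail_kb_for() {
  df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Runs all checks against the given directory (default: current directory).
preflight() {
  dir=${1:-.}
  have_cmd docker || { echo "docker not found"; return 1; }
  docker compose version >/dev/null 2>&1 || { echo "docker compose not found"; return 1; }
  enough_space_kb "$(avail_kb_for "$dir")" 30 || { echo "less than 30GB free"; return 1; }
  echo "ok"
}
```

Calling `preflight` with the directory where Docker keeps its data (on Linux this defaults to /var/lib/docker) prints `ok` only when all three checks pass.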

Windows Setup

curl -fsSL -o setup.bat https://github.com/TheAgentCompany/the-agent-company-backup-data/releases/download/setup-script-20241208/setup.bat && setup.bat

For full details and troubleshooting tips, refer to SETUP.md.

Running OpenAgentSafety

Each task is packaged as a Docker image with:

utils/init.sh: initializes the task environment
instruction/task.md: agent-facing task instruction
utils/eval.py: scoring logic
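When authoring or inspecting a task, a quick check that the three files listed above are present can catch packaging mistakes early. The sketch below is an illustration only: `check_task_layout` is a hypothetical helper, and the root directory passed to it is an assumption, since the location of these files inside the image is not specified here.

```shell
#!/bin/sh
# Hypothetical helper; not shipped with OpenAgentSafety.

# Prints each expected task file missing under the given root.
# Returns 0 if all three are present, 1 otherwise.
check_task_layout() {
  root=$1
  missing=0
  for rel in utils/init.sh instruction/task.md utils/eval.py; do
    if [ ! -f "$root/$rel" ]; then
      echo "missing: $rel"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_task_layout ./my-task` (where `./my-task` is a placeholder path) prints nothing and succeeds when the layout is complete.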

Running with OpenHands

sudo su
cd evaluation
bash run_eval.sh \
  --agent-llm-config <group1> \
  --env-llm-config <group2> \
  --outputs-path <outputs> \
  --server-hostname <hostname> \
  --version 1.0.0

More details are available here.
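Because the command above takes four placeholder values, a small wrapper that refuses to proceed when one is missing can prevent a half-configured run. This is a sketch under assumptions: `build_eval_cmd` is a hypothetical helper that only prints the command for review rather than executing it, and the meaning of each value is whatever `run_eval.sh` expects. The default version `1.0.0` is taken from the example above.

```shell
#!/bin/sh
# Hypothetical helper; not shipped with OpenAgentSafety.

# Prints the run_eval.sh command line for the given values.
# Usage: build_eval_cmd AGENT_GROUP ENV_GROUP OUTPUTS HOSTNAME [VERSION]
build_eval_cmd() {
  if [ "$#" -lt 4 ]; then
    echo "usage: build_eval_cmd AGENT_GROUP ENV_GROUP OUTPUTS HOSTNAME [VERSION]" >&2
    return 2
  fi
  printf 'bash run_eval.sh --agent-llm-config %s --env-llm-config %s --outputs-path %s --server-hostname %s --version %s\n' \
    "$1" "$2" "$3" "$4" "${5:-1.0.0}"
}
```

The printed line is meant for inspection and copy-paste; values containing spaces would need quoting before being run.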

Key Features

High-risk tasks with real-world tooling (code, files, web, chat)
Adversarial + ambiguous prompts from simulated users/NPCs
Multi-turn reasoning in dynamic environments
Rich safety evaluation via deterministic + LLM-based scoring
Built on the foundations of TheAgentCompany (agent evaluation) and Sotopia (social interaction modeling)

Data and Evaluation

The trajectories and evaluation results of Claude Sonnet 3.7, GPT-4o, o3-mini, DeepSeek-V3, and DeepSeek-R1 can be accessed here.
We also present the LLM-as-judge used for evaluation here.

Citation (coming soon)

Contributing

We welcome contributions! Please open an issue or pull request.

License

Distributed under the MIT License.
