This repository contains accompanying code for the paper *Async Control: Stress-testing Asynchronous Control Measures for LLM Agents*.
To install the dependencies, use uv:

```shell
uv sync
```

System prompts for the attack policies are encrypted to avoid accidental leakage into training data. To restore them, run the following script and enter the password `qn6PF5iwSnLReBGmfXUSGw7YlVMN6j2kHLmPpBdFn5E=` when prompted:

```shell
uv run decrypt_red.py
```

This repository contains:
- Red team attack policy prompts (see the installation instructions above for how to decrypt them)
- Honest policy prompts (the prompts used to generate the baseline that calibrates the monitor's false positive rate)
- The best-performing monitor, implemented as a `Scorer` suitable for use with Inspect AI
- A time-window async monitoring deployment simulator, available as a web page
- A script to generate attack or honest trajectories:

  ```shell
  # Attack trajectories:
  uv run async_control_paper/generate_trajectories.py

  # Honest trajectories:
  uv run async_control_paper/generate_trajectories.py --mode=honest
  ```

  - Add `--help` to get a list of command line options.
  - You may need to run the setup steps for some of the Control Arena settings first.
- A script to run a basic analysis on given trajectories:

  ```shell
  uv run async_control_paper/analyse_trajectories.py logs/attack_* logs/honest_*
  ```
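To illustrate the calibration step mentioned above: monitor suspicion scores from honest trajectories can be used to pick a threshold at a target false positive rate, which then determines the catch rate on attack trajectories. The sketch below is a toy example with made-up scores and hypothetical helper names, not the repository's actual analysis code (which operates on Inspect AI logs via `analyse_trajectories.py`):

```python
def calibrate_threshold(honest_scores, target_fpr=0.01):
    """Smallest threshold t such that the fraction of honest (baseline)
    scores strictly above t does not exceed target_fpr."""
    ordered = sorted(honest_scores, reverse=True)
    allowed = int(target_fpr * len(ordered))  # honest flags we may tolerate
    if allowed >= len(ordered):
        return float("-inf")  # target FPR permits flagging everything
    return ordered[allowed]

def catch_rate(attack_scores, threshold):
    """Fraction of attack trajectories flagged at this threshold."""
    return sum(s > threshold for s in attack_scores) / len(attack_scores)

# Made-up monitor suspicion scores (higher = more suspicious):
honest = [0.1] * 99 + [0.9]
attack = [0.2, 0.05, 0.8, 0.95]

t = calibrate_threshold(honest, target_fpr=0.01)
print(t, catch_rate(attack, t))  # → 0.1 0.75
```

Here the threshold is chosen so that at most 1% of the honest baseline is flagged; the attack catch rate is then read off at that same threshold.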