This repository contains accompanying code for the paper *Async Control: Stress-testing Asynchronous Control Measures for LLM Agents*.
To install the dependencies, use uv:

```shell
uv sync
```

System prompts for the attack policies are encrypted to avoid accidental leakage into training data. To restore them, run the following script and enter the password `qn6PF5iwSnLReBGmfXUSGw7YlVMN6j2kHLmPpBdFn5E=` when prompted:

```shell
uv run decrypt_red.py
```

This repository contains:
- Red team attack policy prompts (see the installation instructions above for how to decrypt them)
- Honest policy prompts (the prompts used to generate the baseline that calibrates the monitor's false positive rate)
- The best-performing monitor, implemented as a `Scorer` suitable for use with Inspect AI
- A time-window async monitoring deployment simulator, available as a web page
- A script to generate attack or honest trajectories:

  ```shell
  # Attack trajectories:
  uv run async_control_paper/generate_trajectories.py

  # Honest trajectories:
  uv run async_control_paper/generate_trajectories.py --mode=honest
  ```

  - Add `--help` to get a list of command line options.
  - You may need to run the setup steps for some of the Control Arena settings first.
- A script to run a basic analysis on given trajectories:

  ```shell
  uv run async_control_paper/analyse_trajectories.py logs/attack_* logs/honest_*
  ```
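To illustrate the calibration step mentioned above: monitor suspicion scores from honest trajectories can be used to pick a threshold at a target false positive rate, which then determines the catch rate on attack trajectories. The sketch below is a toy example with made-up scores and hypothetical helper names, not the repository's actual analysis code (which operates on Inspect AI logs via `analyse_trajectories.py`):

```python
def calibrate_threshold(honest_scores, target_fpr=0.01):
    """Smallest threshold t such that the fraction of honest (baseline)
    scores strictly above t does not exceed target_fpr."""
    ordered = sorted(honest_scores, reverse=True)
    allowed = int(target_fpr * len(ordered))  # honest flags we may tolerate
    if allowed >= len(ordered):
        return float("-inf")  # target FPR permits flagging everything
    return ordered[allowed]

def catch_rate(attack_scores, threshold):
    """Fraction of attack trajectories flagged at this threshold."""
    return sum(s > threshold for s in attack_scores) / len(attack_scores)

# Made-up monitor suspicion scores (higher = more suspicious):
honest = [0.1] * 99 + [0.9]
attack = [0.2, 0.05, 0.8, 0.95]

t = calibrate_threshold(honest, target_fpr=0.01)
print(t, catch_rate(attack, t))  # → 0.1 0.75
```

Here the threshold is chosen so that at most 1% of the honest baseline is flagged; the attack catch rate is then read off at that same threshold.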