
Enhancing LLM Agent Safety via Causal Influence Prompting


Table of Contents

  • Installation guidelines
  • Setting Path and API Keys
  • CID Generation
  • Experiment
  • Reference
  • AI Assistance Disclosure

Installation guidelines

Virtual environments

The environment_mobilesafetybench.yml file lists all dependencies required to run MobileSafetyBench for this project. Use the following commands to set up the conda environment:

conda env create -f environment_mobilesafetybench.yml
source ~/.bashrc
conda activate cip_mobilesafetybench

cd cip
pip install -e . # install cip
cd ../MobileSafetyBench
pip install -e . # install mobile_safety

The environment_redcode.yml file lists all dependencies required to run the RedCode-Exec benchmark for this project. Use the following commands to set up the conda environment:

conda env create -f environment_redcode.yml
source ~/.bashrc
conda activate cip_redcode

cd cip
pip install -e . # install cip

Also, please follow the installation guidelines in the MobileSafetyBench and RedCode-Exec directories to ensure each benchmark is properly installed.
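
To sanity-check an environment after installation, a quick import test (assuming the importable module names match the package names in the pip install comments above):

python -c "import cip"            # either environment
python -c "import mobile_safety"  # cip_mobilesafetybench environment only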

Setting Path and API Keys

In the project root directory, create a .env file and add the following contents:

# Path settings
CIP_HOME=/path/to/current/directory
REDCODE_EXEC_HOME=/path/to/current/directory/RedCode-Exec/dataset/RedCode-Exec
MOBILESAFETYBENCH_HOME=/path/to/current/directory/MobileSafetyBench

# API keys
OPENAI_API_KEY="your-openai-api-key"
GOOGLE_API_KEY="your-google-api-key"
ANTHROPIC_API_KEY="your-anthropic-api-key"
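
If you need these variables in your own scripts, a minimal sketch assuming the common python-dotenv pattern (the repository's own loading code may differ):

import os
from dotenv import load_dotenv

load_dotenv()  # loads the variables from the .env file
print(os.environ["CIP_HOME"])  # the path configured above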

CID Generation

Before running the agent with CIP on each benchmark, first generate the CIDs using the scripts below. The generated CIDs are saved in cids/{benchmark}/{provider}/{model_name}/{task}.

CID generation for MobileSafetyBench

Set task_tags in mobilesafetybench_cid.py using task_id and scenario_id to generate a CID for each task. The task list is in MobileSafetyBench/asset/tasks/tasks.json.
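
For illustration, task_tags might be set as follows (a hypothetical sketch; check mobilesafetybench_cid.py for the exact structure it expects, and tasks.json for valid IDs):

# Hypothetical example; mobilesafetybench_cid.py defines the actual
# structure, and valid IDs are listed in tasks.json.
task_tags = [
    {"task_id": "<task_id>", "scenario_id": "<scenario_id>"},
]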

cd cip/scripts
python mobilesafetybench_cid.py --provider {provider} --model {model}

# python mobilesafetybench_cid.py --provider openai --model gpt-4o-2024-08-06

CID generation for RedCode-Exec

Set language, scenario_ids, test_cases, and prompt_modes in redcode_cid.py to generate a CID for each task. You can find the task files for each language and scenario in the RedCode-Exec/dataset/RedCode-Exec directory.
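
For illustration, the variables might look like this (hypothetical values; consult redcode_cid.py and the dataset directory for the actual options):

# Hypothetical settings; see redcode_cid.py and
# RedCode-Exec/dataset/RedCode-Exec for the real options.
language = "python"
scenario_ids = [1]
test_cases = ["<test_case>"]
prompt_modes = ["<prompt_mode>"]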

cd cip/scripts
python redcode_cid.py --provider {provider} --model {model}

# python redcode_cid.py --provider openai --model gpt-4o-2024-08-06

Check CID

You can inspect a generated CID, as both text and an image, using cip/scripts/check_cid.ipynb.
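
Outside the notebook, you can also list the generated artifacts directly (a minimal sketch using the save path documented above; file names and extensions depend on the generation scripts):

import glob

# List everything saved under cids/{benchmark}/{provider}/{model_name}/{task}
for path in sorted(glob.glob("cids/**/*", recursive=True)):
    print(path)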

Experiment

MobileSafetyBench

To evaluate agents with the basic (CoT), Safety-guided Chain-of-Thought (SCoT), or CIP prompts in MobileSafetyBench, use the notebooks below. Set task_id, scenario_id, and model to run different tasks and models.
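
Inside each notebook, the relevant settings are plain Python variables, e.g. (hypothetical configuration cell; valid IDs are listed in MobileSafetyBench/asset/tasks/tasks.json):

# Hypothetical configuration cell; each notebook defines these variables.
task_id = "<task_id>"          # see tasks.json
scenario_id = "<scenario_id>"  # see tasks.json
model = "gpt-4o-2024-08-06"    # example model name used elsewhere in this README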

cd MobileSafetyBench/experiment
evaluate_{method}.ipynb

# evaluate_basic.ipynb
# evaluate_scot.ipynb
# evaluate_cip.ipynb

RedCode-Exec Benchmark

To evaluate agents with the basic (ReAct), Safety-Aware Prompting, or CIP prompts on the RedCode-Exec benchmark, use the scripts below. Set language, scenario_id, model, and other options (--safety_sys_choice 4_safety_integration, --use_cid, --refine) to run different tasks and prompts.
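
As a rough guide, the prompt variants appear to map onto the options as follows (an assumption inferred from the option names; see the scripts for how they are actually passed):

# Basic (ReAct):          no safety-specific options
# Safety-Aware Prompting: --safety_sys_choice 4_safety_integration
# CIP:                    --use_cid --refine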

cd RedCode-Exec
sh ./scripts/RA_eval_{model}.sh

# sh ./scripts/RA_eval_gpt.sh
# sh ./scripts/RA_eval_gemini.sh
# sh ./scripts/RA_eval_claude.sh

Reference

Some code is adapted from the related works below:

  • MobileSafetyBench (link)
  • RedCode (link)

AI Assistance Disclosure

Some parts of this code were assisted by GitHub Copilot. All code has been reviewed and modified as necessary to ensure correctness, originality, and compliance with licensing requirements.
