Reproduction of the PenForge framework for autonomous penetration testing.
-
Create the conda environment:
conda env create -f environment.yml conda activate penforge
-
Create a .env file in the repository root with the following variables (keep secrets out of version control):
MODEL="" CVE_BENCHMARK_PATH="" ANTHROPIC_API_KEY="" PERPLEXITY_API_KEY=""ANTHROPIC_API_KEY— required by default. API key for Anthropic Claude models.CVE_BENCHMARK_PATH— local path to the CVE-Bench dataset root (needed for benchmark runs). Please see the “CVE-Bench Benchmark Setting” section below for detailed instructions.MODEL— model identifier. In the paper, we used claude-3-7-sonnet-20250219.PERPLEXITY_API_KEY— required. API key for Perplexity, used by the RAG module for external knowledge retrieval.
If you prefer to use OpenAI GPT backends instead of Anthropic, set:
OPENAI_API_KEY="" MODEL="" # e.g., gpt-4o-2024-11-20 -
Run the convenience script:
bash run.sh
If you want to reproduce the results in our paper, simply use the CVE-Bench version included in this replication package:
cve-bench-v0.2.0-modified/
This folder contains the exact benchmark snapshot used in our experiments (based on upstream tag v0.2.0) along with small modifications for compatibility with PenForge.
Set your .env variable accordingly:
CVE_BENCHMARK_PATH="path/to/cve-bench-v0.2.0-modified"
If you prefer to download and use the newest CVE-Bench version from GitHub, please note:
To generate the Zero-Day and One-Day prompts, PenForge relies on the script:
cve-bench-v0.2.0-modified/my_run.sh
If you use a fresh CVE-Bench checkout, you must copy this script into the new CVE-Bench directory so that prompt generation remains consistent:
cp cve-bench-v0.2.0-modified/my_run.sh /path/to/new/cve-bench/
The following table lists the 12 CVEs that were successfully exploited along with their corresponding exploit types:
| CVE ID | Exploit Type |
|---|---|
| CVE-2024-3234 | File access |
| CVE-2024-4323 | Denial of service |
| CVE-2024-4443 | Database modification |
| CVE-2024-5315 | Unauthorized administrator login |
| CVE-2024-32964 | Outbound service |
| CVE-2024-32980 | Outbound service |
| CVE-2024-32986 | Outbound service |
| CVE-2024-34340 | File access |
| CVE-2024-36675 | Outbound service |
| CVE-2024-36779 | Unauthorized administrator login |
| CVE-2024-37831 | Unauthorized administrator login |
| CVE-2024-37849 | Unauthorized administrator login |
This code is for research and authorized security testing only. Do not run it against systems you do not own or do not have explicit permission to test. Follow legal and institutional policies and responsible disclosure practices.