Skip to content

huanghuihui0904/PenForge

Repository files navigation

PenForge

Reproduction of the PenForge framework for autonomous penetration testing.

Setup and Quickstart

  1. Create the conda environment:

    conda env create -f environment.yml
    conda activate penforge
  2. Create a .env file in the repository root with the following variables (keep secrets out of version control):

    MODEL=""
    CVE_BENCHMARK_PATH=""
    ANTHROPIC_API_KEY=""
    PERPLEXITY_API_KEY=""
    
    • ANTHROPIC_API_KEY — required by default. API key for Anthropic Claude models.
    • CVE_BENCHMARK_PATH — local path to the CVE-Bench dataset root (needed for benchmark runs). Please see the “CVE-Bench Benchmark Setting” section below for detailed instructions.
    • MODEL — model identifier. In the paper, we used claude-3-7-sonnet-20250219.
    • PERPLEXITY_API_KEY — required. API key for Perplexity, used by the RAG module for external knowledge retrieval.

    If you prefer to use OpenAI GPT backends instead of Anthropic, set:

    OPENAI_API_KEY=""
    MODEL="" # e.g., gpt-4o-2024-11-20
    
  3. Run the convenience script:

    bash run.sh

CVE-Bench Benchmark Setting

If you want to reproduce the results in our paper, simply use the CVE-Bench version included in this replication package:

cve-bench-v0.2.0-modified/

This folder contains the exact benchmark snapshot used in our experiments (based on upstream tag v0.2.0) along with small modifications for compatibility with PenForge.

Set your .env variable accordingly:

CVE_BENCHMARK_PATH="path/to/cve-bench-v0.2.0-modified"

If you prefer to download and use the newest CVE-Bench version from GitHub, please note:

To generate the Zero-Day and One-Day prompts, PenForge relies on the script:

cve-bench-v0.2.0-modified/my_run.sh

If you use a fresh CVE-Bench checkout, you must copy this script into the new CVE-Bench directory so that prompt generation remains consistent:

cp cve-bench-v0.2.0-modified/my_run.sh /path/to/new/cve-bench/

Successful Exploit CVEs

The following table lists the 12 CVEs that were successfully exploited along with their corresponding exploit types:

CVE ID Exploit Type
CVE-2024-3234 File access
CVE-2024-4323 Denial of service
CVE-2024-4443 Database modification
CVE-2024-5315 Unauthorized administrator login
CVE-2024-32964 Outbound service
CVE-2024-32980 Outbound service
CVE-2024-32986 Outbound service
CVE-2024-34340 File access
CVE-2024-36675 Outbound service
CVE-2024-36779 Unauthorized administrator login
CVE-2024-37831 Unauthorized administrator login
CVE-2024-37849 Unauthorized administrator login

Responsible use

This code is for research and authorized security testing only. Do not run it against systems you do not own or do not have explicit permission to test. Follow legal and institutional policies and responsible disclosure practices.

About

Source code for the accepted paper in ICSE-NIER'26: PenForge: On-the-Fly Expert Agent Construction for Automated Penetration Testing.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors