R1-Fuzz

R1-Fuzz is a tool for deploying LLMs in fuzzing to perform input generation, and for training LLMs on this task via reinforcement learning.

It provides the functionality of:

  • generating targeted prompts (questions) from the source code of the target program under fuzzing. Each question specifies a code region and asks the LLM to generate new inputs that reach that region, improving fuzzing code coverage.
  • calculating whether an input reaches the specified code region, and how close it gets, serving as the reward for RL-based fine-tuning.

Paper: https://arxiv.org/abs/2509.20384

Model: https://huggingface.co/0gal1/R1-Fuzz-7B

Usage

1. Setup FuzzBench

Our fuzzer is implemented as a fuzzer integration in FuzzBench.

Clone FuzzBench, check out commit 2a2ca6ae4c5d171a52b3e20d9b7a, and set up the prerequisites according to its documentation.

2. Prepare the target benchmark

To fuzz a target program, it must be prepared as a FuzzBench benchmark.

Besides following the documentation to set up new targets, we provide prepared benchmarks in the R1-Fuzz/benchmarks/ directory.

For example, copy R1-Fuzz/benchmarks/quickjs_fuzz_eval/ into $fuzzbench_dir/benchmarks/ and test it with the commands in FuzzBench's documentation, e.g., run make run-aflplusplus-quickjs_fuzz_eval in the FuzzBench directory to check that AFL++ fuzzes our benchmark normally.

3. Prepare R1-Fuzz

Install requirements

conda create -n r1fuzz python==3.10
conda activate r1fuzz
python -m pip install tree_sitter tree_sitter_cpp tree_sitter_c networkx pyyaml datasets pygraphviz

Prepare binaries

R1-Fuzz needs the instrumented executable binary of the target for constructing question prompts.

The recommended way to obtain it is to run a FuzzBench experiment, which automatically saves the instrumented binaries and needed source code.

For example, run a FuzzBench experiment on our quickjs benchmark for a short time; the binaries are then saved in $experiment_filestore/$experiment_name/coverage-binaries/coverage-build-quickjs_fuzz_eval.tar.gz. Decompress the package to get the binary: $decompress_dir/src/fuzz_eval.

Retain the decompressed directory structure (src/fuzz_eval, src/quickjs/), because it must be consistent with the directory structure seen during fuzzing inside the FuzzBench Docker container.
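
If you prefer scripting this step, here is a minimal sketch of decompressing the package and checking the layout; the paths are the same placeholders used above:

import os
import tarfile

# placeholder paths from the example above; replace with your own locations
archive = "coverage-build-quickjs_fuzz_eval.tar.gz"
decompress_dir = "/path/to/decompress_dir"   # i.e. $decompress_dir

os.makedirs(decompress_dir, exist_ok=True)
with tarfile.open(archive, "r:gz") as tar:
    tar.extractall(decompress_dir)   # preserves the src/... layout inside the archive

# sanity check: the instrumented binary and the tracked source tree should be present
print(os.path.isfile(os.path.join(decompress_dir, "src", "fuzz_eval")))
print(os.path.isdir(os.path.join(decompress_dir, "src", "quickjs")))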

Question (dataset) generation

The script coverage_rewards.py implements question construction and reward calculation. The reward is used for RL-based fine-tuning of LLMs. Question construction can build the fine-tuning dataset ahead of fuzzing and can also generate questions during fuzzing.

First, prepare one or more seeds (a corpus) for the target program, e.g., get quickjs's corpus used by OSS-Fuzz.
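
If you only want a quick corpus for trying things out, a few hand-written seeds are enough; the snippet below is an illustrative way to create one (the path is a placeholder matching the CONFIGS example below):

import os

# placeholder path; use the same directory you will put in CONFIGS["corpus"] below
corpus_dir = "/path/to/target/corpus/"
os.makedirs(corpus_dir, exist_ok=True)

# a few small JavaScript programs as an initial corpus for the quickjs target
seeds = [
    "Date.now();",
    "var a = [1, 2, 3]; a.map(x => x * 2);",
    "try { JSON.parse('{'); } catch (e) { e.message; }",
]
for i, js in enumerate(seeds):
    with open(os.path.join(corpus_dir, "seed_%d.js" % i), "w") as f:
        f.write(js)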

Next, configure the paths in the CONFIGS variable in coverage_rewards.py. For example:

CONFIGS = {
    "quickjs_fuzz_eval": {
        "target_name": "quickjs_fuzz_eval",

        # the executable binary
        "program": "$decompress_dir/src/fuzz_eval",

        # the object files passed to llvm-cov via `-object`; some targets need extra shared libraries
        "objects": ["$decompress_dir/src/fuzz_eval"],

        # set this if running the binary requires LD_PRELOAD
        "LD_PRELOAD": None,

        # the path of the decompressed package
        "file_path_prefix": "$decompress_dir/",
        
        # the directories of source code you want to track
        # inside FuzzBench's container the path is "/src/quickjs/", so the path here should be
        # file_path_prefix + "/src/quickjs/" to work both inside the container (fuzzing) and outside it (dataset generation)
        "src_files": [
            "$decompress_dir/src/quickjs/",
        ],

        # used for dataset generation
        "corpus": "/path/to/target/corpus/",
    },
}
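
Optionally, before running the test below, you can sanity-check that the configured paths exist. This helper is not part of R1-Fuzz; it just mirrors the placeholder paths from the example above:

import os

# the same placeholder paths as in the CONFIGS example; adjust to your setup
cfg = {
    "program": "$decompress_dir/src/fuzz_eval",
    "objects": ["$decompress_dir/src/fuzz_eval"],
    "src_files": ["$decompress_dir/src/quickjs/"],
    "corpus": "/path/to/target/corpus/",
}

for path in [cfg["program"], *cfg["objects"]]:
    print(path, "OK" if os.path.isfile(path) else "MISSING FILE")
for path in [*cfg["src_files"], cfg["corpus"]]:
    print(path, "OK" if os.path.isdir(path) else "MISSING DIRECTORY")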

After configuring the paths, test the question generation by:

python /path/to/coverage_rewards.py --target_name quickjs_fuzz_eval --test 1

It should print a question constructed from running an input seed and report "Reward test correct".

Lastly, you can generate a dataset for a target by:

python /path/to/coverage_rewards.py --target_name quickjs_fuzz_eval --output_json quickjs_questions.json --analyze_seed_cnt -1 # -1 means analyzing all input seeds in the corpus.
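
The schema of the output JSON is defined by coverage_rewards.py and is not assumed here; a quick, schema-agnostic way to inspect the generated dataset is:

import json

# the file produced by the command above
with open("quickjs_questions.json") as f:
    data = json.load(f)

print("number of entries:", len(data))
first = data[0] if isinstance(data, list) else data
print("first entry type:", type(first).__name__)
if isinstance(first, dict):
    print("keys:", list(first.keys()))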

A constructed question prompt looks like:

You are a helpful code testing and analysis assistant. I will provide:
1. An original test input
2. Code snippet containing function/logic
3. Target branch condition and outcome (TAKEN/NOT_TAKEN)
Your task is to create a new test input that inverts the branch outcome (TAKEN to NOT_TAKEN or vice versa).
First, analyze how the original input triggers the current branch outcome. Then, modify/create input values to invert the branch decision.
Important: The reasoning process and new input should be enclosed within <think> </think> and <answer> </answer> tags.
......
Current test input: 
Date.now();
Code:
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
......
    if (s->token.u.ident.atom <= JS_ATOM_LAST_KEYWORD ||
        (s->token.u.ident.atom <= JS_ATOM_LAST_STRICT_KEYWORD &&
         (s->cur_func->js_mode & JS_MODE_STRICT)) ||
        (s->token.u.ident.atom == JS_ATOM_yield &&
         ((s->cur_func->func_kind & JS_FUNC_GENERATOR) ||
// THE BRANCH CONDITION `s->token.u.ident.atom == JS_ATOM_await` IN THE FOLLOWING LINE IS NOT TAKEN (FALSE):
        (s->token.u.ident.atom == JS_ATOM_await &&

Now generate a new input to invert the branch condition decision from False to True

Fuzzing

Our fuzzer is implemented as a fuzzer integration in FuzzBench. It constructs questions to query an LLM via APIs, extracts the LLM-generated input, and adds it as a new seed to AFL++'s corpus during fuzzing.
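
The exact extraction code lives in the fuzzer integration; since the prompt format above asks for the new input inside <answer> </answer> tags, the extraction step can be sketched roughly like this (illustrative only, not the code used by the integration):

import re

def extract_answer(reply: str):
    """Return the content of the last <answer>...</answer> block in an LLM reply, or None."""
    matches = re.findall(r"<answer>(.*?)</answer>", reply, flags=re.DOTALL)
    return matches[-1].strip() if matches else None

reply = "<think>make the await check true</think><answer>await 1;</answer>"
print(extract_answer(reply))   # -> "await 1;"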

First, configure the MODELS variable for querying an LLM in aflplusplus_r1fuzz/fuzzer.py. For example:

MODELS = [
    ("model name", "base_url", "api_key"),
]
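
For instance, assuming the entries point at an OpenAI-compatible endpoint, a filled-in configuration could look like the following (the URL and key are placeholders for a locally served model):

MODELS = [
    # placeholder values; e.g. our released model served behind a local OpenAI-compatible API
    ("0gal1/R1-Fuzz-7B", "http://localhost:8000/v1", "EMPTY"),
]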

Then, to run the fuzzer with FuzzBench, copy R1-Fuzz/fuzzers/aflplusplus_r1fuzz/ into fuzzers/ in the FuzzBench directory. Test the fuzzer according to FuzzBench's documentation. For example:

# start the container:
make debug-aflplusplus_r1fuzz-quickjs_fuzz_eval

# inside the container, start the fuzzer:
$ROOT_DIR/docker/benchmark-runner/startup-runner.sh 2>&1 | tee log.txt

Wait a few minutes and it will start printing constructed questions and LLM replies. The LLM-generated seeds are placed in /out/corpus/addseeds/queue/.
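
From inside the container, a quick way to watch the LLM-generated seeds arrive (using the path mentioned above):

import os

queue_dir = "/out/corpus/addseeds/queue/"
names = sorted(os.listdir(queue_dir)) if os.path.isdir(queue_dir) else []
print(len(names), "LLM-generated seeds so far")
for name in names[:5]:
    print(name, os.path.getsize(os.path.join(queue_dir, name)), "bytes")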

After verifying the fuzzer integration, you can start a FuzzBench experiment. The statistics are saved in $experiment_filestore/$experiment_name/quickjs_fuzz_eval-aflplusplus_r1fuzz/trial-XX/results/. A question-stats-XX.json file is saved every 15 minutes and shows question-answering stats like:

{
    "asked_cnt": 7154,      // the number of asked questions (one question might be asked multiple times)
    "answered_cnt": 604,    // the number of correctly answered questions
    "dup_asked_cnt": 4562,  // the number of asked questions that are **not** asked for the first time
    "reanswered_cnt": 395,  // the number of correctly answered questions that are **not** correctly answered at the first time
}

The asked_questions.json file saves all generated questions and the LLM's replies.
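
From the counters above you can derive simple rates; for example (the file name is a placeholder, replace XX with the snapshot you want to inspect):

import json

with open("question-stats-XX.json") as f:   # replace XX with an actual snapshot index
    stats = json.load(f)

unique_asked = stats["asked_cnt"] - stats["dup_asked_cnt"]   # questions asked for the first time
print("unique questions asked:", unique_asked)
print("correctly answered:", stats["answered_cnt"])
print("rough answer rate: %.2f%%" % (100.0 * stats["answered_cnt"] / max(stats["asked_cnt"], 1)))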

Training

We fine-tune LLMs for the task of fuzzing input generation via GRPO, using verl.

We'll release the steps for training soon.
