Skip to content

NoCode-bench/NoCode-bench

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

25 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

NoCode-bench: A Benchmark for Evaluating Natural Language-Driven Feature Addition

📰 News

  • [2025-07-29]: NoCode-bench is now available on Hugging Face Datasets! You can access it here.
  • [2025-07-18]: We have released NoCode-bench, evaluate your SE Agent here.

📦 Benchmark Overview

NoCode-bench is a benchmark designed to evaluate the ability of Large Language Models (LLMs) to perform no-code feature addition using natural language documentation as input. Unlike prior benchmarks that focus on bug fixing or general issue resolution, NoCode-bench targets a new paradigm where feature development is driven by documentation changes in real-world software projects.

task

  • Instances: 634 real-world feature addition tasks across diverse GitHub projects
  • Format: Each instance contains the documentation change, relevant context files, and a ground truth patch
  • Subset: Includes a manually verified subset (NoCode-bench-Verified) for high-quality, human-evaluated evaluation

To access NoCode-bench, copy and run the following code:

from datasets import load_dataset
ncbench = load_dataset('NoCode-bench/NoCode-bench_Full', split='test')
ncbench_verified = load_dataset('NoCode-bench/NoCode-bench_Verified', split='test')

🚀 How to Use the Benchmark

Environment Setup

Follow these steps to set up the environment for NoCode-bench:

conda create -n ncb python=3.12
conda activate ncb
pip install -r requirements.txt

NoCode-bench enables reproducible evaluations via Docker, by building the base image (fb_base:dev) and the project image (fb_[repo]:dev) as follows:

cd environment
bash setup_all.sh

NoCode-bench also support instance-level Docker images, which can be built using the following command:

export PYTHONPATH=$PYTHONPATH:$(pwd)
python environment/setup_instances_images.py \
   --bench_tasks NoCode-bench/NoCode-bench_Verified \
   --log_dir logs \
   --max_workers 20

We have also provided a pre-built Docker image for NoCode-bench, which can be pulled from Docker Hub. For repo-level docker images, you can pull them using the following command:

cd environment
bash pull_from_hub.sh # for repo level
python pull_instance_images.py --bench_tasks NoCode-bench/NoCode-bench_Verified  # for instance level

Evaluation

You need to generate the prediction results that meet the following format for easy evaluation

# Output Format
instances = [
  {
    'model_name_or_path': '...',
    'instance_id': '...',
    'model_patch': '...',
  },
  ...
]

Evaluate patch predictions on NoCode-bench Verified with the following command:

export PYTHONPATH=$PYTHONPATH:$(pwd)
python ./evaluation/eval.py \
    --predictions_path ./all_preds.jsonl \  # <path_to_your_predictions>
    --log_dir ./evaluation/logs \ # <path_to_your_log_dir>
    --bench_tasks NoCode-bench/NoCode-bench_Verified \ # <dataset_name>
    --max_workers 110 \ # <number_of_workers>
    --output_file eval_result.txt \ # <path_to_your_output_file>
    --image_level repo \ # <cache_image_level>
    --timeout 600 \ # <timeout_in_seconds>
    --proxy None # <proxy_if_needed>

🔧 How to Reconstruct the Benchmark

You can reproduce or extend NoCode-bench using our 5-step construction pipeline:

workflow

Step 1: Project Selection

  • Select high-quality, actively maintained GitHub repositories
cd repos/
sh collect.sh

Step 2: Instance Collection

  • Parse release notes to identify real feature addition tasks
  • Retrieve corresponding PR from GitHub
python construction/collection/collect_[repo].py
python construction/filter_attribute/attribute_filter.py

Step 3: Environment Construction

  • All involved data and scripts are stored in the environment/ folder
  • Include related modules, configuration, and dependencies

Step 4: Instance Filtering

  • Automatically filter out instances that cannot meet our criteria
python construction/filter_execution/execution.py

Step 5: Input Refinement

  • Supplement missing but essential entity names in the task input.
  • Mask information that may cause data leakage
python construction/augmentation/augment.py
python construction/augmentation/mask_auto.py

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors