Official codebase for our paper *EffiBench-X: A Multi-Language Benchmark for Measuring Efficiency of LLM-Generated Code*.
EffiBench-X is a benchmarking platform for evaluating code generation capabilities of Large Language Models (LLMs), with a focus on runtime and memory efficiency. It executes solutions in a sandboxed environment, measuring runtime, memory usage, and execution success.
✨ Features | 📦 Installation | 🚀 Quick Start | 🙏 Acknowledgments | ⚖️ License | 📚 Citation
## ✨ Features

- **Comprehensive Benchmarking**: Evaluate LLM code generation not only for correctness but also for efficiency metrics (runtime, memory usage)
- **Multi-Language Support**: Test solutions in Python, JavaScript, C++, Java, Go, and Ruby
- **Flexible Backends**: Run evaluations in isolated Docker execution environments
- **Model Integration**: Support for both open-source and proprietary LLMs (OpenAI, Anthropic, Google, DeepSeek, Qwen, Gemma, etc.)
- **Extensive Dataset**: Problems from multiple sources (LeetCode, AtCoder, CodeChef, Codeforces, etc.)
- **Performance Analysis**: Generate detailed reports and comparisons across models
## 📦 Installation

```bash
# Clone the repository
git clone https://github.com/EffiBench/EffiBench-X.git
cd EffiBench-X

# Install dependencies
pip install -r requirements.txt
```

## 🚀 Quick Start

```bash
# Download dataset from Hugging Face Hub
python hf_dataset.py download
```
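If you want to inspect the problems directly, the dataset can also be loaded with the `datasets` library. This is a minimal sketch: the dataset ID `EffiBench/effibench-x`, the split name, and the field access below are assumptions for illustration, not the repository's documented schema — the supported path is `python hf_dataset.py download`.

```python
# Minimal sketch: load the benchmark directly from the Hugging Face Hub.
# NOTE: the dataset ID and split below are assumptions for illustration.
from datasets import load_dataset

ds = load_dataset("EffiBench/effibench-x", split="train")  # hypothetical ID
print(len(ds))       # number of problems
print(ds[0].keys())  # inspect the available fields
```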
```bash
# Start with Docker backend
python start_sandbox.py --type docker --host 127.0.0.1 --port 8000
```
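Once the sandbox is up, solutions are executed against it over HTTP. The route and payload below are hypothetical placeholders that only illustrate the general shape of such a request; the actual API is defined by `start_sandbox.py`.

```python
import requests

# Hypothetical endpoint and payload, for illustration only;
# see start_sandbox.py for the actual sandbox API.
resp = requests.post(
    "http://127.0.0.1:8000/run",  # placeholder route
    json={
        "language": "python",
        "code": "print(sum(range(10)))",
        "timeout": 10,
    },
)
print(resp.json())  # expected to report runtime, memory usage, and output
```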
```bash
# Generate solutions for all models in the config file
python generate_solution.py generate data/dataset data/solutions --config model_config.yaml
```
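`model_config.yaml` lists the models to query. The keys below are a hypothetical sketch of what such a config might contain; the actual schema is defined by `generate_solution.py`.

```yaml
# Hypothetical sketch of model_config.yaml; the real schema is defined
# by generate_solution.py.
models:
  - name: gpt-4o
    provider: openai
    api_key_env: OPENAI_API_KEY
  - name: claude-3-5-sonnet
    provider: anthropic
    api_key_env: ANTHROPIC_API_KEY
```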
```bash
# Merge canonical solutions
python generate_solution.py merge-canonical-solutions
```
```bash
# Evaluate solutions with multiple processes and threads
python evaluate_solution.py evaluate -o data/evaluation
```
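The evaluation compares each generated solution's runtime and memory against the merged canonical solutions. As a rough illustration of that kind of comparison (not necessarily the exact metric used by `evaluate_solution.py`), an efficiency ratio can be computed like this:

```python
# Illustrative only: normalize a solution's cost by the canonical solution's
# cost, so values > 1.0 mean the LLM solution is less efficient.
def efficiency_ratio(solution_runtime: float, canonical_runtime: float) -> float:
    return solution_runtime / canonical_runtime

print(efficiency_ratio(solution_runtime=0.42, canonical_runtime=0.30))  # -> 1.4
```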
```bash
# Generate evaluation report
python evaluate_solution.py report
```

## 🙏 Acknowledgments

- llm-sandbox — a customized version is included under `third_party/llm-sandbox` (MIT).
- Problem sources and inspirations: LeetCode, AtCoder, CodeChef, Codeforces, etc.
## ⚖️ License

EffiBench-X is licensed under the Apache License 2.0; portions are available under separate terms. The component at `third_party/llm-sandbox` is licensed under MIT (see its LICENSE).
## 📚 Citation

Please consider citing our paper if you find this repository helpful in your research.
```bibtex
@article{qing2025effibench,
  title={EffiBench-X: A Multi-Language Benchmark for Measuring Efficiency of LLM-Generated Code},
  author={Qing, Yuhao and Zhu, Boyu and Du, Mingzhe and Guo, Zhijiang and Zhuo, Terry Yue and Zhang, Qianru and Zhang, Jie M. and Cui, Heming and Yiu, Siu-Ming and Huang, Dong and Ng, See-Kiong and Tuan, Luu Anh},
  journal={Advances in Neural Information Processing Systems},
  year={2025}
}
```