EffiBench-X

Official codebase for our paper: EffiBench-X: A Multi-Language Benchmark for Measuring Efficiency of LLM-Generated Code (NeurIPS 2025).

EffiBench-X is a benchmarking platform for evaluating the code generation capabilities of Large Language Models (LLMs), with a focus on runtime and memory efficiency. It executes candidate solutions in a sandboxed environment and measures runtime, memory usage, and execution success.

✨ Features | 📦 Installation | 🚀 Quick Start | 🙏 Acknowledgments | ⚖️ License | 📚 Citation

✨ Features

  • Comprehensive Benchmarking: Evaluate LLM code generation not only for correctness but also for efficiency metrics (runtime, memory usage)
  • Multiple Language Support: Test solutions in Python, JavaScript, C++, Java, Go, and Ruby
  • Flexible Backends: Run evaluations using isolated Docker execution environments
  • Model Integration: Support for both open-source and proprietary LLMs (OpenAI, Anthropic, Google, DeepSeek, Qwen, Gemma, etc.)
  • Extensive Dataset: Problems from multiple sources (LeetCode, AtCoder, CodeChef, Codeforces, etc.)
  • Performance Analysis: Generate detailed reports and comparisons between different models

📦 Installation

# Clone the repository
git clone https://github.com/EffiBench/EffiBench-X.git
cd EffiBench-X

# Install dependencies
pip install -r requirements.txt
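
Installing into a fresh virtual environment keeps the benchmark's dependencies isolated; this is standard Python tooling, not an EffiBench-X requirement:

# Optional: create and activate a virtual environment (run before pip install)
python -m venv .venv
source .venv/bin/activate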

🚀 Quick Start

Managing Datasets

# Download dataset from Hugging Face Hub
python hf_dataset.py download
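
The download location is determined by hf_dataset.py; data/dataset (the input path used in the generation step below) is an assumed location, not one confirmed by this README:

# Hypothetical sanity check: list a few downloaded problem files
ls data/dataset | head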

Start the Sandbox Backend

# Start with Docker backend
python start_sandbox.py --type docker --host 127.0.0.1 --port 8000
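
Since the sandbox binds an HTTP host and port, a quick smoke test is to see whether anything answers there; any HTTP status code means the server is up (the root path is an assumption, not a documented endpoint):

# Hypothetical reachability check for the sandbox backend
curl -s -o /dev/null -w "%{http_code}\n" http://127.0.0.1:8000/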

Generate Solutions

# Generate solutions for all models in the config file
python generate_solution.py generate data/dataset data/solutions --config model_config.yaml

# Merge canonical solutions
python generate_solution.py merge-canonical-solutions
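
The --config file lists which models to query. Its actual schema is defined by the repo's model_config.yaml; the sketch below is purely illustrative, and every field name in it is a hypothetical placeholder:

# Hypothetical model_config.yaml sketch -- field names are illustrative, not the repo's schema
models:
  - name: gpt-4o
    provider: openai
  - name: claude-3-7-sonnet
    provider: anthropic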

Evaluate Solutions

# Evaluate solutions with multiple processes and threads
python evaluate_solution.py evaluate -o data/evaluation

# Generate evaluation report
python evaluate_solution.py report
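
Per-solution results land in the directory passed via -o, which the report step presumably aggregates; a quick look at the raw outputs:

# Inspect raw evaluation outputs written under the -o directory
ls data/evaluation | head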

🙏 Acknowledgments

⚖️ License

EffiBench-X is licensed under the Apache License 2.0; portions are available under separate terms. The component at third_party/llm-sandbox is licensed under MIT (see its LICENSE).

📚 Citation

Please consider citing our paper if you find this repository helpful in your research.

@article{qing2025effibench,
  title={EffiBench-X: A Multi-Language Benchmark for Measuring Efficiency of LLM-Generated Code},
  author={Qing, Yuhao and Zhu, Boyu and Du, Mingzhe and Guo, Zhijiang and Zhuo, Terry Yue and Zhang, Qianru and Zhang, Jie M and Cui, Heming and Yiu, Siu-Ming and Huang, Dong and Ng, See-Kiong and Tuan, Luu Anh},
  journal={Advances in Neural Information Processing Systems},
  year={2025}
}
