Note
Our evaluation script will be released soon!
A comprehensive benchmark evaluation platform for Software Engineering Efficiency across different AI scaffolds and models.
SWE-Effi provides a standardized platform for evaluating and comparing AI-powered software engineering tools across different scaffolds and language models. Our platform aggregates benchmark results and presents them through an interactive web interface.
🌐 Visit the Live Platform
📝 Submit Your Results
SWE-Effi
├── benchmark
│   └── results
│       └── agent-scaffold-stats
│           ├── agentless/
│           │   ├── GPT-4o-mini-2024-07-18/
│           │   │   ├── combined_stats.json
│           │   │   └── summary_stats.json
│           │   └── qwen3-32B/
│           │       ├── combined_stats.json
│           │       └── summary_stats.json
│           ├── agentless-mini/
│           ├── auto-code-rover/
│           ├── openhands/
│           └── swe-agent/
├── scripts/
│   ├── transform-benchmark.py   # data transformation
│   └── update-website.sh        # easy update script
└── website/
    ├── public/
    │   └── data/
    │       └── benchmark/
    │           ├── raw/         # benchmark data
    │           └── summary/     # benchmark data
    └── src/
        └── docs/
            ├── about/
            └── index.tsx
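The layout above implies one directory per scaffold, one subdirectory per model, and two JSON files in each model directory. As a minimal sketch of how that layout could be walked (the function name `discover_results` is hypothetical and is not part of the repository's scripts), the following lists every scaffold/model pair that has both files:

```python
from pathlib import Path

# File names taken from the directory layout shown above.
REQUIRED_FILES = ("combined_stats.json", "summary_stats.json")


def discover_results(stats_root):
    """Return sorted (scaffold, model) pairs whose directory holds both stats files.

    Hypothetical helper: it only mirrors the documented layout
    <stats_root>/<scaffold>/<model>/{combined,summary}_stats.json.
    """
    root = Path(stats_root)
    pairs = []
    if not root.is_dir():
        return pairs
    for scaffold_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        for model_dir in sorted(p for p in scaffold_dir.iterdir() if p.is_dir()):
            # Skip model directories that are missing either required file.
            if all((model_dir / name).is_file() for name in REQUIRED_FILES):
                pairs.append((scaffold_dir.name, model_dir.name))
    return pairs
```

Calling `discover_results("benchmark/results/agent-scaffold-stats")` from the repository root would then return pairs such as `("agentless", "qwen3-32B")`, assuming the files are present as shown in the tree.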
Want to submit your benchmark results? Follow our submission guide →
- Clone the repository:

      git clone https://github.com/your-org/swe-effi.git
      cd swe-effi

- Process benchmark data:

      # Process all new benchmark data
      ./scripts/update-website.sh --auto
      # Process specific scaffold/model
      ./scripts/update-website.sh agentless gpt-4
      # Validate files before processing
      ./scripts/update-website.sh --validate-only

- Run the website locally:

      cd website
      npm install
      npm run dev
When contributors submit benchmark results via PR:
- Review the Pull Request for correctness
- Validate locally (optional):
      git checkout [pr-branch]
      python3 scripts/transform-benchmark.py --validate-only
- Merge the PR
- Update the website:
./scripts/update-website.sh --auto
update-website.sh options:
- --auto: Process all available data automatically
- --validate-only: Only validate files, don't transform
- --verbose: Show detailed logs
- --help: Show help information
transform-benchmark.py options:
- --scaffold NAME --model NAME: Process specific combination
- --validate-only: Only validate file format
- --auto: Auto process all data with validation
- --verbose: Show detailed logs
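For readers who want to see how the transform-benchmark.py flags fit together, here is a minimal `argparse` sketch that accepts the same options. It is an illustration only, not the script's actual source: the help strings paraphrase the list above, and the rule that `--scaffold` and `--model` must be passed together is an assumption inferred from "Process specific combination".

```python
import argparse


def build_parser():
    """Build a parser that mirrors the documented transform-benchmark.py flags."""
    parser = argparse.ArgumentParser(
        prog="transform-benchmark.py",
        description="Validate and transform benchmark results (illustrative sketch).",
    )
    parser.add_argument("--scaffold", metavar="NAME", help="scaffold to process")
    parser.add_argument("--model", metavar="NAME", help="model to process")
    parser.add_argument("--validate-only", action="store_true",
                        help="only validate file format")
    parser.add_argument("--auto", action="store_true",
                        help="process all data with validation")
    parser.add_argument("--verbose", action="store_true",
                        help="show detailed logs")
    return parser


def parse_args(argv=None):
    """Parse argv and apply the (assumed) pairing rule for --scaffold/--model."""
    parser = build_parser()
    args = parser.parse_args(argv)
    # Assumption: a specific combination needs both a scaffold and a model.
    if bool(args.scaffold) != bool(args.model):
        parser.error("--scaffold and --model must be given together")
    return args
```

For example, `parse_args(["--scaffold", "agentless", "--model", "qwen3-32B", "--validate-only"])` yields a namespace with `scaffold="agentless"`, `model="qwen3-32B"`, and `validate_only=True`.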
- Python 3 for data processing
- Node.js and npm for website
cd website && npm install
Contributor Results → PR Submission → Validation → Processing → Website Integration
- Results Collection: Contributors submit via GitHub PRs
- Validation: Automated checks ensure data quality
- Processing: Scripts transform data for website consumption
- Integration: Website automatically displays new results
Results must include:
- combined_stats.json
- summary_stats.json
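Before opening a PR, a contributor might want a quick local check that both files are present. The sketch below is a hypothetical pre-submission check, not the project's validator: the JSON schema is not documented here, so it only verifies that each required file exists and parses as JSON.

```python
import json
from pathlib import Path

# The two files every submission must include.
REQUIRED_FILES = ("combined_stats.json", "summary_stats.json")


def validate_submission(model_dir):
    """Return a list of problems; an empty list means the basic checks passed.

    Hypothetical helper: checks presence and JSON well-formedness only,
    not the contents expected by the project's own scripts.
    """
    problems = []
    for name in REQUIRED_FILES:
        path = Path(model_dir) / name
        if not path.is_file():
            problems.append(f"missing file: {name}")
            continue
        try:
            json.loads(path.read_text(encoding="utf-8"))
        except (json.JSONDecodeError, UnicodeDecodeError) as exc:
            problems.append(f"invalid JSON in {name}: {exc}")
    return problems
```

An empty return value from `validate_submission` means both files were found and parsed. Running the repository's own `--validate-only` option remains the authoritative check.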