
qqr


🤗 HuggingFace  |   🤖 ModelScope  |   📰 Blog  |   📑 Paper

qqr (a.k.a. hilichurl) is a lightweight, non-intrusive extension for slime. It seamlessly integrates the Model Context Protocol (MCP) standard to enable the evolution of open-ended agents via ArenaRL.

🌟 Key Features

  • ArenaRL Algorithm: Full implementation of the core algorithms described in the paper. It includes built-in topologies for Anchor-Based, Round-Robin, Swiss-System, Double-Elimination, and Seeded Single-Elimination tournaments.
  • Built for Open-Ended Agents: Specifically engineered to tackle discriminative collapse in complex, open-ended tasks, ensuring continuous policy improvement via relative ranking even when reward model scores stagnate.
  • MCP Support: Seamless integration with the MCP standard decouples LLM inference from tool environments. Developers can reuse existing MCP Servers as training environments without rewriting interfaces.
  • High-Performance Training: Built on top of slime (tested with v0.2.1) to deliver high-throughput, distributed rollout generation and training for large-scale agent evolution.
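To make the relative-ranking idea concrete, here is a minimal, self-contained sketch of how a round-robin tournament can turn pairwise judgments into per-response rewards. This is an illustration of the general technique, not qqr's actual implementation; the function and judge names are hypothetical, and a real setup would use an LLM judge rather than the toy comparator shown.

```python
import itertools
from typing import Callable, Sequence

def round_robin_relative_rewards(
    responses: Sequence[str],
    judge: Callable[[str, str], int],
) -> list[float]:
    """Score candidate responses by relative ranking instead of absolute
    reward-model scores: every pair meets once (round-robin), and each
    response's reward is its win rate over its n-1 matches."""
    n = len(responses)
    wins = [0.0] * n
    for i, j in itertools.combinations(range(n), 2):
        # judge returns +1 if the first response wins, -1 if the second wins, 0 on a draw
        outcome = judge(responses[i], responses[j])
        if outcome > 0:
            wins[i] += 1.0
        elif outcome < 0:
            wins[j] += 1.0
        else:
            wins[i] += 0.5
            wins[j] += 0.5
    matches = n - 1  # each response plays every other response once
    return [w / matches for w in wins]

# Toy judge: the longer response wins (a stand-in for an LLM judge).
rewards = round_robin_relative_rewards(
    ["a", "abc", "ab"],
    judge=lambda x, y: (len(x) > len(y)) - (len(x) < len(y)),
)
print(rewards)  # "abc" beats both opponents, "ab" beats one, "a" beats none
```

Because rewards come from win rates rather than raw judge scores, the policy still receives a useful learning signal even when absolute scores saturate, which is the failure mode the paper calls discriminative collapse.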

📦 Installation

To get started, first ensure slime is installed (refer to Quick Start). Then install qqr from source:

git clone https://github.com/Alibaba-NLP/qqr.git
cd qqr
pip install -e .

🚀 Quick Start

Run the travel experiment quickly with the following command:

bash scripts/travel/run-qwen3-8B.sh

You can configure the experiment in qqr/examples/travel/config.py.

Acknowledgements

slime: For providing a powerful post-training framework.

openai-agents-python: For providing excellent MCP interfaces.

Citation

If you use qqr or the ArenaRL algorithm in your research, please cite our paper:

@misc{zhang2026arenarlscalingrlopenended,
      title={ArenaRL: Scaling RL for Open-Ended Agents via Tournament-based Relative Ranking}, 
      author={Qiang Zhang and Boli Chen and Fanrui Zhang and Ruixue Ding and Shihang Wang and Qiuchen Wang and Yinfeng Huang and Haonan Zhang and Rongxiang Zhu and Pengyong Wang and Ailin Ren and Xin Li and Pengjun Xie and Jiawei Liu and Ning Guo and Jingren Zhou and Zheng-Jun Zha},
      year={2026},
      eprint={2601.06487},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2601.06487}, 
}

