
qqr


🤗 HuggingFace  |   🤖 ModelScope  |   📰 Blog  |   📑 Paper

qqr (a.k.a. hilichurl) is a lightweight, non-intrusive extension for slime. It seamlessly integrates the Model Context Protocol (MCP) standard to enable the evolution of open-ended agents via ArenaRL.

🌟 Key Features

  • ArenaRL Algorithm: Full implementation of the core algorithms described in the paper. It includes built-in topologies for Anchor-Based, Round-Robin, Swiss-System, Double-Elimination, and Seeded Single-Elimination tournaments.
  • Built for Open-Ended Agents: Specifically engineered to tackle discriminative collapse in complex, open-ended tasks, ensuring continuous policy improvement via relative ranking even when reward model scores stagnate.
  • MCP Support: Seamless integration with the MCP standard decouples LLM inference from tool environments. Developers can reuse existing MCP Servers as training environments without rewriting interfaces.
  • High-Performance Training: Built on top of slime (tested with v0.2.1) to deliver high-throughput, distributed rollout generation and training for large-scale agent evolution.
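To make the relative-ranking idea concrete, here is a minimal, self-contained sketch of how a round-robin tournament can turn pairwise judgments into per-response rewards. This is an illustration of the general technique, not qqr's actual implementation; the function and judge names are hypothetical, and a real setup would use an LLM judge rather than the toy comparator shown.

```python
import itertools
from typing import Callable, Sequence

def round_robin_relative_rewards(
    responses: Sequence[str],
    judge: Callable[[str, str], int],
) -> list[float]:
    """Score candidate responses by relative ranking instead of absolute
    reward-model scores: every pair meets once (round-robin), and each
    response's reward is its win rate over its n-1 matches."""
    n = len(responses)
    wins = [0.0] * n
    for i, j in itertools.combinations(range(n), 2):
        # judge returns +1 if the first response wins, -1 if the second wins, 0 on a draw
        outcome = judge(responses[i], responses[j])
        if outcome > 0:
            wins[i] += 1.0
        elif outcome < 0:
            wins[j] += 1.0
        else:
            wins[i] += 0.5
            wins[j] += 0.5
    matches = n - 1  # each response plays every other response once
    return [w / matches for w in wins]

# Toy judge: the longer response wins (a stand-in for an LLM judge).
rewards = round_robin_relative_rewards(
    ["a", "abc", "ab"],
    judge=lambda x, y: (len(x) > len(y)) - (len(x) < len(y)),
)
print(rewards)  # "abc" beats both opponents, "ab" beats one, "a" beats none
```

Because rewards come from win rates rather than raw judge scores, the policy still receives a useful learning signal even when absolute scores saturate, which is the failure mode the paper calls discriminative collapse.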

📦 Installation

To get started, first ensure slime is installed (refer to Quick Start). Then install qqr from source:

git clone https://github.com/Alibaba-NLP/qqr.git
cd qqr
pip install -e .

🚀 Quick Start

Run the travel experiment quickly with the following command:

bash scripts/travel/run-qwen3-8B.sh

You can configure the experiment in qqr/examples/travel/config.py.

Acknowledgements

slime: For providing a powerful post-training framework.

openai-agents-python: For providing excellent MCP interfaces.

Citation

If you use qqr or the ArenaRL algorithm in your research, please cite our paper:

@misc{zhang2026arenarlscalingrlopenended,
      title={ArenaRL: Scaling RL for Open-Ended Agents via Tournament-based Relative Ranking}, 
      author={Qiang Zhang and Boli Chen and Fanrui Zhang and Ruixue Ding and Shihang Wang and Qiuchen Wang and Yinfeng Huang and Haonan Zhang and Rongxiang Zhu and Pengyong Wang and Ailin Ren and Xin Li and Pengjun Xie and Jiawei Liu and Ning Guo and Jingren Zhou and Zheng-Jun Zha},
      year={2026},
      eprint={2601.06487},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2601.06487}, 
}

