Rank-R1

Rank-R1: Enhancing Reasoning in LLM-based Document Rerankers via Reinforcement Learning

In this work, we introduce Rank-R1, a Setwise reranker with reasoning abilities. We trained Rank-R1 on the MSMARCO dataset using the GRPO RL algorithm. Check out the paper for more details.

Installation

For training and inference Rank-R1, first follow the README.md in the root directory to install llm-rankers. Then

pip install vllm
pip install trl
pip install peft deepspeed

Inference Rank-R1

Python example:

from llmrankers.setwise import RankR1SetwiseLlmRanker
from llmrankers.rankers import SearchResult

docs = [SearchResult(docid=i, text=f'this is passage {i}', score=None) for i in range(20)]
query = 'Give me passage 6'

ranker = RankR1SetwiseLlmRanker(
    model_name_or_path='Qwen/Qwen2.5-7B-Instruct',
    lora_name_or_path='ielabgroup/Rank-R1-7B-v0.1',
    prompt_file='prompts/prompt_setwise-R1.toml',
    num_child=19,
    k=1,
    verbose=True
)

print(ranker.rerank(query, docs)[0])

Internally, the above code is equivalent to the following:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel, PeftConfig

def get_model(peft_model_name):
    config = PeftConfig.from_pretrained(peft_model_name)
    base_model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path)
    model = PeftModel.from_pretrained(base_model, peft_model_name)
    model = model.merge_and_unload()
    return model

# Load the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained('Qwen/Qwen2.5-7B-Instruct')
model = get_model('ielabgroup/Rank-R1-7B-v0.1').to('cuda:0').eval()

prompt_system = "A conversation between User and Assistant. The user asks a question, and the Assistant solves it. The assistant first thinks about the reasoning process in the mind and then provides the user with the answer. The reasoning process and answer are enclosed within <think> </think> and <answer> </answer> tags, respectively, i.e., <think> reasoning process here </think> <answer> answer here </answer>."
prompt_user = '''Given the query: "{query}", which of the following documents is most relevant?
{docs}
After completing the reasoning process, please provide only the label of the most relevant document to the query, enclosed in square brackets, within the answer tags. For example, if the third document is the most relevant, the answer should be: <think> reasoning process here </think> <answer>[3]</answer>.'''

query = 'Give me passage 6'
docs = [f'[{i+1}] this is passage {i+1}' for i in range(20)]
docs = '\n'.join(docs)

messages = [
    {'role': "system", 'content': prompt_system},
    {'role': "user", 'content': prompt_user.format(query=query, docs=docs)}
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to('cuda:0')

generated_ids = model.generate(
    model_inputs.input_ids,
    max_new_tokens=2048,
    do_sample=False,
)
generated_ids = [
    output_ids[len(input_ids)-1:] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
'''
<think> The query "Give me passage 6" is asking for the specific document that contains the text "this is passage 6". By looking at the list provided, we can see that the document labeled [6] matches this description exactly. </think>
<answer>[6]</answer>
'''

# extract the answer
import re
pattern = '<think>.*?</think>\s*<answer>(.*?)</answer>'
answer = re.search(pattern, response, re.DOTALL).group(1) # answer = '[6]'

Note that our Rank-R1 Setwise rerankers are trained with the prompt format shown above, which includes 20 documents. Other numbers of documents should also work fine, but this would represent a "zero-shot" setting for the model.

Released Models (LoRA adapters)

Resource	Description
Rank-R1-3B-v0.1	Trained from Qwen2.5-3B-Instruct
Rank-R1-7B-v0.1	Trained from Qwen2.5-7B-Instruct
Rank-R1-14B-v0.1	Trained from Qwen2.5-14B-Instruct
Setwise-SFT-3B-v0.1	Trained from Qwen2.5-3B-Instruct
Setwise-SFT-7B-v0.1	Trained from Qwen2.5-7B-Instruct
Setwise-SFT-14B-v0.1	Trained from Qwen2.5-14B-Instruct

TREC DL examples:

Rank-R1

# Rank-R1
for dataset in dl19 dl20; do
  for size in 3 7 14; do
    model_name_or_path=Qwen/Qwen2.5-${size}B-Instruct
    lora_path=ielabgroup/Rank-R1-${size}B-v0.1
    mkdir -p runs/${lora_path}
    
    CUDA_VISIBLE_DEVICES=0 python3 run_setwise.py \
    run --model_name_or_path ${model_name_or_path} \
      --lora_path ${lora_path} \
      --prompt_file prompts/prompt_setwise-R1.toml \
      --run_path runs/pyserini_bm25/run.msmarco-v1-passage.bm25-default.${dataset}.txt \
      --save_path runs/${lora_path}/${dataset}.txt \
      --pyserini_dataset ${dataset}-passage \
      --pyserini_index msmarco-v1-passage \
      --hits 100 \
      --query_length 32 \
      --passage_length 512 \
      --scoring generation \
      setwise --num_child 19 \
               --method heapsort \
               --k 10
           
  python -m pyserini.eval.trec_eval -c -l 2 -m ndcg_cut.10 ${dataset}-passage runs/${lora_path}/${dataset}.txt > runs/${lora_path}/${dataset}.eval
  done
done

Rank-R1 zeroshot

# Rank-R1 zeroshot
for dataset in dl19 dl20; do
  for size in 3 7 14; do
    model_name_or_path=Qwen/Qwen2.5-${size}B-Instruct
    
    mkdir -p runs/${model_name_or_path}/zeroshot-R1
    
    CUDA_VISIBLE_DEVICES=0 python3 run_setwise.py \
    run --model_name_or_path ${model_name_or_path} \
      --prompt_file prompts/prompt_setwise-R1.toml \
      --run_path runs/pyserini_bm25/run.msmarco-v1-passage.bm25-default.${dataset}.txt \
      --save_path runs/${model_name_or_path}/zeroshot-R1/${dataset}.txt \
      --pyserini_dataset ${dataset}-passage \
      --pyserini_index msmarco-v1-passage \
      --hits 100 \
      --query_length 32 \
      --passage_length 512 \
      --scoring generation \
      setwise --num_child 19 \
               --method heapsort \
               --k 10
           
  python -m pyserini.eval.trec_eval -c -l 2 -m ndcg_cut.10 ${dataset}-passage runs/${model_name_or_path}/zeroshot-R1/${dataset}.txt > runs/${model_name_or_path}/zeroshot-R1/${dataset}.eval
  done
done

Setwise SFT

# Setwise SFT
for dataset in dl19 dl20; do
  for size in 3 7 14; do
    model_name_or_path=Qwen/Qwen2.5-${size}B-Instruct
    lora_path=ielabgroup/Setwise-SFT-${size}B-v0.1
    
    mkdir -p runs/${lora_path}
    
    CUDA_VISIBLE_DEVICES=0 python3 run_setwise.py \
    run --model_name_or_path ${model_name_or_path} \
      --lora_path ${lora_path} \
      --prompt_file prompts/prompt_setwise.toml \
      --run_path runs/pyserini_bm25/run.msmarco-v1-passage.bm25-default.${dataset}.txt \
      --save_path runs/${lora_path}/${dataset}.txt \
      --pyserini_dataset ${dataset}-passage \
      --pyserini_index msmarco-v1-passage \
      --hits 100 \
      --query_length 32 \
      --passage_length 512 \
      --scoring generation \
      setwise --num_child 19 \
               --method heapsort \
               --k 10
               
    python -m pyserini.eval.trec_eval -c -l 2 -m ndcg_cut.10 ${dataset}-passage runs/${lora_path}/${dataset}.txt > runs/${lora_path}/${dataset}.eval
    done
done

Setwise zeroshot

# Setwise zeroshot
for dataset in dl19 dl20; do
  for size in 3 7 14; do
    model_name_or_path=Qwen/Qwen2.5-${size}B-Instruct
    
    mkdir -p runs/${model_name_or_path}/zeroshot
    
    CUDA_VISIBLE_DEVICES=0 python3 run_setwise.py \
    run --model_name_or_path ${model_name_or_path} \
      --prompt_file prompts/prompt_setwise.toml \
      --run_path runs/pyserini_bm25/run.msmarco-v1-passage.bm25-default.${dataset}.txt \
      --save_path runs/${model_name_or_path}/zeroshot/${dataset}.txt \
      --pyserini_dataset ${dataset}-passage \
      --pyserini_index msmarco-v1-passage \
      --hits 100 \
      --query_length 32 \
      --passage_length 512 \
      --scoring generation \
      setwise --num_child 19 \
               --method heapsort \
               --k 10
               
    python -m pyserini.eval.trec_eval -c -l 2 -m ndcg_cut.10 ${dataset}-passage runs/${model_name_or_path}/zeroshot/${dataset}.txt > runs/${model_name_or_path}/zeroshot/${dataset}.eval
    done
done

Rankzephyr

# Rankzephyr
for dataset in dl19 dl20; do
    model_name_or_path=castorini/rank_zephyr_7b_v1_full
    
    mkdir -p runs/${model_name_or_path}
    
    CUDA_VISIBLE_DEVICES=0 python3 run_listwise.py \
      run --model_name_or_path castorini/rank_zephyr_7b_v1_full \
      --prompt_file prompts/prompt_listwise_rankzephyr.toml \
      --run_path runs/pyserini_bm25/run.msmarco-v1-passage.bm25-default.${dataset}.txt \
      --save_path runs/${model_name_or_path}/${dataset}.txt \
      --pyserini_dataset ${dataset}-passage \
      --pyserini_index msmarco-v1-passage \
      --hits 100 \
      --query_length 32 \
      --passage_length 512 \
      --scoring generation \
      listwise --window_size 20 \
               --step_size 10 \
               --num_repeat 1
            
    python -m pyserini.eval.trec_eval -c -l 2 -m ndcg_cut.10 ${dataset}-passage runs/${model_name_or_path}/${dataset}.txt > runs/${model_name_or_path}/${dataset}.eval
done

BRIGHT examples:

First go the bright folder and follow the instructions to obtain the BM25 run files.

Then similar to TREC DL exmaples, you can run the following commands to evaluate Rank-R1 with BM25 run files:

for size in 3 7 14; do
    model_name_or_path=Qwen/Qwen2.5-${size}B-Instruct
    lora_path=ielabgroup/Rank-R1-${size}B-v0.1
    
    for dataset in biology earth_science economics psychology robotics stackoverflow sustainable_living pony leetcode aops theoremqa_theorems theoremqa_questions; do
        CUDA_VISIBLE_DEVICES=0 python3 run_setwise.py \
            run --model_name_or_path ${model_name_or_path} \
              --lora_path ${lora_path} \
              --prompt_file prompts/prompt_setwise-R1.toml \
              --run_path bright/runs_bm25/bm25.${dataset}.filtered.trec \
              --save_path results/setwise/runs_bright/${lora_path}/${prompt_file}/run_${dataset}.txt \
              --query_file bright/data/pyserini_queries/${dataset}.tsv \
              --pyserini_index bright/data/pyserini_indexes/${dataset} \
              --hits 100 \
              --query_length 1024 \
              --passage_length 1024 \
              --scoring generation \
            setwise --num_child 19 \
                   --method heapsort \
                   --k 10

        python -m pyserini.eval.trec_eval -c -l 2 -m ndcg_cut.10 bright/data/pyserini_qrels/${dataset}.tsv \
          results/setwise/runs_bright/${lora_path}/${prompt_file}/run_${dataset}.txt > results/setwise/runs_bright/${lora_path}/${prompt_file}/run_${dataset}.eval
    done
done

Training

Train Rank-R1 GRPO

Step 1: Create training dataset

python create_dataset.py

Step 2: Train

deepspeed --include localhost:0,1,2,3 --master_port 60000 train_grpo.py

Train Setwise SFT

Step 1: Create training dataset

python create_dataset_sft.py

Step 2: Train

deepspeed --include localhost:0,1,2,3 --master_port 60000 train_sft.py

Citation

@misc{zhuang2025rankr1enhancingreasoningllmbased,
      title={Rank-R1: Enhancing Reasoning in LLM-based Document Rerankers via Reinforcement Learning}, 
      author={Shengyao Zhuang and Xueguang Ma and Bevan Koopman and Jimmy Lin and Guido Zuccon},
      year={2025},
      eprint={2503.06034},
      archivePrefix={arXiv},
      primaryClass={cs.IR},
      url={https://arxiv.org/abs/2503.06034}, 
}

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

Rank-R1: Enhancing Reasoning in LLM-based Document Rerankers via Reinforcement Learning

Installation

Inference Rank-R1

Python example:

Released Models (LoRA adapters)

TREC DL examples:

BRIGHT examples:

Training

Citation

Name		Name	Last commit message	Last commit date
parent directory ..
bright		bright
prompts		prompts
README.md		README.md
create_dataset.py		create_dataset.py
create_dataset_sft.py		create_dataset_sft.py
create_dataset_v0.2.py		create_dataset_v0.2.py
ds_zero0_config.json		ds_zero0_config.json
hybrid.py		hybrid.py
run_listwise.py		run_listwise.py
run_setwise.py		run_setwise.py
train_grpo.py		train_grpo.py
train_grpo_v0.2.py		train_grpo_v0.2.py
train_sft.py		train_sft.py

FilesExpand file tree

Rank-R1

Directory actions

More options

Directory actions

More options

Latest commit

History

Rank-R1

Folders and files

parent directory

README.md

Rank-R1: Enhancing Reasoning in LLM-based Document Rerankers via Reinforcement Learning

Installation

Inference Rank-R1

Python example:

Released Models (LoRA adapters)

TREC DL examples:

BRIGHT examples:

Training

Citation