Official implementation for Hybrid and Collaborative Passage Reranking
Create conda environment via running:
conda env create -f environment.yml
conda activate hybrankYou can download preprocessed data from HuggingFace Repo including:
- Natural Questions Trainset, Devset and Testset with DPR-Multi retriever
- Pretrained checkpoint for Natural Questions with DPR-Multi retriever
- MS MARCO Trainset and Devset and TREC 2019/2020 Testset with ANCE retriever
- Pretrained checkpoint for MS MARCO with ANCE retriever
Preprocessed data for other datasets or retrievers (~800G in total) will not be uploaded due to the space limitation of cloud storage.
Please contact the authors for these preprocessed data or preprocess by yourself following instructions in data/README.md.
Note that training data should be generated following the instructions in data/README.md
Train HybRank on Natural Questions with DPR-Multi retriever:
python main.py --exp_name NQ_DPR-Multi --data_path data/NQ_DPR-MultiEvaluate HybRank on Natural Questions with DPR-Multi retriever:
python main.py --exp_name test_NQ_DPR-Multi --data_path data/NQ_DPR-Multi --resume experiments/NQ_DPR-Multi/best-model.pth --only_evalEvaluate HybRank on MS MARCO with ANCE retriever:
python main.py --exp_name test_MSMARCO_ANCE --data_path data/MSMARCO_ANCE --resume experiments/MSMARCO_ANCE/best-model.pth --only_evalEvaluate HybRank on TREC 2019 with ANCE retriever:
python main.py --exp_name test_TRECDL2019_ANCE --data_path data/TRECDL2019_ANCE --resume experiments/MSMARCO_ANCE/best-model.pth --only_evalEvaluate HybRank on TREC 2020 with ANCE retriever:
python main.py --exp_name test_TRECDL2020_ANCE --data_path data/TRECDL2020_ANCE --resume experiments/MSMARCO_ANCE/best-model.pth --only_evalDisplay help information by:
python main.py -hPlease cite the following paper if HybRank is helpful for your research
@inproceedings{zhang-etal-2023-hybrid,
title = "Hybrid and Collaborative Passage Reranking",
author = "Zhang, Zongmeng and
Zhou, Wengang and
Shi, Jiaxin and
Li, Houqiang",
booktitle = "Findings of the Association for Computational Linguistics: ACL 2023",
month = jul,
year = "2023",
address = "Toronto, Canada",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2023.findings-acl.880",
doi = "10.18653/v1/2023.findings-acl.880",
pages = "14003--14021",
}
