Hybrid and Collaborative Passage Reranking

Official implementation of "Hybrid and Collaborative Passage Reranking" (Findings of ACL 2023).

Model Pipeline

Figure: model-pipeline.png

Setup

Environment Setup

Create and activate the conda environment by running:

conda env create -f environment.yml
conda activate hybrank

Data

Preprocessed data can be downloaded from the HuggingFace repo, which includes:

Natural Questions train, dev, and test sets with the DPR-Multi retriever
Pretrained checkpoint for Natural Questions with the DPR-Multi retriever
MS MARCO train and dev sets and TREC 2019/2020 test sets with the ANCE retriever
Pretrained checkpoint for MS MARCO with the ANCE retriever

Preprocessed data for other datasets and retrievers (~800 GB in total) is not uploaded because of cloud storage limits. To obtain it, contact the authors or preprocess the data yourself by following the instructions in data/README.md.

Code Running

Training

Note that the training data must first be generated by following the instructions in data/README.md.

Train HybRank on Natural Questions with DPR-Multi retriever:

python main.py --exp_name NQ_DPR-Multi --data_path data/NQ_DPR-Multi

Evaluation

Evaluate HybRank on Natural Questions with DPR-Multi retriever:

python main.py --exp_name test_NQ_DPR-Multi --data_path data/NQ_DPR-Multi --resume experiments/NQ_DPR-Multi/best-model.pth --only_eval

Evaluate HybRank on MS MARCO with ANCE retriever:

python main.py --exp_name test_MSMARCO_ANCE --data_path data/MSMARCO_ANCE --resume experiments/MSMARCO_ANCE/best-model.pth --only_eval

Evaluate HybRank on TREC 2019 with ANCE retriever:

python main.py --exp_name test_TRECDL2019_ANCE --data_path data/TRECDL2019_ANCE --resume experiments/MSMARCO_ANCE/best-model.pth --only_eval

Evaluate HybRank on TREC 2020 with ANCE retriever:

python main.py --exp_name test_TRECDL2020_ANCE --data_path data/TRECDL2020_ANCE --resume experiments/MSMARCO_ANCE/best-model.pth --only_eval
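The four evaluation commands above differ only in the evaluation set and the checkpoint they resume from. As a convenience, they can be generated from a small table. The following is a minimal sketch, not part of the repository: the names EVAL_RUNS, build_eval_command, and build_all_eval_commands are hypothetical, the mapping is read directly off the commands above, and it assumes the script is run from the repository root. It only prints the commands by default.

```python
import shlex

# Evaluation set -> training run whose checkpoint it resumes from.
# Read off the commands above: both TREC DL sets reuse the MS MARCO checkpoint.
EVAL_RUNS = {
    "NQ_DPR-Multi": "NQ_DPR-Multi",
    "MSMARCO_ANCE": "MSMARCO_ANCE",
    "TRECDL2019_ANCE": "MSMARCO_ANCE",
    "TRECDL2020_ANCE": "MSMARCO_ANCE",
}


def build_eval_command(eval_set, train_run):
    """Return the argument list for one evaluation run (hypothetical helper)."""
    return [
        "python", "main.py",
        "--exp_name", f"test_{eval_set}",
        "--data_path", f"data/{eval_set}",
        "--resume", f"experiments/{train_run}/best-model.pth",
        "--only_eval",
    ]


def build_all_eval_commands():
    """Return one argument list per entry in EVAL_RUNS, in table order."""
    return [build_eval_command(e, t) for e, t in EVAL_RUNS.items()]


if __name__ == "__main__":
    for argv in build_all_eval_commands():
        # Print each command; to execute it instead, one could call
        # subprocess.run(argv, check=True) from the repository root.
        print(shlex.join(argv))
```

Building each command as an argument list rather than a single string avoids shell quoting issues if a path ever contains spaces.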

Help Information

Display the available command-line options with:

python main.py -h

Reference

Please cite the following paper if HybRank is helpful for your research:

@inproceedings{zhang-etal-2023-hybrid,
    title = "Hybrid and Collaborative Passage Reranking",
    author = "Zhang, Zongmeng  and
      Zhou, Wengang  and
      Shi, Jiaxin  and
      Li, Houqiang",
    booktitle = "Findings of the Association for Computational Linguistics: ACL 2023",
    month = jul,
    year = "2023",
    address = "Toronto, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.findings-acl.880",
    doi = "10.18653/v1/2023.findings-acl.880",
    pages = "14003--14021",
}
