We build on the code provided by Google DeepMind to evaluate XTR. This evaluation serves as the baseline for the highly optimized XTR/WARP retrieval engine.
xtr-eval requires Python 3.8+, PyTorch 1.9+, and TensorFlow 2.8.2, and uses the Hugging Face Transformers library.
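The requirements above can be checked before running anything. The following is a minimal sketch, not part of xtr-eval: the helper names `version_tuple` and `check_environment` are hypothetical, and it assumes the packages are installed under the distribution names `torch`, `tensorflow`, and `transformers`.

```python
import re
import sys
from importlib import metadata


def version_tuple(version):
    """Turn '1.9.0+cu111' into (1, 9, 0), ignoring local or pre-release suffixes."""
    parts = []
    for piece in version.split("+")[0].split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def check_environment():
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info < (3, 8):
        problems.append("Python 3.8+ is required")

    # Assumed distribution names; adjust if your install uses different ones.
    installed = {}
    for dist in ("torch", "tensorflow", "transformers"):
        try:
            installed[dist] = version_tuple(metadata.version(dist))
        except metadata.PackageNotFoundError:
            problems.append(f"{dist} is not installed")

    if "torch" in installed and installed["torch"] < (1, 9):
        problems.append("PyTorch 1.9+ is required")
    if "tensorflow" in installed and installed["tensorflow"] != (2, 8, 2):
        problems.append("TensorFlow 2.8.2 is the stated version")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```

Creating the environment from environment.yml should make all checks pass; the script is mainly useful when reusing an existing environment.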
We evaluate XTR using the XTR_base checkpoint provided on Hugging Face.
It is strongly recommended to create a conda environment using the commands below. We include the corresponding environment file (environment.yml).
conda activate xtr-eval
source ./scripts/build_indexes.sh
To construct indexes and perform retrieval, define the following values in a config.yml file in the repository root:
BEIR_COLLECTION_PATH: "..."
LOTTE_COLLECTION_PATH: "..."
- BEIR_COLLECTION_PATH: Designates the path to the datasets of the BEIR Benchmark.
- LOTTE_COLLECTION_PATH: Specifies the path to the LoTTE dataset.
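To catch a missing or empty entry early, the configuration can be validated before building indexes. This is a minimal sketch under stated assumptions: it is not how xtr-eval itself reads config.yml, the helper names are hypothetical, and it only understands flat `KEY: "value"` lines (no nesting, no inline comments), which is all the two keys above need.

```python
from pathlib import Path

REQUIRED_KEYS = ("BEIR_COLLECTION_PATH", "LOTTE_COLLECTION_PATH")


def read_flat_config(text):
    """Parse flat `KEY: "value"` lines into a dict (hypothetical helper)."""
    config = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        # Split on the first colon only, so values may contain colons.
        key, _, value = line.partition(":")
        config[key.strip()] = value.strip().strip("\"'")
    return config


def missing_keys(config):
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not config.get(key)]


if __name__ == "__main__":
    config_file = Path("config.yml")  # expected in the repository root
    if not config_file.exists():
        print("config.yml not found")
    else:
        config = read_flat_config(config_file.read_text())
        for key in missing_keys(config):
            print(f"config.yml is missing a value for {key}")
        for key in REQUIRED_KEYS:
            if config.get(key) and not Path(config[key]).is_dir():
                print(f"{key} does not point to an existing directory")
```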
To download and extract a dataset from the BEIR Benchmark, use the extract_collection.py script provided in XTR/WARP:
python utility/extract_collection.py -d ${dataset} -i "${BEIR_COLLECTION_PATH}" -s test
Replace ${dataset} with the desired dataset name as specified here.
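When several BEIR datasets are needed, the same command can be generated in a loop. The sketch below only builds and prints the command lines, using the `-d`, `-i`, and `-s` flags shown above. The function name `build_extract_command`, the dataset names, and the collection path are illustrative assumptions; substitute the names from the dataset list and your own BEIR_COLLECTION_PATH.

```python
import shlex
import sys


def build_extract_command(dataset, collection_path, split="test"):
    """Build the argument list for utility/extract_collection.py (hypothetical helper)."""
    return [
        sys.executable, "utility/extract_collection.py",
        "-d", dataset,
        "-i", collection_path,
        "-s", split,
    ]


# Illustrative values only; these are assumptions, not defaults of the script.
example_datasets = ("nfcorpus", "scifact")
example_collection_path = "/path/to/beir"

for dataset in example_datasets:
    command = build_extract_command(dataset, example_collection_path)
    print(shlex.join(command))
    # To actually run it from the XTR/WARP repository root:
    # import subprocess; subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting problems if the collection path contains spaces.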
- Download the LoTTE dataset files from here.
- Extract the files manually to the directory specified in LOTTE_COLLECTION_PATH.