This is the official code repository for paper Document Screenshot Retrievers are Vulnerable to Pixel Poisoning Attacks, Shengyao Zhuang, Ekaterina Khramtsova, Xueguang Ma, Bevan Koopman, Jimmy Lin and Guido Zuccon.
- 2025-04-05: Our paper has been accepted at SIGIR2025!
conda create -n dsr-poisoning python=3.10
conda activate dsr-poisoning
pip install torch torchvision
pip install vidore-benchmark
pip install transformers==4.48.2
pip install qwen-vl-utilsWe provided our data (target queries, seed images) used in the paper in the data directory. Checkout the README in the directory for more details.
In this example we use Trump's wikipedia page as the seed image and the target queries are Wiki-SS-NQ training queries. That is, optimize the Trump's wikipedia page to be retrieved by all training queries.
model=dse # or colpali
CUDA_VISIBLE_DEVICES=0 python dsr_attacks.py \
--model $model \
--seed_image ./trump_wiki.png \
--target_query_file ./data/nq_train_queries/queries_kmeans_dse_1.json \
--save_dir ./trump_$model_adv_images \
--num_steps 3000 \
--optimization_method directHere, the screenshot retriever is DSE model, change --model dse to --model colpali for ColPali retrieval model.
This attack config is Direct Optimisation with 100% gradient update.
For Direct Optimisation with less gradient update, for example 50%, set:
--optimization_method direct
--grad_ratio 0.5For Noise Optimisation with less gradient update, for example 50%, set:
--optimization_method noise
--grad_ratio 0.5For Direct Mask Optimisation with a mask margin ratio, for example 20%, set:
--optimization_method direct
--mask_ratio 0.2Evaluate on Wiki-SS-Corpus dataset
Following tevatron DSE example to encode wiki-ss-corpus to get query and document screenshot embeddings.
CUDA_VISIBLE_DEVICES=0 python eval_nq_dse.py \
--adv_image_dir ./trump_dse_adv_images \
--corpus_embeddings_dir path_to_embedding_dir
# {'success_1': 0.11966759002770083, 'success_5': 0.32797783933518004, 'success_10': 0.42077562326869805, 'success_100': 0.7030470914127424, 'mrr_100': 0.21772750989553608}Following tevatron ColPali example to encode wiki-ss-corpus to get query and document screenshot embeddings. Note, for ColPali we need to run file from tevatron search results.
CUDA_VISIBLE_DEVICES=0 python eval_nq_colpali.py \
--adv_image_dir ./trump_colpali_adv_images \
--corpus_embeddings_dir path_to_embedding_dir \
--run_file path_to_run_file
# {'success_1': 0.09362880886426593, 'success_5': 0.19667590027700832, 'success_10': 0.25096952908587256, 'success_100': 0.521606648199446, 'mrr_100': 0.15225463089595362}Since VIDORE datasets are small, we can encode the datasets for both DSE and ColPali models on-the-fly.
Set --model to dse or colpali to evaluate on DSE or ColPali model.
CUDA_VISIBLE_DEVICES=0 python eval_vidore.py \
--model dse \
--adv_image_dir ./trump_dse_adv_images