Official implementation of Uncertainty-aware Sign Language Video Retrieval with Probability Distribution Modeling
Illustration of: (a) Uncertainty. (b) Previous methods. (c) Our method.
Sign language video retrieval is crucial for helping the hearing-impaired community to access information. Although significant progress has been made in the field of video-text retrieval, the complexity and inherent uncertainty of sign language make it difficult to directly apply these technologies. Previous methods have attempted to map sign language videos to text through fine-grained modality alignment. However, due to the scarcity of fine-grained annotations, the uncertainty in sign language videos has been underestimated, which has limited the further development of sign language retrieval tasks.
To address this challenge, we propose the Uncertainty-aware Probability Distribution Retrieval (UPRet) method. This method treats the mapping process between sign language videos and text as a matching of probability distributions. It explores their potential relationships through dynamic semantic alignment, achieving flexible mapping. We model sign language videos and text using multivariate Gaussian distributions, allowing us to explore their correspondences in a broader semantic space. This approach more accurately captures the uncertainty and polysemy of sign language. Through Monte Carlo sampling, we thoroughly explore the structure and associations of the distributions and employ Optimal Transport to achieve fine-grained cross-modal alignment.
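
The pipeline described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the code in this repository: it assumes diagonal-covariance Gaussians, cosine-distance cost, uniform marginals, and a plain Sinkhorn solver for the Optimal Transport step. The function names (`sample_gaussian`, `sinkhorn`, `ot_similarity`) and all hyperparameters (`reg`, `n_iters`, sample counts) are hypothetical.

```python
import numpy as np

def sample_gaussian(mu, log_var, n_samples, rng):
    """Monte Carlo samples from N(mu, diag(exp(log_var))) via reparameterization."""
    std = np.exp(0.5 * log_var)
    eps = rng.standard_normal((n_samples, mu.shape[0]))
    return mu + eps * std

def sinkhorn(cost, reg=0.1, n_iters=200):
    """Entropy-regularized transport plan between two uniform marginals."""
    n, m = cost.shape
    a = np.full(n, 1.0 / n)          # uniform mass on video samples (assumption)
    b = np.full(m, 1.0 / m)          # uniform mass on text samples (assumption)
    K = np.exp(-cost / reg)          # Gibbs kernel
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iters):
        u = a / (K @ v)              # rescale rows toward marginal a
        v = b / (K.T @ u)            # rescale columns toward marginal b
    return u[:, None] * K * v[None, :]

def ot_similarity(video_samples, text_samples, reg=0.1):
    """Similarity of two sample sets: cosine similarity weighted by the OT plan."""
    v = video_samples / np.linalg.norm(video_samples, axis=1, keepdims=True)
    t = text_samples / np.linalg.norm(text_samples, axis=1, keepdims=True)
    sim = v @ t.T                    # pairwise cosine similarity
    plan = sinkhorn(1.0 - sim, reg=reg)
    return float((plan * sim).sum())

# Toy example: random means and a fixed log-variance stand in for encoder outputs.
rng = np.random.default_rng(0)
dim = 8
video_mu, video_log_var = rng.standard_normal(dim), np.full(dim, -2.0)
text_mu, text_log_var = rng.standard_normal(dim), np.full(dim, -2.0)
video_samples = sample_gaussian(video_mu, video_log_var, 5, rng)
text_samples = sample_gaussian(text_mu, text_log_var, 7, rng)
score = ot_similarity(video_samples, text_samples)
print(score)
```

In a trained model the means and log-variances would come from the video and text encoders, and the resulting score would feed a contrastive retrieval loss; both of those parts are omitted here.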
| Dataset | T2V R@1 | T2V R@5 | T2V R@10 | T2V MedR | T2V MnR | V2T R@1 | V2T R@5 | V2T R@10 | V2T MedR | V2T MnR |
|---|---|---|---|---|---|---|---|---|---|---|
| How2Sign | 59.1 | 71.5 | 75.7 | 1.0 | 54.4 | 53.4 | 65.4 | 70.0 | 1.0 | 76.4 |
| PHOENIX2014T | 72.0 | 89.1 | 94.1 | 1.0 | 4.4 | 72.0 | 89.4 | 93.3 | 1.0 | 4.6 |
| CSL-Daily | 78.4 | 89.1 | 92.0 | 1.0 | 6.7 | 77.0 | 89.2 | 92.7 | 1.0 | 5.5 |
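
For readers unfamiliar with the columns above, the sketch below shows one common way to compute R@K, median rank (MedR), and mean rank (MnR) from a query-by-candidate similarity matrix. It assumes a one-to-one ground truth on the diagonal (query `i` matches candidate `i`); the function name `retrieval_metrics` is hypothetical and this is not necessarily the evaluation code used in this repository.

```python
import numpy as np

def retrieval_metrics(sim):
    """sim[i, j] is the similarity of query i and candidate j; the match is j == i."""
    order = np.argsort(-sim, axis=1)                 # candidates sorted best-first
    targets = np.arange(sim.shape[0])[:, None]
    ranks = np.argmax(order == targets, axis=1) + 1  # 1-based rank of the true match
    return {
        "R@1": 100.0 * float(np.mean(ranks <= 1)),
        "R@5": 100.0 * float(np.mean(ranks <= 5)),
        "R@10": 100.0 * float(np.mean(ranks <= 10)),
        "MedR": float(np.median(ranks)),
        "MnR": float(np.mean(ranks)),
    }

# Toy text-to-video similarity matrix (rows: text queries, columns: videos).
sim = np.array([
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.1],
    [0.1, 0.2, 0.3],
])
t2v = retrieval_metrics(sim)      # text-to-video
v2t = retrieval_metrics(sim.T)    # video-to-text uses the transposed matrix
print(t2v)
print(v2t)
```

Higher R@K is better, while lower MedR and MnR are better.
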
conda create --name yourEnv python=3.7
conda activate yourEnv
conda install --yes -c pytorch pytorch=1.7.1 torchvision cudatoolkit=11.0
pip install ftfy regex tqdm
pip install opencv-python boto3 requests pandas
pip install -r requirements.txt
cd CLCL
python -m torch.distributed.launch --nproc_per_node=4 main_task_retrieval.py --do_train
@inproceedings{wu2024uncertainty,
title={Uncertainty-aware sign language video retrieval with probability distribution modeling},
author={Wu, Xuan and Li, Hongxiang and Luo, Yuanjiang and Cheng, Xuxin and Zhuang, Xianwei and Cao, Meng and Fu, Keren},
year={2024},
booktitle={European Conference on Computer Vision},
}
This code is based on CiCo.
