xua222/UPRet

UPRet

Official implementation of the ECCV 2024 paper "Uncertainty-aware Sign Language Video Retrieval with Probability Distribution Modeling".

Introduction

Figure: Illustration of (a) uncertainty, (b) previous methods, and (c) our method.

Sign language video retrieval is crucial for helping the hearing-impaired community to access information. Although significant progress has been made in the field of video-text retrieval, the complexity and inherent uncertainty of sign language make it difficult to directly apply these technologies. Previous methods have attempted to map sign language videos to text through fine-grained modality alignment. However, due to the scarcity of fine-grained annotations, the uncertainty in sign language videos has been underestimated, which has limited the further development of sign language retrieval tasks.


Figure: Framework overview.

To address this challenge, we propose the Uncertainty-aware Probability Distribution Retrieval (UPRet) method. This method treats the mapping process between sign language videos and text as a matching of probability distributions. It explores their potential relationships through dynamic semantic alignment, achieving flexible mapping. We model sign language videos and text using multivariate Gaussian distributions, allowing us to explore their correspondences in a broader semantic space. This approach more accurately captures the uncertainty and polysemy of sign language. Through Monte Carlo sampling, we thoroughly explore the structure and associations of the distributions and employ Optimal Transport to achieve fine-grained cross-modal alignment.
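
The pipeline described above (Gaussian modeling, Monte Carlo sampling, Optimal Transport) can be illustrated with a small, dependency-free sketch. This is not the code in this repository. It assumes diagonal Gaussians parameterized by a mean and a log-variance, a squared Euclidean cost, uniform sample weights, and Sinkhorn iterations as the Optimal Transport solver. All function names (`sample_gaussian`, `cost_matrix`, `sinkhorn`, `ot_distance`) are hypothetical and chosen only for illustration; an actual implementation would more likely operate on batched PyTorch tensors.

```python
import math
import random


def sample_gaussian(mean, log_var, n_samples, rng):
    """Draw samples from a diagonal Gaussian via the reparameterization trick."""
    std = [math.exp(0.5 * lv) for lv in log_var]
    return [
        [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mean, std)]
        for _ in range(n_samples)
    ]


def cost_matrix(xs, ys):
    """Squared Euclidean distance between every pair of samples (assumed cost)."""
    return [[sum((a - b) ** 2 for a, b in zip(x, y)) for y in ys] for x in xs]


def sinkhorn(cost, epsilon=1.0, n_iters=200):
    """Entropy-regularized transport plan between two uniform marginals."""
    n, m = len(cost), len(cost[0])
    a, b = [1.0 / n] * n, [1.0 / m] * m
    # Gibbs kernel: cheap pairs receive large weights.
    K = [[math.exp(-c / epsilon) for c in row] for row in cost]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the marginals.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


def ot_distance(video, text, n_samples=8, seed=0):
    """Transport cost between samples of a video and a text distribution.

    `video` and `text` are (mean, log_var) pairs; lower means a better match.
    """
    rng = random.Random(seed)
    xs = sample_gaussian(video[0], video[1], n_samples, rng)
    ys = sample_gaussian(text[0], text[1], n_samples, rng)
    cost = cost_matrix(xs, ys)
    plan = sinkhorn(cost)
    return sum(
        plan[i][j] * cost[i][j]
        for i in range(len(xs))
        for j in range(len(ys))
    )


if __name__ == "__main__":
    video = ([0.0, 0.0], [-2.0, -2.0])
    matching_text = ([0.1, 0.1], [-2.0, -2.0])
    unrelated_text = ([3.0, 3.0], [-2.0, -2.0])
    print(ot_distance(video, matching_text))
    print(ot_distance(video, unrelated_text))
```

In this sketch the transport plan acts as a soft, fine-grained alignment between individual samples of the two distributions, and the resulting transport cost serves as a dissimilarity score for retrieval.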

Performance

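| Dataset | T2V R@1 | T2V R@5 | T2V R@10 | T2V MedR | T2V MnR | V2T R@1 | V2T R@5 | V2T R@10 | V2T MedR | V2T MnR |
|---|---|---|---|---|---|---|---|---|---|---|
| How2Sign | 59.1 | 71.5 | 75.7 | 1.0 | 54.4 | 53.4 | 65.4 | 70.0 | 1.0 | 76.4 |
| PHOENIX2014T | 72.0 | 89.1 | 94.1 | 1.0 | 4.4 | 72.0 | 89.4 | 93.3 | 1.0 | 4.6 |
| CSL-Daily | 78.4 | 89.1 | 92.0 | 1.0 | 6.7 | 77.0 | 89.2 | 92.7 | 1.0 | 5.5 |
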
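
For readers unfamiliar with the column names, the sketch below shows how such retrieval metrics are commonly computed from a query-by-candidate similarity matrix. It assumes the usual conventions, which this README does not spell out: R@K is the percentage of queries whose ground-truth item ranks in the top K, MedR is the median rank, MnR is the mean rank, and the ground truth for query `i` is candidate `i`. The function name `retrieval_metrics` is hypothetical and is not taken from this repository.

```python
import statistics


def retrieval_metrics(sim):
    """Compute R@1/5/10, MedR and MnR from a square similarity matrix.

    sim[i][j] is the similarity of query i to candidate j; the ground-truth
    match for query i is assumed to be candidate i.
    """
    ranks = []
    for i, row in enumerate(sim):
        # Rank is 1 plus the number of candidates scored above the ground truth.
        ranks.append(1 + sum(1 for s in row if s > row[i]))
    n = len(ranks)
    metrics = {f"R@{k}": 100.0 * sum(r <= k for r in ranks) / n for k in (1, 5, 10)}
    metrics["MedR"] = statistics.median(ranks)
    metrics["MnR"] = sum(ranks) / n
    return metrics


if __name__ == "__main__":
    sim = [
        [0.9, 0.1, 0.2],
        [0.8, 0.3, 0.1],
        [0.2, 0.1, 0.7],
    ]
    print(retrieval_metrics(sim))
```

Under these conventions, text-to-video and video-to-text scores would come from the same matrix and its transpose, respectively.
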
Environment
conda create --name yourEnv python=3.7
conda activate yourEnv
conda install --yes -c pytorch pytorch=1.7.1 torchvision cudatoolkit=11.0
pip install ftfy regex tqdm
pip install opencv-python boto3 requests pandas
pip install -r requirements.txt

Training

cd CLCL
python -m torch.distributed.launch --nproc_per_node=4 main_task_retrieval.py --do_train

Citations

@inproceedings{wu2024uncertainty,
      title={Uncertainty-aware sign language video retrieval with probability distribution modeling}, 
      author={Wu, Xuan and Li, Hongxiang and Luo, Yuanjiang and Cheng, Xuxin and Zhuang, Xianwei and Cao, Meng and Fu, Keren},
      year={2024},
      booktitle={European Conference on Computer Vision},
}

Acknowledgment

This code is based on CiCo.
