UPRet

Official implementation of "Uncertainty-aware Sign Language Video Retrieval with Probability Distribution Modeling" (ECCV 2024).

Introduction

Figure: Illustration of (a) uncertainty, (b) previous methods, and (c) our method.

Sign language video retrieval is crucial for helping the hearing-impaired community access information. Although video-text retrieval has made significant progress, the complexity and inherent uncertainty of sign language make it difficult to apply these techniques directly. Previous methods map sign language videos to text through fine-grained modality alignment. However, because fine-grained annotations are scarce, these methods underestimate the uncertainty in sign language videos, which has limited further progress on sign language retrieval.

Framework overview.

To address this challenge, we propose the Uncertainty-aware Probability Distribution Retrieval (UPRet) method. This method treats the mapping process between sign language videos and text as a matching of probability distributions. It explores their potential relationships through dynamic semantic alignment, achieving flexible mapping. We model sign language videos and text using multivariate Gaussian distributions, allowing us to explore their correspondences in a broader semantic space. This approach more accurately captures the uncertainty and polysemy of sign language. Through Monte Carlo sampling, we thoroughly explore the structure and associations of the distributions and employ Optimal Transport to achieve fine-grained cross-modal alignment.
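The pipeline described above (Gaussian modeling, Monte Carlo sampling, Optimal Transport alignment) can be illustrated with a minimal NumPy sketch. This is not the code in this repository: the function names, the diagonal-covariance assumption, the cosine cost, the uniform marginals, and the Sinkhorn solver with its `epsilon` and `num_iters` values are all assumptions chosen for illustration.

```python
import numpy as np


def sample_gaussian(mean, log_var, num_samples, rng):
    """Draw samples from a diagonal Gaussian via the reparameterization trick."""
    std = np.exp(0.5 * log_var)
    eps = rng.standard_normal((num_samples, mean.shape[0]))
    return mean + eps * std  # shape: (num_samples, dim)


def sinkhorn(cost, epsilon=0.1, num_iters=200):
    """Entropy-regularized optimal transport with uniform marginals (assumed)."""
    n, m = cost.shape
    a = np.full(n, 1.0 / n)  # mass on each video sample
    b = np.full(m, 1.0 / m)  # mass on each text sample
    kernel = np.exp(-cost / epsilon)
    u = np.ones(n)
    v = np.ones(m)
    for _ in range(num_iters):
        u = a / (kernel @ v)
        v = b / (kernel.T @ u)
    return u[:, None] * kernel * v[None, :]  # transport plan, shape (n, m)


def distribution_similarity(video, text, num_samples=8, seed=0):
    """Score a (mean, log_var) video distribution against a text distribution."""
    rng = np.random.default_rng(seed)
    v = sample_gaussian(*video, num_samples, rng)
    t = sample_gaussian(*text, num_samples, rng)
    # Cosine similarity between every pair of sampled embeddings.
    v /= np.linalg.norm(v, axis=1, keepdims=True)
    t /= np.linalg.norm(t, axis=1, keepdims=True)
    sim = v @ t.T
    # The transport plan weights sample pairs; cost is cosine distance.
    plan = sinkhorn(1.0 - sim)
    return float((plan * sim).sum())
```

Calling `distribution_similarity((mu_v, log_var_v), (mu_t, log_var_t))` returns a score in [-1, 1]. In a trained model the means and log-variances would be predicted by the video and text encoders; here they are plain arrays supplied by the caller.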

Performance

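Retrieval results on three datasets. T2V: text-to-video; V2T: video-to-text. R@K is recall at rank K in % (higher is better); MedR and MnR are the median and mean rank (lower is better).

Dataset	T2V R@1	T2V R@5	T2V R@10	T2V MedR	T2V MnR	V2T R@1	V2T R@5	V2T R@10	V2T MedR	V2T MnR
How2Sign	59.1	71.5	75.7	1.0	54.4	53.4	65.4	70.0	1.0	76.4
PHOENIX2014T	72.0	89.1	94.1	1.0	4.4	72.0	89.4	93.3	1.0	4.6
CSL-Daily	78.4	89.1	92.0	1.0	6.7	77.0	89.2	92.7	1.0	5.5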
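For readers unfamiliar with these metrics, the sketch below shows one common way to compute R@K, MedR, and MnR from a similarity matrix. It assumes a square matrix in which query i's ground-truth match is candidate i; the function name `retrieval_metrics` is hypothetical, and the repository's own evaluation code may differ in details such as tie handling.

```python
import numpy as np


def retrieval_metrics(sim):
    """Compute R@K, MedR and MnR from a (num_queries, num_candidates) matrix.

    Assumes the ground-truth candidate for query i is candidate i.
    """
    order = np.argsort(-sim, axis=1)             # candidates sorted best-first
    gt = np.arange(sim.shape[0])[:, None]        # ground-truth index per query
    ranks = np.argmax(order == gt, axis=1) + 1   # 1-based rank of the ground truth
    return {
        "R@1": 100.0 * float(np.mean(ranks <= 1)),
        "R@5": 100.0 * float(np.mean(ranks <= 5)),
        "R@10": 100.0 * float(np.mean(ranks <= 10)),
        "MedR": float(np.median(ranks)),
        "MnR": float(np.mean(ranks)),
    }
```

With rows as text queries and columns as videos, `retrieval_metrics(sim)` gives the T2V numbers and `retrieval_metrics(sim.T)` gives the V2T numbers.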

Environment

conda create --name yourEnv python=3.7
conda activate yourEnv
conda install --yes -c pytorch pytorch=1.7.1 torchvision cudatoolkit=11.0
pip install ftfy regex tqdm
pip install opencv-python boto3 requests pandas
pip install -r requirements.txt

Training

cd CLCL
python -m torch.distributed.launch --nproc_per_node=4 main_task_retrieval.py --do_train

Citations

@inproceedings{wu2024uncertainty,
      title={Uncertainty-aware sign language video retrieval with probability distribution modeling}, 
      author={Wu, Xuan and Li, Hongxiang and Luo, Yuanjiang and Cheng, Xuxin and Zhuang, Xianwei and Cao, Meng and Fu, Keren},
      year={2024},
      booktitle={European Conference on Computer Vision},
}

Acknowledgment

This code is based on CiCo.

