Official implementation for the ICCV 2025 paper, Signs as Tokens: A Retrieval-Enhanced Multilingual Sign Language Generator.
Sign language is a visual language that encompasses all linguistic features of natural languages and serves as the primary communication method for the deaf and hard-of-hearing communities. Although many studies have successfully adapted pretrained language models (LMs) for sign language translation (sign-to-text), the reverse task, sign language generation (SLG, text-to-sign), remains largely unexplored. In this work, we introduce a multilingual sign language model, Signs as Tokens (SOKE), which can generate 3D sign avatars autoregressively from text inputs using a pretrained LM. To align sign language with the LM, we leverage a decoupled tokenizer that discretizes continuous signs into token sequences representing various body parts. During decoding, unlike existing approaches that flatten all part-wise tokens into a single sequence and predict one token at a time, we propose a multi-head decoding method capable of predicting multiple tokens simultaneously. This approach improves inference efficiency while maintaining effective information fusion across different body parts. To further ease the generation process, we propose a retrieval-enhanced SLG approach, which incorporates external sign dictionaries to provide accurate word-level signs as auxiliary conditions, significantly improving the precision of generated signs.
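The core idea of multi-head decoding can be illustrated with a minimal sketch: one shared decoder hidden state is projected by a separate output head per body part, so a single autoregressive step emits a token for every part at once instead of one token of a flattened sequence. The part split, dimensions, and head implementation below are illustrative assumptions, not the repository's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
HIDDEN, VOCAB = 16, 32
PARTS = ["body", "left_hand", "right_hand", "face"]  # assumed part split

# One linear output head (a weight matrix here) per body part,
# all reading the same shared hidden state.
heads = {p: rng.standard_normal((HIDDEN, VOCAB)) for p in PARTS}

def decode_step(hidden):
    """Predict one token per body part from a single shared hidden state."""
    return {p: int(np.argmax(hidden @ W)) for p, W in heads.items()}

hidden = rng.standard_normal(HIDDEN)  # stand-in for the LM's hidden state
tokens = decode_step(hidden)
# One step now yields len(PARTS) tokens, so generating a sequence takes
# roughly 1/len(PARTS) as many autoregressive steps as flat decoding.
```

Because every head conditions on the same hidden state, information is still fused across body parts even though their tokens are predicted in parallel.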
Please run:
conda create python=3.10 --name soke
conda activate soke
pip install -r requirements.txt
How2Sign: raw videos (Green Screen RGB clips, frontal view) and split files.
CSL-Daily: raw videos and split files.
Phoenix-2014T: raw videos and split files.
SMPL-X Poses can be downloaded from the project homepage.
Please download human models (mano, smpl, smplh, and smplx) from here and unzip them into deps/smpl_models.
Download t2m evaluators via sh prepare/download_t2m_evaluators.sh.
Download T5 models via sh prepare/prepare_t5.sh. This avoids errors caused by the default config.
We use mBART-large-cc25, which can be downloaded here. Put the files into deps/mbart-h2s-csl-phoenix.
python -m train --cfg configs/deto.yaml --nodebug
python -m test --cfg configs/deto.yaml --nodebug
We also provide the mean and the std of the SMPL-X poses. The checkpoint of the tokenizer is available here.
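The released mean and std are typically used to standardize SMPL-X pose vectors before tokenization and to invert that transform on generated outputs. The following is a minimal sketch of that round trip; the array values stand in for the released statistics (e.g., loaded via np.load), and the file layout is an assumption, not the repository's actual format.

```python
import numpy as np

# Stand-ins for the released statistics, e.g.
# mean = np.load("mean.npy"); std = np.load("std.npy")
mean = np.zeros(3)
std = np.array([1.0, 2.0, 4.0])

def normalize(pose):
    """Standardize a pose vector before feeding it to the tokenizer."""
    return (pose - mean) / std

def denormalize(pose_norm):
    """Map a generated (standardized) pose back to SMPL-X pose space."""
    return pose_norm * std + mean

pose = np.array([0.5, -2.0, 8.0])
assert np.allclose(denormalize(normalize(pose)), pose)  # round-trip check
```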
python -m get_motion_code --cfg configs/soke.yaml --nodebug
python -m train --cfg configs/soke.yaml --nodebug # Note: update the path of the tokenizer's checkpoint first.
python -m test --cfg configs/soke.yaml --task t2m # Set SAVE_PREDICTIONS to True in the config file to save the predictions.
Simple mesh visualizations can be generated by running
python -m vis_mesh --cfg=configs/soke.yaml --demo_dataset=csl
For colorful visualizations, please refer to the configurations of BlenderToolbox, and run
python vis_blender.py
Our code is built upon the open-source implementations of MotionGPT, ProgressiveTransformer, WiLoR, and OSX; we sincerely thank the authors of these works.
Please contact r.zuo@imperial.ac.uk for further questions.
@inproceedings{zuo2025soke,
title={Signs as Tokens: A Retrieval-Enhanced Multilingual Sign Language Generator},
author={Zuo, Ronglai and Potamias, Rolandos Alexandros and Ververas, Evangelos and Deng, Jiankang and Zafeiriou, Stefanos},
booktitle={ICCV},
year={2025}
}