This directory contains the training recipe for real-time audio, visual, and audio-visual speech recognition (ASR, VSR, AV-ASR) models, which is an extension of Auto-AVSR.
- Install PyTorch (`torch`, `torchvision`, `torchaudio`) along with the other required packages:

  ```bash
  pip install torch torchvision torchaudio pytorch-lightning sentencepiece
  ```

- Preprocess LRS3. See the instructions in the `data_prep` folder.
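As a quick sanity check that the installation succeeded, the following minimal Python snippet (not part of the recipe) imports each required package:

```python
# Sanity check: all required packages import cleanly.
import torch
import torchvision
import torchaudio
import pytorch_lightning
import sentencepiece

print(f"torch={torch.__version__}, torchaudio={torchaudio.__version__}")
```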
To train a model, run:

```bash
python train.py --exp-dir=[exp_dir] \
                --exp-name=[exp_name] \
                --modality=[modality] \
                --mode=[mode] \
                --root-dir=[root-dir] \
                --sp-model-path=[sp_model_path] \
                --num-nodes=[num_nodes] \
                --gpus=[gpus]
```

- `exp-dir` and `exp-name`: Directory where the checkpoints will be saved; they are stored under `[exp_dir]/[exp_name]`.
- `modality`: Type of input modality. Valid values: `video`, `audio`, and `audiovisual`.
- `mode`: Recognition mode. Valid values: `online` and `offline`.
- `root-dir`: Path to the root directory where all preprocessed files are stored.
- `sp-model-path`: Path to the SentencePiece model. Default: `./spm_unigram_1023.model`, which can be produced using `train_spm.py`.
- `num-nodes`: Number of machines used. Default: 4.
- `gpus`: Number of GPUs per machine. Default: 8.
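As an illustration, a single-node run might look like the following. The paths and experiment name are placeholders, not files shipped with the recipe; note that the defaults above assume a 4-node, 8-GPU setup:

```bash
python train.py --exp-dir=./exp \
                --exp-name=avsr_online \
                --modality=audiovisual \
                --mode=online \
                --root-dir=/data/lrs3 \
                --sp-model-path=./spm_unigram_1023.model \
                --num-nodes=1 \
                --gpus=8
```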
To evaluate a trained model, run:

```bash
python eval.py --modality=[modality] \
               --mode=[mode] \
               --root-dir=[dataset_path] \
               --sp-model-path=[sp_model_path] \
               --checkpoint-path=[checkpoint_path]
```

- `modality`: Type of input modality. Valid values: `video`, `audio`, and `audiovisual`.
- `mode`: Recognition mode. Valid values: `online` and `offline`.
- `root-dir`: Path to the root directory where all preprocessed files are stored.
- `sp-model-path`: Path to the SentencePiece model. Default: `./spm_unigram_1023.model`.
- `checkpoint-path`: Path to a pre-trained model checkpoint.
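For example, to evaluate the hypothetical training run above in streaming mode (the checkpoint path is a placeholder and depends on where your run saved its checkpoints):

```bash
python eval.py --modality=audiovisual \
               --mode=online \
               --root-dir=/data/lrs3 \
               --sp-model-path=./spm_unigram_1023.model \
               --checkpoint-path=./exp/avsr_online/model.ckpt
```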
The table below reports the word error rate (WER) of AV-ASR models trained from scratch, measured with offline evaluation.
| Model | Training dataset (hours) | WER [%] | Params (M) |
|---|---|---|---|
| Non-streaming models |  |  |  |
| AV-ASR | LRS3 (438) | 3.9 | 50 |
| Streaming models |  |  |  |
| AV-ASR | LRS3 (438) | 3.9 | 40 |
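For reference, WER is the word-level edit distance between a hypothesis and its reference transcript, divided by the number of reference words. A minimal sketch using `torchaudio.functional.edit_distance`; this helper is illustrative and not part of the recipe:

```python
import torchaudio.functional as F

def word_error_rate(hypothesis: str, reference: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    hyp_words = hypothesis.split()
    ref_words = reference.split()
    return F.edit_distance(hyp_words, ref_words) / len(ref_words)

# One substitution out of three reference words -> WER of 1/3.
print(word_error_rate("set the alarm", "set an alarm"))
```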
