MSPI for AVSP

This project provides the code for 'Audio-Visual Saliency Prediction with Multisensory Perception and Integration', Image and Vision Computing, 2024. Paper link.

Download dataset

You can download the datasets from the original links above or from the STAViS resources.

Download pretrained backbones

SlowFast, X3D and MViTv2, facebookresearch/SlowFast
Uniformer, Sense-X/UniFormer
VideoSwin, SwinTransformer/Video-Swin-Transformer
MorphMLP, MTLab/MorphMLP
S3D, kylemin/S3D
ResNet18-VGGSound, hche11/VGGSound

😊 Thanks to the above researchers for releasing their code and sharing their model weights!

Only the variants used in the paper have been tested. If you use other variants of the above models, you should change the corresponding .yaml files and the settings in config.py.

Requirements

For PySlowFast installation, you can refer to this link, but the result might not be compatible with our code.

If you use the PySlowFast code bundled in this repository, the parts of the model code that connect to Detectron2 have been cut, so you can skip installing Detectron2.

timm==0.6.12
torch==1.11.0
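
As a quick sanity check before training, the two pins above can be compared against the installed packages. The following is a minimal sketch, not part of the repository: the helper name `check_pins` is hypothetical, and treating a local build tag such as `+cu113` as a match is an assumption.

```python
from importlib import metadata

# Pins copied from the requirements list above.
PINNED = {"timm": "0.6.12", "torch": "1.11.0"}


def check_pins(pinned, get_version=metadata.version):
    """Return {package: (expected, found)} for every mismatched or missing package."""
    problems = {}
    for name, expected in pinned.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None
        # Assumption: a local build tag such as "1.11.0+cu113" still counts as a match.
        if found is None or found.split("+")[0] != expected:
            problems[name] = (expected, found)
    return problems


if __name__ == "__main__":
    for name, (expected, found) in check_pins(PINNED).items():
        print(f"{name}: expected {expected}, found {found}")
```

An empty result means both packages match the pinned versions.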

Training

The dataset directory structure should be:

dataset/
    video_frames/
        .../ (one directory per dataset name)
    video_audio/
        .../ (one directory per dataset name)
    annotations/
        .../ (one directory per dataset name)
    fold_lists/
        *.txt (lists of datasets splits)
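
The layout above can be verified before launching a run. This is a minimal sketch under stated assumptions: the helper `missing_entries` is hypothetical, and it checks only the four top-level entries and the presence of at least one .txt split list, not the per-dataset contents.

```python
from pathlib import Path

# Sub-directories named in the layout above.
EXPECTED_DIRS = ("video_frames", "video_audio", "annotations", "fold_lists")


def missing_entries(root):
    """Return a list of human-readable problems found under `root`."""
    root = Path(root)
    problems = [
        f"missing directory: {name}"
        for name in EXPECTED_DIRS
        if not (root / name).is_dir()
    ]
    # fold_lists/ is expected to hold the *.txt split lists.
    fold_dir = root / "fold_lists"
    if fold_dir.is_dir() and not any(fold_dir.glob("*.txt")):
        problems.append("fold_lists contains no .txt split lists")
    return problems
```

For example, `missing_entries("dataset")` returns an empty list when the four directories exist and fold_lists/ contains at least one .txt file.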

Download weight of image saliency encoder

You can download it from this link. The model is first trained on SALICON and then fine-tuned on MIT1003.

Set up config.py for training

Set the paths to the dataset, the pretrained weight files and the YAML files, then select the backbone and any other options. The following settings are crucial:

_model_name
cfg.DATA.ROOT
_MOTION_WEIGHTS
cfg.MODEL.IMAGE_SALIENCY_ENCODER_WEIGHT
cfg.MODEL.AUDIO_ENCODER_WEIGHT 
.PATH_CFG
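
A wrong path in these settings usually surfaces only once training starts, so it can help to check them up front. The sketch below is illustrative only. It covers just the three fully named `cfg` entries from the list, the helper `unresolved_paths` is hypothetical, and `example_cfg` is a stand-in object because the actual structure of config.py is not shown here.

```python
import os
from types import SimpleNamespace

# Dotted names taken from the list above (relative to cfg).
PATH_KEYS = (
    "DATA.ROOT",
    "MODEL.IMAGE_SALIENCY_ENCODER_WEIGHT",
    "MODEL.AUDIO_ENCODER_WEIGHT",
)


def unresolved_paths(cfg, keys=PATH_KEYS):
    """Return the dotted keys whose value is unset or not an existing path."""
    bad = []
    for key in keys:
        node = cfg
        # Walk the attribute chain, e.g. cfg.DATA.ROOT.
        for part in key.split("."):
            node = getattr(node, part, None)
            if node is None:
                break
        if not node or not os.path.exists(str(node)):
            bad.append(key)
    return bad


# Stand-in object for illustration only; the real cfg comes from config.py.
example_cfg = SimpleNamespace(
    DATA=SimpleNamespace(ROOT="."),
    MODEL=SimpleNamespace(IMAGE_SALIENCY_ENCODER_WEIGHT="", AUDIO_ENCODER_WEIGHT=None),
)
print(unresolved_paths(example_cfg))
```

The function relies only on attribute access, so it should work with any config object that exposes its entries as nested attributes.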

Then run the code using

$ python train.py --session_name --split --num_workers --save_ckpt_freq

Testing

Clone this repository and download the three-split weights of our model from this link. Then run the code using

$ python inference.py --weight path/to/weight --path_data path/to/dataset --split split/of/dataset
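
Since the model comes with weights for three splits, the command above is typically run once per split. The following sketch builds those commands using only the three flags shown above. The split identifiers and weight file names in `runs` are hypothetical placeholders, and the helper `inference_command` is not part of the repository.

```python
import shlex
import sys


def inference_command(weight, path_data, split):
    """Build the argv list for one inference.py run, using only the flags shown above."""
    return [
        sys.executable, "inference.py",
        "--weight", str(weight),
        "--path_data", str(path_data),
        "--split", str(split),
    ]


# Hypothetical split identifiers and weight file names; substitute the real ones.
runs = {1: "weights/split1.pth", 2: "weights/split2.pth", 3: "weights/split3.pth"}

for split, weight in runs.items():
    cmd = inference_command(weight, "path/to/dataset", split)
    # Printed for inspection; pass cmd to subprocess.run(cmd, check=True) to execute it.
    print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids quoting problems when paths contain spaces.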

Evaluation

The MATLAB code is used for evaluation.

Citation

If you think this project is helpful, please feel free to cite our paper:

@article{XIE2024104955,
    title = {Audio-visual saliency prediction with multisensory perception and integration},
    journal = {Image and Vision Computing},
    pages = {104955},
    year = {2024},
    issn = {0262-8856},
    doi = {10.1016/j.imavis.2024.104955},
    url = {https://www.sciencedirect.com/science/article/pii/S0262885624000581},
    author = {Jiawei Xie and Zhi Liu and Gongyang Li and Yingjie Song}
}

