Dense Modality Interaction Network for Audio-Visual Event Localization

This repo holds the code for the work presented in IEEE TMM [Paper].

Prerequisites

We provide the implementation in PyTorch for ease of use.

Install the requirements by running the following command:

pip install -r requirements.txt

Code and Data Preparation

We highly appreciate @YapengTian for the shared features and code.

Download Features

Two kinds of features (i.e., visual features and audio features) are required for the experiments.

  • Visual Features: You can download the VGG visual features from here.
  • Audio Features: You can download the VGG-like audio features from here.
  • Additional Features: You can download the features of the background videos here, which are required for the experiments in the weakly-supervised setting.

After downloading the features, please place them in the data folder. The structure of the data folder is as follows:

data
|——audio_features.h5
|——audio_feature_noisy.h5
|——labels.h5
|——labels_noisy.h5
|——mil_labels.h5
|——test_order.h5
|——train_order.h5
|——val_order.h5
|——visual_feature.h5
|——visual_feature_noisy.h5
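Before launching a run, it can be handy to confirm that all of the feature files listed above are actually in place. A minimal sketch (the file names are taken verbatim from the layout above; the helper itself is not part of the repo):

```python
from pathlib import Path

# File names taken from the data-folder layout described above.
EXPECTED_FILES = [
    "audio_features.h5",
    "audio_feature_noisy.h5",
    "labels.h5",
    "labels_noisy.h5",
    "mil_labels.h5",
    "test_order.h5",
    "train_order.h5",
    "val_order.h5",
    "visual_feature.h5",
    "visual_feature_noisy.h5",
]

def missing_feature_files(data_dir="data"):
    """Return the expected feature files that are absent from data_dir."""
    root = Path(data_dir)
    return [name for name in EXPECTED_FILES if not (root / name).exists()]

if __name__ == "__main__":
    missing = missing_feature_files()
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All feature files found.")
```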

Download Datasets (Optional)

You can download the AVE dataset from the repo here.

Training and testing DMIN in a fully-supervised setting

Training

bash supv_train.sh
# The argument "--snapshot_pref" denotes the path for saving checkpoints and code.

Evaluating

bash supv_test.sh

After training, there will be a checkpoint file whose name contains the test-set accuracy and the epoch number.
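The exact filename format is determined by the training script. Purely as an illustration, a helper like the following could recover the epoch and accuracy from a name such as `model_epoch_12_78.40.pth` (a hypothetical pattern, not necessarily the one this repo emits; adjust the regex to match your actual checkpoints):

```python
import re

def parse_checkpoint_name(name):
    """Extract (epoch, accuracy) from a checkpoint filename.

    Assumes a hypothetical pattern like 'model_epoch_12_78.40.pth';
    returns None if the pattern does not match.
    """
    m = re.search(r"epoch_(\d+)_(\d+\.\d+)", name)
    if m is None:
        return None
    return int(m.group(1)), float(m.group(2))

# Example usage on a hypothetical checkpoint name
print(parse_checkpoint_name("model_epoch_12_78.40.pth"))
```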

Training and testing DMIN in a weakly-supervised setting

Training

bash weak_train.sh

Evaluating

bash weak_test.sh

Training and testing DCMR

For this task, we developed a cross-modal matching network. Here, we use visual feature vectors obtained via global average pooling, which you can download here. Please put the feature file into the data folder. Note that this code was implemented with Keras 2.0 using TensorFlow as the backend.
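Global average pooling collapses each spatial feature map to a single vector per segment. A minimal numpy sketch of the operation (the shapes below are illustrative assumptions, not taken from the repo):

```python
import numpy as np

def global_average_pool(features):
    """Average a (T, H, W, C) stack of convolutional feature maps over
    the spatial dimensions, giving one C-dim vector per time segment."""
    return features.mean(axis=(1, 2))

# Example: 10 one-second segments of assumed 7x7x512 VGG maps -> (10, 512)
segments = np.random.rand(10, 7, 7, 512).astype(np.float32)
pooled = global_average_pool(segments)
print(pooled.shape)  # (10, 512)
```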

Training

bash supv_train_a2v.sh
bash supv_train_v2a.sh

Evaluating

bash supv_test_a2v.sh
bash supv_test_v2a.sh

Citation

Please cite the following paper if you find this repo useful for your research:

@ARTICLE{9712233,
  author={Liu, Shuo and Quan, Weize and Wang, Chaoqun and Liu, Yuan and Liu, Bin and Yan, Dong-Ming},
  journal={IEEE Transactions on Multimedia}, 
  title={Dense Modality Interaction Network for Audio-Visual Event Localization}, 
  year={2022},
  volume={},
  number={},
  pages={1-1},
  doi={10.1109/TMM.2022.3150469}}
