Dense Modality Interaction Network for Audio-Visual Event Localization

This repo holds the code for the work presented in IEEE TMM [Paper].

Prerequisites

We provide the implementation in PyTorch for ease of use.

Install the requirements by running the following command:

pip install -r requirements.txt

Code and Data Preparation

We highly appreciate @YapengTian for the shared features and code.

Download Features

Two kinds of features (i.e., visual features and audio features) are required for the experiments.

  • Visual Features: You can download the VGG visual features from here.
  • Audio Features: You can download the VGG-like audio features from here.
  • Additional Features: You can download the features of the background videos here, which are required for the experiments in the weakly-supervised setting.

After downloading the features, please place them in the data folder. The structure of the data folder is as follows:

data
|——audio_features.h5
|——audio_feature_noisy.h5
|——labels.h5
|——labels_noisy.h5
|——mil_labels.h5
|——test_order.h5
|——train_order.h5
|——val_order.h5
|——visual_feature.h5
|——visual_feature_noisy.h5
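Before launching a run, it can be handy to confirm that all of the feature files listed above are actually in place. A minimal sketch (the file names are taken verbatim from the layout above; the helper itself is not part of the repo):

```python
from pathlib import Path

# File names taken from the data-folder layout described above.
EXPECTED_FILES = [
    "audio_features.h5",
    "audio_feature_noisy.h5",
    "labels.h5",
    "labels_noisy.h5",
    "mil_labels.h5",
    "test_order.h5",
    "train_order.h5",
    "val_order.h5",
    "visual_feature.h5",
    "visual_feature_noisy.h5",
]

def missing_feature_files(data_dir="data"):
    """Return the expected feature files that are absent from data_dir."""
    root = Path(data_dir)
    return [name for name in EXPECTED_FILES if not (root / name).exists()]

if __name__ == "__main__":
    missing = missing_feature_files()
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All feature files found.")
```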

Download Datasets (Optional)

You can download the AVE dataset from the repo here.

Training and testing DMIN in a fully-supervised setting

Training

bash supv_train.sh
# The argument "--snapshot_pref" denotes the path for saving checkpoints and code.

Evaluating

bash supv_test.sh

After training, there will be a checkpoint file whose name contains the test-set accuracy and the epoch number.
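The exact filename format is determined by the training script. Purely as an illustration, a helper like the following could recover the epoch and accuracy from a name such as `model_epoch_12_78.40.pth` (a hypothetical pattern, not necessarily the one this repo emits; adjust the regex to match your actual checkpoints):

```python
import re

def parse_checkpoint_name(name):
    """Extract (epoch, accuracy) from a checkpoint filename.

    Assumes a hypothetical pattern like 'model_epoch_12_78.40.pth';
    returns None if the pattern does not match.
    """
    m = re.search(r"epoch_(\d+)_(\d+\.\d+)", name)
    if m is None:
        return None
    return int(m.group(1)), float(m.group(2))

# Example usage on a hypothetical checkpoint name
print(parse_checkpoint_name("model_epoch_12_78.40.pth"))
```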

Training and testing DMIN in a weakly-supervised setting

Training

bash weak_train.sh

Evaluating

bash weak_test.sh

Training and testing DCMR

For this task, we developed a cross-modal matching network. Here, we use visual feature vectors obtained via global average pooling, which you can download here. Please put the feature file into the data folder. Note that this code was implemented with Keras 2.0 using TensorFlow as the backend.
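Global average pooling collapses each spatial feature map to a single vector per segment. A minimal numpy sketch of the operation (the shapes below are illustrative assumptions, not taken from the repo):

```python
import numpy as np

def global_average_pool(features):
    """Average a (T, H, W, C) stack of convolutional feature maps over
    the spatial dimensions, giving one C-dim vector per time segment."""
    return features.mean(axis=(1, 2))

# Example: 10 one-second segments of assumed 7x7x512 VGG maps -> (10, 512)
segments = np.random.rand(10, 7, 7, 512).astype(np.float32)
pooled = global_average_pool(segments)
print(pooled.shape)  # (10, 512)
```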

Training

bash supv_train_a2v.sh
bash supv_train_v2a.sh

Evaluating

bash supv_test_a2v.sh
bash supv_test_v2a.sh

Citation

Please cite the following paper if you find this repo useful for your research:

@ARTICLE{9712233,
  author={Liu, Shuo and Quan, Weize and Wang, Chaoqun and Liu, Yuan and Liu, Bin and Yan, Dong-Ming},
  journal={IEEE Transactions on Multimedia}, 
  title={Dense Modality Interaction Network for Audio-Visual Event Localization}, 
  year={2022},
  volume={},
  number={},
  pages={1-1},
  doi={10.1109/TMM.2022.3150469}}
