This repo holds the code for the work published in IEEE Transactions on Multimedia (TMM) [Paper].
We provide the implementation in PyTorch for ease of use.
Install the requirements by running the following command:
```bash
pip install -r requirements.txt
```
We are grateful to @YapengTian for sharing the features and code.
Two kinds of features (i.e., visual features and audio features) are required for the experiments.
- Visual Features: You can download the VGG visual features from here.
- Audio Features: You can download the VGG-like audio features from here.
- Additional Features: You can download the features of background videos here, which are required for the experiments in the weakly-supervised setting.
After downloading the features, please place them in the data folder. The structure of the data folder is as follows:
```
data
|——audio_features.h5
|——audio_feature_noisy.h5
|——labels.h5
|——labels_noisy.h5
|——mil_labels.h5
|——test_order.h5
|——train_order.h5
|——val_order.h5
|——visual_feature.h5
|——visual_feature_noisy.h5
```
You can download the AVE dataset from the repo here.
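To sanity-check the downloaded files, you can list the datasets stored in each HDF5 file. The snippet below is a minimal sketch using h5py; the internal dataset keys are not documented here, so it simply prints whatever keys, shapes, and dtypes each file contains.

```python
# Minimal sketch for inspecting the feature files with h5py.
# The internal dataset keys are not documented here, so we just
# enumerate every dataset together with its shape and dtype.
import h5py

for name in ["data/visual_feature.h5", "data/audio_features.h5", "data/labels.h5"]:
    with h5py.File(name, "r") as f:
        print(name)
        f.visititems(
            lambda key, obj: print(f"  {key}: shape={obj.shape}, dtype={obj.dtype}")
            if isinstance(obj, h5py.Dataset)
            else None
        )
```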
Training
```bash
bash supv_train.sh
# The argument "--snapshot_pref" denotes the path for saving checkpoints and code.
```
Evaluating
```bash
bash supv_test.sh
```
After training, a checkpoint file will be saved whose name contains the accuracy on the test set and the epoch number.
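If you want to select the best checkpoint programmatically, the sketch below parses the accuracy out of the checkpoint filenames. The directory name and the filename pattern are assumptions for illustration; adapt them to what your training run actually writes.

```python
# Hypothetical helper: pick the checkpoint whose filename encodes the
# highest test accuracy. Both the directory ("checkpoints") and the
# "first float in the name is the accuracy" convention are assumptions;
# adjust the regex to your actual checkpoint names.
import re
from pathlib import Path

def best_checkpoint(ckpt_dir="checkpoints"):
    best, best_acc = None, -1.0
    for path in Path(ckpt_dir).glob("*.pth*"):
        m = re.search(r"(\d+\.\d+)", path.name)
        if m and float(m.group(1)) > best_acc:
            best, best_acc = path, float(m.group(1))
    return best, best_acc

print(best_checkpoint())
```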
Training
```bash
bash weak_train.sh
```
Evaluating
```bash
bash weak_test.sh
```
For the cross-modal localization task, we developed a cross-modal matching network. Here, we use visual feature vectors obtained via global average pooling, which you can find here. Please put the features into the data folder. Note that this part of the code was implemented with Keras 2.0 using TensorFlow as the backend.
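As an illustration of what global average pooling does to VGG-style feature maps, here is a minimal sketch; the (segments, 7, 7, 512) shape is an assumption used only for the example, not a statement about the released feature files.

```python
# Illustrative sketch of global average pooling (GAP).
# The shape (10 segments, 7x7 spatial grid, 512 channels) is an assumed
# VGG-style layout for demonstration, not the guaranteed layout of the
# released features.
import numpy as np

feature_maps = np.random.randn(10, 7, 7, 512)     # dummy conv feature maps
feature_vectors = feature_maps.mean(axis=(1, 2))  # average over the spatial grid
print(feature_vectors.shape)                      # (10, 512)
```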
Training
```bash
bash supv_train_a2v.sh
bash supv_train_v2a.sh
```
Evaluating
```bash
bash supv_test_a2v.sh
bash supv_test_v2a.sh
```
Please cite the following paper if you find this repo useful for your research:
```
@ARTICLE{9712233,
  author={Liu, Shuo and Quan, Weize and Wang, Chaoqun and Liu, Yuan and Liu, Bin and Yan, Dong-Ming},
  journal={IEEE Transactions on Multimedia},
  title={Dense Modality Interaction Network for Audio-Visual Event Localization},
  year={2022},
  volume={},
  number={},
  pages={1-1},
  doi={10.1109/TMM.2022.3150469}}
```