Current pseudo-label-based weakly-supervised temporal action localization methods ignore the natural temporal structure of the video, which can provide rich information to assist pseudo-label generation. In this paper, we propose a novel weakly-supervised temporal action localization method that infers salient snippet-features. First, we design a saliency inference module that exploits the variation relationship between temporally neighboring snippets to discover salient snippet-features, which reflect significant dynamic changes in the video. Second, we introduce a boundary refinement module that enhances salient snippet-features through an information interaction unit. Third, a discrimination enhancement module is introduced to enhance the discriminative nature of snippet-features. Finally, we use the refined snippet-features to produce high-fidelity pseudo labels, which supervise the training of the action localization network.
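The abstract does not specify how the saliency inference module scores snippets. As a rough illustration of "variation between temporally neighboring snippets", the sketch below assumes a snippet's saliency is the mean Euclidean distance between its feature and those of its immediate neighbors, and keeps the highest-scoring fraction as salient. The function names and the `top_ratio` parameter are hypothetical and do not come from this repository; the paper's module is more involved than this.

```python
import math
from typing import List, Sequence


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def neighbor_variation_saliency(features: Sequence[Sequence[float]]) -> List[float]:
    """Score each snippet by its mean feature change w.r.t. its temporal neighbors.

    Hypothetical scoring rule (not the ISSF implementation): snippet t is
    compared with t-1 and t+1, using only the neighbor that exists at the
    sequence boundaries.
    """
    num = len(features)
    scores = []
    for t in range(num):
        neighbors = [n for n in (t - 1, t + 1) if 0 <= n < num]
        if not neighbors:  # a single-snippet video has no temporal context
            scores.append(0.0)
            continue
        diffs = [l2_distance(features[t], features[n]) for n in neighbors]
        scores.append(sum(diffs) / len(diffs))
    return scores


def select_salient(scores: Sequence[float], top_ratio: float = 0.25) -> List[int]:
    """Return indices, in temporal order, of the top `top_ratio` fraction of snippets."""
    k = max(1, int(round(len(scores) * top_ratio)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


if __name__ == "__main__":
    # Toy 2-D snippet features with an abrupt change between snippets 1 and 2.
    feats = [[0.0, 0.0], [0.0, 0.1], [1.0, 1.0], [1.0, 1.1]]
    print(select_salient(neighbor_variation_saliency(feats), top_ratio=0.5))  # → [1, 2]
```

In the toy example, the two snippets on either side of the abrupt feature change receive the highest scores, which matches the intuition that salient snippet-features mark significant dynamic changes.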
- 2025-04 💾 We released our code.
- 2024-02 🚀 Our paper was accepted by AAAI 2024.
We use mean average precision (mAP) as the evaluation metric, consistent with prior state-of-the-art work, and report mAP at different temporal IoU thresholds.
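For readers unfamiliar with the metric, the sketch below shows how mAP at a temporal IoU (tIoU) threshold is typically computed for temporal action localization: predictions are sorted by confidence, each is matched to the highest-tIoU unmatched ground-truth segment of the same video and class, and AP is the area under the interpolated precision-recall curve. This is an illustrative, stdlib-only sketch in the spirit of the ActivityNet-style evaluation commonly used on THUMOS'14; it is not this repository's evaluation code, and the function names, tuple layouts and threshold list are assumptions.

```python
from typing import Dict, List, Sequence, Tuple

Pred = Tuple[str, float, float, float]  # (video_id, start, end, score)
GT = Tuple[str, float, float]  # (video_id, start, end)


def temporal_iou(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Intersection over union of two 1-D time intervals (start, end)."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(preds: Sequence[Pred], gts: Sequence[GT], tiou: float) -> float:
    """AP of one class at one tIoU threshold."""
    if not gts:
        return 0.0
    order = sorted(preds, key=lambda p: p[3], reverse=True)
    matched = [False] * len(gts)
    tp: List[int] = []
    for vid, start, end, _ in order:
        # Pick the unmatched ground truth in the same video with the highest tIoU >= threshold.
        best, best_iou = -1, tiou
        for j, (gvid, gs, ge) in enumerate(gts):
            if gvid != vid or matched[j]:
                continue
            iou = temporal_iou((start, end), (gs, ge))
            if iou >= best_iou:
                best, best_iou = j, iou
        if best >= 0:
            matched[best] = True
        tp.append(1 if best >= 0 else 0)
    precisions, recalls, hits = [], [], 0
    for i, t in enumerate(tp, start=1):
        hits += t
        precisions.append(hits / i)
        recalls.append(hits / len(gts))
    # All-point interpolation: make precision non-increasing from right to left.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += p * (r - prev_recall)
        prev_recall = r
    return ap


def mean_ap(preds_by_class: Dict[str, List[Pred]],
            gts_by_class: Dict[str, List[GT]], tiou: float) -> float:
    """Mean of per-class AP over all classes that have ground truth."""
    aps = [average_precision(preds_by_class.get(c, []), gts, tiou)
           for c, gts in gts_by_class.items()]
    return sum(aps) / len(aps) if aps else 0.0


if __name__ == "__main__":
    gts = {"action": [("v1", 0.0, 10.0), ("v1", 20.0, 30.0)]}
    preds = {"action": [("v1", 1.0, 10.0, 0.9), ("v1", 40.0, 50.0, 0.8),
                        ("v1", 22.0, 28.0, 0.7)]}
    for t in (0.1, 0.3, 0.5, 0.7):  # illustrative thresholds
        print(f"mAP@{t:.1f} = {mean_ap(preds, gts, t):.4f}")
    # prints 0.8333 for 0.1, 0.3 and 0.5, and 0.5000 for 0.7
```

In the toy example, the third prediction overlaps its ground truth with tIoU 0.6, so it counts as a true positive up to threshold 0.5 but not at 0.7. This is why mAP is reported at several thresholds: higher thresholds reward more precise boundaries.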

- Prepare the THUMOS'14 dataset.
- Download the dataset from the link provided in this repo.
- Unzip it under the dataset/ folder.
- Dependencies
- python == 3.6.13
- torch == 1.10.0
- Create a conda environment:
conda create --name ISSF python=3.6.13
source activate ISSF
pip install -r requirements.txt
Run the following command to start training.
python main.py --run-type train --dataset-dir ./dataset/ --log-dir logs
Run the following command to start evaluation.
python main.py --run-type test --dataset-dir ./dataset/
Our evaluation code is built upon BaSNet, ASM-Loc, and RSKP. We thank these teams for their valuable contributions to the field of weakly-supervised temporal action localization.
If you find this project useful for your research, please cite it using the following BibTeX entry.
@inproceedings{yun2024weakly,
title={Weakly-Supervised Temporal Action Localization by Inferring Salient Snippet-Feature},
author={Yun, Wulian and Qi, Mengshi and Wang, Chuanming and Ma, Huadong},
booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
volume={38},
number={7},
pages={6908--6916},
year={2024}
}