This repository releases the source code of the WACV 2025 paper OpenMixer, which is built heavily on the STMixer codebase. OpenMixer is an open-vocabulary action detector that aims to detect any human action in videos in an open world. The figure below shows the model architecture.
- Create a conda environment:
```shell
conda create -n openmixer python=3.7
```
- Install PyTorch:
```shell
pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu117
```
- Install other libraries (including OpenAI-CLIP):
```shell
pip install -r requirements.txt
```
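Once the environment is set up, a quick sanity check can confirm the CUDA build of PyTorch and the CLIP install. This is a minimal sketch; it assumes `requirements.txt` installs the OpenAI `clip` package:
```shell
# Optional sanity check: verify the CUDA build of PyTorch and the CLIP import.
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
python -c "import clip; print(clip.available_models())"
```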
- First, please refer to the MMAction2 JHMDB and UCF24 dataset preparation steps.
- Next, please download our released Open-World splits. Make sure the folders are structured as follows (a quick sanity check follows the tree below).
```
data
├── JHMDB
|   ├── openworld
|   ├── Frames
|   ├── JHMDB-MaskRCNN.pkl
|   ├── JHMDB-GT.pkl
├── UCF24
|   ├── openworld
|   ├── rgb-images
|   ├── UCF24-MaskRCNN.pkl
```
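As an optional check of the layout above (a sketch that only tests whether the expected entries exist under `data/`):
```shell
# Optional: confirm the expected dataset files/folders are in place.
for p in data/JHMDB/openworld data/JHMDB/Frames data/JHMDB/JHMDB-MaskRCNN.pkl \
         data/JHMDB/JHMDB-GT.pkl data/UCF24/openworld data/UCF24/rgb-images \
         data/UCF24/UCF24-MaskRCNN.pkl; do
  [ -e "$p" ] && echo "OK       $p" || echo "MISSING  $p"
done
```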
- Please download the pretrained CLIP-ViP-B/16 checkpoint from XPretrain/CLIP-ViP, a video CLIP model that serves as the backbone of our model. After downloading, make sure the file is located at `./pretrained/pretrain_clipvip_base_16.pt`.
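After placing the file, you can verify that the checkpoint deserializes. This is a hedged sketch; we only assume the file is a standard `torch.load`-compatible checkpoint, not any particular key layout:
```shell
# Optional: check that the CLIP-ViP checkpoint loads without errors.
python -c "import torch; ckpt = torch.load('./pretrained/pretrain_clipvip_base_16.pt', map_location='cpu'); print(type(ckpt))"
```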
- [Optional] We release three OpenMixer models and their inference results for each of the JHMDB and UCF24 datasets here: Google Drive. They correspond to the configurations in the folder `./config_files/`. Note that for the ZSR+ZSL setting, no model training is needed.
We provide an easy-to-use bash script for training and evaluation across different settings and datasets. For example, to train the OpenMixer model under the end-to-end setting on the JHMDB dataset using 4 specified GPUs:
```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 bash trainval.sh train jhmdb
```
Optionally, you may change the GPU IDs and the dataset name to `ucf24` (see the example below). For other settings, change the `CFG_FILE` in `trainval.sh` to `openmixer_zsr_tl.yaml` to train the OpenMixer model under the ZSR+TL setting.
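For instance, the same script launches training on UCF24; only the dataset argument changes:
```shell
# Train on UCF24 (choose your own GPU IDs).
CUDA_VISIBLE_DEVICES=0,1,2,3 bash trainval.sh train ucf24
```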
We use the same bash script for validation (inference + evaluation):
```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 bash trainval.sh eval jhmdb
```
Optionally, you may change the GPU IDs and the dataset name to `ucf24`. For other settings, change the `CFG_FILE` in `trainval.sh` to `openmixer_zsr_tl.yaml` or `openmixer_zsr_zsl.yaml` to evaluate models under the ZSR+TL and ZSR+ZSL settings, respectively (a sketch of this edit follows).
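The edit in `trainval.sh` might look like the sketch below; the `CFG_FILE` variable and the two setting config filenames come from this README, while the assignment form and the end-to-end default filename are assumptions:
```shell
# In trainval.sh (sketch): pick one configuration.
# CFG_FILE=./config_files/openmixer_e2e.yaml      # hypothetical end-to-end default
CFG_FILE=./config_files/openmixer_zsr_tl.yaml     # ZSR+TL setting
# CFG_FILE=./config_files/openmixer_zsr_zsl.yaml  # ZSR+ZSL setting (no training needed)
```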
This project is built upon STMixer, CLIP-ViP, and OpenAI-CLIP. We sincerely thank the contributors of all these great open-source repositories!
If this project helps you in your research or work, please cite our paper:
```
@InProceedings{bao2025wacv,
  title     = {Exploiting VLM Localizability and Semantics for Open Vocabulary Action Detection},
  author    = {Wentao Bao and Kai Li and Yuxiao Chen and Deep Patel and Martin Renqiang Min and Yu Kong},
  booktitle = {IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
  year      = {2025}
}
```
