DeFB: Decomposed Feature Learning for Real-Time Multi-Person Eyeblink Detection in Untrimmed In-the-Wild Videos
Jinfang Gan1,
Wenzheng Zeng1,2*,
Yang Xiao1†,
Xintao Zhang1,
Chaoyang Zheng1,
Ran Zhao1,
Ran Wang3,4,
Min Du5,
Zhiguo Cao1
1Huazhong University of Science and Technology,
2National University of Singapore,
3School of Journalism and Information Communication, HUST,
4School of Future Technology, HUST,
5ByteDance
Paper | Dataset | Demo Video
DeFB achieves a better accuracy-efficiency trade-off than other SOTA methods.
This repository contains the official implementation of the AAAI 2026 paper "DeFB: Decomposed Feature Learning for Real-Time Multi-Person Eyeblink Detection in Untrimmed In-the-Wild Videos".
- Rethinking Unified Models: We identify two critical limitations of existing unified multi-person eyeblink detection models: (1) a feature granularity conflict between face localization and eyeblink detection, and (2) unstable face-eye feature learning during joint training.
- Decomposed Feature Learning: We propose DeFB, which models faces and eyes in granularity-specific feature spaces. This enables fine-grained spatio-temporal modeling for eyeblink detection while keeping face localization efficient.
- Asynchronous Training Strategy: We adopt an asynchronous learning mechanism in which eye feature learning refines well-trained coarse face features, significantly improving training stability and convergence.
- State-of-the-Art Performance: DeFB more than doubles the Blink-AP of the previous SOTA (24.65% vs. 10.11%) while improving efficiency by nearly 35%.
- Plug-and-Play Capability: DeFB can be integrated as a plug-in to substantially improve the eyeblink detection capability of general action detectors.
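- Create a new conda environment:

  ```bash
  conda create -n defb python=3.9
  conda activate defb
  ```

- Install PyTorch (2.0.1 or later is recommended). Quote the version specifiers so the shell does not treat `>` as output redirection:

  ```bash
  pip install "torch>=2.0.1" "torchvision>=0.15.2"
  ```

- Install other dependencies:

  ```bash
  pip install -r requirements.txt
  ```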
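
After installation, a quick sanity check can confirm that the installed versions meet the recommended minimums (`torch>=2.0.1`, `torchvision>=0.15.2`). The snippet below is a minimal sketch and is not part of the repository; the helper names (`parse_version`, `meets_minimum`, `check_environment`) are hypothetical, and the simple parser assumes `X.Y.Z`-style version strings, dropping local suffixes such as `+cu117` and ignoring pre-release tags.

```python
import re


def parse_version(version: str) -> tuple:
    """Convert a version string such as '2.0.1+cu117' into (2, 0, 1).

    The local suffix after '+' is dropped, only the first three components
    are kept, and missing components are padded with 0. Pre-release tags
    like 'a0' are ignored, so '2.1.0a0' parses the same as '2.1.0'.
    """
    core = version.split("+", 1)[0]
    parts = []
    for piece in core.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def meets_minimum(installed: str, required: str) -> bool:
    """True if `installed` >= `required`, comparing (major, minor, patch) lexicographically."""
    return parse_version(installed) >= parse_version(required)


def check_environment() -> dict:
    """Report whether torch/torchvision meet the recommended minimum versions."""
    # Imported lazily so the helpers above stay usable without PyTorch installed.
    import torch
    import torchvision

    return {
        "torch": meets_minimum(torch.__version__, "2.0.1"),
        "torchvision": meets_minimum(torchvision.__version__, "0.15.2"),
        "cuda_available": torch.cuda.is_available(),
    }
```

Calling `check_environment()` inside the activated `defb` environment returns a dict of booleans; any `False` entry points to a version to upgrade.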

- Download the MPEblink dataset from Zenodo.

- Organize the dataset as follows:

  ```
  data/
  └── mpeblink/
      ├── videos/
      │   ├── train/
      │   └── val/
      ├── annotations/
      │   ├── train.json
      │   └── val.json
      └── raw_frames/  # Generated in next step
  ```

- Convert videos to raw frames, where `$YOUR_DATA_PATH` is the dataset root (e.g. `data/mpeblink`):

  ```bash
  python tools/mpeblink_build_raw_frames_dataset.py --root $YOUR_DATA_PATH
  ```

- Update the dataset path in `configs/dataset/mpeblink.yml`.
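
Before converting videos, a quick layout check can catch misplaced files. The sketch below is hypothetical and not shipped with DeFB: it assumes `$YOUR_DATA_PATH` points at the `mpeblink/` directory (as the `$YOUR_DATA_PATH/annotations/...` paths in the commands suggest), and it only checks the entries listed in the directory tree, not `raw_frames/`, which the conversion step creates.

```python
from pathlib import Path

# Entries from the directory tree, relative to $YOUR_DATA_PATH
# (assumed here to be the mpeblink/ directory).
EXPECTED_DIRS = ["videos/train", "videos/val", "annotations"]
EXPECTED_FILES = ["annotations/train.json", "annotations/val.json"]


def missing_paths(data_root: str) -> list:
    """Return the expected directories and files that are absent under `data_root`."""
    root = Path(data_root)
    missing = [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (root / f).is_file()]
    return missing
```

For example, `missing_paths("data/mpeblink")` returns an empty list once every expected entry is in place; otherwise it lists what still needs to be downloaded or moved.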
We provide a video introduction of our work:
We provide a complete pipeline script, `run_mpeblinkv1.sh`, that includes all stages:

```bash
bash run_mpeblinkv1.sh
```

The pipeline consists of the following stages:
Stage 1: Facial Modeling Training

```bash
# First phase training (blink_len=10)
torchrun --master_port=9909 --nproc_per_node=2 tools/train.py \
  --use-amp --seed=0 \
  -c configs/rtdetrv2/detrs-blink_len=10_mpeblinkv1.yml \
  -t rtdetrv2_r50vd_6x_coco_ema.pth

# Second phase training (blink_len=30), initialized from the first-phase weights
torchrun --master_port=9909 --nproc_per_node=2 tools/train.py \
  --use-amp --seed=0 \
  -c configs/rtdetrv2/detrs-blink_len=30_mpeblinkv1.yml \
  -t output/rtdetrv2_r50vd_6x_coco_len=10_mpeblinkv1/last.pth
```

Stage 2: Inference
```bash
# Inference on the validation set
python infer_trainset.py \
  --config configs/rtdetrv2/detrs-blink_len=30_mpeblinkv1.yml \
  --output mpeblink_v1 \
  --checkpoint output/rtdetrv2_r50vd_6x_coco_len=30_mpeblinkv1/checkpoint0000.pth \
  --json $YOUR_DATA_PATH/annotations/val.json \
  --root $YOUR_DATA_PATH/val_rawframes \
  --mode val

# Inference on the training set for the blink module
python infer_trainset.py \
  --config configs/rtdetrv2/detrs-blink_len=30_mpeblinkv1.yml \
  --output mpeblink_v1 \
  --checkpoint output/rtdetrv2_r50vd_6x_coco_len=10_mpeblinkv1/checkpoint0000.pth \
  --json $YOUR_DATA_PATH/annotations/train.json \
  --root $YOUR_DATA_PATH/train_rawframes \
  --mode train
```

Stage 3: Blink Module Training
```bash
# Split dataset for blink detection
python BlinkModel/split_dataset.py --config configs/BlinkModule/full_v1.py

# Train blink detection module
python BlinkModel/train_blink_detector.py --config configs/BlinkModule/full_v1.py
```

Stage 4: Testing & Evaluation
```bash
# Full model testing
python test.py \
  --track_config configs/rtdetrv2/detrs-blink_len=30_mpeblinkv1.yml \
  --blink_config configs/BlinkModule/full_v1.py \
  --output mpeblink_v1 \
  --checkpoint output/rtdetrv2_r50vd_6x_coco_len=30_mpeblinkv1/checkpoint0000.pth \
  --json $YOUR_DATA_PATH/annotations/test.json \
  --root $YOUR_DATA_PATH/test_rawframes \
  --mode test

# Convert results with threshold
python tools/instblink_plus_result_convertor_args.py \
  --json results/test_results/mpeblink_v1.json \
  --output results/blink_converted_results/mpeblink_v1.json \
  --threshold 0.07

# Evaluate on MPEblink
python tools/eval_mpeblink.py \
  --gt_json $YOUR_DATA_PATH/annotations/test.json \
  --pred_json results/blink_converted_results/mpeblink_v1.json
```

Results on MPEblink (%):

| Type | Method | Blink-AP | Blink-AP@0.5 | Blink-AP@0.75 | Blink-AP@0.95 | Inst-AP |
|---|---|---|---|---|---|---|
| Multi-stage | BlinkFormer | 4.69 | 19.95 | 0.54 | 0.00 | 56.70 |
| Unified | InstBlink | 10.11 | 27.19 | 7.16 | 0.62 | 67.89 |
| Unified | DeFB (Ours) | 24.65 | 44.17 | 24.62 | 4.40 | 76.07 |
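
The Blink-AP columns presumably follow the usual temporal-detection recipe: a predicted blink counts as correct when its temporal IoU (tIoU) with an unmatched ground-truth blink reaches a threshold (0.5, 0.75 and 0.95 for the suffixed columns), and Blink-AP averages over a range of thresholds in the COCO style. The sketch below illustrates only that idea for a single person track. It is a simplified, hypothetical illustration (greedy matching, non-interpolated AP, no face-instance matching), not the protocol implemented in `tools/eval_mpeblink.py`.

```python
def temporal_iou(a: tuple, b: tuple) -> float:
    """tIoU of two half-open frame intervals [start, end)."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(preds: list, gts: list, thr: float) -> float:
    """Non-interpolated AP at a single tIoU threshold.

    preds: list of (score, (start, end)); gts: list of (start, end).
    Predictions are visited in descending score order; each one is matched
    to the unmatched ground truth with the highest tIoU, provided it is >= thr.
    """
    if not gts:
        return 0.0
    matched = [False] * len(gts)
    true_positives = 0
    precision_sum = 0.0
    ranked = sorted(preds, key=lambda p: p[0], reverse=True)
    for rank, (_, seg) in enumerate(ranked, start=1):
        best_idx, best_iou = -1, thr
        for i, gt in enumerate(gts):
            iou = temporal_iou(seg, gt)
            if not matched[i] and iou >= best_iou:
                best_idx, best_iou = i, iou
        if best_idx >= 0:
            matched[best_idx] = True
            true_positives += 1
            precision_sum += true_positives / rank  # precision at this recall step
    return precision_sum / len(gts)


def mean_ap(preds: list, gts: list) -> float:
    """Average AP over tIoU thresholds 0.50, 0.55, ..., 0.95 (COCO-style)."""
    thresholds = [round(0.5 + 0.05 * i, 2) for i in range(10)]
    return sum(average_precision(preds, gts, t) for t in thresholds) / len(thresholds)
```

Averaging over several thresholds is what makes the strict Blink-AP@0.95 column weigh on the overall Blink-AP: a detector whose blink boundaries are only roughly aligned scores well at 0.5 but loses credit at the higher thresholds.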

| Method | Time per image (ms) |
|---|---|
| Multi-stage methods | T (= 9.3) + per-face latency × #faces |
| InstBlink | 8.9 + D (= 2.6) |
| DeFB (Ours) | 6.1 + D (= 2.6) |
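
The timing table can be read as a cost model in the number of faces: multi-stage methods pay a per-face cost on top of T, while the unified models' time does not depend on how many people appear. The sketch below simply encodes the table's expressions; T = 9.3 ms, D = 2.6 ms and the model terms come from the table, whereas the per-face latency of multi-stage methods is not given here and must be supplied by the caller (any value used below is hypothetical).

```python
T_MS = 9.3  # fixed term for multi-stage methods, from the table
D_MS = 2.6  # shared term D added to both unified models, from the table

# Model-specific terms of the unified methods, from the table.
UNIFIED_MS = {"InstBlink": 8.9, "DeFB": 6.1}


def multi_stage_time_ms(num_faces: int, per_face_latency_ms: float) -> float:
    """Per-image time of a multi-stage method: T + latency x #faces."""
    return T_MS + per_face_latency_ms * num_faces


def unified_time_ms(method: str) -> float:
    """Per-image time of a unified method: model term + D, independent of #faces."""
    return UNIFIED_MS[method] + D_MS
```

For example, `unified_time_ms("DeFB")` gives about 8.7 ms versus about 11.5 ms for InstBlink, while a multi-stage method with a hypothetical 3 ms per face already needs 9.3 + 4 × 3 = 21.3 ms for four faces, and the gap widens as more people appear.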
This code is built upon RT-DETRv2 and InstBlink. We thank the authors for their excellent work.
If you find our work useful in your research, please consider citing:
```bibtex
@misc{gan2026defb,
  title={DeFB: Decomposed Feature Learning for Real-Time Multi-Person Eyeblink Detection in Untrimmed In-the-Wild Videos},
  author={Gan, Jinfang and Zeng, Wenzheng and Xiao, Yang and Zhang, Xintao and Zheng, Chaoyang and Zhao, Ran and Wang, Ran and Du, Min and Cao, Zhiguo},
  howpublished={\url{https://github.com/jinfanggan/DeFB}},
  note={Accepted at AAAI 2026},
  year={2026}
}
```

If you use the MPEblink dataset, please also cite:
```bibtex
@inproceedings{zeng2023real,
  title={Real-time Multi-person Eyeblink Detection in the Wild for Untrimmed Video},
  author={Zeng, Wenzheng and Xiao, Yang and Wei, Sicheng and Gan, Jinfang and Zhang, Xintao and Cao, Zhiguo and Fang, Zhiwen and Zhou, Joey Tianyi},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  pages={13854--13863},
  year={2023}
}
```

This project is released under the Apache 2.0 license.
For questions and suggestions, please open an issue or contact Jinfang Gan (jinfanggan@hust.edu.cn).

