[📜 Paper] [🌐 Project Page] [🤗 Models]
Junho Kim¹*, Hosu Lee²*, James M. Rehg¹, Minsu Kim³†, Yong Man Ro²†
¹UIUC · ²KAIST · ³Google DeepMind
STRIDE (Structured Temporal Refinement with Iterative DEnoising) is a lightweight proactive activation model for streaming video understanding: it decides when the model should speak. It places a masked diffusion module at the activation interface, which jointly predicts and progressively refines activation signals over a sliding temporal window. This produces temporally coherent proactive responses in online streaming scenarios.
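The idea of jointly predicting and iteratively refining activations over a window can be illustrated with a toy sketch. Everything in it is an assumption for illustration and not the STRIDE implementation or its released API. Each slot is a binary speak/stay-silent decision. The hypothetical `toy_denoiser` stands in for the trained model. Unmasking follows a MaskGIT-style most-confident-first schedule, and `stream_activations` re-denoises the whole window from scratch at every new frame.

```python
import numpy as np

MASK = -1  # marks an activation slot that has not been decided yet


def toy_denoiser(scores: np.ndarray, state: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for a trained denoiser: P(speak) for every slot.

    Mixes per-frame evidence with already-decided neighbouring slots, so
    later refinement steps are pulled toward earlier decisions (coherence).
    """
    probs = 1.0 / (1.0 + np.exp(-scores))  # sigmoid of per-frame evidence
    for t in range(len(state)):
        neighbours = [state[j] for j in (t - 1, t + 1)
                      if 0 <= j < len(state) and state[j] != MASK]
        if neighbours:
            probs[t] = 0.5 * probs[t] + 0.5 * float(np.mean(neighbours))
    return probs


def refine_window(scores: np.ndarray, num_steps: int = 4) -> np.ndarray:
    """Start fully masked, then unmask the most confident slots step by step."""
    state = np.full(len(scores), MASK, dtype=int)
    for step in range(num_steps):
        masked = np.flatnonzero(state == MASK)
        if masked.size == 0:
            break
        probs = toy_denoiser(scores, state)
        confidence = np.abs(probs[masked] - 0.5)  # distance from "undecided"
        # Unmask enough slots that every slot is decided by the final step.
        k = int(np.ceil(masked.size / (num_steps - step)))
        chosen = masked[np.argsort(-confidence)[:k]]
        state[chosen] = (probs[chosen] > 0.5).astype(int)
    return state


def stream_activations(frame_scores, window: int = 8, num_steps: int = 4):
    """For each new frame, refine the last `window` slots and emit the newest."""
    scores = np.asarray(frame_scores, dtype=float)
    decisions = []
    for t in range(len(scores)):
        win = scores[max(0, t - window + 1): t + 1]
        decisions.append(int(refine_window(win, num_steps)[-1]))
    return decisions


if __name__ == "__main__":
    print(stream_activations([-3.0, -2.5, 2.0, 3.0, 2.5, -0.1, 2.8, -3.0], window=4))
```

Because decided neighbours feed back into the probabilities of still-masked slots, ambiguous frames (scores near zero) tend to follow their confident neighbours. This is the kind of temporal coherence that iterative refinement is meant to provide, in contrast to thresholding each frame independently.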
- Paper release
- Model weights release (STRIDE-2B)
- Demo website
- Training code
- Evaluation scripts
@article{kim2026stride,
title={STRIDE: When to Speak Meets Sequence Denoising for Streaming Video Understanding},
author={Kim, Junho and Lee, Hosu and Rehg, James M. and Kim, Minsu and Ro, Yong Man},
journal={arXiv preprint arXiv:2603.27593},
year={2026}
}

This project is released under the Apache 2.0 License.