STRIDE

When to Speak Meets Sequence Denoising for Streaming Video Understanding

Junho Kim¹*, Hosu Lee²*, James M. Rehg¹, Minsu Kim³†, Yong Man Ro²†

¹UIUC · ²KAIST · ³Google DeepMind

Introduction

STRIDE (Structured Temporal Refinement with Iterative DEnoising) is a lightweight proactive activation model for streaming video understanding. It employs a masked diffusion module at the activation interface to jointly predict and progressively refine activation signals over a sliding temporal window, producing temporally coherent proactive responses in online streaming scenarios.
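The README does not yet describe how the masked diffusion module works. The sketch below illustrates the general idea of iteratively denoising a window of activation signals, under stated assumptions. It is a generic confidence-based iterative unmasking sampler (MaskGIT-style), not STRIDE's actual implementation. Every name is hypothetical: `refine_window`, `toy_predict`, the binary label scheme (1 = speak, 0 = stay silent), the `MASK` value, and the cosine unmasking schedule. The sketch handles one window. A streaming system would rerun it each time the window slides forward.

```python
import math
from typing import Callable, List

MASK = -1  # hypothetical token for a position that is not yet decided

def refine_window(
    predict: Callable[[List[int]], List[float]],
    window: int,
    steps: int,
) -> List[int]:
    """Hypothetical sketch: decide a window of binary activation labels
    (1 = speak, 0 = stay silent) by iterative confidence-based unmasking."""
    if steps < 1:
        raise ValueError("steps must be >= 1")
    labels = [MASK] * window  # start fully masked (the "noisiest" state)
    for step in range(1, steps + 1):
        # p(speak) for every position, conditioned on the labels committed so far
        probs = predict(labels)
        masked = [i for i, y in enumerate(labels) if y == MASK]
        if not masked:
            break
        # Cosine schedule: number of positions left masked after this step
        # (reaches 0 at the final step, so every position gets committed).
        remain = math.floor(window * math.cos(math.pi / 2 * step / steps))
        n_commit = max(1, len(masked) - remain)
        # Commit the most confident masked positions first (farthest from 0.5).
        masked.sort(key=lambda i: abs(probs[i] - 0.5), reverse=True)
        for i in masked[:n_commit]:
            labels[i] = 1 if probs[i] >= 0.5 else 0
    return labels

def toy_predict(labels: List[int]) -> List[float]:
    """Hypothetical stand-in for a learned denoiser: fixed per-frame scores,
    pulled toward already-committed neighbours to encourage temporal coherence."""
    base = [0.1, 0.2, 0.55, 0.9, 0.95, 0.45]
    out = []
    for i, p in enumerate(base):
        neigh = [labels[j] for j in (i - 1, i + 1)
                 if 0 <= j < len(labels) and labels[j] != MASK]
        if neigh:
            p = 0.5 * p + 0.5 * sum(neigh) / len(neigh)
        out.append(p)
    return out

if __name__ == "__main__":
    print(refine_window(toy_predict, window=6, steps=3))
```

Committing only the most confident positions at each step lets later, less certain decisions condition on earlier ones. This is one way a denoising loop can favour temporally coherent activation patterns over independent per-frame thresholding.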

TODO

Citation

@article{kim2026stride,
  title={STRIDE: When to Speak Meets Sequence Denoising for Streaming Video Understanding},
  author={Kim, Junho and Lee, Hosu and Rehg, James M. and Kim, Minsu and Ro, Yong Man},
  journal={arXiv preprint arXiv:2603.27593},
  year={2026}
}

License

This project is released under the Apache 2.0 License.
