UMO: Unified In-Context Learning Unlocks
Motion Foundation Model Priors

One Unified Framework · Three Meta-Operations · Diverse Motion Tasks

Xiaoyan Cong1*, Zekun Li1*, Zhiyang Dou2, Hongyu Li1, Omid Taheri3, Chuan Guo4, Abhay Mittal4, Sizhe An4, Taku Komura5, Wojciech Matusik2, Michael J. Black3, Srinath Sridhar1

1Brown University   2Massachusetts Institute of Technology   3Max Planck Institute for Intelligent Systems
4Meta Reality Labs   5The University of Hong Kong

UMO Teaser: diverse motion tasks as compositions of atomic operations
Radar chart comparing UMO across tasks and metrics

Text-to-Motion Generation

Generating realistic human motions from natural language descriptions.

Temporal Inpainting

Keyframe infilling, prediction, backcasting, and in-betweening.

Legend: keyframe (given) · generated (ours)
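The four variants above differ only in which frames are treated as observed. A minimal sketch of per-frame observation masks (plain Python; the mask convention here is our illustrative assumption, not UMO's actual API):

```python
def temporal_mask(num_frames, mode, keyframes=()):
    """Return a per-frame mask: True = observed (given), False = to generate.

    Modes mirror the tasks above (assumed convention, not the paper's code):
      prediction   - first half observed, remainder generated
      backcasting  - last half observed, remainder generated
      inbetweening - only the first and last frames observed
      keyframes    - only the listed frame indices observed
    """
    half = num_frames // 2
    if mode == "prediction":
        return [i < half for i in range(num_frames)]
    if mode == "backcasting":
        return [i >= half for i in range(num_frames)]
    if mode == "inbetweening":
        return [i == 0 or i == num_frames - 1 for i in range(num_frames)]
    if mode == "keyframes":
        ks = set(keyframes)
        return [i in ks for i in range(num_frames)]
    raise ValueError(f"unknown mode: {mode}")
```

Under this view, all four tasks are the same conditional-generation problem with different masks.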

Instruction-Based Motion Editing

Editing an existing motion according to free-form text instructions.

Trajectory Following

Following geometric trajectories while maintaining natural motion.

{"type":"circular_arc","start":[0.0,0.0],"end":[2.92,5.35],"center":[2.29,2.22],"radius":3.19,"direction":"clockwise"}

{"type":"cubic_bezier","params":{"start":[0.0,0.0],"end":[3.54,4.15],"P0":[0.0,0.0],"P1":[-1.02,3.24],"P2":[4.55,0.92],"P3":[3.54,4.15]}}

{"type":"cubic_bezier","params":{"start":[0.0,0.0],"end":[3.98,2.03],"P0":[0.0,0.0],"P1":[0.47,2.34],"P2":[3.52,-0.31],"P3":[3.98,2.03]}}
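The cubic Bézier specs above are self-describing and can be sampled with the standard Bernstein form. A small sketch (the JSON is copied from the second example above; the sampling code is our illustration, not the project's):

```python
import json

def sample_cubic_bezier(spec, n=50):
    """Sample n points on a cubic Bezier curve from a trajectory spec."""
    p = spec["params"]
    p0, p1, p2, p3 = p["P0"], p["P1"], p["P2"], p["P3"]
    pts = []
    for i in range(n):
        t = i / (n - 1)
        u = 1.0 - t
        # Bernstein basis: B(t) = u^3 P0 + 3 u^2 t P1 + 3 u t^2 P2 + t^3 P3
        x = u**3 * p0[0] + 3 * u**2 * t * p1[0] + 3 * u * t**2 * p2[0] + t**3 * p3[0]
        y = u**3 * p0[1] + 3 * u**2 * t * p1[1] + 3 * u * t**2 * p2[1] + t**3 * p3[1]
        pts.append((x, y))
    return pts

spec = json.loads('{"type":"cubic_bezier","params":{"start":[0.0,0.0],"end":[3.54,4.15],'
                  '"P0":[0.0,0.0],"P1":[-1.02,3.24],"P2":[4.55,0.92],"P3":[3.54,4.15]}}')
path = sample_cubic_bezier(spec)
```

Note that `start`/`end` duplicate `P0`/`P3`: a cubic Bézier always interpolates its first and last control points, so the sampled path begins at `start` and ends at `end`.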

Obstacle Avoidance

Navigating from point A to point B while avoiding obstacles.

A person walks from (0.00, 0.00) to (3.67, 5.36), avoiding 2 obstacles at (0.71, 1.09, r=0.25) and (2.86, 4.31, r=0.35), where r is the safety radius in meters.

A person walks from (-0.00, 0.00) to (3.28, 6.41), avoiding 3 obstacles at (0.72, 3.65, r=0.43), (1.46, 3.98, r=0.42), and (0.32, 2.93, r=0.24), where r is the safety radius in meters.
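The r values define hard clearance constraints that a generated root trajectory can be checked against. A hedged sketch (obstacles copied from the first prompt above; the checker and the toy straight-line path are our illustration, not the paper's evaluation code):

```python
import math

def min_clearance(path, obstacles):
    """Smallest clearance of a 2D root path from circular obstacles.

    path:      list of (x, y) root positions
    obstacles: list of (cx, cy, r), with r the safety radius in meters
    A negative return value means some point violates an obstacle's radius.
    """
    return min(
        math.hypot(x - cx, y - cy) - r
        for (x, y) in path
        for (cx, cy, r) in obstacles
    )

# Obstacles from the first prompt above. The straight line from start to goal
# passes through both safety radii, so a valid motion must curve around them.
obstacles = [(0.71, 1.09, 0.25), (2.86, 4.31, 0.35)]
path = [(3.67 * t / 20, 5.36 * t / 20) for t in range(21)]
safe = min_clearance(path, obstacles) > 0.0
```

Running a check like this over the generated root trajectory gives a simple pass/fail signal for the avoidance constraint.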

Dual-Identity Reaction Generation

Two-person interaction generation, a capability entirely absent from single-person pretraining.

Legend: source · generated

Architecture Ablation

Comparing four conditioning architectures for in-context feature integration.

Citation

If you find this work useful, please consider citing our paper.

@misc{cong2026umounifiedincontextlearning,
      title={UMO: Unified In-Context Learning Unlocks Motion Foundation Model Priors},
      author={Xiaoyan Cong and Zekun Li and Zhiyang Dou and Hongyu Li and Omid Taheri and Chuan Guo and Abhay Mittal and Sizhe An and Taku Komura and Wojciech Matusik and Michael J. Black and Srinath Sridhar},
      year={2026},
      eprint={2603.15975},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2603.15975},
}