UMO: Unified In-Context Learning Unlocks Motion Foundation Model Priors

In-Domain

Text-to-Motion Generation

Generating realistic human motions from natural language descriptions.


"A man postures his arms like holding a dance partner and dances the waltz from the left hand side to the right hand side."

"Person is jogging and then gets down and walks like an ape and then gets back up and jogs again."

"Person clasps both hands together then waves arms to side then goes down on right knee bent over ties something on feet then gets up."

"A person walks forward then turns completely around and does a cartwheel."

"A person shoves their arms out in an angry fashion."

"A man hops on his right leg five times, makes a small jump, then steps forward into a strong push with both hands."

"A person dribbles a ball with one hand then the other and proceeds to shoot ball into goal with both hands."

"A man steps forward and does a handstand."

"A person dribbles a basketball through their legs then runs quickly."

Out-of-Domain

Temporal Inpainting

Keyframe infilling, prediction, backcasting, and in-betweening.
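The four settings differ only in which frames are supplied as keyframes and which the model must generate. A minimal sketch of that distinction as a per-frame mask follows. The function name `observed_frames`, the task identifiers, and the default `context` and `stride` values are hypothetical illustrations, not UMO's actual interface or settings.

```python
from typing import List

TASKS = ("keyframe_infilling", "prediction", "backcasting", "in_betweening")


def observed_frames(task: str, num_frames: int, context: int = 20, stride: int = 30) -> List[bool]:
    """Per-frame mask: True = frame is supplied as a keyframe, False = frame is generated.

    Hypothetical defaults: `context` frames are observed at the start and/or end,
    and keyframe infilling observes one frame every `stride` frames.
    """
    if not 0 < context < num_frames or stride <= 0:
        raise ValueError("need 0 < context < num_frames and stride > 0")
    mask = [False] * num_frames
    if task == "keyframe_infilling":
        # Sparse keyframes; all frames between them are generated.
        for i in range(0, num_frames, stride):
            mask[i] = True
    elif task == "prediction":
        # Observe the opening frames, generate what comes next.
        mask[:context] = [True] * context
    elif task == "backcasting":
        # Observe the closing frames, generate what came before.
        mask[num_frames - context:] = [True] * context
    elif task == "in_betweening":
        # Observe both ends, generate the transition between them.
        mask[:context] = [True] * context
        mask[num_frames - context:] = [True] * context
    else:
        raise ValueError(f"unknown task: {task}")
    return mask


if __name__ == "__main__":
    for task in TASKS:
        mask = observed_frames(task, num_frames=120)
        print(f"{task:>20}: {sum(mask):3d} observed / {len(mask)} frames")
```

Framed this way, the four settings share one interface and only the mask changes between them.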

Legend: Keyframe (given) / Generated (ours)

Keyframe Infilling

"A person stumbles forward and back almost falling over."

"A person hops forward with both legs and after a few hops they hop on top of something then back down right after."

"A person bends over to begin charging forward, turns around with arms raised, and charges back to original position."

"A person runs forward and jumps over something, then turns around and jumps back over it."

"The drunk guy struggles to walk down the street."

"A person dribbles a basketball through their legs then runs quickly."

"A person is doing a dance."

"A person lifts both arms out to their side and runs forward in a figure 8 pattern."

"A person is walking in a counter-clockwise circle while bringing their knees up as they walk."

Prediction

"A man stands up, walks clock-wise in a circle, then sits back down."

"A person walking in a crouched over position."

"A person runs forward and jumps over something, then turns around and jumps back over it."

"The person walks in a straight line and places their right hand to support their weight against something."

"Person is hunched over creeping diagonally down."

"A person is walking, turns back and to their left, proceeds to walk again, trips, then turns back once more, limping now."

"A person quickly runs straight forward, then bends down and picks up something with both hands."

"A person is walking in a counter-clockwise circle while bringing their knees up as they walk."

"The man walks in a forward facing zig zag motion."

Backcasting

"A person standing in place raises both arms to the side raising them above eye level."

"A person takes a step forward, moves to their right, then continues forward with their right hand on a rail."

"A person walking one way and then increasing pace running back."

"The person walks in a straight line and places their right hand to support their weight against something."

"Person is hunched over creeping diagonally down."

"A person is walking, turns back and to their left, proceeds to walk again, trips, then turns back once more, limping now."

"A man walks forward, then squats to pick something up with both hands, stands back up, and resumes walking."

"A person is walking in a counter-clockwise circle while bringing their knees up as they walk."

"A person walks and places a box on the ground."

In-Betweening

"Person is jogging and then gets down and walks like an ape and then gets back up and jogs again."

"A person jogs forward and semi circles around to the right and then to the left."

"Person is hunched over creeping diagonally down."

"Someone walks with difficulty on their right side, then tries to run."

"A person is walking, turns back and to their left, proceeds to walk again, trips, then turns back once more, limping now."

"A man walks forward, then squats to pick something up with both hands, stands back up, and resumes walking."

"A person is walking in a counter-clockwise circle while bringing their knees up as they walk."

"A person walks down a hill and places a box on the ground."

"The man walks in a forward facing zig zag motion."

Out-of-Domain

Instruction-Based Motion Editing

Text-guided motion editing.


Editing Instruction: "Start later and do one jump instead of a few"

Source Motion

TMED

UMO (Ours)

Editing Instruction: "Spread arms out more, sway side to side, bending from waist"

Source Motion

TMED

UMO (Ours)

Editing Instruction: "Get up at the end with right leg first instead, start with the left leg"

Source Motion

TMED

UMO (Ours)

Editing Instruction: "Don't straighten fully, keep acting as if sneaking after getting up. Don't crouch down again"

Source Motion

TMED

UMO (Ours)

Editing Instruction: "Flip sides, i.e. start with your right arm and left leg, facing opposite"

Source Motion

TMED

UMO (Ours)

Editing Instruction: "Just take a single step"

Source Motion

TMED

UMO (Ours)

Editing Instruction: "Maintain the hands at chest level"

Source Motion

TMED

UMO (Ours)

Editing Instruction: "Lower the left elbow and raise the opposite hand while gesturing."

Source Motion

TMED

UMO (Ours)

Editing Instruction: "Start slower"

Source Motion

TMED

UMO (Ours)

Editing Instruction: "More energy"

Source Motion

TMED

UMO (Ours)

Editing Instruction: "Start the arm motion earlier and repeat"

Source Motion

TMED

UMO (Ours)

Editing Instruction: "Turn your head instead of your arms"

Source Motion

TMED

UMO (Ours)

Editing Instruction: "Shake the head while doing the action and don't tilt the head on the left side"

Source Motion

TMED

UMO (Ours)

Editing Instruction: "Don't move the upright hands left and right, just keep them straight"

Source Motion

TMED

UMO (Ours)

Editing Instruction: "Only move the hand forward, back and to your right"

Source Motion

TMED

UMO (Ours)

Stress Test

Trajectory Following

Follow geometric trajectories while maintaining natural motion.

{"type":"circular_arc", "start":[0.0, 0.0], "end":[2.92, 5.35], "center":[2.29, 2.22], "radius":3.19, "direction":"clockwise"}

{"type":"cubic_bezier","params":{"start":[0.0,0.0],"end":[3.54,4.15],"P0":[0.0,0.0],"P1":[-1.02,3.24],"P2":[4.55,0.92],"P3":[3.54,4.15]}}

{"type":"cubic_bezier","params":{"start":[0.0,0.0],"end":[3.98,2.03],"P0":[0.0,0.0],"P1":[0.47,2.34],"P2":[3.52,-0.31],"P3":[3.98,2.03]}}
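Each trajectory above is a 2-D ground-plane curve given as JSON. A minimal sketch of turning such a spec into waypoints follows, assuming a y-up coordinate frame (so "clockwise" means decreasing polar angle) and treating any direction value other than "clockwise" as counterclockwise. The helper `sample_trajectory` is hypothetical and not part of UMO. The Bézier branch uses the standard cubic Bernstein form, which passes through `P0` at t = 0 and `P3` at t = 1, matching the `start` and `end` fields.

```python
import json
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def _bezier_points(p0: Sequence[float], p1: Sequence[float], p2: Sequence[float],
                   p3: Sequence[float], ts: List[float]) -> List[Point]:
    """Evaluate a cubic Bezier curve in Bernstein form at each t in ts."""
    pts = []
    for t in ts:
        u = 1.0 - t
        # B(t) = u^3 P0 + 3 u^2 t P1 + 3 u t^2 P2 + t^3 P3
        w = (u ** 3, 3 * u * u * t, 3 * u * t * t, t ** 3)
        x = w[0] * p0[0] + w[1] * p1[0] + w[2] * p2[0] + w[3] * p3[0]
        y = w[0] * p0[1] + w[1] * p1[1] + w[2] * p2[1] + w[3] * p3[1]
        pts.append((x, y))
    return pts


def _arc_points(start, end, center, radius, direction, ts) -> List[Point]:
    """Sweep from start to end around center in the requested direction."""
    a0 = math.atan2(start[1] - center[1], start[0] - center[0])
    a1 = math.atan2(end[1] - center[1], end[0] - center[0])
    sweep = a1 - a0
    # Assumption: y-up ground plane, so clockwise = decreasing polar angle.
    if direction == "clockwise" and sweep > 0:
        sweep -= 2 * math.pi
    elif direction != "clockwise" and sweep < 0:
        sweep += 2 * math.pi
    return [(center[0] + radius * math.cos(a0 + t * sweep),
             center[1] + radius * math.sin(a0 + t * sweep)) for t in ts]


def sample_trajectory(spec: Dict, n: int = 100) -> List[Point]:
    """Return n waypoints, evenly spaced in curve parameter (not arc length)."""
    if n < 2:
        raise ValueError("need at least two samples")
    params = spec.get("params", spec)  # arc fields are top-level, Bezier fields are nested
    ts = [i / (n - 1) for i in range(n)]
    if spec["type"] == "cubic_bezier":
        return _bezier_points(params["P0"], params["P1"], params["P2"], params["P3"], ts)
    if spec["type"] == "circular_arc":
        return _arc_points(params["start"], params["end"], params["center"],
                           params["radius"], params["direction"], ts)
    raise ValueError(f"unsupported trajectory type: {spec['type']}")


if __name__ == "__main__":
    arc = json.loads('{"type":"circular_arc", "start":[0.0, 0.0], "end":[2.92, 5.35], '
                     '"center":[2.29, 2.22], "radius":3.19, "direction":"clockwise"}')
    waypoints = sample_trajectory(arc, n=101)
    print(waypoints[0], waypoints[50], waypoints[-1])
```

Because the stated radius is rounded, the sampled arc starts and ends within a few millimeters of the listed `start` and `end` points rather than exactly on them.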

Stress Test

Obstacle Avoidance

Navigate from point A to point B while avoiding obstacles.

A person walks from (0.00, 0.00) to (3.67, 5.36). Avoiding 2 obstacles at (0.71, 1.09, r=0.25), (2.86, 4.31, r=0.35), where r is the safety radius in meters.

A person walks from (-0.00, 0.00) to (3.28, 6.41). Avoiding 3 obstacles at (0.72, 3.65, r=0.43), (1.46, 3.98, r=0.42), (0.32, 2.93, r=0.24), where r is the safety radius in meters.
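Success on these prompts can be checked geometrically: the root path must stay outside every obstacle's safety circle. A minimal sketch follows, assuming the generated motion has already been reduced to a 2-D polyline of root positions and that prompts follow exactly the wording shown above. The helpers `parse_prompt`, `point_segment_distance`, and `violations` are hypothetical, not UMO's evaluation code.

```python
import math
import re
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Obstacle = Tuple[float, float, float]  # (x, y, safety radius r) in meters

_NUM = r"(-?\d+(?:\.\d+)?)"
# Hypothetical parsers that assume the exact prompt wording shown above.
_ROUTE_RE = re.compile(rf"from \({_NUM}, {_NUM}\) to \({_NUM}, {_NUM}\)")
_OBSTACLE_RE = re.compile(rf"\({_NUM}, {_NUM}, r={_NUM}\)")


def parse_prompt(prompt: str) -> Tuple[Point, Point, List[Obstacle]]:
    """Extract start, goal and obstacle circles from an obstacle-avoidance prompt."""
    m = _ROUTE_RE.search(prompt)
    if m is None:
        raise ValueError("prompt has no 'from (x, y) to (x, y)' clause")
    x0, y0, x1, y1 = map(float, m.groups())
    obstacles = [(float(x), float(y), float(r)) for x, y, r in _OBSTACLE_RE.findall(prompt)]
    return (x0, y0), (x1, y1), obstacles


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Euclidean distance from p to the closest point on segment ab."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0.0:
        return math.dist(p, a)
    # Project p onto the line through a and b, then clamp to the segment.
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg_len2
    t = max(0.0, min(1.0, t))
    return math.dist(p, (a[0] + t * dx, a[1] + t * dy))


def violations(path: Sequence[Point], obstacles: Sequence[Obstacle]) -> List[int]:
    """Indices of obstacles whose safety circle the root polyline enters."""
    if len(path) < 2:
        raise ValueError("path needs at least two points")
    segments = list(zip(path, path[1:]))
    hit = []
    for i, (ox, oy, r) in enumerate(obstacles):
        clearance = min(point_segment_distance((ox, oy), a, b) for a, b in segments)
        if clearance < r:
            hit.append(i)
    return hit


if __name__ == "__main__":
    prompt = ("A person walks from (0.00, 0.00) to (3.67, 5.36). Avoiding 2 obstacles at "
              "(0.71, 1.09, r=0.25), (2.86, 4.31, r=0.35), where r is the safety radius in meters.")
    start, goal, obstacles = parse_prompt(prompt)
    print("straight line hits:", violations([start, goal], obstacles))
    print("detour hits:", violations([start, (-1.0, 3.0), goal], obstacles))
```

On the first prompt, the straight chord from start to goal passes within 8 cm of both obstacle centers and so violates both safety radii, while the one-waypoint detour through (-1.0, 3.0) clears them.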

Stress Test

Dual-Identity Reaction Generation

Two-person interaction generation, a setting entirely absent from single-person pretraining.


Legend: Source / Generated

"Two people stand facing each other, with one person folding his/her hands and bending down to bow. The other individual also folds his/her hands and responds with a bow."

"In a confrontation, one forcefully slaps another's left cheek, turning the slapped person's body to the right."

"One person kicks the other person's left shin with his/her left foot, causing the other person to trip and sit on the ground."

"The first person covers his/her face and walks towards the second person, who raises his/her right palm. They clap hands."

"The first person gently pats the upper right part of the second person's back with his/her left hand from the right side."

"The two people stand facing each other, with one person placing his/her right hand in front and bowing. Upon seeing this, the second person raises his/her left hand."

"One person stands opposite another and raises his/her right hand to wave. Simultaneously, the second person raises his/her left hand to wave back."

"Two people stand face to face, then both raise their hands and give each other a high-five."

Ablation

Architecture Ablation

Comparing four conditioning architectures for in-context feature integration.


Temporal Inpainting (Keyframe Infilling)

"a man stands up, walks clock-wise in a circle, then sits back down."

Temporal Fusion (Ours)

AdaLN

Sequential Concatenation

ControlNet

"person walks forwards slowly and normally without swinging arms"

Temporal Fusion (Ours)

AdaLN

Sequential Concatenation

ControlNet

Instruction-Based Motion Editing

Editing Instruction: "Maintain the hands at chest level"

Source

Temporal Fusion (Ours)

AdaLN

Sequential Concatenation

ControlNet

Editing Instruction: "start slower"

Source

Temporal Fusion (Ours)

AdaLN

Sequential Concatenation

ControlNet

Editing Instruction: "more energy"

Source

Temporal Fusion (Ours)

AdaLN

Sequential Concatenation

ControlNet

Citation

If you find this work useful, please consider citing our paper.

@misc{cong2026umounifiedincontextlearning,
      title={UMO: Unified In-Context Learning Unlocks Motion Foundation Model Priors},
      author={Xiaoyan Cong and Zekun Li and Zhiyang Dou and Hongyu Li and Omid Taheri and Chuan Guo and Abhay Mittal and Sizhe An and Taku Komura and Wojciech Matusik and Michael J. Black and Srinath Sridhar},
      year={2026},
      eprint={2603.15975},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2603.15975},
}
