Over++: Generative Video Compositing for Layer Interaction Effects

I. Effect Generation

II. Effect Editing

III. Keyframe Masking

IV. Background Swapping

We introduce Over++, a framework for generating environmental effects and enabling effect editing through mask- or prompt-guided control. Explore the sections below for more details:

Baseline Comparisons

Our Framework

Naively compositing the foreground over the background layer (copy-paste: $\mathcal{I}_{\text{over}} = \mathcal{I}_{\text{fg}} \oplus \mathcal{I}_{\text{bg}}$) produces a video that lacks environmental effects such as shadows or wakes. Given such an input composite and an optional binary mask ($\mathcal{M}_{\text{effect}}$) indicating the target effect regions, our model generates the desired effects within those regions.
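As a rough illustration of the naive copy-paste composite described above, the sketch below applies the standard alpha "over" operation to every frame with NumPy. The function name `composite_over`, the array shapes, and the use of a foreground alpha matte are assumptions made for illustration; the page does not specify how $\oplus$ is implemented.

```python
import numpy as np


def composite_over(fg, alpha, bg):
    """Naive 'over' composite: alpha * fg + (1 - alpha) * bg.

    fg, bg: float arrays of shape (T, H, W, 3) with values in [0, 1].
    alpha:  float array of shape (T, H, W, 1) with values in [0, 1]
            (an assumed foreground matte; hypothetical input).
    """
    fg = np.asarray(fg, dtype=np.float32)
    bg = np.asarray(bg, dtype=np.float32)
    alpha = np.asarray(alpha, dtype=np.float32)
    # Broadcasting applies the single-channel matte to all colour channels.
    return alpha * fg + (1.0 - alpha) * bg


# Toy clip: a white square pasted onto a black background for two frames.
fg = np.ones((2, 4, 4, 3), dtype=np.float32)
bg = np.zeros((2, 4, 4, 3), dtype=np.float32)
alpha = np.zeros((2, 4, 4, 1), dtype=np.float32)
alpha[:, 1:3, 1:3, :] = 1.0

i_over = composite_over(fg, alpha, bg)
```

The result contains only foreground and background pixels: nothing in this operation adds shadows, reflections, or wakes, which is the gap the model is meant to fill.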

Our method is trained on both paired and unpaired data. For unpaired data, we zero out the latent codes of $\mathcal{I}_{\text{over}}$ and $\mathcal{M}_{\text{effect}}$. (Text prompts $\mathcal{T}$ are not shown here for simplicity.)
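The zeroing step for unpaired data can be sketched as follows. This is a minimal illustration only: the function name `build_conditioning`, the latent shapes, and the `paired` flag are hypothetical, and the page does not say how the model consumes these latents afterwards.

```python
import numpy as np


def build_conditioning(z_over, z_mask, paired):
    """Return the conditioning latents for one training sample.

    z_over: latent code of the input composite (hypothetical shape).
    z_mask: latent code of the effect mask (hypothetical shape).
    paired: True if the sample has a matching input composite and mask.
    """
    if paired:
        return z_over, z_mask
    # Unpaired sample: zero out both conditioning latents.
    return np.zeros_like(z_over), np.zeros_like(z_mask)


rng = np.random.default_rng(0)
z_over = rng.normal(size=(4, 8, 8))
z_mask = rng.normal(size=(4, 8, 8))

c_over, c_mask = build_conditioning(z_over, z_mask, paired=False)
```

For a paired sample the latents pass through unchanged, so the same training loop can handle both kinds of data.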


Training Data

Robustness

Failure Cases

References

Baseline comparisons

Ku et al. AnyV2V: A Tuning-Free Framework For Any Video-to-Video Editing Tasks. TMLR, 2024.
Gao et al. LoRA-Edit: Controllable First-Frame-Guided Video Editing via Mask-Aware LoRA Fine-Tuning. arXiv, 2025.
Jiang et al. VACE: All-in-One Video Creation and Editing. ICCV, 2025.
Runway. Runway Aleph. 2025.

Data collection

Gillman et al. Force Prompting: Video Generation Models Can Learn and Generalize Physics-based Control Signals. NeurIPS, 2025.
Ruiz et al. DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation. CVPR, 2023.
Lu et al. Omnimatte: Associating Objects and Their Effects in Video. CVPR, 2021.
Lee et al. Generative Omnimatte: Learning to Decompose Video into Layers. CVPR, 2025.
Lin et al. OmnimatteRF: Robust Omnimatte with 3D Background Modeling. ICCV, 2023.
Greff et al. Kubric: A scalable dataset generator. CVPR, 2022.

Failure cases

Sadat et al. Eliminating Oversaturation and Artifacts of High Guidance Scales in Diffusion Models. ICLR, 2025.


Societal Impact

We acknowledge that powerful video editing tools, including ours, may raise ethical considerations depending on their context of use. While our work is intended to augment video compositing and professional workflows, such capabilities could potentially be misused. We therefore encourage responsible use aligned with community guidelines and emphasize transparency regarding any applied edits.