Over++: Generative Video Compositing for Layer Interaction Effects


TL;DR: Generate environmental effects between any foreground and background layers.

Over++ enables effect generation and editing, with or without mask or prompt guidance. Explore our applications below.

I. Effect Generation

II. Effect Editing

III. Keyframe Masking

IV. Background Swapping



We introduce Over++, a framework for generating environmental effects and enabling effect editing through mask- or prompt-guided control. Explore the sections below for more details:

Baseline Comparisons

Our Framework

Naively compositing the foreground over the background layer (copy-paste: $\mathcal{I}_{\text{over}} = \mathcal{I}_{\text{fg}} \oplus \mathcal{I}_{\text{bg}}$) produces a video that lacks environmental effects such as shadows or wakes. Given such an input composite and an optional binary mask ($\mathcal{M}_{\text{effect}}$) indicating the target effect regions, our model generates the desired effects within those regions.
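As a rough illustration of the copy-paste composite described above, the sketch below assumes that $\oplus$ denotes standard alpha compositing and that the foreground layer comes with an alpha matte. The function name `composite_over` and the array layout (frames, height, width, channels) are hypothetical choices for this example, not part of the Over++ code.

```python
import numpy as np

def composite_over(fg, alpha, bg):
    """Naive 'over' composite: alpha * fg + (1 - alpha) * bg.

    Assumed (hypothetical) layout: fg and bg are float arrays of shape
    (T, H, W, 3) with values in [0, 1]; alpha is (T, H, W) or (T, H, W, 1).
    """
    fg = np.asarray(fg, dtype=np.float32)
    bg = np.asarray(bg, dtype=np.float32)
    alpha = np.asarray(alpha, dtype=np.float32)
    # Add a trailing channel axis so alpha broadcasts over the RGB channels.
    if alpha.ndim == fg.ndim - 1:
        alpha = alpha[..., None]
    return alpha * fg + (1.0 - alpha) * bg

# One 2x2 frame: a white foreground pasted over a black background.
fg = np.ones((1, 2, 2, 3))
bg = np.zeros((1, 2, 2, 3))
alpha = np.array([[[1.0, 0.0], [0.5, 0.0]]])
i_over = composite_over(fg, alpha, bg)
```

The result only mixes pixels where the matte is non-zero, so nothing outside the foreground changes. This is why such a composite carries no shadows, wakes, or other interaction effects.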

Our method is trained on both paired and unpaired data. For unpaired data, we zero out the latent codes of $\mathcal{I}_{\text{over}}$ and $\mathcal{M}_{\text{effect}}$. (Text prompts $\mathcal{T}$ are not shown here for simplicity.)
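The zeroing of conditioning latents for unpaired samples can be sketched as follows. This is a minimal illustration under stated assumptions: the latents are plain arrays, and the helper name `prepare_conditioning` and the `is_paired` flag are hypothetical. How the latents are actually encoded and fed to the model is not specified here.

```python
import numpy as np

def prepare_conditioning(z_over, z_mask, is_paired):
    """Return conditioning latents for one training sample.

    z_over: latent code of the input composite (hypothetical array).
    z_mask: latent code of the effect mask (hypothetical array).
    For unpaired samples, both latents are replaced with zeros of the
    same shape, so the model receives no composite or mask signal.
    """
    z_over = np.asarray(z_over, dtype=np.float32)
    z_mask = np.asarray(z_mask, dtype=np.float32)
    if not is_paired:
        z_over = np.zeros_like(z_over)
        z_mask = np.zeros_like(z_mask)
    return z_over, z_mask

# Example with toy latents of arbitrary shape.
z_o = np.ones((4, 8, 8))
z_m = np.ones((4, 8, 8))
paired = prepare_conditioning(z_o, z_m, is_paired=True)
unpaired = prepare_conditioning(z_o, z_m, is_paired=False)
```

Keeping the zeroed tensors at the same shape means paired and unpaired samples can share one batch format.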

Training Data

Robustness

Failure Cases

Societal Impact

We acknowledge that powerful video editing tools, including ours, may raise ethical considerations depending on their context of use. While our work is intended to augment video compositing and professional workflows, such capabilities could potentially be misused. We therefore encourage responsible use aligned with community guidelines and emphasize transparency regarding any applied edits.

BibTeX