Danah Yatim*, Rafail Fridman*, Omer Bar-Tal, Tali Dekel
Weizmann Institute of Science
(* equal contribution)
This repository contains the official implementation of the paper DynVFX: Augmenting Real Videos with Dynamic Content
DynVFX augments real-world videos with new dynamic content described by a simple user-provided text instruction. The framework automatically infers where the synthesized content should appear, how it should move, and how it should harmonize at the pixel level with the scene, without requiring any additional user input. The key idea is to selectively extend the attention mechanism in a pre-trained text-to-video diffusion model, enforcing the generation to be content-aware of existing scene elements (anchors) from the original video. This allows the model to generate content that naturally interacts with the environment, producing complex and realistic video edits in a fully automated way.
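The attention-extension idea can be sketched in a few lines: keys and values cached from the original video (the anchors) are concatenated to the generated video's own keys and values, so every query can also attend to existing scene content. This is a minimal single-head NumPy illustration of the general technique, not the paper's actual AnchorExtAttn implementation; all names and shapes are our assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def anchor_extended_attention(q, k, v, k_anchor, v_anchor):
    """q, k, v: (N, D) tokens of the generated video.
    k_anchor, v_anchor: (M, D) keys/values cached from the source video."""
    # Extend the key/value sequences with the anchors along the token axis.
    k_ext = np.concatenate([k, k_anchor], axis=0)   # (N + M, D)
    v_ext = np.concatenate([v, v_anchor], axis=0)   # (N + M, D)
    scores = q @ k_ext.T / np.sqrt(q.shape[-1])     # (N, N + M)
    return softmax(scores) @ v_ext                  # (N, D)
```

The queries stay untouched; only the attended-to context grows, which is what lets new content stay aware of the original scene without regenerating it.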
For more, visit the project webpage.
conda create -n dynvfx python=3.12
conda deactivate
conda activate dynvfx
pip install torch==2.7.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126
git clone --recursive https://github.com/DanahYatim/dynvfx.git
cd dynvfx
If you already cloned without --recursive:
git submodule update --init --recursive
pip install -r requirements.txt
cd third_party/evfsam2
pip install -r requirements.txt
cd model/segment_anything_2
python setup.py build_ext --inplace
cd ../../..
This repository uses OpenAI's GPT-4o as the VFX Assistant. Create an API key at OpenAI Platform.
Save your key in vfx_assistant/.env:
OPENAI_API_KEY=<your_key>
# 1. Prepare your video frames (720x480, 49 frames at 8fps)
ffmpeg -i input.mp4 -vf "scale=720:480,fps=8" data/my_video/%05d.png
# 2. Edit configs/user_config.yaml with your paths and desired content
# 3. Run inversion to extract reference keys and values
python inversion.py --user_config_path configs/user_config.yaml
# 4. Run DynVFX
python run.py --user_config_path configs/user_config.yaml
Edit configs/user_config.yaml with the following parameters:
| Parameter | Description |
|---|---|
| `data_path` | Path to the input video frames directory |
| `new_content` | Text instruction describing the new content to add |
| `output_path` | Directory where output files will be saved |
| `target_folder` | Name of the edit; subdirectory where the edited video will be saved |
| `masks_dir` | Directory for prominent-element segmentation masks |
| `latents_path` | Directory for inverted latents |
| `mode` | Run mode: `"auto"`, `"generate"`, or `"execute"` |
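A filled-in config might look like the following. The field names come from the table above; every value here is an illustrative placeholder, not a shipped example.

```yaml
# Example user_config.yaml (placeholder values)
data_path: data/my_video/original
new_content: "a large whale breaching out of the water"
output_path: outputs/my_video
target_folder: whale
masks_dir: outputs/my_video/masks
latents_path: outputs/my_video/latents
mode: "auto"
```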
See Tips section for configuration options.
Your input video should be provided as individual frames in a directory:
data/input_frames/
├── 00000.png
├── ...
└── 00048.png
The method works best with:
- Resolution: 720×480
- Frame rate: 8 fps
- Frame count: 49 frames (~6 seconds)
Resize the video and extract the frames:
ffmpeg -i input.mp4 -vf "scale=720:480,fps=8" data/my_video/original/%05d.png
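A quick, stdlib-only sanity check of the resulting frames directory can catch mismatches before running the pipeline. The function names below are ours, and the defaults simply encode the recommendations above (49 PNG frames at 720×480); the image size is read directly from each file's PNG IHDR chunk.

```python
import struct
from pathlib import Path

def png_size(path):
    # The first 24 bytes of a PNG are: 8-byte signature, 4-byte IHDR length,
    # 4-byte "IHDR" tag, then big-endian width and height.
    with open(path, "rb") as f:
        header = f.read(24)
    assert header[:8] == b"\x89PNG\r\n\x1a\n", f"{path} is not a PNG"
    width, height = struct.unpack(">II", header[16:24])
    return width, height

def check_frames(frames_dir, expected=(720, 480), n_frames=49):
    frames = sorted(Path(frames_dir).glob("*.png"))
    assert len(frames) == n_frames, f"expected {n_frames} frames, got {len(frames)}"
    for p in frames:
        size = png_size(p)
        assert size == expected, f"{p.name} is {size}, not {expected}"
```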
To extract the reference keys and values, we first obtain the intermediate latents by inverting the input video:
python inversion.py --user_config_path configs/user_config.yaml
Configuration - Make sure video_path and latents_path are set in your user_config.yaml file.
Note:
- For paper comparison: this step is REQUIRED
- For best quality: run inversion for optimal scene alignment
- For quick testing: can be skipped, but results may drift
The pipeline consists of three stages:
- VFX Assistant: GPT-4o interprets the edit instruction and generates captions
- Text-based Segmentation: EVF-SAM extracts masks of scene elements
- DynVFX Pipeline: iterative refinement with AnchorExtAttn
Run the entire pipeline in one command:
# In configs/user_config.yaml
mode: "auto"

python run.py --user_config_path configs/user_config.yaml
Stage 1: Generate the VFX Assistant and EVF-SAM outputs, then review them
# In configs/user_config.yaml
mode: "generate"

python run.py --user_config_path configs/user_config.yaml
Review the generated protocol at output_path/output_for_vfx_protocol.json and the masks in masks_dir.
Stage 2: Execute with the approved protocol
# In configs/user_config.yaml
mode: "execute"

python run.py --user_config_path configs/user_config.yaml
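The relationship between the three modes can be sketched as follows: "auto" chains the other two, while "generate" stops after producing the protocol so you can review it, and "execute" reuses an already-approved one. This is a hypothetical sketch; the function names are illustrative and the stage callables are passed in explicitly to keep the example self-contained, which is not the repo's actual API.

```python
def run(mode, config, generate_protocol, load_protocol, execute_edit):
    """Dispatch the pipeline according to the run mode in user_config.yaml."""
    if mode in ("auto", "generate"):
        protocol = generate_protocol(config)   # Stage 1: VFX Assistant + EVF-SAM
    else:
        protocol = load_protocol(config)       # reuse the reviewed protocol
    if mode in ("auto", "execute"):
        return execute_edit(config, protocol)  # Stage 2: DynVFX refinement
    return protocol                            # "generate": stop for review
```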
dynvfx/
├── configs/
│   ├── base_config.yaml        # Pipeline hyperparameters
│   ├── user_config.yaml        # User-specific settings
│   └── inversion_config.yaml   # Inversion settings
├── models/
│   ├── get_masks_from_sam.py   # SAM mask generation
│   └── get_source_mask.py      # Source mask extraction
├── utilities/
│   ├── attention_utils.py      # Extended attention modules
│   ├── masking_utils.py        # Mask processing utilities
│   └── utils.py                # General utilities
├── vfx_assistant/
│   ├── protocol.py             # VFX Assistant (GPT-4o)
│   ├── system_prompts.py       # System prompts
│   └── .env                    # API keys (create this)
├── third_party/
│   └── evfsam2/                # EVF-SAM installation
├── dynvfx_pipeline.py          # Main pipeline
├── inversion.py                # DDIM inversion
├── run.py                      # Entry point
└── requirements.txt            # Dependencies
Enable logging to save intermediate results:
# In configs/base_config.yaml
with_logger: True

This saves to output_path:
- Input video and source masks
- Intermediate samples and target masks
- Latent mask visualizations
This work builds on:
- EVF-SAM - Base text-prompted segmentation model
- CogVideoX-5B - Base text-to-video model
- ChatGPT - Base vision-language model
If you use this work, please cite:
@misc{yatim2025dynvfxaugmentingrealvideos,
title={DynVFX: Augmenting Real Videos with Dynamic Content},
author={Danah Yatim and Rafail Fridman and Omer Bar-Tal and Tali Dekel},
year={2025},
eprint={2502.03621},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2502.03621},
}