DynVFX: Augmenting Real Videos with Dynamic Content

✨ SIGGRAPH Asia 2025 ✨

Danah Yatim*, Rafail Fridman*, Omer Bar-Tal, Tali Dekel
Weizmann Institute of Science
(* equal contribution)


teaser.mp4

This repository contains the official implementation of the paper DynVFX: Augmenting Real Videos with Dynamic Content

DynVFX augments real-world videos with new dynamic content described by a simple user-provided text instruction. The framework automatically infers where the synthesized content should appear, how it should move, and how it should harmonize at the pixel level with the scene, without requiring any additional user input. The key idea is to selectively extend the attention mechanism in a pre-trained text-to-video diffusion model, enforcing the generation to be content-aware of existing scene elements (anchors) from the original video. This allows the model to generate content that naturally interacts with the environment, producing complex and realistic video edits in a fully automated way.
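The anchor-extended attention idea described above can be illustrated with a toy single-head sketch: each query from the edited video attends over its own keys/values plus reference ("anchor") keys/values extracted from the original video. Names and shapes here are illustrative only, not the repository's actual API:

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def anchor_extended_attention(q, keys, values, ref_keys, ref_values):
    # Toy single-head sketch: concatenate the anchor keys/values from the
    # inverted source video onto the edit's own keys/values, so generated
    # content stays aware of existing scene elements.
    # (Function and argument names are assumptions, not the repo's API.)
    k_all = keys + ref_keys
    v_all = values + ref_values
    d = len(q[0])
    out = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k_all]
        w = softmax(scores)
        out.append([sum(wi * vj[t] for wi, vj in zip(w, v_all))
                    for t in range(len(v_all[0]))])
    return out
```

In the actual pipeline this extension is applied selectively inside a pre-trained text-to-video diffusion model (see `utilities/attention_utils.py`); the sketch only shows the concatenation of reference keys/values into the attention computation.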

For more, visit the project webpage.

videos.mp4

Setup 🔧

Create New Conda Environment

conda create -n dynvfx python=3.12
conda activate dynvfx

Install PyTorch with CUDA support:

pip install torch==2.7.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126

Clone the Repository with Submodules

git clone --recursive https://github.com/DanahYatim/dynvfx.git
cd dynvfx

If you already cloned without --recursive:

git submodule update --init --recursive

Install Requirements

pip install -r requirements.txt

Install and build EVF-SAM2

cd third_party/evfsam2
pip install -r requirements.txt
cd model/segment_anything_2
python setup.py build_ext --inplace
cd ../../..

Set Up OpenAI API Key

This repository uses OpenAI's GPT-4o as the VFX Assistant. Create an API key at OpenAI Platform.

Save your key in vfx_assistant/.env:

OPENAI_API_KEY=<your_key>
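If you want to load the key programmatically, a minimal `.env` parser looks like this (the repository may rely on python-dotenv internally; that is an assumption):

```python
import os

def load_env_file(path="vfx_assistant/.env"):
    # Minimal .env parser: export KEY=VALUE lines into os.environ,
    # skipping blank lines and comments. Existing variables win.
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")
                os.environ.setdefault(key.strip(), value.strip())
```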

Quick Start 🚀

# 1. Prepare your video frames (720x480, 49 frames at 8fps)
ffmpeg -i input.mp4 -vf "scale=720:480,fps=8" data/my_video/%05d.png

# 2. Edit configs/user_config.yaml with your paths and desired content

# 3. Run inversion to extract reference keys and values
python inversion.py --user_config_path configs/user_config.yaml

# 4. Run DynVFX
python run.py --user_config_path configs/user_config.yaml

Usage


Configuration ⚙️

Edit configs/user_config.yaml with the following parameters:

| Parameter | Description |
|---|---|
| `data_path` | Path to the input video frames directory |
| `new_content` | Text instruction describing the new content to add |
| `output_path` | Directory where output files will be saved |
| `target_folder` | Name of the edit; subfolder where the edited video will be saved |
| `masks_dir` | Directory for prominent-element segmentation masks |
| `latents_path` | Directory for inverted latents |
| `mode` | Run mode: `"auto"`, `"generate"`, or `"execute"` |

See the Tips section for more configuration options.
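A filled-in `configs/user_config.yaml` might look like the following (all paths and values are illustrative examples, not defaults shipped with the repo):

```yaml
data_path: data/my_video/original        # input frames directory
new_content: "a puppy running through the grass"
output_path: results/my_video
target_folder: add_puppy                 # edit name / output subfolder
masks_dir: results/my_video/masks
latents_path: results/my_video/latents
mode: "auto"
```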


Input Format

Your input video should be provided as individual frames in a directory:

data/input_frames/
├── 00000.png
├── ...
└── 00048.png

The method works best with:

  • Resolution: 720×480
  • Frame rate: 8 fps
  • Frame count: 49 frames (~6 seconds)

Resize the video and extract the frames:

ffmpeg -i input.mp4 -vf "scale=720:480,fps=8" data/my_video/original/%05d.png
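Before running the pipeline, you can sanity-check that the extracted frames match the expected resolution and count. This is a convenience sketch (not part of the repo) that reads only the PNG header, so it needs no imaging library:

```python
import struct
from pathlib import Path

def check_frames(frames_dir, size=(720, 480), count=49):
    # Verify frame count and per-frame resolution before running DynVFX.
    frames = sorted(Path(frames_dir).glob("*.png"))
    assert len(frames) == count, f"expected {count} frames, found {len(frames)}"
    for frame in frames:
        header = frame.read_bytes()[:24]
        # A PNG's IHDR chunk stores width/height as big-endian uint32
        # at byte offsets 16-24 of the file.
        w, h = struct.unpack(">II", header[16:24])
        assert (w, h) == size, f"{frame.name} is {w}x{h}, expected {size}"
```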

Reference Keys and Values Extraction

To extract the reference keys and values, we first obtain the intermediate latents by inverting the input video:

python inversion.py --user_config_path configs/user_config.yaml

Configuration - Make sure video_path and latents_path are set in your user_config.yaml file.

Note:

  • 🔬 For paper comparison: This step is REQUIRED
  • 🎯 For best quality: Run inversion for optimal scene alignment
  • 🎲 For quick testing: Can be skipped, but results may drift

Running DynVFX 🎬

The pipeline consists of three stages:

  1. 🤖 VFX Assistant: GPT-4o interprets the edit instruction and generates captions
  2. 🎭 Text-based Segmentation: EVF-SAM extracts masks of scene elements
  3. 🎬 DynVFX Pipeline: Iterative refinement with AnchorExtAttn


Option A: Fully Automated Mode

Run the entire pipeline in one command:

# In configs/user_config.yaml
mode: "auto"
python run.py --user_config_path configs/user_config.yaml

Option B: Preview & Execute Mode

Stage 1: Generate 🤖 VFX Assistant + EVF-SAM outputs and review

# In configs/user_config.yaml
mode: "generate"
python run.py --user_config_path configs/user_config.yaml

👀 Review the generated protocol at output_path/output_for_vfx_protocol.json and masks in masks_dir.

Stage 2: Execute with the approved protocol

# In configs/user_config.yaml
mode: "execute"
python run.py --user_config_path configs/user_config.yaml
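To switch between stages you can simply edit the `mode` field by hand; alternatively, a small helper can flip it without a YAML dependency (a convenience sketch, not part of the repo):

```python
import re
from pathlib import Path

def set_mode(config_path, mode):
    # Rewrite the top-level `mode:` line in user_config.yaml in place.
    assert mode in {"auto", "generate", "execute"}, f"unknown mode: {mode}"
    path = Path(config_path)
    text = re.sub(r'(?m)^mode:.*$', f'mode: "{mode}"', path.read_text())
    path.write_text(text)
```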

Repository Structure 📁

dynvfx/
├── configs/
│   ├── base_config.yaml      # Pipeline hyperparameters
│   ├── user_config.yaml      # User-specific settings
│   └── inversion_config.yaml # Inversion settings
├── models/
│   ├── get_masks_from_sam.py # SAM mask generation
│   └── get_source_mask.py    # Source mask extraction
├── utilities/
│   ├── attention_utils.py    # Extended attention modules
│   ├── masking_utils.py      # Mask processing utilities
│   └── utils.py              # General utilities
├── vfx_assistant/
│   ├── protocol.py           # VFX Assistant (GPT-4o)
│   ├── system_prompts.py     # System prompts
│   └── .env                  # API keys (create this)
├── third_party/
│   └── evfsam2/              # EVF-SAM installation
├── dynvfx_pipeline.py        # Main pipeline
├── inversion.py              # DDIM inversion
├── run.py                    # Entry point
└── requirements.txt          # Dependencies

Tips

Intermediate Visualization 📊

Enable logging to save intermediate results:

# In configs/base_config.yaml
with_logger: True

This saves to output_path:

  • Input video and source masks
  • Intermediate samples and target masks
  • Latent mask visualizations

Credits

This work builds on:


📚 Citation

If you use this work, please cite:

@misc{yatim2025dynvfxaugmentingrealvideos,
      title={DynVFX: Augmenting Real Videos with Dynamic Content}, 
      author={Danah Yatim and Rafail Fridman and Omer Bar-Tal and Tali Dekel},
      year={2025},
      eprint={2502.03621},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2502.03621}, 
}
