FlowDirector: Training-Free Flow Steering for Precise Text-to-Video Editing

Guangzhao Li · Yanming Yang · Chenxi Song · Chi Zhang

FlowDirector edits videos based on text prompts, preserving unedited regions and maintaining temporal coherence.

Version Notice: This repository provides the V1 implementation of FlowDirector.
The V2 version (with additional updates) will be open-sourced soon.

📄 Abstract

TL;DR: We propose FlowDirector, a training-free and inversion-free framework for text-guided video editing that enables precise object edits and temporal consistency through new spatial correction and guidance mechanisms.

Text-driven video editing aims to modify video content according to natural language instructions. While recent training-free approaches have made progress by leveraging pre-trained diffusion models, they typically rely on inversion-based techniques that map input videos into the latent space, which often leads to temporal inconsistencies and degraded structural fidelity. To address this, we propose FlowDirector, a novel inversion-free video editing framework. Our framework models the editing process as a direct evolution in data space, guiding the video via an Ordinary Differential Equation (ODE) to smoothly transition along its inherent spatiotemporal manifold, thereby preserving temporal coherence and structural details. To achieve localized and controllable edits, we introduce an attention-guided masking mechanism that modulates the ODE velocity field, preserving non-target regions both spatially and temporally. Furthermore, to address incomplete edits and enhance semantic alignment with editing instructions, we present a guidance-enhanced editing strategy inspired by Classifier-Free Guidance, which leverages differential signals between multiple candidate flows to steer the editing trajectory toward stronger semantic alignment without compromising structural consistency. Extensive experiments across benchmarks demonstrate that FlowDirector achieves state-of-the-art performance in instruction adherence, temporal consistency, and background preservation, establishing a new paradigm for efficient and coherent video editing without inversion.
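The masked ODE evolution described above can be illustrated with a toy NumPy sketch. This is not the repository's implementation: `masked_euler_edit`, `velocity_fn`, and the constant toy flow are hypothetical stand-ins, and in FlowDirector the velocity would come from the pre-trained video model and the mask from its attention maps. The sketch only shows how scaling the velocity by a mask leaves non-target regions untouched while the rest of the video moves along the flow.

```python
import numpy as np


def masked_euler_edit(video, velocity_fn, mask, num_steps=50):
    """Integrate dx/dt = mask * v(x, t) from t=0 to t=1 with Euler steps.

    video:       array of shape (frames, height, width, channels)
    velocity_fn: hypothetical callable (x, t) -> array shaped like x
    mask:        array broadcastable to video; 1 = editable, 0 = preserved
    """
    x = video.astype(np.float64).copy()
    dt = 1.0 / num_steps
    for step in range(num_steps):
        t = step * dt
        # Where mask == 0 the update vanishes, so those pixels never change.
        x = x + dt * mask * velocity_fn(x, t)
    return x


# Toy stand-in for the model: a constant flow from the source to a target.
source = np.zeros((4, 8, 8, 3))
target = np.ones_like(source)


def toy_velocity(x, t):
    return target - source


# Only a central square is editable, in every frame.
mask = np.zeros((4, 8, 8, 1))
mask[:, 2:6, 2:6, :] = 1.0

edited = masked_euler_edit(source, toy_velocity, mask)
```

In this toy setup the central square reaches the target values while every pixel outside it keeps its source value, which is the behavior the masking mechanism is meant to provide.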

🌟 Key Features

🌊 Inversion-Free Editing: Directly evolves video in data space, bypassing noisy and error-prone inversion processes.
⚙️ ODE-Driven Transformation: Smoothly transitions videos along their spatiotemporal manifold, preserving coherence and structural details.
🎨 Spatially Attentive Flow Correction (SAFC): An attention-guided masking mechanism precisely modulates the ODE velocity field, ensuring unedited regions remain unchanged both spatially and temporally.
🎯 Differential Averaging Guidance (DAG): A CFG-inspired strategy that leverages differential signals between multiple candidate flows to enhance semantic alignment with target prompts without compromising structural consistency.
🏆 State-of-the-Art Performance: Outperforms existing methods in instruction adherence, temporal consistency, and background preservation.
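As a rough illustration of the CFG-inspired idea behind DAG, the sketch below combines several candidate flows by extrapolating from a coarse average toward a finer one. The exact formulation in the paper may differ: the function name, the choice of a subset average as the baseline, and the `weight` parameter are all assumptions made for this example.

```python
import numpy as np


def differential_averaging_guidance(candidate_flows, baseline_count, weight):
    """Combine candidate flows with a CFG-style extrapolation.

    candidate_flows: list of K arrays with identical shapes
    baseline_count:  how many of the first candidates form the coarse baseline
    weight:          guidance strength; 0 returns the plain average
    """
    flows = np.stack(candidate_flows)                # shape (K, ...)
    refined = flows.mean(axis=0)                     # average over all K candidates
    baseline = flows[:baseline_count].mean(axis=0)   # average over a subset
    # Same structure as classifier-free guidance: move further along the
    # direction that separates the refined estimate from the baseline.
    return refined + weight * (refined - baseline)


flows = [np.full((2, 2), 1.0), np.full((2, 2), 3.0)]
guided = differential_averaging_guidance(flows, baseline_count=1, weight=2.0)
unguided = differential_averaging_guidance(flows, baseline_count=1, weight=0.0)
```

With `weight=0.0` the result is simply the mean of all candidates; larger weights push the flow further in the direction that distinguishes the full average from the baseline.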

🔥 News

[2025.05.30] FlowDirector is released! Check out the code and demos.
[2025.05.29] Paper and project page released.

📑 ToDo

Release the code
Gradio demo

🚀 Getting Started

Pre-trained Models

Download the Wan2.1-T2V-1.3B model checkpoints from their official sources (e.g., from the Wan2.1 GitHub or Hugging Face). You will need to provide the path to the directory containing these checkpoints using the --ckpt_dir argument when running the editing script (see examples below).

For instance, if you download them to ./checkpoints/Wan2.1-T2V-1.3B, you will use --ckpt_dir ./checkpoints/Wan2.1-T2V-1.3B.
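If you prefer to script this step, the following is a minimal sketch. It assumes the checkpoints are hosted on Hugging Face under the repo id `Wan-AI/Wan2.1-T2V-1.3B` and that the optional `huggingface_hub` package is installed; `resolve_ckpt_dir` and `download_wan_checkpoints` are hypothetical helpers, not part of this repository.

```python
from pathlib import Path


def resolve_ckpt_dir(ckpt_dir):
    """Return the checkpoint directory as a Path, or raise if it is missing or empty."""
    path = Path(ckpt_dir).expanduser()
    if not path.is_dir():
        raise FileNotFoundError(f"Checkpoint directory not found: {path}")
    if not any(path.iterdir()):
        raise FileNotFoundError(f"Checkpoint directory is empty: {path}")
    return path


def download_wan_checkpoints(local_dir="./checkpoints/Wan2.1-T2V-1.3B"):
    """Download the checkpoints from Hugging Face, then validate the directory.

    Assumption: the repo id below is the official location of the weights.
    """
    # Imported lazily so resolve_ckpt_dir works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    snapshot_download(repo_id="Wan-AI/Wan2.1-T2V-1.3B", local_dir=local_dir)
    return resolve_ckpt_dir(local_dir)
```

Calling `resolve_ckpt_dir("./checkpoints/Wan2.1-T2V-1.3B")` before launching an edit gives an early, readable error if the path passed to --ckpt_dir is wrong.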

Installation

Clone the repository (the main repo is Westlake-AGI-Lab/FlowDirector; substitute your fork's URL if you are working from a fork):
```
git clone https://github.com/Westlake-AGI-Lab/FlowDirector.git
cd FlowDirector
```

Install dependencies:
```
conda create -n flowdirector python=3.12
conda activate flowdirector

pip install -r requirements.txt
```

(Optional) Install flash_attention for Accelerated Editing: We strongly recommend installing flash_attention to accelerate editing (can be more than 5x faster):
```
pip install flash-attn --no-build-isolation
```
Alternatively, you can check the official flash_attention GitHub repository.

⚙️ How to Use

You can edit a video using the edit.py script. Ensure you have a source video, corresponding source/target text prompts, and have downloaded the pre-trained models.

Single-GPU Editing

Here's an example of how to run video editing on a single GPU:

```
bash script-edit-single-gpu.sh
```

Multi-GPU Editing (using `torchrun`)

For multi-GPU editing (e.g., 4 GPUs), run the following command:

```
bash script-edit-multi-gpu.sh
```

For detailed parameter explanations, please refer to the edit.py file.
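If you want to launch edits from Python rather than through the shell scripts, the sketch below builds the command line. It is only an illustration: the shipped scripts remain the reference, `build_edit_command` is a hypothetical helper, and the only edit.py flag it uses is --ckpt_dir, the one documented above. Any further flags (for example the video path and prompts) must be looked up in edit.py and passed through `extra_args`.

```python
import shlex


def build_edit_command(ckpt_dir, num_gpus=1, extra_args=()):
    """Build an argument list that runs edit.py on one or several GPUs.

    ckpt_dir:   path to the Wan2.1-T2V-1.3B checkpoints
    num_gpus:   1 uses plain `python`; more than 1 uses `torchrun`
    extra_args: additional edit.py flags, passed through unchanged
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    script = ["edit.py", "--ckpt_dir", ckpt_dir, *extra_args]
    if num_gpus == 1:
        return ["python", *script]
    # torchrun starts one worker process per GPU on this machine.
    return ["torchrun", f"--nproc_per_node={num_gpus}", *script]


command = build_edit_command("./checkpoints/Wan2.1-T2V-1.3B", num_gpus=4)
print(shlex.join(command))
```

The returned list can be handed to `subprocess.run(command, check=True)`. Multi-GPU runs may need additional flags that the bundled script sets, so compare against script-edit-multi-gpu.sh before relying on this.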

Use Gradio Web Interface

You can also edit videos through the Gradio web interface. To launch it, run:
```
python app.py --ckpt ./checkpoints/Wan2.1-T2V-1.3B
```

🎬 FlowDirector Editing Demos

FlowDirector achieves superior results across various editing tasks. Below are specific demonstrations:

Original Subject: Large Brown Bear

| Original Video (Source Keyword) | Edited Video 1 (Target Keyword) | Edited Video 2 (Target Keyword) |
| --- | --- | --- |
| large brown bear | large panda | large dinosaur |

Original Subject: Rabbit

| Original Video (Source Keyword) | Edited Video 1 (Target Keyword) | Edited Video 2 (Target Keyword) |
| --- | --- | --- |
| rabbit | Crochet rabbit | Origami rabbit |

Original Subject: Black Swan

| Original Video (Source Keyword) | Edited Video 1 (Target Keyword) | Edited Video 2 (Target Keyword) |
| --- | --- | --- |
| black swan | pink flamingo | white duck |

Original Subject: Woman in a black dress

| Original Video (Source Keyword) | Edited Video 1 (Target Keyword) | Edited Video 2 (Target Keyword) |
| --- | --- | --- |
| woman in a black dress | a red baseball cap | woman in a blue shirt and jeans |

Original Subject: Silver Jeep

| Original Video (Source Keyword) | Edited Video 1 (Target Keyword) | Edited Video 2 (Target Keyword) |
| --- | --- | --- |
| silver jeep | Porsche car | Tractor |

Original Subject: Holding a flower

| Original Video (Source Keyword) | Edited Video 1 (Target Keyword) | Edited Video 2 (Target Keyword) |
| --- | --- | --- |
| holding a flower | ~~holding a flower~~ | A golden retriever with a colorful collar |

Original Subject: Cats

| Original Video (Source Keyword) | Edited Video 1 (Target Keyword) | Edited Video 2 (Target Keyword) |
| --- | --- | --- |
| cats | dogs | kangaroo |

Original Subject: Wolf

| Original Video (Source Keyword) | Edited Video 1 (Target Keyword) | Edited Video 2 (Target Keyword) |
| --- | --- | --- |
| wolf | fox | husky |

Original Subject: Sea Turtle

| Original Video (Source Keyword) | Edited Video 1 (Target Keyword) | Edited Video 2 (Target Keyword) |
| --- | --- | --- |
| sea turtle | dolphin | seal |

Original Subject: Sea Lion

| Original Video (Source Keyword) | Edited Video 1 (Target Keyword) | Edited Video 2 (Target Keyword) |
| --- | --- | --- |
| sea lion | Seahorse | Clownfish |

Original Subject: Woman (Gym)

| Original Video (Source Keyword) | Edited Video 1 (Target Keyword) | Edited Video 2 (Target Keyword) |
| --- | --- | --- |
| woman | chimpanzee | Spider-Man |

Original Subject: Red Cockatiel

| Original Video (Source Keyword) | Edited Video 1 (Target Keyword) | Edited Video 2 (Target Keyword) |
| --- | --- | --- |
| red cockatiel | blue budgie | eagle |

Original Subject: Puppy

| Original Video (Source Keyword) | Edited Video 1 (Target Keyword) | Edited Video 2 (Target Keyword) |
| --- | --- | --- |
| puppy | chinchilla | cat |

📜 Citation

If you find FlowDirector useful for your research, please cite our paper:

```
@article{li2025flowdirector0,
  title   = {FlowDirector: Training-Free Flow Steering for Precise Text-to-Video Editing},
  author  = {Guangzhao Li and Yanming Yang and Chenxi Song and Chi Zhang},
  year    = {2025},
  journal = {arXiv preprint arXiv:2506.05046}
}
```

📝 License

This project is licensed under the MIT License - see the LICENSE file for details.

📧 Contact

For questions or inquiries, please contact Guangzhao Li at gzhao.cs@gmail.com or open an issue in this repository.
