---
license: apache-2.0
language:
pipeline_tag: text-to-video
tags:
base_model: Wan-AI/Wan2.2-TI2V-5B
library_name: diffusers
---
# Zen Director

Zen Director is a 5B-parameter text-to-video and image-to-video generation model. Based on Wan 2.2, it creates high-quality videos from text descriptions and images with controllable motion.

Built on Wan-AI/Wan2.2-TI2V-5B, a Text-to-Image-to-Video (TI2V) model with 5B parameters.

Note: This release is based on Wan 2.2. Wan 2.5 has been announced but is not yet open-source; we will upgrade when it becomes available.
## Features

- Text-to-Video: Generate videos from text descriptions
- Image-to-Video: Animate static images into videos
- High Resolution: Supports high-quality video generation up to 1280x720
- Efficient: Optimized MoE architecture for fast inference
## Model Details

- Architecture: Mixture-of-Experts (MoE) Transformer
- Parameters: 5B total
- Base: Wan 2.2 TI2V
- Resolution: Up to 1280x720
- Frame Rate: 24 FPS
- Duration: Up to 5 seconds
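These specs fix the frame budget used in the examples below: frame count is duration × frame rate, so a full-length 5 s clip at 24 FPS is 120 frames. A minimal sketch of that arithmetic, with constants taken from the list above:

```python
# Frame budget implied by the specs above: num_frames = duration_s * fps.
FPS = 24          # native frame rate
MAX_SECONDS = 5   # maximum clip duration

def frames_for(seconds: float) -> int:
    """Frame count for a clip of the given duration, capped at the model's limit."""
    return int(min(seconds, MAX_SECONDS) * FPS)

print(frames_for(5))  # 120 -- the value passed as num_frames in the examples below
```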
## Installation

```bash
pip install diffusers transformers accelerate torch
pip install av opencv-python pillow
```

## Usage

### Text-to-Video

```python
from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video
import torch

# Load the model in half precision
pipe = DiffusionPipeline.from_pretrained(
    "zenlm/zen-director",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

# Generate video from text
prompt = "A serene sunset over a calm ocean with waves gently lapping at the shore"
video = pipe(prompt, num_frames=120, height=720, width=1280).frames[0]

# Save video at the model's native 24 FPS
export_to_video(video, "output.mp4", fps=24)
```
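`export_to_video` covers the common case; since the install step above also pulls in opencv-python, you can write the frames yourself when you need control over the codec or container. A minimal sketch, assuming `video` is the list of RGB frames produced above (the `mp4v` FourCC is one common choice, not the only one):

```python
import cv2
import numpy as np

def write_video(frames, path, fps=24):
    """Write a sequence of RGB frames (PIL Images or HxWx3 uint8 arrays) to a video file."""
    first = np.asarray(frames[0])
    height, width = first.shape[:2]
    fourcc = cv2.VideoWriter_fourcc(*"mp4v")
    writer = cv2.VideoWriter(path, fourcc, fps, (width, height))
    for frame in frames:
        rgb = np.asarray(frame)
        writer.write(cv2.cvtColor(rgb, cv2.COLOR_RGB2BGR))  # OpenCV expects BGR order
    writer.release()

write_video(video, "output_cv2.mp4", fps=24)
```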
### Image-to-Video

```python
from PIL import Image

# Load the starting image
image = Image.open("input.jpg")

# Generate video from the image
video = pipe(
    prompt="Animate this image with gentle camera movement",
    image=image,
    num_frames=120,
).frames[0]

export_to_video(video, "animated.mp4", fps=24)
```

## Performance

- Inference Speed: ~2-3 seconds/frame on A100
- Memory: Requires 24GB+ VRAM for full resolution
- Quantization: FP16 recommended for consumer GPUs
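If you are under the 24GB mark, diffusers' generic offloading hook can trade speed for peak VRAM, on top of the FP16 loading shown earlier. A minimal sketch, assuming the same pipeline; `enable_model_cpu_offload` is a standard DiffusionPipeline feature (it requires accelerate, installed above), not Zen-specific, and the smaller resolution and frame count here are illustrative values:

```python
from diffusers import DiffusionPipeline
import torch

# Load in FP16 and stream submodules to the GPU only while they run.
pipe = DiffusionPipeline.from_pretrained(
    "zenlm/zen-director",
    torch_dtype=torch.float16,
)
pipe.enable_model_cpu_offload()  # keeps idle components in CPU RAM; no .to("cuda") needed

# Smaller frame count and resolution also cut activation memory.
prompt = "A serene sunset over a calm ocean"
video = pipe(prompt, num_frames=48, height=480, width=832).frames[0]
```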
## Roadmap

- ✅ v1.0 - Wan 2.2 TI2V-5B base (current)
- 🔄 v2.0 - Upgrade to Wan 2.5 when open-source
- 📋 Future - Fine-tuning for specific styles and domains
## Limitations

- Requires a high-end GPU (24GB+ VRAM recommended)
- Video duration limited to 5 seconds
- Best results come from detailed, specific prompts (see the example after this list)
- Some motion artifacts in complex scenes
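To make the prompt-detail point concrete, here is an illustrative pair; the wording is hypothetical, but the detailed version pins down subject, lighting, camera, and motion, which is the level of specificity that tends to work well:

```python
# Vague prompts leave most decisions to the model and often underperform.
vague = "a beach"

# Detailed prompts specify subject, lighting, camera, and motion.
detailed = (
    "A serene sunset over a calm ocean, golden light reflecting off gentle "
    "waves, slow static wide shot from the shoreline, soft warm color grading"
)

video = pipe(detailed, num_frames=120, height=720, width=1280).frames[0]
```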
## Links

- GitHub: https://github.com/zenlm
- Zen Gym (Training): https://github.com/zenlm/zen-gym
- Zen Engine (Inference): https://github.com/zenlm/zen-engine
## Citation

```bibtex
@misc{zen-director-2025,
  title={Zen Director 5B: Text-to-Video Generation Model},
  author={Zen AI Team},
  year={2025},
  howpublished={\url{https://huggingface.co/zenlm/zen-director-5b}}
}

@article{wan2024,
  title={Wan 2.2: High-Quality Video Generation},
  author={Wan-AI Team},
  journal={arXiv preprint},
  year={2024}
}
```

## License

Apache 2.0
Part of the Zen AI ecosystem.
## Acknowledgments

zen-director is based on Wan-AI/Wan2.2-TI2V-5B. We are grateful to the original authors for their excellent work and open-source contributions.

- Repository: https://huggingface.co/Wan-AI/Wan2.2-TI2V-5B
- Base Model: Wan2.2-TI2V-5B
- License: See the original repository for license details
Modifications in this release:

- Adapted for the Zen AI ecosystem
- Fine-tuned for specific use cases
- Added training and inference scripts
- Integrated with Zen Gym and Zen Engine
- Enhanced documentation and examples
If you use this model, please cite both the original work and Zen LM:

```bibtex
@misc{zenlm2025zen-director,
  title={Zen LM: zen-director},
  author={Hanzo AI and Zoo Labs Foundation},
  year={2025},
  publisher={HuggingFace},
  howpublished={\url{https://huggingface.co/zenlm/zen-director}}
}
```

Please also cite the original upstream work; see https://huggingface.co/Wan-AI/Wan2.2-TI2V-5B for citation details.