Jisu Nam1 · Soowon Son1 · Zhan Xu2 · Jing Shi2 · Difan Liu2 · Feng Liu2 · Aashish Misra3 · Seungryong Kim1 · Yang Zhou2
1KAIST AI 2Adobe Research 3Adobe
CVPR 2025
Visual Persona is a foundation model for 🏄 Full-Body Human Customization. Given a reference image of a person, our model generates diverse, customized images while faithfully preserving the full-body appearance — including face, clothing, body shape, and accessories.
- Full-body fidelity: Preserves identity across face, torso, legs, and shoes simultaneously
- Versatile applications: A single model supports multiple downstream tasks via plug-in adapters
- Flexible control: Supports both pose-guided and text-guided generation
| Task | Description |
|---|---|
| Pose-Guided Human Customization | Generate the reference person in arbitrary poses |
| Story Generation | Create consistent multi-scene narratives with the same identity |
| Text-Guided Virtual Try-On | Change clothing while preserving the person's appearance |
| Anime Character Customization | Transfer identity to stylized, non-photorealistic characters |
We recommend using a conda environment with Python 3.10.

```bash
conda create -n visual_persona python=3.10 -y
conda activate visual_persona
pip install -r requirements.txt
```

Step 1. Download the DINOv2 ViT-G/14 backbone:

```bash
mkdir pretrained_models
cd pretrained_models
wget https://dl.fbaipublicfiles.com/dinov2/dinov2_vitg14/dinov2_vitg14_pretrain.pth
```

Step 2. Download our Visual Persona checkpoints from Google Drive into `./pretrained_models/`.
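Large checkpoint downloads occasionally truncate; a quick way to sanity-check the backbone file is to hash it and compare against the digest published with the DINOv2 release (not reproduced here). The following is a minimal sketch using only the standard library; the checkpoint path matches the layout above.

```python
import hashlib
from pathlib import Path

def sha256sum(path, chunk_size=1 << 20):
    """Compute the SHA-256 of a file by streaming it in 1 MiB chunks,
    so large checkpoints are never loaded into memory at once."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Compare this digest against the one published by the DINOv2 repo.
ckpt = Path("pretrained_models/dinov2_vitg14_pretrain.pth")
if ckpt.exists():
    print(ckpt.name, sha256sum(ckpt))
```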
After both steps, the directory should look like:

```
pretrained_models/
├── dinov2_vitg14_pretrain.pth
├── diffusion_pytorch_model.safetensors
└── wieght.bin
```
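A missing checkpoint is the most common setup failure, so it can help to verify the layout before running inference. This is a small sketch (the helper function is ours, not part of the repo) that checks for the three files listed above:

```python
from pathlib import Path

# Filenames from the expected layout above.
EXPECTED = [
    "dinov2_vitg14_pretrain.pth",
    "diffusion_pytorch_model.safetensors",
    "wieght.bin",
]

def missing_checkpoints(model_dir):
    """Return the expected checkpoint files absent from model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED if not (root / name).is_file()]

if __name__ == "__main__":
    missing = missing_checkpoints("pretrained_models")
    if missing:
        print("Missing checkpoints:", ", ".join(missing))
    else:
        print("All checkpoints found.")
```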
Run the script corresponding to your desired application:

```bash
# Base pose-guided generation
python inference.py

# Pose-guided generation with ControlNet
python inference_controlnet.py

# Multi-scene story generation
python inference_controlnet_story.py

# Text-guided virtual try-on
python inference_tryon.py

# Anime / character customization
python inference_anime.py
```

Tip: Each script contains configurable arguments at the top of the file (input image path, prompt, output directory, etc.).
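If you switch between tasks often, a thin dispatcher can save typos. This is a hypothetical convenience wrapper, not part of the repo: the task names and the `run_task` helper are our own, while the script filenames are the ones listed above.

```python
import subprocess
import sys

# Task name → inference script, as listed in the README.
SCRIPTS = {
    "pose": "inference.py",
    "pose_controlnet": "inference_controlnet.py",
    "story": "inference_controlnet_story.py",
    "tryon": "inference_tryon.py",
    "anime": "inference_anime.py",
}

def run_task(task, dry_run=False):
    """Build (and optionally execute) the command for the given task.
    Returns the argv list so callers can inspect or log it."""
    if task not in SCRIPTS:
        raise ValueError(f"unknown task {task!r}; choose from {sorted(SCRIPTS)}")
    cmd = [sys.executable, SCRIPTS[task]]
    if not dry_run:
        subprocess.run(cmd, check=True)
    return cmd
```

For example, `run_task("story", dry_run=True)` returns the command for `inference_controlnet_story.py` without executing it.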
We use SCHP (Self-Correction Human Parsing) to parse input images into five body regions: full-body, face, torso, legs, and shoes. Any other state-of-the-art human parsing method can be substituted.
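The five-region decomposition amounts to grouping the parser's per-pixel part labels. The sketch below illustrates one plausible grouping: the label names are those commonly associated with the SCHP ATR label set and the grouping itself is our assumption, so adjust both to match your parsing checkpoint.

```python
# Hypothetical grouping of human-parsing part labels into the five body
# regions used by Visual Persona (label names and grouping are assumptions;
# check them against your SCHP checkpoint's label set).
REGIONS = {
    "face":  {"hat", "hair", "sunglasses", "face"},
    "torso": {"upper-clothes", "dress", "left-arm", "right-arm", "scarf"},
    "legs":  {"skirt", "pants", "left-leg", "right-leg"},
    "shoes": {"left-shoe", "right-shoe"},
}
# Full body is the union of the four part regions.
REGIONS["full-body"] = set().union(*REGIONS.values())

def region_mask(label_map, region):
    """Binary mask (nested lists) selecting pixels whose label name
    belongs to the requested region; everything else (e.g. background)
    maps to 0."""
    wanted = REGIONS[region]
    return [[1 if lbl in wanted else 0 for lbl in row] for row in label_map]
```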
Download the SCHP ATR checkpoint into ./pretrained_models/:
Then run:

```bash
# Parse a single image
python parsing.py --input_path /path/to/image.jpg

# Parse all images in a directory
python parsing.py --input_path /path/to/images/ --output_dir ./parsing
```

If you find this work useful, please consider citing:
```bibtex
@inproceedings{nam2025visual,
  title     = {Visual Persona: Foundation Model for Full-Body Human Customization},
  author    = {Nam, Jisu and Son, Soowon and Xu, Zhan and Shi, Jing and Liu, Difan and Liu, Feng and Misra, Aashish and Kim, Seungryong and Zhou, Yang},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  pages     = {18630--18641},
  year      = {2025}
}
```

This project builds on IP-Adapter. We thank the authors for their excellent work.
