Efficient Personalization of Quantized Diffusion Model without Backpropagation
Hoigi Seo*, Wongi Jeong*, Kyungryeol Lee, Se Young Chun (*co-first)
This paper presents a novel approach to enabling personalization with a quantized text-to-image diffusion model while operating under minimal memory constraints and without reliance on backpropagation. Leveraging zeroth-order (ZO) optimization, the proposed method achieves personalization using merely 2.37GB of VRAM on Stable Diffusion v1.5.
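As background, zeroth-order optimization estimates gradients purely from forward evaluations of the loss, which is what removes the need for backpropagation. A minimal two-point (SPSA-style) estimator on a toy quadratic loss might look like the sketch below; it is illustrative only and not the paper's implementation:

```python
import numpy as np

def zo_gradient(loss_fn, x, rng, n=4, mu=1e-3):
    """Average n two-point random-perturbation gradient estimates:
    g ≈ (1/n) * sum_i [loss(x + mu*u_i) - loss(x - mu*u_i)] / (2*mu) * u_i."""
    grad = np.zeros_like(x)
    for _ in range(n):
        u = rng.standard_normal(x.shape)
        grad += (loss_fn(x + mu * u) - loss_fn(x - mu * u)) / (2 * mu) * u
    return grad / n

# Toy usage: minimize ||x||^2 with plain SGD on ZO gradient estimates.
rng = np.random.default_rng(0)
loss = lambda x: float(np.sum(x ** 2))
x = np.ones(8)
for _ in range(300):
    x -= 0.05 * zo_gradient(loss, x, rng)
```

The same idea applies to the token embedding being personalized: only forward passes of the diffusion model are needed, so no activation memory for backpropagation is held.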

Environment Setup
Create and activate the Conda virtual environment:
conda env create -f environment.yaml
conda activate zoodip
Alternatively, install dependencies via pip:
pip install -r requirements.txt
Additionally, download the DreamBooth dataset from here and place it in ./dataset:
Folder Tree
ZOODiP
├── dataset
│ ├── dreambooth dataset
│ └── or custom dataset
├── results
│ └── learned_embeds.safetensors
├── requirements.txt
├── environment.yaml
├── cc.json
├── train_zoodip.sh
├── train_zoodip.py
└── inference.ipynb
Configure Parameters
The implementation is primarily based on the textual inversion code from Diffusers, with the following additional parameters:
n: Number of gradient estimations for ZO optimization.
tau: Buffer size (see Algorithm 1).
nu: Threshold controlling the amount of variance retained (see Algorithm 1).
use_cc: Whether to use comprehensive captioning.
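To make the roles of tau and nu concrete, the following is a hedged sketch (not the paper's code) of one way a buffer of the last tau gradient estimates can be reduced, via SVD, to an orthonormal basis of the directions retaining a nu fraction of their variance; all names, shapes, and the synthetic data are illustrative assumptions:

```python
import numpy as np

def dominant_subspace(grad_buffer, nu=0.95):
    """grad_buffer: (tau, d) array of past gradient estimates.
    Returns an orthonormal basis (d, k) for the fewest directions
    whose singular values retain a nu fraction of the variance."""
    G = grad_buffer - grad_buffer.mean(axis=0)
    _, s, vt = np.linalg.svd(G, full_matrices=False)
    ratio = np.cumsum(s ** 2) / np.sum(s ** 2)
    k = int(np.searchsorted(ratio, nu)) + 1  # smallest k with ratio[k-1] >= nu
    return vt[:k].T  # (d, k)

# Synthetic buffer whose variance is concentrated in 2 of d directions.
tau, d = 8, 32
rng = np.random.default_rng(1)
basis = rng.standard_normal((2, d))
buffer = rng.standard_normal((tau, 2)) @ basis + 0.01 * rng.standard_normal((tau, d))
P = dominant_subspace(buffer, nu=0.95)
```

Drawing perturbations inside such a low-dimensional subspace is one way to reduce the variance of ZO gradient estimates; consult the paper's Algorithm 1 for the actual procedure.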
Run the Example
Execute the main script train_zoodip.sh:
sh train_zoodip.sh
The learned embeddings will be saved in the ./results/ directory.
If the setup is configured correctly and training completes successfully, you can obtain images similar to those shown in ./inference.ipynb.
This code is based on the textual inversion implementation provided by Diffusers.
