School of Computer Science and Technology, Soochow University
Institute of Software, Chinese Academy of Sciences
University of Chinese Academy of Sciences
Northwestern Polytechnical University
- We use the same environment configuration as DiT; see the `environment.yml` file. If you only want to run pre-trained models locally on CPU, you can remove the `cudatoolkit` and `pytorch-cuda` requirements from the file.

```bash
conda env create -f environment.yml
conda activate DiT
```

- We use model parameters pre-trained at an image resolution of 512x512.

```bash
python sample.py --image-size 512 --seed 1
```

- You can simply replace the corresponding content in the `diffusion/gaussian_diffusion.py` file in DiT with the following code.
```python
    def p_sample(
        self,
        model,
        x,
        t,
        clip_denoised=True,
        denoised_fn=None,
        cond_fn=None,
        model_kwargs=None,
    ):
        """
        Sample x_{t-1} from the model at the given timestep.

        :param model: the model to sample from.
        :param x: the current tensor at x_{t-1}.
        :param t: the value of t, starting at 0 for the first diffusion step.
        :param clip_denoised: if True, clip the x_start prediction to [-1, 1].
        :param denoised_fn: if not None, a function which applies to the
            x_start prediction before it is used to sample.
        :param cond_fn: if not None, this is a gradient function that acts
            similarly to the model.
        :param model_kwargs: if not None, a dict of extra keyword arguments to
            pass to the model. This can be used for conditioning.
        :return: a dict containing the following keys:
                 - 'sample': a random sample from the model.
                 - 'pred_xstart': a prediction of x_0.
        """
        out = self.p_mean_variance(
            model,
            x,
            t,
            clip_denoised=clip_denoised,
            denoised_fn=denoised_fn,
            model_kwargs=model_kwargs,
        )
        noise = th.randn_like(x)
        ############################### Code Added ###############################
        # Signal rectification: once the timestep falls below the threshold
        # `beta`, L1-normalize the noise over the channel dimension and bias
        # the first channel. Requires `import torch.nn.functional as F` at the
        # top of the file.
        if t[0] < beta:
            sample_temp = noise.permute(0, 2, 3, 1)
            sample_temp = F.normalize(sample_temp, p=1, dim=-1)
            ones_tensor = th.ones_like(sample_temp)
            ones_tensor[:, :, :, 0] = 2
            sample_temp = sample_temp + ones_tensor
            noise = sample_temp.permute(0, 3, 1, 2)
        ##########################################################################
        nonzero_mask = (
            (t != 0).float().view(-1, *([1] * (len(x.shape) - 1)))
        )  # no noise when t == 0
        if cond_fn is not None:
            out["mean"] = self.condition_mean(
                cond_fn, out, x, t, model_kwargs=model_kwargs
            )
        sample = out["mean"] + nonzero_mask * th.exp(0.5 * out["log_variance"]) * noise
        return {"sample": sample, "pred_xstart": out["pred_xstart"]}
```

- `beta`: if your number of sampling steps is set to 100, then setting `beta` to around 25-30 is appropriate.
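To see what the added signal-rectification step does in isolation, here is a minimal standalone sketch of that branch applied to a random noise tensor. The function name `rectify_noise` is our own illustrative wrapper, not part of the DiT codebase; the body mirrors the snippet above.

```python
import torch
import torch.nn.functional as F

def rectify_noise(noise: torch.Tensor, t: int, beta: int) -> torch.Tensor:
    """Sketch of the signal rectification from the snippet above.

    `noise` has shape (N, C, H, W). For timesteps below `beta`, the noise
    is L1-normalized across channels and then shifted by a bias tensor
    whose first channel is 2 and whose other channels are 1.
    (`rectify_noise` is a hypothetical helper name, for illustration only.)
    """
    if t >= beta:
        return noise
    temp = noise.permute(0, 2, 3, 1)       # (N, H, W, C): channels last
    temp = F.normalize(temp, p=1, dim=-1)  # L1-normalize each pixel's channel vector
    ones = torch.ones_like(temp)
    ones[..., 0] = 2                       # extra bias on channel 0
    temp = temp + ones
    return temp.permute(0, 3, 1, 2)        # back to (N, C, H, W)

noise = torch.randn(1, 4, 8, 8)
out = rectify_noise(noise, t=10, beta=25)
print(out.shape)  # torch.Size([1, 4, 8, 8])
```

After normalization, each pixel's channel vector has unit L1 norm, so the bias dominates the late sampling steps and pushes the sample toward flat, cartoon-like color regions.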
Thanks to the DiT codebase provided by the authors of Scalable Diffusion Models with Transformers.
```bibtex
@article{he2023cartoondiff,
  title={CartoonDiff: Training-free Cartoon Image Generation with Diffusion Transformer Models},
  author={He, Feihong and Li, Gang and Si, Lingyu and Yan, Leilei and Hou, Shimeng and Dong, Hongwei and Li, Fanzhang},
  journal={arXiv preprint arXiv:2309.08251},
  year={2023}
}
```

