School of Computer Science and Technology, Soochow University
Institute of Software, Chinese Academy of Sciences
University of Chinese Academy of Sciences
Northwestern Polytechnical University
- We use the same environment configuration as DiT; see the `environment.yml` file. If you only want to run pre-trained models locally on CPU, you can remove the `cudatoolkit` and `pytorch-cuda` requirements from the file.

```bash
conda env create -f environment.yml
conda activate DiT
```

- We use model parameters pre-trained at an image resolution of 512x512.

```bash
python sample.py --image-size 512 --seed 1
```

- You can simply replace the corresponding content in the `diffusion/gaussian_diffusion.py` file in DiT with the following code.
```python
    def p_sample(
        self,
        model,
        x,
        t,
        clip_denoised=True,
        denoised_fn=None,
        cond_fn=None,
        model_kwargs=None,
    ):
        """
        Sample x_{t-1} from the model at the given timestep.

        :param model: the model to sample from.
        :param x: the current tensor at x_{t-1}.
        :param t: the value of t, starting at 0 for the first diffusion step.
        :param clip_denoised: if True, clip the x_start prediction to [-1, 1].
        :param denoised_fn: if not None, a function which applies to the
            x_start prediction before it is used to sample.
        :param cond_fn: if not None, this is a gradient function that acts
            similarly to the model.
        :param model_kwargs: if not None, a dict of extra keyword arguments to
            pass to the model. This can be used for conditioning.
        :return: a dict containing the following keys:
                 - 'sample': a random sample from the model.
                 - 'pred_xstart': a prediction of x_0.
        """
        out = self.p_mean_variance(
            model,
            x,
            t,
            clip_denoised=clip_denoised,
            denoised_fn=denoised_fn,
            model_kwargs=model_kwargs,
        )
        noise = th.randn_like(x)
        ############################### Code Added ###############################
        # Signal rectification: once the timestep falls below the threshold
        # `beta`, L1-normalize the noise over the channel dimension and bias
        # the first channel. Requires `import torch.nn.functional as F` at the
        # top of the file.
        if t[0] < beta:
            sample_temp = noise.permute(0, 2, 3, 1)
            sample_temp = F.normalize(sample_temp, p=1, dim=-1)
            ones_tensor = th.ones_like(sample_temp)
            ones_tensor[:, :, :, 0] = 2
            sample_temp = sample_temp + ones_tensor
            noise = sample_temp.permute(0, 3, 1, 2)
        ##########################################################################
        nonzero_mask = (
            (t != 0).float().view(-1, *([1] * (len(x.shape) - 1)))
        )  # no noise when t == 0
        if cond_fn is not None:
            out["mean"] = self.condition_mean(
                cond_fn, out, x, t, model_kwargs=model_kwargs
            )
        sample = out["mean"] + nonzero_mask * th.exp(0.5 * out["log_variance"]) * noise
        return {"sample": sample, "pred_xstart": out["pred_xstart"]}
```

- `beta`: if your number of sampling steps is set to 100, then setting `beta` to around 25-30 is appropriate.
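To see what the added signal-rectification step does in isolation, here is a minimal standalone sketch of that branch applied to a random noise tensor. The function name `rectify_noise` is our own illustrative wrapper, not part of the DiT codebase; the body mirrors the snippet above.

```python
import torch
import torch.nn.functional as F

def rectify_noise(noise: torch.Tensor, t: int, beta: int) -> torch.Tensor:
    """Sketch of the signal rectification from the snippet above.

    `noise` has shape (N, C, H, W). For timesteps below `beta`, the noise
    is L1-normalized across channels and then shifted by a bias tensor
    whose first channel is 2 and whose other channels are 1.
    (`rectify_noise` is a hypothetical helper name, for illustration only.)
    """
    if t >= beta:
        return noise
    temp = noise.permute(0, 2, 3, 1)       # (N, H, W, C): channels last
    temp = F.normalize(temp, p=1, dim=-1)  # L1-normalize each pixel's channel vector
    ones = torch.ones_like(temp)
    ones[..., 0] = 2                       # extra bias on channel 0
    temp = temp + ones
    return temp.permute(0, 3, 1, 2)        # back to (N, C, H, W)

noise = torch.randn(1, 4, 8, 8)
out = rectify_noise(noise, t=10, beta=25)
print(out.shape)  # torch.Size([1, 4, 8, 8])
```

After normalization, each pixel's channel vector has unit L1 norm, so the bias dominates the late sampling steps and pushes the sample toward flat, cartoon-like color regions.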
Thanks to the DiT codebase provided by the authors of Scalable Diffusion Models with Transformers.
```bibtex
@article{he2023cartoondiff,
  title={CartoonDiff: Training-free Cartoon Image Generation with Diffusion Transformer Models},
  author={He, Feihong and Li, Gang and Si, Lingyu and Yan, Leilei and Hou, Shimeng and Dong, Hongwei and Li, Fanzhang},
  journal={arXiv preprint arXiv:2309.08251},
  year={2023}
}
```

