[Guide] Quantize your Diffusion Models with bnb #10012
Conversation
stevhliu
left a comment
Very nice start! 👏
I think you can combine this guide with the existing one here since there is quite a bit of overlap between the two. Here are some general tips for doing that:
- Keep the introduction in the existing guide but add a few sentences that adapt it to quantizing Flux.1-dev with bitsandbytes so you can run it on hardware with less than 16GB of memory. I think most users at this point have a general idea of what quantization is (and it is also covered in the getting started), so we don't need to spend more time on what it is/why it is important. The focus is more on bitsandbytes than quantization in general.
- I don't think it's necessary to have a section for showing how to use an unquantized model. Users are probably more eager to see how they can use a quantized model and getting them there as quickly as possible would be better.
- Combine the 8-bit quantization section with the existing one here. You can add a note about how you're quantizing both the `T5EncoderModel` and `FluxTransformer2DModel`, and what the `low_cpu_mem_usage` and `device_map` (if you have more than one GPU) parameters do (see the sketch after this list for one possible shape).
- You can do the same thing with the 4-bit section. Combine it with the existing one and add a few lines explaining the parameters.
- Combine the NF4 quantization section with the one here.
- Lead with the visualization in the method comparison section. Most users probably aren't too interested in comparing and running all this code themselves, so it's more impactful to lead with the results first.
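
For reference, a minimal sketch of what the combined 8-bit section could end up looking like, assuming the Flux.1-dev layout referenced above (parameter values here are illustrative, not the final text of the guide):

```python
import torch
from diffusers import FluxTransformer2DModel
from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig
from transformers import T5EncoderModel
from transformers import BitsAndBytesConfig as TransformersBitsAndBytesConfig

# Quantize the two large components to 8-bit: the T5 text encoder (a Transformers model)
# and the Flux transformer (a Diffusers model). low_cpu_mem_usage and device_map
# (for multi-GPU setups) are the additional `from_pretrained` options worth explaining here.
quant_config = TransformersBitsAndBytesConfig(load_in_8bit=True)
text_encoder_8bit = T5EncoderModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="text_encoder_2",
    quantization_config=quant_config,
    torch_dtype=torch.float16,
)

quant_config = DiffusersBitsAndBytesConfig(load_in_8bit=True)
transformer_8bit = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.float16,
)
```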
pcuenca
left a comment
Suggested some nits. Greatly agree with @stevhliu's comments and recommendations.
```python
memory_allocated = torch.cuda.max_memory_allocated(0) / (1024 ** 3)
print(f"GPU Memory Allocated: {memory_allocated:.2f} GB")
```

As a reader, I'd like to know how much it was at this point.
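
One way the guide could surface that number at each checkpoint is a small helper along these lines (a sketch; `report_peak_memory` is a hypothetical name, not something in the PR):

```python
import torch

def report_peak_memory(label):
    # Print the peak GPU memory allocated so far, then reset the counter for the next stage.
    peak_gb = torch.cuda.max_memory_allocated(0) / (1024 ** 3)
    print(f"{label}: peak GPU memory allocated {peak_gb:.2f} GB")
    torch.cuda.reset_peak_memory_stats(0)

# For example, call once after loading the quantized models and again after generating an image.
report_peak_memory("after loading the quantized models")
```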
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
stevhliu
left a comment
Nice rework, should be ready to go soon! 🚀
stevhliu
left a comment
One last nit!
@stevhliu thanks for the thorough review. I have taken care of all the suggestions. 🤗

Awesome, thanks so much for iterating on this! I'll give @sayakpaul a chance to review it and then we can merge 🤗
sayakpaul
left a comment
It seems like a lot of the changes are just breaking long lines into multiple ones, so it's difficult for me to go through the true changes.
If possible, could you please undo those breaks? If not, I am okay with merging it since @stevhliu has already reviewed it.
sayakpaul
left a comment
Very nice improvements! I think we're close to merge -- just a few comments, mostly related to nits.
| "black-forest-labs/FLUX.1-dev", | ||
| subfolder="text_encoder_2", | ||
| quantization_config=quant_config, | ||
| torch_dtype=torch.float16, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could add a note about the torch_dtype here. I am thinking of something like this:

> Depending on the GPU, set your `torch_dtype`. Ada and newer GPUs support `torch.bfloat16`, and we suggest using it when applicable.
@stevhliu any suggestions?
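
If such a note is added, one way to make it concrete would be a short dtype check like this (a sketch, not part of the reviewed diff):

```python
import torch

# Prefer bfloat16 when the GPU supports it (e.g. Ampere/Ada and newer); otherwise fall back to float16.
torch_dtype = torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16
```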
```python
transformer_8bit = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.float32,
)
```
Maybe we could show this in a diff rather than py block as the only difference here (compared to the above snippets) is torch_dtype? So, I would do:
```diff
 transformer_8bit = FluxTransformer2DModel.from_pretrained(
     "black-forest-labs/FLUX.1-dev",
     subfolder="transformer",
     quantization_config=quant_config,
+    torch_dtype=torch.float32,
 )
```

```python
    **pipe_kwargs,
).images[0]

image.resize((224, 224))
```
I don't think we have to show this line of code here.
```html
<div class="flex justify-center">
    <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/quant-bnb/8bit.png"/>
</div>
```
Would it also make sense to comment on the following things (see the sketch after this list)?

- When memory permits, users can directly move the pipeline to the GPU by calling `to("cuda")`.
- To go easy on the memory, they can also use `enable_model_cpu_offload()`.
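
In code, the two options might look like this (a sketch assuming `pipe` is a `FluxPipeline` assembled from the quantized components; use one or the other):

```python
# Option 1: enough VRAM available -- move the whole pipeline to the GPU.
pipe.to("cuda")

# Option 2: go easier on memory -- keep idle components on the CPU and move each one
# to the GPU only while it is needed.
# pipe.enable_model_cpu_offload()
```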
> bitsandbytes is supported in both Transformers and Diffusers, so you can quantize both the
> [`FluxTransformer2DModel`] and [`~transformers.T5EncoderModel`].
We can add a little [!NOTE] here saying that we usually don't quantize the `CLIPTextModel` as it's small enough, and the `AutoencoderKL` because it doesn't contain too many `torch.nn.Linear` layers.
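
For illustration, assembling the pipeline so that only the quantized components are swapped in could look roughly like this (a sketch reusing the `text_encoder_8bit` and `transformer_8bit` names from the snippets above):

```python
import torch
from diffusers import FluxPipeline

# Only text_encoder_2 (T5) and the transformer are passed in quantized; the pipeline loads
# CLIPTextModel and AutoencoderKL as usual, since CLIP is small and the VAE has very few
# torch.nn.Linear layers to quantize.
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    text_encoder_2=text_encoder_8bit,
    transformer=transformer_8bit,
    torch_dtype=torch.float16,
)
```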
```python
image = pipe(
    generator=torch.Generator("cpu").manual_seed(0),
    **pipe_kwargs,
).images[0]

image.resize((224, 224))
```
Same as above.
> [!Note]
> Depending on the GPU, set your `torch_dtype`. Ada and newer GPUs support `torch.bfloat16`, and we suggest using it when applicable.

> [!Note]
> We do not quantize the `CLIPTextModel` and the `AutoencoderKL` due to their small size, and also because `AutoencoderKL` has very few `torch.nn.Linear` layers.
We already added it above, no? Or is this for separate 8-bit and 4-bit sections?
sayakpaul
left a comment
Thanks a lot!
I will let @stevhliu review the new changes and get this merged.
stevhliu
left a comment
Awesome, just a few more changes!
Thank you for the review @sayakpaul, @stevhliu, and @pcuenca. I have made the changes. I think we are ready to go 🚀
> bitsandbytes is supported in both Transformers and Diffusers, so you can quantize both the [`FluxTransformer2DModel`] and [`~transformers.T5EncoderModel`].

> [!Note]
Need to make the same changes to the 4-bit section here too :)
I apologise for missing it!
I have added the same updates to the 4-bit section as well.
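
For reference, a 4-bit NF4 counterpart of the earlier 8-bit snippet could look roughly like this (a sketch; parameter values are illustrative):

```python
import torch
from diffusers import FluxTransformer2DModel
from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig

# NF4 4-bit quantization of the Flux transformer: "nf4" is the 4-bit data type and
# bnb_4bit_compute_dtype controls the dtype used for the actual matmuls.
quant_config = DiffusersBitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
transformer_4bit = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16,
)
```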
* chore: initial draft
* Apply suggestions from code review (Co-authored-by: Pedro Cuenca <pedro@huggingface.co>, Steven Liu <59462357+stevhliu@users.noreply.github.com>)
* chore: link in place
* chore: review suggestions
* Apply suggestions from code review (Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>)
* chore: review suggestions
* Update docs/source/en/quantization/bitsandbytes.md (Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>)
* review suggestions
* chore: review suggestions
* Apply suggestions from code review (Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>)
* adding same changes to 4 bit section
* review suggestions

---------

Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
This PR adds a guide on quantization of diffusion models using `bnb` and `diffusers`. Here is a colab notebook for easy code access.

CC: @stevhliu @sayakpaul