Add new QAT API through quantize_ by andrewor14 · Pull Request #1415 · pytorch/ao

andrewor14 · 2024-12-13T17:13:21Z

Summary: This commit adds a new QAT API that can be used with the existing quantize_. This is an alternative to the old QAT *Quantizer APIs, which are much less flexible. The new API can be used as follows:

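import torch
from torchao import quantize_
from torchao.quantization.qat import (
    FakeQuantizeConfig,
    intx_quantization_aware_training,
)
my_model = ...
# int8 asymmetric per-token fake quantization for activations
activation_config = FakeQuantizeConfig(torch.int8, "per_token", is_symmetric=False)
# int4 per-group fake quantization for weights, 32 elements per group
weight_config = FakeQuantizeConfig(torch.int4, group_size=32)
# Swap supported modules in my_model for their fake-quantized counterparts
quantize_(
    my_model,
    intx_quantization_aware_training(activation_config, weight_config),
)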

Test Plan:
python test/quantization/test_qat.py -k test_quantize_api
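
For readers new to fake quantization, the two configs above can be read as "round each value onto an integer grid, then immediately map it back to float", so that training sees the rounding error. The sketch below illustrates that idea in plain Python for a per-group int4 weight and a per-token asymmetric int8 activation. It is not torchao's implementation: the function names, the choice of symmetric quantization for weights, and the exact scale and zero-point formulas are assumptions made for illustration.

```python
def fake_quantize_symmetric_per_group(values, group_size, num_bits=4):
    """Quantize each group to signed ints with one shared scale, then dequantize."""
    if group_size <= 0 or len(values) % group_size != 0:
        raise ValueError("group_size must be positive and divide len(values)")
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1  # int4: [-8, 7]
    out = []
    for start in range(0, len(values), group_size):
        group = values[start:start + group_size]
        max_abs = max(abs(v) for v in group)
        scale = max_abs / qmax if max_abs > 0 else 1.0  # avoid dividing by zero
        for v in group:
            q = max(qmin, min(qmax, round(v / scale)))  # round, then clamp
            out.append(q * scale)  # dequantize straight away
    return out


def fake_quantize_asymmetric_per_token(token, num_bits=8):
    """Quantize one token's values with a scale and zero point, then dequantize."""
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1  # int8: [-128, 127]
    lo = min(min(token), 0.0)  # the represented range always includes 0
    hi = max(max(token), 0.0)
    scale = (hi - lo) / (qmax - qmin) if hi > lo else 1.0
    zero_point = qmin - round(lo / scale)
    out = []
    for v in token:
        q = max(qmin, min(qmax, round(v / scale) + zero_point))
        out.append((q - zero_point) * scale)
    return out
```

For instance, `fake_quantize_symmetric_per_group([7.0, -3.4, 14.0, 5.2], group_size=2)` returns `[7.0, -3.0, 14.0, 6.0]`: each group of two values shares one scale, so the second group is rounded on a coarser grid than the first.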

pytorch-bot · 2024-12-13T17:13:25Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/1415


✅ You can merge normally! (1 Unrelated Failure)

As of commit eb0d868 with merge base ebc4303:

BROKEN TRUNK - The following job failed but was also failing on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

Run Regression Tests / test-nightly (CUDA Nightly, linux.g5.12xlarge.nvidia.gpu, --pre torch --index-url https://downloa... / linux-job (gh) (trunk failure)
test/integration/test_integration.py::TestSubclass::test_int8_dynamic_quant_subclass_api_5_cuda

This comment was automatically generated by Dr. CI and updates every 15 minutes.

jerryzh168 · 2024-12-13T18:24:53Z

+        elif isinstance(mod, torch.nn.Embedding):
+            if activation_config is not None:
+                raise ValueError("Embedding does not support QAT for activations")
+            return FakeQuantizedEmbedding.from_embedding(mod, weight_config)


The function takes both activation_config and weight_config, but in this case we only use weight_config, which might be confusing.

Should we separate these two branches into two APIs?

I feel it's better to only have one API, otherwise we'll need a separate API for each type of layer we support. Also it's more consistent with PTQ where we only have one int8_weight_only and let the filter_fn decide which layers to apply the transformation on (instead of having separate int8_weight_only_linear and int8_weight_only_embedding)

Oh, the separation I'm referring to is between int8_act_int4_weight and int8_weight_only here, not linear vs. embedding.

oh I see, you mean separate this into the following?

intx_activation_intx_weight_quantization_aware_training
intx_weight_only_quantization_aware_training

I can do that, but the naming seems a bit long... any suggestions?

Ok, separated into these two for now. Please let me know if you have better suggestions for the naming:

intx_quantization_aware_training (for both act + weight)
intx_weight_only_quantization_aware_training

maybe intx_quantization_aware_training can be intx_dynamic_quantization_aware_training?

Or just expand the name to match PTQ and add qat as a suffix; that might be clearer?

Discussed this offline: we settled on having a single intx_quantization_aware_training (not separating into activation+weight and weight-only), which is the most general and can be extended to support output activation in the future. Having "dynamic" in the name is also not great because we also support FakeQuantizeConfig(is_dynamic=False).

Since QAT only supports linear and embedding, and embedding only supports weight-only QAT, we will support the following:

    # ok
    quantize_(model, intx_quantization_aware_training(act=config1, weight=config2), is_linear)
    quantize_(model, intx_quantization_aware_training(weight=config3), is_embedding)

    # throws an exception
    quantize_(model, intx_quantization_aware_training(act=config1, weight=config2), is_embedding)
    quantize_(model, intx_quantization_aware_training(act=config1, weight=config2), is_conv)
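
The rules agreed on in this thread can be sketched with toy stand-ins. In the sketch below, `Linear`, `Embedding`, `Conv2d`, `qat_transform`, `apply_` and `is_supported` are hypothetical names invented for illustration; they are not torchao or PyTorch classes, and only the embedding error message is taken from the diff under review. The filter function plays the role that `filter_fn` plays in `quantize_`: it selects the layers, while the single transform decides what is legal for each layer type.

```python
class Linear:
    """Toy stand-in for a linear layer (hypothetical, not torch.nn.Linear)."""


class Embedding:
    """Toy stand-in for an embedding layer (hypothetical)."""


class Conv2d:
    """Toy stand-in for a layer type that QAT does not support (hypothetical)."""


def qat_transform(module, activation_config=None, weight_config=None):
    """One entry point for every layer type; unsupported combinations raise."""
    if isinstance(module, Linear):
        return ("FakeQuantizedLinear", activation_config, weight_config)
    if isinstance(module, Embedding):
        if activation_config is not None:
            raise ValueError("Embedding does not support QAT for activations")
        return ("FakeQuantizedEmbedding", weight_config)
    raise ValueError(f"QAT is not supported for {type(module).__name__}")


def apply_(model, transform, filter_fn):
    """Replace, in place, every entry of the dict `model` accepted by `filter_fn`."""
    for name, module in list(model.items()):
        if filter_fn(module, name):
            model[name] = transform(module)


def is_supported(module, activation_config=None, weight_config=None):
    """Return True if the combination is accepted, False if it raises ValueError."""
    try:
        qat_transform(module, activation_config, weight_config)
        return True
    except ValueError:
        return False


# Weight-only QAT applied to embeddings only; the linear layer is left untouched.
model = {"tok_embeddings": Embedding(), "output": Linear()}
apply_(
    model,
    lambda m: qat_transform(m, weight_config="int4"),
    lambda m, name: isinstance(m, Embedding),
)
```

Keeping a single transform means that a new layer type only adds a branch, rather than a new public function per layer and per activation/weight combination.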

Summary: This commit adds a new QAT API that can be used with the existing `quantize_`. This is an alternative to the old QAT *Quantizer APIs, which are much less flexible. The new API can be used as follows:

```
import torch
from torchao import quantize_
from torchao.quantization.qat import (
    FakeQuantizeConfig,
    intx_quantization_aware_training,
)
my_model = ...
activation_config = FakeQuantizeConfig(
    torch.int8, "per_token", is_symmetric=False,
)
weight_config = FakeQuantizeConfig(torch.int4, group_size=32)
quantize_(
    my_model,
    intx_quantization_aware_training(activation_config, weight_config),
)
```

Test Plan:
python test/quantization/test_qat.py -k test_quantize_api
python test/quantization/test_qat.py -k test_quantize_api_errors

andrewor14 · 2024-12-16T15:08:08Z

Thanks, merging this!

andrewor14 · 2024-12-16T15:08:13Z

@pytorchbot merge

pytorchmergebot · 2024-12-16T15:08:46Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).


Add convert path for quantize_ QAT API

Summary: #1415 added a quantize_ QAT API for the prepare path. This commit adds the remaining convert path for users to actually perform end-to-end QAT using the quantize_ API. The new flow will look like:

```
import torch
from torchao.quantization import (
    quantize_,
    int8_dynamic_activation_int4_weight,
)
from torchao.quantization.qat import (
    FakeQuantizeConfig,
    from_intx_quantization_aware_training,
    intx_quantization_aware_training,
)
my_model = ...
activation_config = FakeQuantizeConfig(torch.int8, "per_token", is_symmetric=False)
weight_config = FakeQuantizeConfig(torch.int4, group_size=32)
# prepare: insert fake quantization, then train as usual
quantize_(
    my_model,
    intx_quantization_aware_training(activation_config, weight_config),
)
# convert: remove fake quantization, then apply the matching real quantization
quantize_(my_model, from_intx_quantization_aware_training())
quantize_(my_model, int8_dynamic_activation_int4_weight(group_size=32))
```

Test Plan:
python test/quantization/test_qat.py -k test_quantize_api_convert_path
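
A toy sketch of the prepare and convert steps may make the flow more concrete. The classes and the `prepare` and `convert` helpers below are hypothetical stand-ins written for illustration, not torchao code. The sketch assumes that prepare wraps each linear layer in a fake-quantized variant that keeps the original float weights, and that convert swaps a plain layer back in so that a post-training quantization config can then be applied to it.

```python
class Linear:
    """Toy linear layer holding float weights (hypothetical stand-in)."""

    def __init__(self, weight):
        self.weight = weight  # list of floats


class FakeQuantizedLinear(Linear):
    """Toy layer that remembers which fake-quantization configs apply to it."""

    def __init__(self, weight, activation_config, weight_config):
        super().__init__(weight)
        self.activation_config = activation_config
        self.weight_config = weight_config


def prepare(model, activation_config, weight_config):
    """Prepare path: swap each plain Linear for a fake-quantized one, in place."""
    for name, module in list(model.items()):
        if type(module) is Linear:
            model[name] = FakeQuantizedLinear(
                module.weight, activation_config, weight_config
            )


def convert(model):
    """Convert path: swap each fake-quantized layer back to a plain Linear."""
    for name, module in list(model.items()):
        if isinstance(module, FakeQuantizedLinear):
            model[name] = Linear(module.weight)


model = {"proj": Linear([0.5, -1.25])}
prepare(model, "int8-per-token", "int4-group32")
model["proj"].weight[0] = 0.75  # stands in for a training update
convert(model)  # the trained float weights survive the round trip
```

In this reading, the weights learned under fake quantization are what the final `int8_dynamic_activation_int4_weight(group_size=32)` call quantizes, which is why its `group_size` matches the one used during training.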


**Summary:** This commit adds a new QAT API that can be used with the existing `quantize_`. This is an alternative to the old QAT *Quantizer APIs, which are much less flexible. The new API can be used as follows:

```
import torch
from torchao import quantize_
from torchao.quantization.qat import (
    FakeQuantizeConfig,
    intx_quantization_aware_training,
)
my_model = ...
activation_config = FakeQuantizeConfig(torch.int8, "per_token", is_symmetric=False)
weight_config = FakeQuantizeConfig(torch.int4, group_size=32)
quantize_(
    my_model,
    intx_quantization_aware_training(activation_config, weight_config),
)
```

**Test Plan:**
python test/quantization/test_qat.py -k test_quantize_api

Pull Request resolved: #1415
Approved by: https://github.com/jerryzh168


* Update QAT READMEs using new APIs

Add references to new QAT APIs including `quantize_`, `FakeQuantizedX`, and the new embedding Quantizers and ComposableQATQuantizer. Also link to new QAT + LoRA recipe in torchtune.

andrewor14 requested a review from jerryzh168 December 13, 2024 17:13

facebook-github-bot added the CLA Signed label Dec 13, 2024

andrewor14 force-pushed the new-qat-api branch 2 times, most recently from 6118d76 to 6190af7 Compare December 13, 2024 17:32

andrewor14 added the topic: improvement label Dec 13, 2024

andrewor14 force-pushed the new-qat-api branch from 6190af7 to d3e31d1 Compare December 13, 2024 18:17

jerryzh168 reviewed Dec 13, 2024


Comment thread torchao/quantization/qat/embedding.py

jerryzh168 reviewed Dec 13, 2024


andrewor14 force-pushed the new-qat-api branch 2 times, most recently from 8293026 to c887099 Compare December 13, 2024 21:23

andrewor14 requested a review from jerryzh168 December 13, 2024 21:24

andrewor14 force-pushed the new-qat-api branch 3 times, most recently from afbac81 to 2666b80 Compare December 13, 2024 22:41

andrewor14 force-pushed the new-qat-api branch from 2666b80 to eb0d868 Compare December 13, 2024 22:45

jerryzh168 approved these changes Dec 13, 2024


pytorchmergebot added the merging label Dec 16, 2024

pytorchmergebot added the Merged label Dec 16, 2024

pytorchmergebot closed this in 200589b Dec 16, 2024

pytorchmergebot removed the merging label Dec 16, 2024

andrewor14 mentioned this pull request Jan 10, 2025

Add convert path for quantize_ QAT API #1540

Merged

andrewor14 wants to merge 1 commit into main from new-qat-api

4 participants
