[bc-breaking] enable direct configuration in quantize_ by vkuzo · Pull Request #1595 · pytorch/ao

vkuzo · 2025-01-22T16:49:12Z

summary

This PR enables passing per-workflow arguments to quantize_ directly, without wrapping them in a Callable.

Motivation: passing configuration directly is intuitive and widely used in similar contexts across various projects. Passing configuration wrapped in a callable is IMO not intuitive, hard to understand and debug, and we have evidence that it pushes a portion of users away from building on top of torchao.

We will keep the old callable syntax supported by quantize_ for one release cycle, and delete it afterwards. We will keep the old names as aliases for new names going forward (example: int4_weight_only as an alias of Int4WeightOnlyConfig) to keep existing callsites working without changes.

user facing API changes

signature of quantize_

#
# before
#
def quantize_(
    model: torch.nn.Module,
    apply_tensor_subclass: Callable[[torch.nn.Module], torch.nn.Module],
    ...,
): ...

#
# after - intermediate state, support both old and new for one release
#
def quantize_(
    model: torch.nn.Module,
    config: Union[AOBaseConfig, Callable[[torch.nn.Module], torch.nn.Module]],
    ...,
): ...

#
# after - long term state
#
def quantize_(
    model: torch.nn.Module,
    config: AOBaseConfig,
    ...,
): ...
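
To make the intermediate state concrete, here is a minimal sketch of how a function could accept both a config object and a legacy callable for one release. It does not use torch and is not the torchao implementation: `toy_quantize_` and `ToyInt4Config` are hypothetical names, and emitting a `DeprecationWarning` on the old path is an assumption, not something this PR states.

```python
import warnings
from typing import Callable, Union


class AOBaseConfig:
    """Stand-in for the config base class described in this PR."""


class ToyInt4Config(AOBaseConfig):
    """Hypothetical config used only for this illustration."""

    def __init__(self, group_size: int = 128):
        self.group_size = group_size


def toy_quantize_(model: list, config: Union[AOBaseConfig, Callable]) -> str:
    """Toy dispatcher: returns which path was taken instead of quantizing."""
    if isinstance(config, AOBaseConfig):
        # New path: a plain configuration object.
        return "config"
    if callable(config):
        # Old path: still accepted during the transition.
        # The warning is an assumption made for this sketch.
        warnings.warn(
            "passing a callable is deprecated, pass a config object instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return "callable"
    raise TypeError(f"unsupported config type: {type(config).__name__}")


new_path = toy_quantize_([], ToyInt4Config(group_size=32))
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    old_path = toy_quantize_([], lambda module: module)
```

Checking for the config type first keeps the new path independent of whether a config class happens to be callable.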

usage example

An example for int4_weight_only

#
# before
#
quantize_(m, int4_weight_only(group_size=32))

#
# after, with new user facing names
#
quantize_(m, Int4WeightOnlyConfig(group_size=32))

#
# AND, after, with BC names
#
quantize_(m, int4_weight_only(group_size=32))
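
The following sketch illustrates why the old name can keep working as an alias, and why a config object is easier to inspect than a closure. It is a simplified stand-in without torch: the `Int4WeightOnlyConfig` here has only one field, and `old_style_int4_weight_only` is a hypothetical imitation of the callable-returning API.

```python
from dataclasses import dataclass


@dataclass
class Int4WeightOnlyConfig:
    group_size: int = 128


# Backward-compatible alias: the old name points at the new class,
# so existing callsites construct a config without any changes.
int4_weight_only = Int4WeightOnlyConfig


def old_style_int4_weight_only(group_size: int = 128):
    """Toy version of the old API: the setting is hidden in a closure."""

    def apply(module):
        return (module, group_size)

    return apply


new_cfg = int4_weight_only(group_size=32)
old_cfg = old_style_int4_weight_only(group_size=32)

print(new_cfg)  # the dataclass repr lists the fields, including group_size=32
print(old_cfg)  # the function repr does not show the captured group_size
```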

developer facing changes

See the PR details for examples, but they can be summarized as:

#
# old
#

# quantize_ applies the callable returned by this function to each module of the model
def int4_weight_only(group_size: int, ...) -> Callable:

    def new_callable(weight: torch.Tensor):
        # configuration is captured here via local variables
        ...
        
    # return type is a Callable
    return _get_linear_subclass_inserter(new_callable)

#
# new
#

# config base class
class AOBaseConfig(abc.ABC):
    pass

# user facing configuration of a workflow
@dataclass
class Int4WeightOnlyConfig(AOBaseConfig):
    group_size: int = 128
    ...

# not user facing: transform of a module according to a workflow's configuration
@register_quantize_module_handler(Int4WeightOnlyConfig)
def _int4_weight_only_transform(
    module: torch.nn.Module, 
    config: Int4WeightOnlyConfig,
) -> torch.nn.Module:
    # map to AQT, not user facing
    ...

current status

The current PR migrates three user facing workflows:

PTQ's int4_weight_only
QAT's intx_quantization_aware_training and from_intx_quantization_aware_training

I've chosen to migrate one PTQ and two QAT workflows to prove generality of the new flow, but avoid a high LOC in this PR to make it easier to review. We will migrate the rest of the workflows in future PRs, detailed below:

int8_dynamic_activation_int4_weight
int8_dynamic_activation_int8_weight
int8_dynamic_activation_int8_semi_sparse_weight
int8_weight_only
float8_weight_only
float8_dynamic_activation_float8_weight
float8_static_activation_float8_weight
uintx_weight_only
fpx_weight_only
gemlite_uintx_weight_only
callsites from the prototype folder

After a release cycle, we will delete the old callable syntax.

Test Plan:

pytest test/quantization/test_quant_api.py -s -x -k test_int4_weight_only_numerics
pytest test/quantization/test_qat.py -s -x -k test_quantize_api_standalone
pytest test/quantization/test_qat.py -s -x -k test_quantize_api_convert_path

vkuzo · 2025-01-22T16:49:13Z

Stack from ghstack (oldest at bottom):

pytorch-bot · 2025-01-22T16:49:16Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/1595

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit d63e657 with merge base d3306b2:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

Summary:

POC for:
* decoupling configuration from transformation
* stop passing obscure stateful callables around
* enable printing of configuration
* reduce amount of context switching to navigate the logic from `quantize_` to quantizing a single module

TODO more polish before wider discussion.

Test Plan:

```
pytest test/quantization/test_quant_api.py -s -x -k test_int4_weight_only_numerics
pytest test/quantization/test_qat.py -s -x -k test_quantize_api_standalone
pytest test/quantization/test_qat.py -s -x -k test_quantize_api_convert_path
```

ghstack-source-id: fb0703f
ghstack-comment-id: 2607756510
Pull Request resolved: #1595

andrewor14

Looks great! Mostly just minor doc nits.

andrewor14 · 2025-02-05T21:30:53Z

    )
    @unittest.skipIf(not torch.cuda.is_available(), "Need CUDA available")
    def test_print_quantized_module(self, apply_quant):
+        print(apply_quant)


andrewor14 · 2025-02-05T21:33:16Z

@@ -0,0 +1,10 @@
+import abc


I feel we can just add this to torchao/config.py without making a new core directory. No strong preference though

slightly stronger preference is I feel "core" shouldn't appear in the import, so users should be able to do this:

from torchao.config import AOBaseConfig

but we can do that by adding this to __init__.py

andrewor14 · 2025-02-05T21:35:23Z

        not TORCH_VERSION_AT_LEAST_2_4, "skipping when torch version is 2.4 or lower"
    )
-    def test_quantize_api(self):
+    def test_quantize_api_standalone(self):


do we need this change?

It's convenient for filtering for only this test from the command line. I can remove it if you'd like.

andrewor14 · 2025-02-05T21:41:22Z

+            handler,
+            _is_linear if filter_fn is None else filter_fn,
+            device=device,
+            extra_args=(config,),


alternatively we can pass in a lambda, then we don't need to add extra_args or pass in config:

replace_fn = lambda mod: handler(mod, config)

seems simpler

I'm really not a fan of passing callables around, it's easy when the callable is simple but easy for future people to tack ugly stuff on and increase complexity. Non-callable args make it harder to make the code ugly in the future.

oh sorry, I meant pass in replace_fn instead of handler, like:

replace_fn = lambda mod: handler(mod, config)
_replace_with_custom_fn_if_matches_filter(
    model,
    replace_fn,
    _is_linear if filter_fn is None else filter_fn,
    device=device,
)

either way you're passing a callable

hmm, still not a fan of replace_fn = lambda mod: handler(mod, config). This changes replace_fn from a stateless callable to a stateful callable, where the state is hard to inspect. It's less LOC but harder to debug IMO.
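
The two options debated in this thread can be compared side by side in a small sketch. It uses strings in place of modules and a hypothetical `replace_if_matches` helper, so it only approximates the shape of `_replace_with_custom_fn_if_matches_filter`, whose real signature may differ.

```python
from typing import Callable, List, Tuple


def replace_if_matches(
    modules: List[str],
    replacement_fn: Callable,
    filter_fn: Callable[[str], bool],
    extra_args: Tuple = (),
) -> List[str]:
    """Toy stand-in for the replacement helper: transforms matching items."""
    return [
        replacement_fn(m, *extra_args) if filter_fn(m) else m
        for m in modules
    ]


def handler(module: str, config: dict) -> str:
    """Stateless handler: everything it needs arrives as arguments."""
    return f"{module}[int{config['bits']}]"


def is_linear(module: str) -> bool:
    return module.startswith("linear")


modules = ["linear1", "relu", "linear2"]
config = {"bits": 4}

# Option 1: keep the handler stateless and pass the config explicitly.
via_extra_args = replace_if_matches(
    modules, handler, is_linear, extra_args=(config,)
)

# Option 2: bind the config into a closure, so the helper needs no extra_args.
replace_fn = lambda mod: handler(mod, config)
via_closure = replace_if_matches(modules, replace_fn, is_linear)
```

Both options produce the same result here. The difference is where the config lives: in option 1 it is a visible argument at the call site, in option 2 it is captured inside `replace_fn` and can only be recovered by inspecting the closure.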

msaroufim · 2025-02-10T21:37:58Z

+                quantize_(linear, apply_quant)
+            else:
+                # TODO(#1690): delete this once config migration is done
+                ql = apply_quant(linear)


We have a few partners where we need to forward fix BC issues, including HuggingFace transformers, Optimum, SGLang and Diffusers.

@msaroufim do you have a link?

I don't expect any BC breakages of people using the quantize_ API as specified in the docs. The BC breaking change would be if people are applying their transform on linear layers directly, without using quantize_.

HF callsite: https://github.com/huggingface/transformers/blob/1feebb5b4150882deabddd190a541f336f3be817/src/transformers/quantizers/quantizer_torchao.py#L199

SGLANG callsite: https://github.com/sgl-project/sglang/blob/2f47d710ae9cb1bdbbe0fe2392a0634827d257b3/python/sglang/srt/layers/torchao_utils.py#L39

Diffusers callsite: https://github.com/huggingface/diffusers/blob/7fb481f840b5d73982cafd1affe89f21a5c0b20b/src/diffusers/quantizers/torchao/torchao_quantizer.py#L234

we should definitely test these, but they look like they will be unaffected to me
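
The BC break discussed in this thread, applying a workflow's return value to a layer directly instead of going through `quantize_`, can be reproduced with a toy example. The names `ToyConfig` and `old_style_workflow` are hypothetical and no torch is involved; the sketch only shows that a config object is data and cannot be called like the old return value.

```python
from dataclasses import dataclass


@dataclass
class ToyConfig:
    """Hypothetical config object; note that it defines no __call__."""

    group_size: int = 128


def old_style_workflow(group_size: int = 128):
    """Toy version of the old API, which returned a callable."""

    def apply(module: str) -> str:
        return f"{module}(group_size={group_size})"

    return apply


# Old return type: calling it on a module directly works.
direct_old = old_style_workflow(group_size=32)("linear")

# New return type: the same call pattern fails, because a config is data.
try:
    ToyConfig(group_size=32)("linear")
    direct_new_error = None
except TypeError as exc:
    direct_new_error = type(exc).__name__
```

Callsites that pass the return value to `quantize_` are unaffected by this difference, which matches the expectation stated above.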

Update

24114ce

[ghstack-poisoned]

facebook-github-bot added the CLA Signed label Jan 22, 2025

vkuzo changed the title ~~[wip] configs configs configs!~~ [rfc] enable direct configuration in quantize_, v2 Jan 22, 2025

vkuzo added the module: bc-breaking label Jan 22, 2025

vkuzo mentioned this pull request Jan 22, 2025

[rfc] enable direct configuration in quantize_ #1585

Closed

Update

5b9d876

[ghstack-poisoned]

Update

1cea42f

[ghstack-poisoned]

Update

138883b

[ghstack-poisoned]

Update

ba045ea

[ghstack-poisoned]

Update

94d9426

[ghstack-poisoned]

vkuzo requested review from HDCharles, andrewor14, drisspg and jerryzh168 January 23, 2025 16:15

vkuzo changed the title ~~[rfc] enable direct configuration in quantize_, v2~~ [bc-breaking] enable direct configuration in quantize_, v2 Jan 23, 2025

vkuzo changed the title ~~[bc-breaking] enable direct configuration in quantize_, v2~~ [bc-breaking] enable direct configuration in quantize_ Jan 23, 2025

drisspg reviewed Jan 23, 2025

Comment thread torchao/core/config.py Outdated

drisspg reviewed Jan 23, 2025

Comment thread torchao/quantization/_transform_module.py Outdated

drisspg reviewed Jan 23, 2025

Comment thread torchao/quantization/_transform_module.py Outdated

Update

b589ce7

[ghstack-poisoned]

drisspg approved these changes Jan 23, 2025

vkuzo mentioned this pull request Jan 29, 2025

make smoothquant more PT2 friendly #1639

Open

Update

aaba2d8

[ghstack-poisoned]

Update

26850da

[ghstack-poisoned]

andrewor14 approved these changes Feb 5, 2025

Update

7caecb1

[ghstack-poisoned]

msaroufim reviewed Feb 10, 2025

This was referenced Feb 10, 2025

migration of quantize_ workflow configuration from callables to configs #1690

Closed

config migration: float8* #1694

Merged

Update

0542402

[ghstack-poisoned]

This was referenced Feb 11, 2025

config migration: int* #1696

Merged

config migration: fpx, gemlite, uintx #1697

Merged

vkuzo added 2 commits February 11, 2025 08:02

Update

fac3263

[ghstack-poisoned]

Update

d63e657

[ghstack-poisoned]

This was referenced Feb 13, 2025

unbreak float8 static quant tutorial #1709

Merged

migrate static quant tutorials to direct configuration #1710

Merged

update torchao READMEs with new configuration APIs #1711

Merged

vkuzo merged commit 52f4737 into main Feb 14, 2025

5 participants
