[bc-breaking] enable direct configuration in quantize_#1595
Conversation
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/1595
Note: Links to docs will display an error until the docs builds have been completed. ✅ No FailuresAs of commit d63e657 with merge base d3306b2 ( This comment was automatically generated by Dr. CI and updates every 15 minutes. |
Summary: POC for: * decoupling configuration from transformation * stop passing obscure stateful callables around * enable printing of configuration * reduce amount of context switching to navigate the logic from `quantize_` to quantizing a single module TODO more polish before wider discussion. Test Plan: ``` pytest test/quantization/test_quant_api.py -s -x -k test_int4_weight_only_numerics pytest test/quantization/test_qat.py -s -x -k test_quantize_api_standalone pytest test/quantization/test_qat.py -s -x -k test_quantize_api_convert_path ``` Reviewers: Subscribers: Tasks: Tags: ghstack-source-id: fb0703f ghstack-comment-id: 2607756510 Pull Request resolved: #1595
Summary: POC for: * decoupling configuration from transformation * stop passing obscure stateful callables around * enable printing of configuration * reduce amount of context switching to navigate the logic from `quantize_` to quantizing a single module TODO more polish before wider discussion. Test Plan: ``` pytest test/quantization/test_quant_api.py -s -x -k test_int4_weight_only_numerics pytest test/quantization/test_qat.py -s -x -k test_quantize_api_standalone pytest test/quantization/test_qat.py -s -x -k test_quantize_api_convert_path ``` Reviewers: Subscribers: Tasks: Tags: ghstack-source-id: 73e9a5c ghstack-comment-id: 2607756510 Pull Request resolved: #1595
Summary: POC for: * decoupling configuration from transformation * stop passing obscure stateful callables around * enable printing of configuration * reduce amount of context switching to navigate the logic from `quantize_` to quantizing a single module TODO more polish before wider discussion. Test Plan: ``` pytest test/quantization/test_quant_api.py -s -x -k test_int4_weight_only_numerics pytest test/quantization/test_qat.py -s -x -k test_quantize_api_standalone pytest test/quantization/test_qat.py -s -x -k test_quantize_api_convert_path ``` Reviewers: Subscribers: Tasks: Tags: ghstack-source-id: ff2d58b ghstack-comment-id: 2607756510 Pull Request resolved: #1595
Summary: POC for: * decoupling configuration from transformation * stop passing obscure stateful callables around * enable printing of configuration * reduce amount of context switching to navigate the logic from `quantize_` to quantizing a single module TODO more polish before wider discussion. Test Plan: ``` pytest test/quantization/test_quant_api.py -s -x -k test_int4_weight_only_numerics pytest test/quantization/test_qat.py -s -x -k test_quantize_api_standalone pytest test/quantization/test_qat.py -s -x -k test_quantize_api_convert_path ``` Reviewers: Subscribers: Tasks: Tags: ghstack-source-id: 05b6a54 ghstack-comment-id: 2607756510 Pull Request resolved: #1595
Summary: POC for: * decoupling configuration from transformation * stop passing obscure stateful callables around * enable printing of configuration * reduce amount of context switching to navigate the logic from `quantize_` to quantizing a single module TODO more polish before wider discussion. Test Plan: ``` pytest test/quantization/test_quant_api.py -s -x -k test_int4_weight_only_numerics pytest test/quantization/test_qat.py -s -x -k test_quantize_api_standalone pytest test/quantization/test_qat.py -s -x -k test_quantize_api_convert_path ``` Reviewers: Subscribers: Tasks: Tags: ghstack-source-id: e4f1550 ghstack-comment-id: 2607756510 Pull Request resolved: #1595
Summary: POC for: * decoupling configuration from transformation * stop passing obscure stateful callables around * enable printing of configuration * reduce amount of context switching to navigate the logic from `quantize_` to quantizing a single module TODO more polish before wider discussion. Test Plan: ``` pytest test/quantization/test_quant_api.py -s -x -k test_int4_weight_only_numerics pytest test/quantization/test_qat.py -s -x -k test_quantize_api_standalone pytest test/quantization/test_qat.py -s -x -k test_quantize_api_convert_path ``` Reviewers: Subscribers: Tasks: Tags: ghstack-source-id: c0716ed ghstack-comment-id: 2607756510 Pull Request resolved: #1595
Summary: POC for: * decoupling configuration from transformation * stop passing obscure stateful callables around * enable printing of configuration * reduce amount of context switching to navigate the logic from `quantize_` to quantizing a single module TODO more polish before wider discussion. Test Plan: ``` pytest test/quantization/test_quant_api.py -s -x -k test_int4_weight_only_numerics pytest test/quantization/test_qat.py -s -x -k test_quantize_api_standalone pytest test/quantization/test_qat.py -s -x -k test_quantize_api_convert_path ``` Reviewers: Subscribers: Tasks: Tags: ghstack-source-id: 5672018 ghstack-comment-id: 2607756510 Pull Request resolved: #1595
Summary: POC for: * decoupling configuration from transformation * stop passing obscure stateful callables around * enable printing of configuration * reduce amount of context switching to navigate the logic from `quantize_` to quantizing a single module TODO more polish before wider discussion. Test Plan: ``` pytest test/quantization/test_quant_api.py -s -x -k test_int4_weight_only_numerics pytest test/quantization/test_qat.py -s -x -k test_quantize_api_standalone pytest test/quantization/test_qat.py -s -x -k test_quantize_api_convert_path ``` Reviewers: Subscribers: Tasks: Tags: ghstack-source-id: 2cb59ed ghstack-comment-id: 2607756510 Pull Request resolved: #1595
Summary: POC for: * decoupling configuration from transformation * stop passing obscure stateful callables around * enable printing of configuration * reduce amount of context switching to navigate the logic from `quantize_` to quantizing a single module TODO more polish before wider discussion. Test Plan: ``` pytest test/quantization/test_quant_api.py -s -x -k test_int4_weight_only_numerics pytest test/quantization/test_qat.py -s -x -k test_quantize_api_standalone pytest test/quantization/test_qat.py -s -x -k test_quantize_api_convert_path ``` Reviewers: Subscribers: Tasks: Tags: ghstack-source-id: fc9a5c1 ghstack-comment-id: 2607756510 Pull Request resolved: #1595
andrewor14
left a comment
There was a problem hiding this comment.
Looks great! Mostly just minor doc nits.
| ) | ||
| @unittest.skipIf(not torch.cuda.is_available(), "Need CUDA available") | ||
| def test_print_quantized_module(self, apply_quant): | ||
| print(apply_quant) |
| @@ -0,0 +1,10 @@ | |||
| import abc | |||
There was a problem hiding this comment.
I feel we can just add this to torchao/config.py without making a new core directory. No strong preference though
There was a problem hiding this comment.
slightly stronger preference is I feel "core" shouldn't appear in the import, so users should be able to do this:
from torchao.config import AOBaseConfig
but we can do that by adding this to __init__.py
| not TORCH_VERSION_AT_LEAST_2_4, "skipping when torch version is 2.4 or lower" | ||
| ) | ||
| def test_quantize_api(self): | ||
| def test_quantize_api_standalone(self): |
There was a problem hiding this comment.
it's convenient from being able to filter for only this test from the commandline. I can remove it if you'd like.
| handler, | ||
| _is_linear if filter_fn is None else filter_fn, | ||
| device=device, | ||
| extra_args=(config,), |
There was a problem hiding this comment.
alternatively we can pass in a lambda, then we don't need to add extra_args or pass in config:
replace_fn = lambda mod: handler(mod, config)
seems simpler
There was a problem hiding this comment.
I'm really not a fan of passing callables around, it's easy when the callable is simple but easy for future people to tack ugly stuff on and increase complexity. Non-callable args make it harder to make the code ugly in the future.
There was a problem hiding this comment.
oh sorry, I meant pass in replace_fn instead of handler, like:
replace_fn = lambda mod: handler(mod, config)
_replace_with_custom_fn_if_matches_filter(
model,
replace_fn,
_is_linear if filter_fn is None else filter_fn,
device=device,
)
either way you're passing a callable
There was a problem hiding this comment.
hmm, still not a fan of replace_fn = lambda mod: handler(mod, config). This changes replace_fn from a stateless callable to a stateful callable, where the state is hard to inspect. It's less LOC but harder to debug IMO.
Summary: POC for: * decoupling configuration from transformation * stop passing obscure stateful callables around * enable printing of configuration * reduce amount of context switching to navigate the logic from `quantize_` to quantizing a single module TODO more polish before wider discussion. Test Plan: ``` pytest test/quantization/test_quant_api.py -s -x -k test_int4_weight_only_numerics pytest test/quantization/test_qat.py -s -x -k test_quantize_api_standalone pytest test/quantization/test_qat.py -s -x -k test_quantize_api_convert_path ``` Reviewers: Subscribers: Tasks: Tags: ghstack-source-id: 5f5330c ghstack-comment-id: 2607756510 Pull Request resolved: #1595
| quantize_(linear, apply_quant) | ||
| else: | ||
| # TODO(#1690): delete this once config migration is done | ||
| ql = apply_quant(linear) |
There was a problem hiding this comment.
have a few partners where we need to forward fix BC issues including HuggingFace transformers, Optimimum, SGLang and Diffusers
There was a problem hiding this comment.
@msaroufim do you have a link?
I don't expect any BC breakages of people using the quantize_ API as specified in the docs. The BC breaking change would be if people are applying their transform on linear layers directly, without using quantize_.
There was a problem hiding this comment.
SGLANG callsite: https://github.com/sgl-project/sglang/blob/2f47d710ae9cb1bdbbe0fe2392a0634827d257b3/python/sglang/srt/layers/torchao_utils.py#L39
Diffusers callsite: https://github.com/huggingface/diffusers/blob/7fb481f840b5d73982cafd1affe89f21a5c0b20b/src/diffusers/quantizers/torchao/torchao_quantizer.py#L234
we should definitely test these, but they look like they will be unaffected to me
* Update [ghstack-poisoned] * Update [ghstack-poisoned] * Update [ghstack-poisoned] * Update [ghstack-poisoned] * Update [ghstack-poisoned] * Update [ghstack-poisoned] * Update [ghstack-poisoned] * Update [ghstack-poisoned]
summary
This PR enables passing per-workflow arguments to
quantize_directly, without wrapping them in aCallable.Motivation: passing direct configuraton is intuintive and widely used in similar contexts across various projects. Passing configuration wrapped in a callable is IMO not intuitive, hard to understand and debug, and we have evidence that it pushes a portion of users from building on top of torchao.
We will keep the old callable syntax supported by
quantize_for one release cycle, and delete it afterwards. We will keep the old names as aliases for new names going forward (example:int4_weight_onlyas an alias ofInt4WeightOnlyConfig) to keep existing callsites working without changes.user facing API changes
signature of quantize_
usage example
An example for
int4_weight_onlydeveloper facing changes
See the PR details for examples, but they can be summarized as:
current status
The current PR migrates three user facing workflows:
int4_weight_onlyintx_quantization_aware_trainingandfrom_intx_quantization_aware_trainingI've chosen to migrate one PTQ and two QAT workflows to prove generality of the new flow, but avoid a high LOC in this PR to make it easier to review. We will migrate the rest of the workflows in future PRs, detailed below:
After a release cycle, we will delete the old callable syntax.
Test Plan:
Reviewers:
Subscribers:
Tasks:
Tags: