Add module fqn regex support for ModuleFqnToConfig #3084
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/3084
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures
As of commit de999a5 with merge base 7690612.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Does it work for quantizing a HuggingFace model and then running it in vLLM?
Not yet. We need follow-up changes in these repos to handle this, i.e. https://github.com/huggingface/transformers/blob/071eb5334f5a9ac2c7a13515219be8a272388ec6/src/transformers/quantizers/quantizer_torchao.py#L302 and https://github.com/vllm-project/vllm/blob/8bf8f4582208ac7af230512ff5f3ac1dc36d5222/vllm/model_executor/layers/quantization/torchao.py#L126
I'd love to see the following in this PR:
IMO it's hard to know if this PR is landable without having the two things above
Summary: Similar to pytorch/ao#3084 we added regex support in transformers so people can use regex to quantize the models. See pytorch/ao#3084 for docs and precedence of different configurations
Uploaded model: https://huggingface.co/torchao-testing/opt-125m-ModuleFqnToConfig-v1-regex-0.14.0.dev
Test Plan: pytest tests/quantization/torchao_integration/test_torchao.py -k test_module_fqn_to_config_regex
@vkuzo I just updated the PR to include more docs and details on the precedence of configs. Please check the summary for the e2e tests in transformers and vLLM, and take another look.
Config key ordered by precedence:
* fully qualified module name, e.g. `language.layers.0.q_proj`
* regex for module names, e.g. `language.layers.*.q_proj`
Clarify that regexes will be matched in the order in which they appear in the dictionary.
Will do. I also think we need to change to OrderedDict to keep the order consistent.
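A minimal sketch of why dictionary order matters once keys can be regexes: when two patterns match the same module name, the first one in insertion order wins. The helper name `first_matching_config`, the patterns, and the config strings are hypothetical and not torchao's API; using `re.fullmatch` here is also an assumption of the sketch.

```python
import re
from collections import OrderedDict


def first_matching_config(patterns, module_fqn):
    """Return the value of the first pattern (in insertion order) that matches the whole name."""
    for pattern, cfg in patterns.items():
        if re.fullmatch(pattern, module_fqn):
            return cfg
    return None


# Both patterns match "layers.0.q_proj"; insertion order decides which config wins.
specific_first = OrderedDict([(r"layers\.0\..+", "int4"), (r"layers\..+", "int8")])
general_first = OrderedDict([(r"layers\..+", "int8"), (r"layers\.0\..+", "int4")])

print(first_matching_config(specific_first, "layers.0.q_proj"))  # int4
print(first_matching_config(general_first, "layers.0.q_proj"))   # int8
```

Plain `dict` has preserved insertion order since Python 3.7, so `OrderedDict` mainly makes the intent explicit.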
Updates:
# fallback to use default if no module specific config is provided
c = config.module_fqn_to_config.get("_default", None)
for maybe_module_fqn_pattern in config.module_fqn_to_config:
    if re.search(maybe_module_fqn_pattern, module_fqn):
Thoughts on changing this logic from matching anywhere in the string to requiring a match of the entire string?
I.e. for some regex r, we would actually do re.search(^r$, module_fqn). The reason I'm proposing this is that for gpt-oss we have gate_up_proj and gate_up_proj_bias, and only one should be quantized.
Should this be done in the regex itself, e.g. ...gate_up_proj$?
Yeah, I'm just a little worried people will be dumb like me and accidentally quantize both, but I'm fine with how it is now too.
I could add a note in the docs, I guess, although I feel we should not change the meaning of the regex in the code and should leave this to the user instead.
> Reason I'm proposing this is because for gpt-oss we have gate_up_proj and gate_up_proj_bias, and only one should be quantized.
That's a good point. If the user specifies gate_up_proj, it seems that with the current logic it would apply an exact match to gate_up_proj and a regex match to gate_up_proj_bias? That's definitely a gotcha. IMO we should ensure every key is either a regex or an exact match to remove the ambiguity.
Sorry, I think I meant to use the fullmatch function, not search (https://docs.python.org/3/library/re.html#search-vs-match).
But we discussed offline that we want to go with explicitly calling out regex configs. I plan to use a prefix of `re:`, same as llm-compressor.
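The gotcha discussed in this thread can be reproduced with the standard library alone. The sketch below compares `re.search` and `re.fullmatch` on two module names; the fully qualified names are made up for illustration and are not taken from the gpt-oss model.

```python
import re

# Hypothetical module names; only the suffixes come from the discussion above.
names = [
    "model.layers.0.mlp.gate_up_proj",
    "model.layers.0.mlp.gate_up_proj_bias",
]
pattern = r".*gate_up_proj"

# re.search succeeds if the pattern matches anywhere in the string.
search_hits = [n for n in names if re.search(pattern, n)]

# re.fullmatch succeeds only if the whole string matches the pattern.
fullmatch_hits = [n for n in names if re.fullmatch(pattern, n)]

print(search_hits)     # both names
print(fullmatch_hits)  # only the name ending in gate_up_proj
```

With `re.search`, the user has to anchor the pattern themselves (for example with a trailing `$`) to avoid picking up `gate_up_proj_bias`.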
Summary: To simplify the config file for torchao quantized models we want to allow people to configure the ModuleFqnToConfig through regex, e.g. `linear*`, `language.layers.*.gate_proj`
Test Plan: python test/quantization/test_quant_api.py -k test_module_fqn_to_config_module_name_regex
@@ -2409,8 +2405,16 @@ def _module_fqn_to_config_handler(
# Maybe: we can add module type specific config in the future, in needed
c = config.module_fqn_to_config[module_fqn]
nit: assert that the pattern does not start with `re:`, for clarity
thank you!
* Add regex support for ModuleFqnToConfig
Summary: Similar to pytorch/ao#3084 we added regex support in transformers so people can use regex to quantize the models. See pytorch/ao#3084 for docs and precedence of different configurations
Uploaded model: https://huggingface.co/torchao-testing/opt-125m-ModuleFqnToConfig-v1-regex-0.14.0.dev
Test Plan: pytest tests/quantization/torchao_integration/test_torchao.py -k test_module_fqn_to_config_regex
* Apply style fixes
* add assert for
---------
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
…ng regex
Summary: att, we are adding regex support to simplify the config, and enabling the support in both transformers and vllm to make sure regex config works everywhere
torchao PR that adds the functionality to quantize_ API: pytorch/ao#3084
transformer PR:
Test Plan: We save the model with the regex config in transformers, in vllm we just make sure we can load the model:
pytest tests/quantization/test_torchao.py test_opt_125m_module_fqn_to_config_regex_model_loading_with_params
Signed-off-by: Jerry Zhang <jerryzh168@gmail.com>
Summary:
To simplify the config file for torchao quantized models, we want to allow people to configure the ModuleFqnToConfig through regex, e.g. `re:linear.+`, `re:language.layers.mlp\..+\.gate_proj`
Note: this does not change the previous behavior of specifying full fqns. The currently supported configurations are:
1. fully qualified module name, e.g. `language.layers.0.q_proj`
2. regex for module names (prefixed with `re:`)
and 1 takes precedence over 2. E.g. for a model with linear1 and linear2 submodules, if we have:
then m.linear1 will have config1 and m.linear2 will have config2, and all other modules will have config3
Note: changing the type of dict from Dict to OrderedDict is not BC-breaking; tested with https://huggingface.co/torchao-testing/opt-125m-ModuleFqnToConfig-v1-regex-0.14.0.dev#test-loading (produced before the change) and it still works
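The precedence described in this summary can be sketched as a small lookup function. This is an illustration under stated assumptions, not the torchao implementation: `resolve_config` is a hypothetical name, the regex after `re:` is assumed to be applied with `re.fullmatch`, the `_default` fallback key is taken from the diff excerpt quoted earlier in the review, and the example dictionary is one possible configuration that produces the result described above.

```python
import re
from collections import OrderedDict

REGEX_PREFIX = "re:"  # prefix that marks a key as a regex


def resolve_config(module_fqn_to_config, module_fqn):
    """Pick a config for module_fqn: exact fqn first, then regex keys, then _default."""
    # 1. An exact fully qualified name takes precedence.
    if module_fqn in module_fqn_to_config:
        assert not module_fqn.startswith(REGEX_PREFIX), "exact keys must not use the re: prefix"
        return module_fqn_to_config[module_fqn]
    # 2. Regex keys are tried in insertion order; the first full match wins.
    for key, cfg in module_fqn_to_config.items():
        if key.startswith(REGEX_PREFIX) and re.fullmatch(key[len(REGEX_PREFIX):], module_fqn):
            return cfg
    # 3. Fall back to the default config, if one is provided.
    return module_fqn_to_config.get("_default", None)


# Hypothetical configuration; the strings stand in for real config objects.
example = OrderedDict([
    ("linear1", "config1"),
    (r"re:linear\d+", "config2"),
    ("_default", "config3"),
])

for name in ["linear1", "linear2", "embedding"]:
    print(name, "->", resolve_config(example, name))
```

Here `linear1` also matches the regex key, but the exact entry is consulted first, which is the "1 takes precedence over 2" rule.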
Test Plan:
unit tests
pytest test/quantization/test_quant_api.py -k test_module_fqn_to_config_regex
e2e test