Enable pt2e quantization path for arm #146690
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/146690
Note: Links to docs will display an error until the docs builds have been completed.
❌ 1 New Failure
As of commit 36325da with merge base 4854926:
NEW FAILURE - The following job has failed:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
|
@pytorchbot label "module: arm" |
|
@pytorchbot label "module: cpu" |
|
@jerryzh168 can you please review this PR? Thank you.
|
@pytorchbot label "ciflow/linux-aarch64" |
|
To add these label(s) (ciflow/linux-aarch64) to the PR, please first approve the workflows that are awaiting approval (scroll to the bottom of this page). This helps ensure we don't trigger CI on this PR until it is actually authorized to do so. Please ping one of the reviewers if you do not have access to approve and run workflows. |
|
jerryzh168
left a comment
Thanks, the quantizer can be owned by ARM I think, so LGTM. Can you add some tests, similar to https://github.com/pytorch/pytorch/blob/main/test/quantization/pt2e/test_x86inductor_quantizer.py?
0774646 to 13176d6
|
Hi @jerryzh168, thanks for the quick response. I added the tests for the arm_inductor_quantizer config. Can you add the label "ciflow/linux-aarch64" and trigger the CI pipelines?
|
To add the ciflow label This helps ensure we don't trigger CI on this PR until it is actually authorized to do so. Please ping one of the reviewers if you do not have access to approve and run workflows. |
Are these needed for the pt2e quant stack? I feel these are only needed for the older fx stack. cc @Xia-Weiwen
Yes, I think it's only needed for old stacks.
Why are these code changes needed for PT2E quantization?
|
Hi @huydhn, when I try to reply to a comment, it shows a "pending" label, so the other reviewers are not able to see my comments. Can you help me with this?
|
Hi @jerryzh168, @Xia-Weiwen, reply to this (#146690 (comment)): To integrate the Additionally, if the qconfig change is not made, the system defaults to the "x86" configuration, which leads to an error when we use the ARM configuration.
|
Hi @leslie-fang-intel, reply to this comment (#146690 (comment)): To set the default qconfig as
23338ae to 2f26b4d
|
@choudhary-devang @jerryzh168 In ExecuTorch there is already an Arm quantiser (https://github.com/pytorch/executorch/blob/main/backends/arm/quantizer/arm_quantizer.py) that uses TOSA as the backend for quantization in order to target devices such as Ethos-U. I was wondering whether we can rename this quantiser to be (cc: @digantdesai @freddan80)
Are ARM ops just (1) a different implementation of onednn ops, or (2) will they use different hardware instructions and target different hardware? I think we can merge into onednn if it's (1), but we should have a separate quantizer if it's (2). Even with (2), you can compose with the onednn quantizer using composable_quantizer (https://github.com/pytorch/pytorch/blob/main/torch/ao/quantization/quantizer/composable_quantizer.py): use one quantizer to quantize one part of the model and the other quantizer to quantize the other part.
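A toy sketch of the composition idea in the comment above, for readers unfamiliar with it. This is plain Python and does not use the real `ComposableQuantizer` API: `Node`, `ToyQuantizer`, and `ToyComposableQuantizer` are hypothetical names invented for illustration. It only assumes that quantizers annotate in list order and that a later quantizer does not overwrite an earlier one's annotation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Toy stand-in for a graph node (hypothetical; not torch.fx.Node)."""
    name: str
    op: str
    annotation: Optional[str] = None  # name of the quantizer that claimed this node


class ToyQuantizer:
    """Hypothetical quantizer that claims every node whose op it supports."""

    def __init__(self, name, supported_ops):
        self.name = name
        self.supported_ops = set(supported_ops)

    def annotate(self, nodes):
        for node in nodes:
            # Leave nodes alone if an earlier quantizer already claimed them.
            if node.annotation is None and node.op in self.supported_ops:
                node.annotation = self.name
        return nodes


class ToyComposableQuantizer:
    """Runs each quantizer in order, so earlier entries take priority."""

    def __init__(self, quantizers):
        self.quantizers = list(quantizers)

    def annotate(self, nodes):
        for quantizer in self.quantizers:
            quantizer.annotate(nodes)
        return nodes


graph = [
    Node("conv1", "conv2d"),
    Node("emb", "embedding"),
    Node("fc", "linear"),
    Node("act", "gelu"),
]
composed = ToyComposableQuantizer([
    ToyQuantizer("onednn", {"conv2d", "linear"}),
    ToyQuantizer("other", {"embedding", "linear"}),
])
composed.annotate(graph)

claimed = {node.name: node.annotation for node in graph}
for name, owner in claimed.items():
    print(f"{name}: {owner}")
```

Both toy quantizers support `linear`, but the first one in the list claims it; ops that neither supports stay unannotated and would remain in floating point.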
@jerryzh168 Hello, good to e-meet you! 2. This quantizer is for Arm NPUs. I agree with @milpuz01, we should consider changing the name. Having an Perhaps there should be a naming convention for quantizers :) @digantdesai your thoughts on this?
…tor_quantizer and x86_inductor_quantizer
This reverts commit 4db960a.
2f26b4d to 36325da
@jerryzh168, for this path, Arm and Intel basically share the same high-level API, which is oneDNN. The same mkldnn/onednn lowerings in inductor are shared between aarch64 and x86. Having said that, I think we fall into case (1).
|
Perhaps I read the question wrong. To clarify. ArmQuantizer: For NPUs |
|
@freddan80 nice to meet you as well, and thanks for the clarifications @fadara01. I thought oneDNN was just for Intel CPUs. In that case I think it will be better to merge into the existing X86InductorQuantizer (and we should probably rename this to OnednnQuantizer). In general it can be per backend library I think, like fbgemm, onednn, etc.
cc @leslie-fang-intel about the renaming suggestion (X86InductorQuantizer -> OnednnQuantizer) |
Since for the backend optimization of |
@leslie-fang-intel so what should the name be if we add ARM CPU support on top of x86 CPU? Maybe
Hi @jerryzh168, I was thinking we can have a separate ARM quantizer (arm_inductor_quantizer.py), like how
|
@choudhary-devang OK, that sounds good. We just copy-pasted the pt2e quant code to torchao, so could you reopen this PR in torchao instead? https://github.com/pytorch/ao/tree/main/torchao/quantization/pt2e
My only concern is about naming. I think OneDNN should be there in the name somehow, or there'll be confusion. For example, XNNpack has its |
Hi @jerryzh168, I have created a new PR in torchao as requested
|
Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as |


Title: Enable PyTorch 2 Export Quantization path for ARM CPUs.
Description:
Key Changes:
Introduces ARM-specific support by leveraging oneDNN kernels for matmuls and convolutions.
Integrates pre-defined configuration selection to automatically choose the best quantization settings based on the selected quantization method.
Provides customization options via two flags:
These options allow users to tailor the quantization process for their specific workload requirements (e.g., using QAT for fine-tuning or PTQ for calibration-based quantization).
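The selection logic described above might look roughly like the sketch below. The two flag names did not survive in this description, so `is_qat` and `is_dynamic` are assumptions (modeled on the parameters of the existing x86 inductor quantizer's config helper), and `ToyQuantConfig` and `select_config` are hypothetical names, not part of this PR or of PyTorch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyQuantConfig:
    """Hypothetical settings bundle; not a PyTorch class."""
    method: str       # "ptq" or "qat"
    activations: str  # "static" or "dynamic"
    data_pass: str    # what the user runs after preparing the model


def select_config(is_qat: bool = False, is_dynamic: bool = False) -> ToyQuantConfig:
    """Pick a pre-defined config from two boolean flags (assumed names)."""
    return ToyQuantConfig(
        method="qat" if is_qat else "ptq",
        activations="dynamic" if is_dynamic else "static",
        # Per the description: QAT is used for fine-tuning,
        # PTQ for calibration-based quantization.
        data_pass="fine-tune" if is_qat else "calibrate",
    )


# Enumerate every flag combination, as a test matrix would.
all_configs = {
    (is_qat, is_dynamic): select_config(is_qat=is_qat, is_dynamic=is_dynamic)
    for is_qat in (False, True)
    for is_dynamic in (False, True)
}
for flags, config in all_configs.items():
    print(flags, config)
```

Two boolean flags give four combinations, and the sketch simply lists them so each can be exercised in turn.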
Testing and Validation:
The new ARM flow has been tested across a range of models with all combinations of the options above:
NLP: Models such as BERT and T5.
Vision: Models like ResNet and ViT.
Custom Models: User-defined models with various operators.
example script:
cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @jerryzh168 @malfet @snadampal @milpuz01 @aditew01 @nikhil-arm @fadara01