[Quant][X86] add an op to compute uint8 pointwise mul#151112
Xia-Weiwen wants to merge 6 commits into
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/151112
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures
As of commit 8afe2e8 with merge base 7f28c03. (This comment was automatically generated by Dr. CI and updates every 15 minutes.)
Hi @jerryzh168, could you please review this PR? Thanks.
ghstack-source-id: b785a3d Pull Request resolved: pytorch/pytorch#151112
qa = torch.quantize_per_tensor(a, s_a, z_a, torch.quint8)
qb = torch.quantize_per_tensor(b, s_b, z_b, torch.quint8)
dqa = qa.dequantize()
dqb = qb.dequantize()
c_ref = dqa * dqb
if output_dtype == torch.uint8:
    c_ref = torch.quantize_per_tensor(c_ref, s_c, z_c, torch.quint8).int_repr()
we have the quantized_decomposed ops that's more recent btw
Thanks for the suggestion. I have changed it to the new version.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 Hours).
Stack from ghstack (oldest at bottom):
Summary
Add a new op, onednn.qmul.tensor, for int8 elementwise mul, which accepts inputs on the CPU device (instead of QuantizedCPU). The new op is implemented with AVX512 instructions and, depending on shape, provides similar or better performance than its counterpart for the QuantizedCPU device, quantized.mul. The new op also supports output dtypes other than uint8 (fp32, fp16 and bf16 are supported).
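For readers unfamiliar with quantized elementwise mul, the following is a minimal pure-Python sketch of the reference arithmetic that the test snippet in the review thread checks against: dequantize both uint8 inputs, multiply in floating point, and requantize only when the output is uint8. It is not the AVX512 kernel and it does not call onednn.qmul.tensor, whose exact signature is not shown here. The helper names (`quantize`, `dequantize`, `qmul_reference`) are hypothetical and introduced only for illustration.

```python
def quantize(x, scale, zero_point, qmin=0, qmax=255):
    """Affine-quantize one float to the uint8 range (hypothetical helper)."""
    # Python's round() rounds half to even, like the usual quantization rounding.
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))


def dequantize(q, scale, zero_point):
    """Map a uint8 value back to a float (hypothetical helper)."""
    return (q - zero_point) * scale


def qmul_reference(qa, qb, s_a, z_a, s_b, z_b, s_c=None, z_c=None):
    """Reference elementwise mul on two equal-length lists of uint8 values.

    If s_c and z_c are given, the result is requantized to uint8;
    otherwise the floating-point products are returned unchanged.
    """
    out = []
    for a, b in zip(qa, qb):
        prod = dequantize(a, s_a, z_a) * dequantize(b, s_b, z_b)
        out.append(prod if s_c is None else quantize(prod, s_c, z_c))
    return out


# Example inputs; scales and zero points are arbitrary illustrative values.
qa, qb = [130, 120], [140, 128]
float_out = qmul_reference(qa, qb, 0.5, 128, 0.25, 128)
uint8_out = qmul_reference(qa, qb, 0.5, 128, 0.25, 128, s_c=0.5, z_c=10)
```

Leaving `s_c` and `z_c` unset corresponds to the floating-point output case (fp32, fp16 or bf16 in the actual op), while passing them corresponds to the uint8 output case.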
Test plan
cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @jerryzh168