[Inductor][CPU] Add torchao da8w8 pattern with sym quantized act & wgt #142015

sanchitintel wants to merge 4 commits into gh/sanchitintel/3/base
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/142015
Note: Links to docs will display an error until the docs builds have been completed. ✅ You can merge normally! (1 unrelated failure.) As of commit 7dc7fa7 with merge base 7dfb439. UNSTABLE - The following job failed but was likely due to flakiness present on trunk and has been marked as unstable.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
# In practice, though, they may also match the smooth-quant pattern when a 2D input shape is used.
# Since add is not currently being used as a oneDNN post-op, but is unfused, we don't need these patterns with bias.
# Ideally, we should add mul + add post-op support in the ATen int8 oneDNN linear op.
pattern1_with_no_outer_or_act_reshape = get_pattern_no_bias(
I have a question regarding the bias. In @Xia-Weiwen's base PR, it handles 2 cases:
- Case 1: when the activation is per-tensor quantized, the bias can be one of the inputs to qlinear.
- Case 2: when the activation is per-channel quantized, the bias can't be fused and will exist as an epilogue.
But here in this PR, we only register the pattern without bias, which may cause a difference for case 1. May I know the reason for this difference?
> But here in this PR, we only register the pattern without bias, which may cause a difference for case 1. May I know the reason for this difference?
The torchao int8_dynamic_activation_int8_weight API supports per-token quantization of the activation, not a scalar activation scale.
Please refer to https://github.com/pytorch/ao/blob/1a0dbf1c41ad1c6f28d6501e1134b30ea2f2590d/torchao/quantization/quant_api.py#L741-L746
Basically, in the case of int8_dynamic_activation_int8_weight, the activation scale is a vector.
@leslie-fang-intel, if the case of smooth-quant with a 2D activation & a scalar activation scale is to be supported, then bias would also have to be in the pattern. Please let me know if support for it also needs to be added.
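To illustrate why the activation scale is a vector here, below is a minimal sketch of symmetric per-token dynamic quantization. The function name and shapes are illustrative, not torchao code: the point is that each token (row) gets its own scale, so the scale tensor has shape (M, 1) rather than being a scalar.

```python
import torch

def per_token_sym_quant(x: torch.Tensor):
    # Symmetric per-token quantization: one scale per row of the activation,
    # so the resulting scale is a vector of shape (M, 1), not a scalar.
    scale = x.abs().amax(dim=-1, keepdim=True) / 127.0
    x_int8 = torch.clamp(torch.round(x / scale), -128, 127).to(torch.int8)
    return x_int8, scale

x = torch.randn(4, 8)
x_int8, act_scale = per_token_sym_quant(x)
assert act_scale.shape == (4, 1)  # vector of per-token scales
```

Because the scale differs per row, it cannot be folded into a scalar output scale of the int8 matmul, which is why this pattern differs from the per-tensor case.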
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA: 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
The base ghstack PR changed, so a new PR needs to be created.
Opened #142110 with the new base PR.
Summary
Extends #139595 with an Inductor pattern match covering the torchao API int8_dynamic_activation_int8_weight in the following scenario (inference-only, freezing enabled).

The pattern that's matched is torch._int_mm -> convert to FP32/BF16 -> [optional expand for activation scale] -> mul -> mul. We don't check whether the activation is dynamically quantized or whether the weights are statically quantized, though (the implementation has no side effects even if that weren't true).

In practice, it also matches the smooth-quant int8 quantized linear pattern if its output is not reshaped (i.e., if the activation is 2D).
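The matched computation can be sketched as follows. Shapes and tensor names are illustrative, and torch._int_mm is emulated here with a plain int32 matmul so the snippet runs on any build; the real pattern uses torch._int_mm itself.

```python
import torch

# Illustrative shapes for a per-token/per-channel da8w8 linear.
M, K, N = 4, 8, 16
x_int8 = torch.randint(-128, 128, (M, K), dtype=torch.int8)
w_int8 = torch.randint(-128, 128, (K, N), dtype=torch.int8)
act_scale = torch.rand(M, 1)  # per-token activation scale (vector)
w_scale = torch.rand(N)       # per-channel weight scale

c = x_int8.to(torch.int32) @ w_int8.to(torch.int32)  # stands in for torch._int_mm
y = c.to(torch.float32)                              # convert to FP32/BF16
y = y * act_scale.expand(M, N)                       # optional expand + first mul
y = y * w_scale                                      # second mul
```

This is exactly the op sequence the pattern matcher looks for after freezing: an int8 matmul, a dtype conversion, and two multiplications applying the activation and weight scales.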
More details
oneDNN int8 matmul supports applying a per-channel weight scale, but not a vector activation scale. The latter could be applied as a post-op, but that is currently unsupported in ATen. Bias addition (which could be supported with an add post-op) is also unfused.
The fusion pattern used in this PR is torch._int_mm -> convert to FP32/BF16 -> mul, which is replaced by the oneDNN qlinear op. The speedup over eager mode is due to 2 reasons -

In the future, the whole pattern (including application of the activation scale, which would be a mul post-op) plus bias could be fused if the corresponding support were enabled in ATen.
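The resulting split between fused and unfused work can be sketched as follows. Here qlinear_sketch is a hypothetical stand-in for the oneDNN qlinear op (not the real ATen kernel): it fuses the int8 matmul with the per-channel weight scale, while the activation-scale mul and the bias add remain separate ops in the graph today.

```python
import torch

def qlinear_sketch(x_int8, w_int8, w_scale):
    # Hypothetical stand-in for the oneDNN qlinear op that replaces the
    # matched pattern: int8 matmul with the per-channel weight scale fused in.
    acc = x_int8.to(torch.int32) @ w_int8.to(torch.int32)
    return acc.to(torch.float32) * w_scale

M, K, N = 2, 4, 3
x_int8 = torch.randint(-128, 128, (M, K), dtype=torch.int8)
w_int8 = torch.randint(-128, 128, (K, N), dtype=torch.int8)
w_scale = torch.rand(N)
act_scale = torch.rand(M, 1)
bias = torch.rand(N)

y = qlinear_sketch(x_int8, w_int8, w_scale)  # fused part (replaces the pattern)
y = y * act_scale  # would become a mul post-op if ATen supported it
y = y + bias       # today: an unfused epilogue add
```

If ATen gained mul + add post-op support for the int8 oneDNN linear op, the last two lines could fold into the fused call as well.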
Verification
Added a UT in this PR.
Corresponding torchao UTs:
int8 Smoothquant legacy API -

TORCHINDUCTOR_FREEZING=1 TORCH_COMPILE_DEBUG=1 TORCH_LOGS="+inductor" python test/integration/test_integration.py -v -k test_non_dynamically_quantizable_linear

The difference from [Inductor][CPU] Fuse SmoothQuant int8 linear pattern #139595 is that there are no reshapes of the linear output in this pattern.
int8 da8w8 - symmetrically quantized activation (dynamic) & statically quantized weights -

TORCH_COMPILE_DEBUG=1 TORCH_LOGS="+inductor" TORCHINDUCTOR_FREEZING=1 python test/integration/test_integration.py -v -k test_int8_dynamic_quant_subclass_api_0_cpu

Stack from ghstack (oldest at bottom):
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @aakhundov