
[Inductor changes] Invoke Quant#139102

Closed
eellison wants to merge 17 commits into gh/eellison/711/base from gh/eellison/711/head

Conversation

@eellison
Contributor

@eellison eellison commented Oct 28, 2024

Stack from ghstack (oldest at bottom):

Adds an `invoke_quant` higher order operator as proposed [here](https://docs.google.com/document/d/1s2PfJlq6Q1F8l11CkTIC69BW1rEnGEgs6YmBC7hu8rA/edit?tab=t.0).

The primary motivations are

  • Unifying scattered reasoning for quant operators throughout the code base

  • Ease of pattern matching - see this very large pattern match expression [here](https://github.com/pytorch/pytorch/blob/949fdd299764d4fbefe1db093717786d946aaa60/torch/_inductor/fx_passes/post_grad.py#L390-L426):

    @register_lowering_pattern(
        CallFunction(
            aten.mm.default,
            KeywordArg("mat1"),
            CallFunction(
                aten.sub.Tensor,
                CallFunction(
                    prims.convert_element_type.default,
                    CallFunction(
                        aten.reshape.default,
                        CallFunction(
                            aten.cat.default,
                            ListOf(
                                CallFunction(
                                    aten.bitwise_and.Scalar,
                                    KeywordArg("mat2"),
                                    0xF,
                                ),
                                # CallFunction(
                                #     aten.__rshift__.Scalar,
                                #     KeywordArg("mat2"),
                                #     4,
                                # ),
                                True,
                            ),
                            1,
                        ),
                        KeywordArg("mat2_mm_shape"),
                    ),
                    KeywordArg("mat2_dtype"),
                ),
                8,
            ),
        ),
        extra_check=cuda_and_enabled_mixed_mm_and_not_int8,
    )
    def uint4x2_mixed_mm(match: Match, mat1, mat2, mat2_mm_shape, mat2_dtype):

    Compared to the pattern I have in the tests:

        @register_graph_pattern(
            CallFunction(
                torch.ops.aten.mm,
                CallFunction(
                    torch.ops.higher_order.invoke_quant,
                    Ignored(),
                    Ignored(),
                    Ignored(),
                    scheme="nf4",
                ),
                Arg(),
            ),
            pass_dict=test_pass,
        )
  • Ability to specify inductor-specific logic, like codegen'ing the operators in lower precision, or forcing fusion into a matmul.
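The pattern-matching win in the second bullet can be sketched in plain Python. This is an illustrative mock only: nodes are modeled as `(target, args, kwargs)` tuples rather than real FX nodes, and `is_invoke_quant` is a made-up helper, not part of Inductor's pattern matcher:

```python
# Illustrative sketch: matching a single tagged higher-order op is much
# simpler than matching a deep tree of aten ops. Nodes are modeled here as
# plain (target, args, kwargs) tuples, not real torch.fx nodes.
def is_invoke_quant(node, scheme):
    target, _args, kwargs = node
    return target == "higher_order.invoke_quant" and kwargs.get("scheme") == scheme

node = ("higher_order.invoke_quant", ("subgraph", "x", "y"), {"scheme": "nf4"})
print(is_invoke_quant(node, "nf4"))   # True
print(is_invoke_quant(node, "int8"))  # False
```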

Example graph:

 ===== AFTER POST GRAD =====
 /data/users/eellison/pytorch/torch/fx/_lazy_graph_module.py class <lambda>(torch.nn.Module):
    def forward(self, arg0_1: "f32[8][1]cpu", arg1_1: "f32[8][1]cpu"):
         # File: /data/users/eellison/pytorch/torch/_higher_order_ops/invoke_quant.py:87 in __call__, code: return invoke_quant_tracer(*args, **kwargs, quant_options=self)  # type: ignore[call-arg]
        repeated_subgraph0 = self.repeated_subgraph0
        invoke_quant: "f32[8][1]cpu" = torch.ops.higher_order.invoke_quant(repeated_subgraph0, arg0_1, arg1_1, scheme = 'nf4');  repeated_subgraph0 = arg0_1 = arg1_1 = None
        return (invoke_quant,)
        
    class repeated_subgraph0(torch.nn.Module):
        def forward(self, arg0_1: "f32[8][1]cpu", arg1_1: "f32[8][1]cpu"):
             # File: /data/users/eellison/pytorch/torch/_higher_order_ops/invoke_quant.py:87 in __call__, code: return invoke_quant_tracer(*args, **kwargs, quant_options=self)  # type: ignore[call-arg]
            mul: "f32[8][1]cpu" = torch.ops.aten.mul.Tensor(arg0_1, arg1_1);  arg0_1 = None
            add: "f32[8][1]cpu" = torch.ops.aten.add.Tensor(mul, arg1_1);  mul = arg1_1 = None
            return add

The schema for `invoke_quant` is `torch.ops.higher_order.invoke_quant(subgraph, *args, scheme=None)`, where the scheme will not always be present.
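Semantically the hop just runs the wrapped subgraph on its operands; the scheme tag only labels the node for graph passes. A minimal eager sketch under that assumption, not the actual PyTorch implementation (which also handles tracing, autograd, and dispatch):

```python
# Hypothetical eager-mode sketch of the hop's semantics: the scheme tag does
# not change the computation, it only marks the node for pattern matching.
def invoke_quant_eager(subgraph, *args, scheme=None):
    return subgraph(*args)

result = invoke_quant_eager(lambda a, b: a * b + b, 3.0, 4.0, scheme="nf4")
print(result)  # 16.0
```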

I wasn't sure exactly how the inductor-specific configurations like `codegen_low_precision` should be passed through. I didn't want to stuff them all in as kwargs, and I didn't want them to affect pattern matching, so they will be stored as meta of the node itself. Following that, I wanted the invocation of the hop to match how it will show up in the graph, so I decided to make it an object that is then invoked for the tracing.

invoke_quant = InvokeQuant(codegen_low_precision=True)
invoke_quant(gn, (x, y), scheme="nf4") 
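That call shape can be sketched as a frozen, callable dataclass. This is a simplified stand-in (`InvokeQuantSketch` is a made-up name): the real class records the inductor options as node meta during tracing, whereas here they are just carried on the instance:

```python
import dataclasses

@dataclasses.dataclass(frozen=True)
class InvokeQuantSketch:
    # Inductor-specific option; frozen so instances are immutable and could
    # be stashed as node metadata without affecting pattern matching.
    codegen_low_precision: bool = True

    def __call__(self, subgraph, operands, *, scheme=None):
        # Eagerly, just run the subgraph on the packed operands; during
        # tracing the real class would also tag the emitted node.
        return subgraph(*operands)

invoke = InvokeQuantSketch(codegen_low_precision=True)
out = invoke(lambda a, b: (a - b) ** 2, (5.0, 2.0), scheme="nf4")
print(out)  # 9.0
```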

Todo - do not require packing the args in a tuple; will do following #139162.

Feedback welcome.

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @aakhundov @rec

@pytorch-bot

pytorch-bot bot commented Oct 28, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/139102

Note: Links to docs will display an error until the docs builds have been completed.

⏳ No Failures, 5 Pending

As of commit 0353760 with merge base 49082f9:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@eellison eellison changed the title [Inductor changes] add invoke quant → [Inductor changes] Invoke Quant on Oct 28, 2024
eellison added a commit that referenced this pull request Oct 28, 2024
ghstack-source-id: 0632fb5
Pull Request resolved: #139102
eellison added a commit that referenced this pull request Oct 29, 2024
ghstack-source-id: 2685546
Pull Request resolved: #139102
@dataclasses.dataclass(frozen=True)
class InvokeQuant:
    """
    Invoke a quantization function that will be preserved as a single operator. Preservation
Contributor


can you give some examples of quantization function here? are you referring exclusively to dequantize ops? or does it mean any quantization related functions like quantize_affine or quantized kernel as well?

I'm wondering if the higher order op has to mention quant in the name or it can be more general

Contributor Author


Uh, maybe @drisspg can help with some fp8 prologue scaling functions we want fused. The uint4x2_mixed_mm added here is another example that @HDCharles added.

Yea, it does not specifically have to be quant. The general thing here is about:

a) preservation as a top level op and scheme tagging for special cased lowerings/pattern matching
b) specific inductor behaviors.

Some of the patterns seem pretty specific to quant/dequant.

Specifically I was envisioning:

  • codegen_low_precision
  • forcing fusion to mm (both as prologue and epilogue) / autotuning when not max-autotune

Maybe there are others that come up, not sure.

Contributor


Thanks. If it's not specific to quantization, would it be more descriptive to use a different name that doesn't contain "quant" in it?

Contributor Author


What would a different name be? And what are the other use cases you're envisioning?

Contributor

@jerryzh168 jerryzh168 Oct 29, 2024


I don't have new use cases.

For naming, I just feel mentioning "quant" in a higher order op is a bit weird; if this is the best name we have now, that's fine too. Something for consideration: `invoke_undecomposed_op` / `invoke_high_level_op`.

remove_redundant_views(gm)


def canonicalize_quant_mapping(gm: torch.fx.GraphModule):
Contributor


This feels weird

Contributor Author


I am going to update this. To reviewers: let's skip this part, because I'm going to revise it in the base commit. I am more looking for feedback on the API for users (test files) and the API for developers (after this has occurred).

@drisspg
Contributor

drisspg commented Oct 31, 2024

It would be nice if we didn't have to hardcode the set of inductor configs on the class, since there might be others that affect the subgraph (you are the expert here, but there could be new ones).

Also I am curious what the expected flow is for a user registering their own dequant scheme?

Collaborator

@Chillee Chillee left a comment


Overall this API makes sense to me. I do think force_fuse_mm should probably just be "force fuse", but minor nits.

I also think we'll need a story on how to register dequant schemes for out-of-tree schemes.

max-autotune enabled.
"""

codegen_low_precision: bool = True
Collaborator


These don't do anything yet, right?

eellison added a commit that referenced this pull request Dec 10, 2024
ghstack-source-id: f716efd
Pull Request resolved: #139102
@eellison
Contributor Author

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@pytorchmergebot
Collaborator

Merge failed

Reason: 1 mandatory check(s) failed. The first few are:

Dig deeper by viewing the failures on hud

Details for Dev Infra team: raised by workflow job.

Failing merge rule: Core Maintainers

@eellison
Contributor Author

eellison commented Feb 7, 2025

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).


@pytorchmergebot
Collaborator

Merge failed

Reason: 1 jobs have failed, first few of them are: inductor / unit-test / linux-jammy-cpu-py3.9-gcc11-inductor / test (inductor_amx, 2, 2, linux.8xlarge.amx)

Details for Dev Infra team: raised by workflow job.

@eellison
Contributor Author

eellison commented Feb 8, 2025

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).


@pytorchmergebot
Collaborator

The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
For more information see pytorch-bot wiki.

@eellison
Contributor Author

eellison commented Feb 8, 2025

@pytorchbot merge -f "rocm hanging"

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.


pytorchmergebot pushed a commit that referenced this pull request Feb 10, 2025
…ly) more aggressive fusion (#145104)

Respect invoke_quant low-precision options; also, be more aggressive in attempting fusion.

Pull Request resolved: #145104
Approved by: https://github.com/shunting314, https://github.com/jansel
ghstack dependencies: #139102
@github-actions github-actions bot deleted the gh/eellison/711/head branch March 11, 2025 02:08
