[invoke_subgraph] User facing API to support arbitrary args and kwargs #139162

anijain2305 wants to merge 16 commits into `gh/anijain2305/568/base`
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/139162. Note: links to docs will display an error until the docs builds have completed. ✅ No failures as of commit 60da56e with merge base bf1b8ad. (This comment was automatically generated by Dr. CI and updates every 15 minutes.)
Commit update: "…s and kwargs"
```python
invoke_subgraph_placeholder = InvokeSubgraphPlaceholderHOP()
...
def wrap_with_invoke_subgraph(fn=None):
```
Since this is the user-facing API: can we bikeshed the name? I imagine this API will end up in the `torch.compiler` namespace.

- `torch.compiler.mark_subregion`?
- `torch.compiler.set_inline(False)`?
Yes, but not right now. I want to see some wins before introducing it in the compiler namespace.
I am imagining something like the `torch.compile` to `torch._dynamo.optimize` type of mapping, once we have established that this HOP is indeed useful in practice.
Can we give the function the name we want to see in the future? If we want to call it `mark_subregion` or `set_inline` in the future, then let's name it that now instead of `wrap_with_invoke_subgraph`. (Names tend to stick, and `wrap_with_invoke_subgraph` sounds too much like an implementation detail.)
Alright, calling it `mark_compile_region`.
```python
return super().__call__(subgraph, identifier, operands)
...
class InvokeSubgraphPlaceholderHOP(HigherOrderOperator):
```
Does this need to be a HOP? For `torch.utils.checkpoint`, we transformed the `torch.utils.checkpoint` function into the checkpoint HOP. So we can probably get away with this just being an `invoke_subgraph_placeholder` function.
```python
def test_multiple(self):
    n_layers = 2

    @wrap_with_invoke_subgraph
```
Can we test applying this to an nn.Module?
Good idea! I added a test: `test_module`.
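For readers following along: the decorator composes with methods the same way it composes with free functions, because the bound `self` simply travels through `*args`. A minimal pure-Python sketch of that mechanic, using a toy stand-in for `wrap_with_invoke_subgraph` (in eager mode the real decorator is documented to be a no-op, so the stand-in just forwards the call):

```python
def toy_mark_region(fn):
    """Toy stand-in for wrap_with_invoke_subgraph. Under torch.compile the
    real wrapper would route to invoke_subgraph_placeholder(fn, ...);
    in eager it behaves like this pass-through."""
    def inner(*args, **kwargs):
        return fn(*args, **kwargs)
    return inner

class ToyModule:
    # Decorating a method works because the bound `self` is just the
    # first positional argument in *args.
    @toy_mark_region
    def forward(self, x):
        return x * 2

m = ToyModule()
print(m.forward(3))  # 6
```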
```python
def wrap(func):
    def inner(*args, **kwargs):
        if torch._dynamo.is_compiling():
            return invoke_subgraph_placeholder(func, *args, **kwargs)
```
Are there any constraints to the input types of (args, kwargs) at all?
I don't think so. Dynamo will traverse the user-defined objects here and pull out the tensors from the underlying data structures to feed to the invoke_subgraph HOP.
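That "pull out the tensors from the underlying data structures" step is essentially a pytree-style flatten (torch's real machinery lives in `torch.utils._pytree`; Dynamo's traversal is more involved). A hypothetical pure-Python sketch of the idea, with `FakeTensor` standing in for `torch.Tensor`:

```python
class FakeTensor:
    """Stand-in leaf type for torch.Tensor, for illustration only."""
    def __init__(self, name):
        self.name = name

def flatten_tensors(obj):
    """Recursively collect FakeTensor leaves from nested containers,
    roughly how tensor operands would be gathered from args/kwargs."""
    if isinstance(obj, FakeTensor):
        return [obj]
    if isinstance(obj, (list, tuple)):
        return [t for item in obj for t in flatten_tensors(item)]
    if isinstance(obj, dict):
        return [t for key in sorted(obj) for t in flatten_tensors(obj[key])]
    return []  # non-tensor leaves (ints, strings, ...) are not HOP operands

args = (FakeTensor("a"), {"w": FakeTensor("b"), "cfg": 3}, [FakeTensor("c")])
print([t.name for t in flatten_tensors(args)])  # ['a', 'b', 'c']
```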
```python
This is a user facing API to wrap the function with invoke_subgraph HOP. For
PyTorch eager, this is a no-op. For torch.compile, we wrap the given
function into invoke_subgraph_placeholder, which is parsed by Dynamo and
replaced by invoke_subgraph.
```
This is a developer-facing docstring; we should make it more user-facing. Something like: "Use `mark_subregion` (or whatever we call it) to decorate a region so that torch.compile attempts to compile it only once (instead of inlining it multiple times). Under the hood, this creates an invoke_subgraph HOP..."
Sounds good. But I am thinking of this as a private API; the real public API will be in the compiler namespace, and I can add more docs there.
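The win the API is after, compiling a decorated region once and reusing it at every call site, can be sketched in pure Python with a per-function cache. This illustrates the intent only, not how invoke_subgraph is implemented; `fake_compile` and the cache are hypothetical:

```python
compile_counts = {}

def fake_compile(fn):
    """Hypothetical stand-in for compiling a subgraph once."""
    compile_counts[fn.__name__] = compile_counts.get(fn.__name__, 0) + 1
    return fn

_compiled_cache = {}

def mark_compile_region(fn):
    """Sketch of the intent: a decorated region is 'compiled' at most
    once, then reused, no matter how many times it is called."""
    def inner(*args, **kwargs):
        if fn not in _compiled_cache:
            _compiled_cache[fn] = fake_compile(fn)
        return _compiled_cache[fn](*args, **kwargs)
    return inner

@mark_compile_region
def layer(x):
    return x + 1

out = layer(layer(layer(0)))   # region executed three times...
print(out, compile_counts)     # ...but "compiled" exactly once
```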
zou3519 left a comment:

Left some questions, but overall this seems good.
@zou3519 This is ready for another round of review.
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours).
Pull Request resolved: pytorch#139162. Approved by: https://github.com/zou3519
Pull Request resolved: #140058 Approved by: https://github.com/ydwu4, https://github.com/eellison ghstack dependencies: #139162
Summary: X-link: pytorch/pytorch#139162 Approved by: https://github.com/zou3519 Reviewed By: ZainRizvi Differential Revision: D65661188 Pulled By: anijain2305 fbshipit-source-id: 5a090d84e22d020552a4901aacab60aabdc94d24
Adds an `invoke_quant` higher order operator as proposed [here](https://docs.google.com/document/d/1s2PfJlq6Q1F8l11CkTIC69BW1rEnGEgs6YmBC7hu8rA/edit?tab=t.0). The primary motivations are:

- Unifying scattered reasoning for quant operators throughout the code base.
- Ease of pattern matching: compare this very large pattern match expression [here](https://github.com/pytorch/pytorch/blob/949fdd299764d4fbefe1db093717786d946aaa60/torch/_inductor/fx_passes/post_grad.py#L390-L426) to the pattern I have in the tests:

```python
register_graph_pattern(
    CallFunction(
        torch.ops.aten.mm,
        CallFunction(
            torch.ops.higher_order.invoke_quant,
            Ignored(),
            Ignored(),
            Ignored(),
            scheme="nf4",
        ),
        Arg(),
    ),
    pass_dict=test_pass,
)
```

- The ability to specify Inductor-specific logic, like codegen'ing the operators in lower precision, or forcing fusion into a matmul.

Example graph:

```python
# ===== AFTER POST GRAD =====
class <lambda>(torch.nn.Module):
    def forward(self, arg0_1: "f32[8][1]cpu", arg1_1: "f32[8][1]cpu"):
        repeated_subgraph0 = self.repeated_subgraph0
        invoke_quant: "f32[8][1]cpu" = torch.ops.higher_order.invoke_quant(repeated_subgraph0, arg0_1, arg1_1, scheme = 'nf4');  repeated_subgraph0 = arg0_1 = arg1_1 = None
        return (invoke_quant,)

class repeated_subgraph0(torch.nn.Module):
    def forward(self, arg0_1: "f32[8][1]cpu", arg1_1: "f32[8][1]cpu"):
        mul: "f32[8][1]cpu" = torch.ops.aten.mul.Tensor(arg0_1, arg1_1);  arg0_1 = None
        add: "f32[8][1]cpu" = torch.ops.aten.add.Tensor(mul, arg1_1);  mul = arg1_1 = None
        return add
```

The schema for `invoke_quant` is `torch.ops.higher_order.invoke_quant(subgraph, *args, scheme=None)`, where the scheme will not always be present.

I wasn't sure exactly how the Inductor-specific configurations like `codegen_low_precision` should be passed through. I didn't want to stuff them all in as kwargs, and I didn't want them to affect pattern matching, so they will be stored in the meta of the node itself. Following that, I wanted the invocation of the HOP to match how it will show up in the graph, so I decided to make it an object that is then invoked for the tracing:

```python
invoke_quant = InvokeQuant(codegen_low_precision=True)
invoke_quant(gn, (x, y), scheme="nf4")
```

Todo: not require the packing of args in a tuple; will do following #139162. Feedback welcome.
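The design point about storing Inductor-specific options on the node's meta (rather than as kwargs) can be illustrated with a toy node: pattern matching keys only on the op and its args/kwargs, so anything stashed in `meta` is invisible to it. A hypothetical sketch, not the real FX implementation:

```python
from dataclasses import dataclass, field

@dataclass
class ToyNode:
    op: str
    args: tuple
    kwargs: dict = field(default_factory=dict)
    meta: dict = field(default_factory=dict)  # side channel, ignored by matching

def matches(node, op, **expected_kwargs):
    """A matcher that, like FX pattern matching, never inspects node.meta."""
    return node.op == op and all(
        node.kwargs.get(k) == v for k, v in expected_kwargs.items()
    )

n = ToyNode("invoke_quant", ("subgraph", "x", "y"), {"scheme": "nf4"})
n.meta["codegen_low_precision"] = True  # config rides along without
                                        # changing what patterns see
print(matches(n, "invoke_quant", scheme="nf4"))  # True
```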
Pull Request resolved: #139102. Approved by: https://github.com/Chillee
Stack from ghstack (oldest at bottom):
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @chenyang78 @kadeng @chauhang @amjames @rec