
Remove getitem special handling in the partitioner #87073

Closed
IvanYashchuk wants to merge 15 commits into pytorch:master from IvanYashchuk:getitem-partitioner

Conversation

@IvanYashchuk
Collaborator

@IvanYashchuk IvanYashchuk commented Oct 17, 2022

This special handling of getitem unnecessarily splits fusions at functions with tuple outputs.

Example script:

```py
import torch
from torch.fx.passes.infra.partitioner import CapabilityBasedPartitioner
from torch._prims.nvfuser_executor import NvfuserPrimOperatorSupport
from torch.fx.experimental.proxy_tensor import make_fx

def func(x):
    xx = torch.ops.nvprims.add(x, 1)
    var, mean = torch.ops.nvprims.var_mean(x, correction=0)
    var_cos = torch.ops.nvprims.cos(var)
    mean_sin = torch.ops.nvprims.sin(mean)
    return torch.ops.nvprims.add(var_cos, mean_sin)

a = torch.randn(5, 3, 3, device="cuda")
gm = make_fx(func)(a)
gm.graph.print_tabular()

supported_ops = NvfuserPrimOperatorSupport()
partitioner = CapabilityBasedPartitioner(
    gm, supported_ops, allows_single_node_partition=False
)
partitions = partitioner.propose_partitions()
print(partitions)
partitioned_graph = partitioner.fuse_partitions(partitions)
partitioned_graph.graph.print_tabular()
```

Output on master:

```py
opcode         name       target                       args              kwargs
-------------  ---------  ---------------------------  ----------------  -----------------
placeholder    x_1        x_1                          ()                {}
call_function  add        nvprims.add.default          (x_1, 1)          {}
call_function  var_mean   nvprims.var_mean.main        (x_1, [0, 1, 2])  {'correction': 0}
call_function  getitem    <built-in function getitem>  (var_mean, 0)     {}
call_function  getitem_1  <built-in function getitem>  (var_mean, 1)     {}
call_function  cos        nvprims.cos.default          (getitem,)        {}
call_function  sin        nvprims.sin.default          (getitem_1,)      {}
call_function  add_1      nvprims.add.default          (cos, sin)        {}
output         output     output                       (add_1,)          {}
[{cos, sin, add_1}, {var_mean, add, getitem, getitem_1}]
opcode         name       target                       args                    kwargs
-------------  ---------  ---------------------------  ----------------------  --------
placeholder    x_1        x_1                          ()                      {}
call_module    fused_1    fused_1                      (x_1,)                  {}
call_function  getitem_2  <built-in function getitem>  (fused_1, 0)            {}
call_function  getitem_3  <built-in function getitem>  (fused_1, 1)            {}
call_module    fused_0    fused_0                      (getitem_2, getitem_3)  {}
output         output     output                       (fused_0,)              {}
```

Output with this PR:

```
[{var_mean, add_1, cos, sin, add, getitem_1, getitem}]
opcode       name     target    args        kwargs
-----------  -------  --------  ----------  --------
placeholder  x_1      x_1       ()          {}
call_module  fused_0  fused_0   (x_1,)      {}
output       output   output    (fused_0,)  {}
```

This special handling of getitem unnecessarily splits fusions at functions with tuple outputs.
@IvanYashchuk IvanYashchuk added the module: fx.passes Optimization passes written in FX (don't forget to select a more specific label) label Oct 17, 2022
@pytorch-bot

pytorch-bot bot commented Oct 17, 2022

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/87073

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 9ce171f:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added the release notes: fx release notes category label Oct 17, 2022
@IvanYashchuk IvanYashchuk marked this pull request as draft October 17, 2022 14:04
@ngimel
Collaborator

ngimel commented Oct 17, 2022

cc @SherlockNoMad, why was this special handling needed, and why does it split fusions?

Collaborator

@jjsjann123 jjsjann123 left a comment

LGTM.

# this is a no-op
maybe_merge_partition(self_id, other_id)

# post processing to re-assign "getitem" nodes into upstream partition
Collaborator

nitpick, should we remove this part?

This was supposed to fuse getitem into its producer fusion, before getitem was disabled in the line you commented on above. With that logic removed, I believe this section would remove a getitem node at the beginning of a fusion if it comes from an unfused node.

Collaborator Author

Keeping this code works with the example I added to the PR description, but I was hitting the assert in merge_single_node in one benchmark model. I should extract that failing graph portion for testing.

Collaborator

Got ya. merge_single_node might be asserting too aggressively as well.

assert node not in assignment

This assert should be removed along with your update on the getitem logic.

Collaborator Author

Keeping this code, `python -m pytest test/test_fx_passes.py -k "test_partitioner_fn_" -vvv` fails with the assert:

```
Traceback (most recent call last):
  File "/home/iyashchuk/dev/pytorch/master/test/test_fx_passes.py", line 224, in test_partitioner
    partitions = partitioner.propose_partitions()
  File "/home/iyashchuk/dev/pytorch/master/torch/fx/passes/infra/partitioner.py", line 162, in propose_partitions
    merge_single_node(node, id)
  File "/home/iyashchuk/dev/pytorch/master/torch/fx/passes/infra/partitioner.py", line 108, in merge_single_node
    assert node not in assignment
AssertionError
```

Removing this code, the test still fails (as in CI), but the failure is now expected:

```
____ TestFXGraphPasses.test_partitioner_fn_<function TestPartitionFunctions_forward13 at 0x7f08e8fdfd00>_expected_partition_[['add_2', 'add_1', 'add']] ____
Traceback (most recent call last):
  File "/home/iyashchuk/dev/pytorch/master/test/test_fx_passes.py", line 229, in test_partitioner
    assert set(partitions_name[i]) == set(expected_partition[i])
AssertionError: assert {'add', 'getitem_2', 'getitem', 'getitem_1', 'getitem_3', 'add_2', 'add_1'} == {'add_2', 'add_1', 'add'}
  Extra items in the left set:
  'getitem_2'
  'getitem_3'
  'getitem_1'
  'getitem'
  Full diff:
  - {'add_2', 'add_1', 'add'}
  + {'add', 'getitem_2', 'getitem', 'getitem_1', 'getitem_3', 'add_2', 'add_1'}
```

The failing test was added in https://github.com/pytorch/pytorch/pull/86713 😉

Collaborator

I was suggesting keeping this code and removing the assert.

Collaborator Author

Keeping the code and removing the assert gives the following partitions in that test:

[{getitem, add_1, add_2, getitem_2, getitem_3, getitem_1, add}, {getitem_3, getitem_2, getitem, getitem_1}]

Collaborator

Oops, that's surprising to me. 😆

How can we have getitem spanning across multiple partitions? I think there's just a small bug somewhere. It looks like it's attempting to clear getitem from the original partition... but somehow that wasn't done right. I can take a quick look at this afterwards.

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Oct 17, 2022
@IvanYashchuk IvanYashchuk marked this pull request as ready for review October 17, 2022 18:36
```py
if "getitem" in node.name:
    # Check if the node unpacks a tuple from a supported node
    node_to_unpack = node.args[0]
    return self.is_node_supported(submodules, node_to_unpack)
```
Collaborator

I'm slightly leaning towards cleaning up getitem at the beginning of the graph in the partitioner, instead of having more complicated logic in the op support query. For now it works fine, since the partitioner goes from consumer to producer and we don't really fuse anything as we propose partition groups. But that implementation could change 😦

Collaborator Author

It's of course a matter of taste. I think special-casing getitem and processing it differently may cause bugs. There are two situations with getitem: it either unpacks a node that's supported or one that isn't, and we can't unconditionally reject or accept it. Previously we already relied on the partitioner (always rejecting getitem) to do the right thing, and it didn't.
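For illustration, here's a minimal, self-contained sketch of that producer-tied rule; the `Node` class and `SUPPORTED` set are hypothetical stand-ins for `torch.fx.Node` and the backend's `OperatorSupport`, not the actual PyTorch code:

```python
import operator

class Node:
    """Hypothetical stand-in for torch.fx.Node."""
    def __init__(self, name, target, args=()):
        self.name, self.target, self.args = name, target, args

# Stand-in for the backend's operator support table.
SUPPORTED = {"nvprims.var_mean", "nvprims.add"}

def is_node_supported(node):
    # getitem is neither unconditionally accepted nor rejected:
    # it inherits the support status of the node it unpacks.
    if node.target is operator.getitem:
        return is_node_supported(node.args[0])
    return node.target in SUPPORTED

var_mean = Node("var_mean", "nvprims.var_mean")
supported_getitem = Node("getitem", operator.getitem, (var_mean,))
split = Node("split", "aten.split")  # not on the support list
unsupported_getitem = Node("getitem_2", operator.getitem, (split,))

print(is_node_supported(supported_getitem))    # True: its producer is supported
print(is_node_supported(unsupported_getitem))  # False: its producer is not
```

This way the support decision for getitem is always the support decision for its producer, so neither blanket acceptance nor blanket rejection can split a fusion at a tuple unpack.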


# 5 getitem special case
(TestPartitionFunctions.forward13, [["add_2", "add_1", "add"]]),
(TestPartitionFunctions.forward13, [["add_2", "add_1", "add", "getitem", "getitem_1", "getitem_2", "getitem_3"]]),
Contributor

Hi @IvanYashchuk, I think this is an unexpected change.
A getitem node should always be partitioned together with its producer node.
Across the stack, we have an implicit assumption that a module's inputs and outputs must be tensors. This is why we have the special handling logic in the first place.

Collaborator

This doesn't look right even for this PR. getitem is produced by split, which isn't supported by the fusion node?

@SherlockNoMad
Contributor

Given the example, getitem_2, getitem_3 should be partitioned into fused_1.

Contributor

@SherlockNoMad SherlockNoMad left a comment

getitem should always be partitioned together with its producer node.

@jjsjann123
Collaborator

> getitem should always be partitioned together with its producer node.

@SherlockNoMad The problem here is that the original logic special-cased getitem after the fusion partition had been proposed, which resulted in us always segmenting the graph across getitem nodes.
I.e., if you have a var_mean whose output is used by another fusion-supported op, the getitem node that unpacks the tuple output from var_mean stops us from fusing the two nodes.

The changes in this PR are supposed to fix that.

@mikaylagawarecki mikaylagawarecki added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label Oct 18, 2022
@IvanYashchuk
Collaborator Author

> getitem should always be partitioned together with its producer node.

I agree with you, but the previous code was doing that too aggressively and incorrectly. getitem should be neither unconditionally accepted nor rejected; it should be tied to its producer when deciding whether it's supported.

Please take a look at the latest changes.

@jjsjann123
Collaborator

> getitem should always be partitioned together with its producer node.
>
> I agree with you, but the previous code was doing that too aggressively and incorrectly. getitem shouldn't be unconditionally accepted nor rejected, it should be tied to its producer when deciding whether it's supported or not.
>
> Please take a look at the latest changes.

On this topic, could you consider changes like this one?
csarofeen@9e9934e

We are merging things as provided by the op support list, and at the end of fusion we clean up getitem nodes and keep them with their producer.

Sorry that my previous refactor leaves the code in an ugly state. 😝
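That commit isn't reproduced here, but the shape of such a post-processing pass can be sketched on plain dicts and sets (node names as strings; `producer_of` is a hypothetical helper mapping each getitem to the node it unpacks, standing in for inspecting `node.args[0]`):

```python
def reassign_getitems(partitions, producer_of):
    """Post-processing sketch: move each getitem node into the partition
    of its producer, so partitions are never split at a bare getitem."""
    # Map every node to its current partition id.
    assignment = {n: pid for pid, nodes in partitions.items() for n in nodes}
    for pid, nodes in partitions.items():
        for n in list(nodes):  # copy, since we mutate the set
            if n.startswith("getitem"):
                target = assignment.get(producer_of[n])
                if target is not None and target != pid:
                    nodes.remove(n)
                    partitions[target].add(n)
                    assignment[n] = target
    return partitions

# A case where the proposed partitions left the getitems away from var_mean:
parts = {0: {"var_mean", "add"}, 1: {"getitem", "getitem_1", "cos", "sin"}}
producer = {"getitem": "var_mean", "getitem_1": "var_mean"}
result = reassign_getitems(parts, producer)
print({pid: sorted(nodes) for pid, nodes in result.items()})
```

With this approach the op-support query stays simple and backend-agnostic, and the getitem bookkeeping lives in one common pass.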

@jjsjann123
Collaborator

> On this topic, could you consider changes like this one? csarofeen@9e9934e

Seems like we are not the sole user of fx partitioner. #87007

If getitem is relied on by multiple backends, maybe it's better to keep the op-support logic for merging getitem simple and have a common post-processing pass to clean it up. I'm shamelessly promoting the patch I have in the commit above again :)

I'm not sure if the discussion here concerns you at all, @wschin: would a getitem node at the beginning of a partition break your parser?

@wschin
Collaborator

wschin commented Oct 18, 2022

> On this topic, could you consider changes like this one? csarofeen@9e9934e
>
> Seems like we are not the sole user of fx partitioner. #87007
>
> if getitem is relied by multiple backends, maybe it's a better that we keep the logic on merging getitem in op support simpler and have a common post processing pass to clean it up. I'm shamelessly promoting the patch I have in the commit above again :)
>
> I'm not sure if the discussion here concerns you at all @wschin, would an getitem node at the beginning of partition break your parser?

Some getitems at the beginning and at the end should be fine. I just hope most of the computation can be partitioned into a single torch.fx.GraphModule. Having non-computation ops before and after the major computation is fine; of course, there should be as few of those non-computation ops as possible. Many thanks!

@IvanYashchuk
Collaborator Author

@SherlockNoMad could you please take another look at the proposed changes?


assignment[node] = id
if id not in partitions_by_id:
if id is None:
Contributor

Do we really have a case where id is None?
Looking at the call site for this function, it doesn't seem to have any None case...

If this cannot be None, let's assert id is not None.

Collaborator

Yes, this happens naturally when getitem is marked as supported by backends.

Later, during the special handling of getitem, where we merge each getitem call into its producer, we can run into cases where the producer is not supported by backends but we accidentally merged getitem into the fusion. We end up pulling these nodes out, and that's when id would be None.

merge_single_node(node, id)

This is checked in test case forward13
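A hypothetical sketch of what that `id is None` path does; the real `merge_single_node` lives inside `propose_partitions` and uses closure state, whereas here the maps are passed explicitly, so this is not the actual PyTorch code:

```python
def merge_single_node(node, id, assignment, partitions_by_id):
    # id=None means "pull this node out of whatever partition it is in",
    # e.g. a getitem that was merged into a fusion whose producer turned
    # out to be unsupported.
    if node in assignment:
        partitions_by_id[assignment[node]].discard(node)
    if id is None:
        assignment.pop(node, None)
    else:
        assignment[node] = id
        partitions_by_id.setdefault(id, set()).add(node)

# A getitem previously assigned to partition 0 gets pulled out:
assignment = {"getitem": 0}
partitions_by_id = {0: {"getitem", "split"}}
merge_single_node("getitem", None, assignment, partitions_by_id)
print(assignment, partitions_by_id)  # {} {0: {'split'}}
```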

Contributor

@SherlockNoMad SherlockNoMad left a comment

LGTM, except for the minor comment.

@jjsjann123
Collaborator

Thanks to @SherlockNoMad for the stamp.

The failure seems to be on the CI nodes and unrelated. Let's rebase to a stable commit and merge it! 🎉 🎉 🎉

@IvanYashchuk
Collaborator Author

@pytorchbot merge -g

@IvanYashchuk IvanYashchuk added the topic: not user facing topic category label Oct 26, 2022
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks on your PR pass since you used the green (-g) flag (ETA: 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team


@IvanYashchuk IvanYashchuk deleted the getitem-partitioner branch October 26, 2022 16:13
kulinseth pushed a commit to kulinseth/pytorch that referenced this pull request Nov 5, 2022
Pull Request resolved: pytorch#87073
Approved by: https://github.com/jjsjann123, https://github.com/SherlockNoMad
kulinseth pushed a commit to kulinseth/pytorch that referenced this pull request Dec 10, 2022

Labels

- ciflow/trunk
- Merged
- module: fx.passes
- open source
- release notes: fx
- topic: not user facing
- triaged
