Allow custom backend to execute getitem when they want #87007
wschin wants to merge 1 commit into pytorch:master
receive much smaller sub-graphs (from ~5 graphs to ~50 graphs). For ORT, this is a 2x slowdown.
See discussion in #87073.
Yes, we did realize the problem was caused by the last refactor. Totally my fault. We are trying to patch it as well, but are still working out how. One thing we absolutely should add is a test case where a getitem node connects fusible ops, so we won't accidentally break this in a future refactor of our getitem special handling.
                _get_qualified_name(node.target) != "_operator.getitem"  # type: ignore[arg-type]
            )
        )
        return self.operator_support.is_node_supported(dict(self.graph_module.named_modules()), node)
BTW, this is not safe.
You'll have the getitem node absorbed by its consumer and later hit an assert somewhere in the merge_single_node function. I think you'll fail one of the getitem special handling tests we have.
Thanks for the replies. Closing this until we have #87073.
If a backend wants to execute getitem, CapabilityBasedPartitioner should let that backend execute it. The function getitem is just a normal Python function. If CapabilityBasedPartitioner refuses to partition getitem, it causes a 2x slowdown when running ONNXRuntime via TorchDynamo.

Figure 1. Profiling result when rejecting getitem. There are many more sub-graphs (each gray bin below the "F" bin is a sub-graph), and each of them incurs a launch cost.

Figure 2. Profiling result when partitioning getitem (each gray bin below the "F" bin is a sub-graph). Most of BERT-base is partitioned into a single sub-graph.
