sparse.mm backward: performance improvements #94991
nikitaved wants to merge 41 commits into gh/nikitaved/24/base
Conversation
[ghstack-poisoned]
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/94991
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit e862db2. This comment was automatically generated by Dr. CI and updates every 15 minutes.
@albanD , @soulitzer , could you please also have a look?
`torch.sparse.mm` - faster and without syncs in "most" cases. cc alexsamardzic pearu cpuhrsch amjames bhosmer ezyang albanD zou3519 gqchen soulitzer Lezcano Varal7 [ghstack-poisoned]
pearu left a comment
I have a few clarification questions, otherwise LGTM!
```cpp
// larger. This function prepares inputs for `sparse_mask` such that `t` is
// projected onto `mask` by sorting `t` if uncoalesced and artificially marking it
// as coalesced all while `mask` is set to uncoalesced.
// The result of this projection is going to be uncoalesced, so it is up to the
```
Thanks for adding this comment!
This function returns three tensors. How do these relate to the inputs and to the result of the projection?
IIUC, for

```python
lhs, rhs, lhs_hash_opt = sparse_mask_like_prepare_sparse_inputs(t, mask)
```

we have the following invariants:

```python
lhs.indices() == t.indices()[lhs_hash]
lhs.values() == t.values()[lhs_hash]
rhs == mask
```

where `lhs_hash = lhs_hash_opt or slice(None)`, `lhs` may be a copy or a view of `t`, and `rhs` is an uncoalesced view of `mask`.
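To make these invariants concrete, here is a plain-Python toy model (an illustrative stand-in, not the actual PyTorch kernel; all names here are made up): projecting an uncoalesced `t` amounts to a stable sort of its flattened indices (the per-entry "hash"), with the values permuted by the same permutation.

```python
# Toy model of the invariants above (assumption: plain-Python stand-in,
# not the real kernel). A sparse COO "tensor" is (indices, values); lhs is
# t sorted by its per-entry hash, so that
#   lhs.indices == [t.indices[k] for k in perm]
#   lhs.values  == [t.values[k]  for k in perm]
ncols = 3
t_indices = [(1, 0), (0, 2), (1, 0)]  # uncoalesced: (1, 0) repeats
t_values = [10.0, 20.0, 30.0]

# Linearize (row, col) -> row * ncols + col: the per-entry "hash".
hashes = [r * ncols + c for r, c in t_indices]
perm = sorted(range(len(hashes)), key=lambda k: hashes[k])  # stable sort

lhs_indices = [t_indices[k] for k in perm]
lhs_values = [t_values[k] for k in perm]
print(lhs_indices)  # [(0, 2), (1, 0), (1, 0)]
print(lhs_values)   # [20.0, 10.0, 30.0]
```

Note that the duplicate `(1, 0)` entries survive the sort, which is why the result of the projection is still only "artificially" coalesced.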
```cpp
// the other way around depending on which arguments are coalesced and which are
// larger. This function prepares inputs for `sparse_mask` such that `t` is
// projected onto `mask` by sorting `t` if uncoalesced and artificially marking it
// as coalesced all while `mask` is set to uncoalesced.
```
What is the advantage of returning mask as uncoalesced?
If mask is uncoalesced but the other argument is not, the COO intersection kernel will return a tensor with the same indices as mask and will binary-search mask's hashes into the hashes of the other argument's indices, all without calls to sort and coalesce. The COO intersection kernel is heavily optimized to take advantage of is_coalesced of either argument, see https://github.com/pytorch/pytorch/pull/92976/files. As such, calling the intersection kernel with arguments (a, b) might produce a result with a.indices() or b.indices() depending on which is more performant (i.e. does not sync), based on whether a.is_coalesced or b.is_coalesced. To force this kernel to do what we want for sparse_mask, we mark certain arguments as "coalesced" (after a sort, if needed) and mark mask as uncoalesced to make sure the result has indices mask.indices().
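The binary-search idea can be sketched in plain Python (an illustrative model, not the actual CUDA kernel): the coalesced argument's linearized index hashes are sorted and unique, so each hash of the uncoalesced mask is looked up with bisection, and the result inherits mask's indices and order.

```python
import bisect

# Illustrative model of the intersection (assumption: plain Python, not
# the actual CUDA kernel). The coalesced argument has sorted, unique
# linearized index hashes; the uncoalesced mask's hashes are searched
# into them, and the result keeps mask's indices/order.
other_hashes = [0, 2, 5, 7]           # coalesced argument: sorted hashes
other_values = [1.0, 2.0, 3.0, 4.0]
mask_hashes = [5, 0, 5, 9]            # mask: any order, duplicates allowed

result = []
for h in mask_hashes:
    j = bisect.bisect_left(other_hashes, h)
    hit = j < len(other_hashes) and other_hashes[j] == h
    result.append(other_values[j] if hit else 0.0)
print(result)  # [3.0, 1.0, 3.0, 0.0]
```

Each lookup is O(log nnz) and independent of the others, which is why no sort, coalesce, or host-device sync of the mask is needed.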
This (forcing a coalesced input to be uncoalesced) sounds like a convoluted way to control the COO intersection kernel's behavior (to ensure that the result indices are the input indices).
It is still better than writing things from scratch, imho. The COO intersection kernel is not a public function with fixed semantics, but we want it to be fast and without any syncs if possible, so that, say, mul(a, b) and mul(b, a) produce the very same tensor without any syncs if either a or b is coalesced. In my opinion we can sacrifice some clarity for performance here, given that the array of ops we can implement with this kernel is quite substantial and performance critical (do you remember @amjames was working on removing calls to coalesce?). Fast sparse on CUDA is what sells it in the first place, and there is still some room for improvement in the intersection kernel as I see it... But I hear you, and I will probably modify the interface a bit now that I know the use cases better. The current design assumed just mul; I did not know about sparse_mask and its importance back then (nor about other use cases, like the non-symmetric context). That could be a nice follow-up once performance is here...
@pearu , do you have any other concerns? I will provide comfortable interfaces to COO intersection kernels as a follow-up to indeed reduce cognitive burden...
pearu left a comment
LGTM! Thanks, @nikitaved!
```diff
  if (grad_order == 0) {
    auto a_grad = _sparse_sparse_matmul(grad, b.conj().t());
-   return a_grad.mul(mask_ones_like(a.coalesce()));
+   return sparse_mask_like_grad(a, a_grad);
```
Ok, so essentially we're starting to write manual fusions. If we had torch.compile support we could presumably generate the code for this pattern. Since this is a hot path this is ok, but it's going to start to become difficult to test correctness.
Since _sparse_mask_projection is a native function you could even write a test that compares it to the pattern that you're fusing.
Is this a common general pattern? Maybe we can apply it in more locations. If we generalize the fused composite native function that you're adding here a bit, maybe it applies to more locations? It can then also subsume all the logic of sparse_mask_like_grad, and we can explicitly say "This function represents the fusion of this pattern". That will be easier to understand for future maintainers.
@cpuhrsch , alternatively, we can make this function composite implicit now that sparse_mask does support backward with COO inputs. This way we can remove the code for backward altogether. The result: much less code, still sync-less backward, albeit potentially slower compared to this impl (there is only one way to call backward; there is no back-and-forth between grad and input because of the fixed semantics of sparse_mask). But, yes, it is a generic pattern forced by some "sparse-semantics" functions, which might become obsolete with a differentiable sparse_mask or something similar but more flexible in enforcing the projection direction (aka a public interface for our COO intersection primitive kernel).
Another possibility: we can create a public function that does _sparse_mask_projection and/or sparse_mask (whichever is faster), which is explicitly differentiable and could be used to enforce sparse semantics while having a flexible backward. Then any sparse-semantics function would be a composition of this method and the underlying logic it implements.
I think we should go with your first approach for now and keep the amount of code low and then work on torch.compile support to provide these fusions. We can keep what you have for now as a reference and goal for that integration. Otherwise we'll have to undo these manual fusions again once we have torch.compile support.
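The correctness test suggested above (comparing the fused projection against the pattern it fuses) can be sketched on a plain-Python toy model; the names and data here are illustrative stand-ins, not the actual native functions:

```python
# Toy comparison of the two patterns discussed above (assumption: plain
# Python stand-ins for the sparse ops; not the actual PyTorch functions).
# Unfused: multiply the gradient by a sparse tensor of ones with a's pattern.
# Fused:   gather gradient entries directly at a's indices.
a_indices = [(0, 0), (1, 2)]          # a's sparsity pattern (coalesced)
a_grad = [[1.0, 2.0, 3.0],
          [4.0, 5.0, 6.0]]            # dense-style result of the matmul

# Unfused reference: mask of ones, then elementwise multiply on the pattern.
ones = {idx: 1.0 for idx in a_indices}
unfused = {idx: a_grad[idx[0]][idx[1]] * ones[idx] for idx in a_indices}

# Fused "projection": directly gather a_grad at a's indices.
fused = {idx: a_grad[idx[0]][idx[1]] for idx in a_indices}

assert fused == unfused               # the equivalence such a test would check
print(fused)  # {(0, 0): 1.0, (1, 2): 6.0}
```

The point of the fused form is that it skips materializing the ones tensor and the elementwise multiply entirely; the test only pins down that both routes agree on the masked entries.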
cpuhrsch left a comment
I think this looks good, I'm just worried that future maintainers might find this difficult to grok quickly. What if we reframe this work as implementing a fused function for a specific pattern of operations?
@cpuhrsch , do you mind merging this for now to untangle some PR dependencies? Once
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
`torch.sparse.mm` - faster and without syncs in "most" cases.
Stack from ghstack (oldest at bottom):
cc @alexsamardzic @pearu @cpuhrsch @amjames @bhosmer @ezyang @albanD @zou3519 @gqchen @soulitzer @lezcano @Varal7