Make _embedding_bag_backward explicitly dispatch to CPU and CUDA. #129691

Closed
ysiraichi wants to merge 5 commits into gh/ysiraichi/61/base from gh/ysiraichi/61/head

Conversation

ysiraichi (Collaborator) commented on Jun 27, 2024

Stack from ghstack (oldest at bottom):

This PR modifies the `_embedding_bag_backward` entry in native_functions.yaml so that it
dispatches to CPU and CUDA directly, instead of `CompositeImplicitAutograd`.
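For illustration, the change is of the following shape in native_functions.yaml (a sketch only: the signature is elided, and the kernel names below are placeholders rather than the ones the PR actually registers):

```yaml
# Before: no dispatch section, so the op is implicitly
# CompositeImplicitAutograd and backends cannot override it.
#
# - func: _embedding_bag_backward(...) -> Tensor

# After: explicit per-backend kernels (placeholder names).
#
# - func: _embedding_bag_backward(...) -> Tensor
#   dispatch:
#     CPU: _embedding_bag_backward_cpu
#     CUDA: _embedding_bag_backward_cuda
```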

Context: PyTorch operations registered as `CompositeImplicitAutograd` do not allow
third-party backends (e.g. XLA) to override their implementation, since that dispatch
key has higher priority. As a result, calling the `_embedding_bag_backward` operation
with XLA raises a dispatch error, because PyTorch/XLA doesn't support sparse tensors.

Problem: `_embedding_bag_backward` has a `sparse` parameter that controls whether the
operation returns a sparse or a dense tensor. However, PyTorch/XLA does not currently
support sparse tensors. In order to fall back to the dense path, i.e. to change the
flag at runtime, we need to be able to override the operation's implementation.

Solution: we changed the dispatch of `_embedding_bag_backward` to CPU and CUDA,
which allows us to register our own kernel for it in PyTorch/XLA.
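For readers unfamiliar with the mechanism: once an op has explicit per-backend dispatch, an out-of-tree backend can register its own kernel for its dispatch key. Below is a minimal sketch using the dispatcher's boxed-kernel API; the kernel is a hypothetical stub, not PyTorch/XLA's actual code:

```cpp
#include <torch/library.h>

// Hypothetical stub: a real backend kernel would compute the gradient here,
// e.g. forcing the dense path since sparse tensors are unsupported on XLA.
void xla_embedding_bag_backward(
    const c10::OperatorHandle& op,
    torch::jit::Stack* stack) {
  // Backend-specific implementation elided.
}

// After this PR, the registration below is no longer shadowed by a
// higher-priority CompositeImplicitAutograd kernel.
TORCH_LIBRARY_IMPL(aten, XLA, m) {
  m.impl(
      "_embedding_bag_backward",
      torch::CppFunction::makeFromBoxedFunction<&xla_embedding_bag_backward>());
}
```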

Additionally, this PR refactors the representation of the operation's mode from integer
constants into an enum class, and introduces two comparison operators:
`int == EmbeddingBagMode` and `int != EmbeddingBagMode`.
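A minimal sketch of what the refactor looks like, assuming PyTorch's conventional embedding-bag mode values (0 = sum, 1 = mean, 2 = max); exact names and placement in the PR may differ:

```cpp
#include <cstdint>

// Centralized mode enum, replacing integer constants that were previously
// declared in two places.
enum class EmbeddingBagMode { SUM = 0, MEAN = 1, MAX = 2 };

// The two operators mentioned above: they let existing int-typed `mode`
// values be compared against the enum without a cast at every call site.
inline bool operator==(int64_t mode, EmbeddingBagMode m) {
  return mode == static_cast<int64_t>(m);
}

inline bool operator!=(int64_t mode, EmbeddingBagMode m) {
  return !(mode == m);
}
```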

cc @bdhirsh @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @miladm @lezcano

pytorch-bot (Bot) commented on Jun 27, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/129691

Note: Links to docs will display an error until the doc builds have completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

❌ 2 New Failures, 1 Unrelated Failure

As of commit f4c27b5 with merge base 5ee893a:

NEW FAILURES - The following jobs have failed:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

ysiraichi added a commit that referenced this pull request Jun 27, 2024
ysiraichi added the module: xla (Related to XLA support) label on Jun 27, 2024
Comment thread on tools/autograd/derivatives.yaml

Old code (integer constants):

    case MODE_SUM:
    case MODE_MEAN:
      if (mode == MODE_MEAN)

New code (enum class):

    switch (static_cast<EmbeddingBagMode>(mode)) {
A collaborator commented:
if you want to go for the full refactor, you should probably never have static_casts, and always pass the right type around.

ysiraichi (Author) replied:
I wasn't really planning on going all the way, at least in this PR. This refactor was just to centralize the mode enum (we were declaring constant ints in 2 places already).
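For context, a hypothetical illustration of the distinction being discussed (the function names are invented for the example):

```cpp
#include <cstdint>

enum class EmbeddingBagMode { SUM = 0, MEAN = 1, MAX = 2 };

// Partial refactor (this PR): the mode travels as int64_t and is cast to
// the enum at each point of use.
bool is_mean_cast_style(int64_t mode) {
  return static_cast<EmbeddingBagMode>(mode) == EmbeddingBagMode::MEAN;
}

// Full refactor (the reviewer's suggestion): convert once at the boundary
// and pass the enum through the call chain, so no static_casts remain.
bool is_mean_typed_style(EmbeddingBagMode mode) {
  return mode == EmbeddingBagMode::MEAN;
}
```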

lezcano (Collaborator) commented on Jun 28, 2024

it seems you'll also need to change a few other things here and there.

ysiraichi added a commit that referenced this pull request Jun 28, 2024
ysiraichi (Author) commented:

@lezcano Incorporated your suggestions + a few other changes to fix CI. Let me know if this is good to go!

lezcano (Collaborator) reviewed:

Just a minor point, otherwise LGTM

(Outdated comment thread on tools/autograd/derivatives.yaml)
ysiraichi added a commit that referenced this pull request Jul 1, 2024
ysiraichi added a commit that referenced this pull request Jul 1, 2024
ysiraichi added the topic: not user facing (topic category) label on Jul 2, 2024
ysiraichi (Author) commented:

@pytorchbot merge

pytorch-bot added the ciflow/trunk (Trigger trunk jobs on your pull request) label on Jul 2, 2024
pytorchmergebot commented:

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).


pytorchmergebot commented:

Merge failed

Reason: 1 jobs have failed, first few of them are: trunk / macos-py3-arm64-mps / test (mps, 1, 1, macos-m1-13)


ysiraichi (Author) commented:

These CI failures don't seem related to this PR, since they are failing on other commits, too. I will merge this PR with -f.

ysiraichi (Author) commented:

@pytorchbot merge -f "CI failures are not related to this PR."

pytorchmergebot commented:

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.


github-actions (Bot) deleted the gh/ysiraichi/61/head branch on August 4, 2024

Labels

ciflow/inductor, ciflow/trunk, Merged, module: inductor, module: xla, open source, topic: not user facing
