Make `_embedding_bag_backward` explicitly dispatch to CPU and CUDA. #129691

ysiraichi wants to merge 5 commits into `gh/ysiraichi/61/base`

Conversation
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/129691

As of commit f4c27b5 with merge base 5ee893a: ❌ 2 new failures, 1 unrelated failure. One job failed but was likely due to flakiness present on trunk.

This comment was automatically generated by Dr. CI and updates every 15 minutes.
This PR modifies the `_embedding_bag_backward` entry inside _native_functions.yaml_, so that it dispatches to CPU and CUDA directly, instead of using `CompositeImplicitAutograd`.

*Context:* PyTorch operations registered under the `CompositeImplicitAutograd` dispatch key do not allow third-party backends (e.g. XLA) to modify their implementation, since this dispatch key has higher priority. When calling the `_embedding_bag_backward` operation using XLA, a dispatch error is thrown, since PyTorch/XLA doesn't support sparse tensors.

*Problem:* `_embedding_bag_backward` has a `sparse` parameter that controls whether the operation returns a sparse or a dense tensor. However, at the moment, PyTorch/XLA does not support sparse tensors. In order to fall back to the dense execution, i.e. change the flag at runtime, we need to be able to modify its implementation.

*Solution:* we changed the dispatch of `_embedding_bag_backward` to CPU and CUDA, which allowed us to introduce our own kernel for it.

Additionally, this PR refactors the representation of the operation's mode from constant integers into an enum class. It also introduces two additional operators: `int == EmbeddingBagMode` and `int != EmbeddingBagMode`.

ghstack-source-id: 33149bc
Pull Request resolved: #129691
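As a rough sketch of the kind of change described (the actual entry in _native_functions.yaml_ carries the full schema string, and the kernel names below are assumptions, not the exact lines from the PR), the dispatch section moves from a composite key to explicit per-backend keys:

```yaml
# Hypothetical before/after sketch; the signature and kernel names are
# illustrative only.

# Before: a single composite registration that backends cannot override.
- func: _embedding_bag_backward(...)
  dispatch:
    CompositeImplicitAutograd: _embedding_bag_backward_symint

# After: explicit CPU/CUDA registrations, leaving room for backends such
# as XLA to register their own kernel for this op.
- func: _embedding_bag_backward(...)
  dispatch:
    CPU: _embedding_bag_backward_symint
    CUDA: _embedding_bag_backward_symint
```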
```cpp
case MODE_SUM:
case MODE_MEAN:
  if (mode == MODE_MEAN)
switch (static_cast<EmbeddingBagMode>(mode)) {
```
if you want to go for the full refactor, you should probably never have static_casts, and always pass the right type around.
I wasn't really planning on going all the way, at least in this PR. This refactor was just to centralize the mode enum (we were declaring constant ints in 2 places already).
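The mode refactor and the comparison operators mentioned in the PR description could look roughly like this (a minimal sketch; the enumerator names and values are assumptions based on the existing `MODE_SUM`/`MODE_MEAN` constants, and `mode_name` is a hypothetical helper illustrating the reviewer's "pass the right type around" suggestion):

```cpp
#include <cassert>
#include <cstdint>

// Sketch of the refactor discussed above; names and values are assumptions.
// The old constant ints (MODE_SUM = 0, MODE_MEAN = 1, MODE_MAX = 2) become
// a strongly typed enum class.
enum class EmbeddingBagMode : std::int64_t { SUM = 0, MEAN = 1, MAX = 2 };

// The two extra operators from the PR description let call sites that still
// hold a raw int64_t mode compare against the enum directly.
inline bool operator==(std::int64_t raw, EmbeddingBagMode mode) {
  return raw == static_cast<std::int64_t>(mode);
}
inline bool operator!=(std::int64_t raw, EmbeddingBagMode mode) {
  return !(raw == mode);
}

// Following the review suggestion: convert once at the boundary, then pass
// the enum around so downstream code needs no static_casts.
inline const char* mode_name(EmbeddingBagMode mode) {
  switch (mode) {
    case EmbeddingBagMode::SUM:  return "sum";
    case EmbeddingBagMode::MEAN: return "mean";
    case EmbeddingBagMode::MAX:  return "max";
  }
  return "unknown";
}
```

Converting once at the dispatch boundary keeps the raw-int ABI of the operator schema while giving the rest of the kernel exhaustive-switch checking from the compiler.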
It seems you'll also need to change a few other things here and there.
@lezcano Incorporated your suggestions + a few other changes to fix CI. Let me know if this is good to go!
**lezcano** left a comment
Just a minor point, otherwise LGTM
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Merge failed. Reason: 1 job has failed, first few of them are: trunk / macos-py3-arm64-mps / test (mps, 1, 1, macos-m1-13). Details for Dev Infra team: raised by workflow job.
These CI failures don't seem related to this PR, since they are failing on other commits, too. I will merge this PR with `-f`.
@pytorchbot merge -f "CI failures are not related to this PR."
Merge started. Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Stack from ghstack (oldest at bottom):
- `_embedding_bag_backward` explicitly dispatch to CPU and CUDA. #129691

cc @bdhirsh @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @miladm @lezcano