Make _embedding_bag_backward explicitly dispatch to CPU and CUDA. #129691

Closed
ysiraichi wants to merge 5 commits into gh/ysiraichi/61/base from gh/ysiraichi/61/head

Conversation

ysiraichi (Collaborator) commented on Jun 27, 2024

Stack from ghstack (oldest at bottom):

This PR modifies the `_embedding_bag_backward` entry in native_functions.yaml so that it
dispatches to CPU and CUDA directly, instead of `CompositeImplicitAutograd`.
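For illustration, the change is of the following shape in native_functions.yaml (a sketch only: the signature is elided, and the kernel names below are placeholders rather than the ones the PR actually registers):

```yaml
# Before: no dispatch section, so the op is implicitly
# CompositeImplicitAutograd and backends cannot override it.
#
# - func: _embedding_bag_backward(...) -> Tensor

# After: explicit per-backend kernels (placeholder names).
#
# - func: _embedding_bag_backward(...) -> Tensor
#   dispatch:
#     CPU: _embedding_bag_backward_cpu
#     CUDA: _embedding_bag_backward_cuda
```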

Context: PyTorch operations registered as `CompositeImplicitAutograd` do not allow
third-party backends (e.g. XLA) to override their implementation, since that dispatch
key has higher priority. As a result, calling the `_embedding_bag_backward` operation
with XLA raises a dispatch error, because PyTorch/XLA doesn't support sparse tensors.

Problem: `_embedding_bag_backward` has a `sparse` parameter that controls whether the
operation returns a sparse or a dense tensor. However, PyTorch/XLA does not currently
support sparse tensors. In order to fall back to the dense path, i.e. to change the
flag at runtime, we need to be able to override the operation's implementation.

Solution: we changed the dispatch of `_embedding_bag_backward` to CPU and CUDA,
which allows us to register our own kernel for it in PyTorch/XLA.
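For readers unfamiliar with the mechanism: once an op has explicit per-backend dispatch, an out-of-tree backend can register its own kernel for its dispatch key. Below is a minimal sketch using the dispatcher's boxed-kernel API; the kernel is a hypothetical stub, not PyTorch/XLA's actual code:

```cpp
#include <torch/library.h>

// Hypothetical stub: a real backend kernel would compute the gradient here,
// e.g. forcing the dense path since sparse tensors are unsupported on XLA.
void xla_embedding_bag_backward(
    const c10::OperatorHandle& op,
    torch::jit::Stack* stack) {
  // Backend-specific implementation elided.
}

// After this PR, the registration below is no longer shadowed by a
// higher-priority CompositeImplicitAutograd kernel.
TORCH_LIBRARY_IMPL(aten, XLA, m) {
  m.impl(
      "_embedding_bag_backward",
      torch::CppFunction::makeFromBoxedFunction<&xla_embedding_bag_backward>());
}
```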

Additionally, this PR refactors the representation of the operation's mode from integer
constants into an enum class, and introduces two comparison operators:
`int == EmbeddingBagMode` and `int != EmbeddingBagMode`.
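A minimal sketch of what the refactor looks like, assuming PyTorch's conventional embedding-bag mode values (0 = sum, 1 = mean, 2 = max); exact names and placement in the PR may differ:

```cpp
#include <cstdint>

// Centralized mode enum, replacing integer constants that were previously
// declared in two places.
enum class EmbeddingBagMode { SUM = 0, MEAN = 1, MAX = 2 };

// The two operators mentioned above: they let existing int-typed `mode`
// values be compared against the enum without a cast at every call site.
inline bool operator==(int64_t mode, EmbeddingBagMode m) {
  return mode == static_cast<int64_t>(m);
}

inline bool operator!=(int64_t mode, EmbeddingBagMode m) {
  return !(mode == m);
}
```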

cc @bdhirsh @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @miladm @lezcano

pytorch-bot (Bot) commented on Jun 27, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/129691

Note: Links to docs will display an error until the doc builds have completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

❌ 2 New Failures, 1 Unrelated Failure

As of commit f4c27b5 with merge base 5ee893a:

NEW FAILURES - The following jobs have failed:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

ysiraichi added a commit that referenced this pull request Jun 27, 2024
ysiraichi added the module: xla (Related to XLA support) label on Jun 27, 2024
Comment thread on tools/autograd/derivatives.yaml

Old code (integer constants):

    case MODE_SUM:
    case MODE_MEAN:
      if (mode == MODE_MEAN)

New code (enum class):

    switch (static_cast<EmbeddingBagMode>(mode)) {
A collaborator commented:
if you want to go for the full refactor, you should probably never have static_casts, and always pass the right type around.

ysiraichi (Author) replied:
I wasn't really planning on going all the way, at least in this PR. This refactor was just to centralize the mode enum (we were declaring constant ints in 2 places already).
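For context, a hypothetical illustration of the distinction being discussed (the function names are invented for the example):

```cpp
#include <cstdint>

enum class EmbeddingBagMode { SUM = 0, MEAN = 1, MAX = 2 };

// Partial refactor (this PR): the mode travels as int64_t and is cast to
// the enum at each point of use.
bool is_mean_cast_style(int64_t mode) {
  return static_cast<EmbeddingBagMode>(mode) == EmbeddingBagMode::MEAN;
}

// Full refactor (the reviewer's suggestion): convert once at the boundary
// and pass the enum through the call chain, so no static_casts remain.
bool is_mean_typed_style(EmbeddingBagMode mode) {
  return mode == EmbeddingBagMode::MEAN;
}
```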

lezcano (Collaborator) commented on Jun 28, 2024

it seems you'll also need to change a few other things here and there.

ysiraichi added a commit that referenced this pull request Jun 28, 2024
ysiraichi (Author) commented:

@lezcano Incorporated your suggestions + a few other changes to fix CI. Let me know if this is good to go!

lezcano (Collaborator) reviewed:

Just a minor point, otherwise LGTM

(Outdated comment thread on tools/autograd/derivatives.yaml)
ysiraichi added a commit that referenced this pull request Jul 1, 2024
ysiraichi added a commit that referenced this pull request Jul 1, 2024
ysiraichi added the topic: not user facing (topic category) label on Jul 2, 2024
ysiraichi (Author) commented:

@pytorchbot merge

pytorch-bot added the ciflow/trunk (Trigger trunk jobs on your pull request) label on Jul 2, 2024
pytorchmergebot commented:

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).


pytorchmergebot commented:

Merge failed

Reason: 1 jobs have failed, first few of them are: trunk / macos-py3-arm64-mps / test (mps, 1, 1, macos-m1-13)


ysiraichi (Author) commented:

These CI failures don't seem related to this PR, since they are failing on other commits, too. I will merge this PR with -f.

ysiraichi (Author) commented:

@pytorchbot merge -f "CI failures are not related to this PR."

pytorchmergebot commented:

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.


github-actions (Bot) deleted the gh/ysiraichi/61/head branch on August 4, 2024

Labels

ciflow/inductor, ciflow/trunk, Merged, module: inductor, module: xla, open source, topic: not user facing
