[CUDA_FUSER] Expand operation support for cuda fuser#37849

Closed
jjsjann123 wants to merge 62 commits intopytorch:masterfrom
csarofeen:operation_expand_pr
Conversation

@jjsjann123
Collaborator

This PR adds more supported operations in the CUDA fuser, covering the major point-wise operations supported in the legacy fuser.

In an attempt to adapt to the legacy executor, it:

  1. added a naive shape propagation pass on the PyTorch JIT IR;
  2. did a small refactor of graph partitioning;
  3. added fallback interpreter execution of the fusion group.

kevinstephano and others added 30 commits April 17, 2020 11:46
…our op declaration / or revisit type promotion
…as to be solved either by defining them all internal to the fuser or somehow including a CUDA path.
@mruberry mruberry requested review from ngimel and removed request for apaszke May 5, 2020 22:27
@mruberry mruberry added module: cuda Related to torch.cuda, and CUDA support in general triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module labels May 5, 2020
@ngimel ngimel removed their request for review May 5, 2020 23:17
@mruberry mruberry removed the module: cuda Related to torch.cuda, and CUDA support in general label May 5, 2020
@jjsjann123
Collaborator Author

The Windows test seems to be jammed... I clicked rerun, and it somehow spun up the tests that had already passed 🤦

Contributor

@facebook-github-bot facebook-github-bot left a comment


@soumith has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@jjsjann123 jjsjann123 deleted the operation_expand_pr branch May 7, 2020 17:35
@facebook-github-bot
Contributor

@soumith merged this pull request in 1667aa6.

glaringlee pushed a commit that referenced this pull request May 27, 2020
This is to reland #38675, testing cpp_extension compatibility in _test only. This is enough: the purpose of the test is to make sure PyTorch and cpp extensions are compatible with xenial + cuda 9.2 + gcc 5.4.

Two changes incompatible with gcc 5.4 (+ cuda 9.2) were introduced recently:
#37849
#38627
which caused the following failures:
https://app.circleci.com/pipelines/github/pytorch/pytorch/173756/workflows/7445e169-9c26-4ec4-a23a-ff6160d155b1/jobs/5582207/steps
https://app.circleci.com/pipelines/github/pytorch/pytorch/173970/workflows/bf0de0f2-9156-4c8f-a097-53ca8e20d4b0/jobs/5589265/steps

The root cause is that gcc 5.4 does not support uniform initialization lists well: it cannot deduce the correct type in some cases, probably due to bugs in the gcc 5 compiler. I modified this code slightly to make it compatible with cuda 9.2 + gcc 5.4.

People are still using xenial + gcc 5.4 + cuda 9.x, so this environment should be covered until xenial is deprecated.

Differential Revision: [D21731026](https://our.internmc.facebook.com/intern/diff/D21731026)

[ghstack-poisoned]
glaringlee pushed a commit that referenced this pull request May 27, 2020
glaringlee pushed a commit that referenced this pull request May 28, 2020
glaringlee pushed a commit that referenced this pull request May 28, 2020
glaringlee pushed a commit that referenced this pull request May 28, 2020
jjsjann123 added a commit to jjsjann123/nvfuser that referenced this pull request Oct 29, 2022
Summary:
This PR adds more supported operations in the CUDA fuser, covering the major point-wise operations supported in the legacy fuser.

In an attempt to adapt to the legacy executor, it:
1. added a naive shape propagation pass on the PyTorch JIT IR;
2. did a small refactor of graph partitioning;
3. added fallback interpreter execution of the fusion group.
Pull Request resolved: pytorch/pytorch#37849

Reviewed By: yf225

Differential Revision: D21444320

Pulled By: soumith

fbshipit-source-id: 712e18ab8497f8d58a07e6f8d200cdab52cf0d74
jjsjann123 added a commit to jjsjann123/nvfuser that referenced this pull request Nov 10, 2022
laurentdupin pushed a commit to laurentdupin/pytorch that referenced this pull request Apr 24, 2026

Labels

Merged · oncall: jit (Add this issue/PR to JIT oncall triage queue) · open source · triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)


6 participants