Nvfuser code bump 12 5 (#69964)#1345

Merged
jjsjann123 merged 1 commit into devel from upstream_code_bump_12_5_cherry_pick
Jan 1, 2022
@jjsjann123
Collaborator

Summary:
Pull Request resolved: pytorch#69964

Things added in this PR that require review:

  1. cuLaunchCooperativeKernel driver API added
    aten/src/ATen/cuda/detail/LazyNVRTC.cpp
    aten/src/ATen/cuda/nvrtc_stub/ATenNVRTC.h

nvfuser code update:

  1. perf tuning of the codegen scheduler to improve performance.
  2. permutation support has been extended beyond contiguous/channels-last. (The improvements can be observed on the PW benchmark.)
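
For readers unfamiliar with what a layout permutation means here, the following is a minimal sketch in plain Python, not nvfuser or PyTorch code. It shows how the order in which dimensions are laid out in memory determines a dense tensor's strides. The helper name `strides_for_permutation` and the example shape are assumptions made for illustration only.

```python
def strides_for_permutation(shape, perm):
    """Return per-dimension strides (in elements) for a dense tensor whose
    dimensions are stored in memory in the order given by `perm`
    (outermost dimension first, innermost dimension last).

    Hypothetical helper for illustration; not an nvfuser or PyTorch API.
    """
    strides = [0] * len(shape)
    step = 1
    # Walk from the innermost (fastest-varying) dimension outwards.
    for dim in reversed(perm):
        strides[dim] = step
        step *= shape[dim]
    return tuple(strides)


shape = (2, 3, 4, 5)  # logical order: N, C, H, W

# Contiguous layout: memory order matches logical order.
contiguous = strides_for_permutation(shape, (0, 1, 2, 3))

# Channels-last layout: memory order is N, H, W, C.
channels_last = strides_for_permutation(shape, (0, 2, 3, 1))

# Some other permutation: memory order is C, N, W, H.
other = strides_for_permutation(shape, (1, 0, 3, 2))

print(contiguous, channels_last, other)
```

The first two layouts are the ones named in the item above; the third stands in for the "beyond contiguous/channels-last" case, where strides follow neither of the two common patterns.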

Things reverted from local changes:

  1. aten::gelu with approximation
  2. local changes that are upstreamed in PR pytorch/pytorch#68804, fixing removeProfilingNodes duplicated functions (#1282)

Pull Request resolved: pytorch#69428

Reviewed By: ngimel

Differential Revision: D33073817

Pulled By: wconstab

fbshipit-source-id: e77d32e81d037d7370822b040456fd4c3bd68edb

Owner

@csarofeen csarofeen left a comment

Looks fine overall, just don't know where the index_compute.h change is coming from.

Owner

I don't see any uses of this, where is it being used?

Collaborator Author

Good catch. Likely code removed recently in devel branch. Let me clean it up.

@jjsjann123 jjsjann123 force-pushed the upstream_code_bump_12_5_cherry_pick branch from e05b902 to d645666 Compare January 1, 2022 04:26
@jjsjann123 jjsjann123 merged commit 9fb69ab into devel Jan 1, 2022
@jjsjann123 jjsjann123 deleted the upstream_code_bump_12_5_cherry_pick branch January 1, 2022 04:44