Fix the fused reduction runtime kernel #1729

Merged
naoyam merged 3 commits into devel from fix_allreduce on May 25, 2022

naoyam commented May 25, 2022

No description provided.

naoyam requested a review from csarofeen on May 25, 2022 07:10

naoyam commented May 25, 2022

The performance of the added C++ test on Titan RTX:

Launch Parameters: BlockDim.x = 16, BlockDim.y = 8, BlockDim.z = -1, GridDim.x = 1, GridDim.y = 196, GridDim.z = -1, Smem Size = 512
kernel1 run in 0.071008 ms, achieved: 45.2258 GB/s
Kernel performance profile:
GroupedGridReduction, T5, 18.191 us, 2

The thread mapping follows the outer-reduction scheduling scheme.

Without horizontal grouping, the two grid reductions took about 23 us in total, compared with 18.191 us for the grouped reduction above, so grouping has a non-negligible performance benefit in this persistent case.
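As a rough sanity check on the figures quoted above, the arithmetic can be sketched as follows. This is only a back-of-the-envelope calculation, not part of the PR. It assumes that a launch dimension reported as -1 is unused, that GB/s means 10^9 bytes per second, and that "about 23 us" is exactly 23.0 us; all variable names are made up for illustration.

```python
# Figures copied from the benchmark comment (Titan RTX run).
block_dim_x, block_dim_y = 16, 8   # BlockDim.z = -1, assumed to mean "unused"
grid_dim_x, grid_dim_y = 1, 196    # GridDim.z = -1, assumed to mean "unused"
kernel_ms = 0.071008               # total kernel time
bandwidth_gb_s = 45.2258           # achieved bandwidth as reported
grouped_us = 18.191                # GroupedGridReduction time
ungrouped_us = 23.0                # "about 23 us", treated as exact here

# Launch geometry implied by the reported parameters.
threads_per_block = block_dim_x * block_dim_y
total_threads = threads_per_block * grid_dim_x * grid_dim_y

# Data volume implied by time and bandwidth (assuming 1 GB = 1e9 bytes).
bytes_moved = bandwidth_gb_s * 1e9 * kernel_ms * 1e-3

# Time saved by grouping the two grid reductions into one.
saved_us = ungrouped_us - grouped_us
saved_fraction = saved_us / ungrouped_us

# Share of the whole kernel spent in the grouped grid reduction.
reduction_share = grouped_us / (kernel_ms * 1e3)

print(f"threads per block: {threads_per_block}, total threads: {total_threads}")
print(f"implied data volume: {bytes_moved / 1e6:.2f} MB")
print(f"grouping saves {saved_us:.3f} us ({saved_fraction:.1%} of the ungrouped time)")
print(f"grouped reduction is {reduction_share:.1%} of the kernel time")
```

Under these assumptions, grouping saves roughly 4.8 us, or about a fifth of the ungrouped reduction time, and the grouped reduction accounts for about a quarter of the 71 us kernel.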

csarofeen left a comment

Didn't review the test but the fixes make sense to me so stamping.

naoyam merged commit b5feee5 into devel on May 25, 2022
naoyam deleted the fix_allreduce branch on May 25, 2022 17:17
jjsjann123 added a commit that referenced this pull request Jun 22, 2022
Syncing nvfuser devel branch to upstream master. https://github.com/csarofeen/pytorch/

Bug fixes and minor refactor

Squashed commits to work around the GitHub API.
Commits that are actually in this PR from the devel branch:

```
4c60e7d Add examples infrastructure for using nvFuser in a standalone program (#1725)
02a05d9 Fix issue #1751 (#1753)
8a69aa3 Refactor NvFuser transpose API to match eager mode behavior (#1746)
ffdf6b7 Remove BroadcastWithoutStride. (#1738)
02bab16 Fix flipping of a boolean flag (#1745)
465d668 cleanup (#1744)
26d354e fixing noncontig broadcast (#1742)
856b6b2 Add IterDomainBuilder (#1736)
1fd974f fixing warning for gcc7 (#1732)
de2740a disabling complex in python tests for #1730 (#1733)
fbbbe0a fixing MSVC build (#1728)
b5feee5 Fix the fused reduction runtime kernel (#1729)
5247682 Re-entrant GroupedGridReduction (#1727)
```

RUN_TORCHBENCH: nvfuser
Pull Request resolved: pytorch#79147
Approved by: https://github.com/davidberard98
jjsjann123 added a commit that referenced this pull request Jun 22, 2022
…h#79406)

Landing reverted PR pytorch#79147.

Pull Request resolved: pytorch#79406
Approved by: https://github.com/davidberard98