Merged
Conversation
naoyam (Collaborator, Author) commented on May 24, 2022, on lines 208 to 209:

```cpp
const nvfuser_index_t entrance_ind_ = PERSISTENT_REDUCTION ? 0 : entrance_ind;
const nvfuser_index_t n_entrances_ = PERSISTENT_REDUCTION ? 1 : n_entrances;
```

Just removed these unused variables.
naoyam (Collaborator, Author) commented on May 24, 2022:

```cpp
inline void clearL2Cache() {
```

Just copied from the benchmark directory, as I'm doing ad-hoc perf testing in the C++ test files.
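As background, a common way to implement such an L2-clearing helper is to allocate and overwrite a scratch buffer at least as large as the device's L2 cache between timed runs, so stale cached data cannot flatter the next measurement. The sketch below shows that pattern using only standard CUDA runtime calls; it is an illustration of the technique, not the actual helper copied from the benchmark directory:

```cuda
#include <cuda_runtime.h>

// Sketch only: evict the L2 cache between timed kernel launches by
// overwriting a scratch buffer at least as large as the L2 cache.
// This is a common benchmarking pattern, not the exact helper used in this PR.
inline void clearL2Cache() {
  int device = 0;
  cudaGetDevice(&device);
  int l2_size = 0;
  cudaDeviceGetAttribute(&l2_size, cudaDevAttrL2CacheSize, device);
  void* buffer = nullptr;
  if (cudaMalloc(&buffer, l2_size) == cudaSuccess) {
    // Touching every byte forces the L2 to be refilled with this buffer,
    // evicting whatever the previous kernel left behind.
    cudaMemset(buffer, 0, l2_size);
    cudaFree(buffer);
  }
}
```

Calling this between iterations makes each timed run start from a cold L2, which is usually what you want for memory-bound kernels.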
Two commits were added on May 24, 2022 at 15:54.
csarofeen approved these changes on May 25, 2022.
csarofeen (Owner) commented on:

```cpp
auto tv0_cache = tv0->cacheAfter();
// ...
const int vec = 2;
```

Out of curiosity, what do the heuristics select if it's not grouped? Can the heuristics/scheduler run after grouping?
naoyam (Collaborator, Author) replied:
scheduler_params output:

```
===== Reduction Stats ========
total_reduction_numel: 99
total_iteration_numel: 999
vectorize_factor: 1
n_tensor_inputs: 1
max_input_dtype_size: 4
block(16, 16, 1)
===== Reduction Parameters ========
Red On Slow Dim
Iteration Domain: blockIdx.x / threadIdx.x / multiple reductions per block /
Inner Reduction Domain: cross block - threadIdx.y / unroll / factor 4
Launch Parameters: BlockDim.x = 16, BlockDim.y = 16, BlockDim.z = -1, GridDim.x = -1, GridDim.y = -1, GridDim.z = -1, Smem Size = 0
====================================
===== Reduction Stats ========
total_reduction_numel: 99
total_iteration_numel: 999
vectorize_factor: 1
n_tensor_inputs: 1
max_input_dtype_size: 4
block(16, 16, 1)
===== Reduction Parameters ========
Red On Slow Dim
Iteration Domain: blockIdx.x / threadIdx.x / multiple reductions per block /
Inner Reduction Domain: cross block - threadIdx.y / unroll / factor 4
Launch Parameters: BlockDim.x = 16, BlockDim.y = 16, BlockDim.z = -1, GridDim.x = -1, GridDim.y = -1, GridDim.z = -1, Smem Size = 0
====================================
```
Grouping before scheduling currently fails because the expressions are changed from ReductionOp to GroupedReductionOp. Making it work would likely just require some mechanical changes.
jjsjann123 added a commit that referenced this pull request on Jun 22, 2022:

Syncing nvfuser devel branch to upstream master. https://github.com/csarofeen/pytorch/

Bug fixes and minor refactor. Squashed commits to WAR github API.

Commits that are actually in this PR from the devel branch:

```
4c60e7d Add examples infrastructure for using nvFuser in a standalone program (#1725)
02a05d9 Fix issue #1751 (#1753)
8a69aa3 Refactor NvFuser transpose API to match eager mode behavior (#1746)
ffdf6b7 Remove BroadcastWithoutStride. (#1738)
02bab16 Fix flipping of a boolean flag (#1745)
465d668 cleanup (#1744)
26d354e fixing noncontig broadcast (#1742)
856b6b2 Add IterDomainBuilder (#1736)
1fd974f fixing warning for gcc7 (#1732)
de2740a disabling complex in python tests for #1730 (#1733)
fbbbe0a fixing MSVC build (#1728)
b5feee5 Fix the fused reduction runtime kernel (#1729)
5247682 Re-entrant GroupedGridReduction (#1727)
```

RUN_TORCHBENCH: nvfuser
Pull Request resolved: pytorch#79147
Approved by: https://github.com/davidberard98
jjsjann123 added a commit that referenced this pull request on Jun 22, 2022:

…h#79406) Landing reverted PR pytorch#79147.

Syncing nvfuser devel branch to upstream master. https://github.com/csarofeen/pytorch/

Bug fixes and minor refactor. Squashed commits to WAR github API.

Commits that are actually in this PR from the devel branch:

```
4c60e7d Add examples infrastructure for using nvFuser in a standalone program (#1725)
02a05d9 Fix issue #1751 (#1753)
8a69aa3 Refactor NvFuser transpose API to match eager mode behavior (#1746)
ffdf6b7 Remove BroadcastWithoutStride. (#1738)
02bab16 Fix flipping of a boolean flag (#1745)
465d668 cleanup (#1744)
26d354e fixing noncontig broadcast (#1742)
856b6b2 Add IterDomainBuilder (#1736)
1fd974f fixing warning for gcc7 (#1732)
de2740a disabling complex in python tests for #1730 (#1733)
fbbbe0a fixing MSVC build (#1728)
b5feee5 Fix the fused reduction runtime kernel (#1729)
5247682 Re-entrant GroupedGridReduction (#1727)
```

RUN_TORCHBENCH: nvfuser
Pull Request resolved: pytorch#79406
Approved by: https://github.com/davidberard98
Enable re-entrance with GroupedGridReduction. Mostly just copied the logic already implemented for GridReduction to GroupedGridReduction.

See FusionGroupedReductionChannelsLastBatchNormLike. The two grid reductions with vectorized iteration domains are grouped together.