fixing noncontig broadcast#1742
Conversation
| TORCH_INTERNAL_ASSERT( | ||
| stride == cur_contig_stride || (still_rightmost && stride == 1) || | ||
| (!still_rightmost && stride % word_size == 0), | ||
| (!still_rightmost && (stride % word_size == 0 || size == 1)), |
There was a problem hiding this comment.
I think it's safe to relax it even further as
size == 1 || stride == cur_contig_stride || (still_rightmost && stride == 1) ||
(!still_rightmost && stride % word_size == 0),
naoyam
left a comment
There was a problem hiding this comment.
LGTM. Thanks for the fix.
@csarofeen removed automatic conversion of size-one IterDoamins to broadcast IterDomains (https://github.com/csarofeen/pytorch/pull/1738/files#diff-28d672948f070384f5d8c15ec70180f5ac1b7632d713af96dec1634fbdbb89ceL38-L49) in #1738 (which is not yet merged). Does it conflict with this PR? I guess tv1 won't get broadcast domains?
|
The only point I had in that change was that the TensorView contructor shouldn't be responsible for changing the iter domain. We should also always be using TensorViewBuilder which will convert const size==1 dims. |
|
Sounds like we are good with this changes. Local tests also passed. I'm merging this one |
Syncing nvfuser devel branch to upstream master. https://github.com/csarofeen/pytorch/ Bug fixes and minor refactor Squashed commits to WAR github API Commits that's actually in this PR from the devel branch: ``` 4c60e7d Add examples infrastructure for using nvFuser in a standalone program (#1725) 02a05d9 Fix issue #1751 (#1753) 8a69aa3 Refactor NvFuser transpose API to match eager mode behavior (#1746) ffdf6b7 Remove BroadcastWithoutStride. (#1738) 02bab16 Fix flipping of a boolean flag (#1745) 465d668 cleanup (#1744) 26d354e fixing noncontig broadcast (#1742) 856b6b2 Add IterDomainBuilder (#1736) 1fd974f fixing warning for gcc7 (#1732) de2740a disabling complex in python tests for #1730 (#1733) fbbbe0a fixing MSVC build (#1728) b5feee5 Fix the fused reduction runtime kernel (#1729) 5247682 Re-entrant GroupedGridReduction (#1727) ``` RUN_TORCHBENCH: nvfuser Pull Request resolved: pytorch#79147 Approved by: https://github.com/davidberard98
…h#79406) Landing reverted PR pytorch#79147. Syncing nvfuser devel branch to upstream master. https://github.com/csarofeen/pytorch/ Bug fixes and minor refactor Squashed commits to WAR github API Commits that's actually in this PR from the devel branch: ``` 4c60e7d Add examples infrastructure for using nvFuser in a standalone program (#1725) 02a05d9 Fix issue #1751 (#1753) 8a69aa3 Refactor NvFuser transpose API to match eager mode behavior (#1746) ffdf6b7 Remove BroadcastWithoutStride. (#1738) 02bab16 Fix flipping of a boolean flag (#1745) 465d668 cleanup (#1744) 26d354e fixing noncontig broadcast (#1742) 856b6b2 Add IterDomainBuilder (#1736) 1fd974f fixing warning for gcc7 (#1732) de2740a disabling complex in python tests for #1730 (#1733) fbbbe0a fixing MSVC build (#1728) b5feee5 Fix the fused reduction runtime kernel (#1729) 5247682 Re-entrant GroupedGridReduction (#1727) ``` RUN_TORCHBENCH: nvfuser Pull Request resolved: pytorch#79406 Approved by: https://github.com/davidberard98
Fixes #1741
Shamelessly stole Kevin's cpp repro 😈