Skip to content

inital port of ConcatDataset#1902

Merged
soumith merged 12 commits intopytorch:masterfrom
esc:esc/feature/port_concat_dataset
Jun 27, 2017
Merged

inital port of ConcatDataset#1902
soumith merged 12 commits intopytorch:masterfrom
esc:esc/feature/port_concat_dataset

Conversation

@esc
Copy link
Contributor

@esc esc commented Jun 25, 2017

Starting to work on #1759
Can someone give me feedback before I continue to invest time, please.

a) Is this the right location for this class
b) Where should I add tests? test_dataloader.py?
c) Is it still correct to remove the numpy dependency?

Copy link
Member

@colesbury colesbury left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can put the tests in test_dataloader or test_utils

return self.cum_sizes[-1]

def __getitem__(self, idx):
super(ConcatDataset, self).__getitem__(idx)

This comment was marked as off-topic.


def __getitem__(self, idx):
super(ConcatDataset, self).__getitem__(idx)
dataset_index = self.searchsorted(idx, self.cum_sizes)

This comment was marked as off-topic.

This comment was marked as off-topic.

else:
dataset_idx = idx - self.cum_sizes[dataset_index - 1]

return self.datasets[dataset_index][dataset_idx]

This comment was marked as off-topic.

"""

@staticmethod
def cumsum(sequence):

This comment was marked as off-topic.

This comment was marked as off-topic.

This comment was marked as off-topic.

@esc esc changed the title inital (untested) port of ConcatDataset inital port of ConcatDataset Jun 26, 2017
@esc
Copy link
Contributor Author

esc commented Jun 26, 2017

Tests are done. I also found itertools.accumulate which was quite nice.

@colesbury
Copy link
Member

@pytorchbot test this please

@esc
Copy link
Contributor Author

esc commented Jun 26, 2017

@colesbury how is pytorchbot implemented?

@pytorchbot
Copy link
Collaborator

i am implemented via jenkins' plugin github-pull-request-builder. More details here: https://github.com/pytorch/builder

@esc
Copy link
Contributor Author

esc commented Jun 26, 2017

@pytorchbot did you trigger the travis-ci build too?

@pytorchbot
Copy link
Collaborator

no, travis-ci is triggered by travis directly. I just trigger GPU builds via pytorch's jenkins server https://build.pytorch.org

@esc
Copy link
Contributor Author

esc commented Jun 26, 2017

Yeah, thought so.

@esc
Copy link
Contributor Author

esc commented Jun 26, 2017

Have to go to bed now, will fix if still broken tomorrow.

@esc
Copy link
Contributor Author

esc commented Jun 26, 2017

Travis glitched...

@esc
Copy link
Contributor Author

esc commented Jun 26, 2017

I don't have perms to restart the build, pushing an empty commit instead.

@esc
Copy link
Contributor Author

esc commented Jun 27, 2017

@pytorchbot test this please

@esc
Copy link
Contributor Author

esc commented Jun 27, 2017

Since travis is green

@esc
Copy link
Contributor Author

esc commented Jun 27, 2017

@colesbury I guess @pytorchbot won't execute my commands, right?

@colesbury
Copy link
Member

@pytorchbot test this please

@soumith soumith merged commit d592e18 into pytorch:master Jun 27, 2017
@esc
Copy link
Contributor Author

esc commented Jun 27, 2017

Sweet!

@esc esc deleted the esc/feature/port_concat_dataset branch June 27, 2017 16:34
@esc
Copy link
Contributor Author

esc commented Jun 27, 2017

@fmassa anything else you want me to port from tnt?

IvanYashchuk pushed a commit to IvanYashchuk/pytorch that referenced this pull request Aug 12, 2022
pytorchmergebot pushed a commit that referenced this pull request Sep 8, 2022
Syncing nvfuser devel branch to upstream master. https://github.com/csarofeen/pytorch/

Codegen changes include:

- codegen improvement:
i. improved view support on pointwise and transpose scheduler
ii. grouped grid welford added for better outer-norm grid persistence in normalization

- misc:
i. new composite ops added: variance_mean , arange, 
ii. fixes misaligned address for transpose scheduler
iii. refactor on separation of compilation API from execution API to prepare us for async compilation
iv. double type support on expression evaluator
v. PYTORCH_NVFUSER_DUMP refactor to save PTX and CUBIN

Commits that's in this PR from the devel branch:
```
89330aa Tensor factories must set the output shape as its input (#1939)
b2fd01e arange support (#1933)
56c00fd Double support on all expression evaluators (#1937)
371f282 Improve trivial reduction merge support (#1931)
1d0c267 Test `rand` in a fusion with zero tensor input (#1932)
0dab160 Fix softmax bwd sizes. (#1890)
ef98f36 Fix a bug (#1936)
63132a0 Propagate permissive mapping information into indexing pass (#1929)
b4ac2c8 Map IterationDomains through view operations. (#1919)
c0a187a do not use deprecated functions (#1935)
88de85e Upstream cherry pick fixes 0811 (#1934)
b247dcf Separate kernel compilation API from kernel execution API (#1914)
b34e3b9 Fix `ir_utils::hasBlockSync` + misc fixes in transpose scheduler (#1924)
14a53e6 Nullary RNGOp (#1892)
3c3c89e Misc fixes/tuning for transpose scheduler (#1912)
20cf109 Grouped grid welford (#1921)
6cf7eb0 Transpose scheduler small dim sizes better support (#1910)
9341ea9 Disabled ViewPersistentShmoo sizes that results in NAN (#1922)
057237f Fix CUDA driver error: misaligned address for transpose scheduler  (#1918)
3fb3d80 Add variance_mean function using Welford (#1907)
98febf6 Remove DisableOption::UnrollWithRng (#1913)
ee8ef33 Minor fix for the debug interface of using PTX directly (#1917)
6e8f953 Add PYTORCH_NVFUSER_DUMP options to save PTX and CUBIN (#1916)
5eefa9a dopt is only available since nvrtc 11.7 (#1915)
2ec8fc7 Kill computeAtBetween (#1911)
d0d106a Improve view support on pointwise and transpose scheduler (#1906)
e71e1ec Fix name clash of RNG with shared memory (#1904)
3381793 Fix mutator and sameAs for expanded IterDomain (#1902)
```

RUN_TORCHBENCH: nvfuser

Differential Revision: [D39324552](https://our.internmc.facebook.com/intern/diff/D39324552)

[ghstack-poisoned]
pytorchmergebot pushed a commit that referenced this pull request Sep 8, 2022
Syncing nvfuser devel branch to upstream master. https://github.com/csarofeen/pytorch/

Codegen changes include:

- codegen improvement:
i. improved view support on pointwise and transpose scheduler
ii. grouped grid welford added for better outer-norm grid persistence in normalization

- misc:
i. new composite ops added: variance_mean , arange, 
ii. fixes misaligned address for transpose scheduler
iii. refactor on separation of compilation API from execution API to prepare us for async compilation
iv. double type support on expression evaluator
v. PYTORCH_NVFUSER_DUMP refactor to save PTX and CUBIN

Commits that's in this PR from the devel branch:
```
89330aa Tensor factories must set the output shape as its input (#1939)
b2fd01e arange support (#1933)
56c00fd Double support on all expression evaluators (#1937)
371f282 Improve trivial reduction merge support (#1931)
1d0c267 Test `rand` in a fusion with zero tensor input (#1932)
0dab160 Fix softmax bwd sizes. (#1890)
ef98f36 Fix a bug (#1936)
63132a0 Propagate permissive mapping information into indexing pass (#1929)
b4ac2c8 Map IterationDomains through view operations. (#1919)
c0a187a do not use deprecated functions (#1935)
88de85e Upstream cherry pick fixes 0811 (#1934)
b247dcf Separate kernel compilation API from kernel execution API (#1914)
b34e3b9 Fix `ir_utils::hasBlockSync` + misc fixes in transpose scheduler (#1924)
14a53e6 Nullary RNGOp (#1892)
3c3c89e Misc fixes/tuning for transpose scheduler (#1912)
20cf109 Grouped grid welford (#1921)
6cf7eb0 Transpose scheduler small dim sizes better support (#1910)
9341ea9 Disabled ViewPersistentShmoo sizes that results in NAN (#1922)
057237f Fix CUDA driver error: misaligned address for transpose scheduler  (#1918)
3fb3d80 Add variance_mean function using Welford (#1907)
98febf6 Remove DisableOption::UnrollWithRng (#1913)
ee8ef33 Minor fix for the debug interface of using PTX directly (#1917)
6e8f953 Add PYTORCH_NVFUSER_DUMP options to save PTX and CUBIN (#1916)
5eefa9a dopt is only available since nvrtc 11.7 (#1915)
2ec8fc7 Kill computeAtBetween (#1911)
d0d106a Improve view support on pointwise and transpose scheduler (#1906)
e71e1ec Fix name clash of RNG with shared memory (#1904)
3381793 Fix mutator and sameAs for expanded IterDomain (#1902)
```

RUN_TORCHBENCH: nvfuser

Differential Revision: [D39324552](https://our.internmc.facebook.com/intern/diff/D39324552)

[ghstack-poisoned]
jjsjann123 added a commit that referenced this pull request Sep 8, 2022
Syncing nvfuser devel branch to upstream master. https://github.com/csarofeen/pytorch/

Codegen changes include:

- codegen improvement:
i. improved view support on pointwise and transpose scheduler
ii. grouped grid welford added for better outer-norm grid persistence in normalization

- misc:
i. new composite ops added: variance_mean , arange, 
ii. fixes misaligned address for transpose scheduler
iii. refactor on separation of compilation API from execution API to prepare us for async compilation
iv. double type support on expression evaluator
v. PYTORCH_NVFUSER_DUMP refactor to save PTX and CUBIN

Commits that's in this PR from the devel branch:
```
89330aa Tensor factories must set the output shape as its input (#1939)
b2fd01e arange support (#1933)
56c00fd Double support on all expression evaluators (#1937)
371f282 Improve trivial reduction merge support (#1931)
1d0c267 Test `rand` in a fusion with zero tensor input (#1932)
0dab160 Fix softmax bwd sizes. (#1890)
ef98f36 Fix a bug (#1936)
63132a0 Propagate permissive mapping information into indexing pass (#1929)
b4ac2c8 Map IterationDomains through view operations. (#1919)
c0a187a do not use deprecated functions (#1935)
88de85e Upstream cherry pick fixes 0811 (#1934)
b247dcf Separate kernel compilation API from kernel execution API (#1914)
b34e3b9 Fix `ir_utils::hasBlockSync` + misc fixes in transpose scheduler (#1924)
14a53e6 Nullary RNGOp (#1892)
3c3c89e Misc fixes/tuning for transpose scheduler (#1912)
20cf109 Grouped grid welford (#1921)
6cf7eb0 Transpose scheduler small dim sizes better support (#1910)
9341ea9 Disabled ViewPersistentShmoo sizes that results in NAN (#1922)
057237f Fix CUDA driver error: misaligned address for transpose scheduler  (#1918)
3fb3d80 Add variance_mean function using Welford (#1907)
98febf6 Remove DisableOption::UnrollWithRng (#1913)
ee8ef33 Minor fix for the debug interface of using PTX directly (#1917)
6e8f953 Add PYTORCH_NVFUSER_DUMP options to save PTX and CUBIN (#1916)
5eefa9a dopt is only available since nvrtc 11.7 (#1915)
2ec8fc7 Kill computeAtBetween (#1911)
d0d106a Improve view support on pointwise and transpose scheduler (#1906)
e71e1ec Fix name clash of RNG with shared memory (#1904)
3381793 Fix mutator and sameAs for expanded IterDomain (#1902)
```

RUN_TORCHBENCH: nvfuser

Differential Revision: [D39324552](https://our.internmc.facebook.com/intern/diff/D39324552)

[ghstack-poisoned]
jjsjann123 added a commit that referenced this pull request Sep 8, 2022
Syncing nvfuser devel branch to upstream master. https://github.com/csarofeen/pytorch/

Codegen changes include:

- codegen improvement:
i. improved view support on pointwise and transpose scheduler
ii. grouped grid welford added for better outer-norm grid persistence in normalization

- misc:
i. new composite ops added: variance_mean , arange, 
ii. fixes misaligned address for transpose scheduler
iii. refactor on separation of compilation API from execution API to prepare us for async compilation
iv. double type support on expression evaluator
v. PYTORCH_NVFUSER_DUMP refactor to save PTX and CUBIN

Commits that's in this PR from the devel branch:
```
89330aa Tensor factories must set the output shape as its input (#1939)
b2fd01e arange support (#1933)
56c00fd Double support on all expression evaluators (#1937)
371f282 Improve trivial reduction merge support (#1931)
1d0c267 Test `rand` in a fusion with zero tensor input (#1932)
0dab160 Fix softmax bwd sizes. (#1890)
ef98f36 Fix a bug (#1936)
63132a0 Propagate permissive mapping information into indexing pass (#1929)
b4ac2c8 Map IterationDomains through view operations. (#1919)
c0a187a do not use deprecated functions (#1935)
88de85e Upstream cherry pick fixes 0811 (#1934)
b247dcf Separate kernel compilation API from kernel execution API (#1914)
b34e3b9 Fix `ir_utils::hasBlockSync` + misc fixes in transpose scheduler (#1924)
14a53e6 Nullary RNGOp (#1892)
3c3c89e Misc fixes/tuning for transpose scheduler (#1912)
20cf109 Grouped grid welford (#1921)
6cf7eb0 Transpose scheduler small dim sizes better support (#1910)
9341ea9 Disabled ViewPersistentShmoo sizes that results in NAN (#1922)
057237f Fix CUDA driver error: misaligned address for transpose scheduler  (#1918)
3fb3d80 Add variance_mean function using Welford (#1907)
98febf6 Remove DisableOption::UnrollWithRng (#1913)
ee8ef33 Minor fix for the debug interface of using PTX directly (#1917)
6e8f953 Add PYTORCH_NVFUSER_DUMP options to save PTX and CUBIN (#1916)
5eefa9a dopt is only available since nvrtc 11.7 (#1915)
2ec8fc7 Kill computeAtBetween (#1911)
d0d106a Improve view support on pointwise and transpose scheduler (#1906)
e71e1ec Fix name clash of RNG with shared memory (#1904)
3381793 Fix mutator and sameAs for expanded IterDomain (#1902)
```

RUN_TORCHBENCH: nvfuser

Differential Revision: [D39324552](https://our.internmc.facebook.com/intern/diff/D39324552)

[ghstack-poisoned]
jjsjann123 added a commit that referenced this pull request Sep 9, 2022
Syncing nvfuser devel branch to upstream master. https://github.com/csarofeen/pytorch/

Codegen changes include:

- codegen improvement:
i. improved view support on pointwise and transpose scheduler
ii. grouped grid welford added for better outer-norm grid persistence in normalization

- misc:
i. new composite ops added: variance_mean , arange, 
ii. fixes misaligned address for transpose scheduler
iii. refactor on separation of compilation API from execution API to prepare us for async compilation
iv. double type support on expression evaluator
v. PYTORCH_NVFUSER_DUMP refactor to save PTX and CUBIN

Commits that's in this PR from the devel branch:
```
89330aa Tensor factories must set the output shape as its input (#1939)
b2fd01e arange support (#1933)
56c00fd Double support on all expression evaluators (#1937)
371f282 Improve trivial reduction merge support (#1931)
1d0c267 Test `rand` in a fusion with zero tensor input (#1932)
0dab160 Fix softmax bwd sizes. (#1890)
ef98f36 Fix a bug (#1936)
63132a0 Propagate permissive mapping information into indexing pass (#1929)
b4ac2c8 Map IterationDomains through view operations. (#1919)
c0a187a do not use deprecated functions (#1935)
88de85e Upstream cherry pick fixes 0811 (#1934)
b247dcf Separate kernel compilation API from kernel execution API (#1914)
b34e3b9 Fix `ir_utils::hasBlockSync` + misc fixes in transpose scheduler (#1924)
14a53e6 Nullary RNGOp (#1892)
3c3c89e Misc fixes/tuning for transpose scheduler (#1912)
20cf109 Grouped grid welford (#1921)
6cf7eb0 Transpose scheduler small dim sizes better support (#1910)
9341ea9 Disabled ViewPersistentShmoo sizes that results in NAN (#1922)
057237f Fix CUDA driver error: misaligned address for transpose scheduler  (#1918)
3fb3d80 Add variance_mean function using Welford (#1907)
98febf6 Remove DisableOption::UnrollWithRng (#1913)
ee8ef33 Minor fix for the debug interface of using PTX directly (#1917)
6e8f953 Add PYTORCH_NVFUSER_DUMP options to save PTX and CUBIN (#1916)
5eefa9a dopt is only available since nvrtc 11.7 (#1915)
2ec8fc7 Kill computeAtBetween (#1911)
d0d106a Improve view support on pointwise and transpose scheduler (#1906)
e71e1ec Fix name clash of RNG with shared memory (#1904)
3381793 Fix mutator and sameAs for expanded IterDomain (#1902)
```

RUN_TORCHBENCH: nvfuser

Differential Revision: [D39324552](https://our.internmc.facebook.com/intern/diff/D39324552)

[ghstack-poisoned]
jjsjann123 added a commit that referenced this pull request Sep 9, 2022
Syncing nvfuser devel branch to upstream master. https://github.com/csarofeen/pytorch/

Codegen changes include:

- codegen improvement:
i. improved view support on pointwise and transpose scheduler
ii. grouped grid welford added for better outer-norm grid persistence in normalization

- misc:
i. new composite ops added: variance_mean , arange, 
ii. fixes misaligned address for transpose scheduler
iii. refactor on separation of compilation API from execution API to prepare us for async compilation
iv. double type support on expression evaluator
v. PYTORCH_NVFUSER_DUMP refactor to save PTX and CUBIN

Commits that's in this PR from the devel branch:
```
89330aa Tensor factories must set the output shape as its input (#1939)
b2fd01e arange support (#1933)
56c00fd Double support on all expression evaluators (#1937)
371f282 Improve trivial reduction merge support (#1931)
1d0c267 Test `rand` in a fusion with zero tensor input (#1932)
0dab160 Fix softmax bwd sizes. (#1890)
ef98f36 Fix a bug (#1936)
63132a0 Propagate permissive mapping information into indexing pass (#1929)
b4ac2c8 Map IterationDomains through view operations. (#1919)
c0a187a do not use deprecated functions (#1935)
88de85e Upstream cherry pick fixes 0811 (#1934)
b247dcf Separate kernel compilation API from kernel execution API (#1914)
b34e3b9 Fix `ir_utils::hasBlockSync` + misc fixes in transpose scheduler (#1924)
14a53e6 Nullary RNGOp (#1892)
3c3c89e Misc fixes/tuning for transpose scheduler (#1912)
20cf109 Grouped grid welford (#1921)
6cf7eb0 Transpose scheduler small dim sizes better support (#1910)
9341ea9 Disabled ViewPersistentShmoo sizes that results in NAN (#1922)
057237f Fix CUDA driver error: misaligned address for transpose scheduler  (#1918)
3fb3d80 Add variance_mean function using Welford (#1907)
98febf6 Remove DisableOption::UnrollWithRng (#1913)
ee8ef33 Minor fix for the debug interface of using PTX directly (#1917)
6e8f953 Add PYTORCH_NVFUSER_DUMP options to save PTX and CUBIN (#1916)
5eefa9a dopt is only available since nvrtc 11.7 (#1915)
2ec8fc7 Kill computeAtBetween (#1911)
d0d106a Improve view support on pointwise and transpose scheduler (#1906)
e71e1ec Fix name clash of RNG with shared memory (#1904)
3381793 Fix mutator and sameAs for expanded IterDomain (#1902)
```

RUN_TORCHBENCH: nvfuser

Differential Revision: [D39324552](https://our.internmc.facebook.com/intern/diff/D39324552)

[ghstack-poisoned]
jjsjann123 added a commit that referenced this pull request Sep 9, 2022
Syncing nvfuser devel branch to upstream master. https://github.com/csarofeen/pytorch/

Codegen changes include:

- codegen improvement:
i. improved view support on pointwise and transpose scheduler
ii. grouped grid welford added for better outer-norm grid persistence in normalization

- misc:
i. new composite ops added: variance_mean , arange, 
ii. fixes misaligned address for transpose scheduler
iii. refactor on separation of compilation API from execution API to prepare us for async compilation
iv. double type support on expression evaluator
v. PYTORCH_NVFUSER_DUMP refactor to save PTX and CUBIN

Commits that's in this PR from the devel branch:
```
89330aa Tensor factories must set the output shape as its input (#1939)
b2fd01e arange support (#1933)
56c00fd Double support on all expression evaluators (#1937)
371f282 Improve trivial reduction merge support (#1931)
1d0c267 Test `rand` in a fusion with zero tensor input (#1932)
0dab160 Fix softmax bwd sizes. (#1890)
ef98f36 Fix a bug (#1936)
63132a0 Propagate permissive mapping information into indexing pass (#1929)
b4ac2c8 Map IterationDomains through view operations. (#1919)
c0a187a do not use deprecated functions (#1935)
88de85e Upstream cherry pick fixes 0811 (#1934)
b247dcf Separate kernel compilation API from kernel execution API (#1914)
b34e3b9 Fix `ir_utils::hasBlockSync` + misc fixes in transpose scheduler (#1924)
14a53e6 Nullary RNGOp (#1892)
3c3c89e Misc fixes/tuning for transpose scheduler (#1912)
20cf109 Grouped grid welford (#1921)
6cf7eb0 Transpose scheduler small dim sizes better support (#1910)
9341ea9 Disabled ViewPersistentShmoo sizes that results in NAN (#1922)
057237f Fix CUDA driver error: misaligned address for transpose scheduler  (#1918)
3fb3d80 Add variance_mean function using Welford (#1907)
98febf6 Remove DisableOption::UnrollWithRng (#1913)
ee8ef33 Minor fix for the debug interface of using PTX directly (#1917)
6e8f953 Add PYTORCH_NVFUSER_DUMP options to save PTX and CUBIN (#1916)
5eefa9a dopt is only available since nvrtc 11.7 (#1915)
2ec8fc7 Kill computeAtBetween (#1911)
d0d106a Improve view support on pointwise and transpose scheduler (#1906)
e71e1ec Fix name clash of RNG with shared memory (#1904)
3381793 Fix mutator and sameAs for expanded IterDomain (#1902)
```

RUN_TORCHBENCH: nvfuser

Differential Revision: [D39324552](https://our.internmc.facebook.com/intern/diff/D39324552)

[ghstack-poisoned]
jjsjann123 added a commit that referenced this pull request Sep 9, 2022
Syncing nvfuser devel branch to upstream master. https://github.com/csarofeen/pytorch/

Codegen changes include:

- codegen improvement:
i. improved view support on pointwise and transpose scheduler
ii. grouped grid welford added for better outer-norm grid persistence in normalization

- misc:
i. new composite ops added: variance_mean , arange, 
ii. fixes misaligned address for transpose scheduler
iii. refactor on separation of compilation API from execution API to prepare us for async compilation
iv. double type support on expression evaluator
v. PYTORCH_NVFUSER_DUMP refactor to save PTX and CUBIN

Commits that's in this PR from the devel branch:
```
89330aa Tensor factories must set the output shape as its input (#1939)
b2fd01e arange support (#1933)
56c00fd Double support on all expression evaluators (#1937)
371f282 Improve trivial reduction merge support (#1931)
1d0c267 Test `rand` in a fusion with zero tensor input (#1932)
0dab160 Fix softmax bwd sizes. (#1890)
ef98f36 Fix a bug (#1936)
63132a0 Propagate permissive mapping information into indexing pass (#1929)
b4ac2c8 Map IterationDomains through view operations. (#1919)
c0a187a do not use deprecated functions (#1935)
88de85e Upstream cherry pick fixes 0811 (#1934)
b247dcf Separate kernel compilation API from kernel execution API (#1914)
b34e3b9 Fix `ir_utils::hasBlockSync` + misc fixes in transpose scheduler (#1924)
14a53e6 Nullary RNGOp (#1892)
3c3c89e Misc fixes/tuning for transpose scheduler (#1912)
20cf109 Grouped grid welford (#1921)
6cf7eb0 Transpose scheduler small dim sizes better support (#1910)
9341ea9 Disabled ViewPersistentShmoo sizes that results in NAN (#1922)
057237f Fix CUDA driver error: misaligned address for transpose scheduler  (#1918)
3fb3d80 Add variance_mean function using Welford (#1907)
98febf6 Remove DisableOption::UnrollWithRng (#1913)
ee8ef33 Minor fix for the debug interface of using PTX directly (#1917)
6e8f953 Add PYTORCH_NVFUSER_DUMP options to save PTX and CUBIN (#1916)
5eefa9a dopt is only available since nvrtc 11.7 (#1915)
2ec8fc7 Kill computeAtBetween (#1911)
d0d106a Improve view support on pointwise and transpose scheduler (#1906)
e71e1ec Fix name clash of RNG with shared memory (#1904)
3381793 Fix mutator and sameAs for expanded IterDomain (#1902)
```

RUN_TORCHBENCH: nvfuser

Differential Revision: [D39324552](https://our.internmc.facebook.com/intern/diff/D39324552)

[ghstack-poisoned]
jjsjann123 added a commit that referenced this pull request Sep 9, 2022
Syncing nvfuser devel branch to upstream master. https://github.com/csarofeen/pytorch/

Codegen changes include:

- codegen improvement:
i. improved view support on pointwise and transpose scheduler
ii. grouped grid welford added for better outer-norm grid persistence in normalization

- misc:
i. new composite ops added: variance_mean , arange,
ii. fixes misaligned address for transpose scheduler
iii. refactor on separation of compilation API from execution API to prepare us for async compilation
iv. double type support on expression evaluator
v. PYTORCH_NVFUSER_DUMP refactor to save PTX and CUBIN

Commits that's in this PR from the devel branch:
```
89330aa Tensor factories must set the output shape as its input (#1939)
b2fd01e arange support (#1933)
56c00fd Double support on all expression evaluators (#1937)
371f282 Improve trivial reduction merge support (#1931)
1d0c267 Test `rand` in a fusion with zero tensor input (#1932)
0dab160 Fix softmax bwd sizes. (#1890)
ef98f36 Fix a bug (#1936)
63132a0 Propagate permissive mapping information into indexing pass (#1929)
b4ac2c8 Map IterationDomains through view operations. (#1919)
c0a187a do not use deprecated functions (#1935)
88de85e Upstream cherry pick fixes 0811 (#1934)
b247dcf Separate kernel compilation API from kernel execution API (#1914)
b34e3b9 Fix `ir_utils::hasBlockSync` + misc fixes in transpose scheduler (#1924)
14a53e6 Nullary RNGOp (#1892)
3c3c89e Misc fixes/tuning for transpose scheduler (#1912)
20cf109 Grouped grid welford (#1921)
6cf7eb0 Transpose scheduler small dim sizes better support (#1910)
9341ea9 Disabled ViewPersistentShmoo sizes that results in NAN (#1922)
057237f Fix CUDA driver error: misaligned address for transpose scheduler  (#1918)
3fb3d80 Add variance_mean function using Welford (#1907)
98febf6 Remove DisableOption::UnrollWithRng (#1913)
ee8ef33 Minor fix for the debug interface of using PTX directly (#1917)
6e8f953 Add PYTORCH_NVFUSER_DUMP options to save PTX and CUBIN (#1916)
5eefa9a dopt is only available since nvrtc 11.7 (#1915)
2ec8fc7 Kill computeAtBetween (#1911)
d0d106a Improve view support on pointwise and transpose scheduler (#1906)
e71e1ec Fix name clash of RNG with shared memory (#1904)
3381793 Fix mutator and sameAs for expanded IterDomain (#1902)
```

RUN_TORCHBENCH: nvfuser

ghstack-source-id: d0d88cf
Pull Request resolved: #84626
jjsjann123 added a commit that referenced this pull request Sep 19, 2022
Syncing nvfuser devel branch to upstream master. https://github.com/csarofeen/pytorch/

Codegen changes include:

- codegen improvement:
i. improved view support on pointwise and transpose scheduler
ii. grouped grid welford added for better outer-norm grid persistence in normalization

- misc:
i. new composite ops added: variance_mean , arange, 
ii. fixes misaligned address for transpose scheduler
iii. refactor on separation of compilation API from execution API to prepare us for async compilation
iv. double type support on expression evaluator
v. PYTORCH_NVFUSER_DUMP refactor to save PTX and CUBIN

Commits that's in this PR from the devel branch:
```
89330aa Tensor factories must set the output shape as its input (#1939)
b2fd01e arange support (#1933)
56c00fd Double support on all expression evaluators (#1937)
371f282 Improve trivial reduction merge support (#1931)
1d0c267 Test `rand` in a fusion with zero tensor input (#1932)
0dab160 Fix softmax bwd sizes. (#1890)
ef98f36 Fix a bug (#1936)
63132a0 Propagate permissive mapping information into indexing pass (#1929)
b4ac2c8 Map IterationDomains through view operations. (#1919)
c0a187a do not use deprecated functions (#1935)
88de85e Upstream cherry pick fixes 0811 (#1934)
b247dcf Separate kernel compilation API from kernel execution API (#1914)
b34e3b9 Fix `ir_utils::hasBlockSync` + misc fixes in transpose scheduler (#1924)
14a53e6 Nullary RNGOp (#1892)
3c3c89e Misc fixes/tuning for transpose scheduler (#1912)
20cf109 Grouped grid welford (#1921)
6cf7eb0 Transpose scheduler small dim sizes better support (#1910)
9341ea9 Disabled ViewPersistentShmoo sizes that results in NAN (#1922)
057237f Fix CUDA driver error: misaligned address for transpose scheduler  (#1918)
3fb3d80 Add variance_mean function using Welford (#1907)
98febf6 Remove DisableOption::UnrollWithRng (#1913)
ee8ef33 Minor fix for the debug interface of using PTX directly (#1917)
6e8f953 Add PYTORCH_NVFUSER_DUMP options to save PTX and CUBIN (#1916)
5eefa9a dopt is only available since nvrtc 11.7 (#1915)
2ec8fc7 Kill computeAtBetween (#1911)
d0d106a Improve view support on pointwise and transpose scheduler (#1906)
e71e1ec Fix name clash of RNG with shared memory (#1904)
3381793 Fix mutator and sameAs for expanded IterDomain (#1902)
```

RUN_TORCHBENCH: nvfuser

Differential Revision: [D39324552](https://our.internmc.facebook.com/intern/diff/D39324552)

[ghstack-poisoned]
jjsjann123 added a commit that referenced this pull request Sep 19, 2022
Syncing nvfuser devel branch to upstream master. https://github.com/csarofeen/pytorch/

Codegen changes include:

- codegen improvement:
i. improved view support on pointwise and transpose scheduler
ii. grouped grid welford added for better outer-norm grid persistence in normalization

- misc:
i. new composite ops added: variance_mean , arange, 
ii. fixes misaligned address for transpose scheduler
iii. refactor on separation of compilation API from execution API to prepare us for async compilation
iv. double type support on expression evaluator
v. PYTORCH_NVFUSER_DUMP refactor to save PTX and CUBIN

Commits that's in this PR from the devel branch:
```
89330aa Tensor factories must set the output shape as its input (#1939)
b2fd01e arange support (#1933)
56c00fd Double support on all expression evaluators (#1937)
371f282 Improve trivial reduction merge support (#1931)
1d0c267 Test `rand` in a fusion with zero tensor input (#1932)
0dab160 Fix softmax bwd sizes. (#1890)
ef98f36 Fix a bug (#1936)
63132a0 Propagate permissive mapping information into indexing pass (#1929)
b4ac2c8 Map IterationDomains through view operations. (#1919)
c0a187a do not use deprecated functions (#1935)
88de85e Upstream cherry pick fixes 0811 (#1934)
b247dcf Separate kernel compilation API from kernel execution API (#1914)
b34e3b9 Fix `ir_utils::hasBlockSync` + misc fixes in transpose scheduler (#1924)
14a53e6 Nullary RNGOp (#1892)
3c3c89e Misc fixes/tuning for transpose scheduler (#1912)
20cf109 Grouped grid welford (#1921)
6cf7eb0 Transpose scheduler small dim sizes better support (#1910)
9341ea9 Disabled ViewPersistentShmoo sizes that results in NAN (#1922)
057237f Fix CUDA driver error: misaligned address for transpose scheduler  (#1918)
3fb3d80 Add variance_mean function using Welford (#1907)
98febf6 Remove DisableOption::UnrollWithRng (#1913)
ee8ef33 Minor fix for the debug interface of using PTX directly (#1917)
6e8f953 Add PYTORCH_NVFUSER_DUMP options to save PTX and CUBIN (#1916)
5eefa9a dopt is only available since nvrtc 11.7 (#1915)
2ec8fc7 Kill computeAtBetween (#1911)
d0d106a Improve view support on pointwise and transpose scheduler (#1906)
e71e1ec Fix name clash of RNG with shared memory (#1904)
3381793 Fix mutator and sameAs for expanded IterDomain (#1902)
```

RUN_TORCHBENCH: nvfuser

Differential Revision: [D39324552](https://our.internmc.facebook.com/intern/diff/D39324552)

[ghstack-poisoned]
jjsjann123 added a commit that referenced this pull request Sep 19, 2022
Syncing nvfuser devel branch to upstream master. https://github.com/csarofeen/pytorch/

Codegen changes include:

- codegen improvement:
i. improved view support on pointwise and transpose scheduler
ii. grouped grid welford added for better outer-norm grid persistence in normalization

- misc:
i. new composite ops added: variance_mean , arange, 
ii. fixes misaligned address for transpose scheduler
iii. refactor on separation of compilation API from execution API to prepare us for async compilation
iv. double type support on expression evaluator
v. PYTORCH_NVFUSER_DUMP refactor to save PTX and CUBIN

Commits that's in this PR from the devel branch:
```
89330aa Tensor factories must set the output shape as its input (#1939)
b2fd01e arange support (#1933)
56c00fd Double support on all expression evaluators (#1937)
371f282 Improve trivial reduction merge support (#1931)
1d0c267 Test `rand` in a fusion with zero tensor input (#1932)
0dab160 Fix softmax bwd sizes. (#1890)
ef98f36 Fix a bug (#1936)
63132a0 Propagate permissive mapping information into indexing pass (#1929)
b4ac2c8 Map IterationDomains through view operations. (#1919)
c0a187a do not use deprecated functions (#1935)
88de85e Upstream cherry pick fixes 0811 (#1934)
b247dcf Separate kernel compilation API from kernel execution API (#1914)
b34e3b9 Fix `ir_utils::hasBlockSync` + misc fixes in transpose scheduler (#1924)
14a53e6 Nullary RNGOp (#1892)
3c3c89e Misc fixes/tuning for transpose scheduler (#1912)
20cf109 Grouped grid welford (#1921)
6cf7eb0 Transpose scheduler small dim sizes better support (#1910)
9341ea9 Disabled ViewPersistentShmoo sizes that results in NAN (#1922)
057237f Fix CUDA driver error: misaligned address for transpose scheduler  (#1918)
3fb3d80 Add variance_mean function using Welford (#1907)
98febf6 Remove DisableOption::UnrollWithRng (#1913)
ee8ef33 Minor fix for the debug interface of using PTX directly (#1917)
6e8f953 Add PYTORCH_NVFUSER_DUMP options to save PTX and CUBIN (#1916)
5eefa9a dopt is only available since nvrtc 11.7 (#1915)
2ec8fc7 Kill computeAtBetween (#1911)
d0d106a Improve view support on pointwise and transpose scheduler (#1906)
e71e1ec Fix name clash of RNG with shared memory (#1904)
3381793 Fix mutator and sameAs for expanded IterDomain (#1902)
```

RUN_TORCHBENCH: nvfuser

Differential Revision: [D39324552](https://our.internmc.facebook.com/intern/diff/D39324552)

[ghstack-poisoned]
jjsjann123 added a commit that referenced this pull request Sep 19, 2022
Syncing nvfuser devel branch to upstream master. https://github.com/csarofeen/pytorch/

Codegen changes include:

- codegen improvement:
i. improved view support on pointwise and transpose scheduler
ii. grouped grid welford added for better outer-norm grid persistence in normalization

- misc:
i. new composite ops added: variance_mean , arange, 
ii. fixes misaligned address for transpose scheduler
iii. refactor on separation of compilation API from execution API to prepare us for async compilation
iv. double type support on expression evaluator
v. PYTORCH_NVFUSER_DUMP refactor to save PTX and CUBIN

Commits that's in this PR from the devel branch:
```
89330aa Tensor factories must set the output shape as its input (#1939)
b2fd01e arange support (#1933)
56c00fd Double support on all expression evaluators (#1937)
371f282 Improve trivial reduction merge support (#1931)
1d0c267 Test `rand` in a fusion with zero tensor input (#1932)
0dab160 Fix softmax bwd sizes. (#1890)
ef98f36 Fix a bug (#1936)
63132a0 Propagate permissive mapping information into indexing pass (#1929)
b4ac2c8 Map IterationDomains through view operations. (#1919)
c0a187a do not use deprecated functions (#1935)
88de85e Upstream cherry pick fixes 0811 (#1934)
b247dcf Separate kernel compilation API from kernel execution API (#1914)
b34e3b9 Fix `ir_utils::hasBlockSync` + misc fixes in transpose scheduler (#1924)
14a53e6 Nullary RNGOp (#1892)
3c3c89e Misc fixes/tuning for transpose scheduler (#1912)
20cf109 Grouped grid welford (#1921)
6cf7eb0 Transpose scheduler small dim sizes better support (#1910)
9341ea9 Disabled ViewPersistentShmoo sizes that results in NAN (#1922)
057237f Fix CUDA driver error: misaligned address for transpose scheduler  (#1918)
3fb3d80 Add variance_mean function using Welford (#1907)
98febf6 Remove DisableOption::UnrollWithRng (#1913)
ee8ef33 Minor fix for the debug interface of using PTX directly (#1917)
6e8f953 Add PYTORCH_NVFUSER_DUMP options to save PTX and CUBIN (#1916)
5eefa9a dopt is only available since nvrtc 11.7 (#1915)
2ec8fc7 Kill computeAtBetween (#1911)
d0d106a Improve view support on pointwise and transpose scheduler (#1906)
e71e1ec Fix name clash of RNG with shared memory (#1904)
3381793 Fix mutator and sameAs for expanded IterDomain (#1902)
```

RUN_TORCHBENCH: nvfuser

Differential Revision: [D39324552](https://our.internmc.facebook.com/intern/diff/D39324552)

[ghstack-poisoned]
jjsjann123 added a commit that referenced this pull request Sep 19, 2022
Syncing nvfuser devel branch to upstream master. https://github.com/csarofeen/pytorch/

Codegen changes include:

- codegen improvement:
i. improved view support on pointwise and transpose scheduler
ii. grouped grid welford added for better outer-norm grid persistence in normalization

- misc:
i. new composite ops added: variance_mean , arange,
ii. fixes misaligned address for transpose scheduler
iii. refactor on separation of compilation API from execution API to prepare us for async compilation
iv. double type support on expression evaluator
v. PYTORCH_NVFUSER_DUMP refactor to save PTX and CUBIN

Commits that's in this PR from the devel branch:
```
89330aa Tensor factories must set the output shape as its input (#1939)
b2fd01e arange support (#1933)
56c00fd Double support on all expression evaluators (#1937)
371f282 Improve trivial reduction merge support (#1931)
1d0c267 Test `rand` in a fusion with zero tensor input (#1932)
0dab160 Fix softmax bwd sizes. (#1890)
ef98f36 Fix a bug (#1936)
63132a0 Propagate permissive mapping information into indexing pass (#1929)
b4ac2c8 Map IterationDomains through view operations. (#1919)
c0a187a do not use deprecated functions (#1935)
88de85e Upstream cherry pick fixes 0811 (#1934)
b247dcf Separate kernel compilation API from kernel execution API (#1914)
b34e3b9 Fix `ir_utils::hasBlockSync` + misc fixes in transpose scheduler (#1924)
14a53e6 Nullary RNGOp (#1892)
3c3c89e Misc fixes/tuning for transpose scheduler (#1912)
20cf109 Grouped grid welford (#1921)
6cf7eb0 Transpose scheduler small dim sizes better support (#1910)
9341ea9 Disabled ViewPersistentShmoo sizes that results in NAN (#1922)
057237f Fix CUDA driver error: misaligned address for transpose scheduler  (#1918)
3fb3d80 Add variance_mean function using Welford (#1907)
98febf6 Remove DisableOption::UnrollWithRng (#1913)
ee8ef33 Minor fix for the debug interface of using PTX directly (#1917)
6e8f953 Add PYTORCH_NVFUSER_DUMP options to save PTX and CUBIN (#1916)
5eefa9a dopt is only available since nvrtc 11.7 (#1915)
2ec8fc7 Kill computeAtBetween (#1911)
d0d106a Improve view support on pointwise and transpose scheduler (#1906)
e71e1ec Fix name clash of RNG with shared memory (#1904)
3381793 Fix mutator and sameAs for expanded IterDomain (#1902)
```

RUN_TORCHBENCH: nvfuser

ghstack-source-id: 0443a2a
Pull Request resolved: #84626
jjsjann123 added a commit that referenced this pull request Sep 21, 2022
Syncing nvfuser devel branch to upstream master. https://github.com/csarofeen/pytorch/

Codegen changes include:

- codegen improvement:
i. improved view support on pointwise and transpose scheduler
ii. grouped grid welford added for better outer-norm grid persistence in normalization

- misc:
i. new composite ops added: variance_mean , arange, 
ii. fixes misaligned address for transpose scheduler
iii. refactor on separation of compilation API from execution API to prepare us for async compilation
iv. double type support on expression evaluator
v. PYTORCH_NVFUSER_DUMP refactor to save PTX and CUBIN

Commits that's in this PR from the devel branch:
```
89330aa Tensor factories must set the output shape as its input (#1939)
b2fd01e arange support (#1933)
56c00fd Double support on all expression evaluators (#1937)
371f282 Improve trivial reduction merge support (#1931)
1d0c267 Test `rand` in a fusion with zero tensor input (#1932)
0dab160 Fix softmax bwd sizes. (#1890)
ef98f36 Fix a bug (#1936)
63132a0 Propagate permissive mapping information into indexing pass (#1929)
b4ac2c8 Map IterationDomains through view operations. (#1919)
c0a187a do not use deprecated functions (#1935)
88de85e Upstream cherry pick fixes 0811 (#1934)
b247dcf Separate kernel compilation API from kernel execution API (#1914)
b34e3b9 Fix `ir_utils::hasBlockSync` + misc fixes in transpose scheduler (#1924)
14a53e6 Nullary RNGOp (#1892)
3c3c89e Misc fixes/tuning for transpose scheduler (#1912)
20cf109 Grouped grid welford (#1921)
6cf7eb0 Transpose scheduler small dim sizes better support (#1910)
9341ea9 Disabled ViewPersistentShmoo sizes that results in NAN (#1922)
057237f Fix CUDA driver error: misaligned address for transpose scheduler  (#1918)
3fb3d80 Add variance_mean function using Welford (#1907)
98febf6 Remove DisableOption::UnrollWithRng (#1913)
ee8ef33 Minor fix for the debug interface of using PTX directly (#1917)
6e8f953 Add PYTORCH_NVFUSER_DUMP options to save PTX and CUBIN (#1916)
5eefa9a dopt is only available since nvrtc 11.7 (#1915)
2ec8fc7 Kill computeAtBetween (#1911)
d0d106a Improve view support on pointwise and transpose scheduler (#1906)
e71e1ec Fix name clash of RNG with shared memory (#1904)
3381793 Fix mutator and sameAs for expanded IterDomain (#1902)
```

RUN_TORCHBENCH: nvfuser

Differential Revision: [D39324552](https://our.internmc.facebook.com/intern/diff/D39324552)

[ghstack-poisoned]
jjsjann123 added a commit that referenced this pull request Sep 21, 2022
Syncing nvfuser devel branch to upstream master. https://github.com/csarofeen/pytorch/

Codegen changes include:

- codegen improvement:
i. improved view support on pointwise and transpose scheduler
ii. grouped grid welford added for better outer-norm grid persistence in normalization

- misc:
i. new composite ops added: variance_mean , arange, 
ii. fixes misaligned address for transpose scheduler
iii. refactor on separation of compilation API from execution API to prepare us for async compilation
iv. double type support on expression evaluator
v. PYTORCH_NVFUSER_DUMP refactor to save PTX and CUBIN

Commits that's in this PR from the devel branch:
```
89330aa Tensor factories must set the output shape as its input (#1939)
b2fd01e arange support (#1933)
56c00fd Double support on all expression evaluators (#1937)
371f282 Improve trivial reduction merge support (#1931)
1d0c267 Test `rand` in a fusion with zero tensor input (#1932)
0dab160 Fix softmax bwd sizes. (#1890)
ef98f36 Fix a bug (#1936)
63132a0 Propagate permissive mapping information into indexing pass (#1929)
b4ac2c8 Map IterationDomains through view operations. (#1919)
c0a187a do not use deprecated functions (#1935)
88de85e Upstream cherry pick fixes 0811 (#1934)
b247dcf Separate kernel compilation API from kernel execution API (#1914)
b34e3b9 Fix `ir_utils::hasBlockSync` + misc fixes in transpose scheduler (#1924)
14a53e6 Nullary RNGOp (#1892)
3c3c89e Misc fixes/tuning for transpose scheduler (#1912)
20cf109 Grouped grid welford (#1921)
6cf7eb0 Transpose scheduler small dim sizes better support (#1910)
9341ea9 Disabled ViewPersistentShmoo sizes that results in NAN (#1922)
057237f Fix CUDA driver error: misaligned address for transpose scheduler  (#1918)
3fb3d80 Add variance_mean function using Welford (#1907)
98febf6 Remove DisableOption::UnrollWithRng (#1913)
ee8ef33 Minor fix for the debug interface of using PTX directly (#1917)
6e8f953 Add PYTORCH_NVFUSER_DUMP options to save PTX and CUBIN (#1916)
5eefa9a dopt is only available since nvrtc 11.7 (#1915)
2ec8fc7 Kill computeAtBetween (#1911)
d0d106a Improve view support on pointwise and transpose scheduler (#1906)
e71e1ec Fix name clash of RNG with shared memory (#1904)
3381793 Fix mutator and sameAs for expanded IterDomain (#1902)
```

RUN_TORCHBENCH: nvfuser

Differential Revision: [D39324552](https://our.internmc.facebook.com/intern/diff/D39324552)

[ghstack-poisoned]
jjsjann123 added a commit that referenced this pull request Sep 21, 2022
Syncing nvfuser devel branch to upstream master. https://github.com/csarofeen/pytorch/

Codegen changes include:

- codegen improvement:
i. improved view support on pointwise and transpose scheduler
ii. grouped grid welford added for better outer-norm grid persistence in normalization

- misc:
i. new composite ops added: variance_mean , arange,
ii. fixes misaligned address for transpose scheduler
iii. refactor on separation of compilation API from execution API to prepare us for async compilation
iv. double type support on expression evaluator
v. PYTORCH_NVFUSER_DUMP refactor to save PTX and CUBIN

Commits that's in this PR from the devel branch:
```
89330aa Tensor factories must set the output shape as its input (#1939)
b2fd01e arange support (#1933)
56c00fd Double support on all expression evaluators (#1937)
371f282 Improve trivial reduction merge support (#1931)
1d0c267 Test `rand` in a fusion with zero tensor input (#1932)
0dab160 Fix softmax bwd sizes. (#1890)
ef98f36 Fix a bug (#1936)
63132a0 Propagate permissive mapping information into indexing pass (#1929)
b4ac2c8 Map IterationDomains through view operations. (#1919)
c0a187a do not use deprecated functions (#1935)
88de85e Upstream cherry pick fixes 0811 (#1934)
b247dcf Separate kernel compilation API from kernel execution API (#1914)
b34e3b9 Fix `ir_utils::hasBlockSync` + misc fixes in transpose scheduler (#1924)
14a53e6 Nullary RNGOp (#1892)
3c3c89e Misc fixes/tuning for transpose scheduler (#1912)
20cf109 Grouped grid welford (#1921)
6cf7eb0 Transpose scheduler small dim sizes better support (#1910)
9341ea9 Disabled ViewPersistentShmoo sizes that results in NAN (#1922)
057237f Fix CUDA driver error: misaligned address for transpose scheduler  (#1918)
3fb3d80 Add variance_mean function using Welford (#1907)
98febf6 Remove DisableOption::UnrollWithRng (#1913)
ee8ef33 Minor fix for the debug interface of using PTX directly (#1917)
6e8f953 Add PYTORCH_NVFUSER_DUMP options to save PTX and CUBIN (#1916)
5eefa9a dopt is only available since nvrtc 11.7 (#1915)
2ec8fc7 Kill computeAtBetween (#1911)
d0d106a Improve view support on pointwise and transpose scheduler (#1906)
e71e1ec Fix name clash of RNG with shared memory (#1904)
3381793 Fix mutator and sameAs for expanded IterDomain (#1902)
```

RUN_TORCHBENCH: nvfuser

ghstack-source-id: 34c0b92
Pull Request resolved: #84626
pytorchmergebot pushed a commit that referenced this pull request Sep 23, 2022
Syncing nvfuser devel branch to upstream master. https://github.com/csarofeen/pytorch/

Codegen changes include:

- codegen improvement:
i. improved view support on pointwise and transpose scheduler
ii. grouped grid welford added for better outer-norm grid persistence in normalization

- misc:
i. new composite ops added: variance_mean , arange,
ii. fixes misaligned address for transpose scheduler
iii. refactor on separation of compilation API from execution API to prepare us for async compilation
iv. double type support on expression evaluator
v. PYTORCH_NVFUSER_DUMP refactor to save PTX and CUBIN

Commits that's in this PR from the devel branch:
```
89330aa Tensor factories must set the output shape as its input (#1939)
b2fd01e arange support (#1933)
56c00fd Double support on all expression evaluators (#1937)
371f282 Improve trivial reduction merge support (#1931)
1d0c267 Test `rand` in a fusion with zero tensor input (#1932)
0dab160 Fix softmax bwd sizes. (#1890)
ef98f36 Fix a bug (#1936)
63132a0 Propagate permissive mapping information into indexing pass (#1929)
b4ac2c8 Map IterationDomains through view operations. (#1919)
c0a187a do not use deprecated functions (#1935)
88de85e Upstream cherry pick fixes 0811 (#1934)
b247dcf Separate kernel compilation API from kernel execution API (#1914)
b34e3b9 Fix `ir_utils::hasBlockSync` + misc fixes in transpose scheduler (#1924)
14a53e6 Nullary RNGOp (#1892)
3c3c89e Misc fixes/tuning for transpose scheduler (#1912)
20cf109 Grouped grid welford (#1921)
6cf7eb0 Transpose scheduler small dim sizes better support (#1910)
9341ea9 Disabled ViewPersistentShmoo sizes that results in NAN (#1922)
057237f Fix CUDA driver error: misaligned address for transpose scheduler  (#1918)
3fb3d80 Add variance_mean function using Welford (#1907)
98febf6 Remove DisableOption::UnrollWithRng (#1913)
ee8ef33 Minor fix for the debug interface of using PTX directly (#1917)
6e8f953 Add PYTORCH_NVFUSER_DUMP options to save PTX and CUBIN (#1916)
5eefa9a dopt is only available since nvrtc 11.7 (#1915)
2ec8fc7 Kill computeAtBetween (#1911)
d0d106a Improve view support on pointwise and transpose scheduler (#1906)
e71e1ec Fix name clash of RNG with shared memory (#1904)
3381793 Fix mutator and sameAs for expanded IterDomain (#1902)
```

RUN_TORCHBENCH: nvfuser

Differential Revision: [D39324552](https://our.internmc.facebook.com/intern/diff/D39324552)
Pull Request resolved: #84626
Approved by: https://github.com/malfet
mehtanirav pushed a commit that referenced this pull request Oct 4, 2022
Syncing nvfuser devel branch to upstream master. https://github.com/csarofeen/pytorch/

Codegen changes include:

- codegen improvement:
i. improved view support on pointwise and transpose scheduler
ii. grouped grid welford added for better outer-norm grid persistence in normalization

- misc:
i. new composite ops added: variance_mean , arange,
ii. fixes misaligned address for transpose scheduler
iii. refactor on separation of compilation API from execution API to prepare us for async compilation
iv. double type support on expression evaluator
v. PYTORCH_NVFUSER_DUMP refactor to save PTX and CUBIN

Commits that's in this PR from the devel branch:
```
89330aa Tensor factories must set the output shape as its input (#1939)
b2fd01e arange support (#1933)
56c00fd Double support on all expression evaluators (#1937)
371f282 Improve trivial reduction merge support (#1931)
1d0c267 Test `rand` in a fusion with zero tensor input (#1932)
0dab160 Fix softmax bwd sizes. (#1890)
ef98f36 Fix a bug (#1936)
63132a0 Propagate permissive mapping information into indexing pass (#1929)
b4ac2c8 Map IterationDomains through view operations. (#1919)
c0a187a do not use deprecated functions (#1935)
88de85e Upstream cherry pick fixes 0811 (#1934)
b247dcf Separate kernel compilation API from kernel execution API (#1914)
b34e3b9 Fix `ir_utils::hasBlockSync` + misc fixes in transpose scheduler (#1924)
14a53e6 Nullary RNGOp (#1892)
3c3c89e Misc fixes/tuning for transpose scheduler (#1912)
20cf109 Grouped grid welford (#1921)
6cf7eb0 Transpose scheduler small dim sizes better support (#1910)
9341ea9 Disabled ViewPersistentShmoo sizes that results in NAN (#1922)
057237f Fix CUDA driver error: misaligned address for transpose scheduler  (#1918)
3fb3d80 Add variance_mean function using Welford (#1907)
98febf6 Remove DisableOption::UnrollWithRng (#1913)
ee8ef33 Minor fix for the debug interface of using PTX directly (#1917)
6e8f953 Add PYTORCH_NVFUSER_DUMP options to save PTX and CUBIN (#1916)
5eefa9a dopt is only available since nvrtc 11.7 (#1915)
2ec8fc7 Kill computeAtBetween (#1911)
d0d106a Improve view support on pointwise and transpose scheduler (#1906)
e71e1ec Fix name clash of RNG with shared memory (#1904)
3381793 Fix mutator and sameAs for expanded IterDomain (#1902)
```

RUN_TORCHBENCH: nvfuser

Differential Revision: [D39324552](https://our.internmc.facebook.com/intern/diff/D39324552)
Pull Request resolved: #84626
Approved by: https://github.com/malfet
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants