🐛 Describe the bug
Thanks @jjsjann123 for reporting.
```cpp
TEST_F(NVFuserTest, TMP) {
  auto fusion = std::make_unique<Fusion>();
  FusionGuard fg(fusion.get());

  // 3D symbolic input, reduced over dim 0, then negated.
  auto tv0 = makeSymbolicTensor(3);
  fusion->addInput(tv0);
  auto tv1 = sum(tv0, {0});
  auto tv2 = neg(tv1);
  fusion->addOutput(tv2);

  fusion->print();
}
```
The above code gives
```
%kernel {
T1_l[ rS3{i0}, iS4{i2}, iS5{i3} ]
   = reduction( T0_g[ iS0{i0}, iS1{i2}, iS2{i3} ], op = add, initial value = double(0), allreduce = 0 )
T2_g[ iS6{i2}, iS7{i3} ]
   = -T1_l[ rS3{i0}, iS4{i2}, iS5{i3} ];

TransformPrinter :
T0_g[ iS0{i0}, iS1{i2}, iS2{i3} ]
 root domain : (iS0{i0},iS1{i2},iS2{i3})
T1_l[ rS3{i0}, iS4{i2}, iS5{i3} ]
 root domain : (rS3{i0},iS4{i2},iS5{i3})
T2_g[ iS6{i2}, iS7{i3} ]
 root domain : (iS6{i2},iS7{i3})
}
```
in the `gitlab-master.nvidia.com:5005/dl/pytorch/update-scripts:jit-cuda11-latest` container,
and it gives
```
%kernel {
T1_l[ rS3{i1}, iS4{i2}, iS5{i3} ]
   = reduction( T0_g[ iS0{i1}, iS1{i2}, iS2{i3} ], op = add, initial value = double(0), allreduce = 0 )
T2_g[ iS6{i2}, iS7{i3} ]
   = -T1_l[ rS3{i1}, iS4{i2}, iS5{i3} ];

TransformPrinter :
T0_g[ iS0{i1}, iS1{i2}, iS2{i3} ]
 root domain : (iS0{i1},iS1{i2},iS2{i3})
T1_l[ rS3{i1}, iS4{i2}, iS5{i3} ]
 root domain : (rS3{i1},iS4{i2},iS5{i3})
T2_g[ iS6{i2}, iS7{i3} ]
 root domain : (iS6{i2},iS7{i3})
}
```
on my local machine.
Pay attention to `iS0{i1}` vs. `iS0{i0}`: the same IterDomain `iS0` gets a differently named extent variable (`i1` vs. `i0`) in the two environments.
Both environments are using TOT of the devel branch.
Although I don't see any real issue with having different variable names, if the behavior depends on the environment, it sounds to me that we must be hitting UB somewhere.
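For what it's worth, one classic source of this kind of compiler-dependent divergence is the unspecified evaluation order of function arguments: if two arguments each pull a fresh number from a shared counter, GCC and Clang are free to number them in opposite orders. A minimal standalone sketch (hypothetical; `fresh()` and `makeNode()` are made-up stand-ins, not nvfuser code):

```cpp
#include <cstdio>

// Hypothetical global counter, standing in for whatever assigns
// the i0/i1/... extent names to IR values.
static int counter = 0;
static int fresh() { return counter++; }

static void makeNode(int a, int b) {
  std::printf("a=i%d b=i%d\n", a, b);
}

int main() {
  // The two fresh() calls are indeterminately sequenced: the compiler
  // may evaluate either argument first, so this program can legally
  // print "a=i0 b=i1" or "a=i1 b=i0" depending on the toolchain.
  makeNode(fresh(), fresh());
  return 0;
}
```

Iteration over containers keyed by pointer values (e.g., an `std::unordered_set<Val*>`) would be another usual suspect, since that order varies with allocation addresses across environments.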
Versions
My local environment:
```
Collecting environment information...
PyTorch version: 1.13.0a0+gita054b3e
Is debug build: False
CUDA used to build PyTorch: 11.7
ROCM used to build PyTorch: N/A

OS: Arch Linux (x86_64)
GCC version: (GCC) 12.1.0
Clang version: 13.0.1
CMake version: version 3.23.2
Libc version: glibc-2.35

Python version: 3.10.5 (main, Jun 6 2022, 18:49:26) [GCC 12.1.0] (64-bit runtime)
Python platform: Linux-5.18.5-arch1-1-x86_64-with-glibc2.35
Is CUDA available: True
CUDA runtime version: 11.7.64
GPU models and configuration:
GPU 0: NVIDIA GeForce RTX 3090
GPU 1: NVIDIA GeForce RTX 2080 Ti
Nvidia driver version: 515.48.07
cuDNN version: Probably one of the following:
/usr/lib/libcudnn.so.8.4.0
/usr/lib/libcudnn_adv_infer.so.8.4.0
/usr/lib/libcudnn_adv_train.so.8.4.0
/usr/lib/libcudnn_cnn_infer.so.8.4.0
/usr/lib/libcudnn_cnn_train.so.8.4.0
/usr/lib/libcudnn_ops_infer.so.8.4.0
/usr/lib/libcudnn_ops_train.so.8.4.0
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] numpy==1.22.3
[pip3] torch==1.13.0a0+gitc270fd4
[pip3] torch-ucc==1.0.0
[pip3] torchani==2.2
[pip3] torchvision==0.2.2.post3
[conda] Could not collect
```
Note that the GCC (12.1.0) and Clang (13.0.1) versions in my local environment are both very new and may not have been heavily exercised by the rest of the world yet, so I'm not sure whether the toolchain itself could be the source of the problem.