🐛 Describe the bug
This short preamble explains why my batch size looks so large and why I want to increase it further.
I have a series of data containing different channels (think images for simplicity), which I am passing through a relatively small CNN. Since I want the same convolutional filters to be applied to each channel, I reshape each batch to have a single channel and larger batch size. This means a small batch with dimensionality of e.g. [16, 100, 200, 200] becomes [1600, 1, 200, 200].
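As a minimal sketch of the reshaping described above, the snippet below folds the channel axis into the batch axis on a deliberately tiny tensor. The sizes, the variable names, and the single-channel `Conv2d` settings are illustrative assumptions, not the actual network.

```python
import torch

# Small stand-in for the [16, 100, 200, 200] batch (sizes are assumptions).
batch = torch.randn(4, 3, 5, 5)  # [N, C, H, W]
n, c, h, w = batch.shape

# Fold the channel axis into the batch axis: [N, C, H, W] -> [N*C, 1, H, W].
folded = batch.reshape(n * c, 1, h, w)

# A single-input-channel convolution now applies the same filters
# to every original channel (hypothetical layer settings).
conv = torch.nn.Conv2d(in_channels=1, out_channels=2, kernel_size=3, padding=1)
out = conv(folded)  # [N*C, 2, H, W]

# Restore the original batch axis, keeping per-channel results separate.
out = out.reshape(n, c, 2, h, w)
print(tuple(folded.shape), tuple(out.shape))
```

Row `i * c + j` of the folded tensor holds channel `j` of sample `i`, so the original grouping can be recovered with a plain reshape.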
As the dimensions increase, a Conv2D layer still works completely fine with the original layout but errors out on the reshaped one with:
RuntimeError: Expected canUse32BitIndexMath(input) && canUse32BitIndexMath(output) to be true, but got false. (Could this error message be improved? If so, please report an enhancement request to PyTorch.)
The concrete dimensions that give me this problem are [50000, 8, 1, 3840] (output of an intermediate layer) with a Conv2D layer defined as Conv2d(8, 16, kernel_size=(1, 1), stride=(1, 1), groups=8).
Now the code of canUse32BitIndexMath can be found in ATen/native/IndexingUtils.cpp; it simply checks whether the number of elements or an "offset" exceeds the maximum value of int32_t (2,147,483,647). The input above has 1,536,000,000 elements and stays below that limit, but the eventual Conv2D output, [50000, 16, 1, 3840], has 3,072,000,000 elements and exceeds it.
I believe this is a bug; if it is not, please advise how to solve it. Please do not tell me to use smaller batch sizes.
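The element counts for the input [50000, 8, 1, 3840] and for the layer's output can be checked with a few lines of arithmetic. This sketch rests on several assumptions: the tensors are contiguous, only the element-count part of the check matters, the limit is the maximum int32_t value, and the output shape [50000, 16, 1, 3840] is inferred from the Conv2d definition rather than taken from a run. The helper names are hypothetical.

```python
from math import prod

INT32_MAX = 2**31 - 1  # assumed limit used by the 32-bit index check


def numel(shape):
    """Number of elements in a tensor of the given shape."""
    return prod(shape)


def fits_int32(shape):
    """True if the element count does not exceed INT32_MAX (contiguous case only)."""
    return numel(shape) <= INT32_MAX


input_shape = (50000, 8, 1, 3840)    # shape reported in the issue
output_shape = (50000, 16, 1, 3840)  # inferred: 16 output channels, 1x1 kernel, stride 1

for name, shape in [("input", input_shape), ("output", output_shape)]:
    print(f"{name}: {numel(shape):,} elements, fits int32: {fits_int32(shape)}")
```

Under these assumptions the input, at 1,536,000,000 elements, stays under the limit, while the 16-channel output, at 3,072,000,000 elements, does not.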
Versions
PyTorch version: 1.10.0
Is debug build: False
CUDA used to build PyTorch: 11.3
ROCM used to build PyTorch: N/A
OS: Red Hat Enterprise Linux release 8.5 (Ootpa) (x86_64)
GCC version: (GCC) 8.5.0 20210514 (Red Hat 8.5.0-4)
Clang version: 12.0.1 (Red Hat 12.0.1-4.module+el8.5.0+13246+cefb5d4c)
CMake version: version 3.20.2
Libc version: glibc-2.28
Python version: 3.8.12 (default, Oct 12 2021, 13:49:34) [GCC 7.5.0] (64-bit runtime)
Python platform: Linux-4.18.0-348.23.1.el8_5.x86_64-x86_64-with-glibc2.17
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration: GPU 0: NVIDIA A100-SXM4-40GB
Nvidia driver version: 510.39.01
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
Versions of relevant libraries:
[pip3] mypy==0.910
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.21.2
[pip3] numpydoc==1.1.0
[pip3] torch==1.10.0
[pip3] torch-tb-profiler==0.3.1
[pip3] torchaudio==0.10.0
[pip3] torchvision==0.11.1
[conda] Could not collect