canUse32BitIndexMath not working properly in Conv2D layer #80020

### 🐛 Describe the bug

_This short preamble explains the apparently large batch size of my data and why I want to increase it further._
My data consist of samples with many channels (think images, for simplicity), which I pass through a relatively small CNN. Since I want the same convolutional filters to be applied to every channel, I fold the channels into the batch dimension so that each channel becomes its own single-channel sample. For example, a small batch of shape `[16, 100, 200, 200]` becomes `[1600, 1, 200, 200]`.
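
The channel-folding trick can be sketched as follows. This is an illustration with small CPU tensors; the sizes and the 4 output filters are arbitrary choices, not the values from my model. It checks that running one single-input-channel `nn.Conv2d` on the folded batch gives the same result as applying that conv to each channel separately.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

batch, channels, h, w = 2, 3, 8, 8  # illustrative sizes only
x = torch.randn(batch, channels, h, w)

# One shared set of filters that sees a single channel at a time.
conv = nn.Conv2d(in_channels=1, out_channels=4, kernel_size=3, padding=1)

# Fold channels into the batch: [B, C, H, W] -> [B*C, 1, H, W]
folded = x.reshape(batch * channels, 1, h, w)
y = conv(folded)                          # [B*C, 4, H, W]
y = y.reshape(batch, channels, 4, h, w)   # unfold back to [B, C, 4, H, W]

# Reference: apply the same conv to each channel separately and stack.
ref = torch.stack([conv(x[:, c:c + 1]) for c in range(channels)], dim=1)

print(y.shape)                            # torch.Size([2, 3, 4, 8, 8])
print(torch.allclose(y, ref, atol=1e-5))  # True
```

The price of this layout is that the batch dimension grows by a factor of the channel count, which is what pushes the tensors towards very large element counts.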

As the dimensions grow, a Conv2D layer keeps working fine on the former layout but fails on the latter with:

`RuntimeError: Expected canUse32BitIndexMath(input) && canUse32BitIndexMath(output) to be true, but got false.  (Could this error message be improved?  If so, please report an enhancement request to PyTorch.)`

The concrete dimensions that trigger this problem are an input of shape `[50000, 8, 1, 3840]` (the output of an intermediate layer) passed to the layer `Conv2d(8, 16, kernel_size=(1, 1), stride=(1, 1), groups=8)`.
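With `groups=8` and 16 output channels, each of the 8 input channels is convolved with its own pair of 1×1 filters, so the channel count doubles while the other dimensions stay the same. A tiny CPU batch (4 samples instead of 50000, chosen only for illustration) confirms the resulting shapes:

```python
import torch
import torch.nn as nn

# The layer from this report: 8 in-channels, 16 out-channels, groups=8,
# so every input channel gets its own 2 filters of size 1x1.
conv = nn.Conv2d(8, 16, kernel_size=(1, 1), stride=(1, 1), groups=8)
print(conv.weight.shape)  # torch.Size([16, 1, 1, 1])

small = torch.randn(4, 8, 1, 3840)  # same per-sample shape, tiny batch
out = conv(small)
print(out.shape)          # torch.Size([4, 16, 1, 3840])
```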

The code of `canUse32BitIndexMath` is in `ATen/native/IndexingUtils.cpp`; it checks whether the number of elements, or the memory offset of the last element, reaches the maximum value of `int32_t` (2,147,483,647). The input above has 1,536,000,000 elements, which is below that limit. The Conv2D output, however, has shape `[50000, 16, 1, 3840]`, i.e. 3,072,000,000 elements, which is above it.
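The check can be approximated in plain Python to see which tensors trip it. This is a minimal transcription of the logic described above, assuming contiguous strides and the `int32_t` maximum as the limit; `can_use_32bit_index_math` and `contiguous_strides` are hypothetical helper names, not PyTorch APIs.

```python
from math import prod

INT32_MAX = 2**31 - 1  # std::numeric_limits<int32_t>::max()


def contiguous_strides(shape):
    """Row-major (contiguous) strides for a shape, in elements."""
    strides, step = [], 1
    for size in reversed(shape):
        strides.append(step)
        step *= size
    return list(reversed(strides))


def can_use_32bit_index_math(shape, strides=None, max_elem=INT32_MAX):
    """Hypothetical Python sketch of the element-count and offset check."""
    if strides is None:
        strides = contiguous_strides(shape)
    elements = prod(shape)
    if elements >= max_elem:
        return False
    if elements == 0:
        return max_elem > 0
    # Memory offset of the last element (assumes positive strides).
    offset, linear_id = 0, elements - 1
    for size, stride in zip(reversed(shape), reversed(strides)):
        offset += (linear_id % size) * stride
        linear_id //= size
    return offset < max_elem


input_shape = [50000, 8, 1, 3840]
output_shape = [50000, 16, 1, 3840]
print(prod(input_shape), can_use_32bit_index_math(input_shape))    # 1536000000 True
print(prod(output_shape), can_use_32bit_index_math(output_shape))  # 3072000000 False
```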

I believe this is a bug; if it is not, please advise how to solve it. **Please do not tell me to use smaller batch sizes.**


### Versions

PyTorch version: 1.10.0
Is debug build: False
CUDA used to build PyTorch: 11.3
ROCM used to build PyTorch: N/A

OS: Red Hat Enterprise Linux release 8.5 (Ootpa) (x86_64)
GCC version: (GCC) 8.5.0 20210514 (Red Hat 8.5.0-4)
Clang version: 12.0.1 (Red Hat 12.0.1-4.module+el8.5.0+13246+cefb5d4c)
CMake version: version 3.20.2
Libc version: glibc-2.28

Python version: 3.8.12 (default, Oct 12 2021, 13:49:34)  [GCC 7.5.0] (64-bit runtime)
Python platform: Linux-4.18.0-348.23.1.el8_5.x86_64-x86_64-with-glibc2.17
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration: GPU 0: NVIDIA A100-SXM4-40GB
Nvidia driver version: 510.39.01
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] mypy==0.910
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.21.2
[pip3] numpydoc==1.1.0
[pip3] torch==1.10.0
[pip3] torch-tb-profiler==0.3.1
[pip3] torchaudio==0.10.0
[pip3] torchvision==0.11.1
[conda] Could not collect
