canUse32BitIndexMath not working properly in Conv2D layer #80020

@francescocarzaniga

🐛 Describe the bug

This short preamble explains why my data has such a large batch size and why I want to increase it further.
My data consists of samples with many channels (think of images, for simplicity), which I pass through a relatively small CNN. Since I want the same convolutional filters to be applied to every channel, I reshape each batch so that it has a single channel and a correspondingly larger batch size. For example, a small batch of shape [16, 100, 200, 200] becomes [1600, 1, 200, 200].
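The reshape described above can be sketched on shapes alone. This is a minimal illustration, not the code from my model: `fold_channels_into_batch` is a hypothetical helper name, and on a real tensor `x` the equivalent would presumably be something like `x.reshape(-1, 1, H, W)`.

```python
from math import prod


def fold_channels_into_batch(shape):
    """Return the (N, C, H, W) shape after moving every channel into the batch.

    Hypothetical helper: it only manipulates the shape tuple, not tensor data.
    """
    batch, channels, height, width = shape
    return (batch * channels, 1, height, width)


original = (16, 100, 200, 200)
folded = fold_channels_into_batch(original)
print(folded)  # (1600, 1, 200, 200)

# The reshape only regroups elements, so the total count is unchanged.
assert prod(original) == prod(folded)
```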

As the dimensions increase, a Conv2D layer still works fine with the former shape but fails on the latter with:
RuntimeError: Expected canUse32BitIndexMath(input) && canUse32BitIndexMath(output) to be true, but got false. (Could this error message be improved? If so, please report an enhancement request to PyTorch.)

The concrete input that triggers the problem has shape [50000, 8, 1, 3840] (the output of an intermediate layer) and is passed to a Conv2D layer defined as Conv2d(8, 16, kernel_size=(1, 1), stride=(1, 1), groups=8).
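To make the sizes involved concrete, here is a rough sketch that derives the output shape of this layer from the standard Conv2d output-size formula and counts the elements on both sides. It assumes zero padding and dilation 1 (the Conv2d defaults, since the layer above does not set them); `conv2d_output_shape` is a hypothetical helper written for this illustration.

```python
from math import prod


def conv2d_output_shape(in_shape, out_channels, kernel_size,
                        stride=(1, 1), padding=(0, 0), dilation=(1, 1)):
    """Shape of a Conv2d output for an (N, C, H, W) input (hypothetical helper)."""
    n, _, h, w = in_shape

    def out_len(size, k, s, p, d):
        # Standard formula: floor((size + 2p - d*(k - 1) - 1) / s + 1)
        return (size + 2 * p - d * (k - 1) - 1) // s + 1

    h_out = out_len(h, kernel_size[0], stride[0], padding[0], dilation[0])
    w_out = out_len(w, kernel_size[1], stride[1], padding[1], dilation[1])
    return (n, out_channels, h_out, w_out)


in_shape = (50000, 8, 1, 3840)
out_shape = conv2d_output_shape(in_shape, out_channels=16, kernel_size=(1, 1))

print(out_shape)        # (50000, 16, 1, 3840)
print(prod(in_shape))   # 1536000000
print(prod(out_shape))  # 3072000000
```

With a 1x1 kernel and stride 1 the spatial size is unchanged, so the output has exactly twice as many elements as the input (16 channels instead of 8).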

The code of canUse32BitIndexMath can be found in ATen/native/IndexingUtils.cpp. It simply checks whether the number of elements, or the "offset" of the last element, reaches the maximum value representable by int32_t. As far as I can tell, this is the case neither for the tensor above nor for the eventual Conv2D output.
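The check as described can be paraphrased in plain Python. This is a sketch of my reading of it, not the ATen source: the function and helper names are made up for illustration, tensors are assumed contiguous, the limit is assumed to be the int32_t maximum (2**31 - 1), and the output shape is assumed to be [50000, 16, 1, 3840] (16 output channels, spatial size unchanged by the 1x1 kernel).

```python
INT32_MAX = 2**31 - 1  # assumed limit: largest value an int32_t can hold


def contiguous_strides(shape):
    """Element strides of a contiguous (row-major) tensor with this shape."""
    strides = []
    step = 1
    for size in reversed(shape):
        strides.append(step)
        step *= size
    return tuple(reversed(strides))


def can_use_32bit_index_math(shape, strides=None, max_elem=INT32_MAX):
    """Python paraphrase of the described check (not the actual implementation)."""
    if strides is None:
        strides = contiguous_strides(shape)

    elements = 1
    for size in shape:
        elements *= size
    if elements >= max_elem:
        return False
    if elements == 0:
        return max_elem > 0

    # Offset of the last element, accumulated dimension by dimension.
    offset = 0
    linear_id = elements - 1
    for size, stride in zip(reversed(shape), reversed(strides)):
        offset += (linear_id % size) * stride
        linear_id //= size
    return offset < max_elem


print(can_use_32bit_index_math((50000, 8, 1, 3840)))   # True
print(can_use_32bit_index_math((50000, 16, 1, 3840)))  # False
```

Under these assumptions the input passes the check, while the 16-channel output (3,072,000,000 elements) does not, so the statement about the output may deserve a second look.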

I believe this is a bug. If it is not, please advise how to work around it. Please do not tell me to use smaller batch sizes.

Versions

PyTorch version: 1.10.0
Is debug build: False
CUDA used to build PyTorch: 11.3
ROCM used to build PyTorch: N/A

OS: Red Hat Enterprise Linux release 8.5 (Ootpa) (x86_64)
GCC version: (GCC) 8.5.0 20210514 (Red Hat 8.5.0-4)
Clang version: 12.0.1 (Red Hat 12.0.1-4.module+el8.5.0+13246+cefb5d4c)
CMake version: version 3.20.2
Libc version: glibc-2.28

Python version: 3.8.12 (default, Oct 12 2021, 13:49:34) [GCC 7.5.0] (64-bit runtime)
Python platform: Linux-4.18.0-348.23.1.el8_5.x86_64-x86_64-with-glibc2.17
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration: GPU 0: NVIDIA A100-SXM4-40GB
Nvidia driver version: 510.39.01
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] mypy==0.910
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.21.2
[pip3] numpydoc==1.1.0
[pip3] torch==1.10.0
[pip3] torch-tb-profiler==0.3.1
[pip3] torchaudio==0.10.0
[pip3] torchvision==0.11.1
[conda] Could not collect
