
CUDA error: an illegal memory access was encountered when using output_padding in nn.ConvTranspose3d #32866

@flysofast

Description


I'm building a simple Encoder-Decoder architecture for video with 3D convolutions and transposed convolutions. The aim is to make it fully convolutional so that it works with any video size.
The Encoder looks like this:

    padding = int((kernel_size - 1)/2) #kernel_size = 5
    self.network = nn.Sequential(    
       nn.Conv3d(3, 128, kernel_size=kernel_size, stride=1, padding=padding),
       nn.BatchNorm3d(128, affine=False),
       nn.ReLU(True),

       nn.Conv3d(128, 64, kernel_size=kernel_size, stride=2, padding=padding),
       nn.BatchNorm3d(64, affine=False),
       nn.ReLU(True),

       nn.Conv3d(64, 64, kernel_size=kernel_size, stride=2, padding=padding),
       nn.BatchNorm3d(64, affine=False),
       nn.ReLU(True),

       nn.Conv3d(64, 24, kernel_size=kernel_size, stride=1, padding=padding),
       nn.BatchNorm3d(24, affine=False),  # matches the 24 output channels of the Conv3d above
       nn.ReLU(True)
    )
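For reference, with `padding = (kernel_size - 1) // 2` each layer follows the standard Conv3d output-size formula from the PyTorch documentation. A minimal sketch (pure Python, no torch needed) of how the spatial sizes above come about:

```python
import math

def conv_out(n, kernel_size=5, stride=1, padding=2):
    """Output length of one Conv3d dimension (dilation=1):
    floor((n + 2*padding - kernel_size) / stride) + 1."""
    return math.floor((n + 2 * padding - kernel_size) / stride) + 1

# One 112-pixel spatial dimension through the encoder:
# the stride-1 layers keep the size, the two stride-2 layers halve it.
w = conv_out(112, stride=1)  # 112
w = conv_out(w, stride=2)    # 56
w = conv_out(w, stride=2)    # 28
print(w)  # 28
```

With this padding choice, even sizes divide cleanly; odd sizes lose a pixel to the floor, which matters later.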

And here is the Decoder:

    padding = int((kernel_size - 1)/2) #kernel_size = 5
    self.network = nn.Sequential( 
       nn.ConvTranspose3d(self.input_channels, 64, kernel_size=kernel_size, stride=1, padding=padding),
       nn.BatchNorm3d(64, affine=False),
       nn.ReLU(True),
    
       nn.ConvTranspose3d(64, 64, kernel_size=kernel_size, stride=2, padding=padding, output_padding=1),
       nn.BatchNorm3d(64, affine=False),
       nn.ReLU(True),

       nn.ConvTranspose3d(64, 128, kernel_size=kernel_size, stride=2, padding=padding, output_padding=1),
       nn.BatchNorm3d(128, affine=False),
       nn.ReLU(True),

       nn.Conv3d(128, self.output_channels, kernel_size=kernel_size, stride=1, padding=padding),
    )
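The decoder's sizes follow the ConvTranspose3d output-size formula (again from the PyTorch documentation). A quick sketch of the round trip for a 112-pixel input, which works out exactly because 112 halves cleanly twice:

```python
def deconv_out(n, kernel_size=5, stride=1, padding=2, output_padding=0):
    """Output length of one ConvTranspose3d dimension (dilation=1):
    (n - 1)*stride - 2*padding + kernel_size + output_padding."""
    return (n - 1) * stride - 2 * padding + kernel_size + output_padding

# One spatial dimension back up through the decoder:
w = deconv_out(28, stride=1)                   # 28
w = deconv_out(w, stride=2, output_padding=1)  # 56
w = deconv_out(w, stride=2, output_padding=1)  # 112
print(w)  # 112
```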

I trained the model with input clips of shape (C x T x H x W) = (3 x 16 x 112 x 112), i.e. 16 RGB frames of 112 x 112. The output of each layer looks like this:

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv3d-1    [-1, 128, 16, 112, 112]          48,128
       BatchNorm3d-2    [-1, 128, 16, 112, 112]               0
              ReLU-3    [-1, 128, 16, 112, 112]               0
            Conv3d-4        [-1, 64, 8, 56, 56]       1,024,064
       BatchNorm3d-5        [-1, 64, 8, 56, 56]               0
              ReLU-6        [-1, 64, 8, 56, 56]               0
            Conv3d-7        [-1, 64, 4, 28, 28]         512,064
       BatchNorm3d-8        [-1, 64, 4, 28, 28]               0
              ReLU-9        [-1, 64, 4, 28, 28]               0
           Conv3d-10        [-1, 24, 4, 28, 28]         192,024
      BatchNorm3d-11        [-1, 24, 4, 28, 28]               0
             ReLU-12        [-1, 24, 4, 28, 28]               0
        Encoder3D-13        [-1, 24, 4, 28, 28]               0
   
  ConvTranspose3d-16        [-1, 64, 4, 28, 28]         192,064
      BatchNorm3d-17        [-1, 64, 4, 28, 28]               0
             ReLU-18        [-1, 64, 4, 28, 28]               0
  ConvTranspose3d-19        [-1, 64, 8, 56, 56]         512,064
      BatchNorm3d-20        [-1, 64, 8, 56, 56]               0
             ReLU-21        [-1, 64, 8, 56, 56]               0
  ConvTranspose3d-22    [-1, 128, 16, 112, 112]       1,024,128
      BatchNorm3d-23    [-1, 128, 16, 112, 112]               0
             ReLU-24    [-1, 128, 16, 112, 112]               0
           Conv3d-25      [-1, 3, 16, 112, 112]          48,003
        Decoder3D-26      [-1, 3, 16, 112, 112]               0
================================================================

After the model was trained, I tested it on a UCF-101 video sequence of size (Frames x Width x Height) = (16 x 342 x 256), with os.environ["CUDA_LAUNCH_BLOCKING"] = "1", and got this error:

Traceback (most recent call last):
  File "reconstruct.py", line 80, in <module>
    torchsummary.summary(model, dataloader.dataset[0].shape)
  File "/home/namle/anaconda3/envs/condapy3/lib/python3.7/site-packages/torchsummary/torchsummary.py", line 72, in summary
    model(*x)
  File "/home/namle/anaconda3/envs/condapy3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/namle/VCM/E2E/Model/model.py", line 57, in forward
    x_hat = self.decoder(y)
  File "/home/namle/anaconda3/envs/condapy3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/namle/VCM/E2E/AutoEncoder/model3d.py", line 84, in forward
    x = self.network(x)
  File "/home/namle/anaconda3/envs/condapy3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/namle/anaconda3/envs/condapy3/lib/python3.7/site-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "/home/namle/anaconda3/envs/condapy3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/namle/anaconda3/envs/condapy3/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 921, in forward
    output_padding, self.groups, self.dilation)
RuntimeError: CUDA error: an illegal memory access was encountered

However, if I run the code on the CPU instead of CUDA, it outputs a frame sequence of size (344 x 256), which is 2 pixels wider than the input.
Another way to make the code run is to set output_padding to 0 in both ConvTranspose3d layers of the Decoder at inference time; then I get a sequence of size (341 x 253).

I suspect this has something to do with the transposed-convolution arithmetic described in this paper https://arxiv.org/pdf/1603.07285.pdf, rather than a bug. I'd appreciate it if you could point me in the right direction, so that my model can take a video of any size and reconstruct it to the same size automatically.
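For what it's worth, the arithmetic from that paper does explain the size mismatch (though presumably not the CUDA crash, which looks like a missing error check). A sketch using the standard Conv/ConvTranspose size formulas shows that odd sizes lose the remainder to floor division in the encoder, and that the `output_padding` needed to invert each stride-2 layer depends on the parity of the size being reconstructed:

```python
import math

def conv_out(n, k=5, s=1, p=2):
    """Conv3d output length: floor((n + 2p - k) / s) + 1."""
    return math.floor((n + 2 * p - k) / s) + 1

def deconv_out(n, k=5, s=1, p=2, op=0):
    """ConvTranspose3d output length: (n - 1)*s - 2p + k + op."""
    return (n - 1) * s - 2 * p + k + op

# Encoder path for width 342 through the two stride-2 convolutions:
w1 = conv_out(342, s=2)  # 171 (the odd 341 is floored)
w2 = conv_out(w1, s=2)   # 86

# output_padding=1 in both decoder layers overshoots by 2 (the CPU result):
print(deconv_out(deconv_out(w2, s=2, op=1), s=2, op=1))  # 344

# output_padding=0 in both layers undershoots by 1:
print(deconv_out(deconv_out(w2, s=2, op=0), s=2, op=0))  # 341

# Inverting each layer exactly needs a per-layer output_padding
# (0 for the first step here, 1 for the second), chosen from the
# parity of the size each step must restore:
print(deconv_out(deconv_out(w2, s=2, op=0), s=2, op=1))  # 342
```

Since the right `output_padding` varies with the input size, a size-agnostic model can instead pass the target shape at call time: the `forward` of `nn.ConvTranspose3d` accepts an `output_size` argument and derives the required padding internally, assuming the result stays within the valid range.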

cc @ezyang @gchanan @zou3519 @ngimel

Metadata

Labels

- high priority
- module: convolution — Problems related to convolutions (THNN, THCUNN, CuDNN)
- module: crash — Problem manifests as a hard crash, as opposed to a RuntimeError
- module: cuda — Related to torch.cuda, and CUDA support in general
- module: error checking — Bugs related to incorrect/lacking error checking
- triaged — This issue has been looked at by a team member, and triaged and prioritized into an appropriate module
