conv1d, conv2d, etc. causing segmentation fault on torch 1.8.0 #53565

@tqbl

🐛 Bug

When using a DataLoader with one or more worker subprocesses, calling F.conv1d() in the __getitem__() function of the Dataset instance can cause a segmentation fault if F.conv1d() is also called in the __init__() function.

To Reproduce

The bug should be reproducible with the following MWE:

import torch
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader


class MyDataset(Dataset):
    def __init__(self):
        self[0]  # The important thing is that conv1d is called here

    def __getitem__(self, index):
        x = torch.Tensor(1, 1, 24000)  # Needs to be long enough
        x = F.conv1d(x, torch.ones(1, 1, 2))  # Causes segfault
        return x

    def __len__(self):
        return 1


# num_workers>0 necessary to reproduce error
loader = DataLoader(MyDataset(), num_workers=1)
for x in loader:
    pass

The segmentation fault occurs only when all of the following hold:

  • F.conv1d()* is called (directly or indirectly) in MyDataset.__init__().
  • The tensor x is long enough (24000 samples in the example above).
  • The kernel size is at least 2.
  • The DataLoader uses one or more worker subprocesses.

*Replacing conv1d with conv2d also produces the bug.

Expected behavior

The above code should not result in a segmentation fault. This did not occur in torch 1.7.

Environment

Collecting environment information...
PyTorch version: 1.8.0
Is debug build: False
CUDA used to build PyTorch: 10.2
ROCM used to build PyTorch: N/A

OS: Ubuntu 18.04.4 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: 6.0.0-1ubuntu2 (tags/RELEASE_600/final)
CMake version: version 3.10.2

Python version: 3.8 (64-bit runtime)
Is CUDA available: False
CUDA runtime version: No CUDA
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.20.1
[pip3] torch==1.8.0
[pip3] torchaudio==0.8.0
[conda] Could not collect

Additional context

You may be wondering why F.conv1d() would even be called in the __init__() function. In the original code, I have a CachedDataset class that caches the elements of another Dataset instance in system memory on the fly. To allocate the correct amount of memory, it retrieves one of the dataset elements in __init__(). If retrieving that element calls F.conv1d() (or similar), a segmentation fault occurs later, when a worker subprocess calls it again.
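For illustration, the pattern described above might look roughly like the sketch below. This is not the original code: the class body, the attribute names (cache, cached), and the assumption that every element is a tensor of the same shape and dtype are all hypothetical.

```python
import torch
from torch.utils.data import Dataset


class CachedDataset(Dataset):
    """Hypothetical sketch: caches elements of another dataset on first access."""

    def __init__(self, dataset):
        self.dataset = dataset
        # Probe one element to learn its shape and dtype. If the wrapped
        # dataset's __getitem__ uses F.conv1d(), this is where it first runs
        # in the parent process, before any DataLoader worker exists.
        probe = dataset[0]
        # Assumption: all elements share the probe's shape and dtype.
        self.cache = torch.empty((len(dataset), *probe.shape), dtype=probe.dtype)
        # Tracks which slots of the cache have been filled so far.
        self.cached = torch.zeros(len(dataset), dtype=torch.bool)

    def __getitem__(self, index):
        if not self.cached[index]:
            # First access: fetch from the wrapped dataset and store it.
            self.cache[index] = self.dataset[index]
            self.cached[index] = True
        return self.cache[index]

    def __len__(self):
        return len(self.dataset)
```

Wrapping a dataset such as MyDataset above in this class and passing it to a DataLoader with num_workers=1 would reproduce the same call order: one convolution in the parent process, then further ones in the worker.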

cc @ezyang @gchanan @zou3519 @bdhirsh @jbschlosser @anjali411 @gujinghui @PenghuiCheng @XiaobingSuper @jianyuh @VitalyFedyunin

Labels

  • high priority
  • module: convolution - Problems related to convolutions (THNN, THCUNN, CuDNN)
  • module: crash - Problem manifests as a hard crash, as opposed to a RuntimeError
  • module: mkldnn - Related to Intel IDEEP or oneDNN (a.k.a. mkldnn) integration
  • module: openmp - Related to OpenMP (omp) support in PyTorch
  • module: regression - It used to work, and now it doesn't
  • triage review
  • triaged - This issue has been looked at by a team member, and triaged and prioritized into an appropriate module
