conv1d, conv2d, etc. causing segmentation fault on torch 1.8.0 #53565
Closed
Labels
- high priority
- module: convolution - Problems related to convolutions (THNN, THCUNN, CuDNN)
- module: crash - Problem manifests as a hard crash, as opposed to a RuntimeError
- module: mkldnn - Related to Intel IDEEP or oneDNN (a.k.a. mkldnn) integration
- module: openmp - Related to OpenMP (omp) support in PyTorch
- module: regression - It used to work, and now it doesn't
- triage review
- triaged - This issue has been looked at by a team member, and triaged and prioritized into an appropriate module
🐛 Bug
When using a `DataLoader` with one or more worker subprocesses, calling `F.conv1d()` in the `__getitem__()` function of the `Dataset` instance can cause a segmentation fault if `F.conv1d()` is also called in the `__init__()` function.

To Reproduce
The bug should be reproducible with the following MWE:
For the segmentation fault to be thrown, note that

- `F.conv1d()`* must be called (directly or indirectly) in `MyDataset.__init__()`.
- `x` must be long enough.

*Replacing `conv1d` with `conv2d` also produces the bug.
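A minimal sketch that satisfies the two conditions above. It is a reconstruction under assumptions, not the MWE from the original report: the kernel, the `signal_len` value, the `_filter` helper, and the worker count are all illustrative guesses, and the length at which the crash starts to appear is not known from the text.

```python
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, Dataset


class MyDataset(Dataset):
    def __init__(self, length=4, signal_len=100_000):
        # signal_len is an assumed value; the report only says that
        # x "must be long enough".
        self.length = length
        self.signal_len = signal_len
        self.kernel = torch.ones(1, 1, 3)
        # Condition 1: F.conv1d() runs once in the parent process,
        # before any DataLoader worker is started.
        self._filter(torch.randn(1, 1, signal_len))

    def _filter(self, x):
        # Hypothetical helper: a 3-tap convolution that keeps the length.
        return F.conv1d(x, self.kernel, padding=1)

    def __len__(self):
        return self.length

    def __getitem__(self, idx):
        # Condition 2: x is long, and F.conv1d() runs again, this time
        # inside a worker subprocess when num_workers > 0.
        x = torch.randn(1, 1, self.signal_len)
        return self._filter(x).squeeze(0)


def run(num_workers=1, signal_len=100_000):
    dataset = MyDataset(signal_len=signal_len)
    loader = DataLoader(dataset, batch_size=2, num_workers=num_workers)
    return [batch.shape for batch in loader]


if __name__ == "__main__":
    # On an affected build this is where the worker would crash.
    print(run(num_workers=1))
```

Calling `run(num_workers=0)` keeps everything in the parent process, which is a quick way to check that the dataset itself is sound before involving worker subprocesses.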
Expected behavior
The above code should not result in a segmentation fault. This did not occur in torch 1.7.
Environment
Additional context
You may be wondering why `F.conv1d()` would even be called in the `__init__()` function. In the original code, I have a `CachedDataset` class that caches the dataset elements of another `Dataset` instance to system memory on-the-fly. In order to allocate the correct amount of system memory, I take a look at one of the dataset elements. If, by retrieving the dataset element, `F.conv1d()` (or similar) is called, a segmentation fault will later occur.

cc @ezyang @gchanan @zou3519 @bdhirsh @jbschlosser @anjali411 @gujinghui @PenghuiCheng @XiaobingSuper @jianyuh @VitalyFedyunin
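The `CachedDataset` pattern described under Additional context could look roughly like the sketch below. The class body is an assumption made for illustration (only the class name comes from the report), and it assumes every element of the wrapped dataset is a tensor of the same shape and dtype. The point it shows is that sizing the cache requires fetching one element in `__init__()`, which is how a convolution can end up running in the parent process.

```python
import torch
from torch.utils.data import Dataset


class CachedDataset(Dataset):
    """Hypothetical sketch: caches elements of `base` in memory on first access."""

    def __init__(self, base):
        self.base = base
        # Peek at one element to learn its shape and dtype. If base[0]
        # calls F.conv1d() internally, that call happens here, in the
        # parent process, before any DataLoader worker exists.
        sample = base[0]
        self.cache = torch.empty((len(base), *sample.shape), dtype=sample.dtype)
        self.filled = torch.zeros(len(base), dtype=torch.bool)

    def __len__(self):
        return len(self.base)

    def __getitem__(self, idx):
        # Compute and store the element only the first time it is requested.
        if not self.filled[idx]:
            self.cache[idx] = self.base[idx]
            self.filled[idx] = True
        return self.cache[idx]


if __name__ == "__main__":
    # Any object with __len__ and __getitem__ can stand in for the base dataset.
    base = [torch.full((3,), float(i)) for i in range(4)]
    cached = CachedDataset(base)
    print(cached[2])
```

In this sketch each worker subprocess would fill its own copy of the cache; sharing it across workers would need additional handling that the report does not describe.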