Did Dropout2d change for PyTorch 1.11? #77081

@albertfgu

🐛 Describe the bug

After having trouble reproducing results, my collaborators and I ran small models across many different environments, GPUs, and package versions, and found a consistent performance change when using nn.Dropout2d on torch==1.11 compared to torch==1.10. For example, one of our small models (100K parameters, dropout=0.1) is around 3% worse on torch 1.11 than on torch 1.10; the gap appears within the first few epochs and persists through all 100 epochs of training. No other hyperparameter in this model seems to cause a performance difference between the two torch versions. The differences are consistent across environments (e.g. Google Cloud and our local cluster) and GPUs (T4, P100, A100).

I don't have time to create a minimal reproducible example right now, but can do so later if that is helpful. For now I just want to understand the difference in implementation: what changed for Dropout2d between 1.10 and 1.11?
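In the meantime, a rough sketch of a check that could be run under both torch versions to compare which parts of the input Dropout2d zeroes out. The helper name `zeroed_slices` and the `(4, 3, 5)` input shape are assumptions made for illustration; this issue does not state what shapes our model actually passes to the layer. The helper itself is plain Python, and the torch part only runs if torch is installed.

```python
def zeroed_slices(x):
    """Summarise which parts of a 3D nested list are entirely zero.

    x is indexed as x[i][j][k], e.g. the result of `tensor.tolist()`.
    Returns (dim0, rows): dim0 lists every i whose whole x[i] slice is zero,
    rows lists every (i, j) whose x[i][j] row is zero.
    """
    dim0 = [i for i, plane in enumerate(x)
            if all(v == 0 for row in plane for v in row)]
    rows = [(i, j) for i, plane in enumerate(x)
            for j, row in enumerate(plane)
            if all(v == 0 for v in row)]
    return dim0, rows


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None

    if torch is not None:
        torch.manual_seed(0)
        drop = torch.nn.Dropout2d(p=0.5)
        drop.train()  # dropout is only active in training mode

        # Hypothetical input shape; replace with the shape the model really uses.
        x = torch.ones(4, 3, 5)
        dim0, rows = zeroed_slices(drop(x).tolist())
        print("torch", torch.__version__)
        print("fully zeroed dim-0 slices:", dim0)
        print("fully zeroed (dim-0, dim-1) rows:", rows)
```

Running the same script in a 1.10 environment and a 1.11 environment and comparing the printed patterns would show whether the layer masks at a different granularity between the two versions. It says nothing about the cause if the patterns match.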

Versions

Collecting environment information...
PyTorch version: 1.11.0
Is debug build: False
CUDA used to build PyTorch: 10.2
ROCM used to build PyTorch: N/A

OS: Ubuntu 16.04.6 LTS (x86_64)
GCC version: (Ubuntu 5.4.0-6ubuntu1~16.04.11) 5.4.0 20160609
Clang version: 3.8.0-2ubuntu4 (tags/RELEASE_380/final)
CMake version: version 3.21.0
Libc version: glibc-2.23

Python version: 3.8.12 (default, Oct 12 2021, 13:49:34) [GCC 7.5.0] (64-bit runtime)
Python platform: Linux-4.4.0-130-generic-x86_64-with-glibc2.17
Is CUDA available: True
CUDA runtime version: 10.2.89
GPU models and configuration: GPU 0: Tesla P100-PCIE-16GB
Nvidia driver version: 430.26
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] numpy==1.21.5
[pip3] pytorch-fast-transformers==0.4.0
[pip3] pytorch-lightning==1.5.0
[pip3] torch==1.11.0
[pip3] torchaudio==0.11.0
[pip3] torchmetrics==0.6.0
[pip3] torchtext==0.11.0
[pip3] torchvision==0.12.0
[conda] blas 1.0 mkl
[conda] cudatoolkit 10.2.89 hfd86e86_1
[conda] ffmpeg 4.3 hf484d3e_0 pytorch
[conda] mkl 2021.4.0 h06a4308_640
[conda] mkl-service 2.4.0 py38h7f8727e_0
[conda] mkl_fft 1.3.1 py38hd3c417c_0
[conda] mkl_random 1.2.2 py38h51133e4_0
[conda] numpy 1.20.3 pypi_0 pypi
[conda] numpy-base 1.21.5 py38hf524024_2
[conda] pytorch 1.11.0 py3.8_cuda10.2_cudnn7.6.5_0 pytorch
[conda] pytorch-fast-transformers 0.4.0 pypi_0 pypi
[conda] pytorch-lightning 1.5.0 pypi_0 pypi
[conda] pytorch-mutex 1.0 cuda pytorch
[conda] torch 1.10.0 pypi_0 pypi
[conda] torchaudio 0.11.0 py38_cu102 pytorch
[conda] torchmetrics 0.6.0 pypi_0 pypi
[conda] torchtext 0.11.0 pypi_0 pypi
[conda] torchvision 0.12.0 py38_cu102 pytorch

cc @ezyang @gchanan @zou3519 @ngimel @pbelevich @mruberry @kurtamohler

Labels

high priority, module: cuda, module: random, module: regression, triage review, triaged
