
Core dump when checking that basic CNN works (Python 3.9) #47776

@seemethere

🐛 Bug

The CNN smoke test currently core dumps on Python 3.9:

Nov 11 09:07:54 + python /builder/test_example_code/cnn_smoke.py
Nov 11 09:07:54 Checking that basic CNN works
Nov 11 09:08:00 tensor([0.0817], device='cuda:0', grad_fn=<ViewBackward>)
Nov 11 09:08:00 terminate called without an active exception
Nov 11 09:08:00 terminate called recursively
Nov 11 09:08:01 /builder/check_binary.sh: line 374:   453 Aborted                 (core dumped) python ${TEST_CODE_DIR}/cnn_smoke.py

Link to CircleCI logs: https://app.circleci.com/pipelines/github/pytorch/pytorch/237660/workflows/5021f68c-07d3-4fcb-8a2a-c1b14b87d9e2/jobs/8843626

To Reproduce

#!/usr/bin/env bash
# Quote the heredoc delimiter so ${PATH} expands inside the container, not on the host.
# Use -i without -t because stdin is a heredoc, not a TTY.
docker run --rm -i --gpus all pytorch/manylinux-cuda101 bash -s <<'EOF'
export PATH=/opt/python/cp39-cp39/bin/:${PATH}
pip install https://8842687-65600975-gh.circle-artifacts.com/0/final_pkgs/torch-1.8.0.dev20201111%2Bcu101-cp39-cp39-linux_x86_64.whl
git clone https://github.com/pytorch/builder
pushd builder
python test_example_code/cnn_smoke.py
EOF
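
When scripting the reproduction, it can help to distinguish a hard abort (as in the log above, "Aborted (core dumped)") from an ordinary non-zero exit. The following is a minimal sketch, assuming a POSIX system; `run_and_classify` is a hypothetical helper name and is not part of pytorch/builder. It relies only on the standard library: `subprocess` reports death by signal as a negative return code, and `-X faulthandler` asks the interpreter to dump Python tracebacks on fatal signals such as SIGABRT.

```python
import signal
import subprocess
import sys


def run_and_classify(argv):
    """Run a command and report how it ended.

    Returns "ok", "error" (non-zero exit status), or "signal:<NAME>"
    when the child process was killed by a signal such as SIGABRT.
    """
    proc = subprocess.run(argv, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    if proc.returncode == 0:
        return "ok"
    if proc.returncode < 0:
        # On POSIX, a negative return code -N means "killed by signal N".
        return "signal:" + signal.Signals(-proc.returncode).name
    return "error"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: pass the script path as the first argument,
    # e.g. test_example_code/cnn_smoke.py from the builder checkout.
    script = sys.argv[1]
    print(run_and_classify([sys.executable, "-X", "faulthandler", script]))
```

If the crash reproduces, the helper should report a signal rather than "error", and any traceback written by faulthandler ends up in the captured stderr of the child process (`proc.stderr`), which this sketch does not print.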

Expected behavior

The smoke test completes without crashing.

Environment

PyTorch version: 1.8.0.dev20201111+cu101
Is debug build: False
CUDA used to build PyTorch: 10.1
ROCM used to build PyTorch: N/A

OS: CentOS Linux 7 (Core) (x86_64)
GCC version: (GCC) 7.3.1 20180303 (Red Hat 7.3.1-5)
Clang version: Could not collect
CMake version: Could not collect

Python version: 3.9 (64-bit runtime)
Is CUDA available: True
CUDA runtime version: 10.1.243
GPU models and configuration:
GPU 0: Quadro GP100
GPU 1: Quadro GP100

Nvidia driver version: 418.116.00
cuDNN version: /usr/local/cuda-10.1/targets/x86_64-linux/lib/libcudnn.so.7.6.3
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.19.4
[pip3] torch==1.8.0.dev20201111+cu101
[conda] Could not collect

Additional context

This issue did not appear during my initial Python 3.9 testing in #47238.

cc @ezyang @gchanan @zou3519 @bdhirsh @albanD @gqchen @pearu @nikitaved

Labels

high priority
module: autograd (Related to torch.autograd, and the autograd engine in general)
module: crash (Problem manifests as a hard crash, as opposed to a RuntimeError)
module: pybind (Related to our Python bindings / interactions with other Python libraries)
triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
