Bad performance with python threads #37259

@SSS135

Description

🐛 Bug

concurrent.futures.ThreadPoolExecutor with max_workers=1 loads all CPU cores (24 cores / 100% load) when torch.zeros(64, 8, 2, 128) is used inside its thread. If torch.zeros is used with a smaller size, such as (16, 8, 2, 128), there is no performance hit. This also sometimes happens with torch.logsumexp, tensor indexing, and tensor copying.
Edit: multiprocessing.dummy.Pool is also affected
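A possible workaround (this matches the "has workaround" label, but is a hedged sketch, not a confirmed fix): in some OpenMP/MKL builds the thread-count setting is per-thread, so calling torch.set_num_threads(1) in the main thread may not apply to pool workers. Setting it inside the worker thread itself, via the executor's initializer parameter (available since Python 3.7), may avoid the spin-up:

```python
import torch
from concurrent.futures import ThreadPoolExecutor

def _init_worker():
    # Apply the intra-op thread limit inside the worker thread itself;
    # with some OpenMP/MKL builds this setting is per-thread, so setting
    # it only in the main thread may not affect pool workers.
    torch.set_num_threads(1)

executor = ThreadPoolExecutor(max_workers=1, initializer=_init_worker)

def train_async(data):
    with torch.no_grad():
        return torch.zeros(64, 8, 2, 128).shape

shape = executor.submit(train_async, []).result()
print(shape)
```

Whether this actually prevents the multi-core load on the affected build is untested here; it only guarantees the worker thread sees the limit.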

To Reproduce

Run this code:

import torch
torch.set_num_threads(1)
torch.set_num_interop_threads(1)
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=1)

def train_async(data):
    with torch.no_grad():
        torch.zeros(64, 8, 2, 128)
            
while True:
    executor.submit(train_async, []).result()
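To quantify the problem without watching a task manager, one can compare process CPU time to wall-clock time: a ratio well above 1.0 means more than one core is busy. A small diagnostic sketch (cpu_utilization is a hypothetical helper, not part of the repro):

```python
import os
import time
import torch

def cpu_utilization(fn, iters=200):
    # Ratio of process CPU time to wall time over `iters` calls.
    # ~1.0 means a single busy core; much higher suggests extra
    # threads are spinning.
    t0_wall = time.perf_counter()
    t0_cpu = time.process_time()
    for _ in range(iters):
        fn()
    wall = time.perf_counter() - t0_wall
    cpu = time.process_time() - t0_cpu
    return cpu / wall if wall > 0 else float("nan")

ratio = cpu_utilization(lambda: torch.zeros(64, 8, 2, 128))
print(f"CPU/wall ratio: {ratio:.2f} (cores available: {os.cpu_count()})")
```

On an affected build, running this from inside the pool's worker thread should show the inflated ratio; the expected behavior below corresponds to a ratio near 1.0.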

Expected behavior

Only single CPU core being utilized.

Environment

PyTorch version: 1.5.0
Is debug build: No
CUDA used to build PyTorch: 10.1

OS: Microsoft Windows 10 Pro
GCC version: Could not collect
CMake version: Could not collect

Python version: 3.7
Is CUDA available: Yes
CUDA runtime version: 10.1.243
GPU models and configuration: GPU 0: GeForce GTX 1080 Ti
Nvidia driver version: 442.19
cuDNN version: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\bin\cudnn64_7.dll

Versions of relevant libraries:
[pip] numpy==1.17.4
[pip] numpydoc==0.9.1
[pip] ppo-pytorch==0.1
[pip] torch==1.5.0
[pip] torchvision==0.6.0
[conda] blas                      1.0                         mkl
[conda] cudatoolkit               10.1.243             h74a9793_0
[conda] mkl                       2019.4                      245
[conda] mkl-service               2.3.0            py37hb782905_0
[conda] mkl_fft                   1.0.15           py37h14836fe_0
[conda] mkl_random                1.1.0            py37h675688f_0
[conda] numpy                     1.17.4           py37h4320e6b_0
[conda] numpy-base                1.17.4           py37hc3f5095_0
[conda] numpydoc                  0.9.1                      py_0
[conda] ppo-pytorch               0.1                       dev_0    <develop>
[conda] pytorch                   1.5.0           py3.7_cuda101_cudnn7_0    pytorch
[conda] pytorch-qrnn              0.2.1                    pypi_0    pypi
[conda] torchvision               0.6.0                py37_cu101    pytorch

cc @ezyang @gchanan @zou3519 @VitalyFedyunin @ngimel

Metadata

Labels

has workaround
high priority
module: multithreading (Related to issues that occur when running on multiple CPU threads)
module: performance (Issues related to performance, either of kernel code or framework glue)
triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
