Bad performance with python threads #37259
Closed
Labels
has workaround, high priority, module: multithreading (related to issues that occur when running on multiple CPU threads), module: performance (issues related to performance, either of kernel code or framework glue), triaged (this issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
Description
🐛 Bug
concurrent.futures.ThreadPoolExecutor with max_workers=1 loads all CPU cores (24 cores / 100% load) when torch.zeros(64, 8, 2, 128) is used inside its thread. If torch.zeros is used with a smaller size, like (16, 8, 2, 128), then there is no performance hit. The same sometimes happens with torch.logsumexp, tensor indexing, and tensor copying.
Edit: multiprocessing.dummy.Pool is also affected
To Reproduce
Run this code:
import torch
torch.set_num_threads(1)
torch.set_num_interop_threads(1)
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=1)

def train_async(data):
    with torch.no_grad():
        torch.zeros(64, 8, 2, 128)

while True:
    executor.submit(train_async, []).result()
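To quantify the "all cores busy" symptom without an external monitor, the ratio of process CPU time to wall-clock time can be measured with the standard library alone. This is a diagnostic sketch, not part of the original report; `cpu_load_ratio` is a name introduced here:

```python
import time

def cpu_load_ratio(fn, duration=0.5):
    """Run fn in a loop for `duration` seconds and return CPU time / wall time.

    A ratio near 1.0 means roughly one busy core; a much larger ratio
    indicates hidden parallelism (e.g. an OpenMP/BLAS pool spinning on
    other cores while the Python thread does the submitting).
    """
    start_wall = time.perf_counter()
    start_cpu = time.process_time()
    while time.perf_counter() - start_wall < duration:
        fn()
    wall = time.perf_counter() - start_wall
    cpu = time.process_time() - start_cpu
    return cpu / wall

# A pure-Python workload is GIL-bound, so its ratio stays close to 1;
# running the repro's train_async through this helper would reveal the
# reported multi-core blowup as a ratio far above 1.
ratio = cpu_load_ratio(lambda: sum(range(10_000)))
```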
Expected behavior
Only single CPU core being utilized.
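The issue carries a "has workaround" label. A commonly reported mitigation for runaway OpenMP/MKL thread pools is capping them via environment variables before torch is first imported, since those variables are typically read once at library load time. Whether this resolves this particular issue depends on the PyTorch build and BLAS backend; the following is a sketch of that approach, not a confirmed fix from this thread:

```python
import os

# Cap OpenMP/MKL thread pools before any numerical library loads; these
# variables are generally read once at import time, so ordering matters.
# OMP_NUM_THREADS and MKL_NUM_THREADS are standard OpenMP/MKL knobs, but
# their effect on this specific bug is an assumption, not confirmed here.
os.environ["OMP_NUM_THREADS"] = "1"
os.environ["MKL_NUM_THREADS"] = "1"

# Only after the environment is configured:
# import torch
# torch.set_num_threads(1)
# torch.set_num_interop_threads(1)
```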
Environment
PyTorch version: 1.5.0
Is debug build: No
CUDA used to build PyTorch: 10.1
OS: Microsoft Windows 10 Pro
GCC version: Could not collect
CMake version: Could not collect
Python version: 3.7
Is CUDA available: Yes
CUDA runtime version: 10.1.243
GPU models and configuration: GPU 0: GeForce GTX 1080 Ti
Nvidia driver version: 442.19
cuDNN version: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\bin\cudnn64_7.dll
Versions of relevant libraries:
[pip] numpy==1.17.4
[pip] numpydoc==0.9.1
[pip] ppo-pytorch==0.1
[pip] torch==1.5.0
[pip] torchvision==0.6.0
[conda] blas 1.0 mkl
[conda] cudatoolkit 10.1.243 h74a9793_0
[conda] mkl 2019.4 245
[conda] mkl-service 2.3.0 py37hb782905_0
[conda] mkl_fft 1.0.15 py37h14836fe_0
[conda] mkl_random 1.1.0 py37h675688f_0
[conda] numpy 1.17.4 py37h4320e6b_0
[conda] numpy-base 1.17.4 py37hc3f5095_0
[conda] numpydoc 0.9.1 py_0
[conda] ppo-pytorch 0.1 dev_0 <develop>
[conda] pytorch 1.5.0 py3.7_cuda101_cudnn7_0 pytorch
[conda] pytorch-qrnn 0.2.1 pypi_0 pypi
[conda] torchvision 0.6.0 py37_cu101 pytorch