Bad performance with python threads #37259
Closed
Labels
has workaround
high priority
module: multithreading (Related to issues that occur when running on multiple CPU threads)
module: performance (Issues related to performance, either of kernel code or framework glue)
triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
🐛 Bug
concurrent.futures.ThreadPoolExecutor with max_workers=1 loads all CPU cores (24 cores at 100% load) when torch.zeros(64, 8, 2, 128) is called inside its worker thread. With a smaller size, such as (16, 8, 2, 128), there is no performance hit. The same thing sometimes happens with torch.logsumexp, tensor indexing, and tensor copying.
Edit: multiprocessing.dummy.Pool is also affected
To Reproduce
Run this code:
Expected behavior
Only a single CPU core is utilized.
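One rough way to check this expectation is to compare process CPU time with wall-clock time while the single worker runs. The sketch below is not the original reproduction script. The helper name `run_in_worker` and the repeat count are hypothetical, and calling `torch.set_num_threads(1)` is shown only as one possible way to cap intra-op CPU threads, on the assumption that intra-op parallelism is what occupies the extra cores.

```python
import concurrent.futures
import time


def run_in_worker(task, repeats=1000):
    """Call `task` `repeats` times inside a single-worker thread pool.

    Returns (wall_seconds, cpu_seconds). time.process_time() sums CPU time
    over all threads of the process, so a cpu/wall ratio well above 1
    suggests that more than one core was busy.
    """
    def loop():
        for _ in range(repeats):
            task()

    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        pool.submit(loop).result()  # wait; re-raises any worker exception
    return time.perf_counter() - wall_start, time.process_time() - cpu_start


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None  # the measurement helper itself does not need torch

    if torch is not None:
        # Shape taken from the report above.
        def alloc():
            torch.zeros(64, 8, 2, 128)

        wall, cpu = run_in_worker(alloc)
        print(f"default threads: wall={wall:.3f}s cpu={cpu:.3f}s")

        # Assumption: capping intra-op threads may keep the work on one core.
        torch.set_num_threads(1)
        wall, cpu = run_in_worker(alloc)
        print(f"one thread:      wall={wall:.3f}s cpu={cpu:.3f}s")
```

If the CPU time in the first measurement is many times the wall time and the gap closes in the second, that would be consistent with the behavior described in this report. The exact numbers depend on the machine and the PyTorch build.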
Environment
cc @ezyang @gchanan @zou3519 @VitalyFedyunin @ngimel