CUDA irfft may be doing unnecessary cloning of input #38413
Closed
Labels
module: cuda — Related to torch.cuda, and CUDA support in general
module: fft
module: performance — Issues related to performance, either of kernel code or framework glue
triaged — This issue has been looked at by a team member, and triaged and prioritized into an appropriate module
Context:
pytorch/aten/src/ATen/native/cuda/CuFFTPlanCache.h, lines 177 to 192 in 899a075
We should figure out why the previous check doesn't detect all the cases, and whether the bug is in our check or in cuFFT. I don't have access to a T4, so I am filing this issue to document the situation in case anyone wants to take a look.
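For context on why a clone happens at all: cuFFT documents that complex-to-real (C2R) transforms may overwrite their input buffer, so PyTorch has to clone the input whenever the caller's tensor would otherwise be handed to cuFFT directly. The snippet below is a hypothetical sketch of that decision logic, not PyTorch's actual implementation in CuFFTPlanCache.h; the function name and the contiguity rule are illustrative assumptions only.

```python
def needs_input_clone(is_c2r: bool, input_is_contiguous: bool) -> bool:
    """Hypothetical sketch of a clone-or-not decision for cuFFT transforms.

    Assumed rule (for illustration): only C2R transforms can destroy their
    input, and even then a clone is only required when cuFFT is given the
    caller's original buffer. A non-contiguous input is copied into a
    temporary during layout normalization anyway, which already protects
    the original data.
    """
    if not is_c2r:
        # R2C and C2C transforms leave the input intact, so no clone.
        return False
    # Contiguous input would be passed to cuFFT as-is, so clone it to
    # keep the caller's tensor unmodified.
    return input_is_contiguous
```

Under a rule like this, an "unnecessary clone" is any case where the predicate returns True even though the particular cuFFT plan would not actually overwrite the buffer; pinning down which cases those are is what this issue asks for.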
cc @ngimel @mruberry @peterbell10 @VitalyFedyunin