c10::scalar_to_tensor(...) uses should be audited for performance and type promotion impact #49758
Status: Open
Labels
high priority · module: cuda (Related to torch.cuda, and CUDA support in general) · module: performance (Issues related to performance, either of kernel code or framework glue) · triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
Description
See, for example:
pytorch/aten/src/ATen/native/Pow.cpp, line 53 (at commit 272f4db):

    native::pow_out(result, c10::scalar_to_tensor(base, exp.device()), exp);
There are several other call sites like this, and the pattern is, in general, an antipattern: it may change type promotion behavior and (see below) hurts performance.
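To illustrate the type-promotion hazard, here is a minimal sketch (not PyTorch internals; dtype names and the simplified category model are assumptions loosely based on PyTorch's documented promotion rules). A plain Scalar is a "wrapped number": when its category (e.g. floating) outranks the tensor operand's, the result is promoted only to the *default* dtype of that category. A tensor produced by `c10::scalar_to_tensor` is an ordinary 0-dim tensor instead, so its concrete dtype (float64 for a double Scalar) can leak into the result.

```python
# Simplified model of dtype promotion for op(dimensioned_tensor, scalar).
# NOT the real implementation; a sketch of the documented rules.

CATEGORY = {"int64": 1, "float16": 2, "float32": 2, "float64": 2}
DEFAULT_DTYPE = {2: "float32"}  # default floating-point dtype, as in torch

def promoted_dtype(tensor_dtype, scalar_dtype, scalar_materialized):
    """Result dtype when a dimensioned tensor meets a scalar operand.

    scalar_materialized=True models the scalar having been turned into
    a 0-dim tensor (as c10::scalar_to_tensor does); False models a
    wrapped number (a true Scalar argument).
    """
    t_cat, s_cat = CATEGORY[tensor_dtype], CATEGORY[scalar_dtype]
    if s_cat <= t_cat:
        # Same or lower category: the dimensioned tensor's dtype wins
        # regardless of how the scalar is passed.
        return tensor_dtype
    if scalar_materialized:
        # 0-dim tensor: its own concrete dtype drives the result.
        return scalar_dtype
    # Wrapped number: only the category promotes; width is ignored.
    return DEFAULT_DTYPE[s_cat]

# int64 tensor ** 2.0 (wrapped double scalar) -> float32
assert promoted_dtype("int64", "float64", False) == "float32"
# ...but if the scalar is first materialized as a tensor -> float64
assert promoted_dtype("int64", "float64", True) == "float64"
# Within the same category the dimensioned tensor wins either way.
assert promoted_dtype("float16", "float64", False) == "float16"
```

On the performance side, materializing the scalar also forces an allocation (and, for a CUDA operand, a host-to-device copy) on every call, where a Scalar argument could have been folded directly into the kernel.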
cc @ezyang @gchanan @zou3519 @bdhirsh @jbschlosser @ngimel @VitalyFedyunin