🚀 Feature
Allow users to return multiple values in the CPU kernel.
Motivation
In the NumPy-like functionality rollup list #38349, there are some functions that require returning two tensors, such as divmod and frexp.
They are harder to implement than other NumPy-like functions, because the current CPU elementwise kernels (cpu_kernel, cpu_kernel_vec and cpu_serial_kernel) support only one output tensor.
Adding a new kernel function in aten/src/ATen/native/cpu/Loops.h that supports multiple outputs could decrease the complexity of implementing such NumPy-like functions, and may help developers implement torch functions with multiple outputs more conveniently in the future.
Pitch
PR: #51097
Implement a new kernel function, cpu_kernel_multiple_outputs. Instead of returning a single scalar, the lambda passed to it returns its output values as a std::tuple.
Example code:
auto iter = at::TensorIteratorConfig()
    .add_output(out1)
    .add_output(out2)
    .add_input(in1)
    .add_input(in2)
    .build();

at::native::cpu_kernel_multiple_outputs(iter,
    [=](float a, float b) -> std::tuple<float, float> {
        float add = a + b;
        float mul = a * b;
        return std::tuple<float, float>(add, mul);
    }
);
The out1 tensor will equal torch.add(in1, in2), while out2 will equal torch.mul(in1, in2).
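To make the tuple-to-outputs mapping concrete, here is a minimal standalone sketch that does not use ATen at all. The helper apply_two_outputs and the wrapper add_and_mul are hypothetical names made up for illustration; the sketch only assumes that the i-th tuple element is written to the i-th output, and it ignores dtype casting, strides, and parallelism.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <vector>

// Hypothetical helper (not part of ATen): applies `op` to each pair of input
// elements and scatters the two tuple elements into two separate outputs.
template <typename Op>
void apply_two_outputs(const std::vector<float>& in1,
                       const std::vector<float>& in2,
                       std::vector<float>& out1,
                       std::vector<float>& out2,
                       Op op) {
  assert(in1.size() == in2.size());
  out1.resize(in1.size());
  out2.resize(in1.size());
  for (std::size_t i = 0; i < in1.size(); ++i) {
    std::tuple<float, float> result = op(in1[i], in2[i]);
    out1[i] = std::get<0>(result);  // first tuple element -> first output
    out2[i] = std::get<1>(result);  // second tuple element -> second output
  }
}

// Same lambda as in the example above: elementwise sum and product.
void add_and_mul(const std::vector<float>& in1,
                 const std::vector<float>& in2,
                 std::vector<float>& out1,
                 std::vector<float>& out2) {
  apply_two_outputs(in1, in2, out1, out2,
      [](float a, float b) -> std::tuple<float, float> {
        return std::tuple<float, float>(a + b, a * b);
      });
}
```

Calling add_and_mul on two equally sized vectors fills out1 with the elementwise sums and out2 with the elementwise products.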
Alternatives
Without such a kernel function, developers have to fall back on the more primitive for_each function of TensorIterator.
This requires them to manually handle details such as data type casting and offset calculation via strides.
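For a sense of the manual work involved, the following standalone sketch imitates the kind of loop body such a primitive approach calls for. It is an illustration under assumptions, not ATen code: the function names are hypothetical, the (data, strides, n) signature only mirrors the general shape of a strided loop callback, the operand order (outputs first, then inputs) is assumed, and every operand is assumed to be float so no casting is shown.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical hand-written loop body. Assumed layout: data[0], data[1] are
// the outputs and data[2], data[3] are the inputs. Strides are in bytes.
void add_mul_loop(char** data, const std::int64_t* strides, std::int64_t n) {
  for (std::int64_t i = 0; i < n; ++i) {
    // Offsets must be computed by hand from the byte strides.
    const float a = *reinterpret_cast<const float*>(data[2] + i * strides[2]);
    const float b = *reinterpret_cast<const float*>(data[3] + i * strides[3]);
    *reinterpret_cast<float*>(data[0] + i * strides[0]) = a + b;
    *reinterpret_cast<float*>(data[1] + i * strides[1]) = a * b;
  }
}

// Hypothetical driver that sets up pointers and strides for contiguous data.
void run_add_mul(std::vector<float>& in1, std::vector<float>& in2,
                 std::vector<float>& out1, std::vector<float>& out2) {
  assert(in1.size() == in2.size());
  assert(out1.size() == in1.size() && out2.size() == in1.size());
  char* data[4] = {
      reinterpret_cast<char*>(out1.data()),
      reinterpret_cast<char*>(out2.data()),
      reinterpret_cast<char*>(in1.data()),
      reinterpret_cast<char*>(in2.data())};
  const std::int64_t step = static_cast<std::int64_t>(sizeof(float));
  const std::int64_t strides[4] = {step, step, step, step};
  add_mul_loop(data, strides, static_cast<std::int64_t>(in1.size()));
}
```

Even in this simplified form, the pointer arithmetic and the operand bookkeeping are boilerplate that a multiple-output kernel helper would hide behind a plain lambda.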
Additional context
cc @mruberry @heitorschueroff
See also the GPU counterpart, gpu_kernel_multiple_outputs, from PR #37969 (Implement gpu_kernel_multiple_outputs).