❓ Questions and Help
Broadcast is not allowed on an output TensorView. There is a check that detects this and reports: `[output_tv] cannot be registered as an output as it has a broadcast axis`.
The current concern:
The code snippet below triggers a `TORCH_CHECK` failure:

```cpp
TensorView* t1 = makeDummyTensor(1);
TensorView* t2 = broadcast(t1, {false, true});
fusion.addInput(t1);
fusion.addOutput(t2);
// ...
```
If we explicitly mark `t2` as broadcast on dimension 1, we assume its corresponding stride is 0 (because broadcast elements map to the same physical memory location).
However, since `t2` is an output tensor, its strides are passed to the kernel at runtime (the generated kernel would look something like this):

```cpp
void kernel(Tensor<float, 1> T1, Tensor<float, 2> T2) {
  // ...
}
```
Hence, marking an I/O TensorView as broadcast violates the contract that I/O tensor strides are provided at runtime. We cannot generate a safe kernel that behaves as a user of the generated code would expect.
If we later find use cases where broadcasting on an output saves memory bandwidth in the generated kernel, we can revisit this topic.