Add type promotion support in sparse tensors #26051
Conversation
@ezyang do you think this would cause any problems with your multiple dispatch patches? I'm thinking of the cases of mixed sparse/strided matrices.
```cpp
auto promotedType = promoteTypes(t.scalar_type(), src.scalar_type());
TORCH_CHECK(canCast(promotedType, r_.scalar_type()), "Can't convert result type ", promotedType, " to output ", r_.scalar_type());
// ...
if (s_values_.scalar_type() != promotedType) {
```
Err... what happens if you did all of this with s instead of s_values? Would it copy the indices tensor?

I think it would, yeah. I think it'd also be a bit messier converting src directly, since the code below still needs the values and indices separately, and src is a reference, so there'd be an additional temporary variable that isn't really needed.
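For context on the promoteTypes/canCast pair in the diff above: the first call computes the common dtype of the two inputs, and the second checks that this dtype may be written into the output tensor. A rough Python sketch of that contract, using a toy linear lattice and category table rather than ATen's actual promotion matrix:

```python
# Toy stand-ins for ATen's promoteTypes/canCast. Real ATen uses a full
# promotion matrix (including half, complex, etc.); this simplified
# lattice and category table only illustrate the shape of the check.
_ORDER = ["bool", "uint8", "int32", "int64", "float32", "float64"]
_CATEGORY = {"bool": 0, "uint8": 1, "int32": 1, "int64": 1,
             "float32": 2, "float64": 2}

def promote_types(a, b):
    # Common dtype: the "larger" of the two in the lattice.
    return max(a, b, key=_ORDER.index)

def can_cast(src, dst):
    # canCast is category-based: casting is allowed into the same or a
    # higher category (bool < integral < floating), so float64 -> float32
    # is fine but float64 -> int64 is not.
    return _CATEGORY[dst] >= _CATEGORY[src]

# float32 + float64 promotes to float64; writing that into an integer
# output would trip the TORCH_CHECK above.
promoted = promote_types("float32", "float64")
assert promoted == "float64"
assert can_cast(promoted, "float32")      # same category: allowed
assert not can_cast(promoted, "int64")    # downcast to integral: rejected
```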
Yes, this will conflict. I've been trying to land multiple dispatch since yesterday. I can give advice on where code will have moved afterwards, but all in all it should be pretty straightforward.
@ezyang: just to clarify, you are suggesting we get your patch in first, correct?

Yes, that would be preferred!
@pytorchbot retest this please
Differential Revision: [D17556195](https://our.internmc.facebook.com/intern/diff/D17556195)
ghstack-source-id: 229bc9a
Pull Request resolved: pytorch#26051
```cpp
TORCH_CHECK(!r.is_cuda(), "add: expected 'out' to be CPU tensor, but got CUDA tensor");
TORCH_CHECK(!src.is_cuda(), "add: expected 'other' to be a CPU tensor, but got a CUDA tensor");
// ...
auto commonDtype = promoteTypes(t.scalar_type(), src.scalar_type());
```
This seems wrong, because t and src could be zero-dim tensors:

```python
i = torch.zeros([0, 1], dtype=torch.int64)  # indices must be integral
v = torch.tensor([3.14], dtype=torch.float32)
sparse_scalar = torch.sparse_coo_tensor(i, v, [])
```

Are we saying that sparse zero-dim scalars belong in a different category for type promotion?
Ah, I didn't realize it was possible to construct a zero-dim sparse tensor.
It is, but on the other hand, I'm not sure who does.
Actually, I just tried this: we don't support adding dimensioned tensors to zero-dim tensors. The current implementation of sparse tensors requires t.dim() == src.dim(), so as long as the dims match, promoteTypes will be correct anyway. We should probably expand sparse capabilities, but that'll be another issue.
Cool, then there should be an assert here before the promoteTypes call, so it's clear that we're assuming the inputs are in the same category.
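The suggested guard could be sketched like this (Python pseudocode mirroring the C++; the function name is hypothetical and promote_types is a toy stand-in for ATen's promoteTypes):

```python
# Hypothetical sketch of the suggested guard: make the "inputs are in
# the same category" assumption explicit before computing the common
# dtype, instead of relying on it silently.
_ORDER = ["bool", "int32", "int64", "float32", "float64"]

def promote_types(a, b):
    # Simplified stand-in for ATen's promoteTypes.
    return max(a, b, key=_ORDER.index)

def common_dtype_for_sparse_add(t_dim, t_dtype, src_dim, src_dtype):
    # Zero-dim sparse scalars would fall into a different promotion
    # category; assert so the assumption is visible.
    assert t_dim == src_dim, "sparse add assumes t.dim() == src.dim()"
    return promote_types(t_dtype, src_dtype)
```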
```cpp
get_sparse_impl(r)->set_indices_and_values_unsafe(r_indices, r_values);
// ...
if (commonDtype != r_values.scalar_type()) {
  r_values = r_values.to(commonDtype);
```
We can speed this up by allocating r_values = torch.empty_like(r_values); we don't actually care about the data that is initially in the result tensor.
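The point, illustrated with toy buffers (plain Python lists standing in for tensors, since the issue is allocation vs. copy rather than anything torch-specific):

```python
# Toy illustration: Tensor.to(dtype) allocates AND copies every element,
# while torch.empty_like only allocates. When the kernel overwrites the
# buffer anyway, the copy is wasted work.

def to_dtype(values, convert):
    # Analogue of r_values.to(commonDtype): allocate + convert-copy.
    return [convert(v) for v in values]

def empty_like(values):
    # Analogue of torch.empty_like: allocate only, contents unspecified.
    return [None] * len(values)

src = [1, 2, 3]
converted = to_dtype(src, float)   # pays for a copy we don't need
scratch = empty_like(src)          # same size, no copy
assert converted == [1.0, 2.0, 3.0]
assert len(scratch) == 3
```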
```cpp
);
// ...
if (r.scalar_type() != commonDtype) {
  r_values = r_values.to(r.scalar_type());
```
Instead of doing a .to, we can save memory here by copying from r_values directly into the values tensor, instead of creating a new tensor and assigning it.
I guess the values tensor of the result might not be large enough to hold all of that...
I think for this one we may still need the .to; otherwise we would have to create a new empty tensor first to copy the values into anyway, and I assume that's basically the implementation of .to. r_values here is the promoted version (e.g. for float += double, r_values is a double tensor, but we'd need to set r._values() to a yet-uncreated float tensor containing those values).
r is the result tensor. It's possible that r already has data, and r._values() and r._indices() are tensors with some non-trivial storage. If that is the case, instead of discarding the old values and indices in favor of the new ones, we could copy r_values directly into r._values(), assuming there is enough memory in r._values().
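A sketch of that write-back idea with toy buffers (Python lists; copy_into is hypothetical and only mirrors the shape of r._values().copy_(r_values)):

```python
# Hypothetical sketch: reuse the result's existing values buffer when it
# has room, instead of allocating a new tensor via .to().

def copy_into(result_values, promoted_values, convert):
    if len(result_values) >= len(promoted_values):
        # Enough storage: convert-copy in place, analogous to
        # r._values().copy_(r_values).
        for i, v in enumerate(promoted_values):
            result_values[i] = convert(v)
        return result_values
    # Not enough room: fall back to a fresh allocation, like .to(dtype).
    return [convert(v) for v in promoted_values]

buf = [0, 0, 0]
out = copy_into(buf, [1.5, 2.5], int)
assert out is buf and buf[:2] == [1, 2]   # reused the existing storage
```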
```cpp
});
if (r.scalar_type() != commonDtype) {
  r.copy_(promoted_result);
}
```
The implementation is not the nicest; to do type promotion, the current implementation adds the logic to each possible pathway. Instead, we could write a wrapper function around the implementation that promotes the input sparse tensors to the common dtype, calls the implementation (which operates on tensors of the same dtype), and then copies the result back into the result tensor.
Yeah, I'd like to clean this up eventually, but in doing so I think I'd want to extend support for more combinations of tensors too. Right now arithmetic ops on sparse tensors are pretty limited, and I didn't want to try to make the promotion logic generic without actually making the availability of sparse ops more generic.
I don't see why "adding more support for sparse ops" and "making the implementation more generic" have to go together. Right now, the implementation of add_out_sparse_cpu is hard to review because there are two code paths that need to be considered: one for when the values are contiguous, and one for when they're not.

It would be easier to review (and the implementation would look better) if we did something like the following (pseudocode):
```
add_out_sparse_cpu(result, tensor, other) {
  // do type promotion
  common_dtype = result_type(tensor, other)
  result_tmp = empty_like(result, common_dtype)
  tensor_tmp = tensor.to(common_dtype)
  other_tmp = other.to(common_dtype)
  // call the base op
  add_out_sparse_cpu_no_type_promotion(result_tmp, tensor_tmp, other_tmp)
  // shove the result back into its original type.
  result.set_inplace(result_tmp.indices(), result_tmp.values().to(result.dtype), result_tmp.sizes())
}
```
In general, putting the type promotion code in helper functions or a wrapper function like this would help, because the bodies of add_out_sparse_cpu and the other functions are really, really long.
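The wrapper pattern being suggested, made concrete with a toy dense "tensor" (a dtype name plus a Python list) instead of ATen types; this is a sketch of the structure, not the PR's code:

```python
# Toy wrapper demonstrating "promote, call a same-dtype kernel, cast
# back": the promotion logic lives in one place instead of on every path.
_ORDER = ["bool", "int32", "int64", "float32", "float64"]
_PY = {"bool": bool, "int32": int, "int64": int,
       "float32": float, "float64": float}

def result_type(a_dtype, b_dtype):
    # Simplified stand-in for ATen's type promotion.
    return max(a_dtype, b_dtype, key=_ORDER.index)

def to(values, dtype):
    # Analogue of Tensor.to(dtype) for a plain list of values.
    return [_PY[dtype](v) for v in values]

def add_no_type_promotion(a, b):
    # The "base op": may assume both inputs already share a dtype.
    return [x + y for x, y in zip(a, b)]

def add_with_promotion(a, a_dtype, b, b_dtype, out_dtype):
    common = result_type(a_dtype, b_dtype)
    summed = add_no_type_promotion(to(a, common), to(b, common))
    return to(summed, out_dtype)   # shove the result back into the output dtype

assert add_with_promotion([1, 2], "int64", [0.5, 0.5], "float64",
                          "float64") == [1.5, 2.5]
```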
```python
self.assertTrue(actual, expected)
self.assertTrue(actual.dtype == torch.bool)
# ...
def test_sparse(self):
```
If I'm reading the code correctly, for add and mul, type promotion is implemented for the following:

- add_out_sparse (CPU + CUDA)
- add_out_sparse_dense_sparse (CPU only)
- mul_out_sparse (CPU + CUDA)

Why is there a discrepancy in what is supported in add_out_sparse_dense_sparse?
```python
self.assertEqual((s + s.to(torch.double)).to_dense(), t + t)
self.assertEqual((s + s.to(torch.double)).dtype, torch.double)
# ...
inplace = s.to(torch.float, copy=True)
```
There are a lot of test cases here. I imagine that when we add type promotion to more sparse ops, we will not want to just keep appending lines to this one test case (test_sparse). Can we split it up into semantically meaningful tests for the different ops and overloads?
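A sketch of what that split might look like (hypothetical method names and a stand-in helper, not the PR's actual tests):

```python
import unittest

def promoted_add_dtype(a, b):
    # Stand-in for exercising the real sparse op; the real suite would
    # perform actual sparse adds/muls and inspect the result dtype.
    order = ["bool", "int32", "int64", "float32", "float64"]
    return max(a, b, key=order.index)

class TestSparseTypePromotion(unittest.TestCase):
    # One semantically named test per op/overload, so adding promotion
    # to a new op means adding a method, not growing one giant test.
    def test_add_sparse_sparse(self):
        self.assertEqual(promoted_add_dtype("float32", "float64"), "float64")

    def test_add_sparse_dense(self):
        self.assertEqual(promoted_add_dtype("int64", "float32"), "float32")

    def test_mul_sparse_sparse(self):
        self.assertEqual(promoted_add_dtype("int32", "int64"), "int64")
```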
FYI: this was fake-merged due to a bug in ghstack.