Conversation
jjsjann123 left a comment:
Quick question on TensorView constructor.
jjsjann123 left a comment:
A few comments here and there.
```cpp
int no_broadcast_i = 0;
for (const auto i : c10::irange(ids.size())) {
  if (!ids.at(i)->isBroadcast()) {
    full2nob_map.at(i) = no_broadcast_i++;
```
nitpick: does `std::vector<size_t>` give zero-initialized values? I'm a little uncomfortable with us leaving them here, since a stale entry might accidentally map to axis 0. Can we default-initialize everything to -1?
Actually, the way we are accessing these, I think an `unordered_map` makes more sense in terms of readability. It would also make it easier to catch bugs in accessing the mapping.
I agree that `unordered_map` provides better readability and error checking, but isn't `unordered_map` much slower than `vector`? I changed the code to initialize the vector with `std::numeric_limits<size_t>::max()`, so if an unexpected item is accessed, it will almost always lead to an out-of-bounds error or a segfault.
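As a rough Python sketch of the mapping being discussed (the real code is C++ in nvFuser; names here are hypothetical), using a sentinel so a broadcast axis can never silently alias axis 0:

```python
# Hypothetical sketch of the full-domain -> non-broadcast-domain index map.
# A sentinel (None here, std::numeric_limits<size_t>::max() in the C++ code)
# marks broadcast axes so a stale entry cannot accidentally map to axis 0.

SENTINEL = None  # stands in for std::numeric_limits<size_t>::max()

def build_full2nob_map(is_broadcast):
    """Map each full-domain axis index to its non-broadcast position."""
    full2nob = [SENTINEL] * len(is_broadcast)
    no_broadcast_i = 0
    for i, bcast in enumerate(is_broadcast):
        if not bcast:
            full2nob[i] = no_broadcast_i
            no_broadcast_i += 1
    return full2nob

# e.g. axes (concrete, broadcast, concrete) -> [0, SENTINEL, 1]
```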
It seems that many of the changes are because the contiguity vector now only holds flags for non-broadcast domains. I wonder if it could be simpler if we kept the contiguity vector with flags for all domains and just changed the definition of the flag. IIRC, if the flag is true, it means the stride of the domain can be calculated as the stride of the next inner domain multiplied by the extent of the inner domain. If we changed the definition of "the next inner domain" to the next non-broadcast inner domain, I think we should be able to get the same benefit of this PR without doing this change. Just my two cents.
I agree that keeping the flag for broadcast domains could still have the benefit of ignoring broadcasts in its definition. But I don't think it would make this diff easier. The only saving is a few
Hahaha, I expected this answer.
```cpp
if ((*it)->isBroadcast()) {
  if (inner_most_id == nullptr) {
    inner_most_id = *it;
  }
```
It seems this function ignores broadcast domains if there are any non-broadcast domains. If the tensor only has broadcast domains, it returns the innermost broadcast domain. I don't remember why we're doing this.
I am not sure either. But this would only happen on all-broadcast tensors, and this function is only used in vectorization and transpose detection. Loosely speaking, if a tensor is all-broadcast, it should be a no-op in vectorization/transpose analysis, so I would speculate this change is safe.
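The behavior being discussed can be sketched in Python like this (a hedged stand-in for the C++ quoted above; the `IterDomain` class and function name here are hypothetical, and only the broadcast flag matters):

```python
from dataclasses import dataclass

# Minimal stand-in for nvFuser's IterDomain; only the broadcast flag matters.
@dataclass
class IterDomain:
    name: str
    is_broadcast: bool

def inner_most_nonbroadcast_id(ids):
    """Return the innermost non-broadcast domain.

    If every domain is a broadcast, fall back to the innermost broadcast
    domain, mirroring the behavior described above.
    """
    innermost_broadcast = None
    for id_ in reversed(ids):  # scan from innermost to outermost
        if id_.is_broadcast:
            if innermost_broadcast is None:
                innermost_broadcast = id_  # remember innermost broadcast
        else:
            return id_  # first non-broadcast seen from the inside wins
    return innermost_broadcast
```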
Looks like my issues are all resolved. I'm leaving it to @naoyam to stamp on this one.
```python
from nvfuser import compute_contiguity
```

```diff
- return compute_contiguity(shape, strides)
+ return tuple(compute_contiguity(shape, strides))
```
Looks like I need to change this. @jjsjann123
Should I upstream this?
@naoyam Indeed, instead of making contiguity store redundant unused booleans, we can still make contiguity have the same size as the rfactor domain by making contiguity a
In csarofeen#2517 the return value of `compute_contiguity` is changed from tuple to list. This PR handles that change. Pull Request resolved: #96218 Approved by: https://github.com/jjsjann123, https://github.com/davidberard98
Sounds good to me.
Briefly brought up this conversation with the frontend team. There are some opinions on how we should expose the contiguity flag on the frontend. Pointing @kevinstephano here for visibility.
The change to the Python frontend is linked above in #2561.
Currently, contiguity has the same size as the rfactor domain of a tensor. In this PR, I am changing it to the size of `TensorDomain::noBroadcasts(rfactor_dom)`. With this change, "the contiguity of a broadcast/expand dimension" is no longer meaningful by definition, and `contiguity[i]` is `true` if and only if the dimension is memory dense with the next non-broadcasting dimension.

For example, if I have a tensor `torch.zeros(4, 1, 3).expand(-1, 10, -1)`, before this change the contiguity of this tensor will be `(false, false, true)`, and after the change it will be `(true, true)`.

The reason for this change is that we are more interested in whether a non-broadcasting dimension is memory dense with its next non-broadcasting dimension. In the example above, we are interested in whether 4 and 3 are memory dense. We are not interested in whether 10 and 3 are memory dense, because by definition they are trivially not. In this example, we want to vectorize 4; however, the current contiguity design is blocking us from doing so.
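The new definition can be sketched in plain Python (a hedged illustration, not the actual nvFuser `compute_contiguity` implementation; it approximates "broadcast dimension" as "stride 0", whereas in nvFuser this is a property of the IterDomain):

```python
def contiguity_no_broadcast(sizes, strides):
    """Contiguity flags over non-broadcast dims only (stride 0 ~ broadcast).

    flags[i] is True iff the dim is memory dense with the next inner
    non-broadcast dim; the innermost non-broadcast dim must have stride 1.
    """
    # Drop broadcast dims (approximated here as stride-0 dims).
    nob = [(sz, st) for sz, st in zip(sizes, strides) if st != 0]
    flags = []
    for i, (sz, st) in enumerate(nob):
        if i + 1 < len(nob):
            inner_sz, inner_st = nob[i + 1]
            flags.append(st == inner_st * inner_sz)
        else:
            flags.append(st == 1)
    return tuple(flags)

# torch.zeros(4, 1, 3).expand(-1, 10, -1) has shape (4, 10, 3) and
# strides (3, 0, 1), so this yields (True, True), as described above.
```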
Currently, our definition of the contiguity of broadcasting dimensions, and of the dimension before a broadcasting dimension, is vague and not well formalized. For example, if I have shape `(4, 1, 6)` and stride `(4*999999, 999999, 1)`, then on the one hand our system will calculate its contiguity as `(true, false, true)`; however, on the other hand, our indexing will collapse the index of dim 0 with dim 2 because it ignores broadcasts (this is the root cause of #2169). I do not consider this an indexing bug. Instead, I consider it an ambiguity in the definition of contiguity, and my design change is an effort to remove this ambiguity.

See also: #2169, #2049
Fixes #2169