Add PermuteDimensions and TransposeDimensions transforms#6800
Add PermuteDimensions and TransposeDimensions transforms#6800datumbox merged 8 commits intopytorch:mainfrom
Conversation
pmeier
left a comment
There was a problem hiding this comment.
I agree we need this functionality somehow, but I'm not sure these general transformations are the way to go. They have very generic names and one could reasonable deduce that PermuteDimensions also works for features.BoundingBox'es or the like. Worse, one can even put features.BoundingBox in the permutation dict just for it to be silently ignored.
Basically these transforms suffer from the same fate as ConvertDtype from #6783. We basically need a name for "image or video" that can identify these transforms better. I'm going to paddle one idea that @vfdev-5 started in #6783 (comment) and I already mentioned in our offline conversation: right now we use the term "input" (or inpt) to be exact to mean Union[features._Feature, _is_simple_tensor, PIL.Image.Image]. However, "input" would be a good name for Union[features.Image, features.Video, _is_simple_tensor, PIL.Image.Image] and could clearly separate from "target" or "annotation" like Union[features.Label, features.Mask, features.BoundingBox]. We could change what we currently call inpt to feature. That would free the term "input" and we could use that for ConvertInputDtype, PermuteInputDimensions, ... Of course torch.Tensor's and PIL.Image.Image's are not features._Feature's but they have the same purpose. We could even for feature_like as a name to avoid confusion. WDYT?
Regarding the way forward: do we actually need both transforms or is one of them sufficient? Given that our references currently only need to swap two dimensions
wouldn't TransposeDimensions be sufficient here? Although it can only swap two dimensions at a time, it has one major upside: you don't have to know the whole shape. For example image.transpose(-1, -2) is valid for arbitrary batch sizes while I need to know the number of dimensions if I use image.permute(...) although I don't need to know.
|
@pmeier Let's take the naming discussion out side of this PR because these are transforms needed ASAP to reach parity of functionality for internal use-cases (ClassyVision). I confirm we need I'll move the |
Part of #6768
This PR adds the
PermuteDimensionsandTransposeDimensionstransforms which are useful for video training which is often needed to move the temporal dimension around.Because the end tensors can have their dimensions completely messed, meta-data information from Images and Videos are not retrained and instead a simple tensor is assumed.
cc @vfdev-5 @bjuncek @pmeier