```python
if mixup_or_cutmix:
    batch_transform = transforms.Compose(
        [
            WrapIntoFeatures(),
```
Don't we have to apply WrapIntoFeatures unconditionally, regardless of whether we use mixup/cutmix?
The problem is that default_collate does not respect tensor subclasses. Since we apply this transform after collation, we need to wrap here. Of course we could also wrap earlier, but it is not necessary: the input is a plain tensor, which the batch transform treats as an image by default, and an integer label, which is ignored completely.
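The ordering described above can be sketched in a few lines. Note that `Image`, `wrap_into_features`, and `collate_fn` here are hypothetical stand-ins I made up for illustration, not the actual prototype `features.Image` or reference code:

```python
import torch
from torch.utils.data import default_collate


class Image(torch.Tensor):
    """Hypothetical stand-in for a feature tensor subclass."""


def wrap_into_features(batch):
    # Re-wrap the collated plain tensor so later batch transforms
    # such as mixup/cutmix can dispatch on the subclass.
    images, labels = batch
    return images.as_subclass(Image), labels


def collate_fn(samples):
    # Collate first (this yields plain tensors), then wrap.
    return wrap_into_features(default_collate(samples))


samples = [(torch.rand(3, 8, 8), 0), (torch.rand(3, 8, 8), 1)]
images, labels = collate_fn(samples)
```

With this ordering, wrapping each sample before collation would be redundant work for the classification case, since collation would produce plain tensors again anyway.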
My question is: what if I don't use mixup or cutmix? Shouldn't we wrap the data into features.Image anyway? I might be missing something here. My main point is that, since we are testing the new API, we should probably wrap all inputs using their appropriate types and see how the new kernels behave, rather than relying on their default/legacy pure-tensor implementations.
We can only wrap after we have converted from PIL. This happens fairly late in the transform:
I remember @vfdev-5 noting that on CPU the PIL kernels are faster (I don't remember whether there was a special case or other constraints; please fill in the blanks). Thus, if we want to optimize for speed, we should probably leave it as is. No strong opinion, though.
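The ordering constraint can be sketched without torchvision. Everything below is a made-up, self-contained stand-in (`FakePIL` for a PIL image, `resize_pil`/`pil_to_tensor`/`wrap_into_features` for pipeline stages); only the idea of wrapping into a feature subclass comes from the discussion above:

```python
import torch


class Image(torch.Tensor):
    """Hypothetical stand-in for the prototype features.Image."""


class FakePIL:
    """Minimal stand-in for a PIL.Image so the sketch is self-contained."""

    def __init__(self, height, width):
        self.size = (width, height)  # PIL convention: size is (W, H)


def resize_pil(img):
    # Placeholder for a PIL-backed kernel: fast on CPU, but its output
    # is still a PIL-like object, so nothing can be wrapped yet.
    return FakePIL(8, 8)


def pil_to_tensor(img):
    # Conversion from PIL happens late in the pipeline; only from this
    # point on is there a tensor that could be wrapped.
    w, h = img.size
    return torch.zeros(3, h, w, dtype=torch.uint8)


def wrap_into_features(t):
    return t.as_subclass(Image)


# Wrapping must come last, after the PIL-based stages:
sample = wrap_into_features(pil_to_tensor(resize_pil(FakePIL(16, 16))))
```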
I've added the needed changes for the detection references.

I've run the references for a few iterations with the following parameters to confirm they work:

**Tensor Backend + antialias=True**

Reverifying the new API after the speed optimizations. Reference runs at #6433 (comment) and #6433 (comment).

**Classification**

- Augmentation: ta_wide + random erasing + mixup + cutmix. Using githash 959af2d. Result: similar accuracy, 11% faster than unoptimized V2.
- Augmentation: aa + random erasing. Using githash 959af2d. Result: similar accuracy (improvement not statistically significant), 13% faster than unoptimized V2.

**Detection**

- Augmentation: multiscale. Using githash 959af2d. Result: similar accuracy and speed.
- Augmentation: ssdlite. Using githash 959af2d. Result: similar accuracy, 8% faster than unoptimized V2.
- Augmentation: ssd. Using githash 959af2d. Result: similar accuracy, 4% faster than unoptimized V2.
- Augmentation: lsj + copypaste. Using githash 959af2d. Result: similar accuracy and speed.

**Segmentation**

- Using githash 959af2d. Result: similar accuracy and speed.

**Video**

- New recipe. Using githash 959af2d. Result: similar accuracy, faster speed. It's a bit harder to estimate the improvement because the logs indicate an IO slowdown caused by OnTap during at least 3 epochs. The new version is consistently 6-7 minutes per epoch faster than the old, which translates to roughly a 7-8% improvement.
I don't want to merge this PR. It's more like a feature branch that we can use for discussion. For the actual port, we can either clean up this PR or use it as a starting point for another one.