Added antialias flag to interpolate (CUDA, bilinear and bicubic) by vfdev-5 · Pull Request #70930 · pytorch/pytorch

vfdev-5 · 2022-01-06T17:29:54Z

Description:

Added antialias flag to interpolate (CUDA)
- forward and backward for bilinear and bicubic modes
- added tests

Previous PR for CPU bilinear, #65142
Previous PR for CPU bicubic, #68819
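
For context, here is a minimal pure-Python sketch of what antialiasing means for a linear (triangle) filter when downsampling: the filter support is widened by the downscale factor, so each output sample averages all the input samples it covers. It is written in the spirit of PIL's resize, which serves as the reference in the benchmarks below. It is an illustration only, not the CUDA kernel from this PR, and `triangle` and `resize_1d_antialias` are hypothetical names.

```python
def triangle(x):
    """Linear (triangle) filter with support 1."""
    x = abs(x)
    return 1.0 - x if x < 1.0 else 0.0


def resize_1d_antialias(src, out_size):
    """Resize a 1D list of floats with an antialiased linear filter.

    Illustrative sketch: the kernel is stretched by the scale factor
    only when downsampling (scale > 1).
    """
    in_size = len(src)
    scale = in_size / out_size
    filterscale = max(scale, 1.0)      # widen the kernel only when downsampling
    support = 1.0 * filterscale        # triangle filter support is 1
    out = []
    for i in range(out_size):
        center = (i + 0.5) * scale     # output pixel center in input coordinates
        xmin = max(int(center - support + 0.5), 0)
        xmax = min(int(center + support + 0.5), in_size)
        weights = [
            triangle((x + 0.5 - center) / filterscale)
            for x in range(xmin, xmax)
        ]
        total = sum(weights)           # normalize so the weights sum to 1
        acc = sum(w * src[x] for w, x in zip(weights, range(xmin, xmax)))
        out.append(acc / total)
    return out


# A high-frequency signal: without antialiasing, a 4x downsample could
# pick up only zeros or only ones; with it, values stay near the mean.
signal = [0.0, 1.0] * 8
downsampled = resize_1d_antialias(signal, 4)
print(downsampled)
```

In PyTorch itself the behaviour is selected with the `antialias` argument of `torch.nn.functional.interpolate`, which this PR extends to CUDA tensors.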

Benchmarks

Bilinear forward pass, PIL, PTH CPU and PTH CUDA

Code: https://gist.github.com/vfdev-5/b173761a567f2283b3c649c3c0574112


Torch version: 1.11.0a0+gitd032369
Torch config: PyTorch built with:
  - GCC 9.3
  - C++ Version: 201402
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - CPU capability usage: AVX2
  - CUDA Runtime 11.1
  - NVCC architecture flags: -gencode;arch=compute_61,code=sm_61
  - CuDNN 8.0.5
  - Build settings: BUILD_TYPE=Release, CUDA_VERSION=11.1, CUDNN_VERSION=8.0.5, CXX_COMPILER=/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_PYTORCH_QNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.11.0, USE_CUDA=1, USE_CUDNN=1, USE_EIGEN_FOR_BLAS=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=OFF, USE_MKLDNN=OFF, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=0, USE_OPENMP=ON, USE_ROCM=OFF, 

Num threads: 8
[----------------------------------- Downsampling (bilinear): torch.Size([1, 3, 906, 438]) -> (320, 196) -----------------------------------]
                                                  |  Reference, PIL 8.4.0, mode: RGB  |  1.11.0a0+gitd032369 cpu  |  1.11.0a0+gitd032369 cuda
8 threads: ----------------------------------------------------------------------------------------------------------------------------------
      channels_first contiguous torch.float32     |               2851.2              |            874.1          |            57.1          
      channels_last non-contiguous torch.float32  |               2856.1              |           1155.8          |           130.6          

Times are in microseconds (us).

[----------------------------------- Downsampling (bilinear): torch.Size([1, 3, 906, 438]) -> (460, 220) -----------------------------------]
                                                  |  Reference, PIL 8.4.0, mode: RGB  |  1.11.0a0+gitd032369 cpu  |  1.11.0a0+gitd032369 cuda
8 threads: ----------------------------------------------------------------------------------------------------------------------------------
      channels_first contiguous torch.float32     |               3705.9              |           1005.8          |            66.3          
      channels_last non-contiguous torch.float32  |               3742.9              |           1332.8          |           143.5          

Times are in microseconds (us).

[------------------------------------ Downsampling (bilinear): torch.Size([1, 3, 906, 438]) -> (120, 96) -----------------------------------]
                                                  |  Reference, PIL 8.4.0, mode: RGB  |  1.11.0a0+gitd032369 cpu  |  1.11.0a0+gitd032369 cuda
8 threads: ----------------------------------------------------------------------------------------------------------------------------------
      channels_first contiguous torch.float32     |               1768.0              |           725.2           |            77.9          
      channels_last non-contiguous torch.float32  |               1753.7              |           942.5           |           144.0          

Times are in microseconds (us).

[----------------------------------- Downsampling (bilinear): torch.Size([1, 3, 906, 438]) -> (1200, 196) ----------------------------------]
                                                  |  Reference, PIL 8.4.0, mode: RGB  |  1.11.0a0+gitd032369 cpu  |  1.11.0a0+gitd032369 cuda
8 threads: ----------------------------------------------------------------------------------------------------------------------------------
      channels_first contiguous torch.float32     |               9522.6              |           2593.8          |           157.8          
      channels_last non-contiguous torch.float32  |               9513.5              |           3622.7          |           241.5          

Times are in microseconds (us).

[----------------------------------- Downsampling (bilinear): torch.Size([1, 3, 906, 438]) -> (120, 1200) ----------------------------------]
                                                  |  Reference, PIL 8.4.0, mode: RGB  |  1.11.0a0+gitd032369 cpu  |  1.11.0a0+gitd032369 cuda
8 threads: ----------------------------------------------------------------------------------------------------------------------------------
      channels_first contiguous torch.float32     |               2240.1              |           565.5           |            93.3          
      channels_last non-contiguous torch.float32  |               2244.2              |           972.7           |           170.8          

Times are in microseconds (us).

[------------------------- Downsampling (bilinear): torch.Size([1, 1, 906, 438]) -> (320, 196) --------------------------]
                                 |  Reference, PIL 8.4.0, mode: F  |  1.11.0a0+gitd032369 cpu  |  1.11.0a0+gitd032369 cuda
8 threads: ---------------------------------------------------------------------------------------------------------------
       contiguous torch.float32  |              1441.3             |           386.1           |            22.3          

Times are in microseconds (us).

[------------------------- Downsampling (bilinear): torch.Size([1, 1, 906, 438]) -> (460, 220) --------------------------]
                                 |  Reference, PIL 8.4.0, mode: F  |  1.11.0a0+gitd032369 cpu  |  1.11.0a0+gitd032369 cuda
8 threads: ---------------------------------------------------------------------------------------------------------------
       contiguous torch.float32  |              1815.2             |           376.8           |            27.8          

Times are in microseconds (us).

[-------------------------- Downsampling (bilinear): torch.Size([1, 1, 906, 438]) -> (120, 96) --------------------------]
                                 |  Reference, PIL 8.4.0, mode: F  |  1.11.0a0+gitd032369 cpu  |  1.11.0a0+gitd032369 cuda
8 threads: ---------------------------------------------------------------------------------------------------------------
       contiguous torch.float32  |              962.3              |           400.0           |            29.4          

Times are in microseconds (us).

[------------------------- Downsampling (bilinear): torch.Size([1, 1, 906, 438]) -> (1200, 196) -------------------------]
                                 |  Reference, PIL 8.4.0, mode: F  |  1.11.0a0+gitd032369 cpu  |  1.11.0a0+gitd032369 cuda
8 threads: ---------------------------------------------------------------------------------------------------------------
       contiguous torch.float32  |              4749.7             |           910.1           |            63.7          

Times are in microseconds (us).

[------------------------- Downsampling (bilinear): torch.Size([1, 1, 906, 438]) -> (120, 1200) -------------------------]
                                 |  Reference, PIL 8.4.0, mode: F  |  1.11.0a0+gitd032369 cpu  |  1.11.0a0+gitd032369 cuda
8 threads: ---------------------------------------------------------------------------------------------------------------
       contiguous torch.float32  |              1098.1             |           272.0           |            36.4          

Times are in microseconds (us).
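
To make the bilinear forward tables above easier to compare, the following sketch computes speedup ratios from a few of the reported timings (3-channel input, channels_first contiguous, torch.float32). The numbers are copied from the tables; the helper name `speedup` and the dictionary layout are assumptions made for illustration.

```python
# Timings in microseconds, copied from the bilinear forward tables above.
timings_us = {
    "(320, 196)": {"pil": 2851.2, "cpu": 874.1, "cuda": 57.1},
    "(460, 220)": {"pil": 3705.9, "cpu": 1005.8, "cuda": 66.3},
    "(120, 96)": {"pil": 1768.0, "cpu": 725.2, "cuda": 77.9},
}


def speedup(baseline_us, candidate_us):
    """How many times faster the candidate is than the baseline."""
    return baseline_us / candidate_us


speedups = {
    size: {
        "cuda_vs_pil": speedup(t["pil"], t["cuda"]),
        "cuda_vs_cpu": speedup(t["cpu"], t["cuda"]),
    }
    for size, t in timings_us.items()
}

for size, s in speedups.items():
    print(f"{size}: CUDA vs PIL {s['cuda_vs_pil']:.1f}x, "
          f"CUDA vs PTH CPU {s['cuda_vs_cpu']:.1f}x")
```

For the (320, 196) case this gives roughly 50x over PIL and roughly 15x over PTH CPU. These ratios are specific to the hardware and build listed in the config above.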

Bicubic forward pass, PIL, PTH CPU and PTH CUDA

Code: https://gist.github.com/vfdev-5/b173761a567f2283b3c649c3c0574112


Num threads: 8
[------------------------------------ Downsampling (bicubic): torch.Size([1, 3, 906, 438]) -> (320, 196) -----------------------------------]
                                                  |  Reference, PIL 8.4.0, mode: RGB  |  1.11.0a0+gitd032369 cpu  |  1.11.0a0+gitd032369 cuda
8 threads: ----------------------------------------------------------------------------------------------------------------------------------
      channels_first contiguous torch.float32     |               4522.4              |           1406.7          |           170.3          
      channels_last non-contiguous torch.float32  |               4530.0              |           1435.4          |           242.2          

Times are in microseconds (us).

[------------------------------------ Downsampling (bicubic): torch.Size([1, 3, 906, 438]) -> (460, 220) -----------------------------------]
                                                  |  Reference, PIL 8.4.0, mode: RGB  |  1.11.0a0+gitd032369 cpu  |  1.11.0a0+gitd032369 cuda
8 threads: ----------------------------------------------------------------------------------------------------------------------------------
      channels_first contiguous torch.float32     |               5726.4              |           1628.6          |           164.0          
      channels_last non-contiguous torch.float32  |               5722.6              |           1665.6          |           234.7          

Times are in microseconds (us).

[------------------------------------ Downsampling (bicubic): torch.Size([1, 3, 906, 438]) -> (120, 96) ------------------------------------]
                                                  |  Reference, PIL 8.4.0, mode: RGB  |  1.11.0a0+gitd032369 cpu  |  1.11.0a0+gitd032369 cuda
8 threads: ----------------------------------------------------------------------------------------------------------------------------------
      channels_first contiguous torch.float32     |               2909.1              |           1461.5          |           276.9          
      channels_last non-contiguous torch.float32  |               2892.9              |           1458.7          |           345.1          

Times are in microseconds (us).

[----------------------------------- Downsampling (bicubic): torch.Size([1, 3, 906, 438]) -> (1200, 196) -----------------------------------]
                                                  |  Reference, PIL 8.4.0, mode: RGB  |  1.11.0a0+gitd032369 cpu  |  1.11.0a0+gitd032369 cuda
8 threads: ----------------------------------------------------------------------------------------------------------------------------------
      channels_first contiguous torch.float32     |              14699.2              |           4283.9          |           407.1          
      channels_last non-contiguous torch.float32  |              14711.3              |           4321.1          |           477.0          

Times are in microseconds (us).

[----------------------------------- Downsampling (bicubic): torch.Size([1, 3, 906, 438]) -> (120, 1200) -----------------------------------]
                                                  |  Reference, PIL 8.4.0, mode: RGB  |  1.11.0a0+gitd032369 cpu  |  1.11.0a0+gitd032369 cuda
8 threads: ----------------------------------------------------------------------------------------------------------------------------------
      channels_first contiguous torch.float32     |               3467.0              |           980.0           |           339.2          
      channels_last non-contiguous torch.float32  |               3465.2              |           982.3           |           407.8          

Times are in microseconds (us).

[-------------------------- Downsampling (bicubic): torch.Size([1, 1, 906, 438]) -> (320, 196) --------------------------]
                                 |  Reference, PIL 8.4.0, mode: F  |  1.11.0a0+gitd032369 cpu  |  1.11.0a0+gitd032369 cuda
8 threads: ---------------------------------------------------------------------------------------------------------------
       contiguous torch.float32  |              2396.7             |           877.8           |            68.1          

Times are in microseconds (us).

[-------------------------- Downsampling (bicubic): torch.Size([1, 1, 906, 438]) -> (460, 220) --------------------------]
                                 |  Reference, PIL 8.4.0, mode: F  |  1.11.0a0+gitd032369 cpu  |  1.11.0a0+gitd032369 cuda
8 threads: ---------------------------------------------------------------------------------------------------------------
       contiguous torch.float32  |              3068.2             |           777.3           |            64.7          

Times are in microseconds (us).

[-------------------------- Downsampling (bicubic): torch.Size([1, 1, 906, 438]) -> (120, 96) ---------------------------]
                                 |  Reference, PIL 8.4.0, mode: F  |  1.11.0a0+gitd032369 cpu  |  1.11.0a0+gitd032369 cuda
8 threads: ---------------------------------------------------------------------------------------------------------------
       contiguous torch.float32  |              1540.2             |           829.3           |           100.4          

Times are in microseconds (us).

[------------------------- Downsampling (bicubic): torch.Size([1, 1, 906, 438]) -> (1200, 196) --------------------------]
                                 |  Reference, PIL 8.4.0, mode: F  |  1.11.0a0+gitd032369 cpu  |  1.11.0a0+gitd032369 cuda
8 threads: ---------------------------------------------------------------------------------------------------------------
       contiguous torch.float32  |              7919.5             |           1467.8          |           151.6          

Times are in microseconds (us).

[------------------------- Downsampling (bicubic): torch.Size([1, 1, 906, 438]) -> (120, 1200) --------------------------]
                                 |  Reference, PIL 8.4.0, mode: F  |  1.11.0a0+gitd032369 cpu  |  1.11.0a0+gitd032369 cuda
8 threads: ---------------------------------------------------------------------------------------------------------------
       contiguous torch.float32  |              1695.7             |           631.2           |           117.7          

Times are in microseconds (us).

Bilinear backward pass, PTH CPU and PTH CUDA

Code: https://gist.github.com/vfdev-5/b173761a567f2283b3c649c3c0574112

- Measure only backward op

Num threads: 8
[------------- Downsampling backward (bilinear): torch.Size([1, 3, 906, 438]) -> (320, 196) ------------]
                                                  |  1.11.0a0+gitd032369 cpu  |  1.11.0a0+gitd032369 cuda
8 threads: ----------------------------------------------------------------------------------------------
      channels_first contiguous torch.float32     |           4686.8          |           215.7          
      channels_last non-contiguous torch.float32  |           5101.1          |           220.5          

Times are in microseconds (us).

[------------- Downsampling backward (bilinear): torch.Size([1, 3, 906, 438]) -> (460, 220) ------------]
                                                  |  1.11.0a0+gitd032369 cpu  |  1.11.0a0+gitd032369 cuda
8 threads: ----------------------------------------------------------------------------------------------
      channels_first contiguous torch.float32     |           6011.2          |           204.4          
      channels_last non-contiguous torch.float32  |           6396.0          |           210.0          

Times are in microseconds (us).

[------------- Downsampling backward (bilinear): torch.Size([1, 3, 906, 438]) -> (120, 96) -------------]
                                                  |  1.11.0a0+gitd032369 cpu  |  1.11.0a0+gitd032369 cuda
8 threads: ----------------------------------------------------------------------------------------------
      channels_first contiguous torch.float32     |           2035.6          |           250.2          
      channels_last non-contiguous torch.float32  |           1589.6          |           252.5          

Times are in microseconds (us).

[------------ Downsampling backward (bilinear): torch.Size([1, 3, 906, 438]) -> (1200, 196) ------------]
                                                  |  1.11.0a0+gitd032369 cpu  |  1.11.0a0+gitd032369 cuda
8 threads: ----------------------------------------------------------------------------------------------
      channels_first contiguous torch.float32     |          11392.5          |           256.5          
      channels_last non-contiguous torch.float32  |          11640.2          |           263.9          

Times are in microseconds (us).

[------------ Downsampling backward (bilinear): torch.Size([1, 3, 906, 438]) -> (120, 1200) ------------]
                                                  |  1.11.0a0+gitd032369 cpu  |  1.11.0a0+gitd032369 cuda
8 threads: ----------------------------------------------------------------------------------------------
      channels_first contiguous torch.float32     |          11769.6          |           465.9          
      channels_last non-contiguous torch.float32  |          12407.0          |           474.4          

Times are in microseconds (us).

[---- Downsampling backward (bilinear): torch.Size([1, 1, 906, 438]) -> (320, 196) ----]
                                 |  1.11.0a0+gitd032369 cpu  |  1.11.0a0+gitd032369 cuda
8 threads: -----------------------------------------------------------------------------
       contiguous torch.float32  |           3931.0          |           133.3          

Times are in microseconds (us).

[---- Downsampling backward (bilinear): torch.Size([1, 1, 906, 438]) -> (460, 220) ----]
                                 |  1.11.0a0+gitd032369 cpu  |  1.11.0a0+gitd032369 cuda
8 threads: -----------------------------------------------------------------------------
       contiguous torch.float32  |           5594.8          |           133.9          

Times are in microseconds (us).

[---- Downsampling backward (bilinear): torch.Size([1, 1, 906, 438]) -> (120, 96) -----]
                                 |  1.11.0a0+gitd032369 cpu  |  1.11.0a0+gitd032369 cuda
8 threads: -----------------------------------------------------------------------------
       contiguous torch.float32  |           1272.6          |           133.0          

Times are in microseconds (us).

[--- Downsampling backward (bilinear): torch.Size([1, 1, 906, 438]) -> (1200, 196) ----]
                                 |  1.11.0a0+gitd032369 cpu  |  1.11.0a0+gitd032369 cuda
8 threads: -----------------------------------------------------------------------------
       contiguous torch.float32  |          10618.1          |           134.0          

Times are in microseconds (us).

[--- Downsampling backward (bilinear): torch.Size([1, 1, 906, 438]) -> (120, 1200) ----]
                                 |  1.11.0a0+gitd032369 cpu  |  1.11.0a0+gitd032369 cuda
8 threads: -----------------------------------------------------------------------------
       contiguous torch.float32  |          11082.2          |           154.6          

Times are in microseconds (us).

Bicubic backward pass, PTH CPU and PTH CUDA

Code: https://gist.github.com/vfdev-5/b173761a567f2283b3c649c3c0574112

- Measure only backward op

Num threads: 8
[------------- Downsampling backward (bicubic): torch.Size([1, 3, 906, 438]) -> (320, 196) -------------]
                                                  |  1.11.0a0+gitd032369 cpu  |  1.11.0a0+gitd032369 cuda
8 threads: ----------------------------------------------------------------------------------------------
      channels_first contiguous torch.float32     |           6791.2          |           618.9          
      channels_last non-contiguous torch.float32  |           7125.2          |           622.9          

Times are in microseconds (us).

[------------- Downsampling backward (bicubic): torch.Size([1, 3, 906, 438]) -> (460, 220) -------------]
                                                  |  1.11.0a0+gitd032369 cpu  |  1.11.0a0+gitd032369 cuda
8 threads: ----------------------------------------------------------------------------------------------
      channels_first contiguous torch.float32     |           8806.2          |           600.3          
      channels_last non-contiguous torch.float32  |           9167.6          |           607.5          

Times are in microseconds (us).

[-------------- Downsampling backward (bicubic): torch.Size([1, 3, 906, 438]) -> (120, 96) -------------]
                                                  |  1.11.0a0+gitd032369 cpu  |  1.11.0a0+gitd032369 cuda
8 threads: ----------------------------------------------------------------------------------------------
      channels_first contiguous torch.float32     |           3683.6          |           693.8          
      channels_last non-contiguous torch.float32  |           3617.4          |           695.0          

Times are in microseconds (us).

[------------- Downsampling backward (bicubic): torch.Size([1, 3, 906, 438]) -> (1200, 196) ------------]
                                                  |  1.11.0a0+gitd032369 cpu  |  1.11.0a0+gitd032369 cuda
8 threads: ----------------------------------------------------------------------------------------------
      channels_first contiguous torch.float32     |          17548.2          |           779.4          
      channels_last non-contiguous torch.float32  |          17966.2          |           786.5          

Times are in microseconds (us).

[------------- Downsampling backward (bicubic): torch.Size([1, 3, 906, 438]) -> (120, 1200) ------------]
                                                  |  1.11.0a0+gitd032369 cpu  |  1.11.0a0+gitd032369 cuda
8 threads: ----------------------------------------------------------------------------------------------
      channels_first contiguous torch.float32     |            28.4           |            1.6           
      channels_last non-contiguous torch.float32  |            28.4           |            1.6           

Times are in milliseconds (ms).
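
Note that this table is reported in milliseconds, while the other tables in this section use microseconds. A small sketch of the conversion, so the values can be compared directly (the variable names are hypothetical, and because the benchmark prints only one decimal in ms, the converted values are accurate to about 50 us at best):

```python
US_PER_MS = 1000.0

# Values copied from the (120, 1200) bicubic backward table, in milliseconds.
cpu_ms, cuda_ms = 28.4, 1.6

cpu_us = cpu_ms * US_PER_MS    # CPU time in microseconds
cuda_us = cuda_ms * US_PER_MS  # CUDA time in microseconds
ratio = cpu_us / cuda_us       # CPU time relative to CUDA time

print(f"cpu: {cpu_us:.0f} us, cuda: {cuda_us:.0f} us, ratio: {ratio:.2f}x")
```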

[---- Downsampling backward (bicubic): torch.Size([1, 1, 906, 438]) -> (320, 196) -----]
                                 |  1.11.0a0+gitd032369 cpu  |  1.11.0a0+gitd032369 cuda
8 threads: -----------------------------------------------------------------------------
       contiguous torch.float32  |           6266.1          |           208.5          

Times are in microseconds (us).

[---- Downsampling backward (bicubic): torch.Size([1, 1, 906, 438]) -> (460, 220) -----]
                                 |  1.11.0a0+gitd032369 cpu  |  1.11.0a0+gitd032369 cuda
8 threads: -----------------------------------------------------------------------------
       contiguous torch.float32  |           8218.3          |           200.8          

Times are in microseconds (us).

[----- Downsampling backward (bicubic): torch.Size([1, 1, 906, 438]) -> (120, 96) -----]
                                 |  1.11.0a0+gitd032369 cpu  |  1.11.0a0+gitd032369 cuda
8 threads: -----------------------------------------------------------------------------
       contiguous torch.float32  |           3458.9          |           231.9          

Times are in microseconds (us).

[---- Downsampling backward (bicubic): torch.Size([1, 1, 906, 438]) -> (1200, 196) ----]
                                 |  1.11.0a0+gitd032369 cpu  |  1.11.0a0+gitd032369 cuda
8 threads: -----------------------------------------------------------------------------
       contiguous torch.float32  |          15729.3          |           261.6          

Times are in microseconds (us).

[---- Downsampling backward (bicubic): torch.Size([1, 1, 906, 438]) -> (120, 1200) ----]
                                 |  1.11.0a0+gitd032369 cpu  |  1.11.0a0+gitd032369 cuda
8 threads: -----------------------------------------------------------------------------
       contiguous torch.float32  |          26279.8          |           547.0          

Times are in microseconds (us).

The code is moved from torchvision (pytorch/vision#4211) and optimized.

…d-interp-antialias-cuda

pytorch-probot · 2022-01-06T17:29:57Z

CI Flow Status

⚛️ CI Flow

Ruleset - Version: v1
Ruleset - File: https://github.com/vfdev-5/pytorch/blob/a9113b5118b82b0e5ab1345832af8ca7961850fc/.github/generated-ciflow-ruleset.json
PR ciflow labels: ciflow/default

Workflows	Labels (bold enabled)	Status
Triggered Workflows
linux-binary-conda	`ciflow/binaries`, `ciflow/binaries/conda`, `ciflow/default`	✅ triggered
linux-binary-libtorch-cxx11-abi	`ciflow/binaries`, `ciflow/binaries/libtorch`, `ciflow/default`	✅ triggered
linux-binary-libtorch-pre-cxx11	`ciflow/binaries`, `ciflow/binaries/libtorch`, `ciflow/default`	✅ triggered
linux-binary-manywheel	`ciflow/binaries`, `ciflow/binaries/wheel`, `ciflow/default`	✅ triggered
linux-bionic-py3.7-clang9	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`, `ciflow/noarch`, `ciflow/trunk`, `ciflow/xla`	✅ triggered
linux-docs	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/docs`, `ciflow/linux`, `ciflow/trunk`	✅ triggered
linux-vulkan-bionic-py3.7-clang9	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`, `ciflow/trunk`, `ciflow/vulkan`	✅ triggered
linux-xenial-cuda11.3-py3.7-gcc7	`ciflow/all`, `ciflow/cuda`, `ciflow/default`, `ciflow/linux`, `ciflow/trunk`	✅ triggered
linux-xenial-cuda11.3-py3.7-gcc7-bazel-test	`ciflow/all`, `ciflow/bazel`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`, `ciflow/trunk`	✅ triggered
linux-xenial-py3-clang5-mobile-build	`ciflow/all`, `ciflow/default`, `ciflow/linux`, `ciflow/mobile`, `ciflow/trunk`	✅ triggered
linux-xenial-py3-clang5-mobile-custom-build-static	`ciflow/all`, `ciflow/default`, `ciflow/linux`, `ciflow/mobile`, `ciflow/trunk`	✅ triggered
linux-xenial-py3.7-clang7-asan	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`, `ciflow/sanitizers`, `ciflow/trunk`	✅ triggered
linux-xenial-py3.7-clang7-onnx	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`, `ciflow/onnx`, `ciflow/trunk`	✅ triggered
linux-xenial-py3.7-gcc5.4	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`, `ciflow/trunk`	✅ triggered
linux-xenial-py3.7-gcc7	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`, `ciflow/trunk`	✅ triggered
linux-xenial-py3.7-gcc7-no-ops	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`, `ciflow/trunk`	✅ triggered
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single	`ciflow/all`, `ciflow/android`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`, `ciflow/trunk`	✅ triggered
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit	`ciflow/all`, `ciflow/android`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`, `ciflow/trunk`	✅ triggered
win-vs2019-cpu-py3	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/trunk`, `ciflow/win`	✅ triggered
win-vs2019-cuda11.3-py3	`ciflow/all`, `ciflow/cuda`, `ciflow/default`, `ciflow/trunk`, `ciflow/win`	✅ triggered
Skipped Workflows
caffe2-linux-xenial-py3.7-gcc5.4	`ciflow/all`, `ciflow/cpu`, `ciflow/linux`, `ciflow/trunk`	🚫 skipped
docker-builds	`ciflow/all`, `ciflow/trunk`	🚫 skipped
ios-12-5-1-arm64	`ciflow/all`, `ciflow/ios`, `ciflow/macos`, `ciflow/trunk`	🚫 skipped
ios-12-5-1-arm64-coreml	`ciflow/all`, `ciflow/ios`, `ciflow/macos`, `ciflow/trunk`	🚫 skipped
ios-12-5-1-arm64-custom-ops	`ciflow/all`, `ciflow/ios`, `ciflow/macos`, `ciflow/trunk`	🚫 skipped
ios-12-5-1-arm64-full-jit	`ciflow/all`, `ciflow/ios`, `ciflow/macos`, `ciflow/trunk`	🚫 skipped
ios-12-5-1-arm64-metal	`ciflow/all`, `ciflow/ios`, `ciflow/macos`, `ciflow/trunk`	🚫 skipped
ios-12-5-1-x86-64	`ciflow/all`, `ciflow/ios`, `ciflow/macos`, `ciflow/trunk`	🚫 skipped
ios-12-5-1-x86-64-coreml	`ciflow/all`, `ciflow/ios`, `ciflow/macos`, `ciflow/trunk`	🚫 skipped
ios-12-5-1-x86-64-full-jit	`ciflow/all`, `ciflow/ios`, `ciflow/macos`, `ciflow/trunk`	🚫 skipped
libtorch-linux-xenial-cuda10.2-py3.7-gcc7	`ciflow/all`, `ciflow/cuda`, `ciflow/libtorch`, `ciflow/linux`, `ciflow/trunk`	🚫 skipped
libtorch-linux-xenial-cuda11.3-py3.7-gcc7	`ciflow/all`, `ciflow/cuda`, `ciflow/libtorch`, `ciflow/linux`, `ciflow/trunk`	🚫 skipped
linux-bionic-cuda10.2-py3.9-gcc7	`ciflow/all`, `ciflow/cuda`, `ciflow/linux`, `ciflow/slow`, `ciflow/trunk`	🚫 skipped
linux-docs-push	`ciflow/all`, `ciflow/cpu`, `ciflow/linux`, `ciflow/scheduled`	🚫 skipped
linux-xenial-cuda11.3-py3.7-gcc7-no-ops	`ciflow/all`, `ciflow/cuda`, `ciflow/linux`, `ciflow/trunk`	🚫 skipped
macos-10-15-py3-arm64	`ciflow/all`, `ciflow/macos`, `ciflow/trunk`	🚫 skipped
macos-10-15-py3-lite-interpreter-x86-64	`ciflow/all`, `ciflow/macos`, `ciflow/trunk`	🚫 skipped
macos-11-py3-x86-64	`ciflow/all`, `ciflow/macos`, `ciflow/trunk`	🚫 skipped
parallelnative-linux-xenial-py3.7-gcc5.4	`ciflow/all`, `ciflow/cpu`, `ciflow/linux`, `ciflow/trunk`	🚫 skipped
periodic-libtorch-linux-bionic-cuda11.5-py3.7-gcc7	`ciflow/all`, `ciflow/cuda`, `ciflow/libtorch`, `ciflow/linux`, `ciflow/scheduled`	🚫 skipped
periodic-libtorch-linux-xenial-cuda11.1-py3.7-gcc7	`ciflow/all`, `ciflow/cuda`, `ciflow/libtorch`, `ciflow/linux`, `ciflow/scheduled`	🚫 skipped
periodic-linux-bionic-cuda11.5-py3.7-gcc7	`ciflow/all`, `ciflow/cuda`, `ciflow/linux`, `ciflow/scheduled`	🚫 skipped
periodic-linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck	`ciflow/all`, `ciflow/cuda`, `ciflow/linux`, `ciflow/scheduled`, `ciflow/slow`, `ciflow/slow-gradcheck`	🚫 skipped
periodic-linux-xenial-cuda11.1-py3.7-gcc7-debug	`ciflow/all`, `ciflow/cuda`, `ciflow/linux`, `ciflow/scheduled`	🚫 skipped
periodic-win-vs2019-cuda11.1-py3	`ciflow/all`, `ciflow/cuda`, `ciflow/scheduled`, `ciflow/win`	🚫 skipped
periodic-win-vs2019-cuda11.5-py3	`ciflow/all`, `ciflow/cuda`, `ciflow/scheduled`, `ciflow/win`	🚫 skipped
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-build	`ciflow/all`, `ciflow/android`, `ciflow/cpu`, `ciflow/linux`, `ciflow/trunk`	🚫 skipped

facebook-github-bot · 2022-01-06T17:30:00Z

💊 CI failures summary and remediations

As of commit d032369 (more details on the Dr. CI page):

1/1 failures introduced in this PR

1 failure not recognized by patterns:

Job	Step	Action
linux-bionic-py3.7-clang9 / test (xla, 1, 1, linux.2xlarge)	Unknown	🔁 rerun

…d-interp-antialias-cuda

vfdev-5 · 2022-01-10T13:03:58Z

@jbschlosser this is a follow-up PR adding anti-alias support to the interpolate op: a) CPU forward/backward for the 2d bilinear and bicubic options is already merged, b) this PR adds CUDA forward/backward for the 2d bilinear and bicubic options. Could you please review it? Thanks.

@ngimel could you please also review this PR, especially the CUDA kernels? Thanks a lot!
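
For readers unfamiliar with what the antialias flag changes, here is a minimal pure-Python sketch of the general idea in one dimension. It assumes the PIL-style approach, in which the triangle (bilinear) filter is widened by the downscaling factor so that every input pixel contributes to some output pixel. It is an illustration only, not the CUDA kernel from this PR, and the names `triangle`, `antialias_weights` and `resize_1d` are made up for this example.

```python
def triangle(x):
    """Linear (triangle) filter with support 1, the kernel behind bilinear mode."""
    return max(0.0, 1.0 - abs(x))


def antialias_weights(out_index, in_size, out_size):
    """Return (first input index, normalized weights) for one output pixel."""
    scale = in_size / out_size
    # When downsampling (scale > 1) the filter is stretched by the scale
    # factor; when upsampling it keeps its native width.
    filter_scale = max(scale, 1.0)
    support = 1.0 * filter_scale
    center = scale * (out_index + 0.5)
    lo = max(0, int(center - support + 0.5))
    hi = min(in_size, int(center + support + 0.5))
    raw = [triangle((j + 0.5 - center) / filter_scale) for j in range(lo, hi)]
    total = sum(raw)
    return lo, [w / total for w in raw]


def resize_1d(row, out_size):
    """Resize a list of floats to out_size samples with the weights above."""
    out = []
    for i in range(out_size):
        lo, weights = antialias_weights(i, len(row), out_size)
        out.append(sum(w * row[lo + k] for k, w in enumerate(weights)))
    return out


if __name__ == "__main__":
    # A step edge downsampled 4x: each output blends several input samples.
    print(resize_1d([0.0, 0.0, 0.0, 0.0, 8.0, 8.0, 8.0, 8.0], 2))
```

Without the widening step, a 4x downsample would read only the two input samples nearest each output center and skip the rest, which is the aliasing the flag is meant to avoid.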

jbschlosser · 2022-01-11T20:25:34Z

@vfdev-5 FYI, I'm a bit busy this week due to performance evaluation, but I will aim to review this early next week.

ngimel

This looks good! I left minor comments, and I don't know the details of the algorithm, so I trust you on those.

aten/src/ATen/native/cuda/UpSampleBilinear2d.cu

aten/src/ATen/native/cuda/UpSample.cuh

aten/src/ATen/native/cuda/UpSampleBilinear2d.cu

…d-interp-antialias-cuda

jbschlosser

Looks good on my end but test failures are real - will leave final review to @ngimel

aten/src/ATen/native/cuda/UpSample.cuh

IvanYashchuk · 2022-01-20T13:47:40Z

Hey @jeffdaily, we are trying to debug the ROCm failure. This PR introduces a few new CUDA kernels, and their tests fail on ROCm. We compiled from source using ROCm 4.5 on an MI100. The tests fail with the release build (DEBUG=0) but pass with DEBUG=1, which makes the failure difficult to debug. Do you have any idea what changes in the ROCm build with DEBUG=1?

jeffdaily · 2022-01-20T17:45:39Z

I'm building the PR now on my dev host so we might be able to assist better.

A superficial glance at the PR shows the use of at::cuda::warp_size(), __launch_bounds__(256), and shared memory. These can all lead to subtle bugs when moving between CUDA and ROCm. The MI series of AMD GPUs has a warp size of 64. The runtime check at::cuda::warp_size() will return 64 on our CI hosts today, so make sure there aren't any hard-coded compile-time assumptions that it is 32 (as it is for CUDA).
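
To make the warp-size point concrete, here is a small Python model of the index arithmetic. It is not device code, and the helper names `warps_per_block` and `split_thread_id` are hypothetical. It shows how the same 256-thread block splits differently at warp size 32 and at warp size 64:

```python
def warps_per_block(block_threads, warp_size):
    """Number of warps (wavefronts) needed to cover a block of threads."""
    return (block_threads + warp_size - 1) // warp_size


def split_thread_id(tid, warp_size):
    """Return (warp index, lane index) of a thread within its block."""
    return tid // warp_size, tid % warp_size


BLOCK_THREADS = 256  # same block size as the __launch_bounds__(256) above

if __name__ == "__main__":
    for name, warp_size in (("warp size 32", 32), ("warp size 64", 64)):
        # Thread 40 sits in warp 1 with 32-wide warps, but in warp 0 with 64-wide ones.
        print(name, warps_per_block(BLOCK_THREADS, warp_size),
              split_thread_id(40, warp_size))
```

A per-warp buffer or reduction laid out for 8 warps of 32 lanes would therefore be indexed differently with 4 wavefronts of 64, which is one way a literal 32 in a kernel could produce wrong results only on ROCm.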

aten/src/ATen/native/cuda/UpSampleBilinear2d.cu

…-antialias-cuda

aten/src/ATen/native/cuda/UpSampleBilinear2d.cu

ngimel · 2022-01-26T23:27:39Z

Ok looks like tests are passing, @jbschlosser can you land?

facebook-github-bot · 2022-01-27T14:39:21Z

@jbschlosser has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Summary:
Description:
- Added antialias flag to interpolate (CUDA)
- forward and backward for bicubic mode
- added tests

Previous PR for CPU bilinear, #65142
Previous PR for CPU bicubic, #68819

Code is moved from torchvision: pytorch/vision#4211 and optimized

Pull Request resolved: #70930

Reviewed By: zou3519

Differential Revision: D33817902

Pulled By: jbschlosser

fbshipit-source-id: d63a620f8972ff36b63841f0bc6c820466f58f69

[------------------------- Downsampling (bilinear): torch.Size([1, 1, 906, 438]) -> (1200, 196) -------------------------] | Reference, PIL 8.4.0, mode: F | 1.11.0a0+gitd032369 cpu | 1.11.0a0+gitd032369 cuda 8 threads: --------------------------------------------------------------------------------------------------------------- contiguous torch.float32 | 4749.7 | 910.1 | 63.7 Times are in microseconds (us). [------------------------- Downsampling (bilinear): torch.Size([1, 1, 906, 438]) -> (120, 1200) -------------------------] | Reference, PIL 8.4.0, mode: F | 1.11.0a0+gitd032369 cpu | 1.11.0a0+gitd032369 cuda 8 threads: --------------------------------------------------------------------------------------------------------------- contiguous torch.float32 | 1098.1 | 272.0 | 36.4 Times are in microseconds (us). ``` </details> <details> <summary> Bicubic forward pass, PIL, PTH CPU and PTH CUDA </summary> Code: https://gist.github.com/vfdev-5/b173761a567f2283b3c649c3c0574112 ``` Torch version: 1.11.0a0+gitd032369 Torch config: PyTorch built with: - GCC 9.3 - C++ Version: 201402 - OpenMP 201511 (a.k.a. 
OpenMP 4.5) - CPU capability usage: AVX2 - CUDA Runtime 11.1 - NVCC architecture flags: -gencode;arch=compute_61,code=sm_61 - CuDNN 8.0.5 - Build settings: BUILD_TYPE=Release, CUDA_VERSION=11.1, CUDNN_VERSION=8.0.5, CXX_COMPILER=/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_PYTORCH_QNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.11.0, USE_CUDA=1, USE_CUDNN=1, USE_EIGEN_FOR_BLAS=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=OFF, USE_MKLDNN=OFF, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=0, USE_OPENMP=ON, USE_ROCM=OFF, Num threads: 8 [------------------------------------ Downsampling (bicubic): torch.Size([1, 3, 906, 438]) -> (320, 196) -----------------------------------] | Reference, PIL 8.4.0, mode: RGB | 1.11.0a0+gitd032369 cpu | 1.11.0a0+gitd032369 cuda 8 threads: ---------------------------------------------------------------------------------------------------------------------------------- channels_first contiguous torch.float32 | 4522.4 | 1406.7 | 170.3 channels_last non-contiguous torch.float32 | 4530.0 | 1435.4 | 242.2 Times are in microseconds (us). 
[------------------------------------ Downsampling (bicubic): torch.Size([1, 3, 906, 438]) -> (460, 220) -----------------------------------] | Reference, PIL 8.4.0, mode: RGB | 1.11.0a0+gitd032369 cpu | 1.11.0a0+gitd032369 cuda 8 threads: ---------------------------------------------------------------------------------------------------------------------------------- channels_first contiguous torch.float32 | 5726.4 | 1628.6 | 164.0 channels_last non-contiguous torch.float32 | 5722.6 | 1665.6 | 234.7 Times are in microseconds (us). [------------------------------------ Downsampling (bicubic): torch.Size([1, 3, 906, 438]) -> (120, 96) ------------------------------------] | Reference, PIL 8.4.0, mode: RGB | 1.11.0a0+gitd032369 cpu | 1.11.0a0+gitd032369 cuda 8 threads: ---------------------------------------------------------------------------------------------------------------------------------- channels_first contiguous torch.float32 | 2909.1 | 1461.5 | 276.9 channels_last non-contiguous torch.float32 | 2892.9 | 1458.7 | 345.1 Times are in microseconds (us). [----------------------------------- Downsampling (bicubic): torch.Size([1, 3, 906, 438]) -> (1200, 196) -----------------------------------] | Reference, PIL 8.4.0, mode: RGB | 1.11.0a0+gitd032369 cpu | 1.11.0a0+gitd032369 cuda 8 threads: ---------------------------------------------------------------------------------------------------------------------------------- channels_first contiguous torch.float32 | 14699.2 | 4283.9 | 407.1 channels_last non-contiguous torch.float32 | 14711.3 | 4321.1 | 477.0 Times are in microseconds (us). 
[----------------------------------- Downsampling (bicubic): torch.Size([1, 3, 906, 438]) -> (120, 1200) -----------------------------------] | Reference, PIL 8.4.0, mode: RGB | 1.11.0a0+gitd032369 cpu | 1.11.0a0+gitd032369 cuda 8 threads: ---------------------------------------------------------------------------------------------------------------------------------- channels_first contiguous torch.float32 | 3467.0 | 980.0 | 339.2 channels_last non-contiguous torch.float32 | 3465.2 | 982.3 | 407.8 Times are in microseconds (us). [-------------------------- Downsampling (bicubic): torch.Size([1, 1, 906, 438]) -> (320, 196) --------------------------] | Reference, PIL 8.4.0, mode: F | 1.11.0a0+gitd032369 cpu | 1.11.0a0+gitd032369 cuda 8 threads: --------------------------------------------------------------------------------------------------------------- contiguous torch.float32 | 2396.7 | 877.8 | 68.1 Times are in microseconds (us). [-------------------------- Downsampling (bicubic): torch.Size([1, 1, 906, 438]) -> (460, 220) --------------------------] | Reference, PIL 8.4.0, mode: F | 1.11.0a0+gitd032369 cpu | 1.11.0a0+gitd032369 cuda 8 threads: --------------------------------------------------------------------------------------------------------------- contiguous torch.float32 | 3068.2 | 777.3 | 64.7 Times are in microseconds (us). [-------------------------- Downsampling (bicubic): torch.Size([1, 1, 906, 438]) -> (120, 96) ---------------------------] | Reference, PIL 8.4.0, mode: F | 1.11.0a0+gitd032369 cpu | 1.11.0a0+gitd032369 cuda 8 threads: --------------------------------------------------------------------------------------------------------------- contiguous torch.float32 | 1540.2 | 829.3 | 100.4 Times are in microseconds (us). 
[------------------------- Downsampling (bicubic): torch.Size([1, 1, 906, 438]) -> (1200, 196) --------------------------] | Reference, PIL 8.4.0, mode: F | 1.11.0a0+gitd032369 cpu | 1.11.0a0+gitd032369 cuda 8 threads: --------------------------------------------------------------------------------------------------------------- contiguous torch.float32 | 7919.5 | 1467.8 | 151.6 Times are in microseconds (us). [------------------------- Downsampling (bicubic): torch.Size([1, 1, 906, 438]) -> (120, 1200) --------------------------] | Reference, PIL 8.4.0, mode: F | 1.11.0a0+gitd032369 cpu | 1.11.0a0+gitd032369 cuda 8 threads: --------------------------------------------------------------------------------------------------------------- contiguous torch.float32 | 1695.7 | 631.2 | 117.7 Times are in microseconds (us). ``` </details> <details> <summary> Bilinear backward pass, PTH CPU and PTH CUDA </summary> Code: https://gist.github.com/vfdev-5/b173761a567f2283b3c649c3c0574112 ``` - Measure only backward op Torch version: 1.11.0a0+gitd032369 Torch config: PyTorch built with: - GCC 9.3 - C++ Version: 201402 - OpenMP 201511 (a.k.a. 
OpenMP 4.5) - CPU capability usage: AVX2 - CUDA Runtime 11.1 - NVCC architecture flags: -gencode;arch=compute_61,code=sm_61 - CuDNN 8.0.5 - Build settings: BUILD_TYPE=Release, CUDA_VERSION=11.1, CUDNN_VERSION=8.0.5, CXX_COMPILER=/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_PYTORCH_QNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.11.0, USE_CUDA=1, USE_CUDNN=1, USE_EIGEN_FOR_BLAS=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=OFF, USE_MKLDNN=OFF, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=0, USE_OPENMP=ON, USE_ROCM=OFF, Num threads: 8 [------------- Downsampling backward (bilinear): torch.Size([1, 3, 906, 438]) -> (320, 196) ------------] | 1.11.0a0+gitd032369 cpu | 1.11.0a0+gitd032369 cuda 8 threads: ---------------------------------------------------------------------------------------------- channels_first contiguous torch.float32 | 4686.8 | 215.7 channels_last non-contiguous torch.float32 | 5101.1 | 220.5 Times are in microseconds (us). 
[------------- Downsampling backward (bilinear): torch.Size([1, 3, 906, 438]) -> (460, 220) ------------] | 1.11.0a0+gitd032369 cpu | 1.11.0a0+gitd032369 cuda 8 threads: ---------------------------------------------------------------------------------------------- channels_first contiguous torch.float32 | 6011.2 | 204.4 channels_last non-contiguous torch.float32 | 6396.0 | 210.0 Times are in microseconds (us). [------------- Downsampling backward (bilinear): torch.Size([1, 3, 906, 438]) -> (120, 96) -------------] | 1.11.0a0+gitd032369 cpu | 1.11.0a0+gitd032369 cuda 8 threads: ---------------------------------------------------------------------------------------------- channels_first contiguous torch.float32 | 2035.6 | 250.2 channels_last non-contiguous torch.float32 | 1589.6 | 252.5 Times are in microseconds (us). [------------ Downsampling backward (bilinear): torch.Size([1, 3, 906, 438]) -> (1200, 196) ------------] | 1.11.0a0+gitd032369 cpu | 1.11.0a0+gitd032369 cuda 8 threads: ---------------------------------------------------------------------------------------------- channels_first contiguous torch.float32 | 11392.5 | 256.5 channels_last non-contiguous torch.float32 | 11640.2 | 263.9 Times are in microseconds (us). [------------ Downsampling backward (bilinear): torch.Size([1, 3, 906, 438]) -> (120, 1200) ------------] | 1.11.0a0+gitd032369 cpu | 1.11.0a0+gitd032369 cuda 8 threads: ---------------------------------------------------------------------------------------------- channels_first contiguous torch.float32 | 11769.6 | 465.9 channels_last non-contiguous torch.float32 | 12407.0 | 474.4 Times are in microseconds (us). [---- Downsampling backward (bilinear): torch.Size([1, 1, 906, 438]) -> (320, 196) ----] | 1.11.0a0+gitd032369 cpu | 1.11.0a0+gitd032369 cuda 8 threads: ----------------------------------------------------------------------------- contiguous torch.float32 | 3931.0 | 133.3 Times are in microseconds (us). 
[---- Downsampling backward (bilinear): torch.Size([1, 1, 906, 438]) -> (460, 220) ----] | 1.11.0a0+gitd032369 cpu | 1.11.0a0+gitd032369 cuda 8 threads: ----------------------------------------------------------------------------- contiguous torch.float32 | 5594.8 | 133.9 Times are in microseconds (us). [---- Downsampling backward (bilinear): torch.Size([1, 1, 906, 438]) -> (120, 96) -----] | 1.11.0a0+gitd032369 cpu | 1.11.0a0+gitd032369 cuda 8 threads: ----------------------------------------------------------------------------- contiguous torch.float32 | 1272.6 | 133.0 Times are in microseconds (us). [--- Downsampling backward (bilinear): torch.Size([1, 1, 906, 438]) -> (1200, 196) ----] | 1.11.0a0+gitd032369 cpu | 1.11.0a0+gitd032369 cuda 8 threads: ----------------------------------------------------------------------------- contiguous torch.float32 | 10618.1 | 134.0 Times are in microseconds (us). [--- Downsampling backward (bilinear): torch.Size([1, 1, 906, 438]) -> (120, 1200) ----] | 1.11.0a0+gitd032369 cpu | 1.11.0a0+gitd032369 cuda 8 threads: ----------------------------------------------------------------------------- contiguous torch.float32 | 11082.2 | 154.6 Times are in microseconds (us). ``` </details> <details> <summary> Bicubic backward pass, PTH CPU and PTH CUDA </summary> Code: https://gist.github.com/vfdev-5/b173761a567f2283b3c649c3c0574112 ``` - Measure only backward op Torch version: 1.11.0a0+gitd032369 Torch config: PyTorch built with: - GCC 9.3 - C++ Version: 201402 - OpenMP 201511 (a.k.a. 
OpenMP 4.5) - CPU capability usage: AVX2 - CUDA Runtime 11.1 - NVCC architecture flags: -gencode;arch=compute_61,code=sm_61 - CuDNN 8.0.5 - Build settings: BUILD_TYPE=Release, CUDA_VERSION=11.1, CUDNN_VERSION=8.0.5, CXX_COMPILER=/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_PYTORCH_QNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.11.0, USE_CUDA=1, USE_CUDNN=1, USE_EIGEN_FOR_BLAS=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=OFF, USE_MKLDNN=OFF, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=0, USE_OPENMP=ON, USE_ROCM=OFF, Num threads: 8 [------------- Downsampling backward (bicubic): torch.Size([1, 3, 906, 438]) -> (320, 196) -------------] | 1.11.0a0+gitd032369 cpu | 1.11.0a0+gitd032369 cuda 8 threads: ---------------------------------------------------------------------------------------------- channels_first contiguous torch.float32 | 6791.2 | 618.9 channels_last non-contiguous torch.float32 | 7125.2 | 622.9 Times are in microseconds (us). 
[------------- Downsampling backward (bicubic): torch.Size([1, 3, 906, 438]) -> (460, 220) -------------] | 1.11.0a0+gitd032369 cpu | 1.11.0a0+gitd032369 cuda 8 threads: ---------------------------------------------------------------------------------------------- channels_first contiguous torch.float32 | 8806.2 | 600.3 channels_last non-contiguous torch.float32 | 9167.6 | 607.5 Times are in microseconds (us). [-------------- Downsampling backward (bicubic): torch.Size([1, 3, 906, 438]) -> (120, 96) -------------] | 1.11.0a0+gitd032369 cpu | 1.11.0a0+gitd032369 cuda 8 threads: ---------------------------------------------------------------------------------------------- channels_first contiguous torch.float32 | 3683.6 | 693.8 channels_last non-contiguous torch.float32 | 3617.4 | 695.0 Times are in microseconds (us). [------------- Downsampling backward (bicubic): torch.Size([1, 3, 906, 438]) -> (1200, 196) ------------] | 1.11.0a0+gitd032369 cpu | 1.11.0a0+gitd032369 cuda 8 threads: ---------------------------------------------------------------------------------------------- channels_first contiguous torch.float32 | 17548.2 | 779.4 channels_last non-contiguous torch.float32 | 17966.2 | 786.5 Times are in microseconds (us). [------------- Downsampling backward (bicubic): torch.Size([1, 3, 906, 438]) -> (120, 1200) ------------] | 1.11.0a0+gitd032369 cpu | 1.11.0a0+gitd032369 cuda 8 threads: ---------------------------------------------------------------------------------------------- channels_first contiguous torch.float32 | 28.4 | 1.6 channels_last non-contiguous torch.float32 | 28.4 | 1.6 Times are in milliseconds (ms). [---- Downsampling backward (bicubic): torch.Size([1, 1, 906, 438]) -> (320, 196) -----] | 1.11.0a0+gitd032369 cpu | 1.11.0a0+gitd032369 cuda 8 threads: ----------------------------------------------------------------------------- contiguous torch.float32 | 6266.1 | 208.5 Times are in microseconds (us). 
[---- Downsampling backward (bicubic): torch.Size([1, 1, 906, 438]) -> (460, 220) -----] | 1.11.0a0+gitd032369 cpu | 1.11.0a0+gitd032369 cuda 8 threads: ----------------------------------------------------------------------------- contiguous torch.float32 | 8218.3 | 200.8 Times are in microseconds (us). [----- Downsampling backward (bicubic): torch.Size([1, 1, 906, 438]) -> (120, 96) -----] | 1.11.0a0+gitd032369 cpu | 1.11.0a0+gitd032369 cuda 8 threads: ----------------------------------------------------------------------------- contiguous torch.float32 | 3458.9 | 231.9 Times are in microseconds (us). [---- Downsampling backward (bicubic): torch.Size([1, 1, 906, 438]) -> (1200, 196) ----] | 1.11.0a0+gitd032369 cpu | 1.11.0a0+gitd032369 cuda 8 threads: ----------------------------------------------------------------------------- contiguous torch.float32 | 15729.3 | 261.6 Times are in microseconds (us). [---- Downsampling backward (bicubic): torch.Size([1, 1, 906, 438]) -> (120, 1200) ----] | 1.11.0a0+gitd032369 cpu | 1.11.0a0+gitd032369 cuda 8 threads: ----------------------------------------------------------------------------- contiguous torch.float32 | 26279.8 | 547.0 Times are in microseconds (us). ``` </details> Code is moved from torchvision: pytorch/vision#4211 and optimized Pull Request resolved: #70930 Reviewed By: zou3519 Differential Revision: D33817902 Pulled By: jbschlosser fbshipit-source-id: d63a620f8972ff36b63841f0bc6c820466f58f69 (cherry picked from commit d358cfd)
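
For readers unfamiliar with what the antialias flag changes, here is a minimal pure-Python sketch of antialiased linear downsampling in one dimension. It is modeled on the Pillow-style approach of widening the triangle filter by the downscale factor. It is an illustration only, not the CUDA kernel from this PR; the names `triangle` and `resize_linear_1d` are hypothetical.

```python
def triangle(x: float) -> float:
    """Linear (triangle) filter kernel with support 1."""
    x = abs(x)
    return 1.0 - x if x < 1.0 else 0.0


def resize_linear_1d(signal, out_size, antialias=True):
    """Resize a 1D sequence with a triangle filter.

    With antialias=True the filter is stretched by the downscale factor,
    so every input sample contributes to some output sample. With
    antialias=False the filter keeps support 1, so at most two input
    samples feed each output sample.
    """
    in_size = len(signal)
    scale = in_size / out_size
    # Assumption: stretch the filter only when downsampling (scale > 1).
    filter_scale = max(scale, 1.0) if antialias else 1.0
    support = 1.0 * filter_scale  # the triangle filter has base support 1

    out = []
    for i in range(out_size):
        center = (i + 0.5) * scale  # output pixel center in input coordinates
        lo = max(0, int(center - support + 0.5))
        hi = min(in_size, int(center + support + 0.5))
        weights = [triangle((j - center + 0.5) / filter_scale) for j in range(lo, hi)]
        total = sum(weights)  # normalize so a constant signal stays constant
        out.append(sum(w * signal[j] for w, j in zip(weights, range(lo, hi))) / total)
    return out


# A period-4 spike train downsampled by a factor of 4.
spikes = [1.0, 0.0, 0.0, 0.0] * 4
aliased = resize_linear_1d(spikes, 4, antialias=False)
smoothed = resize_linear_1d(spikes, 4, antialias=True)
```

In this example the non-antialiased result misses every spike and is all zeros, while the interior antialiased values equal the signal mean of 0.25. The antialiased path touches roughly `scale` times more input samples per output sample, which is the kind of extra work the benchmarks in this PR measure.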

…930) Summary:
Description:
- Added antialias flag to interpolate (CUDA)
- forward and backward for bicubic mode
- added tests

Previous PR for CPU bilinear, pytorch/pytorch#65142
Previous PR for CPU bicubic, pytorch/pytorch#68819

### Benchmarks

<details>
<summary>
Bilinear backward pass, PTH CPU and PTH CUDA
</summary>

Code: https://gist.github.com/vfdev-5/b173761a567f2283b3c649c3c0574112

```
- Measure only backward op

Torch version: 1.11.0a0+gitd032369
Torch config: PyTorch built with:
  - GCC 9.3
  - C++ Version: 201402
  - OpenMP 201511 (a.k.a.
OpenMP 4.5) - CPU capability usage: AVX2 - CUDA Runtime 11.1 - NVCC architecture flags: -gencode;arch=compute_61,code=sm_61 - CuDNN 8.0.5 - Build settings: BUILD_TYPE=Release, CUDA_VERSION=11.1, CUDNN_VERSION=8.0.5, CXX_COMPILER=/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_PYTORCH_QNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.11.0, USE_CUDA=1, USE_CUDNN=1, USE_EIGEN_FOR_BLAS=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=OFF, USE_MKLDNN=OFF, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=0, USE_OPENMP=ON, USE_ROCM=OFF, Num threads: 8 [------------- Downsampling backward (bilinear): torch.Size([1, 3, 906, 438]) -> (320, 196) ------------] | 1.11.0a0+gitd032369 cpu | 1.11.0a0+gitd032369 cuda 8 threads: ---------------------------------------------------------------------------------------------- channels_first contiguous torch.float32 | 4686.8 | 215.7 channels_last non-contiguous torch.float32 | 5101.1 | 220.5 Times are in microseconds (us). 
[------------- Downsampling backward (bilinear): torch.Size([1, 3, 906, 438]) -> (460, 220) ------------] | 1.11.0a0+gitd032369 cpu | 1.11.0a0+gitd032369 cuda 8 threads: ---------------------------------------------------------------------------------------------- channels_first contiguous torch.float32 | 6011.2 | 204.4 channels_last non-contiguous torch.float32 | 6396.0 | 210.0 Times are in microseconds (us). [------------- Downsampling backward (bilinear): torch.Size([1, 3, 906, 438]) -> (120, 96) -------------] | 1.11.0a0+gitd032369 cpu | 1.11.0a0+gitd032369 cuda 8 threads: ---------------------------------------------------------------------------------------------- channels_first contiguous torch.float32 | 2035.6 | 250.2 channels_last non-contiguous torch.float32 | 1589.6 | 252.5 Times are in microseconds (us). [------------ Downsampling backward (bilinear): torch.Size([1, 3, 906, 438]) -> (1200, 196) ------------] | 1.11.0a0+gitd032369 cpu | 1.11.0a0+gitd032369 cuda 8 threads: ---------------------------------------------------------------------------------------------- channels_first contiguous torch.float32 | 11392.5 | 256.5 channels_last non-contiguous torch.float32 | 11640.2 | 263.9 Times are in microseconds (us). [------------ Downsampling backward (bilinear): torch.Size([1, 3, 906, 438]) -> (120, 1200) ------------] | 1.11.0a0+gitd032369 cpu | 1.11.0a0+gitd032369 cuda 8 threads: ---------------------------------------------------------------------------------------------- channels_first contiguous torch.float32 | 11769.6 | 465.9 channels_last non-contiguous torch.float32 | 12407.0 | 474.4 Times are in microseconds (us). [---- Downsampling backward (bilinear): torch.Size([1, 1, 906, 438]) -> (320, 196) ----] | 1.11.0a0+gitd032369 cpu | 1.11.0a0+gitd032369 cuda 8 threads: ----------------------------------------------------------------------------- contiguous torch.float32 | 3931.0 | 133.3 Times are in microseconds (us). 
[---- Downsampling backward (bilinear): torch.Size([1, 1, 906, 438]) -> (460, 220) ----] | 1.11.0a0+gitd032369 cpu | 1.11.0a0+gitd032369 cuda 8 threads: ----------------------------------------------------------------------------- contiguous torch.float32 | 5594.8 | 133.9 Times are in microseconds (us). [---- Downsampling backward (bilinear): torch.Size([1, 1, 906, 438]) -> (120, 96) -----] | 1.11.0a0+gitd032369 cpu | 1.11.0a0+gitd032369 cuda 8 threads: ----------------------------------------------------------------------------- contiguous torch.float32 | 1272.6 | 133.0 Times are in microseconds (us). [--- Downsampling backward (bilinear): torch.Size([1, 1, 906, 438]) -> (1200, 196) ----] | 1.11.0a0+gitd032369 cpu | 1.11.0a0+gitd032369 cuda 8 threads: ----------------------------------------------------------------------------- contiguous torch.float32 | 10618.1 | 134.0 Times are in microseconds (us). [--- Downsampling backward (bilinear): torch.Size([1, 1, 906, 438]) -> (120, 1200) ----] | 1.11.0a0+gitd032369 cpu | 1.11.0a0+gitd032369 cuda 8 threads: ----------------------------------------------------------------------------- contiguous torch.float32 | 11082.2 | 154.6 Times are in microseconds (us). ``` </details> <details> <summary> Bicubic backward pass, PTH CPU and PTH CUDA </summary> Code: https://gist.github.com/vfdev-5/b173761a567f2283b3c649c3c0574112 ``` - Measure only backward op Torch version: 1.11.0a0+gitd032369 Torch config: PyTorch built with: - GCC 9.3 - C++ Version: 201402 - OpenMP 201511 (a.k.a. 
OpenMP 4.5) - CPU capability usage: AVX2 - CUDA Runtime 11.1 - NVCC architecture flags: -gencode;arch=compute_61,code=sm_61 - CuDNN 8.0.5 - Build settings: BUILD_TYPE=Release, CUDA_VERSION=11.1, CUDNN_VERSION=8.0.5, CXX_COMPILER=/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_PYTORCH_QNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.11.0, USE_CUDA=1, USE_CUDNN=1, USE_EIGEN_FOR_BLAS=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=OFF, USE_MKLDNN=OFF, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=0, USE_OPENMP=ON, USE_ROCM=OFF, Num threads: 8 [------------- Downsampling backward (bicubic): torch.Size([1, 3, 906, 438]) -> (320, 196) -------------] | 1.11.0a0+gitd032369 cpu | 1.11.0a0+gitd032369 cuda 8 threads: ---------------------------------------------------------------------------------------------- channels_first contiguous torch.float32 | 6791.2 | 618.9 channels_last non-contiguous torch.float32 | 7125.2 | 622.9 Times are in microseconds (us). 
[------------- Downsampling backward (bicubic): torch.Size([1, 3, 906, 438]) -> (460, 220) -------------] | 1.11.0a0+gitd032369 cpu | 1.11.0a0+gitd032369 cuda 8 threads: ---------------------------------------------------------------------------------------------- channels_first contiguous torch.float32 | 8806.2 | 600.3 channels_last non-contiguous torch.float32 | 9167.6 | 607.5 Times are in microseconds (us). [-------------- Downsampling backward (bicubic): torch.Size([1, 3, 906, 438]) -> (120, 96) -------------] | 1.11.0a0+gitd032369 cpu | 1.11.0a0+gitd032369 cuda 8 threads: ---------------------------------------------------------------------------------------------- channels_first contiguous torch.float32 | 3683.6 | 693.8 channels_last non-contiguous torch.float32 | 3617.4 | 695.0 Times are in microseconds (us). [------------- Downsampling backward (bicubic): torch.Size([1, 3, 906, 438]) -> (1200, 196) ------------] | 1.11.0a0+gitd032369 cpu | 1.11.0a0+gitd032369 cuda 8 threads: ---------------------------------------------------------------------------------------------- channels_first contiguous torch.float32 | 17548.2 | 779.4 channels_last non-contiguous torch.float32 | 17966.2 | 786.5 Times are in microseconds (us). [------------- Downsampling backward (bicubic): torch.Size([1, 3, 906, 438]) -> (120, 1200) ------------] | 1.11.0a0+gitd032369 cpu | 1.11.0a0+gitd032369 cuda 8 threads: ---------------------------------------------------------------------------------------------- channels_first contiguous torch.float32 | 28.4 | 1.6 channels_last non-contiguous torch.float32 | 28.4 | 1.6 Times are in milliseconds (ms). [---- Downsampling backward (bicubic): torch.Size([1, 1, 906, 438]) -> (320, 196) -----] | 1.11.0a0+gitd032369 cpu | 1.11.0a0+gitd032369 cuda 8 threads: ----------------------------------------------------------------------------- contiguous torch.float32 | 6266.1 | 208.5 Times are in microseconds (us). 
[---- Downsampling backward (bicubic): torch.Size([1, 1, 906, 438]) -> (460, 220) -----] | 1.11.0a0+gitd032369 cpu | 1.11.0a0+gitd032369 cuda 8 threads: ----------------------------------------------------------------------------- contiguous torch.float32 | 8218.3 | 200.8 Times are in microseconds (us). [----- Downsampling backward (bicubic): torch.Size([1, 1, 906, 438]) -> (120, 96) -----] | 1.11.0a0+gitd032369 cpu | 1.11.0a0+gitd032369 cuda 8 threads: ----------------------------------------------------------------------------- contiguous torch.float32 | 3458.9 | 231.9 Times are in microseconds (us). [---- Downsampling backward (bicubic): torch.Size([1, 1, 906, 438]) -> (1200, 196) ----] | 1.11.0a0+gitd032369 cpu | 1.11.0a0+gitd032369 cuda 8 threads: ----------------------------------------------------------------------------- contiguous torch.float32 | 15729.3 | 261.6 Times are in microseconds (us). [---- Downsampling backward (bicubic): torch.Size([1, 1, 906, 438]) -> (120, 1200) ----] | 1.11.0a0+gitd032369 cpu | 1.11.0a0+gitd032369 cuda 8 threads: ----------------------------------------------------------------------------- contiguous torch.float32 | 26279.8 | 547.0 Times are in microseconds (us). ``` </details> Code is moved from torchvision: pytorch/vision#4211 and optimized Pull Request resolved: pytorch/pytorch#70930 Reviewed By: zou3519 Differential Revision: D33817902 Pulled By: jbschlosser fbshipit-source-id: d63a620f8972ff36b63841f0bc6c820466f58f69 (cherry picked from commit d358cfd)
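To put the backward-pass numbers in perspective, the CPU-to-CUDA ratio can be worked out directly from the tables. The short Python sketch below does that arithmetic for the bicubic backward pass (3 channels, channels_first contiguous float32). The timings are copied from the tables above; the `speedup` helper and the `bicubic_backward_us` name are illustrative only and are not part of the PR or the benchmark gist.

```python
# Bicubic backward timings for torch.Size([1, 3, 906, 438]),
# channels_first contiguous float32, copied from the tables above.
# Each entry maps output size -> (CPU time, CUDA time) in microseconds.
bicubic_backward_us = {
    (320, 196): (6791.2, 618.9),
    (460, 220): (8806.2, 600.3),
    (120, 96): (3683.6, 693.8),
    (1200, 196): (17548.2, 779.4),
    (120, 1200): (28.4e3, 1.6e3),  # reported in ms in the table, converted to us
}


def speedup(cpu_time: float, cuda_time: float) -> float:
    """Return how many times faster the CUDA run is than the CPU run."""
    return cpu_time / cuda_time


for out_size, (cpu_us, cuda_us) in bicubic_backward_us.items():
    print(f"{out_size}: CUDA is {speedup(cpu_us, cuda_us):.1f}x faster than CPU")
```

These ratios describe only this build and GPU (sm_61, 8 CPU threads); the (120, 1200) entry is coarser than the others because the table rounds it to 0.1 ms.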

…84599)

Description:

Following #69318 (comment), this adds the missing bicubic path for the anti-alias flag to the C++ frontend.
- #70930
- added tests in pytorch/test/cpp/api/functional.cpp

Pull Request resolved: #84599
Approved by: https://github.com/kit1980, https://github.com/malfet
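For readers unfamiliar with what the anti-alias flag changes, here is a minimal pure-Python sketch of one-dimensional antialiased interpolation weights. It assumes the PIL-style scheme that the benchmarks use as a reference: when downsampling, the filter support is stretched by the scale factor so that every input sample contributes. The function names (`triangle_filter`, `aa_weights`) are hypothetical, and this is not the CUDA kernel from this PR.

```python
from typing import Callable, List, Tuple


def triangle_filter(x: float) -> float:
    """Bilinear (triangle) filter with support 1."""
    x = abs(x)
    return 1.0 - x if x < 1.0 else 0.0


def aa_weights(
    in_size: int,
    out_size: int,
    out_index: int,
    filt: Callable[[float], float] = triangle_filter,
    support: float = 1.0,
) -> Tuple[int, List[float]]:
    """Return (first input index, normalized weights) for one output sample."""
    scale = in_size / out_size
    # Assumption: stretch the filter only when downsampling (scale > 1).
    stretch = max(scale, 1.0)
    half_width = support * stretch
    # Center of the output sample expressed in input coordinates.
    center = scale * (out_index + 0.5)
    start = max(0, int(center - half_width + 0.5))
    stop = min(in_size, int(center + half_width + 0.5))
    raw = [filt((j - center + 0.5) / stretch) for j in range(start, stop)]
    total = sum(raw)
    return start, [w / total for w in raw]


# Downsampling 8 -> 4 samples: each output draws on four inputs instead of two.
start, weights = aa_weights(in_size=8, out_size=4, out_index=1)
print(start, weights)
```

A bicubic variant would, under the same assumptions, swap in a cubic filter with support 2 and leave the windowing and normalization unchanged.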

vfdev-5 and others added 6 commits December 8, 2021 16:36

WIP interp antialias cuda

dabd4da

Merge branch 'master' of https://github.com/pytorch/pytorch into added-interp-antialias-cuda

80500fa

Merge branch 'master' of https://github.com/pytorch/pytorch into added-interp-antialias-cuda

75ab7ad

Merge branch 'master' of https://github.com/pytorch/pytorch into added-interp-antialias-cuda

3dbb95a

Cuda kernels for interp with AA bilinear and bicubic

ae3581c

Merge branch 'master' of https://github.com/pytorch/pytorch into added-interp-antialias-cuda

872e3cd

pytorch-probot bot added the ciflow/default label Jan 6, 2022

facebook-github-bot added the cla signed label Jan 6, 2022

pytorchbot added the open source label Jan 6, 2022

Updated kernels

5c567d1

vfdev-5 force-pushed the added-interp-antialias-cuda branch from 100c8d7 to 5c567d1 January 6, 2022 21:28

vfdev-5 added 3 commits January 10, 2022 09:32

Merge branch 'master' of https://github.com/pytorch/pytorch into added-interp-antialias-cuda

f948df9

Removed buffer1 allocation and speed up the kernel

a273791

Allocate buffer2 on shared memory

df11967

vfdev-5 force-pushed the added-interp-antialias-cuda branch from f8fcbf9 to df11967 January 10, 2022 12:51

vfdev-5 marked this pull request as ready for review January 10, 2022 12:59

vfdev-5 requested a review from ezyang as a code owner January 10, 2022 12:59

dagitses requested review from jbschlosser and ngimel January 10, 2022 14:14

dagitses assigned ngimel and jbschlosser Jan 10, 2022

dagitses added the triaged label Jan 10, 2022

ngimel approved these changes Jan 19, 2022

Merge branch 'master' of https://github.com/pytorch/pytorch into added-interp-antialias-cuda

e3cfbc6

jbschlosser reviewed Jan 19, 2022

aten/src/ATen/native/cuda/UpSample.cuh (outdated, resolved)

Fixed issue with channels last mem format

a21f268

vfdev-5 commented Jan 20, 2022

aten/src/ATen/native/cuda/UpSampleBilinear2d.cu (outdated, resolved)

vfdev-5 added 3 commits January 20, 2022 21:32

Reworked the code to fix the issue with ROCm

a9113b5

Merge branch 'master' of github.com:pytorch/pytorch into added-interp-antialias-cuda

a4e9418

Rewritten code with templates and filter functors

d032369

vfdev-5 commented Jan 24, 2022

aten/src/ATen/native/cuda/UpSampleBilinear2d.cu (resolved)

vfdev-5 requested a review from ngimel January 24, 2022 21:50

jbschlosser mentioned this pull request Jan 25, 2022

[feature request] Upstream to core PyTorch antialiased interpolation #71638

Closed

vfdev-5 changed the title ~~Added antialias flag to interpolate (CUDA, blinear and bicubic)~~ Added antialias flag to interpolate (CUDA, bilinear and bicubic) Jan 26, 2022

pytorchmergebot closed this in eeda31f Jan 27, 2022

vfdev-5 deleted the added-interp-antialias-cuda branch January 27, 2022 20:46

vfdev-5 mentioned this pull request Feb 1, 2022

Remove custom ops interpolation with antialiasing pytorch/vision#5329

Merged

vfdev-5 mentioned this pull request Jun 10, 2022

PIL version check for enum change appears to break SIMD versions pytorch/vision#6153

Closed

vfdev-5 mentioned this pull request Sep 6, 2022

[C++ API] Added missing antialiasing path in interpolation C++ api #84599

Closed

Conversation

vfdev-5 commented Jan 6, 2022 • edited

Benchmarks

pytorch-probot bot commented Jan 6, 2022 • edited

⚛️ CI Flow

facebook-github-bot commented Jan 6, 2022 • edited

🔗 Helpful links

💊 CI failures summary and remediations

1 failure not recognized by patterns:

vfdev-5 commented Jan 10, 2022

jbschlosser commented Jan 11, 2022

ngimel left a comment

jbschlosser left a comment • edited

IvanYashchuk commented Jan 20, 2022

jeffdaily commented Jan 20, 2022

ngimel commented Jan 26, 2022

facebook-github-bot commented Jan 27, 2022

8 participants
