Skip to content

Optimize opencv dft by vectorizing radix2 and radix3.#18158

Merged
opencv-pushbot merged 1 commit intoopencv:3.4from
legrosbuffle:3.4-vectorize-dft-radix
Oct 30, 2020
Merged

Optimize opencv dft by vectorizing radix2 and radix3.#18158
opencv-pushbot merged 1 commit intoopencv:3.4from
legrosbuffle:3.4-vectorize-dft-radix

Conversation

@legrosbuffle
Copy link
Copy Markdown
Contributor

This is useful for non power-of-two sizes when WITH_IPP is not an option.

This shows consistent improvement on openCV benchmarks, and we measure
even larger improvements on our internal workloads.

For example, for 320x480, 32FC*, we can see a ~5% improvement, as
320=2^6*5 and 480=2^5*3*5, so the improved radix3 version is used.
64FC* is flat as expected, as we do not specialize the functors for double
in this change.

dft::Size_MatType_FlagsType_NzeroRows::(320x480, 32FC1, 0, false)                                1.239  1.153     1.07
dft::Size_MatType_FlagsType_NzeroRows::(320x480, 32FC1, 0, true)                                 0.991  0.926     1.07
dft::Size_MatType_FlagsType_NzeroRows::(320x480, 32FC1, DFT_COMPLEX_OUTPUT, false)               1.367  1.281     1.07
dft::Size_MatType_FlagsType_NzeroRows::(320x480, 32FC1, DFT_COMPLEX_OUTPUT, true)                1.114  1.049     1.06
dft::Size_MatType_FlagsType_NzeroRows::(320x480, 32FC1, DFT_INVERSE, false)                      1.313  1.254     1.05
dft::Size_MatType_FlagsType_NzeroRows::(320x480, 32FC1, DFT_INVERSE, true)                       1.027  0.977     1.05
dft::Size_MatType_FlagsType_NzeroRows::(320x480, 32FC1, DFT_INVERSE|DFT_COMPLEX_OUTPUT, false)   1.296  1.217     1.06
dft::Size_MatType_FlagsType_NzeroRows::(320x480, 32FC1, DFT_INVERSE|DFT_COMPLEX_OUTPUT, true)    1.039  0.963     1.08
dft::Size_MatType_FlagsType_NzeroRows::(320x480, 32FC1, DFT_ROWS, false)                         0.542  0.524     1.04
dft::Size_MatType_FlagsType_NzeroRows::(320x480, 32FC1, DFT_ROWS, true)                          0.293  0.277     1.06
dft::Size_MatType_FlagsType_NzeroRows::(320x480, 32FC1, DFT_SCALE, false)                        1.265  1.175     1.08
dft::Size_MatType_FlagsType_NzeroRows::(320x480, 32FC1, DFT_SCALE, true)                         1.004  0.942     1.07
dft::Size_MatType_FlagsType_NzeroRows::(320x480, 64FC1, 0, false)                                1.292  1.280     1.01
dft::Size_MatType_FlagsType_NzeroRows::(320x480, 64FC1, 0, true)                                 1.038  1.030     1.01
dft::Size_MatType_FlagsType_NzeroRows::(320x480, 64FC1, DFT_COMPLEX_OUTPUT, false)               1.484  1.488     1.00
dft::Size_MatType_FlagsType_NzeroRows::(320x480, 64FC1, DFT_COMPLEX_OUTPUT, true)                1.222  1.224     1.00
dft::Size_MatType_FlagsType_NzeroRows::(320x480, 64FC1, DFT_INVERSE, false)                      1.380  1.355     1.02
dft::Size_MatType_FlagsType_NzeroRows::(320x480, 64FC1, DFT_INVERSE, true)                       1.117  1.133     0.99
dft::Size_MatType_FlagsType_NzeroRows::(320x480, 64FC1, DFT_INVERSE|DFT_COMPLEX_OUTPUT, false)   1.372  1.383     0.99
dft::Size_MatType_FlagsType_NzeroRows::(320x480, 64FC1, DFT_INVERSE|DFT_COMPLEX_OUTPUT, true)    1.117  1.127     0.99
dft::Size_MatType_FlagsType_NzeroRows::(320x480, 64FC1, DFT_ROWS, false)                         0.546  0.539     1.01
dft::Size_MatType_FlagsType_NzeroRows::(320x480, 64FC1, DFT_ROWS, true)                          0.293  0.299     0.98
dft::Size_MatType_FlagsType_NzeroRows::(320x480, 64FC1, DFT_SCALE, false)                        1.351  1.339     1.01
dft::Size_MatType_FlagsType_NzeroRows::(320x480, 64FC1, DFT_SCALE, true)                         1.099  1.092     1.01
dft::Size_MatType_FlagsType_NzeroRows::(320x480, 32FC2, 0, false)                                2.235  2.123     1.05
dft::Size_MatType_FlagsType_NzeroRows::(320x480, 32FC2, 0, true)                                 1.843  1.727     1.07
dft::Size_MatType_FlagsType_NzeroRows::(320x480, 32FC2, DFT_COMPLEX_OUTPUT, false)               2.189  2.109     1.04
dft::Size_MatType_FlagsType_NzeroRows::(320x480, 32FC2, DFT_COMPLEX_OUTPUT, true)                1.827  1.754     1.04
dft::Size_MatType_FlagsType_NzeroRows::(320x480, 32FC2, DFT_INVERSE, false)                      2.392  2.309     1.04
dft::Size_MatType_FlagsType_NzeroRows::(320x480, 32FC2, DFT_INVERSE, true)                       1.951  1.865     1.05
dft::Size_MatType_FlagsType_NzeroRows::(320x480, 32FC2, DFT_INVERSE|DFT_COMPLEX_OUTPUT, false)   2.391  2.293     1.04
dft::Size_MatType_FlagsType_NzeroRows::(320x480, 32FC2, DFT_INVERSE|DFT_COMPLEX_OUTPUT, true)    1.954  1.882     1.04
dft::Size_MatType_FlagsType_NzeroRows::(320x480, 32FC2, DFT_ROWS, false)                         0.811  0.815     0.99
dft::Size_MatType_FlagsType_NzeroRows::(320x480, 32FC2, DFT_ROWS, true)                          0.426  0.437     0.98
dft::Size_MatType_FlagsType_NzeroRows::(320x480, 32FC2, DFT_SCALE, false)                        2.268  2.152     1.05
dft::Size_MatType_FlagsType_NzeroRows::(320x480, 32FC2, DFT_SCALE, true)                         1.893  1.788     1.06
dft::Size_MatType_FlagsType_NzeroRows::(800x600, 32FC1, 0, false)                                4.546  4.395     1.03
dft::Size_MatType_FlagsType_NzeroRows::(800x600, 32FC1, 0, true)                                 3.616  3.426     1.06
dft::Size_MatType_FlagsType_NzeroRows::(800x600, 32FC1, DFT_COMPLEX_OUTPUT, false)               4.843  4.668     1.04
dft::Size_MatType_FlagsType_NzeroRows::(800x600, 32FC1, DFT_COMPLEX_OUTPUT, true)                3.825  3.748     1.02
dft::Size_MatType_FlagsType_NzeroRows::(800x600, 32FC1, DFT_INVERSE, false)                      4.720  4.525     1.04
dft::Size_MatType_FlagsType_NzeroRows::(800x600, 32FC1, DFT_INVERSE, true)                       3.743  3.601     1.04
dft::Size_MatType_FlagsType_NzeroRows::(800x600, 32FC1, DFT_INVERSE|DFT_COMPLEX_OUTPUT, false)   4.755  4.527     1.05
dft::Size_MatType_FlagsType_NzeroRows::(800x600, 32FC1, DFT_INVERSE|DFT_COMPLEX_OUTPUT, true)    3.744  3.586     1.04
dft::Size_MatType_FlagsType_NzeroRows::(800x600, 32FC1, DFT_ROWS, false)                         1.992  2.012     0.99
dft::Size_MatType_FlagsType_NzeroRows::(800x600, 32FC1, DFT_ROWS, true)                          1.048  1.048     1.00
dft::Size_MatType_FlagsType_NzeroRows::(800x600, 32FC1, DFT_SCALE, false)                        4.625  4.451     1.04
dft::Size_MatType_FlagsType_NzeroRows::(800x600, 32FC1, DFT_SCALE, true)                         3.643  3.491     1.04
dft::Size_MatType_FlagsType_NzeroRows::(800x600, 64FC1, 0, false)                                4.499  4.488     1.00
dft::Size_MatType_FlagsType_NzeroRows::(800x600, 64FC1, 0, true)                                 3.559  3.555     1.00
dft::Size_MatType_FlagsType_NzeroRows::(800x600, 64FC1, DFT_COMPLEX_OUTPUT, false)               5.155  5.165     1.00
dft::Size_MatType_FlagsType_NzeroRows::(800x600, 64FC1, DFT_COMPLEX_OUTPUT, true)                4.103  4.101     1.00
dft::Size_MatType_FlagsType_NzeroRows::(800x600, 64FC1, DFT_INVERSE, false)                      5.484  5.474     1.00
dft::Size_MatType_FlagsType_NzeroRows::(800x600, 64FC1, DFT_INVERSE, true)                       4.617  4.518     1.02
dft::Size_MatType_FlagsType_NzeroRows::(800x600, 64FC1, DFT_INVERSE|DFT_COMPLEX_OUTPUT, false)   5.547  5.509     1.01
dft::Size_MatType_FlagsType_NzeroRows::(800x600, 64FC1, DFT_INVERSE|DFT_COMPLEX_OUTPUT, true)    4.553  4.554     1.00
dft::Size_MatType_FlagsType_NzeroRows::(800x600, 64FC1, DFT_ROWS, false)                         2.067  2.018     1.02
dft::Size_MatType_FlagsType_NzeroRows::(800x600, 64FC1, DFT_ROWS, true)                          1.104  1.079     1.02
dft::Size_MatType_FlagsType_NzeroRows::(800x600, 64FC1, DFT_SCALE, false)                        4.665  4.619     1.01
dft::Size_MatType_FlagsType_NzeroRows::(800x600, 64FC1, DFT_SCALE, true)                         3.698  3.681     1.00
dft::Size_MatType_FlagsType_NzeroRows::(800x600, 32FC2, 0, false)                                8.774  8.275     1.06
dft::Size_MatType_FlagsType_NzeroRows::(800x600, 32FC2, 0, true)                                 6.975  6.527     1.07
dft::Size_MatType_FlagsType_NzeroRows::(800x600, 32FC2, DFT_COMPLEX_OUTPUT, false)               8.720  8.270     1.05
dft::Size_MatType_FlagsType_NzeroRows::(800x600, 32FC2, DFT_COMPLEX_OUTPUT, true)                6.928  6.532     1.06
dft::Size_MatType_FlagsType_NzeroRows::(800x600, 32FC2, DFT_INVERSE, false)                      9.272  8.862     1.05
dft::Size_MatType_FlagsType_NzeroRows::(800x600, 32FC2, DFT_INVERSE, true)                       7.323  6.946     1.05
dft::Size_MatType_FlagsType_NzeroRows::(800x600, 32FC2, DFT_INVERSE|DFT_COMPLEX_OUTPUT, false)   9.262  8.768     1.06
dft::Size_MatType_FlagsType_NzeroRows::(800x600, 32FC2, DFT_INVERSE|DFT_COMPLEX_OUTPUT, true)    7.298  6.871     1.06
dft::Size_MatType_FlagsType_NzeroRows::(800x600, 32FC2, DFT_ROWS, false)                         3.766  3.639     1.03
dft::Size_MatType_FlagsType_NzeroRows::(800x600, 32FC2, DFT_ROWS, true)                          1.932  1.889     1.02
dft::Size_MatType_FlagsType_NzeroRows::(800x600, 32FC2, DFT_SCALE, false)                        8.865  8.417     1.05
dft::Size_MatType_FlagsType_NzeroRows::(800x600, 32FC2, DFT_SCALE, true)                         7.067  6.643     1.06
dft::Size_MatType_FlagsType_NzeroRows::(1280x1024, 32FC1, 0, false)                              10.014 10.141    0.99
dft::Size_MatType_FlagsType_NzeroRows::(1280x1024, 32FC1, 0, true)                               7.600  7.632     1.00
dft::Size_MatType_FlagsType_NzeroRows::(1280x1024, 32FC1, DFT_COMPLEX_OUTPUT, false)             11.059 11.283    0.98
dft::Size_MatType_FlagsType_NzeroRows::(1280x1024, 32FC1, DFT_COMPLEX_OUTPUT, true)              8.475  8.552     0.99
dft::Size_MatType_FlagsType_NzeroRows::(1280x1024, 32FC1, DFT_INVERSE, false)                    12.678 12.789    0.99
dft::Size_MatType_FlagsType_NzeroRows::(1280x1024, 32FC1, DFT_INVERSE, true)                     10.445 10.359    1.01
dft::Size_MatType_FlagsType_NzeroRows::(1280x1024, 32FC1, DFT_INVERSE|DFT_COMPLEX_OUTPUT, false) 12.626 12.925    0.98
dft::Size_MatType_FlagsType_NzeroRows::(1280x1024, 32FC1, DFT_INVERSE|DFT_COMPLEX_OUTPUT, true)  10.538 10.553    1.00
dft::Size_MatType_FlagsType_NzeroRows::(1280x1024, 32FC1, DFT_ROWS, false)                       5.041  5.084     0.99
dft::Size_MatType_FlagsType_NzeroRows::(1280x1024, 32FC1, DFT_ROWS, true)                        2.595  2.607     1.00
dft::Size_MatType_FlagsType_NzeroRows::(1280x1024, 32FC1, DFT_SCALE, false)                      10.231 10.330    0.99
dft::Size_MatType_FlagsType_NzeroRows::(1280x1024, 32FC1, DFT_SCALE, true)                       7.786  7.815     1.00
dft::Size_MatType_FlagsType_NzeroRows::(1280x1024, 64FC1, 0, false)                              13.597 13.302    1.02
dft::Size_MatType_FlagsType_NzeroRows::(1280x1024, 64FC1, 0, true)                               10.377 10.207    1.02
dft::Size_MatType_FlagsType_NzeroRows::(1280x1024, 64FC1, DFT_COMPLEX_OUTPUT, false)             15.940 15.545    1.03
dft::Size_MatType_FlagsType_NzeroRows::(1280x1024, 64FC1, DFT_COMPLEX_OUTPUT, true)              12.299 12.230    1.01
dft::Size_MatType_FlagsType_NzeroRows::(1280x1024, 64FC1, DFT_INVERSE, false)                    15.270 15.181    1.01
dft::Size_MatType_FlagsType_NzeroRows::(1280x1024, 64FC1, DFT_INVERSE, true)                     12.757 12.339    1.03
dft::Size_MatType_FlagsType_NzeroRows::(1280x1024, 64FC1, DFT_INVERSE|DFT_COMPLEX_OUTPUT, false) 15.512 15.157    1.02
dft::Size_MatType_FlagsType_NzeroRows::(1280x1024, 64FC1, DFT_INVERSE|DFT_COMPLEX_OUTPUT, true)  12.505 12.635    0.99
dft::Size_MatType_FlagsType_NzeroRows::(1280x1024, 64FC1, DFT_ROWS, false)                       6.359  6.255     1.02
dft::Size_MatType_FlagsType_NzeroRows::(1280x1024, 64FC1, DFT_ROWS, true)                        3.314  3.248     1.02
dft::Size_MatType_FlagsType_NzeroRows::(1280x1024, 64FC1, DFT_SCALE, false)                      13.937 13.733    1.01
dft::Size_MatType_FlagsType_NzeroRows::(1280x1024, 64FC1, DFT_SCALE, true)                       10.782 10.495    1.03
dft::Size_MatType_FlagsType_NzeroRows::(1280x1024, 32FC2, 0, false)                              18.985 18.926    1.00
dft::Size_MatType_FlagsType_NzeroRows::(1280x1024, 32FC2, 0, true)                               14.256 14.509    0.98
dft::Size_MatType_FlagsType_NzeroRows::(1280x1024, 32FC2, DFT_COMPLEX_OUTPUT, false)             18.696 19.021    0.98
dft::Size_MatType_FlagsType_NzeroRows::(1280x1024, 32FC2, DFT_COMPLEX_OUTPUT, true)              14.290 14.429    0.99
dft::Size_MatType_FlagsType_NzeroRows::(1280x1024, 32FC2, DFT_INVERSE, false)                    20.135 20.296    0.99
dft::Size_MatType_FlagsType_NzeroRows::(1280x1024, 32FC2, DFT_INVERSE, true)                     15.390 15.512    0.99
dft::Size_MatType_FlagsType_NzeroRows::(1280x1024, 32FC2, DFT_INVERSE|DFT_COMPLEX_OUTPUT, false) 20.121 20.354    0.99
dft::Size_MatType_FlagsType_NzeroRows::(1280x1024, 32FC2, DFT_INVERSE|DFT_COMPLEX_OUTPUT, true)  15.341 15.605    0.98
dft::Size_MatType_FlagsType_NzeroRows::(1280x1024, 32FC2, DFT_ROWS, false)                       8.932  9.084     0.98
dft::Size_MatType_FlagsType_NzeroRows::(1280x1024, 32FC2, DFT_ROWS, true)                        4.539  4.649     0.98
dft::Size_MatType_FlagsType_NzeroRows::(1280x1024, 32FC2, DFT_SCALE, false)                      19.137 19.303    0.99
dft::Size_MatType_FlagsType_NzeroRows::(1280x1024, 32FC2, DFT_SCALE, true)                       14.565 14.808    0.98
dft::Size_MatType_FlagsType_NzeroRows::(1920x1080, 32FC1, 0, false)                              22.553 21.171    1.07
dft::Size_MatType_FlagsType_NzeroRows::(1920x1080, 32FC1, 0, true)                               17.850 16.390    1.09
dft::Size_MatType_FlagsType_NzeroRows::(1920x1080, 32FC1, DFT_COMPLEX_OUTPUT, false)             24.062 22.634    1.06
dft::Size_MatType_FlagsType_NzeroRows::(1920x1080, 32FC1, DFT_COMPLEX_OUTPUT, true)              19.342 17.932    1.08
dft::Size_MatType_FlagsType_NzeroRows::(1920x1080, 32FC1, DFT_INVERSE, false)                    28.609 27.326    1.05
dft::Size_MatType_FlagsType_NzeroRows::(1920x1080, 32FC1, DFT_INVERSE, true)                     24.591 23.289    1.06
dft::Size_MatType_FlagsType_NzeroRows::(1920x1080, 32FC1, DFT_INVERSE|DFT_COMPLEX_OUTPUT, false) 28.667 27.467    1.04
dft::Size_MatType_FlagsType_NzeroRows::(1920x1080, 32FC1, DFT_INVERSE|DFT_COMPLEX_OUTPUT, true)  24.671 23.309    1.06
dft::Size_MatType_FlagsType_NzeroRows::(1920x1080, 32FC1, DFT_ROWS, false)                       9.458  9.077     1.04
dft::Size_MatType_FlagsType_NzeroRows::(1920x1080, 32FC1, DFT_ROWS, true)                        4.709  4.566     1.03
dft::Size_MatType_FlagsType_NzeroRows::(1920x1080, 32FC1, DFT_SCALE, false)                      22.791 21.583    1.06
dft::Size_MatType_FlagsType_NzeroRows::(1920x1080, 32FC1, DFT_SCALE, true)                       18.029 16.691    1.08
dft::Size_MatType_FlagsType_NzeroRows::(1920x1080, 64FC1, 0, false)                              25.238 24.427    1.03
dft::Size_MatType_FlagsType_NzeroRows::(1920x1080, 64FC1, 0, true)                               19.636 19.270    1.02
dft::Size_MatType_FlagsType_NzeroRows::(1920x1080, 64FC1, DFT_COMPLEX_OUTPUT, false)             28.342 27.957    1.01
dft::Size_MatType_FlagsType_NzeroRows::(1920x1080, 64FC1, DFT_COMPLEX_OUTPUT, true)              22.413 22.477    1.00
dft::Size_MatType_FlagsType_NzeroRows::(1920x1080, 64FC1, DFT_INVERSE, false)                    26.465 26.085    1.01
dft::Size_MatType_FlagsType_NzeroRows::(1920x1080, 64FC1, DFT_INVERSE, true)                     21.972 21.704    1.01
dft::Size_MatType_FlagsType_NzeroRows::(1920x1080, 64FC1, DFT_INVERSE|DFT_COMPLEX_OUTPUT, false) 26.497 26.127    1.01
dft::Size_MatType_FlagsType_NzeroRows::(1920x1080, 64FC1, DFT_INVERSE|DFT_COMPLEX_OUTPUT, true)  22.010 21.523    1.02
dft::Size_MatType_FlagsType_NzeroRows::(1920x1080, 64FC1, DFT_ROWS, false)                       11.188 10.774    1.04
dft::Size_MatType_FlagsType_NzeroRows::(1920x1080, 64FC1, DFT_ROWS, true)                        6.094  5.916     1.03
dft::Size_MatType_FlagsType_NzeroRows::(1920x1080, 64FC1, DFT_SCALE, false)                      25.728 24.934    1.03
dft::Size_MatType_FlagsType_NzeroRows::(1920x1080, 64FC1, DFT_SCALE, true)                       20.077 19.653    1.02
dft::Size_MatType_FlagsType_NzeroRows::(1920x1080, 32FC2, 0, false)                              43.834 40.726    1.08
dft::Size_MatType_FlagsType_NzeroRows::(1920x1080, 32FC2, 0, true)                               35.198 32.218    1.09
dft::Size_MatType_FlagsType_NzeroRows::(1920x1080, 32FC2, DFT_COMPLEX_OUTPUT, false)             43.743 40.897    1.07
dft::Size_MatType_FlagsType_NzeroRows::(1920x1080, 32FC2, DFT_COMPLEX_OUTPUT, true)              35.240 32.226    1.09
dft::Size_MatType_FlagsType_NzeroRows::(1920x1080, 32FC2, DFT_INVERSE, false)                    46.022 42.612    1.08
dft::Size_MatType_FlagsType_NzeroRows::(1920x1080, 32FC2, DFT_INVERSE, true)                     36.779 33.961    1.08
dft::Size_MatType_FlagsType_NzeroRows::(1920x1080, 32FC2, DFT_INVERSE|DFT_COMPLEX_OUTPUT, false) 46.396 42.723    1.09
dft::Size_MatType_FlagsType_NzeroRows::(1920x1080, 32FC2, DFT_INVERSE|DFT_COMPLEX_OUTPUT, true)  37.025 33.874    1.09
dft::Size_MatType_FlagsType_NzeroRows::(1920x1080, 32FC2, DFT_ROWS, false)                       17.334 16.832    1.03
dft::Size_MatType_FlagsType_NzeroRows::(1920x1080, 32FC2, DFT_ROWS, true)                        9.212  8.970     1.03
dft::Size_MatType_FlagsType_NzeroRows::(1920x1080, 32FC2, DFT_SCALE, false)                      44.190 41.211    1.07
dft::Size_MatType_FlagsType_NzeroRows::(1920x1080, 32FC2, DFT_SCALE, true)                       35.900 32.888    1.09
dft::Size_MatType_FlagsType_NzeroRows::(2048x2048, 32FC1, 0, false)                              40.948 38.256    1.07
dft::Size_MatType_FlagsType_NzeroRows::(2048x2048, 32FC1, 0, true)                               33.825 30.759    1.10
dft::Size_MatType_FlagsType_NzeroRows::(2048x2048, 32FC1, DFT_COMPLEX_OUTPUT, false)             53.210 53.584    0.99
dft::Size_MatType_FlagsType_NzeroRows::(2048x2048, 32FC1, DFT_COMPLEX_OUTPUT, true)              46.356 46.712    0.99
dft::Size_MatType_FlagsType_NzeroRows::(2048x2048, 32FC1, DFT_INVERSE, false)                    47.471 47.213    1.01
dft::Size_MatType_FlagsType_NzeroRows::(2048x2048, 32FC1, DFT_INVERSE, true)                     40.491 41.363    0.98
dft::Size_MatType_FlagsType_NzeroRows::(2048x2048, 32FC1, DFT_INVERSE|DFT_COMPLEX_OUTPUT, false) 46.724 47.049    0.99
dft::Size_MatType_FlagsType_NzeroRows::(2048x2048, 32FC1, DFT_INVERSE|DFT_COMPLEX_OUTPUT, true)  40.834 41.381    0.99
dft::Size_MatType_FlagsType_NzeroRows::(2048x2048, 32FC1, DFT_ROWS, false)                       14.508 14.490    1.00
dft::Size_MatType_FlagsType_NzeroRows::(2048x2048, 32FC1, DFT_ROWS, true)                        7.832  7.828     1.00
dft::Size_MatType_FlagsType_NzeroRows::(2048x2048, 32FC1, DFT_SCALE, false)                      41.491 38.341    1.08
dft::Size_MatType_FlagsType_NzeroRows::(2048x2048, 32FC1, DFT_SCALE, true)                       34.587 31.208    1.11
dft::Size_MatType_FlagsType_NzeroRows::(2048x2048, 64FC1, 0, false)                              65.155 63.173    1.03
dft::Size_MatType_FlagsType_NzeroRows::(2048x2048, 64FC1, 0, true)                               56.091 54.752    1.02
dft::Size_MatType_FlagsType_NzeroRows::(2048x2048, 64FC1, DFT_COMPLEX_OUTPUT, false)             71.549 70.626    1.01
dft::Size_MatType_FlagsType_NzeroRows::(2048x2048, 64FC1, DFT_COMPLEX_OUTPUT, true)              62.319 61.437    1.01
dft::Size_MatType_FlagsType_NzeroRows::(2048x2048, 64FC1, DFT_INVERSE, false)                    61.480 59.540    1.03
dft::Size_MatType_FlagsType_NzeroRows::(2048x2048, 64FC1, DFT_INVERSE, true)                     54.047 52.650    1.03
dft::Size_MatType_FlagsType_NzeroRows::(2048x2048, 64FC1, DFT_INVERSE|DFT_COMPLEX_OUTPUT, false) 61.752 61.366    1.01
dft::Size_MatType_FlagsType_NzeroRows::(2048x2048, 64FC1, DFT_INVERSE|DFT_COMPLEX_OUTPUT, true)  54.400 53.665    1.01
dft::Size_MatType_FlagsType_NzeroRows::(2048x2048, 64FC1, DFT_ROWS, false)                       20.219 19.704    1.03
dft::Size_MatType_FlagsType_NzeroRows::(2048x2048, 64FC1, DFT_ROWS, true)                        11.145 10.868    1.03
dft::Size_MatType_FlagsType_NzeroRows::(2048x2048, 64FC1, DFT_SCALE, false)                      66.220 64.525    1.03
dft::Size_MatType_FlagsType_NzeroRows::(2048x2048, 64FC1, DFT_SCALE, true)                       57.389 56.114    1.02
dft::Size_MatType_FlagsType_NzeroRows::(2048x2048, 32FC2, 0, false)                              86.761 88.128    0.98
dft::Size_MatType_FlagsType_NzeroRows::(2048x2048, 32FC2, 0, true)                               75.528 76.725    0.98
dft::Size_MatType_FlagsType_NzeroRows::(2048x2048, 32FC2, DFT_COMPLEX_OUTPUT, false)             86.750 88.223    0.98
dft::Size_MatType_FlagsType_NzeroRows::(2048x2048, 32FC2, DFT_COMPLEX_OUTPUT, true)              75.830 76.809    0.99
dft::Size_MatType_FlagsType_NzeroRows::(2048x2048, 32FC2, DFT_INVERSE, false)                    91.728 92.161    1.00
dft::Size_MatType_FlagsType_NzeroRows::(2048x2048, 32FC2, DFT_INVERSE, true)                     78.797 79.876    0.99
dft::Size_MatType_FlagsType_NzeroRows::(2048x2048, 32FC2, DFT_INVERSE|DFT_COMPLEX_OUTPUT, false) 92.163 92.177    1.00
dft::Size_MatType_FlagsType_NzeroRows::(2048x2048, 32FC2, DFT_INVERSE|DFT_COMPLEX_OUTPUT, true)  78.957 79.863    0.99
dft::Size_MatType_FlagsType_NzeroRows::(2048x2048, 32FC2, DFT_ROWS, false)                       24.781 25.576    0.97
dft::Size_MatType_FlagsType_NzeroRows::(2048x2048, 32FC2, DFT_ROWS, true)                        13.226 13.695    0.97
dft::Size_MatType_FlagsType_NzeroRows::(2048x2048, 32FC2, DFT_SCALE, false)                      87.990 89.324    0.99
dft::Size_MatType_FlagsType_NzeroRows::(2048x2048, 32FC2, DFT_SCALE, true)                       76.732 77.869    0.99

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

  • [ X] I agree to contribute to the project under Apache 2 License.
  • [ X] To the best of my knowledge, the proposed patch is not based on a code under GPL or other license that is incompatible with OpenCV
  • [ X] The PR is proposed to proper branch
  • There is reference to original bug report and related work
  • There is accuracy test, performance test and test data in opencv_extra repository, if applicable
    Patch to opencv_extra has the same branch name.
  • [ X] The feature is well documented and sample code can be built with the project CMake

@legrosbuffle legrosbuffle force-pushed the 3.4-vectorize-dft-radix branch from acfa28a to c420c72 Compare August 21, 2020 07:40
@legrosbuffle
Copy link
Copy Markdown
Contributor Author

I've removed the constexpr since this is supposed to go in 3.4.

From the coding style:

The code should be written C++ 11. When the patch is prepared for OpenCV 2.4.x or 3.4.x, use C++ 98.

This is useful for non power-of-two sizes when WITH_IPP is not an option.

This shows consistent improvement over openCV benchmarks, and we measure
even larger improvements on our internal workloads.

For example, for 320x480, `32FC*`, we can see a ~5% improvement}, as
`320=2^6*5` and `480=2^5*3*5`, so the improved radix3 version is used.
`64FC*` is flat as expected, as we do not specialize the functors for `double`
in this change.

```
dft::Size_MatType_FlagsType_NzeroRows::(320x480, 32FC1, 0, false)                                1.239  1.153     1.07
dft::Size_MatType_FlagsType_NzeroRows::(320x480, 32FC1, 0, true)                                 0.991  0.926     1.07
dft::Size_MatType_FlagsType_NzeroRows::(320x480, 32FC1, DFT_COMPLEX_OUTPUT, false)               1.367  1.281     1.07
dft::Size_MatType_FlagsType_NzeroRows::(320x480, 32FC1, DFT_COMPLEX_OUTPUT, true)                1.114  1.049     1.06
dft::Size_MatType_FlagsType_NzeroRows::(320x480, 32FC1, DFT_INVERSE, false)                      1.313  1.254     1.05
dft::Size_MatType_FlagsType_NzeroRows::(320x480, 32FC1, DFT_INVERSE, true)                       1.027  0.977     1.05
dft::Size_MatType_FlagsType_NzeroRows::(320x480, 32FC1, DFT_INVERSE|DFT_COMPLEX_OUTPUT, false)   1.296  1.217     1.06
dft::Size_MatType_FlagsType_NzeroRows::(320x480, 32FC1, DFT_INVERSE|DFT_COMPLEX_OUTPUT, true)    1.039  0.963     1.08
dft::Size_MatType_FlagsType_NzeroRows::(320x480, 32FC1, DFT_ROWS, false)                         0.542  0.524     1.04
dft::Size_MatType_FlagsType_NzeroRows::(320x480, 32FC1, DFT_ROWS, true)                          0.293  0.277     1.06
dft::Size_MatType_FlagsType_NzeroRows::(320x480, 32FC1, DFT_SCALE, false)                        1.265  1.175     1.08
dft::Size_MatType_FlagsType_NzeroRows::(320x480, 32FC1, DFT_SCALE, true)                         1.004  0.942     1.07
dft::Size_MatType_FlagsType_NzeroRows::(320x480, 64FC1, 0, false)                                1.292  1.280     1.01
dft::Size_MatType_FlagsType_NzeroRows::(320x480, 64FC1, 0, true)                                 1.038  1.030     1.01
dft::Size_MatType_FlagsType_NzeroRows::(320x480, 64FC1, DFT_COMPLEX_OUTPUT, false)               1.484  1.488     1.00
dft::Size_MatType_FlagsType_NzeroRows::(320x480, 64FC1, DFT_COMPLEX_OUTPUT, true)                1.222  1.224     1.00
dft::Size_MatType_FlagsType_NzeroRows::(320x480, 64FC1, DFT_INVERSE, false)                      1.380  1.355     1.02
dft::Size_MatType_FlagsType_NzeroRows::(320x480, 64FC1, DFT_INVERSE, true)                       1.117  1.133     0.99
dft::Size_MatType_FlagsType_NzeroRows::(320x480, 64FC1, DFT_INVERSE|DFT_COMPLEX_OUTPUT, false)   1.372  1.383     0.99
dft::Size_MatType_FlagsType_NzeroRows::(320x480, 64FC1, DFT_INVERSE|DFT_COMPLEX_OUTPUT, true)    1.117  1.127     0.99
dft::Size_MatType_FlagsType_NzeroRows::(320x480, 64FC1, DFT_ROWS, false)                         0.546  0.539     1.01
dft::Size_MatType_FlagsType_NzeroRows::(320x480, 64FC1, DFT_ROWS, true)                          0.293  0.299     0.98
dft::Size_MatType_FlagsType_NzeroRows::(320x480, 64FC1, DFT_SCALE, false)                        1.351  1.339     1.01
dft::Size_MatType_FlagsType_NzeroRows::(320x480, 64FC1, DFT_SCALE, true)                         1.099  1.092     1.01
dft::Size_MatType_FlagsType_NzeroRows::(320x480, 32FC2, 0, false)                                2.235  2.123     1.05
dft::Size_MatType_FlagsType_NzeroRows::(320x480, 32FC2, 0, true)                                 1.843  1.727     1.07
dft::Size_MatType_FlagsType_NzeroRows::(320x480, 32FC2, DFT_COMPLEX_OUTPUT, false)               2.189  2.109     1.04
dft::Size_MatType_FlagsType_NzeroRows::(320x480, 32FC2, DFT_COMPLEX_OUTPUT, true)                1.827  1.754     1.04
dft::Size_MatType_FlagsType_NzeroRows::(320x480, 32FC2, DFT_INVERSE, false)                      2.392  2.309     1.04
dft::Size_MatType_FlagsType_NzeroRows::(320x480, 32FC2, DFT_INVERSE, true)                       1.951  1.865     1.05
dft::Size_MatType_FlagsType_NzeroRows::(320x480, 32FC2, DFT_INVERSE|DFT_COMPLEX_OUTPUT, false)   2.391  2.293     1.04
dft::Size_MatType_FlagsType_NzeroRows::(320x480, 32FC2, DFT_INVERSE|DFT_COMPLEX_OUTPUT, true)    1.954  1.882     1.04
dft::Size_MatType_FlagsType_NzeroRows::(320x480, 32FC2, DFT_ROWS, false)                         0.811  0.815     0.99
dft::Size_MatType_FlagsType_NzeroRows::(320x480, 32FC2, DFT_ROWS, true)                          0.426  0.437     0.98
dft::Size_MatType_FlagsType_NzeroRows::(320x480, 32FC2, DFT_SCALE, false)                        2.268  2.152     1.05
dft::Size_MatType_FlagsType_NzeroRows::(320x480, 32FC2, DFT_SCALE, true)                         1.893  1.788     1.06
dft::Size_MatType_FlagsType_NzeroRows::(800x600, 32FC1, 0, false)                                4.546  4.395     1.03
dft::Size_MatType_FlagsType_NzeroRows::(800x600, 32FC1, 0, true)                                 3.616  3.426     1.06
dft::Size_MatType_FlagsType_NzeroRows::(800x600, 32FC1, DFT_COMPLEX_OUTPUT, false)               4.843  4.668     1.04
dft::Size_MatType_FlagsType_NzeroRows::(800x600, 32FC1, DFT_COMPLEX_OUTPUT, true)                3.825  3.748     1.02
dft::Size_MatType_FlagsType_NzeroRows::(800x600, 32FC1, DFT_INVERSE, false)                      4.720  4.525     1.04
dft::Size_MatType_FlagsType_NzeroRows::(800x600, 32FC1, DFT_INVERSE, true)                       3.743  3.601     1.04
dft::Size_MatType_FlagsType_NzeroRows::(800x600, 32FC1, DFT_INVERSE|DFT_COMPLEX_OUTPUT, false)   4.755  4.527     1.05
dft::Size_MatType_FlagsType_NzeroRows::(800x600, 32FC1, DFT_INVERSE|DFT_COMPLEX_OUTPUT, true)    3.744  3.586     1.04
dft::Size_MatType_FlagsType_NzeroRows::(800x600, 32FC1, DFT_ROWS, false)                         1.992  2.012     0.99
dft::Size_MatType_FlagsType_NzeroRows::(800x600, 32FC1, DFT_ROWS, true)                          1.048  1.048     1.00
dft::Size_MatType_FlagsType_NzeroRows::(800x600, 32FC1, DFT_SCALE, false)                        4.625  4.451     1.04
dft::Size_MatType_FlagsType_NzeroRows::(800x600, 32FC1, DFT_SCALE, true)                         3.643  3.491     1.04
dft::Size_MatType_FlagsType_NzeroRows::(800x600, 64FC1, 0, false)                                4.499  4.488     1.00
dft::Size_MatType_FlagsType_NzeroRows::(800x600, 64FC1, 0, true)                                 3.559  3.555     1.00
dft::Size_MatType_FlagsType_NzeroRows::(800x600, 64FC1, DFT_COMPLEX_OUTPUT, false)               5.155  5.165     1.00
dft::Size_MatType_FlagsType_NzeroRows::(800x600, 64FC1, DFT_COMPLEX_OUTPUT, true)                4.103  4.101     1.00
dft::Size_MatType_FlagsType_NzeroRows::(800x600, 64FC1, DFT_INVERSE, false)                      5.484  5.474     1.00
dft::Size_MatType_FlagsType_NzeroRows::(800x600, 64FC1, DFT_INVERSE, true)                       4.617  4.518     1.02
dft::Size_MatType_FlagsType_NzeroRows::(800x600, 64FC1, DFT_INVERSE|DFT_COMPLEX_OUTPUT, false)   5.547  5.509     1.01
dft::Size_MatType_FlagsType_NzeroRows::(800x600, 64FC1, DFT_INVERSE|DFT_COMPLEX_OUTPUT, true)    4.553  4.554     1.00
dft::Size_MatType_FlagsType_NzeroRows::(800x600, 64FC1, DFT_ROWS, false)                         2.067  2.018     1.02
dft::Size_MatType_FlagsType_NzeroRows::(800x600, 64FC1, DFT_ROWS, true)                          1.104  1.079     1.02
dft::Size_MatType_FlagsType_NzeroRows::(800x600, 64FC1, DFT_SCALE, false)                        4.665  4.619     1.01
dft::Size_MatType_FlagsType_NzeroRows::(800x600, 64FC1, DFT_SCALE, true)                         3.698  3.681     1.00
dft::Size_MatType_FlagsType_NzeroRows::(800x600, 32FC2, 0, false)                                8.774  8.275     1.06
dft::Size_MatType_FlagsType_NzeroRows::(800x600, 32FC2, 0, true)                                 6.975  6.527     1.07
dft::Size_MatType_FlagsType_NzeroRows::(800x600, 32FC2, DFT_COMPLEX_OUTPUT, false)               8.720  8.270     1.05
dft::Size_MatType_FlagsType_NzeroRows::(800x600, 32FC2, DFT_COMPLEX_OUTPUT, true)                6.928  6.532     1.06
dft::Size_MatType_FlagsType_NzeroRows::(800x600, 32FC2, DFT_INVERSE, false)                      9.272  8.862     1.05
dft::Size_MatType_FlagsType_NzeroRows::(800x600, 32FC2, DFT_INVERSE, true)                       7.323  6.946     1.05
dft::Size_MatType_FlagsType_NzeroRows::(800x600, 32FC2, DFT_INVERSE|DFT_COMPLEX_OUTPUT, false)   9.262  8.768     1.06
dft::Size_MatType_FlagsType_NzeroRows::(800x600, 32FC2, DFT_INVERSE|DFT_COMPLEX_OUTPUT, true)    7.298  6.871     1.06
dft::Size_MatType_FlagsType_NzeroRows::(800x600, 32FC2, DFT_ROWS, false)                         3.766  3.639     1.03
dft::Size_MatType_FlagsType_NzeroRows::(800x600, 32FC2, DFT_ROWS, true)                          1.932  1.889     1.02
dft::Size_MatType_FlagsType_NzeroRows::(800x600, 32FC2, DFT_SCALE, false)                        8.865  8.417     1.05
dft::Size_MatType_FlagsType_NzeroRows::(800x600, 32FC2, DFT_SCALE, true)                         7.067  6.643     1.06
dft::Size_MatType_FlagsType_NzeroRows::(1280x1024, 32FC1, 0, false)                              10.014 10.141    0.99
dft::Size_MatType_FlagsType_NzeroRows::(1280x1024, 32FC1, 0, true)                               7.600  7.632     1.00
dft::Size_MatType_FlagsType_NzeroRows::(1280x1024, 32FC1, DFT_COMPLEX_OUTPUT, false)             11.059 11.283    0.98
dft::Size_MatType_FlagsType_NzeroRows::(1280x1024, 32FC1, DFT_COMPLEX_OUTPUT, true)              8.475  8.552     0.99
dft::Size_MatType_FlagsType_NzeroRows::(1280x1024, 32FC1, DFT_INVERSE, false)                    12.678 12.789    0.99
dft::Size_MatType_FlagsType_NzeroRows::(1280x1024, 32FC1, DFT_INVERSE, true)                     10.445 10.359    1.01
dft::Size_MatType_FlagsType_NzeroRows::(1280x1024, 32FC1, DFT_INVERSE|DFT_COMPLEX_OUTPUT, false) 12.626 12.925    0.98
dft::Size_MatType_FlagsType_NzeroRows::(1280x1024, 32FC1, DFT_INVERSE|DFT_COMPLEX_OUTPUT, true)  10.538 10.553    1.00
dft::Size_MatType_FlagsType_NzeroRows::(1280x1024, 32FC1, DFT_ROWS, false)                       5.041  5.084     0.99
dft::Size_MatType_FlagsType_NzeroRows::(1280x1024, 32FC1, DFT_ROWS, true)                        2.595  2.607     1.00
dft::Size_MatType_FlagsType_NzeroRows::(1280x1024, 32FC1, DFT_SCALE, false)                      10.231 10.330    0.99
dft::Size_MatType_FlagsType_NzeroRows::(1280x1024, 32FC1, DFT_SCALE, true)                       7.786  7.815     1.00
dft::Size_MatType_FlagsType_NzeroRows::(1280x1024, 64FC1, 0, false)                              13.597 13.302    1.02
dft::Size_MatType_FlagsType_NzeroRows::(1280x1024, 64FC1, 0, true)                               10.377 10.207    1.02
dft::Size_MatType_FlagsType_NzeroRows::(1280x1024, 64FC1, DFT_COMPLEX_OUTPUT, false)             15.940 15.545    1.03
dft::Size_MatType_FlagsType_NzeroRows::(1280x1024, 64FC1, DFT_COMPLEX_OUTPUT, true)              12.299 12.230    1.01
dft::Size_MatType_FlagsType_NzeroRows::(1280x1024, 64FC1, DFT_INVERSE, false)                    15.270 15.181    1.01
dft::Size_MatType_FlagsType_NzeroRows::(1280x1024, 64FC1, DFT_INVERSE, true)                     12.757 12.339    1.03
dft::Size_MatType_FlagsType_NzeroRows::(1280x1024, 64FC1, DFT_INVERSE|DFT_COMPLEX_OUTPUT, false) 15.512 15.157    1.02
dft::Size_MatType_FlagsType_NzeroRows::(1280x1024, 64FC1, DFT_INVERSE|DFT_COMPLEX_OUTPUT, true)  12.505 12.635    0.99
dft::Size_MatType_FlagsType_NzeroRows::(1280x1024, 64FC1, DFT_ROWS, false)                       6.359  6.255     1.02
dft::Size_MatType_FlagsType_NzeroRows::(1280x1024, 64FC1, DFT_ROWS, true)                        3.314  3.248     1.02
dft::Size_MatType_FlagsType_NzeroRows::(1280x1024, 64FC1, DFT_SCALE, false)                      13.937 13.733    1.01
dft::Size_MatType_FlagsType_NzeroRows::(1280x1024, 64FC1, DFT_SCALE, true)                       10.782 10.495    1.03
dft::Size_MatType_FlagsType_NzeroRows::(1280x1024, 32FC2, 0, false)                              18.985 18.926    1.00
dft::Size_MatType_FlagsType_NzeroRows::(1280x1024, 32FC2, 0, true)                               14.256 14.509    0.98
dft::Size_MatType_FlagsType_NzeroRows::(1280x1024, 32FC2, DFT_COMPLEX_OUTPUT, false)             18.696 19.021    0.98
dft::Size_MatType_FlagsType_NzeroRows::(1280x1024, 32FC2, DFT_COMPLEX_OUTPUT, true)              14.290 14.429    0.99
dft::Size_MatType_FlagsType_NzeroRows::(1280x1024, 32FC2, DFT_INVERSE, false)                    20.135 20.296    0.99
dft::Size_MatType_FlagsType_NzeroRows::(1280x1024, 32FC2, DFT_INVERSE, true)                     15.390 15.512    0.99
dft::Size_MatType_FlagsType_NzeroRows::(1280x1024, 32FC2, DFT_INVERSE|DFT_COMPLEX_OUTPUT, false) 20.121 20.354    0.99
dft::Size_MatType_FlagsType_NzeroRows::(1280x1024, 32FC2, DFT_INVERSE|DFT_COMPLEX_OUTPUT, true)  15.341 15.605    0.98
dft::Size_MatType_FlagsType_NzeroRows::(1280x1024, 32FC2, DFT_ROWS, false)                       8.932  9.084     0.98
dft::Size_MatType_FlagsType_NzeroRows::(1280x1024, 32FC2, DFT_ROWS, true)                        4.539  4.649     0.98
dft::Size_MatType_FlagsType_NzeroRows::(1280x1024, 32FC2, DFT_SCALE, false)                      19.137 19.303    0.99
dft::Size_MatType_FlagsType_NzeroRows::(1280x1024, 32FC2, DFT_SCALE, true)                       14.565 14.808    0.98
dft::Size_MatType_FlagsType_NzeroRows::(1920x1080, 32FC1, 0, false)                              22.553 21.171    1.07
dft::Size_MatType_FlagsType_NzeroRows::(1920x1080, 32FC1, 0, true)                               17.850 16.390    1.09
dft::Size_MatType_FlagsType_NzeroRows::(1920x1080, 32FC1, DFT_COMPLEX_OUTPUT, false)             24.062 22.634    1.06
dft::Size_MatType_FlagsType_NzeroRows::(1920x1080, 32FC1, DFT_COMPLEX_OUTPUT, true)              19.342 17.932    1.08
dft::Size_MatType_FlagsType_NzeroRows::(1920x1080, 32FC1, DFT_INVERSE, false)                    28.609 27.326    1.05
dft::Size_MatType_FlagsType_NzeroRows::(1920x1080, 32FC1, DFT_INVERSE, true)                     24.591 23.289    1.06
dft::Size_MatType_FlagsType_NzeroRows::(1920x1080, 32FC1, DFT_INVERSE|DFT_COMPLEX_OUTPUT, false) 28.667 27.467    1.04
dft::Size_MatType_FlagsType_NzeroRows::(1920x1080, 32FC1, DFT_INVERSE|DFT_COMPLEX_OUTPUT, true)  24.671 23.309    1.06
dft::Size_MatType_FlagsType_NzeroRows::(1920x1080, 32FC1, DFT_ROWS, false)                       9.458  9.077     1.04
dft::Size_MatType_FlagsType_NzeroRows::(1920x1080, 32FC1, DFT_ROWS, true)                        4.709  4.566     1.03
dft::Size_MatType_FlagsType_NzeroRows::(1920x1080, 32FC1, DFT_SCALE, false)                      22.791 21.583    1.06
dft::Size_MatType_FlagsType_NzeroRows::(1920x1080, 32FC1, DFT_SCALE, true)                       18.029 16.691    1.08
dft::Size_MatType_FlagsType_NzeroRows::(1920x1080, 64FC1, 0, false)                              25.238 24.427    1.03
dft::Size_MatType_FlagsType_NzeroRows::(1920x1080, 64FC1, 0, true)                               19.636 19.270    1.02
dft::Size_MatType_FlagsType_NzeroRows::(1920x1080, 64FC1, DFT_COMPLEX_OUTPUT, false)             28.342 27.957    1.01
dft::Size_MatType_FlagsType_NzeroRows::(1920x1080, 64FC1, DFT_COMPLEX_OUTPUT, true)              22.413 22.477    1.00
dft::Size_MatType_FlagsType_NzeroRows::(1920x1080, 64FC1, DFT_INVERSE, false)                    26.465 26.085    1.01
dft::Size_MatType_FlagsType_NzeroRows::(1920x1080, 64FC1, DFT_INVERSE, true)                     21.972 21.704    1.01
dft::Size_MatType_FlagsType_NzeroRows::(1920x1080, 64FC1, DFT_INVERSE|DFT_COMPLEX_OUTPUT, false) 26.497 26.127    1.01
dft::Size_MatType_FlagsType_NzeroRows::(1920x1080, 64FC1, DFT_INVERSE|DFT_COMPLEX_OUTPUT, true)  22.010 21.523    1.02
dft::Size_MatType_FlagsType_NzeroRows::(1920x1080, 64FC1, DFT_ROWS, false)                       11.188 10.774    1.04
dft::Size_MatType_FlagsType_NzeroRows::(1920x1080, 64FC1, DFT_ROWS, true)                        6.094  5.916     1.03
dft::Size_MatType_FlagsType_NzeroRows::(1920x1080, 64FC1, DFT_SCALE, false)                      25.728 24.934    1.03
dft::Size_MatType_FlagsType_NzeroRows::(1920x1080, 64FC1, DFT_SCALE, true)                       20.077 19.653    1.02
dft::Size_MatType_FlagsType_NzeroRows::(1920x1080, 32FC2, 0, false)                              43.834 40.726    1.08
dft::Size_MatType_FlagsType_NzeroRows::(1920x1080, 32FC2, 0, true)                               35.198 32.218    1.09
dft::Size_MatType_FlagsType_NzeroRows::(1920x1080, 32FC2, DFT_COMPLEX_OUTPUT, false)             43.743 40.897    1.07
dft::Size_MatType_FlagsType_NzeroRows::(1920x1080, 32FC2, DFT_COMPLEX_OUTPUT, true)              35.240 32.226    1.09
dft::Size_MatType_FlagsType_NzeroRows::(1920x1080, 32FC2, DFT_INVERSE, false)                    46.022 42.612    1.08
dft::Size_MatType_FlagsType_NzeroRows::(1920x1080, 32FC2, DFT_INVERSE, true)                     36.779 33.961    1.08
dft::Size_MatType_FlagsType_NzeroRows::(1920x1080, 32FC2, DFT_INVERSE|DFT_COMPLEX_OUTPUT, false) 46.396 42.723    1.09
dft::Size_MatType_FlagsType_NzeroRows::(1920x1080, 32FC2, DFT_INVERSE|DFT_COMPLEX_OUTPUT, true)  37.025 33.874    1.09
dft::Size_MatType_FlagsType_NzeroRows::(1920x1080, 32FC2, DFT_ROWS, false)                       17.334 16.832    1.03
dft::Size_MatType_FlagsType_NzeroRows::(1920x1080, 32FC2, DFT_ROWS, true)                        9.212  8.970     1.03
dft::Size_MatType_FlagsType_NzeroRows::(1920x1080, 32FC2, DFT_SCALE, false)                      44.190 41.211    1.07
dft::Size_MatType_FlagsType_NzeroRows::(1920x1080, 32FC2, DFT_SCALE, true)                       35.900 32.888    1.09
dft::Size_MatType_FlagsType_NzeroRows::(2048x2048, 32FC1, 0, false)                              40.948 38.256    1.07
dft::Size_MatType_FlagsType_NzeroRows::(2048x2048, 32FC1, 0, true)                               33.825 30.759    1.10
dft::Size_MatType_FlagsType_NzeroRows::(2048x2048, 32FC1, DFT_COMPLEX_OUTPUT, false)             53.210 53.584    0.99
dft::Size_MatType_FlagsType_NzeroRows::(2048x2048, 32FC1, DFT_COMPLEX_OUTPUT, true)              46.356 46.712    0.99
dft::Size_MatType_FlagsType_NzeroRows::(2048x2048, 32FC1, DFT_INVERSE, false)                    47.471 47.213    1.01
dft::Size_MatType_FlagsType_NzeroRows::(2048x2048, 32FC1, DFT_INVERSE, true)                     40.491 41.363    0.98
dft::Size_MatType_FlagsType_NzeroRows::(2048x2048, 32FC1, DFT_INVERSE|DFT_COMPLEX_OUTPUT, false) 46.724 47.049    0.99
dft::Size_MatType_FlagsType_NzeroRows::(2048x2048, 32FC1, DFT_INVERSE|DFT_COMPLEX_OUTPUT, true)  40.834 41.381    0.99
dft::Size_MatType_FlagsType_NzeroRows::(2048x2048, 32FC1, DFT_ROWS, false)                       14.508 14.490    1.00
dft::Size_MatType_FlagsType_NzeroRows::(2048x2048, 32FC1, DFT_ROWS, true)                        7.832  7.828     1.00
dft::Size_MatType_FlagsType_NzeroRows::(2048x2048, 32FC1, DFT_SCALE, false)                      41.491 38.341    1.08
dft::Size_MatType_FlagsType_NzeroRows::(2048x2048, 32FC1, DFT_SCALE, true)                       34.587 31.208    1.11
dft::Size_MatType_FlagsType_NzeroRows::(2048x2048, 64FC1, 0, false)                              65.155 63.173    1.03
dft::Size_MatType_FlagsType_NzeroRows::(2048x2048, 64FC1, 0, true)                               56.091 54.752    1.02
dft::Size_MatType_FlagsType_NzeroRows::(2048x2048, 64FC1, DFT_COMPLEX_OUTPUT, false)             71.549 70.626    1.01
dft::Size_MatType_FlagsType_NzeroRows::(2048x2048, 64FC1, DFT_COMPLEX_OUTPUT, true)              62.319 61.437    1.01
dft::Size_MatType_FlagsType_NzeroRows::(2048x2048, 64FC1, DFT_INVERSE, false)                    61.480 59.540    1.03
dft::Size_MatType_FlagsType_NzeroRows::(2048x2048, 64FC1, DFT_INVERSE, true)                     54.047 52.650    1.03
dft::Size_MatType_FlagsType_NzeroRows::(2048x2048, 64FC1, DFT_INVERSE|DFT_COMPLEX_OUTPUT, false) 61.752 61.366    1.01
dft::Size_MatType_FlagsType_NzeroRows::(2048x2048, 64FC1, DFT_INVERSE|DFT_COMPLEX_OUTPUT, true)  54.400 53.665    1.01
dft::Size_MatType_FlagsType_NzeroRows::(2048x2048, 64FC1, DFT_ROWS, false)                       20.219 19.704    1.03
dft::Size_MatType_FlagsType_NzeroRows::(2048x2048, 64FC1, DFT_ROWS, true)                        11.145 10.868    1.03
dft::Size_MatType_FlagsType_NzeroRows::(2048x2048, 64FC1, DFT_SCALE, false)                      66.220 64.525    1.03
dft::Size_MatType_FlagsType_NzeroRows::(2048x2048, 64FC1, DFT_SCALE, true)                       57.389 56.114    1.02
dft::Size_MatType_FlagsType_NzeroRows::(2048x2048, 32FC2, 0, false)                              86.761 88.128    0.98
dft::Size_MatType_FlagsType_NzeroRows::(2048x2048, 32FC2, 0, true)                               75.528 76.725    0.98
dft::Size_MatType_FlagsType_NzeroRows::(2048x2048, 32FC2, DFT_COMPLEX_OUTPUT, false)             86.750 88.223    0.98
dft::Size_MatType_FlagsType_NzeroRows::(2048x2048, 32FC2, DFT_COMPLEX_OUTPUT, true)              75.830 76.809    0.99
dft::Size_MatType_FlagsType_NzeroRows::(2048x2048, 32FC2, DFT_INVERSE, false)                    91.728 92.161    1.00
dft::Size_MatType_FlagsType_NzeroRows::(2048x2048, 32FC2, DFT_INVERSE, true)                     78.797 79.876    0.99
dft::Size_MatType_FlagsType_NzeroRows::(2048x2048, 32FC2, DFT_INVERSE|DFT_COMPLEX_OUTPUT, false) 92.163 92.177    1.00
dft::Size_MatType_FlagsType_NzeroRows::(2048x2048, 32FC2, DFT_INVERSE|DFT_COMPLEX_OUTPUT, true)  78.957 79.863    0.99
dft::Size_MatType_FlagsType_NzeroRows::(2048x2048, 32FC2, DFT_ROWS, false)                       24.781 25.576    0.97
dft::Size_MatType_FlagsType_NzeroRows::(2048x2048, 32FC2, DFT_ROWS, true)                        13.226 13.695    0.97
dft::Size_MatType_FlagsType_NzeroRows::(2048x2048, 32FC2, DFT_SCALE, false)                      87.990 89.324    0.99
dft::Size_MatType_FlagsType_NzeroRows::(2048x2048, 32FC2, DFT_SCALE, true)                       76.732 77.869    0.99
```
@legrosbuffle legrosbuffle force-pushed the 3.4-vectorize-dft-radix branch from c420c72 to da555a2 Compare August 21, 2020 12:07
Copy link
Copy Markdown
Member

@alalek alalek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for contribution!

// vector register.
inline __m128 complexMul(const Complex<float>* const a, const Complex<float>* const b) {
const __m128 z = _mm_setzero_ps();
const __m128 neg_elem0 = _mm_set_ps(0.0f,0.0f,0.0f,-0.0f);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New code with native SSE/AVX intrinsics is not accepted anymore without a strong reason.
Please try to use approach with "Universal SIMD Intrinsics" instead (which covers multiple platforms like ARM NEON, Power VSX, etc).
Some words with motivation: https://github.com/opencv/opencv/wiki/OE-27.-Wide-Universal-Intrinsics

BTW, Modern compilers are smart enough with optimization or auto-vectorization of floating-point code.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested with disabled added native SSE code. Regression is about 7-9%.
Also there is DFT_VecR4 implementation with native SSE code which is not added by this patch.

So just removal of native intrinsics is not a good option.
Unfortunately there is no resources to port on OpenCV SIMD UI too, so lets merge this "as is".

@vpisarev
Copy link
Copy Markdown
Contributor

@legrosbuffle, thank you for the contribution! Can you please convert your code to OpenCV's universal intrinsics?

@legrosbuffle
Copy link
Copy Markdown
Contributor Author

Hi Alexander, Vadim, Sorry I missed then previous message. I'll have a loot at it ASAP.

@opencv-pushbot opencv-pushbot merged commit 716450c into opencv:3.4 Oct 30, 2020
@alalek alalek mentioned this pull request Nov 5, 2020
@alalek alalek mentioned this pull request Nov 27, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants