Impl hal_rvv LUT | Add more LUT test by GenshinImpactStarts · Pull Request #26941 · opencv/opencv

GenshinImpactStarts · 2025-02-19T10:42:05Z

Implement through the existing cv_hal_lut interfaces.

Add more LUT accuracy and performance tests:

Accuracy test: Multi-channel table tests are added, and the boundary of randu used for generating test data is broadened to make the test more robust.
Performance test: Multi-channel input and multi-channel table tests are added.

Perf test done on

MUSE-PI (vlen=256)
Compiler: gcc 14.2 (riscv-collab/riscv-gnu-toolchain Nightly: December 16, 2024)

$ opencv_test_core --gtest_filter="Core_LUT*"
$ opencv_perf_core --gtest_filter="SizePrm_LUT*" --perf_min_samples=300 --perf_force_samples=300

Geometric mean (ms)

         Name of Test          scalar   ui    rvv       ui        rvv    
                                                        vs         vs    
                                                      scalar     scalar  
                                                    (x-factor) (x-factor)
LUT::SizePrm::320x240          0.248  0.249  0.052     1.00       4.74   
LUT::SizePrm::640x480          0.277  0.275  0.085     1.01       3.28   
LUT::SizePrm::1920x1080        0.950  0.947  0.634     1.00       1.50   
LUT_multi2::SizePrm::320x240   2.051  2.045  2.049     1.00       1.00   
LUT_multi2::SizePrm::640x480   2.128  2.134  2.125     1.00       1.00   
LUT_multi2::SizePrm::1920x1080 7.397  7.380  7.390     1.00       1.00   
LUT_multi::SizePrm::320x240    0.715  0.747  0.154     0.96       4.64   
LUT_multi::SizePrm::640x480    0.741  0.766  0.257     0.97       2.88   
LUT_multi::SizePrm::1920x1080  2.766  2.765  1.925     1.00       1.44

This optimization is achieved by loading the entire lookup table into vector registers. Due to register size limitations, the optimization is only effective under the following conditions:

For the U8C1 table type, the optimization works when vlen >= 256
For U16C1, it works when vlen >= 512
For U32C1, it works when vlen >= 1024

Since I don’t have real hardware with vlen > 256, the corresponding accuracy tests were conducted on QEMU built from the riscv-collab/riscv-gnu-toolchain.

This patch does not implement optimizations for multi-channel tables.

Previous attempts:

For the U8C1 table type, when vlen = 128, it is possible to use four u8m4 vectors to load the entire table, perform gathering, and merge the results. However, the performance is almost the same as the scalar version.
Loading part of the table and repeatedly loading the source data is faster for small sizes. But as the table size grows, the performance quickly degrades compared to the scalar version.
Using vluxei8 as a general solution does not show any performance improvement.

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

I agree to contribute to the project under Apache 2 License.
To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
The PR is proposed to the proper branch
There is a reference to the original bug report and related work
There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
The feature is well documented and sample code can be built with the project CMake

GenshinImpactStarts · 2025-02-19T14:16:33Z

The CI failed due to the SANITY_CHECK. I'm unsure whether I should use SANITY_CHECK_NOTHING, as the original LUT performance test uses SANITY_CHECK, and these newly added tests are the same set of tests.

mshabunin

We need to decide what to do with parallel HAL functions.

UPD: after discussion we decided to allow cv::parallel_for for now until more appropriate solution is available.

asmorkalov · 2025-03-01T11:52:08Z

My performance results for Muse Pi v30 (gcc 14.2):

LUT::SizePrm::320x240 	0.251 	0.053 	4.71
LUT::SizePrm::640x480 	0.287 	0.088 	3.26
LUT::SizePrm::1920x1080 	0.946 	0.646 	1.46
LUT_multi2::SizePrm::320x240 	2.063 	2.056 	1.00
LUT_multi2::SizePrm::640x480 	2.141 	2.145 	1.00
LUT_multi2::SizePrm::1920x1080 	7.457 	7.397 	1.01
LUT_multi::SizePrm::320x240 	0.750 	0.154 	4.86
LUT_multi::SizePrm::640x480 	0.775 	0.255 	3.04
LUT_multi::SizePrm::1920x1080 	2.691 	1.959 	1.37

fengyuentau · 2025-03-03T12:33:43Z

K1 (clang spacemit-v1.0.4)

LUT::SizePrm::320x240            0.290   0.054    5.42
LUT::SizePrm::640x480            0.309   0.093    3.30
LUT::SizePrm::1920x1080          1.053   0.643    1.64
LUT_multi2::SizePrm::320x240     1.948   1.954    1.00
LUT_multi2::SizePrm::640x480     2.032   2.056    0.99
LUT_multi2::SizePrm::1920x1080   7.110   7.195    0.99
LUT_multi::SizePrm::320x240      0.860   0.156    5.50
LUT_multi::SizePrm::640x480      0.892   0.263    3.39
LUT_multi::SizePrm::1920x1080    3.075   1.909    1.61

K1 vs RK3568:

LUT::SizePrm::320x240           0.249   0.054    4.66
LUT::SizePrm::640x480           0.266   0.093    2.85
LUT::SizePrm::1920x1080         1.814   0.643    2.82
LUT_multi2::SizePrm::320x240    1.258   1.954    0.64
LUT_multi2::SizePrm::640x480    1.309   2.056    0.64
LUT_multi2::SizePrm::1920x1080  8.991   7.195    1.25
LUT_multi::SizePrm::320x240     0.754   0.156    4.82
LUT_multi::SizePrm::640x480     0.794   0.263    3.01
LUT_multi::SizePrm::1920x1080   5.397   1.909    2.83

LUT_multi2 does not have speedup. Is this expected?

GenshinImpactStarts · 2025-03-03T17:22:32Z

Yes, LUT_multi2 is a new test where both the input and the table are multi-channel, and in this case, I don't apply any optimization.

fengyuentau · 2025-03-04T06:00:25Z

Yes, LUT_multi2 is a new test where both the input and the table are multi-channel, and in this case, I don't apply any optimization.

Since this patch does not bring speedup for multi-channel lut case, I wonder whether the test is needed.

GenshinImpactStarts · 2025-03-04T07:27:15Z

I'm not sure, because it's actually also part of the LUT test.

modules/core/perf/perf_lut.cpp

Co-authored-by: Liutong HAN <liutong2020@iscas.ac.cn>

asmorkalov added optimization platform: riscv labels Feb 20, 2025

asmorkalov added this to the 4.12.0 milestone Feb 20, 2025

GenshinImpactStarts force-pushed the lut_hal_rvv branch from 5a81d8d to 52545f7 Compare February 26, 2025 18:40

mshabunin added the RFC label Feb 27, 2025

mshabunin reviewed Feb 27, 2025

View reviewed changes

mshabunin mentioned this pull request Feb 27, 2025

HAL parallel_for proposal #26989

Draft

asmorkalov assigned asmorkalov and mshabunin and unassigned asmorkalov Mar 1, 2025

mshabunin approved these changes Mar 2, 2025

View reviewed changes

mshabunin removed the RFC label Mar 2, 2025

fengyuentau reviewed Mar 4, 2025

View reviewed changes

modules/core/perf/perf_lut.cpp Outdated Show resolved Hide resolved

GenshinImpactStarts and others added 3 commits March 6, 2025 09:00

impl hal_rvv lut | add more lut test

0a76abd

Co-authored-by: Liutong HAN <liutong2020@iscas.ac.cn>

remove sanity check of new added perf test

de19ef7

Co-authored-by: Liutong HAN <liutong2020@iscas.ac.cn>

Code review fix.

cd4f4d7

asmorkalov force-pushed the lut_hal_rvv branch from 52545f7 to cd4f4d7 Compare March 6, 2025 06:01

fengyuentau approved these changes Mar 6, 2025

View reviewed changes

asmorkalov merged commit 57a78cb into opencv:4.x Mar 6, 2025
24 of 28 checks passed

asmorkalov mentioned this pull request Mar 11, 2025

5.x merge 4.x #27045

Merged

GenshinImpactStarts deleted the lut_hal_rvv branch March 12, 2025 07:32

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Impl hal_rvv LUT | Add more LUT test#26941

Impl hal_rvv LUT | Add more LUT test#26941
asmorkalov merged 3 commits intoopencv:4.xfrom
GenshinImpactStarts:lut_hal_rvv

GenshinImpactStarts commented Feb 19, 2025

Uh oh!

GenshinImpactStarts commented Feb 19, 2025

Uh oh!

mshabunin left a comment •

edited

Loading

Uh oh!

asmorkalov commented Mar 1, 2025

Uh oh!

fengyuentau commented Mar 3, 2025 •

edited

Loading

Uh oh!

GenshinImpactStarts commented Mar 3, 2025

Uh oh!

fengyuentau commented Mar 4, 2025

Uh oh!

GenshinImpactStarts commented Mar 4, 2025

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Uh oh!

Conversation

GenshinImpactStarts commented Feb 19, 2025

Pull Request Readiness Checklist

Uh oh!

GenshinImpactStarts commented Feb 19, 2025

Uh oh!

mshabunin left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

asmorkalov commented Mar 1, 2025

Uh oh!

fengyuentau commented Mar 3, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

GenshinImpactStarts commented Mar 3, 2025

Uh oh!

fengyuentau commented Mar 4, 2025

Uh oh!

GenshinImpactStarts commented Mar 4, 2025

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

mshabunin left a comment •

edited

Loading

fengyuentau commented Mar 3, 2025 •

edited

Loading