Skip to content

Rewrite Universal Intrinsic code by using new API: ImgProc module.#24058

Merged
asmorkalov merged 7 commits intoopencv:4.xfrom
hanliutong:rewrite-imgporc
Sep 14, 2023
Merged

Rewrite Universal Intrinsic code by using new API: ImgProc module.#24058
asmorkalov merged 7 commits intoopencv:4.xfrom
hanliutong:rewrite-imgporc

Conversation

@hanliutong
Copy link
Copy Markdown
Contributor

The goal of this series of PRs is to modify the SIMD code blocks guarded by CV_SIMD macro in the opencv/modules/imgproc folder: rewrite them by using the new Universal Intrinsic API.

For easier review, this PR includes a part of the rewritten code, and another part will be brought in the next PR (coming soon). I tested this patch on RVV (QEMU) and AVX devices, opencv_test_imgproc is passed.

The patch is partially auto-generated by using the rewriter, related PR #23885 and #23980.

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

  • I agree to contribute to the project under Apache 2 License.
  • To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
  • The PR is proposed to the proper branch
  • There is a reference to the original bug report and related work
  • There is accuracy test, performance test and test data in opencv_extra repository, if applicable
    Patch to opencv_extra has the same branch name.
  • The feature is well documented and sample code can be built with the project CMake

Copy link
Copy Markdown
Contributor

@mshabunin mshabunin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good to me. I have some minor concerns regarding CV_SIMD_WIDTH usage and whether some blocks have not been enabled (intentionally or not).

@@ -3017,16 +3017,16 @@ void accW_simd_(const float* src, double* dst, const uchar* mask, int len, int c
#elif CV_SIMD_64F
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CV_SIMD_SCALABLE_64F?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, Fixed.

@@ -3075,8 +3075,8 @@ void accW_simd_(const double* src, double* dst, const uchar* mask, int len, int
#elif CV_SIMD_64F
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CV_SIMD_SCALABLE_64F?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, Fixed.

@@ -1261,21 +1261,16 @@ struct VResizeLinearVec_32s8u
v_int16 b0 = vx_setall_s16(beta[0]), b1 = vx_setall_s16(beta[1]);

if( (((size_t)S0|(size_t)S1)&(CV_SIMD_WIDTH - 1)) == 0 )
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't CV_SIMD_WIDTH be replaced with VTraits<v_uint8>::vlanes? Or what is the meaning of this condition? Here and below.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we should use VTraits<v_uint8>::vlanes() here and below. Fixed.

And I personally think that the meaning of this condition is to judge whether S0 and S1 (as addresses) are aligned with CV_SIMD_WIDTH (CV_SIMD_WIDTH-1 acts as a mask here). If so, we can use vx_load_aligned, otherwise use vx_load. BTW, the implementation in the two load functions RVV is the same.

{
v_uint16 masklow = vx_setall_u16(0x00ff);
for ( ; dx <= w - v_uint16::nlanes; dx += v_uint16::nlanes, S0 += v_uint8::nlanes, S1 += v_uint8::nlanes, D += v_uint16::nlanes)
for ( ; dx <= w - VTraits<v_uint16>::vlanes(); dx += VTraits<v_uint16>::vlanes(), S0 += VTraits<v_uint8>::vlanes(), S1 += VTraits<v_uint8>::vlanes(), D += VTraits<v_uint16>::vlanes())
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this block has not been enabled for CV_SIMD_SCALABLE. Line 2433. Or is it intended because of fixed CV_SIMD_WIDTH checks below?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I originally rewrote these codes, but later found that these codes cannot be used in the variable length backend due to the CV_SIMD_WIDTH checks. Should I revert this part?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, let's keep modifications, eventually this part might be updated to be compatible with scalable intrinsics.

@mshabunin
Copy link
Copy Markdown
Contributor

mshabunin commented Aug 8, 2023

@asmorkalov , attached is my performance comparison report (x86 and aarch64). No obvious problems except some minor fluctuations.
report_imgproc.zip

@asmorkalov
Copy link
Copy Markdown
Contributor

@hanliutong I merged the first part for core and see conflicts. Could you rebase and fix them?

@asmorkalov asmorkalov self-requested a review August 11, 2023 16:08
Comment on lines +240 to +242
#else
swap(b, r);

#endif
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same for swap

*((unaligned_int*) (dst + x)) = v_reinterpret_as_s32(v_rshr_pack<8>(v_pack_u(t0, t0), v_setzero_u16())).get0();
}
#else
for (; x <= width - 1; x += 1)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect, that the branch with plain loop should be outside of #else to handle CV_SIMD128 tails.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CV_SIMD128 code block does not seem to encounter the tails, because the data is 128-bit aligned, as follows:

  1. In the CV_SIMD128 code block, for row0~row4, each loop loads and processes 128-bit data.
  2. The row0~row4 are src[0]~src[4] respectively, coming from the parameter int** src
  3. For the call to the PyrDownVecV function (only one), WT* rows[PD_SZ] is passed in as argument.
  4. The size of rows[0]~rows[4] is determined by buf and bufstep, and they are 16byte aligned, that is, 128-bit.
int bufstep = (int)alignSize(dsize.width*cn, 16); //line 928
WT* buf = alignPtr((WT*)_buf.data(), 16); //line 930

If the above analysis is correct, then I think the plain loop should only work on non-CV_SIMD128 cases, that is, in #else

BTW, this part of the code has been working like this for many years, which may indirectly means that the original implementation (128-fixed related code) does not need to consider the tail.

@asmorkalov asmorkalov added this to the 4.9.0 milestone Sep 14, 2023
@hanliutong
Copy link
Copy Markdown
Contributor Author

It looks like there are some problems with CI PR:4.x / Windows10-x64 / BuildAndTest (pull_request) and PR:4.x / macOS-x64 / BuildAndTest (pull_request) :

Run actions/upload-artifact@v3
With the provided path, there will be 1 file uploaded
Starting artifact upload
For more detailed logs during the artifact upload process, enable step-debugging: https://docs.github.com/actions/monitoring-and-troubleshooting-workflows/enabling-debug-logging#enabling-step-debug-logging
Artifact name is valid!
Create Artifact Container - Attempt 1 of 5 failed with error: read ECONNRESET
Create Artifact Container - Attempt 2 of 5 failed with error: read ECONNRESET
Create Artifact Container - Attempt 3 of 5 failed with error: read ECONNRESET
Create Artifact Container - Attempt 4 of 5 failed with error: read ECONNRESET
Create Artifact Container - Attempt 5 of 5 failed with error: read ECONNRESET
Error: Create Artifact Container failed: read ECONNRESET

@asmorkalov asmorkalov merged commit 5e91915 into opencv:4.x Sep 14, 2023
@komakai
Copy link
Copy Markdown
Contributor

komakai commented Sep 17, 2023

@hanliutong I have some refactoring to corner.cpp I'd like to contribute at 4.x...komakai:opencv:w20230830
but unfortunately there is some overlap with changes you have made here and potentially other changes that you mention you plan to contribute soon.
Could you let me know when you are done with your changes to corner.cpp, corner.hpp, corner.avx.cpp etc. so I can avoid having to repeatly rebase my changes? Thanks

@hanliutong
Copy link
Copy Markdown
Contributor Author

Hello @komakai , our work is a modification of Universal Intrinsic, which only involves the code blocks guarded by the CV_SIMD and CV_SIMD_64F macros. The corner.cpp file has been modified and has been merged into 4.x by this PR. And there are no code blocks in corner.hpp and corner.avx.cpp that need to be modified. So you can rebase 4.x now

@komakai
Copy link
Copy Markdown
Contributor

komakai commented Sep 17, 2023

@hanliutong thanks for your reply - I will go ahead and rebase now. There are few things in your changes I didn't understand very well so I may need your help later when I submit my PR

asmorkalov pushed a commit that referenced this pull request Sep 19, 2023
Rewrite Universal Intrinsic code by using new API: ImgProc module Part 2 #24132

The goal of this series of PRs is to modify the SIMD code blocks guarded by CV_SIMD macro in the opencv/modules/imgproc folder: rewrite them by using the new Universal Intrinsic API.

This is the second part of the modification to the Imgproc module ( Part 1: #24058 ), And I tested this patch on RVV (QEMU) and AVX devices, `opencv_test_imgproc` is passed.

The patch is partially auto-generated by using the [rewriter](https://github.com/hanliutong/rewriter).

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [ ] I agree to contribute to the project under Apache 2 License.
- [ ] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
asmorkalov pushed a commit that referenced this pull request Sep 19, 2023
Rewrite Universal Intrinsic code: ImgProc (CV_SIMD_WIDTH related Part) #24166

Related PR: #24058, #24132. The goal of this series of PRs is to modify the SIMD code blocks in the opencv/modules/imgproc folder by using the new Universal Intrinsic API.

The modification of this PR mainly focuses on the code that uses the `CV_SIMD_WIDTH` macro. This macro is sometimes used for loop tail processing, such as `box_filter.simd.hpp` and `morph.simd.hpp`.

```cpp
#if CV_SIMD
int i = 0;
for (i < n - v_uint16::nlanes; i += v_uint16::nlanes) {
// some universal intrinsic code
// e.g. v_uint16...
}
#if CV_SIMD_WIDTH > 16
for (i < n - v_uint16x8::nlanes; i += v_uint16x8::nlanes) {
// handle loop tail by 128 bit SIMD
// e.g. v_uint16x8
}
#endif //CV_SIMD_WIDTH 
#endif// CV_SIMD
```
The main contradiction is that the variable-length Universal Intrinsic backend cannot use 128bit fixed-length data structures. Therefore, this PR uses the scalar loop to handle the loop tail.

This PR is marked as draft because the modification of the `box_filter.simd.hpp` file caused a compilation error. The cause of the error is initially believed to be due to an internal error in the GCC compiler.

```bash
box_filter.simd.hpp:1162:5: internal compiler error: Segmentation fault
 1162 |     }
      |     ^
0xe03883 crash_signal
        /wafer/share/gcc/gcc/toplev.cc:314
0x7ff261c4251f ???
        ./signal/../sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c:0
0x6bde48 hash_set<rtl_ssa::set_info*, false, default_hash_traits<rtl_ssa::set_info*> >::iterator::operator*()
        /wafer/share/gcc/gcc/hash-set.h:125
0x6bde48 extract_single_source
        /wafer/share/gcc/gcc/config/riscv/riscv-vsetvl.cc:1184
0x6bde48 extract_single_source
        /wafer/share/gcc/gcc/config/riscv/riscv-vsetvl.cc:1174
0x119ad9e pass_vsetvl::propagate_avl() const
        /wafer/share/gcc/gcc/config/riscv/riscv-vsetvl.cc:4087
0x119ceaf pass_vsetvl::execute(function*)
        /wafer/share/gcc/gcc/config/riscv/riscv-vsetvl.cc:4344
0x119ceaf pass_vsetvl::execute(function*)
        /wafer/share/gcc/gcc/config/riscv/riscv-vsetvl.cc:4325
Please submit a full bug report, with preprocessed source (by using -freport-bug).
Please include the complete backtrace with any bug report.
```

This PR can be compiled with Clang 16, and `opencv_test_imgproc` is passed on QEMU.

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [ ] I agree to contribute to the project under Apache 2 License.
- [ ] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
@asmorkalov asmorkalov mentioned this pull request Sep 28, 2023
asmorkalov pushed a commit that referenced this pull request Oct 5, 2023
Rewrite Universal Intrinsic code: float related part #24325

The goal of this series of PRs is to modify the SIMD code blocks guarded by CV_SIMD macro: rewrite them by using the new Universal Intrinsic API.

The series of PRs is listed below:
#23885 First patch, an example
#23980 Core module
#24058 ImgProc module, part 1
#24132 ImgProc module, part 2
#24166 ImgProc module, part 3
#24301 Features2d and calib3d module
#24324 Gapi module

This patch (hopefully) is the last one in the series. 

This patch mainly involves 3 parts
1. Add some modifications related to float (CV_SIMD_64F)
2. Use `#if (CV_SIMD || CV_SIMD_SCALABLE)` instead of `#if CV_SIMD || CV_SIMD_SCALABLE`, 
    then we can get the `CV_SIMD` module that is not enabled for `CV_SIMD_SCALABLE` by looking for `if CV_SIMD`
3. Summary of `CV_SIMD` blocks that remains unmodified: Updated comments
    - Some blocks will cause test fail when enable for RVV, marked as `TODO: enable for CV_SIMD_SCALABLE, ....`
    - Some blocks can not be rewrited directly. (Not commented in the source code, just listed here)
      - ./modules/core/src/mathfuncs_core.simd.hpp (Vector type wrapped in class/struct)
      - ./modules/imgproc/src/color_lab.cpp (Array of vector type)
      - ./modules/imgproc/src/color_rgb.simd.hpp (Array of vector type)
      - ./modules/imgproc/src/sumpixels.simd.hpp (fixed length algorithm, strongly ralated with `CV_SIMD_WIDTH`)
      These algorithms will need to be redesigned to accommodate scalable backends.

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [ ] I agree to contribute to the project under Apache 2 License.
- [ ] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
hanliutong added a commit to hanliutong/opencv that referenced this pull request Oct 7, 2023
Rewrite Universal Intrinsic code: float related part opencv#24325

The goal of this series of PRs is to modify the SIMD code blocks guarded by CV_SIMD macro: rewrite them by using the new Universal Intrinsic API.

The series of PRs is listed below:
opencv#23885 First patch, an example
opencv#23980 Core module
opencv#24058 ImgProc module, part 1
opencv#24132 ImgProc module, part 2
opencv#24166 ImgProc module, part 3
opencv#24301 Features2d and calib3d module
opencv#24324 Gapi module

This patch (hopefully) is the last one in the series. 

This patch mainly involves 3 parts
1. Add some modifications related to float (CV_SIMD_64F)
2. Use `#if (CV_SIMD || CV_SIMD_SCALABLE)` instead of `#if CV_SIMD || CV_SIMD_SCALABLE`, 
    then we can get the `CV_SIMD` module that is not enabled for `CV_SIMD_SCALABLE` by looking for `if CV_SIMD`
3. Summary of `CV_SIMD` blocks that remains unmodified: Updated comments
    - Some blocks will cause test fail when enable for RVV, marked as `TODO: enable for CV_SIMD_SCALABLE, ....`
    - Some blocks can not be rewrited directly. (Not commented in the source code, just listed here)
      - ./modules/core/src/mathfuncs_core.simd.hpp (Vector type wrapped in class/struct)
      - ./modules/imgproc/src/color_lab.cpp (Array of vector type)
      - ./modules/imgproc/src/color_rgb.simd.hpp (Array of vector type)
      - ./modules/imgproc/src/sumpixels.simd.hpp (fixed length algorithm, strongly ralated with `CV_SIMD_WIDTH`)
      These algorithms will need to be redesigned to accommodate scalable backends.

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [ ] I agree to contribute to the project under Apache 2 License.
- [ ] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
thewoz pushed a commit to thewoz/opencv that referenced this pull request Jan 4, 2024
Rewrite Universal Intrinsic code by using new API: ImgProc module. opencv#24058

The goal of this series of PRs is to modify the SIMD code blocks guarded by CV_SIMD macro in the `opencv/modules/imgproc` folder: rewrite them by using the new Universal Intrinsic API. 

For easier review, this PR includes a part of the rewritten code, and another part will be brought in the next PR (coming soon). I tested this patch on RVV (QEMU) and AVX devices, `opencv_test_imgproc` is passed.

The patch is partially auto-generated by using the [rewriter](https://github.com/hanliutong/rewriter), related PR opencv#23885 and opencv#23980.



### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [ ] I agree to contribute to the project under Apache 2 License.
- [ ] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
thewoz pushed a commit to thewoz/opencv that referenced this pull request Jan 4, 2024
Rewrite Universal Intrinsic code by using new API: ImgProc module Part 2 opencv#24132

The goal of this series of PRs is to modify the SIMD code blocks guarded by CV_SIMD macro in the opencv/modules/imgproc folder: rewrite them by using the new Universal Intrinsic API.

This is the second part of the modification to the Imgproc module ( Part 1: opencv#24058 ), And I tested this patch on RVV (QEMU) and AVX devices, `opencv_test_imgproc` is passed.

The patch is partially auto-generated by using the [rewriter](https://github.com/hanliutong/rewriter).

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [ ] I agree to contribute to the project under Apache 2 License.
- [ ] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
thewoz pushed a commit to thewoz/opencv that referenced this pull request Jan 4, 2024
Rewrite Universal Intrinsic code: ImgProc (CV_SIMD_WIDTH related Part) opencv#24166

Related PR: opencv#24058, opencv#24132. The goal of this series of PRs is to modify the SIMD code blocks in the opencv/modules/imgproc folder by using the new Universal Intrinsic API.

The modification of this PR mainly focuses on the code that uses the `CV_SIMD_WIDTH` macro. This macro is sometimes used for loop tail processing, such as `box_filter.simd.hpp` and `morph.simd.hpp`.

```cpp
#if CV_SIMD
int i = 0;
for (i < n - v_uint16::nlanes; i += v_uint16::nlanes) {
// some universal intrinsic code
// e.g. v_uint16...
}
#if CV_SIMD_WIDTH > 16
for (i < n - v_uint16x8::nlanes; i += v_uint16x8::nlanes) {
// handle loop tail by 128 bit SIMD
// e.g. v_uint16x8
}
#endif //CV_SIMD_WIDTH 
#endif// CV_SIMD
```
The main contradiction is that the variable-length Universal Intrinsic backend cannot use 128bit fixed-length data structures. Therefore, this PR uses the scalar loop to handle the loop tail.

This PR is marked as draft because the modification of the `box_filter.simd.hpp` file caused a compilation error. The cause of the error is initially believed to be due to an internal error in the GCC compiler.

```bash
box_filter.simd.hpp:1162:5: internal compiler error: Segmentation fault
 1162 |     }
      |     ^
0xe03883 crash_signal
        /wafer/share/gcc/gcc/toplev.cc:314
0x7ff261c4251f ???
        ./signal/../sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c:0
0x6bde48 hash_set<rtl_ssa::set_info*, false, default_hash_traits<rtl_ssa::set_info*> >::iterator::operator*()
        /wafer/share/gcc/gcc/hash-set.h:125
0x6bde48 extract_single_source
        /wafer/share/gcc/gcc/config/riscv/riscv-vsetvl.cc:1184
0x6bde48 extract_single_source
        /wafer/share/gcc/gcc/config/riscv/riscv-vsetvl.cc:1174
0x119ad9e pass_vsetvl::propagate_avl() const
        /wafer/share/gcc/gcc/config/riscv/riscv-vsetvl.cc:4087
0x119ceaf pass_vsetvl::execute(function*)
        /wafer/share/gcc/gcc/config/riscv/riscv-vsetvl.cc:4344
0x119ceaf pass_vsetvl::execute(function*)
        /wafer/share/gcc/gcc/config/riscv/riscv-vsetvl.cc:4325
Please submit a full bug report, with preprocessed source (by using -freport-bug).
Please include the complete backtrace with any bug report.
```

This PR can be compiled with Clang 16, and `opencv_test_imgproc` is passed on QEMU.

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [ ] I agree to contribute to the project under Apache 2 License.
- [ ] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
thewoz pushed a commit to thewoz/opencv that referenced this pull request Jan 4, 2024
Rewrite Universal Intrinsic code: float related part opencv#24325

The goal of this series of PRs is to modify the SIMD code blocks guarded by CV_SIMD macro: rewrite them by using the new Universal Intrinsic API.

The series of PRs is listed below:
opencv#23885 First patch, an example
opencv#23980 Core module
opencv#24058 ImgProc module, part 1
opencv#24132 ImgProc module, part 2
opencv#24166 ImgProc module, part 3
opencv#24301 Features2d and calib3d module
opencv#24324 Gapi module

This patch (hopefully) is the last one in the series. 

This patch mainly involves 3 parts
1. Add some modifications related to float (CV_SIMD_64F)
2. Use `#if (CV_SIMD || CV_SIMD_SCALABLE)` instead of `#if CV_SIMD || CV_SIMD_SCALABLE`, 
    then we can get the `CV_SIMD` module that is not enabled for `CV_SIMD_SCALABLE` by looking for `if CV_SIMD`
3. Summary of `CV_SIMD` blocks that remains unmodified: Updated comments
    - Some blocks will cause test fail when enable for RVV, marked as `TODO: enable for CV_SIMD_SCALABLE, ....`
    - Some blocks can not be rewrited directly. (Not commented in the source code, just listed here)
      - ./modules/core/src/mathfuncs_core.simd.hpp (Vector type wrapped in class/struct)
      - ./modules/imgproc/src/color_lab.cpp (Array of vector type)
      - ./modules/imgproc/src/color_rgb.simd.hpp (Array of vector type)
      - ./modules/imgproc/src/sumpixels.simd.hpp (fixed length algorithm, strongly ralated with `CV_SIMD_WIDTH`)
      These algorithms will need to be redesigned to accommodate scalable backends.

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [ ] I agree to contribute to the project under Apache 2 License.
- [ ] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
thewoz pushed a commit to thewoz/opencv that referenced this pull request May 29, 2024
Rewrite Universal Intrinsic code by using new API: ImgProc module. opencv#24058

The goal of this series of PRs is to modify the SIMD code blocks guarded by CV_SIMD macro in the `opencv/modules/imgproc` folder: rewrite them by using the new Universal Intrinsic API. 

For easier review, this PR includes a part of the rewritten code, and another part will be brought in the next PR (coming soon). I tested this patch on RVV (QEMU) and AVX devices, `opencv_test_imgproc` is passed.

The patch is partially auto-generated by using the [rewriter](https://github.com/hanliutong/rewriter), related PR opencv#23885 and opencv#23980.



### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [ ] I agree to contribute to the project under Apache 2 License.
- [ ] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
thewoz pushed a commit to thewoz/opencv that referenced this pull request May 29, 2024
Rewrite Universal Intrinsic code by using new API: ImgProc module Part 2 opencv#24132

The goal of this series of PRs is to modify the SIMD code blocks guarded by CV_SIMD macro in the opencv/modules/imgproc folder: rewrite them by using the new Universal Intrinsic API.

This is the second part of the modification to the Imgproc module ( Part 1: opencv#24058 ), And I tested this patch on RVV (QEMU) and AVX devices, `opencv_test_imgproc` is passed.

The patch is partially auto-generated by using the [rewriter](https://github.com/hanliutong/rewriter).

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [ ] I agree to contribute to the project under Apache 2 License.
- [ ] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
thewoz pushed a commit to thewoz/opencv that referenced this pull request May 29, 2024
Rewrite Universal Intrinsic code: ImgProc (CV_SIMD_WIDTH related Part) opencv#24166

Related PR: opencv#24058, opencv#24132. The goal of this series of PRs is to modify the SIMD code blocks in the opencv/modules/imgproc folder by using the new Universal Intrinsic API.

The modification of this PR mainly focuses on the code that uses the `CV_SIMD_WIDTH` macro. This macro is sometimes used for loop tail processing, such as `box_filter.simd.hpp` and `morph.simd.hpp`.

```cpp
#if CV_SIMD
int i = 0;
for (i < n - v_uint16::nlanes; i += v_uint16::nlanes) {
// some universal intrinsic code
// e.g. v_uint16...
}
#if CV_SIMD_WIDTH > 16
for (i < n - v_uint16x8::nlanes; i += v_uint16x8::nlanes) {
// handle loop tail by 128 bit SIMD
// e.g. v_uint16x8
}
#endif //CV_SIMD_WIDTH 
#endif// CV_SIMD
```
The main contradiction is that the variable-length Universal Intrinsic backend cannot use 128bit fixed-length data structures. Therefore, this PR uses the scalar loop to handle the loop tail.

This PR is marked as draft because the modification of the `box_filter.simd.hpp` file caused a compilation error. The cause of the error is initially believed to be due to an internal error in the GCC compiler.

```bash
box_filter.simd.hpp:1162:5: internal compiler error: Segmentation fault
 1162 |     }
      |     ^
0xe03883 crash_signal
        /wafer/share/gcc/gcc/toplev.cc:314
0x7ff261c4251f ???
        ./signal/../sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c:0
0x6bde48 hash_set<rtl_ssa::set_info*, false, default_hash_traits<rtl_ssa::set_info*> >::iterator::operator*()
        /wafer/share/gcc/gcc/hash-set.h:125
0x6bde48 extract_single_source
        /wafer/share/gcc/gcc/config/riscv/riscv-vsetvl.cc:1184
0x6bde48 extract_single_source
        /wafer/share/gcc/gcc/config/riscv/riscv-vsetvl.cc:1174
0x119ad9e pass_vsetvl::propagate_avl() const
        /wafer/share/gcc/gcc/config/riscv/riscv-vsetvl.cc:4087
0x119ceaf pass_vsetvl::execute(function*)
        /wafer/share/gcc/gcc/config/riscv/riscv-vsetvl.cc:4344
0x119ceaf pass_vsetvl::execute(function*)
        /wafer/share/gcc/gcc/config/riscv/riscv-vsetvl.cc:4325
Please submit a full bug report, with preprocessed source (by using -freport-bug).
Please include the complete backtrace with any bug report.
```

This PR can be compiled with Clang 16, and `opencv_test_imgproc` is passed on QEMU.

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [ ] I agree to contribute to the project under Apache 2 License.
- [ ] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
thewoz pushed a commit to thewoz/opencv that referenced this pull request May 29, 2024
Rewrite Universal Intrinsic code: float related part opencv#24325

The goal of this series of PRs is to modify the SIMD code blocks guarded by CV_SIMD macro: rewrite them by using the new Universal Intrinsic API.

The series of PRs is listed below:
opencv#23885 First patch, an example
opencv#23980 Core module
opencv#24058 ImgProc module, part 1
opencv#24132 ImgProc module, part 2
opencv#24166 ImgProc module, part 3
opencv#24301 Features2d and calib3d module
opencv#24324 Gapi module

This patch (hopefully) is the last one in the series. 

This patch mainly involves 3 parts
1. Add some modifications related to float (CV_SIMD_64F)
2. Use `#if (CV_SIMD || CV_SIMD_SCALABLE)` instead of `#if CV_SIMD || CV_SIMD_SCALABLE`, 
    then we can get the `CV_SIMD` module that is not enabled for `CV_SIMD_SCALABLE` by looking for `if CV_SIMD`
3. Summary of `CV_SIMD` blocks that remains unmodified: Updated comments
    - Some blocks will cause test fail when enable for RVV, marked as `TODO: enable for CV_SIMD_SCALABLE, ....`
    - Some blocks can not be rewrited directly. (Not commented in the source code, just listed here)
      - ./modules/core/src/mathfuncs_core.simd.hpp (Vector type wrapped in class/struct)
      - ./modules/imgproc/src/color_lab.cpp (Array of vector type)
      - ./modules/imgproc/src/color_rgb.simd.hpp (Array of vector type)
      - ./modules/imgproc/src/sumpixels.simd.hpp (fixed length algorithm, strongly ralated with `CV_SIMD_WIDTH`)
      These algorithms will need to be redesigned to accommodate scalable backends.

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [ ] I agree to contribute to the project under Apache 2 License.
- [ ] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [ ] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants