Further optimize DNN for RISC-V Vector. #21086
Merged
alalek merged 5 commits into opencv:4.x from Dec 10, 2021
asmorkalov approved these changes on Dec 10, 2021
a-sajjad72 pushed a commit to a-sajjad72/opencv that referenced this pull request on Mar 30, 2023:

Further optimize DNN for RISC-V Vector.
* Optimize DNN on RVV by using vsetvl.
* Rename vl.
* Update fastConv by using setvl instead of mask.
* Fix fastDepthwiseConv
This patch further optimizes DNN for RVV, building on my GSoC work. The previous version is #20521.
There are 3 changes in this patch.

1. Use `vsetvl` instead of a branch to handle the vector tail (the last few elements of each row, which cannot fill an entire vector register). I wrote an example on Godbolt comparing `vsetvl` with `if`; it shows that using `vsetvl` eliminates the conditional jumps and introduces just one instruction (`sub`).

2. Unify the names of the variables related to `vl`. Previously each function used its own naming, which was unfriendly to readers, so I renamed the `vl`-related variables according to one rule. Now, in all 4 functions, the following variables are all used as the `vl` parameter of intrinsics, but different names have different meanings:
   - `vlm<LMUL>`: the maximum value that `vl` can be set to for a given LMUL. It is a constant.
   - `vl`: the number of elements processed in each inner loop iteration; in the final iteration it is used to process the tail.
   - `unroll_tail`: the number of elements processed in each outer loop iteration; it is also used to process a tail in the final iteration, but this tail is caused by loop unrolling.

   CHANGE 1 also introduces a new variable, `avl`, which holds the number of elements not yet processed and is passed as the argument of `vsetvl`.

3. Update the way `fastConv` handles the matrix tail (the last few rows of the matrix, usually caused by loop unrolling; the `vl` for the matrix tail is called `unroll_tail` in CHANGE 2). In the previous version I used both `vl` and a mask for the matrix tail to handle the different block sizes (here is the discussion at the time). However, masking is usually costly, and I found a way to handle the matrix tail with `vl` alone. With that, no mask and no additional branch is needed.
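For readers without an RVV toolchain, the `vsetvl`-based tail handling described above can be sketched in portable C++. This is only a scalar model, not the code in this patch: `model_vsetvl`, `model_vfmacc`, and `dot_strip_mined` are hypothetical names invented for illustration, and the model assumes `vsetvl` returns `min(avl, VLMAX)`, whereas real hardware may choose a different split when `avl` lies between VLMAX and 2*VLMAX.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical scalar stand-in for vsetvl: how many elements the next
// "vector" operation may process. Assumption: vl = min(avl, VLMAX).
std::size_t model_vsetvl(std::size_t avl, std::size_t vlmax) {
    return std::min(avl, vlmax);
}

// Hypothetical stand-in for a vector multiply-accumulate over vl elements.
void model_vfmacc(float* acc, const float* a, const float* b, std::size_t vl) {
    for (std::size_t i = 0; i < vl; ++i) {
        acc[i] += a[i] * b[i];
    }
}

// Dot product of two equally sized rows. The tail is handled by the value
// returned from model_vsetvl, so the loop body has no "if (tail)" branch.
// Assumes vlm1 > 0 and a.size() == b.size().
float dot_strip_mined(const std::vector<float>& a,
                      const std::vector<float>& b,
                      std::size_t vlm1) {
    std::vector<float> acc(vlm1, 0.0f);  // models one accumulator register
    std::size_t avl = a.size();          // elements not yet processed
    std::size_t vl = 0;                  // elements processed this iteration
    for (std::size_t i = 0; i < a.size(); i += vl, avl -= vl) {
        vl = model_vsetvl(avl, vlm1);    // shrinks by itself on the last pass
        model_vfmacc(acc.data(), a.data() + i, b.data() + i, vl);
    }
    float sum = 0.0f;
    for (float v : acc) {
        sum += v;                        // final horizontal reduction
    }
    return sum;
}
```

In this sketch the only bookkeeping added by the tail handling is the `avl -= vl` update, which corresponds to the single `sub` mentioned in CHANGE 1; the names `vlm1`, `vl`, and `avl` follow the naming rule from CHANGE 2.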
I have tested this patch on QEMU; the minimal DNN test data set gives the same results with this patch as on the master branch.
Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
Patch to opencv_extra has the same branch name.