Implement ctc prefix beam search decode for TextRecognitionModel. by yichenj · Pull Request #20524 · opencv/opencv

yichenj · 2021-08-09T05:56:15Z

The algorithm is based on Hannun's paper First-Pass Large Vocabulary
Continuous Speech Recognition using Bi-Directional Recurrent DNNs.
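A minimal C++ sketch of the prefix beam search from that paper. It assumes the network emits per-frame softmax probabilities probs[t][c] and that token 0 is the CTC blank, which is consistent with the vocabulary.at(token - 1) lookup quoted in the review. The names ctcPrefixBeamSearch, PrefixScore and Prefix are hypothetical; this is not the code merged into OpenCV. Scores stay in linear probability space for readability, so long sequences may underflow; a log-domain variant is the usual choice in practice.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Probability mass of one prefix, split by how the underlying paths end.
struct PrefixScore {
    double blank = 0.0;     // paths whose last frame emitted the blank
    double nonBlank = 0.0;  // paths whose last frame emitted a real token
    double total() const { return blank + nonBlank; }
};

using Prefix = std::vector<int>;

// probs[t][c]: per-frame token probabilities; `blank` is assumed to be index 0.
Prefix ctcPrefixBeamSearch(const std::vector<std::vector<double>>& probs,
                           std::size_t beamSize, int blank = 0) {
    std::vector<std::pair<Prefix, PrefixScore>> beam;
    beam.emplace_back(Prefix{}, PrefixScore{1.0, 0.0});  // empty prefix, prob 1
    for (const auto& frame : probs) {
        std::map<Prefix, PrefixScore> next;
        for (const auto& [prefix, score] : beam) {
            for (int c = 0; c < static_cast<int>(frame.size()); ++c) {
                const double p = frame[c];
                if (c == blank) {
                    // Emitting blank leaves the prefix unchanged.
                    next[prefix].blank += score.total() * p;
                    continue;
                }
                Prefix extended = prefix;
                extended.push_back(c);
                if (!prefix.empty() && prefix.back() == c) {
                    // A repeat only starts a new symbol after a blank...
                    next[extended].nonBlank += score.blank * p;
                    // ...otherwise it collapses into the existing prefix.
                    next[prefix].nonBlank += score.nonBlank * p;
                } else {
                    next[extended].nonBlank += score.total() * p;
                }
            }
        }
        // Keep only the beamSize most probable prefixes.
        std::vector<std::pair<Prefix, PrefixScore>> candidates(next.begin(), next.end());
        std::sort(candidates.begin(), candidates.end(),
                  [](const auto& a, const auto& b) { return a.second.total() > b.second.total(); });
        if (candidates.size() > beamSize) candidates.resize(beamSize);
        beam = std::move(candidates);
    }
    return beam.empty() ? Prefix{} : beam.front().first;
}
```

Tracking the blank-ending and non-blank-ending mass separately is what lets the search tell "a a" (two symbols separated by a blank) apart from a single "a" held over several frames.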

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

I agree to contribute to the project under Apache 2 License.
To the best of my knowledge, the proposed patch is not based on a code under GPL or other license that is incompatible with OpenCV
The PR is proposed to proper branch
There is reference to original bug report and related work
There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
The feature is well documented and sample code can be built with the project CMake

alalek

Thank you for the contribution!

Please take a look at the comments below.

alalek · 2021-08-09T15:53:35Z

modules/dnn/src/model.cpp

+              beam = std::move(newBeam);
+          }
+
+          CV_Assert(beam.size() > 0);


beam.size() > 0

Consider using the empty() call for that: !beam.empty()

alalek · 2021-08-09T15:57:35Z

modules/dnn/src/model.cpp

+          CV_Assert(beam.size() > 0);
+          for (int token : beam[0].first)
+          {
+              decodeSeq += vocabulary.at(token - 1);


It makes sense to add a check to avoid out-of-range array access:

CV_Check(token, token > 0 && token <= vocabulary.size(), "")
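A standalone sketch of the same guard outside OpenCV, assuming token 0 is the blank and tokens 1..N map to vocabulary[0..N-1] as in the diff above. The function name tokensToText is hypothetical, and a standard exception stands in for CV_Check. Casting the token before comparing with vocabulary.size() also avoids a signed/unsigned comparison warning.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: maps 1-based token ids to vocabulary entries.
std::string tokensToText(const std::vector<int>& tokens,
                         const std::vector<std::string>& vocabulary) {
    std::string text;
    for (int token : tokens) {
        // Same condition as the suggested CV_Check, without mixing signed and unsigned.
        if (token <= 0 || static_cast<std::size_t>(token) > vocabulary.size())
            throw std::out_of_range("token " + std::to_string(token) + " is outside the vocabulary");
        text += vocabulary[token - 1];
    }
    return text;
}
```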

alalek · 2021-08-09T16:02:02Z

modules/dnn/include/opencv2/dnn/dnn.hpp

+     * only take top @p vocPrune tokens in each search step, @p vocPrune <= 0 stands for disable this prune.
+     */
+    CV_WRAP
+    TextRecognitionModel& setDecodeOpts(int beam, int vocPrune = 0);


Perhaps it makes sense to name this setDecodeOptsCTCPrefixBeamSearch to avoid confusion in the future.
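The vocPrune option documented above ("only take top vocPrune tokens in each search step, vocPrune <= 0 stands for disable this prune") can be sketched as a per-frame top-k selection. The helper prunedTokens is hypothetical and the use of std::partial_sort is an assumption, not necessarily how the PR implements it; a beam search would then iterate only over the returned indices instead of the full vocabulary.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: indices of the vocPrune most probable tokens in one frame.
// vocPrune <= 0 (or >= the frame size) disables pruning.
std::vector<int> prunedTokens(const std::vector<double>& frame, int vocPrune) {
    std::vector<int> idx(frame.size());
    std::iota(idx.begin(), idx.end(), 0);  // 0, 1, ..., size-1
    if (vocPrune <= 0 || static_cast<std::size_t>(vocPrune) >= idx.size())
        return idx;  // no pruning: every token is a candidate
    // Move the top-vocPrune indices to the front, sorted by descending probability.
    std::partial_sort(idx.begin(), idx.begin() + vocPrune, idx.end(),
                      [&frame](int a, int b) { return frame[a] > frame[b]; });
    idx.resize(vocPrune);
    return idx;
}
```

Pruning trades exactness for speed: with a large vocabulary most tokens carry negligible probability in a given frame, so skipping them rarely changes the best prefix.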

alalek · 2021-08-09T16:06:31Z

modules/dnn/include/opencv2/dnn/dnn.hpp

+     * {
+     *    'CTC-greedy': greedy decoding for the output of CTC-based methods
+     *    'CTC-prefix-beam-search': Prefix beam search decoding for the output of CTC-based methods
+     * }


The documentation doesn't look good.

It makes sense to move the possible values before the @param statement.
Consider using the "list" mode (-) instead of {}.

Fixed. The format follows cv::dnn::readNet().
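As an illustration of the "-" list style in a Doxygen comment, here is a hypothetical greedy decoder corresponding to the 'CTC-greedy' option listed above. The name ctcGreedyDecode and the blank-at-index-0 convention are assumptions for this sketch, not OpenCV's implementation.

```cpp
#include <cstddef>
#include <vector>

/**
 * @brief Greedy CTC decoding (sketch).
 *
 * Steps:
 * - take the most probable token in each frame;
 * - merge consecutive repeats;
 * - drop the blank token (assumed to be index 0).
 *
 * @param probs per-frame token probabilities, probs[t][c]
 * @return decoded token indices
 */
std::vector<int> ctcGreedyDecode(const std::vector<std::vector<double>>& probs) {
    const int blank = 0;
    std::vector<int> out;
    int prev = blank;
    for (const auto& frame : probs) {
        if (frame.empty()) continue;  // nothing to pick in an empty frame
        int best = 0;
        for (std::size_t c = 1; c < frame.size(); ++c)
            if (frame[c] > frame[best]) best = static_cast<int>(c);
        // Emit only non-blank tokens that differ from the previous frame's pick.
        if (best != blank && best != prev) out.push_back(best);
        prev = best;
    }
    return out;
}
```

Greedy decoding commits to one token per frame, whereas prefix beam search sums over all paths that collapse to the same prefix, which is why the two can disagree on ambiguous inputs.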

yichenj · 2021-08-11T06:15:36Z

It seems to me that there is some problem with the CI system.

alalek

Well done 👍

("OpenCV CN" builders are optional. They may fail due to periodic network issues)

yichenj · 2021-08-11T11:12:20Z

Thank you for taking the time to review this code ❤️

modules/dnn/src/model.cpp

The algorithm is based on Hannun's paper: First-Pass Large Vocabulary Continuous Speech Recognition using Bi-Directional Recurrent DNNs

asmorkalov added category: dnn feature labels Aug 9, 2021

alalek reviewed Aug 9, 2021

yichenj force-pushed the dnn_text_recognition_enhance branch 2 times, most recently from 5668dbc to 269e1de on August 11, 2021 02:43

alalek approved these changes Aug 11, 2021

yichenj force-pushed the dnn_text_recognition_enhance branch from 269e1de to c67bfb9 on August 12, 2021 10:50

yichenj commented Aug 12, 2021

modules/dnn/src/model.cpp

Implement ctc prefix beam search decode for TextRecognitionModel.

955cf35

The algorithm is based on Hannun's paper: First-Pass Large Vocabulary Continuous Speech Recognition using Bi-Directional Recurrent DNNs

yichenj force-pushed the dnn_text_recognition_enhance branch from c67bfb9 to 955cf35 on August 12, 2021 12:34

opencv-pushbot merged commit 05d733e into opencv:master Aug 15, 2021

alalek mentioned this pull request Sep 15, 2021

TextReconitionModel failed with CTC-prefix-beam-search #20704

Closed

alalek mentioned this pull request Oct 15, 2021

(5.x) Merge 4.x #20886

Merged
