Skip to content

Implement ctc prefix beam search decode for TextRecognitionModel.#20524

Merged
opencv-pushbot merged 1 commit intoopencv:masterfrom
yichenj:dnn_text_recognition_enhance
Aug 15, 2021
Merged

Implement ctc prefix beam search decode for TextRecognitionModel.#20524
opencv-pushbot merged 1 commit intoopencv:masterfrom
yichenj:dnn_text_recognition_enhance

Conversation

@yichenj
Copy link
Copy Markdown
Contributor

@yichenj yichenj commented Aug 9, 2021

The algorithm is based on Hannun's paper First-Pass Large Vocabulary
Continuous Speech Recognition using Bi-Directional Recurrent DNNs.

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

  • I agree to contribute to the project under Apache 2 License.
  • To the best of my knowledge, the proposed patch is not based on a code under GPL or other license that is incompatible with OpenCV
  • The PR is proposed to proper branch
  • There is reference to original bug report and related work
  • There is accuracy test, performance test and test data in opencv_extra repository, if applicable
    Patch to opencv_extra has the same branch name.
  • The feature is well documented and sample code can be built with the project CMake

Copy link
Copy Markdown
Member

@alalek alalek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the contribution!

Please take a look on the comments below.

beam = std::move(newBeam);
}

CV_Assert(beam.size() > 0);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

beam.size() > 0

Consider using empty call for that: !beam.empty()

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

CV_Assert(beam.size() > 0);
for (int token : beam[0].first)
{
decodeSeq += vocabulary.at(token - 1);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes sense to add check to avoid out of range array access:

CV_Check(token, token > 0 && token <= vocabulary.size(), "")

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

* only take top @p vocPrune tokens in each search step, @p vocPrune <= 0 stands for disable this prune.
*/
CV_WRAP
TextRecognitionModel& setDecodeOpts(int beam, int vocPrune = 0);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps, it makes sense to name this as setDecodeOptsCTCPrefixBeamSearch to avoid confusions in the future.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

* {
* 'CTC-greedy': greedy decoding for the output of CTC-based methods
* 'CTC-prefix-beam-search': Prefix beam search decoding for the output of CTC-based methods
* }
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Documentation doesn't look well.

It makes sense to move possible values before the @param statement.
Consider using the "list" mode through (-) without {}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Format refers to cv::dnn::readNet().

@yichenj yichenj force-pushed the dnn_text_recognition_enhance branch 2 times, most recently from 5668dbc to 269e1de Compare August 11, 2021 02:43
@yichenj
Copy link
Copy Markdown
Contributor Author

yichenj commented Aug 11, 2021

Seems to me that there is some problem in CI system

Copy link
Copy Markdown
Member

@alalek alalek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well done 👍

("OpenCV CN" builders are optional. They may fail due to periodic network issues)

@yichenj
Copy link
Copy Markdown
Contributor Author

yichenj commented Aug 11, 2021

Thank you for your time to review this code ❤️

@yichenj yichenj force-pushed the dnn_text_recognition_enhance branch from 269e1de to c67bfb9 Compare August 12, 2021 10:50
The algorithm is based on Hannun's paper: First-Pass Large Vocabulary
Continuous Speech Recognition using Bi-Directional Recurrent DNNs
@yichenj yichenj force-pushed the dnn_text_recognition_enhance branch from c67bfb9 to 955cf35 Compare August 12, 2021 12:34
@opencv-pushbot opencv-pushbot merged commit 05d733e into opencv:master Aug 15, 2021
@alalek alalek mentioned this pull request Oct 15, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants