Implement CTC prefix beam search decode for TextRecognitionModel (#20524)
opencv-pushbot merged 1 commit into opencv:master
alalek left a comment:
Thank you for the contribution!
Please take a look at the comments below.
modules/dnn/src/model.cpp (outdated)
        beam = std::move(newBeam);
    }

    CV_Assert(beam.size() > 0);
On beam.size() > 0: consider using the empty() call for that instead: !beam.empty()
    CV_Assert(beam.size() > 0);
    for (int token : beam[0].first)
    {
        decodeSeq += vocabulary.at(token - 1);
It makes sense to add a check to avoid out-of-range array access:
    CV_Check(token, token > 0 && token <= vocabulary.size(), "")
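To make both suggestions concrete (guarding against an empty beam and bounds-checking each token), here is a minimal standalone sketch. It uses plain C++ exceptions as stand-ins for CV_Assert/CV_Check so it compiles without OpenCV. The names decodeBestBeam and BeamEntry are hypothetical, and the beam element layout (token sequence in .first) is an assumption inferred from beam[0].first in the diff; token 0 is assumed to be the CTC blank, matching vocabulary.at(token - 1).

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Assumed beam entry layout: (token sequence, score). The PR's real element
// type may differ; only .first (the token sequence) is used here.
using BeamEntry = std::pair<std::vector<int>, float>;

// Hypothetical helper: converts the best beam's tokens into text.
// Token 0 is assumed to be the CTC blank, so token t (1-based) maps to vocabulary[t - 1].
std::string decodeBestBeam(const std::vector<BeamEntry>& beam,
                           const std::vector<std::string>& vocabulary)
{
    // Stand-in for CV_Assert(!beam.empty()).
    if (beam.empty())
        throw std::runtime_error("beam is empty");

    std::string text;
    for (int token : beam[0].first)
    {
        // Stand-in for CV_Check(token, token > 0 && token <= vocabulary.size(), "");
        // the cast avoids comparing a signed int with an unsigned size.
        if (token <= 0 || static_cast<std::size_t>(token) > vocabulary.size())
            throw std::out_of_range("token index outside vocabulary");
        text += vocabulary[token - 1];
    }
    return text;
}
```

With the explicit check in place, a corrupted or mismatched vocabulary fails with a clear message instead of relying on vocabulary.at() to throw a generic std::out_of_range.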
     * only take top @p vocPrune tokens in each search step, @p vocPrune <= 0 stands for disable this prune.
     */
    CV_WRAP
    TextRecognitionModel& setDecodeOpts(int beam, int vocPrune = 0);
Perhaps it makes sense to name this setDecodeOptsCTCPrefixBeamSearch to avoid confusion in the future.
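The vocPrune option described in the doc comment limits how many candidate tokens are expanded at each search step. A minimal sketch of that idea follows, assuming pruning means keeping the vocPrune most probable classes of one time step and that vocPrune <= 0 keeps everything, as the comment states. pruneVocabulary is a hypothetical name, not the function merged in this PR.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: returns the indices of the vocPrune most probable
// classes for a single time step, ordered by descending probability.
// vocPrune <= 0 (or >= the number of classes) disables pruning.
std::vector<int> pruneVocabulary(const std::vector<float>& probs, int vocPrune)
{
    std::vector<int> idx(probs.size());
    std::iota(idx.begin(), idx.end(), 0);  // 0, 1, ..., V-1

    if (vocPrune <= 0 || static_cast<std::size_t>(vocPrune) >= idx.size())
        return idx;  // no pruning: keep every class in original order

    // Move the vocPrune highest-probability indices to the front, sorted descending.
    std::partial_sort(idx.begin(), idx.begin() + vocPrune, idx.end(),
                      [&probs](int a, int b) { return probs[a] > probs[b]; });
    idx.resize(vocPrune);
    return idx;
}
```

std::partial_sort only orders the first vocPrune elements, so the per-step cost stays close to O(V log vocPrune) instead of a full sort of the vocabulary.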
     * {
     *   'CTC-greedy': greedy decoding for the output of CTC-based methods
     *   'CTC-prefix-beam-search': Prefix beam search decoding for the output of CTC-based methods
     * }
The documentation doesn't render well.
It makes sense to move the possible values before the @param statement.
Consider using list syntax (-) instead of {}.
Fixed. The format now follows the documentation of cv::dnn::readNet().
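For comparison with the beam search option, the 'CTC-greedy' decode type listed in the documentation above can be sketched as best-path decoding: take the argmax class per frame, collapse consecutive repeats, then drop blanks. This is a minimal illustration, assuming probs[t][c] holds per-frame class probabilities and class 0 is the blank (consistent with vocabulary.at(token - 1) in the earlier diff); ctcGreedyDecode is a hypothetical name.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch of 'CTC-greedy' (best-path) decoding.
// probs[t][c]: probability of class c at time step t; class 0 is assumed to be the blank.
// Returns 1-based token indices with repeats collapsed and blanks removed.
std::vector<int> ctcGreedyDecode(const std::vector<std::vector<float>>& probs)
{
    std::vector<int> tokens;
    int prev = 0;  // treat the frame before t = 0 as blank
    for (const auto& frame : probs)
    {
        // Argmax over classes for this frame.
        int best = 0;
        for (std::size_t c = 1; c < frame.size(); ++c)
            if (frame[c] > frame[best])
                best = static_cast<int>(c);

        // Emit only non-blank classes that differ from the previous frame's class.
        if (best != 0 && best != prev)
            tokens.push_back(best);
        prev = best;
    }
    return tokens;
}
```

Because it commits to a single path, greedy decoding can miss an output whose probability is spread over many alignments; prefix beam search addresses this by summing over paths that collapse to the same prefix.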
Force-pushed from 5668dbc to 269e1de.
It seems to me that there is a problem with the CI system.
Thank you for taking the time to review this code ❤️
Force-pushed from 269e1de to c67bfb9.
The algorithm is based on Hannun's paper "First-Pass Large Vocabulary Continuous Speech Recognition using Bi-Directional Recurrent DNNs".
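For readers unfamiliar with the algorithm, the following is a minimal sketch of CTC prefix beam search in the spirit of the paper, without a language model and without vocabulary pruning. It works in the probability domain for readability (a production decoder would usually use log probabilities to avoid underflow), assumes probs[t][c] holds per-frame softmax outputs with class 0 as the blank, and uses the hypothetical name ctcPrefixBeamSearch. It is not the implementation merged in this PR.

```cpp
#include <algorithm>
#include <map>
#include <utility>
#include <vector>

using Prefix = std::vector<int>;

// For each prefix, track the probability of all paths that collapse to it,
// split by whether the path ends in a blank (pb) or a non-blank (pnb).
struct Score
{
    double pb = 0.0;
    double pnb = 0.0;
};

std::vector<int> ctcPrefixBeamSearch(const std::vector<std::vector<float>>& probs, int beamWidth)
{
    // Start with the empty prefix, which "ends in blank" with probability 1.
    std::vector<std::pair<Prefix, Score>> beam(1, std::make_pair(Prefix{}, Score{1.0, 0.0}));

    for (const std::vector<float>& frame : probs)
    {
        std::map<Prefix, Score> next;
        for (const std::pair<Prefix, Score>& entry : beam)
        {
            const Prefix& prefix = entry.first;
            const Score& s = entry.second;
            for (int c = 0; c < static_cast<int>(frame.size()); ++c)
            {
                const double p = frame[c];
                if (c == 0)
                {
                    // Blank: the prefix is unchanged and now ends in blank.
                    next[prefix].pb += (s.pb + s.pnb) * p;
                    continue;
                }
                Prefix extended = prefix;
                extended.push_back(c);
                if (!prefix.empty() && prefix.back() == c)
                {
                    // Repeated token: it starts a new token only after a blank...
                    next[extended].pnb += s.pb * p;
                    // ...otherwise it merges into the same prefix.
                    next[prefix].pnb += s.pnb * p;
                }
                else
                {
                    next[extended].pnb += (s.pb + s.pnb) * p;
                }
            }
        }

        // Keep the beamWidth most probable prefixes (pb + pnb).
        beam.assign(next.begin(), next.end());
        std::sort(beam.begin(), beam.end(),
                  [](const std::pair<Prefix, Score>& a, const std::pair<Prefix, Score>& b) {
                      return a.second.pb + a.second.pnb > b.second.pb + b.second.pnb;
                  });
        if (static_cast<int>(beam.size()) > beamWidth)
            beam.resize(beamWidth);
    }
    return beam.empty() ? Prefix{} : beam.front().first;
}
```

As a small example, with two frames that each give blank 0.6 and token 1 0.4, greedy decoding picks blank twice and outputs nothing (probability 0.36), while the prefix beam search with beamWidth 2 sums the three paths that collapse to token 1 (0.16 + 0.24 + 0.24 = 0.64) and returns {1}.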
Force-pushed from c67bfb9 to 955cf35.
Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
Patch to opencv_extra has the same branch name.