[GSoC] High Level API and Samples for Scene Text Detection and Recognition #17570

Merged
alalek merged 17 commits into opencv:master from wenqingzhang-gordon:text_det_recog_demo
Dec 3, 2020
Conversation

@wenqingzhang-gordon
Contributor

@wenqingzhang-gordon wenqingzhang-gordon commented Jun 17, 2020

Merge with extra: opencv/opencv_extra#773
High-Level API and Samples for Scene Text Detection and Recognition
This is my GSoC 2020 project: OpenCV text/digit detection & recognition.

Short Video: https://drive.google.com/file/d/1IlGpRRhPCifC9TRzuhq0_G1P6MkP33BJ/view?usp=sharing

For more information:
https://github.com/HannibalAPE/opencv/blob/text_det_recog_demo/doc/tutorials/dnn/dnn_text_spotting/dnn_text_spotting.markdown

TODO LIST:

  • Scene Text Recognition:
    • Samples
    • High-Level API (CRNN)
  • Scene Text Detection:
    • Samples
    • High-Level API (DB & EAST)
  • Scene Text Spotting:
    • Samples
  • Document:
    • Tutorials

Pull Request Readiness Checklist

  • I agree to contribute to the project under the OpenCV (Apache 2) License.
  • To the best of my knowledge, the proposed patch is not based on code under the GPL or another license that is incompatible with OpenCV.
  • The PR is proposed to the proper branch.
  • There is a reference to the original bug report and related work.
  • There are accuracy tests, performance tests and test data in the opencv_extra repository, if applicable.
    But the patch to opencv_extra does not have the same branch name, as follows:
  • The feature is well documented and the sample code can be built with the project CMake.
opencv_extra=text_det_rec

build_image:Custom=centos:7
buildworker:Custom=linux-1

}
if (maxLoc > 0) {
char currentChar = vocabulary[maxLoc - 1];
if (currentChar != decodeSeq[-1])
Contributor

decodeSeq[-1]

-1 is illegal, maybe use decodeSeq.back() instead ?

Contributor Author

Fixed it.
I got the right result with index -1, so I thought a negative index might be supported now. :(
Thank you.

"{ help h | | Print help message. }"
"{ inputImage i | | Path to an input image. Skip this argument to capture frames from a camera. }"
"{ device d | 0 | camera device number. }"
"{ modelPath mp | | Path to a binary .onnx file contains trained DB detector model.}"
Contributor

@berak berak Jun 17, 2020

just curious -- is there a pretrained onnx model (link) ?

Contributor Author

Maybe I can upload models to Google Drive.
I am not sure about it, and I will ask my mentor.

@vpisarev vpisarev changed the title add scene text detection and recognition samples [GSoC] add scene text detection and recognition samples Jun 24, 2020
@wenqingzhang-gordon wenqingzhang-gordon changed the title [GSoC] add scene text detection and recognition samples [GSoC] High Level API and Samples for Scene Text Detection and Recognition Jul 7, 2020

TEST_P(Test_Model, SceneTextRec)
{
std::string imgPath = _tf("welcome.png");

Where is this file?

Contributor Author

I have already put the data into opencv/opencv_extra, but it has not been merged yet.
opencv/opencv_extra#773
The name of the image has been changed to "text_rec_test.png", and I will push again when the data is ready.
Thanks for your review.


Please push the new name. You don't need to wait for the merge. See this step in the build pipeline.

Contributor Author

Thank you.
Sadly, I still get an error in "Linux x64 Debug", but I find that the test_dnn in "Linux x64" passed. I am not sure about the difference between these two tests.

Is there any detailed log showing which line throws the error?
I can only get "error: (-215: Assertion failed) dims <= 2 in function 'at' thrown in the test body." in https://pullrequest.opencv.org/buildbot/builders/precommit_linux64_no_opt/builds/24676/steps/test_dnn/logs/stdio

I think I have tested the API successfully, and you can see more information in https://github.com/HannibalAPE/opencv/blob/text_det_recog_demo/doc/tutorials/dnn/dnn_scene_text_det_and_rec/scene_text_recognition.markdown

Member

I still get an error in "Linux x64 Debug", but I find that the test_dnn in "Linux x64" passed. I am not sure about the difference between these two tests.

Obviously, it is "Debug" mode (with extra checks).

Contributor Author

@alalek Thanks for your reminder.

@bhack

bhack commented Jul 13, 2020

@vpisarev Looking at the TODO list in the description, I suppose we cannot keep this as a single PR.

@vpisarev
Contributor

@HannibalAPE, I'm now trying to run the code. None of the models you provided can be read with the version of OpenCV that is in text_det_recog_demo. BTW, hash sum for onnx/models/DB_IC15_resnet50.onnx does not match as well when I run download_models.py script. Could you please check that the models are imported correctly. Make sure that in CMake the version of protobuf from OpenCV is used, not the system version.

@wenqingzhang-gordon
Contributor Author

wenqingzhang-gordon commented Jul 27, 2020

@HannibalAPE, I'm now trying to run the code. None of the models you provided can be read with the version of OpenCV that is in text_det_recog_demo. BTW, hash sum for onnx/models/DB_IC15_resnet50.onnx does not match as well when I run download_models.py script. Could you please check that the models are imported correctly. Make sure that in CMake the version of protobuf from OpenCV is used, not the system version.

@vpisarev

  1. I have updated the sha of DB_IC15_resnet50.onnx. I forgot to update it when I updated the model, sorry.
  2. I compiled it again, and I am sure that my models can be imported.
    I checked the protobuf version in CMake:
  Other third-party libraries:
    Lapack:                      YES (/usr/lib/x86_64-linux-gnu/liblapack.so /usr/lib/x86_64-linux-gnu/libcblas.so /usr/lib/x86_64-linux-gnu/libatlas.so)
    Eigen:                       NO
    Custom HAL:                  NO
    Protobuf:                    build (3.5.1)

Is that not right?
When I compile it, only the download of ippicv_2020_lnx_intel64_20191018_general.tgz fails. Does it matter?

@wenqingzhang-gordon
Contributor Author

@vpisarev Hi Vadim, I have tested it on other Linux-based systems (e.g. Ubuntu 18.04 and 16.04). On both of them, my models can be imported and output the right results. Can you provide some information about your issue so that I can reproduce it?

@vpisarev
Contributor

vpisarev commented Jul 29, 2020

@HannibalAPE, thank you. I've downloaded the latest DB_IC15_resnet50.onnx and it works well. It's noticeably slower than EAST detector, but the results are definitely better!

However, with the model DB_TD500_resnet50.onnx it still crashes with the following message:

[ERROR:0] global /Users/vpisarev/work/opencv/modules/dnn/src/dnn.cpp (3272) getLayerShapesRecursively OPENCV/DNN: [Reshape]:(712): getMemoryShapes() throws exception. inputs=1 outputs=1/1 blobs=0
[ERROR:0] global /Users/vpisarev/work/opencv/modules/dnn/src/dnn.cpp (3275) getLayerShapesRecursively     input[0] = [ 1 1 736 1280 ]
[ERROR:0] global /Users/vpisarev/work/opencv/modules/dnn/src/dnn.cpp (3279) getLayerShapesRecursively     output[0] = [ ]
[ERROR:0] global /Users/vpisarev/work/opencv/modules/dnn/src/dnn.cpp (3285) getLayerShapesRecursively Exception message: OpenCV(4.4.0-pre) /Users/vpisarev/work/opencv/modules/dnn/src/layers/reshape_layer.cpp:113: error: (-215:Assertion failed) total(srcShape, srcRange.start, srcRange.end) == maskTotal in function 'computeShapeByReshapeMask'

libc++abi.dylib: terminating with uncaught exception of type cv::Exception: OpenCV(4.4.0-pre) /Users/vpisarev/work/opencv/modules/dnn/src/layers/reshape_layer.cpp:113: error: (-215:Assertion failed) total(srcShape, srcRange.start, srcRange.end) == maskTotal in function 'computeShapeByReshapeMask'

Abort trap: 6

I'm using macOS 10.15.6, xcode 11.6. Protobuf is 3.5.1. Will try it on Linux tomorrow or on Friday.

Some other complaints:

As I said, the speed is not that good. I tried to play with "--inputWidth" and "--inputHeight" parameters.

  1. First of all, it's inconvenient that you need to set both these fields. I would compute "height" out of "width" for example if just "width" is set and vice versa.
  2. Ok, when I specify both parameters, inputWidth, inputHeight, the app crashes again (with DB_IC15_resnet50.onnx model):
V16M:Release vpisarev$ ./example_dnn_scene_text_detection --mp=DB_IC15_resnet50.onnx --inputWidth=640 --inputHeight=368
[ERROR:0] global /Users/vpisarev/work/opencv/modules/dnn/src/dnn.cpp (3272) getLayerShapesRecursively OPENCV/DNN: [Eltwise]:(551): getMemoryShapes() throws exception. inputs=2 outputs=0/1 blobs=0
[ERROR:0] global /Users/vpisarev/work/opencv/modules/dnn/src/dnn.cpp (3275) getLayerShapesRecursively     input[0] = [ 1 256 24 40 ]
[ERROR:0] global /Users/vpisarev/work/opencv/modules/dnn/src/dnn.cpp (3275) getLayerShapesRecursively     input[1] = [ 1 256 23 40 ]
[ERROR:0] global /Users/vpisarev/work/opencv/modules/dnn/src/dnn.cpp (3285) getLayerShapesRecursively Exception message: OpenCV(4.4.0-pre) /Users/vpisarev/work/opencv/modules/dnn/src/layers/eltwise_layer.cpp:216: error: (-215:Assertion failed) inputs[0][j] == inputs[i][j] in function 'getMemoryShapes'

libc++abi.dylib: terminating with uncaught exception of type cv::Exception: OpenCV(4.4.0-pre) /Users/vpisarev/work/opencv/modules/dnn/src/layers/eltwise_layer.cpp:216: error: (-215:Assertion failed) inputs[0][j] == inputs[i][j] in function 'getMemoryShapes'

can you please check if inputWidth/inputHeight work at all?

==

The text recognition sample says it supports live video capture from a camera, but if I run it without an image, it complains that the image is not set. From the code I conclude that it does not support live video capture. Can you modify the scene_text_detection sample to support recognition as well? It would be a very useful demonstration of how to use detection and recognition together.

@wenqingzhang-gordon
Contributor Author

wenqingzhang-gordon commented Jul 30, 2020

However, with the model DB_TD500_resnet50.onnx it still crashes with the following message:

@vpisarev It may be caused by the wrong input size. For DB_TD500_resnet50.onnx, the image shape should be set to 736x736, which is mentioned in both the tutorial and the sample.

"--inputWidth" and "--inputHeight"

These two parameters are actually prepared for different models, because these models are trained on different benchmarks.
The performance is related to the shape of training samples.

when I specify both parameters, inputWidth, inputHeight, the app crashes again

Currently, it only supports a predefined shape. I will try to update it to support dynamic shape. But I think there will be a drop in accuracy, because the inference shape is not consistent with the training shape.

It would be very useful demonstration on how to use detection and recognition together.

This sample and some new tutorials will be pushed this week.
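Combining both models then looks roughly like this. This is a sketch following the dnn_text_spotting tutorial, not a definitive implementation: it assumes DB_TD500_resnet50.onnx, crnn_cs.onnx and alphabet_94.txt have already been downloaded, and reuses the tutorial's input sizes and normalization constants.

```cpp
#include <fstream>
#include <iostream>
#include <string>
#include <vector>
#include <opencv2/core/utility.hpp>
#include <opencv2/dnn.hpp>
#include <opencv2/imgcodecs.hpp>
#include <opencv2/imgproc.hpp>

using namespace cv;
using namespace cv::dnn;

int main(int argc, char** argv)
{
    Mat frame = imread(samples::findFile(argv[1]));

    // DB detector: 736x736 input, mean subtraction as in the tutorial
    TextDetectionModel_DB detector("DB_TD500_resnet50.onnx");
    detector.setBinaryThreshold(0.3f)
            .setPolygonThreshold(0.5f)
            .setUnclipRatio(2.0)
            .setMaxCandidates(200);
    detector.setInputParams(1.0 / 255.0, Size(736, 736),
                            Scalar(122.67891434, 116.66876762, 104.00698793));

    // CRNN recognizer: 100x32 input normalized to [-1, 1]
    TextRecognitionModel recognizer("crnn_cs.onnx");
    recognizer.setDecodeType("CTC-greedy");
    std::ifstream vocFile(samples::findFile("alphabet_94.txt"));
    std::vector<std::string> vocabulary;
    for (std::string line; std::getline(vocFile, line); )
        vocabulary.push_back(line);
    recognizer.setVocabulary(vocabulary);
    recognizer.setInputParams(1.0 / 127.5, Size(100, 32), Scalar(127.5));

    std::vector<std::vector<Point>> detections;
    detector.detect(frame, detections);

    for (const std::vector<Point>& quad : detections)
    {
        // detect() returns each quadrangle as (bl, tl, tr, br), so it maps
        // directly onto the recognizer's input rectangle in the same order
        Point2f src[4], dst[4] = { {0, 31}, {0, 0}, {99, 0}, {99, 31} };
        for (int i = 0; i < 4; i++)
            src[i] = quad[i];
        Mat cropped;
        warpPerspective(frame, cropped, getPerspectiveTransform(src, dst),
                        Size(100, 32));
        std::cout << recognizer.recognize(cropped) << std::endl;
    }
    return 0;
}
```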

Member

@alalek alalek left a comment

Thank you for the contribution!


```cpp
#include <iostream>
#include <fstream>
```
Member

Create a standalone .cpp file for the example code and embed snippets into the documentation only.

You can adapt the sample file below or create a new one.

@alalek
Member

alalek commented Sep 5, 2020

doc/tutorials/dnn/dnn_text_spotting/detect_test1.png
doc/tutorials/dnn/dnn_text_spotting/detect_test2.png

The PNG lossless format is not necessary here (the files are large for real-world images). Try to reduce the image size by using the .jpg format.

@alalek
Copy link
Copy Markdown
Member

alalek commented Sep 5, 2020

doc/tutorials/dnn/dnn_text_spotting/text_recognition.cpp

Sample code for tutorials should go into samples/cpp/tutorial_code/...

Member

@alalek alalek left a comment

Looks good to me!

Please take a look at the remaining comments.

} else {
// Open an image file
CV_Assert(parser.has("inputImage"));
Mat frame = imread(parser.get<String>("inputImage"));
Member

Please use samples::findFile() for better file searching experience.

Contributor Author

I have updated all imread calls to use samples::findFile().

@alalek
Member

alalek commented Sep 5, 2020

@dkurt Please take a look at the public API proposals.

/**
* @brief Given the @p input frame, create input blob, run net and return recognition result.
* @param[in] frame: The input image.
* @param[in] decodeType: The decoding method of translating the network output into string.
Member

Which are possible values?

Contributor Author

It only supports "CTC-greedy" now; I will add more decoding methods, such as beam search, in the future.

Member

Users should know the options, so they must be documented, or we can add an enum.

* @param[in] binThresh: The threshold of the binary map.
* @param[in] polyThresh: The threshold of text polygons.
* @param[in] unclipRatio: The unclip ratio of the detected text region.
* @param[in] maxCandidates: The max number of the output polygons.
Member

Why we need to limit this number?

Contributor Author

We can get good detection results with these parameters, for more information you can refer to the paper and code.

Contributor Author

In some difficult cases, the output map of the network is full of small noise, and maxCandidates can avoid wasting inference time.

@wenqingzhang-gordon
Contributor Author

wenqingzhang-gordon commented Nov 19, 2020

@alalek
I have no idea how to update the struct Voc without modifying struct Impl.
I tried to define TextRecognition::Voc inheriting from Model::Impl, but I do not know how to apply the "Bridge" design pattern here.
Do you have any suggestions about it? Is there any example?
Thank you.

Member

@alalek alalek left a comment

@HannibalAPE Thank you for the contribution!
I will update the introduced public APIs and push them here by the end of this week.


Update: Done

Contributor Author

@wenqingzhang-gordon wenqingzhang-gordon left a comment

@alalek Thank you! I have learned a lot with your help.

CV_Assert(!vocPath.empty());
std::ifstream vocFile;
vocFile.open(vocPath);
vocFile.open(samples::findFile(vocPath));
Contributor Author

@wenqingzhang-gordon wenqingzhang-gordon Nov 20, 2020

I wonder whether I need to use samples::findFile() whenever I open a file.
Do I need to change TextRecognitionModel recognizer(modelPath) into TextRecognitionModel recognizer(samples::findFile(modelPath))?
Does it slow things down?

Member

samples::findFile() helps to use a file by name (instead of by full path) from the <opencv>/samples/data location.

Model files are not stored there (and there are no plans to put them there due to their size).

Member

@alalek alalek left a comment

@HannibalAPE

I pushed the updated public API for the text detection / recognition tasks. Please take a look and check the samples / documentation (I have probably missed something).

{
CV_TRACE_FUNCTION();
std::vector< std::vector<Point> > contours = detectTextContours(frame);
confidences = std::vector<float>(contours.size(), 1.0f);
Member

Is there any confidence scoring in the DB detection algorithm?

Contributor Author

https://github.com/HannibalAPE/opencv/blob/f59aa6d4ae76e08f199e56ae94235295e0380076/modules/dnn/src/model.cpp#L972

You can regard the return value of contourScore() as a kind of confidence, but it is only used to filter out bad detection results. It does not have the same meaning as confidence in general object detection algorithms.

@wenqingzhang-gordon
Contributor Author

@alalek Since the onnx importer now supports models with dynamic input shape, I will try to generate new DB models and update them this week. I will also check the samples and tutorials this week.

update opencv.bib
* @return array with text detection results
*/
CV_WRAP
std::vector<cv::RotatedRect> detect(InputArray frame) const;
Member

Actually RotatedRect doesn't work well with perspective transformations (like this one).

Perhaps we need 4 points in API with strong order (bl, tl, tr, br - according to targetVertices) which should be used with getPerspectiveTransform() to get more accurate results.

I will try to update the API this week.

Contributor Author

I cannot open the above link. Can you share it via Google Drive?

Member

It is similar to the side boxes of the cube from here.

Member

However, DL models may not work with perspective transformations (EAST output doesn't know anything about that) and they just detect rotated text.

any thoughts?

Contributor Author

It depends on the DL model. Some methods can output irregular quadrilaterals.

From my side, the perspective transformation is a temporary replacement, or just one option for four-point outputs (rotated boxes and irregular quadrilaterals).

There is a popular and fast text recognition algorithm, ASTER, which adopts a Thin Plate Spline (TPS) transformation in its rectification network (like this).
I am not sure whether the TPS transformation is implemented in OpenCV, maybe not?
You can refer to this.

By the way, ASTER is also an algorithm from our lab, and we are glad to contribute it to OpenCV.
However, there are some things to do first.

  1. support TPS transformation
  2. update LSTM in modules/dnn/src/onnx/onnx_importer.cpp
    we need to set these parameters non-zero, but it is not supported now.
  3. support GRU layer
  4. ...

I plan to work on it after this PR.

Member

@alalek alalek left a comment

Updated TextDetectionModel API:

  • added quadrangle support with strong requirement for order of returned points
  • dropped .detectTextContours() from TextDetectionModel_DB (replaced by quadrangles).

Examples:
```bash
example_dnn_scene_text_recognition -mp=path/to/crnn_cs.onnx -i=path/to/an/image -rgb=1 -vp=/path/to/alphabet_94.txt
example_dnn_scene_text_detection -mp=path/to/DB_TD500_resnet50.onnx -i=path/to/an/image -ih=736 -iw=736
```
Member

-mp=path/to/DB_TD500_resnet50.onnx
-ih=736 -iw=736

Please check model parameters here (and above near model download links).
This set performs better: -ih=736 -iw=1280 (on IC15/test_images/img_5.jpg)

BTW, it makes sense to put some defaults into the TextDetectionModel_DB ctor.

Contributor Author

@wenqingzhang-gordon wenqingzhang-gordon Dec 3, 2020

Actually, DB_TD500_resnet50.onnx and DB_IC15_resnet50.onnx are prepared for different datasets (i.e. TD500 and IC15) respectively, aiming for better performance on each benchmark. The recommended settings do not cover the above case (using DB_TD500_resnet50.onnx on IC15 images).

If it is needed, I can train a new model with different datasets together. What is your opinion?

Member

Thanks for explanation. That makes sense.

I can train a new model

This can be an activity after this PR merge.


Is there a tutorial on training custom models and converting existing models to onnx?

Comment on lines +1404 to +1408
* Each result is quadrangle's 4 points in this order:
* - bottom-left
* - top-left
* - top-right
* - bottom-right
Member

Added strong requirements for results to avoid "points reordering" in sample code.

CV_WRAP
void detect(
InputArray frame,
CV_OUT std::vector< std::vector<Point> >& detections,
Member

Any thoughts about Point vs Point2f?

Contributor Author

I think Point is okay.

(1) keep points' order consistent with (bl, tl, tr, br) in unclip
(2) update contourScore with boundingRect
void setNMSThreshold(float nmsThreshold_) { nmsThreshold = nmsThreshold_; }
float getNMSThreshold() const { return nmsThreshold; }

// TODO: According to article EAST supports quadrangles output: https://arxiv.org/pdf/1704.03155.pdf
Contributor Author

You can find that the performance of QUAD is worse than RBOX in Table 3, and the authors do not provide official code or models.
Some good re-implementations of EAST only support RBOX.
TF: https://github.com/argman/EAST
PyTorch: https://github.com/songdejia/EAST

Member

@alalek alalek left a comment

LGTM 👍

@HannibalAPE @bhack Thank you for the contribution!

@alalek alalek merged commit 22d64ae into opencv:master Dec 3, 2020
@wenqingzhang-gordon
Contributor Author

Thank you. I really appreciate your help. @alalek @vpisarev @dkurt @bhack @berak

@alalek alalek mentioned this pull request Apr 9, 2021
a-sajjad72 pushed a commit to a-sajjad72/opencv that referenced this pull request Mar 30, 2023
[GSoC] High Level API and Samples for Scene Text Detection and Recognition

* APIs and samples for scene text detection and recognition

* update APIs and tutorial for Text Detection and Recognition

* API updates:
(1) put decodeType into struct Voc
(2) optimize the post-processing of DB

* sample update:
(1) add transformation into scene_text_spotting.cpp
(2) modify text_detection.cpp with API update

* update tutorial

* simplify text recognition API
update tutorial

* update impl usage in recognize() and detect()

* dnn: refactoring public API of TextRecognitionModel/TextDetectionModel

* update provided models
update opencv.bib

* dnn: adjust text rectangle angle

* remove points ordering operation in model.cpp

* update gts of DB test in test_model.cpp

* dnn: ensure to keep text rectangle angle

- avoid 90/180 degree turns

* dnn(text): use quadrangle result in TextDetectionModel API

* dnn: update Text Detection API
(1) keep points' order consistent with (bl, tl, tr, br) in unclip
(2) update contourScore with boundingRect

8 participants