- This is a summary of my contributions to the OpenCV library under Google Summer of Code 2023.
- I'd like to thank my mentor Yuantao Feng for his invaluable guidance and support throughout the project. His help was absolutely instrumental to the project's success.
Optical flow is the problem of estimating the motion of objects in an image or video sequence. It is pivotal to many computer vision applications, such as object tracking, video stabilization, and motion analysis, because it provides essential information about the dynamics of a scene or an object. With the rise of deep neural networks (DNNs), DNN models were developed to solve the optical flow problem, achieving state-of-the-art results by learning to estimate motion from image data. However, DNNs can be computationally expensive, which makes them infeasible to deploy on embedded systems. Lightweight DNN models were developed to tackle this problem, but the OpenCV model zoo did not yet include a lightweight optical flow model. This project therefore aims to find the best lightweight optical flow model in terms of model size, speed, and accuracy, and to introduce it to the OpenCV model zoo.
Using the KITTI Vision Benchmark Suite, I shortlisted three models based on their accuracy, measured in Fl-All [1], and their runtime: RAFT-it+_RVC, GMFlow, and MatchFlow. In consultation with my mentor, we decided to use the RAFT [2] model for two main reasons.
- First, the RAFT model is the basis for various optical flow models, such as ScaleRAFTRBO, CamLiRAFT, and RAFT-3D++. Therefore, the RAFT model is multi-purpose and modular and can be modified by OpenCV users to meet their various needs, especially given that it is supported across different libraries and frameworks.
- Second, an ONNX version of the model is openly available, which makes it straightforward to load the model with OpenCV DNN.
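The Fl-All metric used for the shortlist can be sketched in a few lines of NumPy. This is a minimal illustration, not the official KITTI evaluation code: it assumes the KITTI 2015 convention in which a pixel counts as an outlier only when its end-point error exceeds both 3 pixels and 5% of the ground-truth flow magnitude. The function name `fl_all`, the `(H, W, 2)` array layout, and the optional `valid` mask are assumptions made for this example.

```python
import numpy as np

def fl_all(flow_pred, flow_gt, valid=None, abs_thresh=3.0, rel_thresh=0.05):
    """Percentage of valid pixels whose estimated flow is an outlier.

    flow_pred, flow_gt: arrays of shape (H, W, 2) holding (u, v) per pixel.
    valid: optional boolean mask of shape (H, W) marking pixels that have
           ground truth (hypothetical parameter; defaults to all pixels).
    """
    # End-point error: Euclidean distance between predicted and true flow.
    epe = np.linalg.norm(flow_pred - flow_gt, axis=-1)
    gt_mag = np.linalg.norm(flow_gt, axis=-1)
    if valid is None:
        valid = np.ones(epe.shape, dtype=bool)
    # Assumed outlier rule: error above BOTH the absolute threshold (3 px)
    # and the relative threshold (5% of the ground-truth magnitude).
    outlier = (epe > abs_thresh) & (epe > rel_thresh * gt_mag)
    return 100.0 * outlier[valid].mean()
```

Comparing `epe` against `rel_thresh * gt_mag` instead of dividing by the magnitude avoids a division by zero for pixels with no ground-truth motion.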
To run the ONNX version of RAFT with OpenCV DNN, the GatherElements operator first had to be added to the OpenCV DNN module. The first pull request, to OpenCV, implemented the operator in the DNN module. A second pull request was then made to opencv_extra to test and validate the GatherElements implementation.
While implementing and testing the GatherElements operator, I submitted a third pull request to opencv_zoo to add the RAFT model, along with a demo and example inputs and outputs, using the ONNX version of the model.
After the GatherElements operator passed its tests, I updated the opencv_zoo pull request so that the RAFT demos run with OpenCV DNN instead of ONNX. In addition, I fixed the twinkling issue in the visualization module. An example of the updated output:
Finally, I updated the initial OpenCV pull request with the GatherElements implementation to also test the RAFT model loaded with OpenCV DNN, and I updated the opencv_extra pull request to add the data needed for testing the RAFT model.
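For readers unfamiliar with GatherElements, the following NumPy sketch illustrates the operator's semantics as defined by ONNX: the output has the same shape as the index tensor, and each output element is picked from the input along the chosen axis. It is a reference illustration only, not the code from the OpenCV pull request; the helper name `gather_elements` is made up for this example.

```python
import numpy as np

def gather_elements(data, indices, axis=0):
    """NumPy reference for ONNX GatherElements semantics.

    For axis=0 on a 2-D input: out[i][j] = data[indices[i][j]][j].
    For axis=1 on a 2-D input: out[i][j] = data[i][indices[i][j]].
    """
    data = np.asarray(data)
    indices = np.asarray(indices)
    # ONNX allows negative indices, counted from the end of the axis.
    indices = np.where(indices < 0, indices + data.shape[axis], indices)
    return np.take_along_axis(data, indices, axis=axis)

data = np.array([[1, 2],
                 [3, 4]])
indices = np.array([[0, 0],
                    [1, 0]])
result = gather_elements(data, indices, axis=1)
print(result)  # [[1 1]
               #  [4 3]]
```

The example input matches the one given in the ONNX operator documentation, so it can serve as a quick sanity check for any implementation of the operator.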
Footnotes
1. Fl-All is the percentage of optical flow outliers averaged over all ground-truth pixels in the frame. A pixel counts as an outlier when its end-point error (EPE) exceeds both 3 pixels and 5% of the ground-truth flow magnitude.
2. The Recurrent All-Pairs Field Transforms (RAFT) model was introduced in 2020 by Zachary Teed and Jia Deng of Princeton University. The model is open source and available in the RAFT repository on GitHub, and the research paper is published on arXiv under the title RAFT: Recurrent All-Pairs Field Transforms for Optical Flow.
