Finally, it works. At first, I was worried that there would be too few keypoints at 320×200 resolution. But the results show that with prob_threshold at 0.01, there are about 400 keypoints in most scenes after NMS. The descriptors also perform well and can detect loop closures correctly.

The image rate is limited to a maximum of 15 FPS when on-device SuperPoint is enabled. The network currently runs using 10 SHAVEs, 1 thread (2 NCEs/thread) for low latency, which should be just fine for 15 FPS. But I noticed that when other features are configured on the OAK (such as some depth post-processing filters), the performance of neural network inference may also be affected, even though their resources are allocated separately. I don't know exactly what factors are involved. The current OAK resource configuration in rtabmap should be reasonable. If there is instability in the long-running test later, such as frame drops, we can consider running the network with 5 SHAVEs, 2 threads (1 NCE/thread) to ensure 15 FPS; to do this, change L370 and L371 accordingly. Information on low latency and max FPS can be found here.

The ONNX conversion is in my fork of superpoint_infer_engine. If there is nothing to change, just use the converted blobs from depthai-superpoint. I only made some changes to the outputs of the model. For the detector head, softmax is done on device and the heatmap is reshaped. For the descriptor head, the L2 norm is removed to avoid an FP16 overflow issue; we perform the L2 norm on the host instead. The descriptor output is converted to NHWC, so it can be easily converted to cv::Mat via L632. Here is the entire network visualized using Netron.
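To make those output changes concrete, here is a hedged NumPy sketch of what they compute: softmax plus depth-to-space reshape for the detector head (done on device in the blob) and the channel-wise L2 norm for the descriptors (done on the host). The function names and the 65-channel/8×8-cell layout are assumptions based on the standard SuperPoint architecture, not code from this PR.

```python
import numpy as np

def detector_head_post(semi):
    """Softmax over the 65 detector channels, drop the dustbin channel,
    then depth-to-space the 64 cell bins into a full-resolution heatmap.
    semi: (65, Hc, Wc) logits; for a 200x320 input, Hc=25, Wc=40 (assumed)."""
    e = np.exp(semi - semi.max(axis=0, keepdims=True))   # numerically stable softmax
    dense = e / e.sum(axis=0, keepdims=True)
    nodust = dense[:-1]                                  # (64, Hc, Wc)
    hc, wc = nodust.shape[1:]
    # channel c corresponds to in-cell offset (c // 8, c % 8)
    heat = nodust.reshape(8, 8, hc, wc).transpose(2, 0, 3, 1).reshape(hc * 8, wc * 8)
    return heat

def l2_normalize_desc(coarse_desc, eps=1e-10):
    """Host-side L2 norm over the channel axis (NHWC layout, as in this PR),
    avoiding the FP16 overflow that doing it on device would risk."""
    norm = np.linalg.norm(coarse_desc, axis=-1, keepdims=True)
    return coarse_desc / (norm + eps)
```

Doing the norm in FP32 on the host sidesteps the overflow because the sum of squares over 256 channels can exceed the FP16 range.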
@borongyuan great work! I tried it, and it kind of works, but I am not sure I am getting the right results. It seems to like extracting many keypoints from the same locations. While odometry seems to work, loop closure detection doesn't work very well, in contrast to the original SuperPoint I integrated. I'll try to set up my computer with PyTorch this week to see the difference.
The heatmap is resized to the original image size using bicubic interpolation for better keypoint localization, even though the original heatmap has a lower resolution. This is a trick used by some other networks with heatmaps. Then, after masking with prob_threshold, multiple keypoints will be extracted from the same location with different prob/response values; NMS should select the most significant ones. So when using high-resolution input, such as 800P, the NMS distance should be increased appropriately. I also noticed that with the default NMS distance (4), some scenes look better at 400P than at 800P.
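A minimal sketch of the effect described above (names hypothetical, not the PR's code): upsampling a single coarse-heatmap peak 8× spreads it over many pixels, so a plain threshold produces a cluster of candidates at one location, which NMS must thin out. The PR uses bicubic interpolation; a nearest-neighbor upsample via np.kron is used here only to keep the sketch dependency-free — the clustering effect is the same.

```python
import numpy as np

def candidates_from_heatmap(heatmap, scale=8, prob_threshold=0.01):
    """Upsample the coarse heatmap to image resolution and keep every pixel
    above prob_threshold as a keypoint candidate (pre-NMS)."""
    up = np.kron(heatmap, np.ones((scale, scale), heatmap.dtype))
    ys, xs = np.where(up > prob_threshold)
    return np.stack([xs, ys], axis=1), up[ys, xs]

heat = np.zeros((25, 40), np.float32)   # coarse 40x25 heatmap (assumed size)
heat[12, 20] = 1.0                      # a single strong response
pts, probs = candidates_from_heatmap(heat)
# All 64 pixels of the upsampled cell pass the threshold: one physical
# location, many candidates — exactly what NMS has to resolve.
```

With bicubic interpolation the cluster is even wider than one cell (the kernel support spills into neighbors), which is why the NMS distance should grow with the input resolution.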
I think that, limited by the computing power of the OAK Series 2, the current speed and accuracy are just about enough. So I am also looking forward to the OAK Series 3, hoping to get better results then. I will try other features later when I have time, for example KeyNet, AffNet, and HardNet, which are included in kornia.feature. According to the DepthAI documentation, some algorithms in kornia may be convertible to blobs. But SuperPoint is good enough for now. On-device GFTT + RootSIFT is also a good choice. I will study other things later.
Results at 400P are very promising, thanks again for the PR!
The fundamental limitation is that it runs at 320×200 resolution on the device; otherwise the frame rate would not be sufficient for VO. Even if we can locate keypoints in the original image by resizing, the resolution of coarse_desc is only 40×25, so the performance is definitely not comparable to inference on raw images. I'm currently treating this as an experimental feature, as it's really interesting to implement, but it's also possible to improve it later.

Besides waiting for the next generation of hardware, we can also try to simplify the model. A simple rule of thumb is that for VGG-like models, MobileNetV1 can be used instead, and for ResNet-like models, MobileNetV2. I saw someone tried V2 before and it didn't work well; maybe they should try V1. Another possibility is to use SuperPoint only for loop closure detection. Then it could run on device with higher-resolution input, since it wouldn't need to process every frame. I'm not sure if this is convenient to implement.

For NMS(), I modified it slightly when I moved it to util2d. It seems that once the response of the keypoints is set, there is no need to use the conf Mat as input. I don't know if it's related to the last few issues. Correct me if I am wrong.
For standalone use, it may not be that useful unless we combine it with an external VO/VIO source like the T265, as rtabmap's VO needs a high frame rate. I see it being more useful with ROS on a robot using wheel/lidar odometry; then the low-frame-rate OAK images with SuperPoint could be used for loop closure detection. For NMS, I checked and the issue was there before your PR, so I just modified it after merging your PR. The confidence matrix is used to get the confidence of other keypoints within a fixed radius of the current keypoint being checked. This code was taken from this project; it could indeed be optimized using a different approach (like a nearest-neighbor index).
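As an illustration of that alternative, here is a hedged sketch of radius NMS driven purely by per-keypoint responses (a hypothetical helper, not rtabmap's util2d code): once each keypoint carries its own response, the dense confidence Mat is no longer needed as input.

```python
import numpy as np

def nms_keypoints(points, responses, radius=4.0):
    """Greedy radius NMS: visit keypoints from strongest to weakest response
    and keep one only if no already-kept keypoint lies within `radius` px."""
    order = np.argsort(-np.asarray(responses))
    kept = []
    for i in order:
        if all(np.hypot(*(points[i] - points[j])) > radius for j in kept):
            kept.append(i)
    return kept

pts = np.array([[0, 0], [1, 1], [10, 10]], float)
resp = [0.9, 0.8, 0.7]
print(nms_keypoints(pts, resp))   # the (1, 1) point is suppressed by (0, 0)
```

This is O(n²) in the number of candidates; a k-d tree (the nearest-neighbor index mentioned above) would reduce the pairwise distance checks.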
I mean using other features for VO, with SuperPoint only used for loop closure. It should still be possible, right? You said before that odometry won't re-detect keypoints when setFeatures() is used, but now we want it to re-detect.
There is a clever simple_nms() implementation in SuperGlue that acts directly on the heatmap. This issue discusses both implementations. But I can't use that implementation here because max pooling in OpenCV is not available as a standalone function.
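For reference, the core idea of SuperGlue's simple_nms() can be sketched with a plain NumPy max filter. The real implementation is iterative PyTorch max pooling; this single-pass version is only an approximation of that idea, not a drop-in replacement.

```python
import numpy as np

def simple_nms(heatmap, radius=4):
    """Keep a score only where it equals the maximum of its
    (2*radius+1)^2 neighborhood, i.e. where it is a local maximum.
    Everything else is zeroed, so a later threshold drops it."""
    pad = np.pad(heatmap, radius, mode='constant', constant_values=-np.inf)
    win = np.lib.stride_tricks.sliding_window_view(
        pad, (2 * radius + 1, 2 * radius + 1))
    local_max = win.max(axis=(-1, -2))   # per-pixel neighborhood maximum
    return np.where(heatmap == local_max, heatmap, 0.0)
```

Because it operates on the dense heatmap before any keypoint list exists, it avoids the per-keypoint radius search entirely, which is why it maps so well to a max-pooling op on device or GPU.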
I expected that the image frame rate would be lower when SuperPoint features are extracted, but if the image frame rate is 30 Hz while SuperPoint runs at 5 Hz, we could send images at 30 Hz to the VO thread (extracting other features) and send SuperPoint features to rtabmap at 5 Hz. The issue is synchronizing odometry with the SuperPoint features, which arrive with latency on the other thread. At that point, I would use rtabmap_ros to redirect and synchronize the data. For SuperGlue's NMS, I guess the PyTorch version could be converted to C++ with libtorch.
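The synchronization problem above amounts to matching each late-arriving SuperPoint message to the closest buffered image timestamp, which is roughly what an approximate-time policy does in ROS. A minimal sketch under that assumption (all names hypothetical):

```python
from bisect import bisect_left

def match_nearest_stamp(image_stamps, feature_stamp, max_diff=0.05):
    """Return the index of the buffered image stamp closest to the feature
    stamp, or None if the best match is further than max_diff seconds."""
    i = bisect_left(image_stamps, feature_stamp)          # stamps sorted ascending
    candidates = [c for c in (i - 1, i) if 0 <= c < len(image_stamps)]
    if not candidates:
        return None
    best = min(candidates, key=lambda c: abs(image_stamps[c] - feature_stamp))
    return best if abs(image_stamps[best] - feature_stamp) <= max_diff else None

# 30 Hz image stamps vs. a 5 Hz feature message arriving late:
stamps = [k / 30.0 for k in range(10)]
print(match_nearest_stamp(stamps, 0.21))   # -> 6 (the 0.2 s frame)
```

The max_diff guard matters because a feature message delayed past the buffer window should be dropped rather than attached to the wrong frame.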










SuperPoint MyriadX Blob:
https://github.com/borongyuan/depthai-superpoint/raw/main/blobs/superpoint_200x320_10shave.blob