Add Depthai SuperPoint#1058

Merged
matlabbe merged 4 commits into introlab:master from borongyuan:depthai-superpoint
Jun 21, 2023

Conversation

@borongyuan
Contributor

@borongyuan borongyuan commented Jun 17, 2023

SuperPoint MyriadX Blob:
https://github.com/borongyuan/depthai-superpoint/raw/main/blobs/superpoint_200x320_10shave.blob

  • add on-device superpoint network
  • move nms to util2d
  • update PreferencesDialog
  • keypoint decoder
  • descriptor decoder
  • update to depthai-core v2.22.0

@borongyuan borongyuan marked this pull request as ready for review June 19, 2023 06:34
@borongyuan borongyuan force-pushed the depthai-superpoint branch from 8d0aa21 to 720d50f Compare June 19, 2023 07:15
@borongyuan
Contributor Author

borongyuan commented Jun 19, 2023

Finally, it works. At first I was worried that there would be too few keypoints at 320×200 resolution, but the results show that with prob_threshold at 0.01 there are about 400 keypoints in most scenes after NMS. The descriptors also perform well and can detect loop closures correctly.

The image rate is limited to a maximum of 15 FPS when on-device SuperPoint is enabled. The network currently runs with 10 SHAVEs and 1 thread (2 NCEs/thread) for low latency, which should be fine for 15 FPS. But I noticed that when other features are configured on the OAK (such as some depth post-processing filters), neural network inference performance can also be affected, even though their resources are allocated separately. I don't know exactly what factors are involved. The current OAK resource configuration in rtabmap should be reasonable. If a long-running test later shows instability such as frame drops, we can consider running the network with 5 SHAVEs and 2 threads (1 NCE/thread) to ensure 15 FPS. To do this, change L370 and L371 to

                superPointNetwork->setNumInferenceThreads(2);
                superPointNetwork->setNumNCEPerInferenceThread(1);

Information on low latency and max FPS can be found here.

The ONNX conversion is in my fork of superpoint_infer_engine. If there is nothing to change, just use the converted blobs from depthai-superpoint. I only made some changes to the outputs of the model. For the detector head, softmax is done on device and the heatmap is reshaped. For the descriptor head, the L2 norm is removed to avoid an FP16 overflow issue; we perform the L2 norm on the host instead. The descriptor output is converted to NHWC, so it can easily be converted to a cv::Mat via L632. Here is the entire network visualized using Netron.
superpoint_200x320-sim onnx
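Since the L2 norm was moved off the device, the host has to normalize each descriptor after reading the raw NN output in FP32. A minimal stdlib-only sketch of that step (function name and flat row-major layout are illustrative assumptions, not the actual rtabmap code):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// L2-normalize each `dim`-sized descriptor row in place. The device emits
// raw (un-normalized) descriptors to avoid FP16 overflow; normalization is
// done here on the host in FP32.
void l2NormalizeRows(std::vector<float>& desc, int dim)
{
    const float eps = 1e-12f; // guard against all-zero border descriptors
    for (std::size_t i = 0; i + dim <= desc.size(); i += dim) {
        float sq = 0.0f;
        for (int d = 0; d < dim; ++d)
            sq += desc[i + d] * desc[i + d];
        float inv = 1.0f / std::sqrt(sq + eps);
        for (int d = 0; d < dim; ++d)
            desc[i + d] *= inv;
    }
}
```

In the real pipeline this would run over the 256-dim rows of the cv::Mat built from the NHWC output.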

@matlabbe
Member

matlabbe commented Jun 20, 2023

@borongyuan great work! I tried it; it kind of works, but I am not sure I am getting the right results. It seems to like extracting many keypoints from the same locations:
Screenshot from 2023-06-19 18-50-04

While odometry seems to work, loop closure detection doesn't work very well, in contrast to the original SuperPoint I integrated. I'll try to set up my computer with PyTorch this week to see the difference.

@borongyuan
Contributor Author

The heatmap is resized to the original image size using bicubic interpolation for better keypoint localization, even though the original heatmap has a lower resolution. This is a trick used by some other networks with heatmaps. After masking by prob_threshold, multiple keypoints can then be extracted from the same location with different prob/response values, and NMS should select the most significant ones. So when using high-resolution input, such as 800P, the NMS distance should be increased accordingly. I also noticed that with the default NMS distance (4), some scenes look better at 400P than at 800P.
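The selection step described above can be sketched as a greedy radius-based NMS: sort candidates by response and keep a point only if no already-kept point lies within the NMS distance. This is only an illustration of the idea (struct and function names are mine, not the util2d implementation):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Kp { float x, y, response; };

// Greedy radius NMS: highest-response keypoints win; any candidate within
// `dist` pixels of an already-kept keypoint is suppressed. This keeps only
// the most significant of the duplicates produced by bicubic upsampling.
std::vector<Kp> nmsKeypoints(std::vector<Kp> kps, float dist)
{
    std::sort(kps.begin(), kps.end(),
              [](const Kp& a, const Kp& b) { return a.response > b.response; });
    std::vector<Kp> kept;
    for (const Kp& k : kps) {
        bool suppressed = false;
        for (const Kp& q : kept) {
            float dx = k.x - q.x, dy = k.y - q.y;
            if (dx * dx + dy * dy <= dist * dist) { suppressed = true; break; }
        }
        if (!suppressed) kept.push_back(k);
    }
    return kept;
}
```

With a larger `dist` at 800P, nearby duplicates from the upsampled heatmap collapse into a single keypoint.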
But what confuses me about your screenshot: is it cropped? Keypoints should not appear close to the image borders. I didn't add code to filter them; I just noticed that the raw network output doesn't respond near image borders. Also, I use cv::remap() to get the descriptor vectors, which is equivalent to torch::grid_sampler(). The coordinates are calculated using the method from this post. After the calculation, you can see that the border pixels in the original image correspond to the "outliers" in coarse_desc. So even if keypoints appear close to the image borders, their descriptors will be set to zeros by default.
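The per-channel lookup that cv::remap() performs here is just bilinear interpolation into the coarse grid, with out-of-range samples returning zero. A stdlib-only sketch of one channel, assuming row-major storage and zero padding like torch::grid_sampler (the real code samples all 256 channels at once):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Bilinearly sample one descriptor channel of the coarse (W/8 x H/8) map at
// a fractional location. Out-of-range taps contribute 0, analogous to
// cv::remap with BORDER_CONSTANT / grid_sampler's zero padding, which is why
// keypoints mapped outside the coarse grid end up with zero descriptors.
float sampleBilinear(const std::vector<float>& map, int w, int h,
                     float x, float y)
{
    int x0 = static_cast<int>(std::floor(x));
    int y0 = static_cast<int>(std::floor(y));
    float fx = x - x0, fy = y - y0;
    auto at = [&](int xi, int yi) -> float {
        if (xi < 0 || yi < 0 || xi >= w || yi >= h) return 0.0f; // zero border
        return map[yi * w + xi];
    };
    return at(x0,     y0)     * (1 - fx) * (1 - fy)
         + at(x0 + 1, y0)     * fx       * (1 - fy)
         + at(x0,     y0 + 1) * (1 - fx) * fy
         + at(x0 + 1, y0 + 1) * fx       * fy;
}
```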
Also, please make sure you downloaded the latest blob file, because I made an update before submitting this PR. If it works correctly, you should see a successful loop closure like the image below. In fact, I still don't fully understand how the UI displays these keypoint types. Is it just based on the length and type of the descriptors?
2023-06-20 10-25-24 screenshot

@borongyuan
Contributor Author

borongyuan commented Jun 20, 2023

I think that, limited by the computing power of the OAK Series 2, the current speed and accuracy are just about enough. So I am also looking forward to the OAK Series 3, hoping to get better results then. I will try other features later when I have time, for example KeyNet, AffNet, and HardNet, which are included in kornia.feature. According to the DepthAI documentation, some algorithms in kornia may be convertible to blobs. But SuperPoint is good enough for now. On-device GFTT + RootSIFT is also a good choice. I will study other options later.

@matlabbe
Member

I did some tests. I cannot make VO work with SuperPoint generated on the OAK-D at 800P (it gets lost as soon as I move). It works well at 400P though. I tried increasing the NMS distance at 800P, but it didn't really help. My image was indeed cropped in my post above. Here are some results and a comparison with the PyTorch version:

  • 800p on device superpoint (nms=4):

    picture

  • 800p on device superpoint (nms=8):

    picture5

  • 800p torch superpoint (nms=4):

    picture2

  • 400p on device superpoint (nms=4):

    picture3

  • 400p torch superpoint (nms=4):

    picture4

The color code can be seen by hovering the mouse over the loop closure ID or the odometry view:
Screenshot from 2023-06-20 21-16-06

@matlabbe matlabbe merged commit d901fb2 into introlab:master Jun 21, 2023
@matlabbe
Member

Results at 400p are very promising, thanks again for the PR!

@borongyuan
Contributor Author

The fundamental limitation is that it runs at 320×200 resolution on the device; otherwise the frame rate would not be sufficient for VO. Even if we can locate keypoints on the original image by resizing, the resolution of coarse_desc is only 40×25, so the performance is definitely not comparable to inference on raw images. I'm currently treating this as an experimental feature, as it's really interesting to implement, but it's also possible to improve it later. Besides waiting for the next generation of hardware, we can also try to simplify the model. A common rule of thumb is that for VGG-like models, MobileNetV1 can be used instead, and for ResNet-like models, MobileNetV2. I saw someone tried V2 before and it didn't work well; maybe they should try V1. Another possibility is that we only use SuperPoint for loop closure detection. Then it can run on device with higher resolution input, since it doesn't need to process every frame. I'm not sure if this is convenient to implement.
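The 40×25 figure follows from SuperPoint's VGG-style encoder, which downsamples by a factor of 8 (three 2×2 pools), so each coarse_desc cell covers an 8×8 pixel patch of the input:

```cpp
#include <cassert>

// SuperPoint's encoder has stride 8, so the coarse descriptor grid is the
// input resolution divided by 8 in each dimension:
// 320x200 input -> 40x25 coarse_desc.
constexpr int kSuperPointStride = 8;
constexpr int coarseCells(int pixels) { return pixels / kSuperPointStride; }
```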

For NMS(), I modified it slightly when I moved it to util2d. It seems that once the keypoints' responses are set, there is no need to pass the conf Mat as input. I don't know if that is related to the last few issues. Correct me if I am wrong.

@borongyuan borongyuan deleted the depthai-superpoint branch June 21, 2023 06:35
@matlabbe
Member

Another possibility is that we only use SuperPoint for loop closure detection. Then it can run on device with higher resolution input, since it doesn’t need to process every frame. I'm not sure if this is convenient to implement.

For standalone use, it may not be useful unless we combine it with an external VO/VIO like the T265, as rtabmap's VO needs a high framerate. I see it being more useful on ROS, on a robot using wheel/lidar odometry; then low-framerate OAK images with SuperPoint could be used for loop closure detection.

For NMS, I checked and the issue was there before your PR, so I just modified it after merging your PR. The confidence matrix is used to get the confidence of other keypoints within a fixed radius of the current keypoint being checked. This was code taken from this project; it could indeed be optimized with a different approach (like a nearest-neighbor index).

@borongyuan
Contributor Author

For standalone use, it may not be useful unless we combine it with an external VO/VIO like the T265, as rtabmap's VO needs a high framerate. I see it being more useful on ROS, on a robot using wheel/lidar odometry; then low-framerate OAK images with SuperPoint could be used for loop closure detection.

I mean using other features for VO, with SuperPoint only used for loop closure. It should still be possible, right? You said before that odometry won't re-detect keypoints when setFeatures() is used, but now we want it to re-detect.

For NMS, I checked and the issue was there before your PR, so I just modified it after merging your PR. The confidence matrix is used to get the confidence of other keypoints within a fixed radius of the current keypoint being checked. This was code taken from this project; it could indeed be optimized with a different approach (like a nearest-neighbor index).

There is a clever simple_nms() implementation in SuperGlue, which acts directly on the heatmap. This issue discusses both implementations. But I can't use that implementation because OpenCV doesn't expose max pooling as a standalone function.
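For reference, the idea behind SuperGlue's simple_nms() is that a score survives only if it equals the maximum of its neighborhood; in PyTorch that is a single max_pool2d call. A hedged stdlib-only sketch of one suppression pass, written out by hand since OpenCV has no standalone max pooling (function name and layout are mine):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Heatmap-level NMS in the spirit of simple_nms(): keep a score only if it
// is the maximum within its (2r+1)x(2r+1) window, zero it otherwise.
// The map is row-major, w columns by h rows.
std::vector<float> simpleNms(const std::vector<float>& heat,
                             int w, int h, int r)
{
    std::vector<float> out(heat.size(), 0.0f);
    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            float v = heat[y * w + x];
            float m = v;
            for (int dy = -r; dy <= r; ++dy)
                for (int dx = -r; dx <= r; ++dx) {
                    int xi = x + dx, yi = y + dy;
                    if (xi >= 0 && yi >= 0 && xi < w && yi < h)
                        m = std::max(m, heat[yi * w + xi]);
                }
            if (v == m)
                out[y * w + x] = v; // local maximum survives
        }
    }
    return out;
}
```

The PyTorch version also iterates this a couple of times to resolve plateaus of tied scores; this sketch shows a single pass.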

@matlabbe
Member

I expected that the image framerate would be lower when SuperPoint features are extracted, but if the image frame rate is 30 Hz while SuperPoint runs at 5 Hz, we could send images at 30 Hz to the VO thread (extracting other features) and send SuperPoint features to rtabmap at 5 Hz. The issue is synchronizing odometry with SuperPoint features that arrive with latency on the other thread. At that point, I would use rtabmap_ros to redirect and synchronize the data.

For SuperGlue's NMS, I guess the PyTorch version could be converted to C++ Torch (libtorch).
