Implementation of SuperPoint and AutoModelForInterestPointDescription #25786
sbucaille wants to merge 208 commits into huggingface:main
Conversation
- Added the SuperPointConfig
- Added the SuperPointModel and its implementation
- Added new ImagePointDescriptionOutput dataclass and added it in AutoModels
@amyeroberts I have some questions about the implementation and AutoModel classes. First of all, I tried to follow as much as possible the patterns I see in other model implementations (resnet or convnextv2 for example), but unlike these models, SuperPoint really only has one function or "mode": outputting the keypoints, their scores and descriptors. This is why I only implemented SuperPoint as a single model class, then added the corresponding AutoModel entry.
Apart from adding the AutoModelForInterestPointDescription, I couldn't find how to define such inputs and outputs. Is it a new pipeline I should define, or something else?
- Divided SuperPointModel in multiple submodules to handle hidden states in ModelOutput support (see ImagePointDescriptionOutput).
- Added mandatory information to the SuperPointPreTrainedModel class such as the main input name and the supports_gradient_checkpointing boolean.
- Added weight initialization
- Added imports to transformers.__init__.py
Hi @sbucaille, This is a bit of a special case. For other models which only perform a single task, what we normally do is just have a single model class.
@sbucaille From next week, I'll be off for a few weeks. If you have vision-specific questions, please ping @rafaelpadilla; for implementation questions, @ArthurZucker.
SuperPointModelForInterestPointDescription is more or less the exact same as SuperPointModel. I just use it for my own understanding of the transformers library, since I'm taking inspiration from other implementations (like ConvNextV2). Either SuperPointModel or SuperPointModelForInterestPointDescription might be deleted later.
In transformers, the class XXXModel does not contain the head on top, and the XXXForYYY classes add different heads. XXX represents the model name and YYY represents a task.
So is the proposed implementation correct regarding this convention? Because I can't really tell what would be considered a head in SuperPoint. Or should SuperPointModel only contain the encoder, and SuperPointForInterestPointDescription contain the keypoint_decoder and descriptor_decoder? In that case, should SuperPointModel.forward() output a BaseModelOutputWithPoolingAndNoAttention, similar to ConvNextV2?
Looking at the paper and code structure, SuperPoint is a traditional CNN-based model, and I couldn't identify what could be considered the head.
The encoder is simply a CNN which returns features (last_hidden). The two decoders, keypoint_decoder and descriptor_decoder, return keypoints+scores and descriptors, respectively. So, I think the structure that best fits this case is:
- The `SuperPointModel` should only contain the `encoder`, a `SuperPointEncoder` object. Then, `SuperPointModel.forward()` should output a `BaseModelOutputWithNoAttention` object.
- The `SuperPointForInterestPointDescription` should contain the `superpoint`, a `SuperPointModel` object, and both `keypoint_decoder` and `descriptor_decoder`. Then, `SuperPointForInterestPointDescription.forward()` should output an `ImagePointDescriptionOutput` object.
@amyeroberts, what do you think of this structure? Does it make sense?
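The split proposed above can be sketched roughly as follows. This is a minimal illustration, not the actual implementation: the layer shapes and the `SuperPointEncoder` internals are made up, and the real classes would subclass `SuperPointPreTrainedModel` and return proper `ModelOutput` dataclasses rather than dicts.

```python
import torch
from torch import nn


class SuperPointEncoder(nn.Module):
    """Stand-in for the VGG-style CNN backbone (shapes are illustrative)."""

    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(1, 8, kernel_size=3, padding=1)

    def forward(self, pixel_values):
        return self.conv(pixel_values)  # feature map, i.e. "last_hidden_state"


class SuperPointModel(nn.Module):
    """Backbone only: returns encoder features, BaseModelOutputWithNoAttention-style."""

    def __init__(self):
        super().__init__()
        self.encoder = SuperPointEncoder()

    def forward(self, pixel_values):
        return {"last_hidden_state": self.encoder(pixel_values)}


class SuperPointForInterestPointDescription(nn.Module):
    """Backbone plus the two decoder heads, ImagePointDescriptionOutput-style."""

    def __init__(self):
        super().__init__()
        self.superpoint = SuperPointModel()
        # Hypothetical 1x1-conv heads standing in for the two decoders
        self.keypoint_decoder = nn.Conv2d(8, 65, kernel_size=1)     # keypoints + scores
        self.descriptor_decoder = nn.Conv2d(8, 256, kernel_size=1)  # descriptors

    def forward(self, pixel_values):
        features = self.superpoint(pixel_values)["last_hidden_state"]
        return {
            "keypoint_logits": self.keypoint_decoder(features),
            "descriptors": self.descriptor_decoder(features),
        }
```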
In this case, I would just have SuperPointModel, which contains the encoder and decoders. In modeling_auto.py, AutoModelForInterestPointDescription should then map to loading this model.
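For reference, the mappings in modeling_auto.py follow a model-type → class-name pattern; the sketch below only mimics that pattern with a plain dict (the real file wraps such dicts in a lazy mapping, and `resolve_model_class` is a toy stand-in, not a transformers API):

```python
from collections import OrderedDict

# Simplified sketch of the modeling_auto.py convention (illustration only):
# each task mapping pairs a model type with the class the Auto class loads.
MODEL_FOR_INTEREST_POINT_DESCRIPTION_MAPPING_NAMES = OrderedDict(
    [
        # SuperPointModel holds the encoder and both decoders directly
        ("superpoint", "SuperPointModel"),
    ]
)


def resolve_model_class(model_type: str) -> str:
    """Toy stand-in for AutoModelForInterestPointDescription's lookup."""
    return MODEL_FOR_INTEREST_POINT_DESCRIPTION_MAPPING_NAMES[model_type]
```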
If I understood correctly, I need to remove every mention of SuperPointForInterestPointDescription and only keep SuperPointModel, which can be instantiated by AutoModelForInterestPointDescription? This commit reflects these changes; let me know if I misunderstood something.
… and actually added the tests
…ch the original model outputs
- Filled SuperPoint integration tests with the pretrained model with shape and value checks on the outputs of the model
Hi @ArthurZucker, I added the SuperPointImageProcessor as part of the code because SuperPoint requires a grayscale image as input. But when I added the tests, test_call_pil fails and gives me a very weird error when it reaches these lines. Not sure what causes the problem, as I tried to compare with tests made with the ConvNextImageProcessor, which does not raise any error. Anyway, I'll continue on the implementation; let me know if I'm missing anything. I'll write the documentation for all the code I've previously pushed.
Hi @sbucaille, 🙂 A quick help with that issue: I see that your processing converts all images to grayscale (here) and tests are failing here. The root cause has been discussed in PR #25767 and is not fully solved yet. A quick solution for this issue in your code may be possible. Note that a single-channel image is definitely grayscale, but if the 3 channels of an RGB image are equal (R==G and G==B), the image is also considered grayscale. So, if you replicate the channels of your 1-channel grayscale image as in here, this issue can be solved. However, SuperPoint would need to be adapted for that: you would only need to consider one of the RGB channels, as they are equal.
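The channel-replication trick described above can be sketched like this. The helper is hypothetical (not the processor's actual code), and the luma weights are the common ITU-R 601 coefficients, which may differ from what the image processor uses:

```python
import numpy as np


def convert_to_grayscale(image: np.ndarray) -> np.ndarray:
    """Hypothetical sketch: grayscale an RGB (H, W, 3) array, then replicate
    the single channel so that R == G == B. Grayscale detection then passes,
    and SuperPoint can simply read any one of the three equal channels."""
    gray = 0.299 * image[..., 0] + 0.587 * image[..., 1] + 0.114 * image[..., 2]
    return np.stack([gray, gray, gray], axis=-1)
```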
# Conflicts:
#	src/transformers/models/superpoint/modeling_superpoint.py
Hi @rafaelpadilla and @ArthurZucker. @ArthurZucker, please let me know what I'm missing in the implementation! 🙂
Hi,
Hi @sbucaille, It seems that the original code is under the MIT license. If that is the case, you just need to add the MIT license at the top of the files, as done in graphormer and IDEFICS. The checkpoints seem to be under a non-commercial customized license. So, as you have already added the license here, you just need to set the license field accordingly.
Sorry about this, but I am afraid there is nothing we can do to help in this situation (at least not from me).
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
What does this PR do?
This PR implements SuperPoint, one of the few models that generate keypoints and descriptors given an image, as discussed in this previous pull request.
The goal is to implement this model and a new type of AutoModel: `AutoModelForInterestPointDescription` (name to be discussed).

Who can review?
@amyeroberts @ArthurZucker
TODOs

- `AutoModelForInterestPointDescription` mapping