Add DINOv2 depth estimation #26092
Conversation
The documentation is not available anymore as the PR was closed or merged.
amyeroberts
left a comment
Thanks for adding this!
Main comment is about backwards compatibility with config arguments. Otherwise looks good and just a few small nits.
ArthurZucker
left a comment
Thanks a mile for adding support to more models!
✅ for the changes to the image processor
✅ for the conversion scripts, if the tests are optional and in separate functions.
✅ for the changes to dino, if we make them BC
🟨 for the changes to DPT, that's a lot of changes. @amyeroberts did not seem against it, but wondering if it makes sense to have a new model? Are we stuck because DINOv2 uses it and we would have a lack of consistency?
Force-pushed from b288ab3 to 6763b36.
Force-pushed from 0278c67 to aef0d89.
amyeroberts
left a comment
Thanks for iterating!
Just a few small things to address before merging
Force-pushed from 7b09882 to 8cd7e50.
@amyeroberts thanks for your review, I've addressed all comments. Feel free to merge :)
amyeroberts
left a comment
Thanks for iterating!
A few final small comments / things to address.
General comment: I completely agree with @ArthurZucker's comments about the added complexity to the model. I'm not a fan of adding this if/else structure in the forward method. However, I don't see an easy way to change this, because of the `backbone_out_indices` argument and because the weight loading already maps to `self.dpt`.
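To make the backwards-compatibility constraint concrete, here is a toy stand-in (illustrative names only, not the actual transformers modules): because released checkpoints already map their weights to the `self.dpt` attribute, the new backbone path has to be selected with a runtime branch rather than by renaming the attribute.

```python
class DPTLikeModel:
    """Toy skeleton of the pattern under discussion; names are illustrative."""

    def __init__(self, backbone=None):
        # New, optional AutoBackbone-style feature extractor.
        self.backbone = backbone
        # Legacy encoder; existing checkpoint weights map to this attribute,
        # so it cannot be renamed without breaking weight loading.
        self.dpt = lambda pixel_values: ["legacy", pixel_values]

    def forward(self, pixel_values):
        # The criticized if/else: pick the feature source at runtime.
        if self.backbone is not None:
            return self.backbone(pixel_values)
        return self.dpt(pixel_values)


legacy_model = DPTLikeModel()
new_model = DPTLikeModel(backbone=lambda pixel_values: ["backbone", pixel_values])
print(legacy_model.forward(0))  # ['legacy', 0]
print(new_model.forward(0))     # ['backbone', 0]
```

The branch keeps old checkpoints loadable while letting new ones route through a backbone, which is exactly the trade-off the reviewers weigh here.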
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Force-pushed from c488a74 to 1b3d95c.
I've addressed all your comments, feel free to merge. To do:
amyeroberts
left a comment
Thanks for iterating!
@NielsRogge Thanks again for this addition - merged! I'll let you handle the updates to the configs on the hub.
Hello guys, thank you for adding this. I have a silly question: is there any plan for facebook to push this to the huggingface repo? Like
Hi @Starlento, all models are on the hub: https://huggingface.co/models?pipeline_tag=depth-estimation&other=dinov2&sort=trending. I'll open a PR to make this more explicit in the docs.
* First draft
* Fix style
* More improvements
* Fix tests
* Fix tests
* Convert checkpoint
* Improve DPTImageProcessor
* Remove scripts, improve conversion script
* Remove print statements
* Fix test
* Improve docstring
* More improvements
* Fix style
* Fix image processor
* Add tests
* Address comments
* Address comments
* Make bias backwards compatible
* Address comment
* Address comment
* Address comment
* Apply suggestions from code review

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Address comments
* Add flag
* Add tests
* Make tests smaller
* Use regular BackboneOutput
* Fix all tests
* Update test
* Convert more checkpoints
* Convert giant checkpoints, add integration test
* Rename size_divisibility to size_divisor

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Hey guys, I want to know if it is possible to release the training part, and if so, when it will be released?
```python
hidden_states = backbone_hidden_states
```

```python
patch_height, patch_width = None, None
if self.config.backbone_config is not None and self.config.is_hybrid is False:
```
Why calculate `patch_height` and `patch_width` (the input image size as multiples of `patch_size`) under these conditions only? If `backbone_config` is None, `self.config.patch_size` can be used instead of `self.config.backbone_config.patch_size`.
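A minimal sketch of the fallback this comment suggests, using a hypothetical `get_patch_size` helper and `SimpleNamespace` stand-ins for the config objects (not the actual transformers code):

```python
from types import SimpleNamespace


def get_patch_size(config):
    """Hypothetical helper: prefer the backbone's patch size, and fall back
    to the model config's own patch_size when no backbone config is set."""
    if config.backbone_config is not None:
        return config.backbone_config.patch_size
    return config.patch_size


# Stand-in configs mirroring the two cases discussed in the review.
with_backbone = SimpleNamespace(
    backbone_config=SimpleNamespace(patch_size=14),  # DINOv2 uses patch size 14
    patch_size=16,
)
without_backbone = SimpleNamespace(backbone_config=None, patch_size=16)

print(get_patch_size(with_backbone))     # 14
print(get_patch_size(without_backbone))  # 16
```

With a fallback like this, `patch_height` and `patch_width` could be computed in both branches instead of only when a non-hybrid backbone config is present.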
@NielsRogge
What does this PR do?
Fixes #26057
PR that implements a part of #25799. It extends the DPT framework to use the `AutoBackbone` class. Next, it uses `Dinov2Backbone` to convert the DINOv2 + DPT checkpoints released by the authors here. To do:

* `out_indices` are saved properly => done in [AutoBackbone] Add test #26094
* `DPTImageProcessor`