Add DINOv2 depth estimation #26092
Conversation
The documentation is not available anymore as the PR was closed or merged.
amyeroberts
left a comment
Thanks for adding this!
Main comment is about backwards compatibility with config arguments. Otherwise looks good and just a few small nits.
ArthurZucker
left a comment
Thanks a mile for adding support to more models!
✅ for the changes to the image processor
✅ for the conversion scripts, if the tests are optional and in separate functions.
✅ for the changes to dino, if we make them BC
🟨 for the changes to DPT, that's a lot of changes. @amyeroberts did not seem against it, but wondering if it makes sense to have a new model? Are we stuck because DINOv2 uses it and we would have a lack of consistency?
Force-pushed from b288ab3 to 6763b36.
Force-pushed from 0278c67 to aef0d89.
amyeroberts
left a comment
Thanks for iterating!
Just a few small things to address before merging
Force-pushed from 7b09882 to 8cd7e50.
@amyeroberts thanks for your review, I've addressed all comments. Feel free to merge :)
amyeroberts
left a comment
Thanks for iterating!
A few final small comments / things to address.
General comment: I completely agree with @ArthurZucker's comments about the added complexity to the model. I'm not a fan of adding this if/else structure in the forward method. However, I don't see an easy way to change this, because of the `backbone_out_indices` argument and because the weight loading already maps to `self.dpt`.
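To make the backwards-compatibility constraint concrete, here is a toy stand-in (illustrative names only, not the actual transformers modules): because released checkpoints already map their weights to the `self.dpt` attribute, the new backbone path has to be selected with a runtime branch rather than by renaming the attribute.

```python
class DPTLikeModel:
    """Toy skeleton of the pattern under discussion; names are illustrative."""

    def __init__(self, backbone=None):
        # New, optional AutoBackbone-style feature extractor.
        self.backbone = backbone
        # Legacy encoder; existing checkpoint weights map to this attribute,
        # so it cannot be renamed without breaking weight loading.
        self.dpt = lambda pixel_values: ["legacy", pixel_values]

    def forward(self, pixel_values):
        # The criticized if/else: pick the feature source at runtime.
        if self.backbone is not None:
            return self.backbone(pixel_values)
        return self.dpt(pixel_values)


legacy_model = DPTLikeModel()
new_model = DPTLikeModel(backbone=lambda pixel_values: ["backbone", pixel_values])
print(legacy_model.forward(0))  # ['legacy', 0]
print(new_model.forward(0))     # ['backbone', 0]
```

The branch keeps old checkpoints loadable while letting new ones route through a backbone, which is exactly the trade-off the reviewers weigh here.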
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Force-pushed from c488a74 to 1b3d95c.
I've addressed all your comments, feel free to merge. To do:
amyeroberts
left a comment
Thanks for iterating!
@NielsRogge Thanks again for this addition - merged! I'll let you handle the updates to the configs on the hub.
Hello guys, thank you for adding this. I have a silly question: is there any plan for facebook to push this to the huggingface repo? Like
Hi @Starlento, all models are on the hub: https://huggingface.co/models?pipeline_tag=depth-estimation&other=dinov2&sort=trending. I'll open a PR to make this more explicit in the docs.
* First draft
* Fix style
* More improvements
* Fix tests
* Fix tests
* Convert checkpoint
* Improve DPTImageProcessor
* Remove scripts, improve conversion script
* Remove print statements
* Fix test
* Improve docstring
* More improvements
* Fix style
* Fix image processor
* Add tests
* Address comments
* Address comments
* Make bias backwards compatible
* Address comment
* Address comment
* Address comment
* Apply suggestions from code review

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Address comments
* Add flag
* Add tests
* Make tests smaller
* Use regular BackboneOutput
* Fix all tests
* Update test
* Convert more checkpoints
* Convert giant checkpoints, add integration test
* Rename size_divisibility to size_divisor

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Hey guys, I want to know if it is possible to release the training part, and if so, when it will be released?
```python
hidden_states = backbone_hidden_states
```

```python
patch_height, patch_width = None, None
if self.config.backbone_config is not None and self.config.is_hybrid is False:
```
Why calculate `patch_height` and `patch_width` (the input image size as multiples of `patch_size`) under these conditions only? If `backbone_config` is None, `self.config.patch_size` can be used instead of `self.config.backbone_config.patch_size`.
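A minimal sketch of the fallback this comment suggests, using a hypothetical `get_patch_size` helper and `SimpleNamespace` stand-ins for the config objects (not the actual transformers code):

```python
from types import SimpleNamespace


def get_patch_size(config):
    """Hypothetical helper: prefer the backbone's patch size, and fall back
    to the model config's own patch_size when no backbone config is set."""
    if config.backbone_config is not None:
        return config.backbone_config.patch_size
    return config.patch_size


# Stand-in configs mirroring the two cases discussed in the review.
with_backbone = SimpleNamespace(
    backbone_config=SimpleNamespace(patch_size=14),  # DINOv2 uses patch size 14
    patch_size=16,
)
without_backbone = SimpleNamespace(backbone_config=None, patch_size=16)

print(get_patch_size(with_backbone))     # 14
print(get_patch_size(without_backbone))  # 16
```

With a fallback like this, `patch_height` and `patch_width` could be computed in both branches instead of only when a non-hybrid backbone config is present.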
@NielsRogge
What does this PR do?
Fixes #26057
PR that implements a part of #25799. It extends the DPT framework to use the `AutoBackbone` class. Next, it uses `Dinov2Backbone` to convert the DINOv2 + DPT checkpoints released by the authors here. To do:

* `out_indices` are saved properly => done in [AutoBackbone] Add test #26094
* `DPTImageProcessor`