[DPT] Add MiDaS 3.1 series by NielsRogge · Pull Request #25799 · huggingface/transformers

NielsRogge · 2023-08-28T12:45:38Z

What does this PR do?

This PR improves the DPT model by leveraging the AutoBackbone API.

DPT is a depth estimation model. The MiDaS team recently released version 3.1 with various backbones (BEiT, Swinv2, etc.), which makes it an ideal use case for the AutoBackbone class.

This PR:

adds the BeitBackbone class
adds the Swinv2Backbone class
extends modeling_dpt.py to leverage the AutoBackbone API
fixes the keep_aspect_ratio and ensure_multiple_of flags of DPTImageProcessor, which do not work on main because they are not passed to the resize method.
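
As a rough illustration of what the two flags in the last item are meant to control, here is a minimal sketch of an output-size computation. The function name `compute_output_size` and its signature are hypothetical and are not the DPTImageProcessor API. The sketch assumes that keep_aspect_ratio applies a single scale factor (the one closest to 1) to both axes and that ensure_multiple_of rounds each side to the nearest multiple.

```python
def compute_output_size(input_size, target_size, keep_aspect_ratio=False, ensure_multiple_of=1):
    """Hypothetical helper: return (height, width) to resize to.

    Assumption: with keep_aspect_ratio, the image is scaled as little as
    possible, i.e. the scale factor closest to 1 is used for both axes.
    """
    in_h, in_w = input_size
    tgt_h, tgt_w = target_size

    scale_h = tgt_h / in_h
    scale_w = tgt_w / in_w

    if keep_aspect_ratio:
        # Use one scale factor for both axes so the aspect ratio is preserved.
        if abs(1 - scale_w) < abs(1 - scale_h):
            scale_h = scale_w
        else:
            scale_w = scale_h

    def to_multiple(value):
        # Round to the nearest multiple, but never below one multiple.
        rounded = round(value / ensure_multiple_of) * ensure_multiple_of
        return max(ensure_multiple_of, rounded)

    return to_multiple(scale_h * in_h), to_multiple(scale_w * in_w)


# A 480x640 image with a 384x384 target:
plain = compute_output_size((480, 640), (384, 384))
kept = compute_output_size((480, 640), (384, 384), keep_aspect_ratio=True, ensure_multiple_of=32)
```

If the flags never reach the resize step, both calls would behave like the first one, which is the kind of bug the item above describes.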

To do:

make sure out_indices are backwards compatible for BEiT

NielsRogge · 2023-08-28T12:51:16Z

src/transformers/models/dpt/image_processing_dpt.py

            If `do_resize` is `True`, the image is resized to a size that is a multiple of this value. Can be overidden
            by `ensure_multiple_of` in `preprocess`.
-        resample (`PILImageResampling`, *optional*, defaults to `PILImageResampling.BILINEAR`):
+        resample (`PILImageResampling`, *optional*, defaults to `PILImageResampling.BICUBIC`):


This is a (slight) breaking change to make sure the same interpolation method is used as in the original implementation. However, Pillow's BICUBIC method does not 100% match the one of OpenCV :/ cc @amyeroberts

HuggingFaceDocBuilderDev · 2023-08-28T13:04:26Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

NielsRogge · 2023-09-04T13:39:26Z

I've split the PR up into smaller pieces; see above for the first one.

amyeroberts

Thanks for the work adding this!

Did a high-level review as the PR isn't in a finished state yet. At the moment, the changes to the Beit config and data2vec model can't be merged in as they're breaking. Beit is also one of our most popular models, so it's important for us to get this right.

cc @rafaelpadilla

amyeroberts · 2023-09-13T18:32:18Z

src/transformers/models/beit/test.py

To be removed

amyeroberts · 2023-09-13T18:35:15Z

src/transformers/models/beit/configuration_beit.py

        drop_path_rate=0.1,
        use_mean_pooling=True,
-        out_indices=[3, 5, 7, 11],
+        semantic_out_indices=[3, 5, 7, 11],


We can't make this change because of backwards compatibility. If someone loads in their config and they don't have the default value for out_indices then their model's behaviour will have changed. Moreover, if they try to set or change out_indices, which they might have done in their own code, this won't be correctly updated here.
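
The concern can be sketched with two toy config classes. Both classes are hypothetical stand-ins, not BeitConfig; the sketch assumes that unknown keyword arguments are stored as plain attributes, so a saved config carrying a custom out_indices loads without error but no longer drives the renamed field.

```python
DEFAULT_INDICES = [3, 5, 7, 11]


class RenamedConfig:
    """Toy config after the rename: only `semantic_out_indices` is recognised."""

    def __init__(self, semantic_out_indices=None, **kwargs):
        self.semantic_out_indices = list(semantic_out_indices or DEFAULT_INDICES)
        # Assumption: unknown keys are kept as stray attributes, not rejected.
        for key, value in kwargs.items():
            setattr(self, key, value)


class CompatibleConfig:
    """Toy config that keeps the original `out_indices` name."""

    def __init__(self, out_indices=None, **kwargs):
        self.out_indices = list(out_indices or DEFAULT_INDICES)
        for key, value in kwargs.items():
            setattr(self, key, value)


# A config saved by a user who changed the default before the rename.
saved = {"out_indices": [2, 4, 6, 8]}

renamed = RenamedConfig(**saved)
compatible = CompatibleConfig(**saved)

# The renamed field silently falls back to the default,
# while the user's value survives only as an unused attribute.
silently_reset = renamed.semantic_out_indices
preserved = compatible.out_indices
```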

amyeroberts · 2023-09-13T18:38:50Z

src/transformers/models/dpt/test_image_processor.py

amyeroberts · 2023-09-13T18:38:56Z

src/transformers/models/dpt/test.py

amyeroberts · 2023-09-13T18:41:14Z

tests/models/swinv2/test_modeling_swinv2.py

+    @unittest.skip(reason="Swinv2 does not support feedforward chunking yet")
+    def test_feed_forward_chunking(self):
+        pass


If this is being added then it must have supported it previously

amyeroberts · 2023-09-13T18:43:56Z

src/transformers/models/swinv2/test.py

amyeroberts · 2023-09-13T18:46:24Z

src/transformers/models/dpt/image_processing_dpt.py

            If `do_resize` is `True`, the image is resized to a size that is a multiple of this value. Can be overidden
            by `ensure_multiple_of` in `preprocess`.
-        resample (`PILImageResampling`, *optional*, defaults to `PILImageResampling.BILINEAR`):
+        resample (`PILImageResampling`, *optional*, defaults to `PILImageResampling.BICUBIC`):


There's never 1:1 correspondence 😢

This is OK as saved models will have the resampling filter saved in the preprocessor config, and as you say it brings it in line with the original
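
A minimal sketch of that reasoning, assuming the saved preprocessor config is a JSON object with a `resample` key. The helper name and the string values are hypothetical and chosen for readability; they are not the library's loading code or its stored format.

```python
import json

# Assumed new class default after this change.
NEW_DEFAULT_RESAMPLE = "bicubic"


def load_resample(saved_config_json):
    """Hypothetical loader: a saved value always wins over the class default."""
    saved = json.loads(saved_config_json)
    return saved.get("resample", NEW_DEFAULT_RESAMPLE)


# A checkpoint saved before the change keeps its original filter.
old_checkpoint = load_resample('{"resample": "bilinear"}')
# Only a config without the key picks up the new default.
fresh = load_resample("{}")
```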

amyeroberts · 2023-09-13T18:51:16Z

src/transformers/models/dpt/configuration_dpt.py

+        self.patch_size = None if use_autobackbone else patch_size
+        self.num_channels = None if use_autobackbone else num_channels
+        self.qkv_bias = None if use_autobackbone else qkv_bias
+        self.backbone_out_indices = None if use_autobackbone else backbone_out_indices


I see why some of these aren't set when we use AutoBackbone, but I believe that e.g. layer_norm_eps is still needed for other parts of the model.
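
One way to picture the distinction is a toy config that nulls only the backbone-specific fields from the hunk above and leaves shared fields alone. The class and its default values are hypothetical, and whether layer_norm_eps is in fact used outside the backbone is the reviewer's belief rather than something this sketch verifies.

```python
class ToyDepthConfig:
    """Hypothetical config; default values are illustrative only."""

    def __init__(
        self,
        use_autobackbone=False,
        patch_size=16,
        num_channels=3,
        qkv_bias=True,
        backbone_out_indices=(2, 5, 8, 11),
        layer_norm_eps=1e-12,
    ):
        self.use_autobackbone = use_autobackbone
        # Backbone-specific fields: owned by the backbone's own config
        # when AutoBackbone is used, so they are left unset here.
        self.patch_size = None if use_autobackbone else patch_size
        self.num_channels = None if use_autobackbone else num_channels
        self.qkv_bias = None if use_autobackbone else qkv_bias
        self.backbone_out_indices = None if use_autobackbone else list(backbone_out_indices)
        # Shared field (assumed to be read outside the backbone): always set.
        self.layer_norm_eps = layer_norm_eps


config = ToyDepthConfig(use_autobackbone=True)
```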

amyeroberts · 2023-09-13T18:55:20Z

src/transformers/models/swin2sr/modeling_swin2sr.py

-        always_partition: Optional[bool] = False,
    ) -> Tuple[torch.Tensor, torch.Tensor]:
-        if not always_partition:
-            self.set_shift_and_window_size(input_dimensions)


Is removing this backwards compatible? Previously self.set_shift_and_window_size(input_dimensions) was being called by default

github-actions · 2023-10-11T08:06:50Z

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

NielsRogge added 28 commits August 28, 2023 11:15

54cfaa7 First draft
7f6b12f More improvements
035cac2 More improvements
90f666d More improvements
d442a94 More improvements
93808da Convert more weights
235a334 Update conversion script
1700cd5 More improvements
6ace549 Fix all tests
5657d7b Add swinv2 backbone
cdc93d2 Convert more swinv2 weights
7004536 More improvements
f414f8c More improvements for Swin2
a79d908 Fix Swin2 image size
7cb6910 More improvements
d7b4db9 Fix Swinv2 tests
a21d048 Add more tests
61434d2 Fix more tests
c33e28c Fix more tests, add semantic_out_indices
b47067c Support more Swinv2 models
fe55bd4 Convert all Swinv2 checkpoints
d277770 Extend conversion script
d02eff2 Add get_backbone_hidden_size
e6ac93b Improve conversion scripts
3271684 Fix image processor
11dcb7c Fix interpolation method
7f9d5ed Fix copies
8a17e40 Fix READMEs

NielsRogge requested a review from amyeroberts September 4, 2023 09:25

NielsRogge mentioned this pull request Sep 4, 2023: Add BeitBackbone #25952 (Merged)

This was referenced Sep 9, 2023:
Dinov2 for depth estimation #26057 (Closed)
Add DINOv2 depth estimation #26092 (Merged)

amyeroberts reviewed Sep 13, 2023

github-actions bot closed this Oct 20, 2023

NielsRogge reopened this Oct 20, 2023

github-actions bot closed this Oct 29, 2023

NielsRogge reopened this Oct 30, 2023

github-actions bot closed this Nov 8, 2023

NielsRogge reopened this Nov 13, 2023

github-actions bot closed this Nov 22, 2023

NielsRogge mentioned this pull request Nov 28, 2023: Add Swinv2 backbone #27742 (Merged)
