Dinov2 for depth estimation

### Feature request

Dinov2's original repo has an example that uses a Dinov2 backbone with a DPT head for depth estimation ([notebook link](https://github.com/facebookresearch/dinov2/blob/main/notebooks/depth_estimation.ipynb)). If we integrate it into the `transformers` repo by adding a `Dinov2ForImageDepthEstimation` class whose `forward` method returns a `DepthEstimatorOutput`, we get a unified output interface across all depth estimation models. That would make it easy to chain this depth estimation method with other models in `transformers` pipelines.




### Motivation

This would be a valuable feature for many production use cases and research problems. One example is camera angle estimation from a 2D image, where reliable depth information is critical. In my limited tests, Dinov2 with a DPT head produced noticeably better depth estimates than the existing [DPT model](https://huggingface.co/docs/transformers/main/model_doc/dpt).

### Your contribution

I can submit a PR to add this feature if other developers don't have the bandwidth for it. (I am relatively new to the `transformers` development workflow, though.)


Dinov2 for depth estimation #26057
