Description
Feature request
The original Dinov2 repo has an example that uses a Dinov2 backbone with a DPT head for depth estimation (notebook link). If we integrate it into the transformers repo by adding a class Dinov2ForImageDepthEstimation whose forward method returns a DepthEstimatorOutput, we'll have a unified output interface across all depth estimation models. That would make it easy to chain this powerful depth estimation method with other models in transformers pipelines.
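To illustrate why a shared output type matters, here is a minimal, library-free sketch. Everything in it is hypothetical: `DepthOutput` only stands in for something like DepthEstimatorOutput, and `ToyDPT` / `ToyDinov2Depth` are placeholder models with made-up arithmetic, not real implementations. The point is only that downstream code can stay model-agnostic when every model returns the same field.

```python
from dataclasses import dataclass
from typing import List

# An H x W image or depth map, represented as nested lists for simplicity.
Grid = List[List[float]]


@dataclass
class DepthOutput:
    """Hypothetical stand-in for a shared output type (e.g. DepthEstimatorOutput)."""
    predicted_depth: Grid


class ToyDPT:
    """Placeholder model: pretends depth equals pixel intensity."""

    def forward(self, pixel_values: Grid) -> DepthOutput:
        return DepthOutput([[float(v) for v in row] for row in pixel_values])


class ToyDinov2Depth:
    """Placeholder model: pretends depth equals inverted pixel intensity."""

    def forward(self, pixel_values: Grid) -> DepthOutput:
        return DepthOutput([[1.0 - float(v) for v in row] for row in pixel_values])


def mean_depth(model, pixel_values: Grid) -> float:
    """Downstream step that relies only on the shared `predicted_depth` field."""
    out = model.forward(pixel_values)
    flat = [d for row in out.predicted_depth for d in row]
    return sum(flat) / len(flat)


image = [[0.0, 0.25], [0.25, 0.5]]
for model in (ToyDPT(), ToyDinov2Depth()):
    # The same downstream code works for either model.
    print(type(model).__name__, mean_depth(model, image))
```

Because `mean_depth` only reads `predicted_depth`, swapping one model for the other needs no changes downstream, which is the property the proposed class would give real pipelines.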
Motivation
This would be a valuable feature for many production use cases and research problems. One example is camera angle estimation from a 2D image, where reliable depth information is critical. In my limited tests, depth estimation with a Dinov2 backbone + DPT head was much better than the existing DPT model.
Your contribution
I can submit a PR to add this feature if more experienced developers don't have the bandwidth for it. (I am relatively new to the transformers development workflow, though.)