Fail loading pretrained weights for Dinov2ForImageClassification model  #26167

@ofirshifman

System Info

Reproduced on both:
transformers 4.32.0
transformers 4.34.0.dev0

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

I've encountered a bug with the Dinov2ForImageClassification model from Hugging Face Transformers. I followed the code example from the documentation using the latest Transformers version. However, when I run the code, the output indicates that the model is performing binary classification instead of the expected ImageNet 1000-way classification.

Here's my code:

from transformers import AutoImageProcessor, Dinov2ForImageClassification
import torch
from datasets import load_dataset

# Load a sample image dataset (in this case, 'huggingface/cats-image')
dataset = load_dataset('huggingface/cats-image')
image = dataset['test']['image'][0]

# Load the image processor and the Dinov2ForImageClassification model
image_processor = AutoImageProcessor.from_pretrained('facebook/dinov2-base')
model = Dinov2ForImageClassification.from_pretrained('facebook/dinov2-base')

# Prepare the input and obtain logits
inputs = image_processor(image, return_tensors='pt')
with torch.no_grad():
    logits = model(**inputs).logits

# The expected number of labels for ImageNet classification should be 1000
predicted_label = logits.argmax(-1).item()

Even when I specify num_labels=1000 during model initialization to correct the label dimensions, the following warning persists:

Some weights of Dinov2ForImageClassification were not initialized from the model checkpoint at facebook/dinov2-base and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
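As far as I understand it, this warning is emitted when the model defines parameters that have no matching entry in the checkpoint, so they are randomly initialized. The sketch below illustrates that comparison in plain Python. The helper `find_missing_keys` and the abbreviated key lists are hypothetical and only for illustration; they are not the library's actual loading code or the checkpoint's real contents.

```python
def find_missing_keys(model_keys, checkpoint_keys):
    """Return model parameter names that have no counterpart in the checkpoint."""
    available = set(checkpoint_keys)
    return [key for key in model_keys if key not in available]


# Hypothetical, heavily abbreviated key lists (assumption, for illustration only).
backbone_keys = [
    "dinov2.embeddings.cls_token",
    "dinov2.layernorm.weight",
    "dinov2.layernorm.bias",
]

# The classification model adds a head on top of the backbone.
model_keys = backbone_keys + ["classifier.weight", "classifier.bias"]

# Assumed here: the checkpoint stores the backbone only, with no head.
checkpoint_keys = list(backbone_keys)

missing = find_missing_keys(model_keys, checkpoint_keys)
print(missing)
```

Under these assumptions, the keys reported as missing are exactly the two classifier parameters named in the warning, which would explain why changing num_labels alone does not make the warning go away.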

As a result, I'm unable to use the pretrained Dinov2ForImageClassification model for ImageNet 1000-way classification as intended.

Expected behavior

The model should load without warnings and produce a 1000-dimensional output vector that corresponds to the correct ImageNet classification labels.

See more here:
https://discuss.huggingface.co/t/dino2-for-classification-has-wrong-number-of-labels/55027
