Description
System Info
Reproduced on both:
- transformers 4.32.0
- transformers 4.34.0.dev0
Who can help?
No response
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
I've encountered what looks like a bug in the Dinov2ForImageClassification model from Hugging Face Transformers. I followed the code example from the documentation here, using the latest Transformers version. However, when I run the code, the output indicates that the model is performing binary classification instead of the expected ImageNet 1000-way classification.
Here's my code:
from transformers import AutoImageProcessor, Dinov2ForImageClassification
import torch
from datasets import load_dataset
# Load a sample image dataset (in this case, 'huggingface/cats-image')
dataset = load_dataset('huggingface/cats-image')
image = dataset['test']['image'][0]
# Load the image processor and the Dinov2ForImageClassification model
image_processor = AutoImageProcessor.from_pretrained('facebook/dinov2-base')
model = Dinov2ForImageClassification.from_pretrained('facebook/dinov2-base')
# Prepare the input and obtain logits
inputs = image_processor(image, return_tensors='pt')
with torch.no_grad():
    logits = model(**inputs).logits
# The expected number of labels for ImageNet classification should be 1000
predicted_label = logits.argmax(-1).item()
Whether or not I specify num_labels=1000 during model initialization to correct the label dimensions, the following warning still appears:
Some weights of Dinov2ForImageClassification were not initialized from the model checkpoint at facebook/dinov2-base and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
The issue persists, and I'm unable to utilize the pretrained Dinov2ForImageClassification model for ImageNet 1000-way classification as intended.
Expected behavior
The model loads without this warning and returns a 1000-dimensional output vector whose entries correspond to the correct ImageNet classification labels.
See more here:
https://discuss.huggingface.co/t/dino2-for-classification-has-wrong-number-of-labels/55027