problem_type="single_label_classification" with num_labels=1 leads to degenerate zero loss across multiple sequence-classification models #45479

@BohdanBabii

System Info

Hi, I found what looks like a library-wide issue in transformers affecting multiple ForSequenceClassification models, not just ModernBERT.

If a model is initialized with:

num_labels=1
problem_type="single_label_classification"

the forward pass uses CrossEntropyLoss() with only one output logit. This leads to a degenerate zero loss during training/evaluation instead of performing binary classification meaningfully.

I first observed this with ModernBertForSequenceClassification, but the same logic appears in other sequence-classification models as well (for example RoBERTa and others using the same loss-selection pattern).

In modeling_modernbert.py, the relevant part is:

loss = None
if labels is not None:
    if self.config.problem_type is None:
        if self.num_labels == 1:
            self.config.problem_type = "regression"
        elif self.num_labels > 1 and (labels.dtype == torch.long or labels.dtype == torch.int):
            self.config.problem_type = "single_label_classification"
        else:
            self.config.problem_type = "multi_label_classification"

    if self.config.problem_type == "regression":
        loss_fct = MSELoss()
        if self.num_labels == 1:
            loss = loss_fct(logits.squeeze(), labels.squeeze())
        else:
            loss = loss_fct(logits, labels)
    elif self.config.problem_type == "single_label_classification":
        loss_fct = CrossEntropyLoss()
        loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))
    elif self.config.problem_type == "multi_label_classification":
        loss_fct = BCEWithLogitsLoss()
        loss = loss_fct(logits, labels)

With num_labels=1 and problem_type="single_label_classification", this becomes:

CrossEntropyLoss()(logits.view(-1, 1), labels.view(-1))

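which produces a degenerate zero loss because there is only one class dimension: the softmax over a single logit is always 1, so the log-probability of class 0 is 0 whatever the logit value, and no gradient flows back to the model. A label of 1 is not even a valid target here, since class index 1 is out of range when there is only one class.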
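The degenerate case can be checked without loading any model. The snippet below is a minimal sketch in plain PyTorch: the random tensor stands in for the classifier head output with num_labels=1, and the batch size of 4 is an arbitrary choice for illustration.

```python
import torch
from torch.nn import CrossEntropyLoss

torch.manual_seed(0)

# Stand-in for the classifier head output with num_labels=1: shape [batch, 1].
logits = torch.randn(4, 1, requires_grad=True)
# With a single class dimension, 0 is the only valid class index.
labels = torch.zeros(4, dtype=torch.long)

# Same call pattern as the single_label_classification branch.
loss = CrossEntropyLoss()(logits.view(-1, 1), labels.view(-1))
loss.backward()

print(loss.item())                      # zero, independent of the logit values
print(logits.grad.abs().max().item())   # zero as well: no training signal
```

Because the gradient is zero too, the optimizer has nothing to update from this loss, which matches the behavior seen during training.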

Why I think this is a bug:
This setup naturally suggests binary classification with labels like:

  • 0 -> class 0
  • 1 -> class 1

So from a user perspective, this looks like it should be a valid single-label binary classification setup.
Right now, however, num_labels=1 is effectively treated as if there were only one possible class in the loss computation, which makes the classification loss meaningless.

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Minimal reproduction

from transformers import AutoModelForSequenceClassification
import torch

model = AutoModelForSequenceClassification.from_pretrained(
    "answerdotai/ModernBERT-base",
    num_labels=1,
    problem_type="single_label_classification",
)

input_ids = torch.tensor([[101, 102]])
attention_mask = torch.tensor([[1, 1]])
labels = torch.tensor([0])

outputs = model(
    input_ids=input_ids,
    attention_mask=attention_mask,
    labels=labels,
)

print(outputs.logits.shape)
print(outputs.loss)

Observed result
outputs.logits has a single class dimension (shape [1, 1]) and outputs.loss is 0. The same degenerate loss also shows up during training.

Expected behavior

I would expect num_labels=1 with problem_type="single_label_classification" to support binary classification meaningfully for labels {0, 1}, instead of silently producing a degenerate zero loss.
For example, this could be implemented with a single-logit binary objective such as BCEWithLogitsLoss, or by internally mapping this configuration to an equivalent binary-classification setup.
In any case, the current behavior of silently returning zero loss seems incorrect.
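As a rough illustration of the single-logit option, here is a sketch in plain PyTorch. The helper name single_logit_binary_loss is hypothetical and not part of transformers. It also checks that the single-logit BCE loss equals a two-class cross-entropy in which the class-0 logit is fixed at 0, which is one way to read "an equivalent binary-classification setup".

```python
import torch
import torch.nn.functional as F


def single_logit_binary_loss(logits, labels):
    """Hypothetical helper: BCE-with-logits on one logit per example, labels in {0, 1}."""
    return F.binary_cross_entropy_with_logits(
        logits.view(-1), labels.view(-1).float()
    )


torch.manual_seed(0)
logits = torch.randn(4, 1, requires_grad=True)  # stand-in for a num_labels=1 head
labels = torch.tensor([0, 1, 1, 0])             # ordinary binary labels

loss = single_logit_binary_loss(logits, labels)
loss.backward()

# Equivalent two-class view: class-0 logit pinned to 0, class-1 logit is the single logit.
two_class_logits = torch.cat([torch.zeros_like(logits), logits], dim=1).detach()
ce_loss = F.cross_entropy(two_class_logits, labels)

print(loss.item(), ce_loss.item())       # the two losses agree
print(logits.grad.abs().sum().item())    # non-zero: the model receives a gradient
```

Whether the library should do this mapping internally or reject the configuration is a design decision for the maintainers; the sketch only shows that a meaningful loss exists for this setup.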

Actual behavior
The model runs, but training/eval loss becomes degenerate (0) because CrossEntropyLoss is applied to logits with shape [..., 1].
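If remapping the loss is not desirable, another option would be to fail loudly instead of running with a zero loss. The check below is only a sketch: check_classification_config is a hypothetical function, not an existing transformers API, and the error message wording is an assumption.

```python
def check_classification_config(num_labels, problem_type):
    """Hypothetical guard: reject single-label classification with fewer than 2 classes."""
    if problem_type == "single_label_classification" and num_labels < 2:
        raise ValueError(
            "problem_type='single_label_classification' needs num_labels >= 2; "
            f"got num_labels={num_labels}. For binary classification use "
            "num_labels=2, or a single-logit objective such as BCEWithLogitsLoss."
        )


# Valid configurations pass silently.
check_classification_config(2, "single_label_classification")
check_classification_config(1, "regression")

# The configuration from this issue is rejected up front.
try:
    check_classification_config(1, "single_label_classification")
except ValueError as err:
    print(f"rejected: {err}")
```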
