Small addition to a maybe_add_mask() function.#2674

Closed
gilmoright wants to merge 1 commit intohuggingface:mainfrom
gilmoright:main
Conversation

@gilmoright gilmoright commented Mar 3, 2026

Some attention mechanisms in timm involve mask application. While fused attention implementations in torch and onnx such as F.scaled_dot_product_attention support both float masks (containing -inf values) and bool masks (used to build a float mask, where False corresponds to -inf), timm.layers.attention.maybe_add_mask expects only a float mask.
I made a small cosmetic change so that the mask application function timm.layers.attention.maybe_add_mask now supports both bool and float masks, to better match the torch functions.
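The idea behind the change can be sketched as follows. This is a hypothetical standalone helper (add_mask_sketch is my own name, not the timm implementation), illustrating the dispatch on mask dtype that the PR describes:

```python
from typing import Optional

import torch


def add_mask_sketch(scores: torch.Tensor, attn_mask: Optional[torch.Tensor]) -> torch.Tensor:
    """Sketch of a mask-application helper accepting bool or float masks."""
    if attn_mask is None:
        return scores
    if attn_mask.dtype == torch.bool:
        # Bool mask: False positions are disallowed, so fill them with -inf,
        # mirroring how F.scaled_dot_product_attention treats boolean masks.
        return scores.masked_fill(attn_mask.logical_not(), float("-inf"))
    # Float mask: additive (0 where allowed, -inf where disallowed).
    return scores + attn_mask
```

With this dispatch, passing the bool mask or its equivalent float mask yields identical results, which is the behavior shown in the "Outputs after commit" section.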
Script to test:

from timm.layers.attention import maybe_add_mask
import torch

print("Scores:")
scores = torch.rand(3, 4)
print(scores)

print("bool_mask")
bool_mask = torch.ones(3, 4, dtype=torch.bool).tril(diagonal=0)
print(bool_mask)

print("float_mask")
float_mask = torch.zeros_like(scores)
float_mask.masked_fill_(bool_mask.logical_not(), float("-inf"))
print(float_mask)

print("bool mask application")
print(maybe_add_mask(scores, bool_mask))
print("float mask application")
print(maybe_add_mask(scores, float_mask))

Outputs before commit:

Scores:
tensor([[0.2659, 0.2794, 0.4925, 0.3640],
        [0.8867, 0.7231, 0.6309, 0.0957],
        [0.7616, 0.1949, 0.0843, 0.0461]])
bool_mask
tensor([[ True, False, False, False],
        [ True,  True, False, False],
        [ True,  True,  True, False]])
float_mask
tensor([[0., -inf, -inf, -inf],
        [0., 0., -inf, -inf],
        [0., 0., 0., -inf]])
bool mask application
tensor([[1.2659, 0.2794, 0.4925, 0.3640],
        [1.8867, 1.7231, 0.6309, 0.0957],
        [1.7616, 1.1949, 1.0843, 0.0461]])
float mask application
tensor([[0.2659,   -inf,   -inf,   -inf],
        [0.8867, 0.7231,   -inf,   -inf],
        [0.7616, 0.1949, 0.0843,   -inf]])

Outputs after commit:

Scores:
tensor([[0.4827, 0.8551, 0.2385, 0.8508],
        [0.4199, 0.0258, 0.0059, 0.2185],
        [0.9652, 0.9934, 0.4838, 0.8479]])
bool_mask
tensor([[ True, False, False, False],
        [ True,  True, False, False],
        [ True,  True,  True, False]])
float_mask
tensor([[0., -inf, -inf, -inf],
        [0., 0., -inf, -inf],
        [0., 0., 0., -inf]])
bool mask application
tensor([[0.4827,   -inf,   -inf,   -inf],
        [0.4199, 0.0258,   -inf,   -inf],
        [0.9652, 0.9934, 0.4838,   -inf]])
float mask application
tensor([[0.4827,   -inf,   -inf,   -inf],
        [0.4199, 0.0258,   -inf,   -inf],
        [0.9652, 0.9934, 0.4838,   -inf]])

@rwightman
Collaborator

@gilmoright aware of this, I did some work on another branch that requires more consistent mask behaviour ... see #2665 ... code: https://github.com/huggingface/pytorch-image-models/blob/ssl_tasks/timm/layers/attention.py

However, it's likely not going to get merged anytime soon, not sure if I should cherry pick that in the meantime.. hmm.
