Llama: always convert the causal mask in the SDPA code path #29663

Merged
gante merged 2 commits into huggingface:main from gante:always_convert_mask on Mar 21, 2024

gante commented on Mar 14, 2024

What does this PR do?

Removes the `if` condition that guards applying the user-defined `attention_mask` to `causal_mask`: the condition is not required for correctness, and it prevents correct left-padding behavior in compile mode (related PR: #29374).

I did not observe any performance degradation with either the eager dynamic cache or the compiled static cache 🙌
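
To illustrate why merging the padding mask unconditionally is safe, here is a minimal sketch. It is not the code changed in this PR: it uses plain Python lists instead of tensors, and the names `build_causal_mask`, `apply_attention_mask`, and `MIN_VALUE` are hypothetical. It assumes an additive mask where `0.0` means "attend" and a very negative value means "blocked".

```python
# Hypothetical stand-in for the dtype minimum used to mark blocked positions.
MIN_VALUE = float("-inf")


def build_causal_mask(seq_len):
    """Return a seq_len x seq_len additive mask: 0.0 = attend, MIN_VALUE = blocked."""
    return [
        [0.0 if key <= query else MIN_VALUE for key in range(seq_len)]
        for query in range(seq_len)
    ]


def apply_attention_mask(causal_mask, attention_mask):
    """Merge a 1/0 padding mask (one entry per key position) into the causal mask.

    There is deliberately no guard such as "only if some entry is 0":
    the merge always runs.
    """
    return [
        [value if keep == 1 else MIN_VALUE for value, keep in zip(row, attention_mask)]
        for row in causal_mask
    ]


causal = build_causal_mask(4)

# No padding: merging an all-ones mask leaves the causal mask unchanged.
no_padding = apply_attention_mask(causal, [1, 1, 1, 1])

# Left padding: the first two key positions are blocked for every query.
left_padded = apply_attention_mask(causal, [0, 0, 1, 1])
```

With an all-ones mask the merge is a no-op, so skipping it behind a condition changes nothing for correctness; with left padding, the padded key columns end up blocked in every row.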

gante requested a review from fxmarty on March 14, 2024 20:24

fxmarty left a comment

LGTM

gante requested a review from amyeroberts on March 19, 2024 16:50
gante force-pushed the always_convert_mask branch from 7fbcb22 to 99edde5 on March 19, 2024 16:53

amyeroberts left a comment

Thanks!

gante merged commit ee38fc3 into huggingface:main on Mar 21, 2024
gante deleted the always_convert_mask branch on March 21, 2024 16:30