Non-Causal attention mask support?

I am trying to use a non-causal attention mask with a Llama model in Unsloth. I have been reading through the code to find the best way to keep all the other speedups while replacing the causal mask with a non-causal one.

I would appreciate some advice on places to look for code changes, prior to stepping through with a debugger.

I see that the attention forward still seems to accept an attention mask:
https://github.com/unslothai/unsloth/blob/main/unsloth/models/llama.py#L375

But the mask may be ignored in the `forward` function:
https://github.com/unslothai/unsloth/blob/main/unsloth/models/llama.py#L589
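
To make the question concrete, here is a minimal sketch of the difference between a causal mask and one possible non-causal mask. It is plain Python and not Unsloth code. The function names and the prefix-style layout are assumptions chosen for illustration; the mask I actually need may differ.

```python
def causal_mask(seq_len):
    """True where query position i may attend to key position j (only j <= i)."""
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]


def prefix_bidirectional_mask(seq_len, prefix_len):
    """Hypothetical non-causal mask: the first prefix_len tokens attend to
    each other in both directions, the remaining tokens stay causal."""
    return [
        [j <= i or j < prefix_len for j in range(seq_len)]
        for i in range(seq_len)
    ]


if __name__ == "__main__":
    # Print both masks side by side for a short sequence (1 = may attend).
    for name, mask in [
        ("causal", causal_mask(4)),
        ("prefix (prefix_len=2)", prefix_bidirectional_mask(4, 2)),
    ]:
        print(name)
        for row in mask:
            print(" ".join("1" if allowed else "0" for allowed in row))
```

The only difference is in the upper triangle: a causal-only kernel never looks there, so any code path that hard-codes causality would silently drop those entries.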

Non-Causal attention mask support? #696

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Uh oh!

Non-Causal attention mask support? #696

Description

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions