The new recommended way to use flash attention is through kernels. We should update our tests and documentation to use kernels instead of "flash_attention_2". For example:
- training_args = DPOConfig(..., padding_free=True, model_init_kwargs={"attn_implementation": "flash_attention_2"})
+ training_args = DPOConfig(..., padding_free=True, model_init_kwargs={"attn_implementation": "kernels-community/flash-attn2"})
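The rename shown in the diff above could be applied across tests and docs with a small text-rewriting helper. This is only a sketch: `migrate_attn_implementation` is a hypothetical name, not part of TRL or transformers, and it assumes the old value always appears as a quoted string right after `attn_implementation`.

```python
import re

# Target value taken from the diff above.
NEW_IMPL = "kernels-community/flash-attn2"

# Matches both the dict form ("attn_implementation": "flash_attention_2")
# and the keyword form (attn_implementation="flash_attention_2").
_PATTERN = re.compile(
    r"""(attn_implementation["']?\s*[:=]\s*)(["'])flash_attention_2\2"""
)


def migrate_attn_implementation(text: str) -> str:
    """Hypothetical helper: swap the old attention string for the kernels one."""
    # Keep the original prefix and quote style; replace only the value.
    return _PATTERN.sub(
        lambda m: f"{m.group(1)}{m.group(2)}{NEW_IMPL}{m.group(2)}", text
    )


old_line = (
    "training_args = DPOConfig(..., padding_free=True, "
    'model_init_kwargs={"attn_implementation": "flash_attention_2"})'
)
print(migrate_attn_implementation(old_line))
```

Other values such as `"sdpa"` are left untouched, so the helper can be run over whole files.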
Reference: trl/docs/source/reducing_memory_usage.md, line 149 in 1eb561c