It makes intuitive sense to either enable this hook by default or at least let users patch their model to get this behavior. It would be great to benefit from Unsloth's fast inference implementation without needing to update the various trainers.
I've experimented with this in a modified DPO script and it works well.
https://gist.github.com/lapp0/e7d17884ed76669194c36e7fb3f64040#file-gistfile1-txt-L31-L44
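For anyone who wants the general idea without opening the gist, here is a rough sketch of what such a patch could look like. This is not the code from the gist: `patch_generate`, `to_inference`, and `to_training` are hypothetical names, and the sketch assumes the model should always go back to training mode once generation finishes.

```python
import functools
from typing import Any, Callable


def patch_generate(
    model: Any,
    to_inference: Callable[[Any], Any],
    to_training: Callable[[Any], Any],
) -> Any:
    """Wrap model.generate so every call runs in inference mode.

    to_inference / to_training are caller-supplied mode switches
    (hypothetical parameters, not part of any library API).
    """
    original_generate = model.generate

    @functools.wraps(original_generate)
    def generate_with_mode_switch(*args, **kwargs):
        # Switch to the fast inference path before generating.
        to_inference(model)
        try:
            return original_generate(*args, **kwargs)
        finally:
            # Assumption: the caller is mid-training, so always switch back,
            # even if generation raised.
            to_training(model)

    # Shadow the method on this instance only; the class is left untouched.
    model.generate = generate_with_mode_switch
    return model
```

With Unsloth, I'd expect the call to look something like `patch_generate(model, FastLanguageModel.for_inference, FastLanguageModel.for_training)`, so a trainer that calls `model.generate(...)` during training would pick up the fast path without any changes to the trainer itself.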