Hey, I'm using Unsloth on 48 GB cards, where it's able to PRE-train models up to 70B with 4K context.
Is it possible to use Unsloth for SFT with instructions on which tokens should be ignored/masked in the loss, and with an attention mask for properly packing samples?
Please help with some examples if possible. I have to use Axolotl for SFT tasks for now.
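To make the ask concrete, here is a minimal framework-agnostic sketch (plain Python, not Unsloth's or Axolotl's API) of the two things meant above. It assumes each sample is a hypothetical `(prompt_ids, response_ids)` pair of token IDs, uses `-100` as the ignored-label value (the default `ignore_index` of PyTorch's `CrossEntropyLoss`, also the Hugging Face convention), and builds a dense block-diagonal causal mask so packed samples can't attend to each other. How Unsloth would consume these tensors is exactly the open question, so none of that is shown.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def pack_samples(samples):
    """Pack (prompt_ids, response_ids) pairs into one sequence.

    Returns input_ids, labels (prompt tokens masked with IGNORE_INDEX,
    unshifted, as Hugging Face causal LMs shift labels internally),
    position_ids that restart at 0 for each sample, and a dense
    block-diagonal causal attention mask.
    """
    input_ids, labels, position_ids, seq_ids = [], [], [], []
    for i, (prompt, response) in enumerate(samples):
        ids = prompt + response
        input_ids += ids
        # Only response tokens contribute to the loss.
        labels += [IGNORE_INDEX] * len(prompt) + response
        position_ids += list(range(len(ids)))
        seq_ids += [i] * len(ids)

    n = len(input_ids)
    # mask[q][k] is True when query token q may attend to key token k:
    # same packed sample and not in the future (causal).
    mask = [[seq_ids[q] == seq_ids[k] and k <= q for k in range(n)]
            for q in range(n)]
    return input_ids, labels, position_ids, mask


samples = [([1, 2], [3]), ([4], [5, 6])]  # made-up token IDs
input_ids, labels, position_ids, mask = pack_samples(samples)
print(labels)        # [-100, -100, 3, -100, 5, 6]
print(position_ids)  # [0, 1, 2, 0, 1, 2]
```

The dense n×n mask is only for illustration; packing implementations typically avoid materializing it and instead pass per-sample boundaries (e.g. via restarting `position_ids`) to a variable-length attention kernel.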