Currently, the SFT Trainer takes a kwarg dataset_kwargs, which accepts a key skip_prepare_dataset that skips dataset preparation. However, the existing validation raises an error when packing=True and neither a formatting function nor a dataset text field is provided, regardless of the value of dataset_kwargs["skip_prepare_dataset"] (code ref).
Given that these arguments are only used by dataset preparation, I would like to propose changing this check so that it raises only when dataset_kwargs["skip_prepare_dataset"] is missing or not truthy.
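To make the proposal concrete, here is a minimal sketch of what the relaxed check could look like. It is not the trainer's actual code: the function name validate_packing_args, its signature, and the error message are hypothetical, and only the argument names (packing, dataset_text_field, formatting_func, dataset_kwargs) are taken from the description above.

```python
from typing import Any, Callable, Optional


def validate_packing_args(
    packing: bool,
    dataset_text_field: Optional[str] = None,
    formatting_func: Optional[Callable[..., Any]] = None,
    dataset_kwargs: Optional[dict] = None,
) -> None:
    """Hypothetical stand-in for the trainer's packing validation."""
    # Treat a missing dict or a missing key as "do not skip preparation".
    skip_prepare = bool((dataset_kwargs or {}).get("skip_prepare_dataset", False))

    # The text field and formatting function are only needed when the
    # dataset is actually going to be prepared, so skip the check otherwise.
    if (
        packing
        and not skip_prepare
        and dataset_text_field is None
        and formatting_func is None
    ):
        raise ValueError(
            "packing=True requires dataset_text_field or formatting_func "
            "unless dataset_kwargs['skip_prepare_dataset'] is truthy."
        )


# Passes: preparation is skipped, so neither argument is required.
validate_packing_args(packing=True, dataset_kwargs={"skip_prepare_dataset": True})
```

With this shape, the existing behavior is unchanged for anyone who does not set skip_prepare_dataset: the error is still raised when packing=True and both arguments are missing.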