In this folder we show how to train an autoregressive Language model on APPS dataset, since a common way to evaluate on this benchmark is after finetuning the model on its training split. We use Hugging Face Trainer which supports distributed training on multiple GPUs.
First login to Weights & Biases
wandb login
You can finetune a model, gpt_345_python_any_license for example, by running:
# we use a global batch size of 256, here = 8 (GPUs) * 2 (batch_size_per_device) * 16 (gradient_accumulation)
python apps_train.py \
--model_ckpt BigCode/gpt_345_python_any_license \
--num_epochs 10 \
--batch_size 2 \
--gradient_accumulation_steps 16 \
--learning_rate 5e-5 \
--eval_freq 250 \
--fp16The fine-tuning takes 11h on 4 A100 GPUs.
This script is adapted from APPS repository.