Skip to content

Latest commit

 

History

History

Folders and files

NameName
Last commit message
Last commit date

parent directory

..
 
 
 
 
 
 

README.md

APPS finetuning

In this folder we show how to train an autoregressive Language model on APPS dataset, since a common way to evaluate on this benchmark is after finetuning the model on its training split. We use Hugging Face Trainer which supports distributed training on multiple GPUs.

Setup

First login to Weights & Biases

wandb login

You can finetune a model, gpt_345_python_any_license for example, by running:

# we use a global batch size of 256, here = 8 (GPUs) * 2 (batch_size_per_device) * 16 (gradient_accumulation)
python apps_train.py \
        --model_ckpt BigCode/gpt_345_python_any_license \
        --num_epochs 10 \
        --batch_size 2 \
        --gradient_accumulation_steps 16 \
        --learning_rate 5e-5 \
        --eval_freq 250 \
        --fp16

The fine-tuning takes 11h on 4 A100 GPUs.

Acknowledgments

This script is adapted from APPS repository.