Add patience argument to Trainer #4186
Conversation
This supersedes #2840, where I added patience to the outdated

Looking good! Can you add a reference to your original post that this closes #4894? Thanks
julien-c left a comment
Looks good! Small suggestions there
best_eval_loss = None
evals_without_improvement = 0
nit: prefix those with patience_ as they're specific to this feature
Suggested change:
- best_eval_loss = None
- evals_without_improvement = 0
+ patience_best_eval_loss = None
+ patience_evals_without_improvement = 0
+ patience_should_stop = False
logger.info(
    f"Patience threshold ({self.args.patience}) exceeded, stopping training"
)
Suggested change:
- logger.info(
-     f"Patience threshold ({self.args.patience}) exceeded, stopping training"
- )
+ patience_should_stop = True
+ logger.info(
+     f"Patience threshold ({self.args.patience}) exceeded, stopping training"
+ )
if ((self.args.max_steps > 0 and global_step > self.args.max_steps) or
        (self.args.patience > 0 and evals_without_improvement >= self.args.patience)):
Suggested change:
- if ((self.args.max_steps > 0 and global_step > self.args.max_steps) or
-         (self.args.patience > 0 and evals_without_improvement >= self.args.patience)):
+ if ((self.args.max_steps > 0 and global_step > self.args.max_steps) or
+         patience_should_stop):
  break
- if self.args.max_steps > 0 and global_step > self.args.max_steps:
+ if ((self.args.max_steps > 0 and global_step > self.args.max_steps) or
+         (self.args.patience > 0 and evals_without_improvement >= self.args.patience)):
Hello, when will this feature be merged? I would like to use it. Thank you.
There are some changes requested that @thesamuel should fix before this can be merged.
|
Bump. Early stopping is critical for an automated Trainer that reliably gives us the best model. The current way of choosing the stopping point seems to be specifying a static train_epochs, but how long a model should train depends on far too many factors (learning rate, data complexity, model, model size, optimizer, and so on) for it to be reasonable to ask the user to specify the number of epochs in advance.
|
I would like to use this early stopping on downstream training. I would also like to add a feature that stores the model each time the monitored metric improves and then optionally loads that model after training, so that later evaluation can be done on this "best" model. @thesamuel @julien-c @kevin-yauris what do you think?
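The "keep the best model" idea above could look roughly like this. This is a hypothetical sketch, not code from this PR or from transformers; the `track_best` helper and the dict-based model state are stand-ins for a real checkpointing mechanism:

```python
import copy

def track_best(state, metric, best):
    """Return the (metric, state) pair with the lowest metric seen so far.
    `best` is the previous (best_metric, best_state); lower is better.
    Hypothetical helper for illustration only."""
    best_metric, best_state = best
    if best_metric is None or metric < best_metric:
        # Snapshot the weights at the new best point.
        return (metric, copy.deepcopy(state))
    return (best_metric, best_state)

best = (None, None)
# Fake per-evaluation (model_state, eval_loss) pairs.
for state, loss in [({"w": 1}, 0.9), ({"w": 2}, 0.5), ({"w": 3}, 0.6)]:
    best = track_best(state, loss, best)
# After training, "load" the best snapshot for final evaluation.
print(best)  # → (0.5, {'w': 2})
```

In a real Trainer this would save a checkpoint to disk instead of deep-copying state in memory, but the bookkeeping is the same.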
|
I plan to work on this once I'm finished with the Funnel Transformer model @PhilipMay (so end of this week, beginning of the next).
@sgugger That would be awesome. Maybe you want to get some inspiration from the FARM training loop which is pretty nice IMO: https://github.com/deepset-ai/FARM/blob/master/farm/train.py#L262-L370
|
I just found this PR that was already merged: #7431
|
Not quite, but it makes implementing it easier.
Yes - you are right. The patience part is still missing.
|
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
|
@sgugger Should we keep this open? You wrote in this thread that you would work on this if you find the time, but I am not sure whether you plan to use another PR for that.
|
There has been a PR merged adding the
|
Thanks @cbrochtrup @sgugger! Sorry I didn't get around to this...
|
You're welcome, happy to help!
This closes #4894.
Summary
Often, we want to stop training if loss does not improve for a number of epochs. This PR adds a "patience" argument, which is a limit on the number of times we can get a non-improving eval loss before stopping training early.
It is implemented by other NLP frameworks, such as AllenNLP (see trainer.py and metric_tracker.py).
Motivation
This feature allows faster fine-tuning by breaking out of the training loop early, and saves users the toil of checking metrics on TensorBoard.
Caveats
Often, models are evaluated once per epoch, but run_lm_finetuning.py has an option to evaluate after a set number of model update steps (dictated by --logging_steps if --evaluate_during_training is true). Because of this, I've elected to tie patience to the number of evaluations without improvement in loss, rather than the number of epochs.
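For reference, the patience logic described in this PR can be sketched as a standalone snippet. This is an illustration only, not the actual Trainer code; the `EarlyStopping` class and its names are hypothetical:

```python
class EarlyStopping:
    """Sketch of patience-based early stopping: stop after `patience`
    consecutive evaluations without an improvement in eval loss.
    Illustrative only -- not the implementation from this PR."""

    def __init__(self, patience):
        self.patience = patience
        self.best_eval_loss = None
        self.evals_without_improvement = 0

    def should_stop(self, eval_loss):
        # An improvement resets the counter; otherwise it grows.
        if self.best_eval_loss is None or eval_loss < self.best_eval_loss:
            self.best_eval_loss = eval_loss
            self.evals_without_improvement = 0
        else:
            self.evals_without_improvement += 1
        # patience <= 0 disables early stopping entirely.
        return 0 < self.patience <= self.evals_without_improvement

stopper = EarlyStopping(patience=2)
# Loss improves twice, then fails to improve for two evaluations in a row.
print([stopper.should_stop(loss) for loss in [0.9, 0.7, 0.8, 0.75]])
# → [False, False, False, True]
```

The check runs once per evaluation (not per epoch), matching the caveat above about --evaluate_during_training.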