
Add early stopping callback to pytorch trainer#8581

Merged
sgugger merged 20 commits into huggingface:master from cbrochtrup:early-stopping-patience
Nov 23, 2020

Conversation


@cbrochtrup cbrochtrup commented Nov 17, 2020

Summary

Address the PyTorch half of #4894 by adding early stopping patience and a minimum threshold by which metrics must improve to prevent early stopping. I piggybacked heavily off of #7431 since the two functions are very similar.

Since #4186 seems to be abandoned and behind master, I figured I'd take a crack at this.
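For illustration, the patience-plus-threshold behavior I have in mind looks roughly like this standalone sketch (hypothetical names, independent of the Trainer API; assumes higher metric values are better):

```python
from typing import Optional


class EarlyStopper:
    """Toy sketch of early stopping with patience and a minimum
    improvement threshold."""

    def __init__(self, patience: int = 1, threshold: float = 0.0):
        self.patience = patience
        self.threshold = threshold
        self.best_metric: Optional[float] = None
        self.counter = 0

    def step(self, metric: float) -> bool:
        """Record a new eval metric; return True if training should stop."""
        if self.best_metric is None or metric > self.best_metric + self.threshold:
            # Improved by more than the threshold: record it and reset patience.
            self.best_metric = metric
            self.counter = 0
        else:
            # No meaningful improvement: burn one unit of patience.
            self.counter += 1
        return self.counter >= self.patience


stopper = EarlyStopper(patience=2, threshold=0.01)
history = [0.70, 0.75, 0.751, 0.752]  # improvements after 0.75 are below threshold
stops = [stopper.step(m) for m in history]  # [False, False, False, True]
```

Note the threshold means tiny improvements (here 0.001) still count against patience, which is the point of the minimum-improvement option.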

Who can review?

Anyone! But @julien-c and @sgugger seem the most appropriate.

sgugger (Collaborator) commented Nov 17, 2020

Hi there. Thanks for your PR! When I designed the callbacks, the intent was for them to be small, independent pieces of code. I would prefer if early stopping had its own callback that the user could then choose to add or not. Do you think you could amend your PR in that direction?

cbrochtrup (Author) commented Nov 17, 2020

Hello, thank you for your feedback! I will amend the PR in that direction.

Could you clarify which pieces of early stopping should be in TrainerState and which should be in the callback? I'm grappling with the similarities between best_model_checkpoint and early stopping attributes.

class EarlyStoppingCallback(TrainerCallback):
    best_metric: Optional[float] = None  # maybe not this
    best_model_checkpoint: Optional[str] = None  # maybe not this either
    early_stopping_patience: Optional[int] = None
    early_stopping_patience_counter: int = 0

    def on_evaluate(self, args, state, control, **kwargs):
        # Keep track of patience
        # End training via early stopping
        if (
            self.early_stopping_patience is not None
            and self.early_stopping_patience_counter >= self.early_stopping_patience
        ):
            control.should_training_stop = True

cbrochtrup (Author) commented

Or do you mean I just move the if statement I added to its own callback and keep TrainerState as is?

sgugger (Collaborator) commented Nov 17, 2020

The TrainerState shouldn't change, so the callback you are writing above sounds fine, without the arguments marked with # maybe not this, which should already be in the TrainerState, I think.
Does that sound right to you?

cbrochtrup (Author) commented

That makes sense. I think this block of code (to line 933) could be a callback because it's all about the best metric. Then users could customize the best model calculations. Is that desirable?

If you think that's out of scope I'll keep the early stopping callback simple and separate from the best metric calculation.

sgugger (Collaborator) commented Nov 17, 2020

I had put it in Trainer because I thought multiple callbacks could need it and it's used by load_best_model_at_end which is kind of a core feature.

cbrochtrup (Author) commented

Sounds good, you know best! I'll keep load_best_model_at_end in the Trainer and push up an early stopping callback sometime this week.

@cbrochtrup cbrochtrup changed the title Add early stopping patience to pytorch trainer Add early stopping callback to pytorch trainer Nov 19, 2020
sgugger (Collaborator) left a review


A few more things to change, but we're close to getting this into a good state. Thanks a lot for your work on this!

metric_value = metrics.get(metric_to_check)

if metric_value is None:
    logger.warning(
sgugger (Collaborator):
Good warning!

self.early_stopping_patience_counter += 1

def on_train_begin(self, args, state, control, **kwargs):
    assert args.load_best_model_at_end, "EarlyStoppingCallback requires load_best_model_at_end = True"
sgugger (Collaborator):

I still don't understand why this line is necessary. I feel we should be able to use this callback without the load_best_model_at_end option. The other sanity checks are perfectly OK.

cbrochtrup (Author):

This is necessary because we require control.should_save=True for _save_checkpoint to update the best metric. Should I move the best metric calculation into its own function and place it in the should_evaluate block?

cbrochtrup (Author) commented Nov 20, 2020

I agree that needing load_best_model_at_end is not fully intuitive, but it makes sense to me: if we don't load the best model, early stopping will still halt training, but the model we get back from training will not be the one early stopping deemed best.
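To make that concrete, here is a toy, self-contained simulation (hypothetical names, not Trainer code): without reloading the best checkpoint at the end, an early-stopped run hands back the weights from the last step, not the best one.

```python
def train(metrics, patience, load_best_at_end):
    """Simulate a training loop that early-stops once the eval metric has
    failed to improve `patience` times. Returns the step whose weights the
    caller would end up with."""
    best_step, best_metric, counter = 0, float("-inf"), 0
    last_step = 0
    for step, metric in enumerate(metrics):
        last_step = step
        if metric > best_metric:
            best_step, best_metric, counter = step, metric, 0
        else:
            counter += 1
        if counter >= patience:
            break  # early stop fires here
    # Only reloading the best checkpoint gives back the best weights.
    return best_step if load_best_at_end else last_step


eval_metrics = [0.60, 0.72, 0.71, 0.70]  # peaks at step 1, then degrades
with_reload = train(eval_metrics, patience=2, load_best_at_end=True)    # step 1
without_reload = train(eval_metrics, patience=2, load_best_at_end=False)  # step 3
```

By construction the run stops only after the metric has already degraded for `patience` evaluations, so the final weights are always at least `patience` steps past the best ones.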

sgugger (Collaborator):

OK, let's leave it as is for now, and we will re-evaluate if some users complain!


Saw this issue while debugging something. It doesn't seem intuitive how these two are related, so can we please do what @cbrochtrup suggested above?

@sgugger sgugger requested a review from LysandreJik November 20, 2020 16:13
cbrochtrup (Author) commented

Thanks for your thorough and affable review!

LysandreJik (Member) left a review


Great addition, LGTM!

@sgugger sgugger merged commit 8ffc01a into huggingface:master Nov 23, 2020
@cbrochtrup cbrochtrup deleted the early-stopping-patience branch November 23, 2020 23:04

5 participants