Fix issue with ratio evaluation steps and auto find batch size #25390
Conversation
The documentation is not available anymore as the PR was closed or merged.
sgugger
left a comment
Thanks for the fix! There is normally a test that checks the training arguments have not been changed. I'm guessing it didn't kick in with a float value for those ;-)
It might be worth using logic that does not change the training arguments, and relying on that test to avoid future regressions. In general the training arguments are not supposed to be modified outside of the post init, so that users are able to re-use them. So here we should store the logging_steps/eval_steps etc. once converted (in the Trainer state if needed, but I think this is all contained to one method?).
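The suggestion above can be sketched roughly as follows. This is an illustrative sketch, not the PR diff: `resolve_interval` is a hypothetical helper name, and the `TrainerState` shown is a bare stand-in for the real class. The point is that the ratio-to-absolute conversion writes only to the per-run state, leaving the user-supplied arguments untouched.

```python
# Hypothetical sketch: convert a ratio once per training run and store the
# result on the mutable Trainer state, so TrainingArguments stays re-usable.

def resolve_interval(value, max_steps):
    """Treat a float in (0, 1) as a ratio of max_steps; pass whole numbers through."""
    if value is None:
        return None
    if 0 < value < 1:
        return max(int(value * max_steps), 1)
    return int(value)

class TrainerState:
    """Per-run mutable state; safe to overwrite on every (re)run."""
    def __init__(self):
        self.logging_steps = None
        self.eval_steps = None

args_eval_steps = 0.1  # the user passed a ratio in TrainingArguments
state = TrainerState()
state.eval_steps = resolve_interval(args_eval_steps, max_steps=500)
print(state.eval_steps)  # 50; args_eval_steps is still 0.1
```

Because the arguments are never mutated, a second call with a different `max_steps` recomputes the interval correctly from the original ratio.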
Force-pushed from 609a6f0 to 4d3a697 (Compare)
```python
self.state = TrainerState()
self.state.is_hyper_param_search = trial is not None

# Compute absolute values for logging, eval, and save if given as ratio
```
If we move this to after the TrainerState is created, we can keep using the step ratios that were already computed, but store the converted values in the state instead of changing the values in the training arguments.
amyeroberts
left a comment
Thanks for fixing this!
Just to make sure I've understood correctly before approving, is this right:
- Previously, when using `auto_find_batch_size`, the evaluation step number would be wrong if one of the step ratios was < 1.
- This was because in the first `_inner_training_loop` call, the args were overridden so that they were absolute rather than relative.
- This meant that in the next `_inner_training_loop` call, the logic checking for relative values was skipped.
- The solution is to store the absolute values in the TrainerState rather than modify the training arguments.

The part I think I'm missing is why this is triggered in the `auto_find_batch_size` case.
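The failure mode summarized in the bullets above can be reproduced in miniature. This is a toy repro, not the real Trainer code: once the shared args object's ratio is overwritten with an absolute value in the first loop, a second loop with a different step count no longer sees a float < 1 and skips the re-conversion.

```python
# Toy repro of the bug pattern: mutating the shared args object in the
# first inner loop poisons every later loop that re-checks "is this a ratio?".

class Args:
    eval_steps = 0.1  # evaluate every 10% of training

def inner_loop_buggy(args, max_steps):
    if args.eval_steps < 1:
        args.eval_steps = int(args.eval_steps * max_steps)  # mutates args!
    return args.eval_steps

args = Args()
first = inner_loop_buggy(args, max_steps=100)    # 10: correct for 100 steps
second = inner_loop_buggy(args, max_steps=1000)  # still 10; should be 100
print(first, second)  # 10 10
```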
sgugger
left a comment
I think more code should be migrated to look at the state. Are those the only places we look at logging_steps and co? The state should always be filled, not just when the variables are floats.
@amyeroberts This is triggered by auto_find_batch_size since this decorator calls the training loop several times with different batch sizes. At the second run, however, the values of logging_steps and the others are wrong: they were modified in the first run, and the number of steps per epoch has changed along with the batch size.
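The retry behaviour described above is what makes `_inner_training_loop` run more than once. The real helper is `find_executable_batch_size` from the `accelerate` library; the toy stand-in below only illustrates the halve-and-retry loop, using `MemoryError` in place of a CUDA OOM.

```python
# Toy stand-in for accelerate's find_executable_batch_size: on OOM,
# halve the batch size and re-enter the training loop from scratch.

def find_executable_batch_size(func, starting_batch_size=64):
    def wrapper():
        batch_size = starting_batch_size
        while batch_size > 0:
            try:
                return func(batch_size)
            except MemoryError:
                batch_size //= 2  # halve and retry
        raise RuntimeError("No executable batch size found")
    return wrapper

attempts = []

def training_loop(batch_size):
    attempts.append(batch_size)
    if batch_size > 16:
        raise MemoryError  # simulate CUDA OOM at large batch sizes
    return batch_size

result = find_executable_batch_size(training_loop)()
print(result, attempts)  # 16 [64, 32, 16]
```

Each retry re-runs the loop body, so any per-run values derived from the step count must be recomputed there rather than cached on the arguments.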
@sgugger correct, those are the only places. (There are references in the TensorFlow class, however I'm unsure if they need the migration or not.) What other aspects of the trainer should we look for when determining if it should go into the state?
sgugger
left a comment
Thanks for iterating! We're almost there.
In terms of design for the Trainer: training arguments should be frozen after post init (exactly to prevent this kind of bug, and there were others in hyperparameter search as well), whereas the state contains the things that can change depending on the training run. Does that make sense?
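One way to make the frozen-arguments/mutable-state split explicit is sketched below. This is illustrative only: the real `TrainingArguments` is not a `frozen=True` dataclass and relies on convention plus a regression test, but the division of responsibilities is the same.

```python
# Sketch of the design contract: args are read-only after construction,
# state holds the per-run resolved values and may be freely overwritten.

from dataclasses import FrozenInstanceError, dataclass

@dataclass(frozen=True)
class FrozenArgs:
    eval_steps: float = 0.1  # user-supplied, possibly a ratio

@dataclass
class MutableState:
    eval_steps: int = 0  # per-run resolved value

args = FrozenArgs()
state = MutableState()
state.eval_steps = int(args.eval_steps * 1000)  # resolve per run -> 100

try:
    args.eval_steps = 100  # mutation is rejected at runtime
except FrozenInstanceError:
    print("args are frozen")
```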
Borked the rebase 😭 Will open a new PR
New PR opened in #25436
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.
What does this PR do?
Modifies step ratios in the case when `auto_find_batch_size` is used. Otherwise, the old absolute step value would be maintained: if a 10% ratio was resolved when training was 100 steps long, then after moving to 1000 steps it would still try to evaluate at step 10 instead of step 100.

Solves #24248
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@sgugger @amyeroberts