
Fix issue with ratio evaluation steps and auto find batch size #25390

Closed
muellerzr wants to merge 35 commits into main from muellerzr-ratio

Conversation

@muellerzr
Contributor

What does this PR do?

Modifies how step ratios are resolved when auto_find_batch_size is used. Previously the ratio was converted to an absolute step count once and then kept, so if evaluation was set to 10% and training started at 100 total steps, a retry at 1000 total steps would still try to evaluate at step 10 instead of step 100.

Solves #24248
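The arithmetic behind the bug can be sketched as follows (a hypothetical illustration, not the actual Trainer code; `resolve_eval_steps` is an invented helper):

```python
def resolve_eval_steps(eval_steps, max_steps):
    """Convert a ratio (< 1) to an absolute step count; pass ints through."""
    if eval_steps < 1:
        return max(1, int(eval_steps * max_steps))
    return int(eval_steps)

args_eval_steps = 0.1  # evaluate every 10% of training

# First attempt: large batch size -> 100 total steps.
first = resolve_eval_steps(args_eval_steps, max_steps=100)   # 10

# Buggy behavior: the resolved value overwrote the argument, so when the
# batch size was lowered (now 1000 total steps) the ratio check was
# skipped and evaluation stayed pinned at step 10.
buggy = resolve_eval_steps(first, max_steps=1000)            # still 10

# Correct behavior: keep the ratio intact and re-resolve on every run.
fixed = resolve_eval_steps(args_eval_steps, max_steps=1000)  # 100
```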

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@sgugger @amyeroberts

@HuggingFaceDocBuilderDev

HuggingFaceDocBuilderDev commented Aug 8, 2023

The documentation is not available anymore as the PR was closed or merged.

Collaborator

@sgugger sgugger left a comment


Thanks for the fix! There is normally a test that checks the training arguments have not been changed. I'm guessing it didn't kick in with a float value for those ;-)

Might be worth using logic that does not change the training arguments, and relying on that test to avoid future regressions. In general the training arguments are not supposed to be modified outside of the post-init, so that users can re-use them. So here we should store the logging_steps/eval_steps etc. once converted (in the Trainer state if needed, but I think this is all contained to one method?).
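The suggested approach can be sketched like this (illustrative, simplified stand-ins for the real transformers classes; `init_state_steps` is an invented helper name):

```python
from dataclasses import dataclass

@dataclass
class TrainerState:
    # Absolute per-run values live here, never back on the arguments.
    logging_steps: int = 0
    eval_steps: int = 0
    save_steps: int = 0

def init_state_steps(state, logging_steps, eval_steps, save_steps, max_steps):
    def resolve(value):
        # Ratios (< 1) become absolute counts; ints pass through unchanged.
        return max(1, int(value * max_steps)) if value < 1 else int(value)
    state.logging_steps = resolve(logging_steps)
    state.eval_steps = resolve(eval_steps)
    state.save_steps = resolve(save_steps)

state = TrainerState()
init_state_steps(state, logging_steps=0.5, eval_steps=0.1, save_steps=500,
                 max_steps=1000)
# state.eval_steps == 100; the original float arguments are never mutated,
# so a second call with a different max_steps resolves correctly again.
```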

(A series of unrelated commits pulled into the branch by the rebase: the Korean translation of docs/source/ko/add_new_model.md, with reviewer edits co-authored by Jungnerd, SeongWooChoi, Wonhyeong Seo, and 이서정.)
Inline comment on this code in the Trainer:

    self.state = TrainerState()
    self.state.is_hyper_param_search = trial is not None

    # Compute absolute values for logging, eval, and save if given as ratio
Contributor Author


If we move this to after the TrainerState is created, we can maintain and use the _step_ratios that were already computed, but keep them in the state instead of changing the values in the training arguments.

hyeonseo2 and others added 3 commits August 9, 2023 18:27
(More unrelated rebase commits: the Korean translation of model_summary.md, co-authored by Sohyun Sim and Wonhyeong Seo.)
(Further unrelated rebase commits: bark generation config updates, and aligning sample_beam specs with beam_search in test_utils.py, co-authored by Shogo Fujita.)
Contributor

@amyeroberts amyeroberts left a comment


Thanks for fixing this!

Just to make sure I've understood correctly before approving - is this right:

  • Previously, when using auto_find_batch_size, the evaluation step number would be wrong if one of the step ratios was < 1
  • This was because in the first _inner_training_loop call, the args were overridden so that they were absolute rather than relative
  • This meant that in the next call to _inner_training_loop, the logic checking for relative values was skipped
  • The solution is to store the absolute values in the TrainerState rather than modify the training arguments

The part I think I'm missing is why this is triggered in the auto_find_batch_size case.

Collaborator

@sgugger sgugger left a comment


I think more code should be migrated to look at the state. Are those the only places we look at logging_steps and co? The state should always be filled, not just when the variables are floats.

@amyeroberts This is triggered by auto_find_batch_size since that decorator calls the training loop several times with different batch sizes. Except that on the second run, the values of logging_steps and the others are wrong, since they were modified in the first run and the number of steps per epoch has changed with the batch size change.
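The retry mechanism described above can be sketched roughly like this (modeled loosely on accelerate's find_executable_batch_size decorator, but heavily simplified: a real run catches CUDA out-of-memory errors, not MemoryError, and the threshold below is invented for the demo):

```python
import functools

def find_executable_batch_size(function, starting_batch_size=64):
    batch_size = starting_batch_size

    @functools.wraps(function)
    def wrapper():
        nonlocal batch_size
        while batch_size > 0:
            try:
                return function(batch_size)
            except MemoryError:
                batch_size //= 2  # re-run the whole loop with a smaller batch
        raise RuntimeError("No executable batch size found")

    return wrapper

calls = []

def inner_training_loop(batch_size):
    calls.append(batch_size)
    if batch_size > 16:  # pretend anything above 16 runs out of memory
        raise MemoryError
    return batch_size

result = find_executable_batch_size(inner_training_loop)()
# calls == [64, 32, 16]: the loop body runs three times, so any argument it
# mutated on the first pass (like a resolved eval_steps) leaks into later runs
```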

@muellerzr
Contributor Author

@sgugger correct, those are the only places. (There are references in the TensorFlow class, however I'm unsure whether they need the migration or not.)

What other aspects of the trainer should we look for when determining if it should go into the state?

Collaborator

@sgugger sgugger left a comment


Thanks for iterating! We're almost there.

In terms of design for the Trainer: training arguments should be frozen after the post-init (exactly because of this kind of bug; there were others in hyperparameter search as well), whereas the state contains the things that can change depending on the training run. Does that make sense?
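The frozen-arguments/mutable-state split can be illustrated with a minimal sketch (hypothetical stand-ins; the real TrainingArguments is not a frozen dataclass, but the invariant is the same):

```python
from dataclasses import dataclass, FrozenInstanceError

@dataclass(frozen=True)
class TrainingArguments:
    eval_steps: float = 0.1  # may be a ratio; never mutated after init

@dataclass
class TrainerState:
    eval_steps: int = 0      # absolute value, recomputed each training run

args = TrainingArguments()
state = TrainerState()
state.eval_steps = int(args.eval_steps * 1000)  # resolved per run

rejected = False
try:
    args.eval_steps = 100    # frozen dataclass rejects the mutation
except FrozenInstanceError:
    rejected = True
```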

NielsRogge and others added 2 commits August 10, 2023 09:13
(Unrelated rebase commits: the Korean translation of philosophy.md.)
sgugger and others added 6 commits August 10, 2023 10:53
(Unrelated rebase commits: documentation for check_dummies and check inits, strict generation config save with tests, and a WavLM Arxiv link/authors fix.)
@muellerzr
Contributor Author

Borked the rebase 😭 Will open a new PR

@muellerzr
Contributor Author

New PR opened in #25436

@muellerzr muellerzr deleted the muellerzr-ratio branch August 10, 2023 13:02
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.
