Fix issue with ratio evaluation steps and auto find batch size #25390
Conversation
The documentation is not available anymore as the PR was closed or merged.
sgugger
left a comment
Thanks for the fix! There is normally a test that checks the training arguments have not been changed. I'm guessing it didn't kick in with a float value for those ;-)
It might be worth using logic that does not change the training arguments, and relying on that test to avoid future regressions. In general the training arguments are not supposed to be modified outside of the post init, so that users are able to re-use them. So here we should store the logging_steps/eval_steps etc. once converted (in the Trainer state if needed, but I think this is all contained to one method?).
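The suggestion above can be sketched roughly as follows. This is an illustrative sketch, not the PR diff: `resolve_interval` is a hypothetical helper name, and the `TrainerState` shown is a bare stand-in for the real class. The point is that the ratio-to-absolute conversion writes only to the per-run state, leaving the user-supplied arguments untouched.

```python
# Hypothetical sketch: convert a ratio once per training run and store the
# result on the mutable Trainer state, so TrainingArguments stays re-usable.

def resolve_interval(value, max_steps):
    """Treat a float in (0, 1) as a ratio of max_steps; pass whole numbers through."""
    if value is None:
        return None
    if 0 < value < 1:
        return max(int(value * max_steps), 1)
    return int(value)

class TrainerState:
    """Per-run mutable state; safe to overwrite on every (re)run."""
    def __init__(self):
        self.logging_steps = None
        self.eval_steps = None

args_eval_steps = 0.1  # the user passed a ratio in TrainingArguments
state = TrainerState()
state.eval_steps = resolve_interval(args_eval_steps, max_steps=500)
print(state.eval_steps)  # 50; args_eval_steps is still 0.1
```

Because the arguments are never mutated, a second call with a different `max_steps` recomputes the interval correctly from the original ratio.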
Force-pushed from 609a6f0 to 4d3a697 (Compare)
```python
self.state = TrainerState()
self.state.is_hyper_param_search = trial is not None

# Compute absolute values for logging, eval, and save if given as ratio
```
If we move this to after the TrainerState is created, we can keep using the step ratios that were already computed, but store the converted values in the state instead of changing the values in the training arguments.
amyeroberts
left a comment
Thanks for fixing this!
Just to make sure I've understood correctly before approving, is this right:
- Previously, when using `auto_find_batch_size`, the evaluation step number would be wrong if one of the step ratios was < 1.
- This was because in the first `_inner_training_loop` call, the args were overridden so that they were absolute rather than relative.
- This meant that in the next `_inner_training_loop` call, the logic checking for relative values was skipped.
- The solution is to store the absolute values in the TrainerState rather than modify the training arguments.

The part I think I'm missing is why this is triggered in the `auto_find_batch_size` case.
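The failure mode summarized in the bullets above can be reproduced in miniature. This is a toy repro, not the real Trainer code: once the shared args object's ratio is overwritten with an absolute value in the first loop, a second loop with a different step count no longer sees a float < 1 and skips the re-conversion.

```python
# Toy repro of the bug pattern: mutating the shared args object in the
# first inner loop poisons every later loop that re-checks "is this a ratio?".

class Args:
    eval_steps = 0.1  # evaluate every 10% of training

def inner_loop_buggy(args, max_steps):
    if args.eval_steps < 1:
        args.eval_steps = int(args.eval_steps * max_steps)  # mutates args!
    return args.eval_steps

args = Args()
first = inner_loop_buggy(args, max_steps=100)    # 10: correct for 100 steps
second = inner_loop_buggy(args, max_steps=1000)  # still 10; should be 100
print(first, second)  # 10 10
```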
sgugger
left a comment
I think more code should be migrated to look at the state. Are those the only places we look at logging_steps and co? The state should always be filled, not just when the variables are floats.
@amyeroberts This is triggered by auto_find_batch_size since this decorator calls the training loop several times with different batch sizes. At the second run, however, the values of logging_steps and the others are wrong: they were modified in the first run, and the number of steps per epoch has changed along with the batch size.
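The retry behaviour described above is what makes `_inner_training_loop` run more than once. The real helper is `find_executable_batch_size` from the `accelerate` library; the toy stand-in below only illustrates the halve-and-retry loop, using `MemoryError` in place of a CUDA OOM.

```python
# Toy stand-in for accelerate's find_executable_batch_size: on OOM,
# halve the batch size and re-enter the training loop from scratch.

def find_executable_batch_size(func, starting_batch_size=64):
    def wrapper():
        batch_size = starting_batch_size
        while batch_size > 0:
            try:
                return func(batch_size)
            except MemoryError:
                batch_size //= 2  # halve and retry
        raise RuntimeError("No executable batch size found")
    return wrapper

attempts = []

def training_loop(batch_size):
    attempts.append(batch_size)
    if batch_size > 16:
        raise MemoryError  # simulate CUDA OOM at large batch sizes
    return batch_size

result = find_executable_batch_size(training_loop)()
print(result, attempts)  # 16 [64, 32, 16]
```

Each retry re-runs the loop body, so any per-run values derived from the step count must be recomputed there rather than cached on the arguments.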
@sgugger correct, those are the only places. (There are references in the TensorFlow class, however I'm unsure if they need the migration or not.) What other aspects of the trainer should we look for when determining if it should go into the state?
sgugger
left a comment
Thanks for iterating! We're almost there.
In terms of design for the Trainer: training arguments should be frozen after post init (exactly to prevent this kind of bug, and there were others in hyperparameter search as well), whereas the state contains the things that can change depending on the training run. Does that make sense?
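One way to make the frozen-arguments/mutable-state split explicit is sketched below. This is illustrative only: the real `TrainingArguments` is not a `frozen=True` dataclass and relies on convention plus a regression test, but the division of responsibilities is the same.

```python
# Sketch of the design contract: args are read-only after construction,
# state holds the per-run resolved values and may be freely overwritten.

from dataclasses import FrozenInstanceError, dataclass

@dataclass(frozen=True)
class FrozenArgs:
    eval_steps: float = 0.1  # user-supplied, possibly a ratio

@dataclass
class MutableState:
    eval_steps: int = 0  # per-run resolved value

args = FrozenArgs()
state = MutableState()
state.eval_steps = int(args.eval_steps * 1000)  # resolve per run -> 100

try:
    args.eval_steps = 100  # mutation is rejected at runtime
except FrozenInstanceError:
    print("args are frozen")
```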
Borked the rebase 😭 Will open a new PR
New PR opened in #25436
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.
What does this PR do?
Modifies step ratios in the case when `auto_find_batch_size` is used. Otherwise, the old absolute step value would be maintained: if a 10% ratio was resolved when training was 100 steps long, then after moving to 1000 steps it would still try to evaluate at step 10 instead of step 100.

Solves #24248
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@sgugger @amyeroberts