Bugfix for generation with an early-stopping process #32641
ojh31 wants to merge 3 commits into huggingface:main
The failing tests seem to be related to versioning issues with the CircleCI runners, so I am marking this as ready for review anyway.
Hi @ojh31 👋 Thank you for opening #32603 and this PR! I see, multi-GPU + new cache changes likely brought problems. I may request changes to this PR, but let me first sync with our @SunMarc are the current shape-related problems of FSDP + This PR fixes the issue by removing the
Hey @ojh31, usually for generation, we need to first unwrap the model. This is what is done in TRL, for example: https://github.com/huggingface/trl/blob/54f806b6ffdfa49f584340aec18d079a58a3a342/trl/trainer/online_dpo_trainer.py#L536. Can you try that and see if you still get the error? Also, did it work on past versions of transformers? @gante, we never had issues with FSDP + generate in the past. I'll investigate a bit! If we have this issue with FSDP, we should also have it with DeepSpeed.
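To make the "unwrap first" suggestion concrete, here is a minimal, library-free sketch of the pattern. It is not TRL's implementation: `ToyWrapper`, `ToyModel`, and `unwrapped_for_generation` are hypothetical names, and the sketch only assumes that the distributed wrapper exposes the wrapped model as `.module`. With real FSDP or DeepSpeed ZeRO-3, sharded parameters would typically also need to be gathered before generating, which this toy does not model.

```python
from contextlib import contextmanager


class ToyModel:
    """Hypothetical stand-in for a model with a generate() method."""

    def generate(self, prompt):
        # Pretend generation: append a single end-of-sequence token.
        return prompt + ["<eos>"]


class ToyWrapper:
    """Hypothetical stand-in for a distributed wrapper exposing `.module`."""

    def __init__(self, module):
        self.module = module


@contextmanager
def unwrapped_for_generation(model):
    """Yield the innermost model by peeling off nested `.module` attributes.

    Assumption: every wrapper layer stores the wrapped model as `.module`.
    """
    inner = model
    while hasattr(inner, "module"):
        inner = inner.module
    yield inner


# Two nested wrapper layers around the toy model.
wrapped = ToyWrapper(ToyWrapper(ToyModel()))
with unwrapped_for_generation(wrapped) as m:
    output = m.generate(["hello"])
```

The context-manager shape leaves room for setup and teardown around generation (for example gathering and re-sharding parameters) without changing the call site.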
Force-pushed: 186c93a to 112c100
Thanks for taking a look at this @SunMarc! I get the same error modifying the last block as follows: I get the same error on transformers 4.42.4 and 4.44.0. However, I just randomly tried 4.35 and that did not hit the same error!
git bisect finds that bd5091d is the breaking commit.
I am hitting the same issue with DeepSpeed ZeRO-3 on 4.44.0. For me, the mismatch is between the
(see this comment in the original issue to avoid parallel discussions 🤗)
With #34095, this PR should not be needed 🤗
(please reopen and ping me if the original issue is not fixed :) )
What does this PR do?
Fixes a bug where we get a shape mismatch from the forward call to a model which uses key-value caching if one of the processes finishes early during generation.
Fixes #32603
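As an illustration of this class of failure, here is a toy, library-free sketch. It is not the transformers implementation: `ToyCache`, `toy_forward`, `run_rank`, and the `keep_inputs_in_sync` flag are all hypothetical. It assumes a setup where every process must keep calling forward until the slowest one is done, and where a cached forward pass expects the attention mask to cover the cached positions plus the new token.

```python
from dataclasses import dataclass


@dataclass
class ToyCache:
    """Stand-in for a key-value cache: only tracks how many positions it holds."""

    length: int = 0


def toy_forward(cache, num_new_tokens, attention_mask_len):
    """Mimic the shape check of a cached forward pass.

    The mask must cover all cached positions plus the new tokens.
    """
    expected = cache.length + num_new_tokens
    if attention_mask_len != expected:
        raise ValueError(
            f"shape mismatch: mask covers {attention_mask_len}, expected {expected}"
        )
    cache.length = expected


def run_rank(prompt_len, own_steps, total_steps, keep_inputs_in_sync):
    """Simulate one process that finishes after `own_steps` decoding steps
    but keeps calling forward until `total_steps` so its peers can finish."""
    cache = ToyCache()
    mask_len = prompt_len
    toy_forward(cache, prompt_len, mask_len)  # prefill over the prompt
    for step in range(total_steps):
        finished = step >= own_steps
        # Hypothetical bug: a finished rank stops growing its mask,
        # while the cache still grows with every forward call.
        if not finished or keep_inputs_in_sync:
            mask_len += 1
        toy_forward(cache, 1, mask_len)
    return cache.length


# A rank that finishes early but keeps its inputs consistent runs to the end.
ok_length = run_rank(prompt_len=4, own_steps=2, total_steps=5, keep_inputs_in_sync=True)

# The same rank fails on its first extra forward call once the mask stops growing.
try:
    run_rank(prompt_len=4, own_steps=2, total_steps=5, keep_inputs_in_sync=False)
    mismatch = None
except ValueError as err:
    mismatch = str(err)
```

In this toy, the rank that finishes last never fails; only a rank that finishes early and then lets its inputs drift from its cache hits the error, which matches the "one of the processes finishes early" condition described above.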
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.