Closed
Labels: Generation, WIP
Description
🧹 This is a tracker for the move of `prepare_inputs_for_generation` into `GenerationMixin` 🧹
Why?
`prepare_inputs_for_generation` is not part of the core modeling code, but rather a utility for `generate` — moving it into `GenerationMixin` should greatly reduce the need to touch modeling code when `generate` changes. Fewer modeling changes mean improved model stability and a greatly reduced number of lines of code 🙏
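To make the coupling concrete, here is a simplified, framework-agnostic sketch of where `prepare_inputs_for_generation` sits in a greedy decoding loop. The names mirror the real API, but the loop and the toy cache are illustrative only, not the actual `generate` implementation:

```python
def prepare_inputs_for_generation(input_ids, past_key_values=None, **kwargs):
    # With a KV cache, only the tokens not yet cached need a forward pass.
    if past_key_values is not None:
        input_ids = input_ids[len(past_key_values):]
    return {"input_ids": input_ids, "past_key_values": past_key_values, **kwargs}


def greedy_generate(model, input_ids, max_new_tokens):
    past = None
    for _ in range(max_new_tokens):
        # The hook each model used to override: trim inputs to the
        # uncached suffix, build masks/positions, etc.
        model_inputs = prepare_inputs_for_generation(input_ids, past_key_values=past)
        logits, past = model(**model_inputs)
        # Greedy pick: the highest-scoring token id.
        next_token = max(range(len(logits)), key=logits.__getitem__)
        input_ids = input_ids + [next_token]
    return input_ids
```

Every model overriding this hook means every `generate` change can ripple into modeling files — which is exactly what this tracker removes.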
Tracker
Kinda ordered list of tasks:
1. Fix related slow tests before we start — all `llama`, `generate`, and `cache_utils` slow tests [except sink cache, broken atm] should be passing to ensure we don't break anything (Llama: make slow tests green 🟢 #33138)
2. Make `PreTrainedModel` not inherit from `GenerationMixin`, so that `can_generate()` becomes independent of whether `prepare_inputs_for_generation` is overwritten (Generation: deprecate `PreTrainedModel` inheriting from `GenerationMixin` #33203)
3. Move llama's `prepare_inputs_for_generation` to the generation mixin. This implies also moving one function that prepares the 4D mask (the one that is called there) (Generate: move llama `prepare_inputs_for_generation` to `GenerationMixin` #33677)
4. Add tests for the generalist `prepare_inputs_for_generation` — currently we don't test it directly, and we should (decoder-only LLMs: Generate: remove most decoder-only LLMs' `prepare_inputs_for_generation` #33870; encoder-decoder LLMs: Generate: move `prepare_inputs_for_generation` in encoder-decoder llms #34048)
5. Address the case of `synced_gpus` in `generate`: when `synced_gpus` is set and `cache_position` is out of bounds, take the latest available `input_ids` for dummy computations (Generate: Fix modern llm `generate` calls with `synced_gpus` #34095)
6. Delete `prepare_inputs_for_generation` from as many models as possible. There may be merge conflicts here, due to the 4D mask function. Try to iron out as many trivial cases as possible (decoder-only LLMs: #33870; encoder-decoder LLMs: #34048)
7. Change `prepare_inputs_for_generation` to forward `**kwargs` from its input to its output. With minimal changes, this should enable most VLMs to use the shared function -- they forward `pixel_values` from the input to the output (support for `**kwargs`: #33870)
8. By this point most cases of `prepare_inputs_for_generation` should have been removed 🤗 We will need to check the remaining ones individually; there may be further simplification patterns available!
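The `**kwargs` forwarding in step 7 can be sketched as follows. This is a minimal, standalone simplification — the real shared function also handles attention masks, `cache_position`, and the various cache classes:

```python
def prepare_inputs_for_generation(input_ids, past_key_values=None,
                                  attention_mask=None, **kwargs):
    # With a cache, only the last (not yet cached) token is fed forward.
    if past_key_values is not None:
        input_ids = input_ids[-1:]
    model_inputs = {
        "input_ids": input_ids,
        "past_key_values": past_key_values,
        "attention_mask": attention_mask,
    }
    # Key idea of step 7: forward any extra kwargs from input to output,
    # so e.g. a VLM's pixel_values reach the model's forward pass without
    # the model needing its own prepare_inputs_for_generation override.
    model_inputs.update(kwargs)
    return model_inputs
```

With this pattern, a VLM that only needs to pass `pixel_values` through no longer has any reason to keep a custom override.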