Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
@add_start_docstrings(
    "The Emu3 Text Model which consists of transformer with self attention layers.",
    EMU3_START_DOCSTRING,
)
class Emu3TextModel(Emu3PreTrainedModel):
    config_class = Emu3TextConfig
Adding LlamaModel to bases messes up the auto-generated modeling file by adding new classes like Emu3TextAttention and so on, while we have Emu3Attention
I think this is ready for review. @ArthurZucker, will you be reviewing, or is there anyone I can tag for an initial review? Btw, the repo consistency tests will fail because the modular doesn't import

You can tag @Cyrilvallez!
Cyrilvallez
left a comment
Thanks a lot, great work! With the new modular version #34487, I think we can still improve a bit! Should be merged very soon, but this is already very nice imo if you don't want to wait 🤗
ArthurZucker
left a comment
Waiting for the updates regarding @Cyrilvallez 's PR, will review again once updated
heh, is something wrong with code owners?

Yeah, it seems like it automatically tags all code owners depending on the files touched/created... @ArthurZucker, it would be nice not to tag that many people at once.
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
ArthurZucker
left a comment
Nice, thanks for iterating! My only comment is that I have not personally looked closely enough at MIMI or the VQVAE from Chameleon; you would know better, but the more standard the better!
A few nits but good to go IMO.
# autoregressively complete prompt
output = model.generate(**inputs, max_new_tokens=50)
print(processor.decode(output[0], skip_special_tokens=True))
It would be nice to have some expected outputs!
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Let's merge 🚀
* model can convert to HF and be loaded back
* nit
* works in single batch generation but hallucinates
* use the image tokens
* add image generation
* now it works
* add tests
* update
* add modulare but it doesn't work for porting docstring :(
* skip some tests
* add slow tests
* modular removed the import?
* guess this works
* update
* update
* fix copies
* fix test
* fix copies
* update
* docs
* fix tests
* last fix tests?
* pls
* repo consistency
* more style
* style
* remove file
* address comments
* tiny bits
* update after the new modular
* fix tests
* add one more cond in check attributes
* decompose down/up/mid blocks
* allow static cache generation in VLMs
* nit
* fix copies
* Update docs/source/en/model_doc/emu3.md (Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>)
* fix VAE upsampling
* Update src/transformers/models/emu3/modular_emu3.py (Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>)
* address comments
* state overwritten stuff explicitly
* fix copies
* add the flag for flex attn

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
What does this PR do?
As per the title. The code can generate text in single-batch scenarios, but the generated text doesn't match the input image. For batched generation, it seems the original implementation doesn't support it either, mostly because image features from the processor are returned with different shapes (a smart resize to preserve as much of the original image size as possible). We can try padding similar to llava-next, but I am not sure it will just work; I'll contact the authors.
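As a rough illustration of the llava-next-style padding idea mentioned above, here is a minimal sketch with made-up data; `pad_image_features` is a hypothetical helper, not part of the actual processor API:

```python
# Hypothetical sketch: pad variable-length image-feature sequences to a common
# length so they can be stacked into one batch, plus a mask marking real rows.
# Plain Python lists stand in for tensors; a real implementation would use torch.

def pad_image_features(features, pad_value=0.0):
    """features: list of [seq_len x hidden] nested lists; returns (padded, mask)."""
    max_len = max(len(f) for f in features)
    hidden = len(features[0][0])
    padded, mask = [], []
    for f in features:
        pad_rows = max_len - len(f)
        # append pad_rows rows of pad_value so every sequence has max_len rows
        padded.append(f + [[pad_value] * hidden] * pad_rows)
        # 1 marks a real feature row, 0 marks padding
        mask.append([1] * len(f) + [0] * pad_rows)
    return padded, mask

# two images whose smart-resized features have different sequence lengths
feats = [[[1.0, 2.0]], [[3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]]
padded, mask = pad_image_features(feats)
# every padded sequence now has length 3; mask[0] == [1, 0, 0]
```

Whether the padded positions can simply be masked out of attention, or need special handling in the VQ token stream, is exactly the open question above.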
TODO:
* rename `extra-0` to smth like `<image>`

And for image generation: