Standardize image-text-to-text-models outputs #32471
yonigozlan wants to merge 3 commits into huggingface:main
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
@yonigozlan Chameleon can also do image-text-to-text |
Thanks! Will add it to the list |
molbap
left a comment
Thanks for working on this! Left a few comments, moving on to #32472 now.
amyeroberts
left a comment
Very nice! Looking forward to having all of the processing behaviour more standardized ❤️
Main comment is on the handling of the legacy behaviour
f25eb1d to 7074649
7074649 to aa2b417
4137b24 to 04fb918
add post_process_image_text_to_text to chameleon and cleanup
Fix legacy kwarg behavior and deprecation warning
add post_process_image_text_to_text to qwen2_vl and llava_onevision
Add post_process_image_text_to_text to idefics3, mllama, pixtral processor
04fb918 to bc5cf3c
@LysandreJik This should be ready for a final review, and should significantly reduce the LOC count and number of files changed for the image-text-to-text pipeline PR :).
cc @molbap can you do an initial review please?
Maybe not initial, but pre-final? 😁
The changes from this PR were merged in #34170
What does this PR do?
Standardize outputs for existing image-text-to-text models by adding a post_process_image_text_to_text function to their processor.
Blocking PR for the image-text-to-text pipeline.
The following models' processors need to be modified:
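For readers unfamiliar with the idea, here is a rough sketch of what a post_process_image_text_to_text method on a processor might look like. It is not the code from this PR: ToyTokenizer, ToyProcessor, the vocabulary, and the choice to decode with special tokens skipped are all assumptions made for illustration. The only name taken from the PR is post_process_image_text_to_text itself.

```python
from typing import Dict, List, Sequence


class ToyTokenizer:
    """Hypothetical stand-in for a real tokenizer: maps token ids to strings."""

    def __init__(self, vocab: Dict[int, str], special_ids: Sequence[int]):
        self.vocab = vocab
        self.special_ids = set(special_ids)

    def batch_decode(
        self, sequences: Sequence[Sequence[int]], skip_special_tokens: bool = False
    ) -> List[str]:
        decoded = []
        for ids in sequences:
            # Drop special tokens (e.g. image placeholders) when requested.
            tokens = [
                self.vocab[i]
                for i in ids
                if not (skip_special_tokens and i in self.special_ids)
            ]
            decoded.append(" ".join(tokens))
        return decoded


class ToyProcessor:
    """Hypothetical processor exposing one shared post-processing entry point."""

    def __init__(self, tokenizer: ToyTokenizer):
        self.tokenizer = tokenizer

    def post_process_image_text_to_text(
        self, generated_outputs: Sequence[Sequence[int]]
    ) -> List[str]:
        # Assumption: post-processing is a batch decode with special tokens removed.
        return self.tokenizer.batch_decode(generated_outputs, skip_special_tokens=True)


# Toy vocabulary; ids 0, 1 and 4 play the role of special tokens.
vocab = {0: "<bos>", 1: "<image>", 2: "a", 3: "cat", 4: "<eos>"}
processor = ToyProcessor(ToyTokenizer(vocab, special_ids=[0, 1, 4]))

texts = processor.post_process_image_text_to_text([[0, 1, 2, 3, 4], [0, 2, 3, 4]])
print(texts)  # ['a cat', 'a cat']
```

The point of the shared method name is that a pipeline can call the same function on any processor and get back a list of strings, without model-specific decoding logic.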
Who can review?
@molbap @amyeroberts