Skip to content

👩‍🦯 Fix usage of VLM using text only#4080

Merged
qgallouedec merged 16 commits into
huggingface:mainfrom
SamuelBarryCS:fix-vlm-for-text-only-data
Sep 23, 2025
Merged

👩‍🦯 Fix usage of VLM using text only#4080
qgallouedec merged 16 commits into
huggingface:mainfrom
SamuelBarryCS:fix-vlm-for-text-only-data

Conversation

@SamuelBarryCS

@SamuelBarryCS SamuelBarryCS commented Sep 14, 2025

Copy link
Copy Markdown
Contributor

What

  • Fixes sft_gemma3 example doesn't work #3957 by checking if there is image and adapting the call to the processor accordingly
  • Fixes VLM config tests that were failing (mismatch in between config.text_config.num_hidden_layers and layer_types was causing the tests to fail)

How to review

  • Read diff

Test performed

  • Issue reproduced with a small script (pushed, but will be deleted before merging). Before the fix, the script yielded [rank0]: KeyError: 'images' like in sft_gemma3 example doesn't work #3957 but after the fix it is working just fine
  • No major logic change so existing tests still passing

Comment thread trl/trainer/sft_trainer.py Outdated
@SamuelBarryCS SamuelBarryCS changed the title [WIP] Fix usage of VLM using text only Fix usage of VLM using text only Sep 14, 2025
@SamuelBarryCS SamuelBarryCS marked this pull request as ready for review September 14, 2025 06:22
Comment thread test_vlm_text_only_issue.py Outdated
@@ -0,0 +1,31 @@
"""Test for issue #3957 - VLM KeyError fix"""

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will be deleted before merging

@SamuelBarryCS

SamuelBarryCS commented Sep 14, 2025

Copy link
Copy Markdown
Contributor Author

cc @qgallouedec for review when you get the time.
Merci!

@sergiopaniego sergiopaniego left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix!
We could add a modified version of the script as a test in test_sft_trainer.py and remove the script

@SamuelBarryCS

SamuelBarryCS commented Sep 17, 2025

Copy link
Copy Markdown
Contributor Author

Thanks for the fix! We could add a modified version of the script as a test in test_sft_trainer.py and remove the script

Thanks for your quick reply @sergiopaniego !
Done in 6465594 :), and the test is passing fine:
image

Good to merge on my side (but it looks like we still need approval for the workflows to run)

@HuggingFaceDocBuilderDev

Copy link
Copy Markdown

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@SamuelBarryCS

Copy link
Copy Markdown
Contributor Author

@sergiopaniego do you understand why the tests are failing ? It doesn't look related to my changes ... :x
Otherwise ready to merge on my side (once this test issue is solved).

@qgallouedec

Copy link
Copy Markdown
Member

Hi, thanks for your contribution. I opted for a different approach: when the dataset does not contain images, it can be pre-processed, and there is no point in doing it on the fly.

@qgallouedec qgallouedec changed the title Fix usage of VLM using text only 👩‍🦯 Fix usage of VLM using text only Sep 23, 2025
@qgallouedec qgallouedec merged commit 9e5e60c into huggingface:main Sep 23, 2025
4 of 10 checks passed
@SamuelBarryCS

Copy link
Copy Markdown
Contributor Author

Hi, thanks for your contribution. I opted for a different approach: when the dataset does not contain images, it can be pre-processed, and there is no point in doing it on the fly.

This works as well! Thanks for the edit @qgallouedec.

qgallouedec added a commit that referenced this pull request Sep 30, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

sft_gemma3 example doesn't work

4 participants