Skip to content

[core] fix silent bug keep_in_fp32 modules#26589

Merged
younesbelkada merged 6 commits intohuggingface:mainfrom
younesbelkada:fix-final-instructblip
Oct 5, 2023
Merged

[core] fix silent bug keep_in_fp32 modules#26589
younesbelkada merged 6 commits intohuggingface:mainfrom
younesbelkada:fix-final-instructblip

Conversation

@younesbelkada
Copy link
Contributor

@younesbelkada younesbelkada commented Oct 4, 2023

What does this PR do?

Same PR as #26484 but without any extra diff

Before this PR we were performing a simple check if module_name in key but that lead to some modules silently converted in fp32.

For example instructblip models got their word_embedding layers converted in fp32 because _keep_in_fp32_modules includes "wo" which is contained in the string word_embedding. The fix is to check if module_name in key.split(".")

I can confirm with this PR the failing instructblip tests now pass

@HuggingFaceDocBuilderDev
Copy link

HuggingFaceDocBuilderDev commented Oct 4, 2023

The documentation is not available anymore as the PR was closed or merged.

@younesbelkada younesbelkada marked this pull request as ready for review October 4, 2023 14:45
@younesbelkada
Copy link
Contributor Author

Before merging this PR I want to test it on T5 models and 8bit tests as this might affect them

model = InstructBlipForConditionalGeneration.from_pretrained(
"Salesforce/instructblip-flan-t5-xl",
torch_dtype=torch.bfloat16,
low_cpu_mem_usage=True,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just to make the loading of the model faster

@younesbelkada
Copy link
Contributor Author

Relevant T5 and bnb tests are passing, this PR is ready for review!

@younesbelkada
Copy link
Contributor Author

... and tested the failing instructblip tests on the latest docker image and they pass with these changes

@LysandreJik
Copy link
Member

Thanks for the PR! Do we have a common test that could have failed in this specific instance? If not, would it be possible to work on one?

I'm a bit afraid of the repercussions of such a change without a test that ensures the modules that should be kept in fp32 actually are and that those that shouldn't are kept in their original dtype. It is fixing a silent error but also seems like it could break some silent successes

@younesbelkada
Copy link
Contributor Author

Hi @LysandreJik - OK makes sense, I am happy to work on a common test for that - I'll ping you once this is done

@younesbelkada
Copy link
Contributor Author

Tests are passing, this is ready for another review!

Copy link
Member

@LysandreJik LysandreJik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Thanks @younesbelkada

@younesbelkada younesbelkada merged commit e6d250e into huggingface:main Oct 5, 2023
@younesbelkada younesbelkada deleted the fix-final-instructblip branch October 5, 2023 12:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants