
Fix CPU offload + disk offload tests #27204

Merged
LysandreJik merged 1 commit into main from fix-safetensors-default-slow-failing-tests on Nov 1, 2023

@LysandreJik (Member) commented on Nov 1, 2023

Switching to safetensors serialization by default surfaced a few issues in how we handle safetensors.

This PR fixes these issues, which are principally linked to weight sharing.
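
To illustrate the weight-sharing problem in general terms: when two parameter names refer to the same underlying tensor (tied input and output embeddings, for example), a serializer that writes one entry per name has to notice the sharing and decide which name to keep. The sketch below uses plain Python lists in place of tensors and object identity in place of shared storage. The helper `find_shared_groups` and the example keys are hypothetical and are not taken from this PR.

```python
from collections import defaultdict


def find_shared_groups(state_dict):
    """Group parameter names that refer to the very same object.

    Plain Python lists stand in for tensors here; object identity (id)
    plays the role of "same underlying storage".
    """
    by_identity = defaultdict(list)
    for name, value in state_dict.items():
        by_identity[id(value)].append(name)
    # Only groups with more than one name are actually shared.
    return [sorted(names) for names in by_identity.values() if len(names) > 1]


# Hypothetical example: the output head reuses the embedding weights.
embedding = [0.1, 0.2, 0.3]
state_dict = {
    "embed.weight": embedding,
    "lm_head.weight": embedding,  # tied: same object as embed.weight
    "layer.bias": [0.0],
}

shared = find_shared_groups(state_dict)
print(shared)
```

A real implementation would compare tensor storage rather than Python object identity, but the grouping logic has the same shape.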


LysandreJik force-pushed the fix-safetensors-default-slow-failing-tests branch from e7651c9 to fed5e54 on November 1, 2023, 12:32
LysandreJik marked this pull request as ready for review on November 1, 2023, 12:32
@LysandreJik (Member, Author)

@amyeroberts @patrickvonplaten if you feel uneasy about merging this right before the release, I'm fine with reverting the safetensors serialization by default and letting it sit on main for a while longer. The release is going to be very packed already, so that's fine by me.

@amyeroberts (Contributor) left a comment

Thanks for finding the fix so quickly!

@amyeroberts (Contributor)

@LysandreJik The change LGTM and seems to address some underlying issues. Regarding default safetensors serialization, I'm happy for it to be part of this release as long as some of the slow tests on the most popular models (bert, llama, wav2vec2, whisper, clip, etc.) pass.

        # Initialize weights and apply final processing
        self.post_init()

    def _tie_weights(self):

    new_device_map = {}
    for module, device in device_map.items():
        new_device_map.update({p: device for p in param_names if p == module or p.startswith(f"{module}.")})
        new_device_map.update(
Contributor

Nice catch!
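
For context, the device-map expansion quoted in the snippet above assigns each module-level entry to every parameter beneath that module. A minimal, self-contained sketch follows, based only on the dictionary comprehension visible in the quoted lines. The function name `expand_device_map_sketch` and the example module and parameter names are hypothetical, and the merged code may differ.

```python
def expand_device_map_sketch(device_map, param_names):
    """Expand a module-level device map to one entry per parameter.

    A parameter is assigned to a module's device when its name equals the
    module name or starts with the module name followed by a dot.
    """
    new_device_map = {}
    for module, device in device_map.items():
        new_device_map.update(
            {p: device for p in param_names if p == module or p.startswith(f"{module}.")}
        )
    return new_device_map


# Hypothetical module and parameter names, for illustration only.
device_map = {"encoder": "cpu", "decoder": "disk"}
param_names = [
    "encoder.layer.weight",
    "encoder.layer.bias",
    "decoder.weight",
    "decoder_extra.weight",
]

expanded = expand_device_map_sketch(device_map, param_names)
print(expanded)
```

The trailing dot in the prefix check keeps `decoder_extra.weight` from being matched by the `decoder` entry, so in this sketch that parameter is left out of the expanded map.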

@patrickvonplaten (Contributor) left a comment

Clean!

@LysandreJik (Member, Author)

Thanks, both, for your reviews! I'll go ahead and merge this. Sorry, Patrick, but you'll be the one dealing with the conflict 😁

LysandreJik merged commit 95020f2 into main on Nov 1, 2023
LysandreJik deleted the fix-safetensors-default-slow-failing-tests branch on November 1, 2023, 18:25
EduardoPach pushed a commit to EduardoPach/transformers that referenced this pull request Nov 19, 2023
Fix disk offload tests + weight sharing issues