[Cache] rename dtype attribute 🚨 🚨 #37044
Looks like there is an issue fine-tuning Gemma3 with … It works with …
Arf 😢 @SunMarc, could you have a look?
We should no longer recommend eager, as flex and flash attention implement the proper fix.
@mark-314e I'm assuming it's not directly related to this PR, since this PR fixes a logic error that was preventing training in some circumstances. Would you be able to open a new issue with a self-contained example to reproduce it? 🤗
* yoink
* same pattern in all caches
Fixes #36938
Fixes #36814
[fine-tuning gemma3 or other models with a non-default cache]
🚨 Breaking: renaming of a public attribute in a public class.
`accelerate` sensibly detects whether a given object is a tensor or tensor-like through its type or, alternatively, through the existence of a `dtype` attribute (example). Our `StaticCache` and related objects accept `dtype` at init time and store it as an attribute under the same name. Because of this, `accelerate` may treat our caches as tensors, leading to downstream problems as in the issues above.
Since `self.dtype` is only used to initialize tensors, renaming it shouldn't be too breaking 🤞
Code for reproduction
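The detection problem described in this PR can be illustrated with a small, self-contained toy. This is neither `accelerate`'s actual code nor the reproduction script mentioned above. It assumes a simplified heuristic ("a known tensor type, or anything with a `dtype` attribute") and uses hypothetical classes: `FakeTensor` stands in for a real tensor so no torch install is needed, and `_dtype` is an assumed replacement name, not necessarily the one chosen in this PR.

```python
# Toy illustration of attribute-based ("duck-typed") tensor detection.
# All names are hypothetical; this is NOT accelerate's implementation.


class FakeTensor:
    """Stand-in for a real tensor class, so the sketch runs without torch."""

    def __init__(self, dtype: str):
        self.dtype = dtype


def looks_like_tensor(obj) -> bool:
    """Assumed heuristic: a known tensor type, OR anything exposing `dtype`."""
    return isinstance(obj, FakeTensor) or hasattr(obj, "dtype")


class CacheWithPublicDtype:
    """Hypothetical cache using the old pattern: init-time dtype kept as `self.dtype`."""

    def __init__(self, dtype: str = "bfloat16"):
        self.dtype = dtype  # public name collides with the tensor heuristic


class CacheWithRenamedDtype:
    """Same cache after a rename; `_dtype` is an assumed name, not taken from the PR."""

    def __init__(self, dtype: str = "bfloat16"):
        self._dtype = dtype  # still available when allocating the cache tensors


if __name__ == "__main__":
    print(looks_like_tensor(FakeTensor("float32")))    # True
    print(looks_like_tensor(CacheWithPublicDtype()))   # True: cache misclassified as a tensor
    print(looks_like_tensor(CacheWithRenamedDtype()))  # False
```

The rename helps because a heuristic like this only checks whether the attribute exists, so moving the value to a different name removes the false positive. The flip side is that any user code reading `cache.dtype` directly must be updated, which is why the change is flagged as breaking.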