detect if torch.distributed is available #2963
Signed-off-by: vladmandic <mandic00@live.com>
BenjaminBossan
left a comment
Flawless, thanks for the PR. The failing CI is just some GH hiccup.
@BenjaminBossan Can I ask what the plan is for a new PEFT release? The last release was Nov 13, quite some time ago.
I didn't know this was a common use case, that's good to know. We don't have enough for a full release at the moment, but I could see us doing a patch release if it helps.
A patch release is fine. It's just that I have a pretty big install base and this is currently the most common open issue, and I can't point users to install from main every time. FYI, why is this so common? The TheRock build of torch-rocm-for-windows doesn't have distributed support, and other than that it's by far the best way to run anything torch-based if you have an AMD GPU on Windows. Sure, that's not as common as NVIDIA, but it's still a really big community. And the Z-Image model is currently the most popular new model, so each new user tries that.
Great, we'll aim for a patch release this week then. Also, thanks for the further background info. It's always good for us to know the use cases of PEFT in the wild.
E.g. it's not available for the torch rocm build.
Signed-off-by: vladmandic <mandic00@live.com>

Contains the following changes:
- huggingface#2934
- huggingface#2963
- huggingface#2976
* FIX Transformers v5 fixes (#2934)

With the v5 rc being out, we should now ensure that the PEFT tests pass. This PR contains fixes to achieve that.

1. hub_online_once was failing because transformers.utils.hub._is_offline_mode no longer exists. Using the new function instead if transformers v5 is detected.
2. tests/test_encoder_decoder_models.py::TestEncoderDecoderModels::test_merge_layers[LoraConfig-config_kwargs10-peft-internal-testing/tiny-random-BartForConditionalGeneration] failing due to TrainableTokensWrapper not being applied to all layers owing to changes to _tied_weights_keys.
3. While working on this, I discovered a tangential bug in TrainableTokensLayer.get_merged_weights. This method returns a torch.Tensor but the expected type is nn.Parameter (since foo.bar.weight is supposed to be a nn.Parameter). This type mismatch would cause torch's model.get_parameter, which I used in _get_module_names_tied_with_embedding, to fail. At first, I wanted to change the return type to nn.Parameter but this causes all kinds of issues. Therefore, I left this bug as is. Instead, in _get_module_names_tied_with_embedding, I opted to use attrgetter instead of model.get_parameter.

* FIX Detect if torch.distributed is available (#2963)

E.g. it's not available for the torch rocm build.

Signed-off-by: vladmandic <mandic00@live.com>

* FIX Don't implicitly require transformers v4.52 (#2976)

Resolves #2975

In #2826, we inadvertently added a dependency on transformers v4.52 to PEFT. However, this is really only needed under very specific circumstances (aLoRA + gradient checkpointing). With this PR, unless we're in these circumstances, this requirement is no longer there.

* Release: v0.18.1

Contains the following changes:
- #2934
- #2963
- #2976

---------

Signed-off-by: vladmandic <mandic00@live.com>
Co-authored-by: Vladimir Mandic <mandic00@live.com>
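For readers unfamiliar with the attrgetter workaround mentioned in item 3 of #2934, here is a minimal sketch of the idea. It uses plain-Python stand-ins instead of real torch modules, and the names `resolve_dotted`, `model` and `embed` are hypothetical, not taken from the PEFT code. The point is only that `operator.attrgetter` walks a dotted path and returns whatever object is stored there, without checking its type.

```python
from operator import attrgetter
from types import SimpleNamespace


def resolve_dotted(root, dotted_name):
    """Return the object found at `dotted_name` under `root`.

    No type check is performed, so the result may be any object
    (in the PEFT case: a plain tensor rather than a parameter).
    """
    return attrgetter(dotted_name)(root)


# Stand-in for a model with a nested submodule; the list plays the role of a weight.
model = SimpleNamespace(embed=SimpleNamespace(weight=[0.1, 0.2]))
weight = resolve_dotted(model, "embed.weight")
```

A lookup that insists on a specific type would reject such a value, which is the failure mode the commit message describes for model.get_parameter.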
@vladmandic The v0.18.1 patch release is out.
Add torch.distributed.is_available() checks to detect torch variants that do not have distributed support.
Closes #2958
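As an illustration of the kind of guard described above (a sketch, not the actual diff of this PR): the helper name `distributed_is_usable` and its optional `dist` argument are hypothetical and exist only to make the example testable without torch installed. The calls `torch.distributed.is_available()` and `torch.distributed.is_initialized()` are real; the second one should only be reached after the first returns True, since builds without distributed support may not define it.

```python
import importlib


def distributed_is_usable(dist=None) -> bool:
    """Return True only if torch.distributed exists, is available, and is initialized.

    `dist` is a hypothetical injection point for a module-like object;
    when omitted, torch.distributed is imported lazily.
    """
    if dist is None:
        try:
            dist = importlib.import_module("torch.distributed")
        except ImportError:
            # torch is not installed at all
            return False
    is_available = getattr(dist, "is_available", None)
    if is_available is None or not is_available():
        # e.g. a torch build compiled without distributed support
        return False
    return bool(dist.is_initialized())
```

Code that would otherwise call something like `torch.distributed.get_rank()` unconditionally can check `distributed_is_usable()` first and fall back to single-process behavior.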