
detect if torch.distributed is available #2963

Merged
BenjaminBossan merged 1 commit into huggingface:main from vladmandic:main on Dec 17, 2025

@vladmandic

Contributor

Add torch.distributed.is_available() checks so that PEFT detects torch variants that do not have distributed support.
Closes #2958

Signed-off-by: vladmandic <mandic00@live.com>
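
The kind of guard described above can be sketched roughly as follows. This is a minimal illustration, not the actual PEFT change: the helper name distributed_ready and the SimpleNamespace stand-in are assumptions made for the example. The point is the ordering: is_available() is checked first, because a torch build without distributed support may not expose is_initialized() at all.

```python
from types import SimpleNamespace


def distributed_ready(dist) -> bool:
    """Return True only if `dist` reports distributed support and an initialized process group.

    `dist` is expected to look like the torch.distributed module.
    (Hypothetical helper, named for this example only.)
    """
    # Short-circuit: is_initialized() is only called when is_available() is True,
    # so a build that lacks is_initialized() never triggers an AttributeError here.
    return bool(dist.is_available() and dist.is_initialized())


# Stand-in for a torch build without distributed support (assumption for
# illustration): it offers is_available() but no is_initialized().
no_dist_build = SimpleNamespace(is_available=lambda: False)

print(distributed_ready(no_dist_build))  # False
```

With a real torch install, the same helper would be called as distributed_ready(torch.distributed).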

@BenjaminBossan BenjaminBossan left a comment

Member


Flawless, thanks for the PR. The failing CI is just some GH hiccup.

@BenjaminBossan BenjaminBossan merged commit c65c886 into huggingface:main Dec 17, 2025
9 of 10 checks passed
@vladmandic

Contributor Author

@BenjaminBossan can I ask what the plan is for the next PEFT release? The last release was on Nov 13, quite some time ago,
and this issue is blocking LoRA use on a very popular model (z-image) and GPU (AMD) combo.

@BenjaminBossan

Member

I didn't know this was a common use case, that's good to know. We don't have enough for a full release at the moment, but I could see us doing a patch release if it helps.

@vladmandic

Contributor Author

> I didn't know this was a common use case, that's good to know. We don't have enough for a full release at the moment, but I could see us doing a patch release if it helps.

A patch release is fine. It's just that I have a pretty big install base and this is right now the most common open issue, and I can't point users to install from main every time.

FYI, why is this so common? The therock build of torch-rocm-for-windows doesn't have distributed support, and other than that, it's by far the best way to run anything torch-based if you have an AMD GPU on Windows. Sure, it's not as common as NVIDIA, but it's still a really big community. And the z-image model is currently the most popular new model, so each new user tries it.

@BenjaminBossan

Member

Great, we'll aim for a patch release this week then.

Also, thanks for the further background info. It's always good for us to know the use cases of PEFT in the wild.

BenjaminBossan pushed a commit to BenjaminBossan/peft that referenced this pull request Jan 8, 2026
E.g. it's not available for the torch rocm build.

Signed-off-by: vladmandic <mandic00@live.com>
BenjaminBossan added a commit to BenjaminBossan/peft that referenced this pull request Jan 8, 2026
BenjaminBossan added a commit that referenced this pull request Jan 9, 2026
* FIX Transformers v5 fixes (#2934)

With the v5 rc being out, we should now ensure that the PEFT tests pass.
This PR contains fixes to achieve that.

1. hub_online_once was failing because
transformers.utils.hub._is_offline_mode no longer exists. The new
function is now used instead if transformers v5 is detected.

2. tests/test_encoder_decoder_models.py::TestEncoderDecoderModels::test_merge_layers[LoraConfig-config_kwargs10-peft-internal-testing/tiny-random-BartForConditionalGeneration]
was failing due to TrainableTokensWrapper not being applied to all layers
owing to changes to _tied_weights_keys.

3. While working on this, I discovered a tangential bug in
TrainableTokensLayer.get_merged_weights. This method returns a
torch.Tensor but the expected type is nn.Parameter (since foo.bar.weight
is supposed to be a nn.Parameter). This type mismatch would cause
torch's model.get_parameter, which I used in
_get_module_names_tied_with_embedding, to fail. At first, I wanted to
change the return type to nn.Parameter but this causes all kinds of
issues. Therefore, I left this bug as is. Instead, in
_get_module_names_tied_with_embedding, I opted to use attrgetter instead
of model.get_parameter.
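
The attrgetter approach from point 3 can be illustrated with a small sketch. It uses plain SimpleNamespace objects as a hypothetical stand-in for a model (no torch involved), and the helper name resolve is an assumption for this example, not a PEFT function. The idea is that operator.attrgetter follows a dotted path and returns whatever it finds, without checking the type of the result.

```python
from operator import attrgetter
from types import SimpleNamespace

# Hypothetical stand-in for a model whose nested `weight` attribute is a plain
# value rather than an nn.Parameter.
model = SimpleNamespace(
    encoder=SimpleNamespace(embed_tokens=SimpleNamespace(weight=[1.0, 2.0]))
)


def resolve(obj, dotted_name: str):
    """Follow a dotted attribute path such as 'encoder.embed_tokens.weight'."""
    # attrgetter accepts dotted names and performs plain attribute lookups,
    # so it does not care what type the final attribute has.
    return attrgetter(dotted_name)(obj)


weight = resolve(model, "encoder.embed_tokens.weight")
print(weight)  # [1.0, 2.0]
```

A missing attribute anywhere along the path still raises AttributeError, so callers that expect optional attributes would need to handle that case.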

* FIX Detect if torch.distributed is available (#2963)

E.g. it's not available for the torch rocm build.

Signed-off-by: vladmandic <mandic00@live.com>

* FIX Don't implicitly require transformers v4.52 (#2976)

Resolves #2975

In #2826, we inadvertently added a dependency on transformers v4.52 to
PEFT. However, this is really only needed under very specific
circumstances (aLoRA + gradient checkpointing). With this PR, unless
we're in these circumstances, this requirement is no longer there.
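
The idea of a conditional version requirement can be sketched as below. This is only an illustration under stated assumptions: the function name, its parameters, and the tuple-based version comparison are all hypothetical and do not reflect how PEFT implements the check.

```python
def check_transformers_requirement(
    installed: tuple,
    uses_alora: bool,
    gradient_checkpointing: bool,
    minimum: tuple = (4, 52),
) -> None:
    """Raise only when the feature combination that needs the newer version is active.

    Hypothetical helper: `installed` is a (major, minor) tuple for the
    installed transformers version.
    """
    needs_minimum = uses_alora and gradient_checkpointing
    if needs_minimum and installed < minimum:
        raise ImportError(
            f"aLoRA with gradient checkpointing requires transformers >= "
            f"{minimum[0]}.{minimum[1]}, found {installed[0]}.{installed[1]}"
        )


# An older version is accepted as long as the special combination is not used.
check_transformers_requirement((4, 50), uses_alora=False, gradient_checkpointing=True)
```

Calling the same function with uses_alora=True and gradient_checkpointing=True on version (4, 50) would raise ImportError.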

* Release: v0.18.1

Contains the following changes:

- #2934
- #2963
- #2976

---------

Signed-off-by: vladmandic <mandic00@live.com>
Co-authored-by: Vladimir Mandic <mandic00@live.com>
@BenjaminBossan

Member

@vladmandic The v0.18.1 patch release is out.


Development

Successfully merging this pull request may close these issues.

peft assumes that torch has distributed support without checking

3 participants