
FEAT Add hotswapping functionality #2120

Merged

BenjaminBossan merged 18 commits into huggingface:main from BenjaminBossan:feat-add-hotswap-2 on Oct 23, 2024

Conversation

@BenjaminBossan (Member)

See also huggingface/diffusers#9453

The idea of hotswapping an adapter is the following: We can already load multiple adapters, e.g. two LoRAs, at the same time. But sometimes we want to load one LoRA and then replace its weights in-place with the LoRA weights of another adapter. This is now possible with the hotswap_adapter function.

In general, this should be faster than deleting one adapter and loading the adapter in its place, which would be the current way to achieve the same final outcome. Another advantage of hotswapping is that it prevents re-compilation in case the PEFT model is already compiled. This can save quite a lot of time.

There are some caveats for hotswapping:

  • It only works for the same PEFT method, so no swapping LoRA and LoHa.
  • Right now, only LoRA is properly supported.
  • The adapters must be compatible (e.g. same LoRA alpha, same target modules).
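
A minimal usage sketch (the model id and adapter paths are placeholders, and the call assumes the hotswap_adapter signature described in this PR):

```python
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel
from peft.utils.hotswap import hotswap_adapter

# placeholders -- substitute a real base model and two compatible LoRA checkpoints
model_id = "some/base-model"
lora_path_0 = "path/to/lora-0"
lora_path_1 = "path/to/lora-1"

base = AutoModelForCausalLM.from_pretrained(model_id)
model = PeftModel.from_pretrained(base, lora_path_0)
model = torch.compile(model)  # optional; hotswapping avoids re-compilation later

# ... run inference with the first LoRA ...

# replace the weights of the loaded adapter in-place with those of the second LoRA
hotswap_adapter(model, lora_path_1, adapter_name="default")

# ... run inference again; the already compiled model is reused without re-compiling ...
```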

    return peft_model_state_dict, mismatched


def _insert_adapter_name_into_state_dict(
BenjaminBossan (Member Author)

This is the same code as before, but factored out into a function so that it can be reused for hotswapping.
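
For readers following along, a rough sketch of what such a helper does (the name, signature, and key layout below are illustrative assumptions, not the exact PEFT code): it rewrites keys like `...lora_A.weight` to `...lora_A.<adapter_name>.weight` so that a generic state dict can be assigned to a specific adapter.

```python
def _insert_adapter_name_into_state_dict_sketch(state_dict, adapter_name, parameter_prefix):
    """Insert the adapter name into keys containing parameter_prefix (e.g. 'lora_')."""
    result = {}
    for key, value in state_dict.items():
        if parameter_prefix in key:
            head, _, tail = key.rpartition(parameter_prefix)
            # tail is e.g. 'A.weight'; keep the first segment and insert the adapter name
            first, _, rest = tail.partition(".")
            new_tail = f"{first}.{adapter_name}.{rest}" if rest else f"{tail}.{adapter_name}"
            result[head + parameter_prefix + new_tail] = value
        else:
            result[key] = value
    return result


# illustrative usage: remap a saved LoRA state dict onto the "default" adapter
remapped = _insert_adapter_name_into_state_dict_sketch(
    {"base_model.model.linear.lora_A.weight": None}, "default", "lora_"
)
```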

    else:
        state_dict = peft_model_state_dict

    if config.peft_type in (
BenjaminBossan (Member Author)

This change is unrelated but I wanted to clean this up.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

BenjaminBossan marked this pull request as ready for review on October 16, 2024 at 17:19
@sayakpaul (Member) left a comment

Very cool work! I have left a couple of comments. Let me know if they make sense.

Comment threads:
- docs/source/_toctree.yml
- docs/source/package_reference/hotswap.md
- docs/source/package_reference/hotswap.md (Outdated)
- src/peft/utils/hotswap.py
- src/peft/utils/hotswap.py (Outdated)
- src/peft/utils/hotswap.py
        # real check: model now behaves again like adapter 0
        assert torch.allclose(output0, output_loaded_back0, atol=atol, rtol=rtol)

    def test_hotswap_incompatible_config_params_raises(self, tmp_path):
Member

@yiyixuxu has a very nice PoC supporting this to some extent:
huggingface/diffusers#9453 (comment)

Maybe we could leverage that?

Member

@BenjaminBossan I think this went unnoticed?

BenjaminBossan (Member Author)

Ah yes, sorry, I somehow missed this.

My plan would be to restrict this feature to require the same alphas and, when wanting to avoid recompilation, also the same rank. I would address those issues in a follow-up PR to keep this already big PR from growing even further. WDYT?

Member

Alright. That works for me.

Then I guess we need to work on that follow-up PR first before making progress in the diffusers PR (huggingface/diffusers#9453).

BenjaminBossan (Member Author)

I guess it depends. If you think that without these features, it's not useful enough, we should wait to create the right impact.

Regarding the different LoRA sizes, IIUC, it would only work with padding the weights to the largest size. This is not something we can automate, as we don't know the largest size ahead of time.

As for the alphas, we would need to ensure that converting to scalars has no adverse effects on other things, which is why I wanted to exclude this from the PR for now.
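
To make the padding idea concrete, a small illustrative sketch (the sizes are arbitrary and this is not part of the PR): zero-padding the LoRA A/B matrices up to the largest rank leaves the product B @ A, and hence the adapter's contribution, unchanged.

```python
import torch

in_features, out_features = 16, 32
rank_small, rank_max = 4, 8

# a rank-4 LoRA pair
lora_A_small = torch.randn(rank_small, in_features)
lora_B_small = torch.randn(out_features, rank_small)

# pad into buffers sized for the largest rank (8); the extra rows/columns are zero
lora_A_padded = torch.zeros(rank_max, in_features)
lora_B_padded = torch.zeros(out_features, rank_max)
lora_A_padded[:rank_small] = lora_A_small
lora_B_padded[:, :rank_small] = lora_B_small

# the delta weight B @ A is identical, so the padded adapter behaves the same
assert torch.allclose(lora_B_small @ lora_A_small, lora_B_padded @ lora_A_padded)
```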

Member

Oh, okay, thanks for explaining. Yeah, without the support for varied rank LoRAs and alphas, this feature won't have much value in the diffusion world, sadly.

Perhaps we can ship this iteration first and work on supporting varied ranks and alphas afterward.

BenjaminBossan (Member Author)

Yes, that would be the idea. For now, I've documented the limitations but as YiYi showed, we should hopefully be able to work around them.

Is there anything left to do in this PR?

Comment thread tests/test_initialization.py Outdated
Comment on lines +1775 to +1791
# check that the recompilation message is not present
assert "__recompiles" not in stderr.decode()

# contingency check: without hotswapping, we *do* get recompilation
process = subprocess.Popen(
    [sys.executable, file_name, "0"], env=env, stdout=subprocess.PIPE, stderr=subprocess.PIPE
)

# Communicate will read the output and error streams, preventing deadlock
stdout, stderr = process.communicate()
exit_code = process.returncode

# sanity check:
assert exit_code == 0

# check that the recompilation message *is* present this time
assert "__recompiles" in stderr.decode()
Member

Nice tests!

Comment thread tests/run_compiled_model_hotswap.py
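
(Context for the snippet above: the test runs a small child script in a subprocess and inspects its stderr for lines tagged "__recompiles", which PyTorch emits when recompile logging is enabled, e.g. via TORCH_LOGS="recompiles". Below is a rough sketch of what such a child script could look like — the model, sizes, and paths are illustrative assumptions, not the actual tests/run_compiled_model_hotswap.py.)

```python
import sys
import tempfile

import torch
from torch import nn

from peft import LoraConfig, PeftModel, get_peft_model
from peft.utils.hotswap import hotswap_adapter

# "1" -> hotswap the second adapter in-place; "0" -> load it conventionally
do_hotswap = len(sys.argv) > 1 and sys.argv[1] == "1"


def make_base():
    return nn.Sequential(nn.Linear(16, 16), nn.Linear(16, 2))


# create two dummy LoRA adapters with identical configs and save them to disk
adapter_paths = []
for seed in (0, 1):
    torch.manual_seed(seed)
    peft_model = get_peft_model(make_base(), LoraConfig(target_modules=["0"], r=8))
    path = tempfile.mkdtemp()
    peft_model.save_pretrained(path)
    adapter_paths.append(path)

inputs = torch.randn(4, 16)
model = PeftModel.from_pretrained(make_base(), adapter_paths[0])
model = torch.compile(model)
model(inputs)  # the first call triggers the initial compilation

if do_hotswap:
    # replace the weights of the loaded adapter in-place -> should not recompile
    hotswap_adapter(model, adapter_paths[1], adapter_name="default")
else:
    # load a second adapter the conventional way; the module graph changes,
    # which typically triggers recompilation on the next call
    model.load_adapter(adapter_paths[1], adapter_name="other")
    model.set_adapter("other")

model(inputs)
```

Run with TORCH_LOGS="recompiles" in the environment: with hotswapping ("1"), stderr should contain no "__recompiles" lines, while the conventional reload ("0") should produce them.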
@sayakpaul (Member) left a comment

Thanks!

My main comment is around https://github.com/huggingface/peft/pull/2120/files#r1804391193. LMK if that makes sense.

- It only works for the same PEFT method, so no swapping LoRA and LoHa, for example.
- Right now, only LoRA is properly supported.
- The adapters must be compatible (e.g. same LoRA alpha, same target modules).

Member

Could add a note saying this is not limited to transformers and works with diffusers, too. But if we wanna wait until huggingface/diffusers#9453 is figured out and merged, I will understand.

BenjaminBossan (Member Author)

I added a sentence. It should already work with diffusers models when users call the hotswap_adapter function; it's just not natively integrated in diffusers yet, so I'm fine with adding it.


torch_device = "cuda" if torch.cuda.is_available() else "cpu"


def get_small_unet():
Member

Could also add a note saying that currently, it does not work in the full pipeline context when compile is enabled.

BenjaminBossan (Member Author)

Done

@sayakpaul (Member) left a comment

Thanks for your patience! Excellent start!

BenjaminBossan merged commit cff2a45 into huggingface:main on Oct 23, 2024
BenjaminBossan deleted the feat-add-hotswap-2 branch on October 23, 2024 at 11:33
Guy-Bilitski pushed a commit to Guy-Bilitski/peft that referenced this pull request May 13, 2025
The idea of hotswapping an adapter is the following: We can already load
multiple adapters, e.g. two LoRAs, at the same time. But sometimes, we
want to load one LoRA and then replace its weights in-place with the
LoRA weights of another adapter. This is now possible with the
hotswap_adapter function.

In general, this should be faster than deleting one adapter and loading
the adapter in its place, which would be the current way to achieve the
same final outcome. Another advantage of hotswapping is that it prevents
re-compilation in case the PEFT model is already compiled. This can save
quite a lot of time.

There are some caveats for hotswapping:

- It only works for the same PEFT method, so no swapping LoRA and LoHa.
- Right now, only LoRA is properly supported.
- The adapters must be compatible (e.g. same LoRA alpha, same target
  modules).
- To avoid recompilation, ranks must be identical

See also huggingface/diffusers#9453
BenjaminBossan added a commit to BenjaminBossan/peft that referenced this pull request Jun 27, 2025
When the diffusers hotswap tests were added to PEFT in huggingface#2120, the
diffusers test was marked as xfail because hotswapping was not yet
implemented in diffusers. This has long been achieved but the test was
not updated.

This PR now updates the diffusers test in PEFT and removes the xfail.
The new test is basically a copy of the corresponding test in diffusers.
Moreover, I enhanced the test according to huggingface#2611 to also ensure that
there are no CUDA graph re-records.
BenjaminBossan added a commit that referenced this pull request Jul 2, 2025
efraimdahl pushed a commit to efraimdahl/peft that referenced this pull request Jul 12, 2025
cyyever pushed a commit to cyyever/peft that referenced this pull request Sep 4, 2025
* generalizes vst script

* precommit

* change launch command to use accelerate

* updates docs

* rename to sft_vlm

* fix script location

* fix formatting

* comma

* add model link

* fix name

---------

Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>