FEAT Add hotswapping functionality #2120
Conversation
See also huggingface/diffusers#9453

The idea of hotswapping an adapter is the following: we can already load multiple adapters, e.g. two LoRAs, at the same time. But sometimes, we want to load one LoRA and then replace its weights in-place with the LoRA weights of another adapter. This is now possible with the hotswap_adapter function.

In general, this should be faster than deleting one adapter and loading the new adapter in its place, which would be the current way to achieve the same final outcome. Another advantage of hotswapping is that it prevents re-compilation in case the PEFT model is already compiled, which can save quite a lot of time.

There are some caveats for hotswapping:
- It only works for the same PEFT method, so no swapping LoRA and LoHa.
- Right now, only LoRA is properly supported.
- The adapters must be compatible (e.g. same LoRA alpha, same target modules).
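As a rough sketch of the intended workflow (the import location and exact signature of hotswap_adapter here are assumptions based on this description, and the model/adapter ids are placeholders):

```python
# Rough sketch of the hotswapping workflow described above. The import path and
# signature of hotswap_adapter are assumptions, and the model/adapter ids are
# placeholders, not tested references.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("some/base-model")
model = PeftModel.from_pretrained(base, "user/lora-adapter-1")  # load the first LoRA
model = torch.compile(model)  # optional: compile once; hotswapping avoids re-compilation

from peft.utils.hotswap import hotswap_adapter  # assumed import location

# Replace the loaded adapter's weights in-place with those of a second,
# compatible LoRA (same alpha, same target modules, same PEFT method).
hotswap_adapter(model, "user/lora-adapter-2", adapter_name="default")
```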
    return peft_model_state_dict, mismatched


def _insert_adapter_name_into_state_dict(
This is the same code as before, but factored out into a function so that it can be reused for hotswapping.
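To illustrate what that helper does, here is a hypothetical sketch (the actual implementation in this PR may differ): it scopes generic PEFT state dict keys to a concrete adapter name.

```python
# Hypothetical sketch of the idea behind _insert_adapter_name_into_state_dict;
# the real helper in this PR may differ. It maps e.g.
# "...lora_A.weight" -> "...lora_A.<adapter_name>.weight".
def insert_adapter_name(state_dict, adapter_name, parameter_prefix="lora_"):
    renamed = {}
    for key, value in state_dict.items():
        if parameter_prefix in key:
            if "." in key.split(parameter_prefix)[1]:
                prefix, _, rest = key.rpartition(".")
                renamed[f"{prefix}.{adapter_name}.{rest}"] = value
            else:
                # e.g. "...lora_embedding_A" -> "...lora_embedding_A.<adapter_name>"
                renamed[f"{key}.{adapter_name}"] = value
        else:
            renamed[key] = value
    return renamed
```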
else:
    state_dict = peft_model_state_dict

if config.peft_type in (
This change is unrelated but I wanted to clean this up.
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
sayakpaul
left a comment
Very cool work! I have left a couple of comments. Let me know if they make sense.
# real check: model now behaves again like adapter 0
assert torch.allclose(output0, output_loaded_back0, atol=atol, rtol=rtol)


def test_hotswap_incompatible_config_params_raises(self, tmp_path):
@yiyixuxu has a very nice PoC supporting this to some extent:
huggingface/diffusers#9453 (comment)
Maybe we could leverage that?
Ah yes, sorry, I somehow missed this.
My plan would be to restrict this feature to require the same alphas and, when wanting to avoid recompilation, also the same rank. I would address those issues in a follow-up PR to keep this already big PR from growing even further. WDYT?
Alright. That works for me.
Then I guess we need to work on that follow-up PR first before making progress in the diffusers PR (huggingface/diffusers#9453).
I guess it depends. If you think that without these features, it's not useful enough, we should wait to create the right impact.
Regarding the different LoRA sizes, IIUC, it would only work with padding the weights to the largest size. This is not something we can automate, as we don't know the largest size ahead of time.
As for the alphas, we would need to ensure that converting to scalars has no adverse effects on other things, which is why I wanted to exclude this from the PR for now.
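To illustrate the padding idea mentioned above (purely hypothetical, not part of this PR): a smaller-rank LoRA matrix could be zero-padded to a larger target rank so that tensor shapes, and hence the compiled graph, stay constant across hotswaps.

```python
# Purely hypothetical illustration of the zero-padding idea, not code from this PR.
# Padding lora_A with zero rows (and, analogously, lora_B with zero columns) keeps
# the product lora_B @ lora_A unchanged while fixing the tensor shapes.
import torch

def pad_lora_A(lora_A: torch.Tensor, target_rank: int) -> torch.Tensor:
    rank, in_features = lora_A.shape
    if rank > target_rank:
        raise ValueError("target_rank must be >= the adapter's rank")
    padded = torch.zeros(target_rank, in_features, dtype=lora_A.dtype, device=lora_A.device)
    padded[:rank] = lora_A
    return padded

small_A = torch.randn(4, 64)       # rank-4 adapter weight
padded_A = pad_lora_A(small_A, 8)  # shaped like a rank-8 adapter, same effective weights
```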
Oh, okay, thanks for explaining. Yeah, without the support for varied rank LoRAs and alphas, this feature won't have much value in the diffusion world, sadly.
Perhaps we can ship this iteration first and work on supporting varied ranks and alphas afterward.
Yes, that would be the idea. For now, I've documented the limitations but as YiYi showed, we should hopefully be able to work around them.
Is there anything left to do in this PR?
# check that the recompilation message is not present
assert "__recompiles" not in stderr.decode()

# contingency check: without hotswapping, we *do* get recompilation
process = subprocess.Popen(
    [sys.executable, file_name, "0"], env=env, stdout=subprocess.PIPE, stderr=subprocess.PIPE
)

# communicate will read the output and error streams, preventing deadlock
stdout, stderr = process.communicate()
exit_code = process.returncode

# sanity check:
assert exit_code == 0

# check that the recompilation message *is* present this time
assert "__recompiles" in stderr.decode()
Not supported for now.
Equivalent to test in diffusers #9453
Marker needs to be removed when diffusers merges the hotswap feature.
sayakpaul
left a comment
Thanks!
My main comment is around https://github.com/huggingface/peft/pull/2120/files#r1804391193. LMK if that makes sense.
- It only works for the same PEFT method, so no swapping LoRA and LoHa, for example.
- Right now, only LoRA is properly supported.
- The adapters must be compatible (e.g. same LoRA alpha, same target modules).
Could add a note saying this is not limited to transformers and works with diffusers, too. But if we wanna wait until huggingface/diffusers#9453 is figured out and merged, I will understand.
I added a sentence. It should already work with diffusers models when users use the hotswap_adapter function; it's just not natively supported in diffusers yet, so I'm fine with adding it.
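For instance, something along these lines should already be possible (sketch only: the pipeline and adapter ids are placeholders, the hotswap_adapter import path is assumed, and the second adapter is assumed to be stored in PEFT's adapter format):

```python
# Sketch only: ids are placeholders, the import path is assumed, and the second
# adapter is assumed to be saved in PEFT's adapter format.
from diffusers import DiffusionPipeline
from peft.utils.hotswap import hotswap_adapter

pipe = DiffusionPipeline.from_pretrained("some/diffusion-model")
pipe.load_lora_weights("user/lora-style-1", adapter_name="default")

# Swap the UNet's LoRA weights in-place with a compatible second LoRA.
hotswap_adapter(pipe.unet, "user/lora-style-2", adapter_name="default")
```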
torch_device = "cuda" if torch.cuda.is_available() else "cpu"


def get_small_unet():
Could also add a note saying that currently, it does not work in the full pipeline context when compile is enabled.
sayakpaul
left a comment
Thanks for your patience! Excellent start!
When the diffusers hotswap tests were added to PEFT in #2120, the diffusers test was marked as xfail because hotswapping was not yet implemented in diffusers. This has long been achieved but the test was not updated. This PR now updates the diffusers test in PEFT and removes the xfail. The new test is basically a copy of the corresponding test in diffusers. Moreover, I enhanced the test according to #2611 to also ensure that there are no CUDA graph re-records.