FIX: Avoid CUDA Graph re-record when hotswapping LoRAs. #2611
BenjaminBossan merged 2 commits into huggingface:main
Conversation
BenjaminBossan left a comment:
Thanks for this PR. I'm not an expert in all the intricacies of torch.compile, so this is very welcome. If you have an example of where this helps, please share it and we can work on a unit test.
It should help with all re-record cases related to weight changes, but even the current tests show some odd re-records with this PR (CUDA graphs are finicky to get working properly). If you profile the test case with torch.profiler, you should see "CUDAGraphs.record_function" only once, but we see it twice. My case is in company code, so I'll update the PR once I manage to get a minimal working repro.
Thanks for the effort.
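For reference, a minimal way to check this with torch.profiler (a sketch, assuming a model already compiled with CUDA graphs and a CUDA input `x`; the exact event name can vary across PyTorch versions):

```python
import torch
from torch.profiler import ProfilerActivity, profile

# Assumes: model = torch.compile(model, mode="reduce-overhead") and a CUDA tensor x.
with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
    for _ in range(3):
        model(x)

# Count CUDA graph capture events; with a stable graph this should be at most 1.
record_events = [
    e for e in prof.events() if "CUDAGraph" in e.name and "record" in e.name.lower()
]
print(f"CUDA graph record events: {len(record_events)}")
```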
Thanks very much for the PR.
You mean with this PR, we see it only once? 👁️ The PR looks good to me. We should merge after a unit test. @anijain2305 in case you have comments.
@zou3519 regarding improving the UX for CUDA graphs.
Turns out, we just needed a couple of configs to raise an error on static input changes. I also updated the tests to include a third forward pass, since the first forward pass is always going to be an eager warmup run. [Profiler traces: current main vs. this PR]
Yes, with this PR, CUDAGraph.record_function will be called only once; the third and later calls will be replayed. This greatly improves the first forward call after a hotswap in scenarios where LoRAs are swapped frequently.
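Roughly, the calling pattern in the updated tests looks like this (a sketch; `torch._dynamo.config.error_on_recompile` is a real knob, but it may not be the exact config set used in the PR, and the hotswap step is just a placeholder):

```python
import torch

# Turn silent recompilations into hard errors so the test fails loudly
# when a static input (e.g. a weight) unexpectedly changes identity.
torch._dynamo.config.error_on_recompile = True

compiled = torch.compile(model, mode="reduce-overhead")  # enables CUDA graphs

# Call 1 is an eager warmup, call 2 records the CUDA graph, call 3 replays it.
for _ in range(3):
    compiled(x)

# ... hotswap the LoRA weights in place here (placeholder) ...
# A subsequent call should replay the existing graph, not re-record it.
compiled(x)
```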
BenjaminBossan left a comment:
Thanks a lot @keepdying for identifying this torch.compile issue, fixing it, and extending the test case to cover it. I tested this locally (the PR CI does not use a GPU), and the test case indeed fails without the fix and passes with it.
The failing CI is unrelated, so this PR is good to be merged.
Glad I could help. Thanks for this great library!
@BenjaminBossan do we wanna propagate this to diffusers?
@sayakpaul The fix should not need propagating, so I assume you mean the tests should be updated. Yes, indeed, I was starting with updating the diffusers test in PEFT: #2619. If that is good, we can apply the same logic to the diffusers tests.
When the diffusers hotswap tests were added to PEFT in #2120, the diffusers test was marked as xfail because hotswapping was not yet implemented in diffusers. This has long been achieved but the test was not updated. This PR now updates the diffusers test in PEFT and removes the xfail. The new test is basically a copy of the corresponding test in diffusers. Moreover, I enhanced the test according to #2611 to also ensure that there are no CUDA graph re-records.
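For illustration, such a no-re-record check could look like this in a test (a sketch, not the actual test code from the PR; the event-name matching is an assumption):

```python
from torch.profiler import ProfilerActivity, profile


def assert_no_cudagraph_rerecord(compiled_model, x, num_calls=3):
    # Run enough forward passes to get past warmup + record into replay.
    with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
        for _ in range(num_calls):
            compiled_model(x)
    records = [
        e for e in prof.events() if "CUDAGraph" in e.name and "record" in e.name.lower()
    ]
    assert len(records) <= 1, f"unexpected CUDA graph re-record(s): {len(records)}"
```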
Motivation:
In some cases, the recorded CUDA Graph gets silently invalidated due to data pointer changes; unless TORCH_LOGS="perf_hints" is set, Torch emits no warning about it.
This leads to an unexpected CUDA Graph re-record.
We can avoid this by copying the LoRA weights in place, without replacing the underlying tensors themselves. I didn't see any performance regressions with the in-place copy.
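A minimal sketch of the idea (illustrative, not the exact PEFT code; `param` and `new_weight` are placeholder names):

```python
import torch


@torch.no_grad()
def swap_lora_weight_inplace(param: torch.nn.Parameter, new_weight: torch.Tensor) -> None:
    # Rebinding the attribute, e.g. `module.lora_A.weight = Parameter(new_weight)`,
    # allocates fresh storage: the data pointer changes, and a recorded CUDA
    # graph that captured the old pointer is silently invalidated.
    #
    # Copying into the existing storage keeps the data pointer stable, so the
    # recorded graph keeps being replayed instead of being re-recorded.
    param.data.copy_(new_weight)
```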