Conversation
younesbelkada
left a comment
Thanks! I left a few open questions - what do you think?
```diff
  if adapter in target.lora_A or adapter in target.lora_embedding_A:
      valid_adapters.append(adapter)
-     valid_weights.append(weight)
+     valid_weights.append(weight * target.scaling[adapter])
```
Is this change somehow breaking?
Hello Younes, this should be the correct usage; the earlier implementation was incorrect as it was missing the scaling factor.
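To illustrate why the scaling factor matters, here is a minimal standalone sketch (`collect_merge_weights` is an illustrative name, not the PEFT code): in LoRA the effective update is `delta_W = scaling * (B @ A)` with `scaling = lora_alpha / r`, so the merge weight for each adapter must be multiplied by its scaling.

```python
# Hedged sketch, not the actual PEFT implementation.
def collect_merge_weights(adapters, user_weights, scaling):
    """Combine user-supplied merge weights with each adapter's LoRA scaling."""
    valid_adapters, valid_weights = [], []
    for adapter, weight in zip(adapters, user_weights):
        valid_adapters.append(adapter)
        # Without this multiplication, adapters trained with different
        # lora_alpha / r ratios would be merged on inconsistent scales.
        valid_weights.append(weight * scaling[adapter])
    return valid_adapters, valid_weights

# e.g. one adapter trained with lora_alpha=16, r=8 and one with lora_alpha=4, r=8
adapters = ["sft", "dpo"]
scaling = {"sft": 2.0, "dpo": 0.5}
names, weights = collect_merge_weights(adapters, [1.0, 1.0], scaling)
print(names, weights)  # → ['sft', 'dpo'] [2.0, 0.5]
```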
sayakpaul
left a comment
Left some nits.
I really like how you have broken down the core merging methods in terms of small logical components that are shared across.
I think it definitely makes sense to also add a detailed guide about these methods on the PEFT doc site. But this can come later. Cc: @MKhalusova @stevhliu
stevhliu
left a comment
Really nice!
Do you think it'd also be nice to create a separate API reference page for these merging utilities so it's easy for users to find? I can work on this in a separate PR in addition to the guide suggested by @sayakpaul :)
Hi @pacman100 @younesbelkada @stevhliu @sayakpaul, I am Prateek Yadav, the author of TIES-Merging. Let me know if there is some way I can contribute to adding these merging methods to the Transformers or the PEFT library. Also, I agree with others here that having some sort of changes to the documentation or updates to the README can help users be aware of these merging methods and how to use them. Let me know if there is some way for me to help! Thanks,
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Hi @sayakpaul, I guess this is the pull request trying to document these merging methods.

Ah, you're correct. Thanks for the mention.

Hi @sayakpaul @pacman100, is there a timeline on when this is expected to be merged?

I think we can merge now. @younesbelkada WDYT?
BenjaminBossan
left a comment
Thanks a lot @pacman100 for this excellent PR. Super well done and clean implementation + refactor!
In my review, I didn't check the correctness of the implementations in detail, since the original authors already did that and are much more qualified for this. Instead, I focused on the other parts, please take a look. Most things should be fairly easy adjustments.
Regarding the new functions in merge_utils.py, I think it would be good to add some unit tests for those in a future PR.
```diff
  # be careful, because output adapter rank may be really big if mixing a lot of adapters
  new_rank = sum(adapters_ranks)
- elif combination_type == "svd":
+ elif "svd" in combination_type:
```
I would prefer:

```diff
- elif "svd" in combination_type:
+ elif (combination_type == "svd") or (combination_type == "ties_svd"):
```
This is to prevent potential bugs in the future where we add a new combination type with a strange name that has "svd" as a substring.
But it is also meant for dare_linear_svd and dare_ties_svd too.
Oh I see. So maybe let's check all these options explicitly, or use if combination_type.endswith("svd"), which should be less error prone.
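To make the trade-off concrete, here is a hypothetical sketch (`uses_svd` and `SVD_TYPES` are illustrative names, not PEFT code) contrasting the substring check with `endswith`:

```python
# All SVD-based combination types in this PR end with "svd".
SVD_TYPES = {"svd", "ties_svd", "dare_linear_svd", "dare_ties_svd"}

def uses_svd(combination_type: str) -> bool:
    # endswith is less error prone than `"svd" in combination_type`:
    # a hypothetical future type such as "svd_free" would wrongly pass
    # the substring check but is correctly rejected here.
    return combination_type.endswith("svd")

assert all(uses_svd(t) for t in SVD_TYPES)
assert not uses_svd("dare_ties")
assert not uses_svd("svd_free") and ("svd" in "svd_free")  # where the substring check differs
```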
```python
    return sign == majority_sign


def disjoint_merge(task_tensors: torch.Tensor, majority_sign_mask: torch.Tensor) -> torch.Tensor:
```
task_tensors is not a list of tensors but a tensor, right? Below, task_tensors is always a list of tensors. I wonder if it would make sense to use a different variable name for this function to avoid confusion.
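For context, a minimal sketch of what this disjoint-merge step does (this mirrors the TIES algorithm, not the actual merge_utils.py code): each parameter is averaged only over the task tensors whose sign agrees with the elected majority sign. The argument is indeed a single stacked tensor of shape `(num_tasks, *param_shape)`, which is the source of the naming confusion.

```python
import torch

def disjoint_merge_sketch(stacked_task_tensors, majority_sign_mask):
    # stacked_task_tensors: one tensor of shape (num_tasks, *param_shape),
    # not a list of tensors.
    mixed = (stacked_task_tensors * majority_sign_mask).sum(dim=0)
    # Divide by the number of agreeing tasks per parameter (at least 1
    # to avoid division by zero where no task agrees).
    num_agree = majority_sign_mask.sum(dim=0).clamp(min=1)
    return mixed / num_agree

stacked = torch.tensor([[1.0, -2.0], [3.0, 4.0]])
mask = torch.tensor([[True, True], [True, False]])
print(disjoint_merge_sketch(stacked, mask).tolist())  # → [2.0, -2.0]
```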
Hi @pacman100 and @BenjaminBossan, I went into detail over the merge_utils file and left some comments. I feel like some parts need to be corrected and might have bugs, if I am not missing something. I think I accidentally left many comments as opposed to reviewing, but I guess that should be fine.
Co-Authored-By: Prateek Yadav <15224633+prateeky2806@users.noreply.github.com> Co-Authored-By: Yu Le <55241218+yule-buaa@users.noreply.github.com> Co-Authored-By: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
BenjaminBossan
left a comment
Thanks for addressing the issues from my side. This LGTM now. One small comment left, but it's not a must.
```diff
  # be careful, because output adapter rank may be really big if mixing a lot of adapters
  new_rank = sum(adapters_ranks)
- elif combination_type == "svd":
+ elif "svd" in combination_type:
```
Oh I see. So maybe let's check all these options explicitly, or use if combination_type.endswith("svd"), which should be less error prone.
Hi @pacman100, Thanks for working on this and merging this pull request. As we discussed earlier, I was wondering if this would be announced as part of the next PEFT version or if I should just announce it on my Twitter right away. I feel it would be much better to announce it with the next PEFT version; however, I am not sure when that would be. Moreover, the PR on adding the docs is still ongoing. Thanks,
Hey Prateek! @pacman100 and I are working on a blog post to discuss this which we plan to release very soon. We will of course give everyone the due credits there. I think it makes sense to also do a release for this feature so that users don't have to install from source. Does that work for you?
Hi @sayakpaul, yes, that sounds good to me, looking forward to it. I will tweet about this once you do the release. Moreover, if you need someone to proofread the blog post, I would be happy to help you guys with it. Prateek
Hi @pacman100 and @sayakpaul, Le Yu

Appreciate all the support. We will keep you posted.
```python
def ties(
    task_tensors: List[torch.Tensor],
    weights: torch.Tensor,
```
Hi @pacman100 @sayakpaul, I have just one last suggestion. It would be ideal to set the default weight for TIES to all ones because that's a setting we experimented with in our paper and kind of works well if people do not want to try out different values. You can look at the results with the red cross in this table, they are with default weights of all ones. This makes TIES as simple to use as basic averaging by avoiding the necessity to select the weights.
So by default TIES should have 20% of the parameters remaining and mixing weight = 1 for each peft module.
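Those suggested defaults (keep the top-20% magnitude parameters of each task tensor, mixing weight 1.0 per adapter) can be sketched as follows. This is a rough illustration of the TIES algorithm under those defaults, not the actual merge_utils.py implementation; `prune_by_magnitude` and `ties_merge` are illustrative names.

```python
import torch

def prune_by_magnitude(tensor, density=0.2):
    # Keep only the `density` fraction of largest-magnitude entries.
    k = max(1, int(density * tensor.numel()))
    threshold = tensor.abs().flatten().topk(k).values.min()
    return torch.where(tensor.abs() >= threshold, tensor, torch.zeros_like(tensor))

def ties_merge(task_tensors, density=0.2, weights=None):
    if weights is None:
        # Suggested default: mixing weight 1.0 for each adapter.
        weights = torch.ones(len(task_tensors))
    stacked = torch.stack([w * prune_by_magnitude(t, density)
                           for w, t in zip(weights, task_tensors)])
    # Elect the majority sign per parameter, then disjoint-merge:
    # average only over the tensors whose sign agrees with the majority.
    majority_sign = stacked.sum(dim=0).sign()
    mask = stacked.sign() == majority_sign
    mixed = (stacked * mask).sum(dim=0)
    return mixed / mask.sum(dim=0).clamp(min=1)

merged = ties_merge([torch.tensor([1.0, -4.0, 0.5, 2.0]),
                     torch.tensor([3.0, 4.0, -0.1, 2.0])], density=0.5)
print(merged.tolist())  # → [3.0, 0.0, 0.0, 2.0]
```

With all-ones weights this is as simple to call as plain averaging, matching the default setting reported in the TIES-Merging paper.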
* add code
* update docstring
* quality
* fix test
* fix test
* fix svd embedding layer merging
* fixes
* fixes
* Update model.py
* Add test and example
* quality
* fix tests
* update the example
* Apply suggestions from code review
* address comments
* address comments and add co-authors
* quality
* Update merge_utils.py
* revert
* address comments
* address comment

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Prateek Yadav <15224633+prateeky2806@users.noreply.github.com>
Co-authored-by: Yu Le <55241218+yule-buaa@users.noreply.github.com>
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>

What does this PR do?
This PR adds the following merging methods: ties, dare_linear, dare_ties, ties_svd, dare_linear_svd, dare_ties_svd.

Example of ties_svd is shown below (https://github.com/pacman100/peft-dreambooth-ui/blob/main/lora_merging.ipynb).

LLM LoRA merging example:
To do: