
Possible fix of wrong scale in weight decomposition #16151

Merged
AUTOMATIC1111 merged 1 commit into dev from dora-fix on Jul 6, 2024

Conversation

@KohakuBlueleaf
Collaborator

Description

Should resolve this: Comfy-Org/ComfyUI#3922


@ntc-ai

ntc-ai commented Jul 5, 2024

Hi, this patch is wrong.

This variable is used by model authors to normalize a trained LoRA/DoRA to the -1 to 1 range. With this patch, the DoRA breaks when alpha changes instead of being normalized.

More: KohakuBlueleaf/LyCORIS#196

@KohakuBlueleaf
Collaborator Author

KohakuBlueleaf commented Jul 5, 2024

It breaks because that is not how alpha works.
I'm maintaining the correct math formula, not the behavior you want.
You can create an extension to achieve your goal.

@ntc-ai

ntc-ai commented Jul 5, 2024

What is this useful for?
Are there references to the correct math formula?
There is no alpha in the DoRA whitepaper.

@KohakuBlueleaf
Collaborator Author

KohakuBlueleaf commented Jul 5, 2024

What is this useful for? Are there references to the correct math formula? There is no alpha in the DoRA whitepaper.

wrong formula: `W + scale * alpha/dim * (wd(W + BA) - W)`
correct formula: `W + scale * (wd(W + alpha/dim * BA) - W)`

I think it is fairly intuitive that alpha/dim should not affect the `- W` part.
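The difference between the two formulas can be sketched numerically. This is a toy illustration, not the actual webui code: `wd()` here is assumed to be a DoRA-style column-wise normalization rescaled by a magnitude vector `m`, and all shapes and values are arbitrary.

```python
import numpy as np

def wd(V, m):
    # Assumed weight decomposition: renormalize each column of V to the
    # magnitude vector m (DoRA-style; illustrative, not the real implementation).
    return m * V / np.linalg.norm(V, axis=0, keepdims=True)

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 4))            # base weight
B = rng.standard_normal((4, 2))            # LoRA down projection
A = rng.standard_normal((2, 4))            # LoRA up projection
m = rng.standard_normal((1, 4)) ** 2 + 1   # magnitude vector (arbitrary here)
scale, alpha, dim = 1.0, 1.0, 2.0

# "wrong" formula: alpha/dim scales the whole decomposed delta, including -W
merged_old = W + scale * (alpha / dim) * (wd(W + B @ A, m) - W)
# "correct" formula: alpha/dim scales only the BA update inside wd()
merged_new = W + scale * (wd(W + (alpha / dim) * (B @ A), m) - W)
```

The two merges coincide only when `alpha/dim` happens to equal 1; for any other value they produce different weights, which is why the change is observable at all.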

@KohakuBlueleaf
Collaborator Author

Also, that's how DoRA training works in my repo.
I'm just reproducing the same formula here.

@AUTOMATIC1111 merged commit 372a8e0 into dev on Jul 6, 2024
@AUTOMATIC1111 deleted the dora-fix branch on July 6, 2024 at 06:48
@ntc-ai

ntc-ai commented Jul 6, 2024

Hi, this PR should be reverted. Here are the reasons why:

  • With this PR, changing alpha breaks the model, even when setting it to zero. This is different from how LoRAs work, where alpha=0 turns the LoRA off.
  • There is no alpha in the whitepaper
  • This PR breaks existing trained DoRAs that have modified alpha based on the original code
  • alpha is not trainable and (alpha/dim) is 1.0 in most cases
  • I brought attention to an issue in the referenced implementation, not a1111. This code was correct before the changes.

@KohakuBlueleaf
Collaborator Author

KohakuBlueleaf commented Jul 6, 2024

> Hi, this PR should be reverted. Here are the reasons why:
>
>   • With this PR, changing alpha breaks the model, even setting it to zero. This is different than how LoRAs work where alpha=0 turns off the LoRA.
>   • There is no alpha in the whitepaper
>   • This PR breaks existing trained DoRAs that have modified alpha based on the original code
>   • alpha is not trainable and (alpha/dim) is 1.0 in most cases
>   • I brought attention to an issue in the referenced implementation, not a1111. This code was correct before the changes.
  1. That's not how alpha works. DoRA never guarantees that the update has no effect when the BA part is all zeros.
  2. You should never modify alpha after training, for any model. If you want to do special weight scaling, do it with your own custom code and equation.
  3. This code was NOT correct before the changes. I already gave you the formula.
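The alpha=0 disagreement in point 1 can be made concrete with the same toy setup as before (again assuming `wd()` is a column-wise renormalization to a magnitude vector `m`; all names and values here are illustrative, not the actual webui code):

```python
import numpy as np

def wd(V, m):
    # Assumed column-wise weight decomposition, as in the sketch above.
    return m * V / np.linalg.norm(V, axis=0, keepdims=True)

rng = np.random.default_rng(1)
W = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 1))
A = rng.standard_normal((1, 3))
m = np.abs(rng.standard_normal((1, 3))) + 0.5  # trained magnitudes != ||W|| columns
alpha, dim, scale = 0.0, 1.0, 1.0

# Old formula: the whole delta is multiplied by alpha/dim = 0, so merged == W.
old = W + scale * (alpha / dim) * (wd(W + B @ A, m) - W)
# New formula: BA drops out, but the renormalization wd(W) still applies.
new = W + scale * (wd(W + (alpha / dim) * (B @ A), m) - W)
```

Under these assumptions, `old` equals `W` exactly, while `new` equals `wd(W)`, which differs from `W` whenever the trained magnitudes deviate from the column norms of `W`. Whether that residual effect at alpha=0 is a bug or the mathematically intended behavior is exactly what the two sides dispute.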

@ntc-ai

ntc-ai commented Jul 7, 2024

  1. This code has been live for 4 months. The change you're implementing breaks functionality for anyone who has been using it during this time.

  2. You are making a breaking change without anyone requesting it - in fact I'm requesting the opposite!

  3. There's no external reference or evidence supporting your 'correct' formula. It seems to be based solely on your own intuition.

  4. Several important points and evidence towards the prior solution being correct remain unaddressed.

  5. In this formulation, alpha has no use. The best case I see is for gradient scaling but there are better ways to do that.

I hope you can understand the impact of these changes on users.

If you have the 'special weight scaling' code, feel free to share it. I don't think it's possible in this formulation.

