
Fix the LoRA dropout issue in the Anima model and implement support for network_reg_alphas during LoRA training.#2272

Merged
kohya-ss merged 5 commits into kohya-ss:sd3 from duongve13112002:fix_bug_anima_lora_dropout
Feb 23, 2026
Conversation

@duongve13112002
Contributor

Hi @kohya-ss

  • During testing, I discovered a bug where training immediately crashes when using rank_dropout.
    The issue is related to the Flux LoRA implementation.

  • This pull request fixes the rank_dropout issue and adds support for network_reg_alphas, allowing different alpha values to be set per module during LoRA training for the Anima model.

@kohya-ss
Owner

Thank you for the pull request, and sorry for the delayed review.

Regarding network_reg_alphas, this is an interesting idea, but I believe the same effect can be achieved by adjusting the learning rate per module. Adding this new option would expand the hyperparameter search space without a clear benefit, and could lead to confusion for users.

Would it be possible to remove the network_reg_alphas feature from this PR and keep only the rank_dropout fix?

Thank you for your understanding.

@duongve13112002
Contributor Author

Done, I've removed it. That said, I still think it would be useful to customize network_reg_alphas so that each LoRA component can effectively be trained with a different rank.

@kohya-ss
Owner

Thank you for the quick update!

Just to share the reasoning behind my decision: The effective scaling of LoRA in each module is:

$$\text{effective scale} = \text{lr}_i \times \frac{\alpha_i}{r_i}$$

Since we can already control both lr and r per module, adding per-module alpha does not provide a fundamentally new axis of control — changing alpha can always be replicated by adjusting lr in the opposite direction.
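This equivalence can be checked with a toy calculation (a sketch only; `effective_scale` is an illustrative helper, not code from this repository):

```python
# Illustrates the claim above: scaling alpha by some factor k while
# scaling lr by 1/k leaves the effective per-module LoRA scale unchanged,
# so per-module alpha adds no new axis of control beyond per-module lr.

def effective_scale(lr: float, alpha: float, rank: int) -> float:
    """Effective LoRA scaling for one module: lr * (alpha / rank)."""
    return lr * alpha / rank

base = effective_scale(1e-4, 16, 16)            # alpha == rank
compensated = effective_scale(0.5e-4, 32, 16)   # alpha doubled, lr halved
# base == compensated: the two configurations are equivalent
```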

In my understanding, the original motivation for introducing alpha in the LoRA paper was as a stabilization mechanism to eliminate the need for hyperparameter re-tuning when changing rank. With a fixed alpha, you can change rank and keep the rest of your hyperparameters as-is. So I think alpha was not designed as a parameter to be actively tuned per module.

@duongve13112002
Contributor Author

The code should be clear now. Would you mind reviewing it? I agree with your point about alpha and rank. If you have any questions, feel free to ask.

@kohya-ss
Owner

Thank you for the update again!

The intention of this code is to match the mask shape to the lx shape, so the following may be sufficient.

            # rank dropout
            if self.rank_dropout is not None and self.training:
                # Simplify to avoid unsqueeze operations
                mask_shape = [lx.size(0)] + [1] * (len(lx.size()) - 2) + [self.lora_dim]
                mask = torch.rand(mask_shape, device=lx.device) > self.rank_dropout
                lx = lx * mask

@duongve13112002
Contributor Author

I'm not sure. In my opinion, we should keep the two cases separate, because if lora_down is a Conv2d layer, training may break.

@kohya-ss
Owner

Oh, I see. Sorry, I was wrong. You're right. Conv2d is channel first.
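The distinction settled here can be sketched in plain Python, shapes only (`rank_dropout_mask_shape` is an illustrative helper, not the PR's actual code): a Linear lora_down emits the rank as the last dimension, while a Conv2d emits it channel-first at index 1, so a single mask-shape formula cannot cover both.

```python
# Sketch of why the rank-dropout mask shape must differ per layer type.
# lx_shape is the shape of the lora_down output tensor.

def rank_dropout_mask_shape(lx_shape, lora_dim, is_conv2d):
    if is_conv2d:
        # Conv2d output is channel-first (B, rank, H, W): the rank
        # dimension to drop is at index 1, spatial dims broadcast.
        return [lx_shape[0], lora_dim] + [1] * (len(lx_shape) - 2)
    # Linear output is channel-last (B, ..., rank): the rank dimension
    # is the final one, intermediate dims broadcast.
    return [lx_shape[0]] + [1] * (len(lx_shape) - 2) + [lora_dim]

# Linear: (batch, seq, rank) -> mask (batch, 1, rank)
# Conv2d: (batch, rank, H, W) -> mask (batch, rank, 1, 1)
```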

@kohya-ss kohya-ss merged commit 609d129 into kohya-ss:sd3 Feb 23, 2026
3 checks passed
kohya-ss added a commit that referenced this pull request Feb 23, 2026
* feat: Add LoHa/LoKr network support for SDXL and Anima

- networks/network_base.py: shared AdditionalNetwork base class with architecture auto-detection (SDXL/Anima) and generic module injection
- networks/loha.py: LoHa (Low-rank Hadamard Product) module with HadaWeight custom autograd, training/inference classes, and factory functions
- networks/lokr.py: LoKr (Low-rank Kronecker Product) module with factorization, training/inference classes, and factory functions
- library/lora_utils.py: extend weight merge hook to detect and merge LoHa/LoKr weights alongside standard LoRA

Linear and Conv2d 1x1 layers only; Conv2d 3x3 (Tucker decomposition) support will be added separately.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: Enhance LoHa and LoKr modules with Tucker decomposition support

- Added Tucker decomposition functionality to LoHa and LoKr modules.
- Implemented new methods for weight rebuilding using Tucker decomposition.
- Updated initialization and weight handling for Conv2d 3x3+ layers.
- Modified get_diff_weight methods to accommodate Tucker and non-Tucker modes.
- Enhanced network base to include unet_conv_target_modules for architecture detection.

* fix: rank dropout handling in LoRAModule for Conv2d and Linear layers, see #2272 for details

* doc: add dtype comment for load_safetensors_with_lora_and_fp8 function

* fix: enhance architecture detection to support InferSdxlUNet2DConditionModel for gen_img.py

* doc: update model support structure to include Lumina Image 2.0, HunyuanImage-2.1, and Anima-Preview

* doc: add documentation for LoHa and LoKr fine-tuning methods

* Update networks/network_base.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update docs/loha_lokr.md

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* fix: refactor LoHa and LoKr imports for weight merging in load_safetensors_with_lora_and_fp8 function

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
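For readers unfamiliar with the two methods added in the commit above, the weight deltas they rebuild can be sketched in pure Python with toy matrices (function and parameter names here are illustrative, not the actual attributes used in networks/loha.py or networks/lokr.py):

```python
# LoHa rebuilds the weight delta as the Hadamard (element-wise) product of
# two low-rank factorizations; LoKr rebuilds it as the Kronecker product of
# two small factors. Toy list-of-lists matrices keep the sketch dependency-free.

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def hadamard(a, b):
    return [[a[i][j] * b[i][j] for j in range(len(a[0]))]
            for i in range(len(a))]

def kron(a, b):
    rows_b, cols_b = len(b), len(b[0])
    return [[a[i // rows_b][j // cols_b] * b[i % rows_b][j % cols_b]
             for j in range(len(a[0]) * cols_b)]
            for i in range(len(a) * rows_b)]

def loha_delta(w1a, w1b, w2a, w2b):
    # LoHa: delta_W = (w1a @ w1b) ⊙ (w2a @ w2b)
    return hadamard(matmul(w1a, w1b), matmul(w2a, w2b))

def lokr_delta(w1, w2):
    # LoKr: delta_W = w1 ⊗ w2
    return kron(w1, w2)
```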
