Fix the LoRA dropout issue in the Anima model and implement support for network_reg_alphas during LoRA training #2272
Conversation
Thank you for the pull request, and sorry for the delayed review. Regarding the quoted change: would it be possible to remove it? Thank you for your understanding.
Yeah, I removed it. However, I think it would be great if we could customize network_reg_alphas when training different ranks for each LoRA component.
Thank you for the quick update! Just to share the reasoning behind my decision: the effective scaling of LoRA in each module is `scale = alpha / r`, i.e. the module contributes `(alpha / r) * lora_up(lora_down(x))` to the output. Since we can already control both lr and r per module, adding per-module alpha does not provide a fundamentally new axis of control: changing alpha can always be replicated by adjusting lr in the opposite direction. In my understanding, the original motivation for introducing alpha in the LoRA paper was as a stabilization mechanism that eliminates the need for hyperparameter re-tuning when changing rank. With a fixed alpha, you can change the rank and keep the rest of your hyperparameters as-is. So I think alpha was not designed as a parameter to be actively tuned per module.
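To make the scaling concrete, here is a minimal sketch of the `alpha / r` factor. The helper `lora_delta` is hypothetical, not the repository's actual code; it only illustrates that the contribution is linear in alpha, which is why a change in alpha can be compensated elsewhere:

```python
import numpy as np

def lora_delta(x, down, up, alpha, r):
    """Hypothetical minimal LoRA contribution: (alpha / r) * up @ down @ x."""
    return (alpha / r) * (up @ (down @ x))

rng = np.random.default_rng(0)
r, d_in, d_out = 4, 8, 8
down = rng.standard_normal((r, d_in))   # lora_down weight, shape (r, d_in)
up = rng.standard_normal((d_out, r))    # lora_up weight, shape (d_out, r)
x = rng.standard_normal(d_in)

# With alpha == r the scale is exactly 1
base = lora_delta(x, down, up, alpha=r, r=r)
# Doubling alpha doubles the contribution linearly
doubled = lora_delta(x, down, up, alpha=2 * r, r=r)
```

Because the factor enters linearly, `doubled` is exactly `2 * base`, which is the sense in which per-module alpha overlaps with per-module lr as a knob.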
Oh, I think the code is clear now. Would you mind reviewing it? I agree with you about alpha and rank. If you have any questions, feel free to ask.
Thank you for the update again! The intention of this code is to match the mask and lx shapes, so the following may be sufficient.
I'm not sure. In my opinion, we should keep them separate, because if lora_down is a Conv2d layer, training may break.
Oh, I see. Sorry, I was wrong. You're right: Conv2d is channel-first.
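The shape mismatch being discussed can be sketched as follows. This is a hedged illustration (the function name `apply_rank_dropout` and its signature are hypothetical, not the PR's code) of why the dropout mask must broadcast differently for Linear, where the rank sits on the last axis, and for channel-first Conv2d, where it sits on axis 1:

```python
import numpy as np

def apply_rank_dropout(lx, rank, dropout_p, is_conv2d, rng):
    """Zero out whole rank components of the lora_down output `lx`."""
    mask = (rng.random(rank) > dropout_p).astype(lx.dtype)
    if is_conv2d:
        # Conv2d is channel-first: lx has shape (B, rank, H, W),
        # so the mask must sit on axis 1
        mask = mask.reshape(1, rank, 1, 1)
    else:
        # Linear: lx has shape (..., rank); mask sits on the last axis
        mask = mask.reshape((1,) * (lx.ndim - 1) + (rank,))
    # Rescale so the expected magnitude is unchanged
    return lx * mask / (1.0 - dropout_p)

rng = np.random.default_rng(0)
lx_linear = np.ones((2, 3, 8))   # (batch, tokens, rank)
lx_conv = np.ones((2, 8, 4, 4))  # (batch, rank, H, W)
out_linear = apply_rank_dropout(lx_linear, 8, 0.5, False, rng)
out_conv = apply_rank_dropout(lx_conv, 8, 0.5, True, rng)
```

Reusing one reshape for both layer types is what breaks: a `(..., rank)`-shaped mask applied to a `(B, rank, H, W)` tensor either fails to broadcast or silently masks the wrong axis.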
* feat: Add LoHa/LoKr network support for SDXL and Anima
  - networks/network_base.py: shared AdditionalNetwork base class with architecture auto-detection (SDXL/Anima) and generic module injection
  - networks/loha.py: LoHa (Low-rank Hadamard Product) module with HadaWeight custom autograd, training/inference classes, and factory functions
  - networks/lokr.py: LoKr (Low-rank Kronecker Product) module with factorization, training/inference classes, and factory functions
  - library/lora_utils.py: extend weight merge hook to detect and merge LoHa/LoKr weights alongside standard LoRA
  Linear and Conv2d 1x1 layers only; Conv2d 3x3 (Tucker decomposition) support will be added separately.
  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: Enhance LoHa and LoKr modules with Tucker decomposition support
  - Added Tucker decomposition functionality to LoHa and LoKr modules.
  - Implemented new methods for weight rebuilding using Tucker decomposition.
  - Updated initialization and weight handling for Conv2d 3x3+ layers.
  - Modified get_diff_weight methods to accommodate Tucker and non-Tucker modes.
  - Enhanced network base to include unet_conv_target_modules for architecture detection.
* fix: rank dropout handling in LoRAModule for Conv2d and Linear layers, see #2272 for details
* doc: add dtype comment for load_safetensors_with_lora_and_fp8 function
* fix: enhance architecture detection to support InferSdxlUNet2DConditionModel for gen_img.py
* doc: update model support structure to include Lumina Image 2.0, HunyuanImage-2.1, and Anima-Preview
* doc: add documentation for LoHa and LoKr fine-tuning methods
* Update networks/network_base.py
  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Update docs/loha_lokr.md
  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* fix: refactor LoHa and LoKr imports for weight merging in load_safetensors_with_lora_and_fp8 function
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Hi @kohya-ss
During testing, I discovered a bug where training immediately crashes when rank_dropout is used. The issue stems from the Flux LoRA implementation.
This pull request fixes the rank_dropout issue and adds support for network_reg_alphas, allowing different alpha values to be set per module during LoRA training for the Anima model.