LoRA training QoL improvements: UI progress bar, deterministic seeding, make gradient checkpointing optional #8668
spacepxl wants to merge 1 commit into Comfy-Org:master
Conversation
@KohakuBlueleaf what do you think?
Not sure about the UI part, but the others are easy. I'll do them after I finish the refactor (which will affect seeding).
@comfyanonymous Maybe you can consider merging this first, then I will resolve the conflicts on my PR.
Hi @spacepxl |

Adding the UI progress bar lets users see the training progress in the UI (obviously), but it also makes it possible to cancel training.
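
For reference, ComfyUI exposes a `ProgressBar` helper in `comfy.utils` that reports progress to the frontend; a rough sketch of wiring it into a training loop could look like the following (the `training_step` callable is hypothetical, not from this PR):

```python
import comfy.utils
import comfy.model_management

def train_loop(num_steps, training_step):
    pbar = comfy.utils.ProgressBar(num_steps)
    for step in range(num_steps):
        # Raises if the user pressed cancel in the UI, which is what
        # makes the training loop interruptible between steps.
        comfy.model_management.throw_exception_if_processing_interrupted()
        loss = training_step(step)  # hypothetical per-step training function
        pbar.update(1)  # advance the frontend progress bar by one step
```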
Gradient checkpointing, especially with this many checkpoints, is computationally expensive and unnecessary when memory isn't a constraint. I left it enabled by default, but disabling it is a free speed boost.
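
As a minimal sketch (not the PR's exact code), making checkpointing optional can be as simple as branching on a flag around each block, using the standard `torch.utils.checkpoint` API:

```python
import torch
from torch.utils.checkpoint import checkpoint

def run_block(block, x, use_checkpointing=True):
    # With checkpointing on, activations are recomputed during backward
    # (less memory, more compute); with it off, the block runs normally
    # and keeps its activations (more memory, faster).
    if use_checkpointing and torch.is_grad_enabled():
        return checkpoint(block, x, use_reentrant=False)
    return block(x)
```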
As for seeding, I replaced the unused generator: instead, I temporarily store the global RNG states, seed everything, and restore the states after training finishes. This seeds the weight initialization without needing to pass a generator object all over the place. The RNG of weight initialization is pretty significant: if it's allowed to be random, workflows that incorporate LoRA training directly (instead of loading a trained file) would be impossible to reproduce. It also seeds timestep sampling, which is the main factor driving the training loss at small batch sizes. A sketch of the pattern follows.
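
A minimal, illustrative version of that save/seed/restore pattern (assumed structure, not the exact PR code):

```python
import random
import numpy as np
import torch

def seeded_training(seed, train_fn):
    # Capture the current global RNG states
    py_state = random.getstate()
    np_state = np.random.get_state()
    torch_state = torch.get_rng_state()
    cuda_states = torch.cuda.get_rng_state_all() if torch.cuda.is_available() else None
    try:
        # Seed every global generator so weight init and timestep
        # sampling are reproducible from the workflow's seed
        random.seed(seed)
        np.random.seed(seed)
        torch.manual_seed(seed)  # also seeds all CUDA devices
        return train_fn()
    finally:
        # Restore the original states so the rest of the workflow is unaffected
        random.setstate(py_state)
        np.random.set_state(np_state)
        torch.set_rng_state(torch_state)
        if cuda_states is not None:
            torch.cuda.set_rng_state_all(cuda_states)
```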
With this change, fp32 training is now fully deterministic. bf16 training is still partially nondeterministic, though, and I wasn't able to track down the cause; I'm guessing it could be related to stochastic rounding?