Save/loading AdamW optimizer (for hypernetworks) #3975
AUTOMATIC1111 merged 16 commits into AUTOMATIC1111:master
Conversation
|
Now this is just a thought, but would it be possible to save the optimizer state in a separate file next to the hypernetwork? That would remove the need to prune the file afterwards. Optimizer "sidecars" might also enable reuse of an optimizer state when restarting from scratch? Please disregard if that would not work; I'm not familiar with the technical details. |
|
is there any demonstration of the benefit this brings |
|
@AUTOMATIC1111 Yes, AdamW (and momentum-based optimizers in general) uses an adaptive learning rate, which is estimated from its momentum. If we start from zero, AdamW will apply the given raw learning rate and observe the effect. If we resume properly, AdamW follows the saved trajectory, which generally means a lower effective learning rate. Loading the optimizer state does not make training deterministic, since randomness is still used when trying to escape local minima. Multiple users on Discord reported that HN training continued from saved checkpoints does not work well, especially at the beginning; the HN sometimes tends to 'die' quickly for some reason. It is also known that lowering the learning rate when resuming generally helps. This should be the optimizer's fault - it knows nothing about its previous trajectory. Here's a general discussion about loading optimizers. With this patch, I never observed the drastic style changes or transitions of preview images seen before. But this might mean that someone might want to nuke the optimizer state to intentionally trigger a style change / transition? |
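For reference, the underlying PyTorch pattern is simply persisting the optimizer's state dict alongside the network. A minimal sketch follows; the tiny `nn.Linear` network and the file name are illustrative stand-ins, not code from this PR:

```python
import torch
import torch.nn as nn

# Illustrative stand-in for a hypernetwork layer.
net = nn.Linear(4, 4)
optimizer = torch.optim.AdamW(net.parameters(), lr=1e-5)

# Saving: persist the optimizer state so AdamW's running first/second
# moment estimates survive a restart.
torch.save(optimizer.state_dict(), 'hypernetwork.pt.optim')

# Resuming: restore the moments so the adaptive step sizes continue
# along the previous trajectory instead of restarting "cold".
optimizer.load_state_dict(torch.load('hypernetwork.pt.optim'))
```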
|
Someone suggested that we could have a separate file for the optimizer state, which can be distributed separately. |
|
Finished implementing and testing. Files will be saved separately as *.pt and *.pt.optim files. To validate the optimizer state, it now uses the hash value of the hypernetwork itself. Closes #4048 |
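A rough sketch of how such a sidecar scheme can work; the `shorthash` helper and the exact dict layout here are assumptions for illustration, not the merged code:

```python
import hashlib
import os

import torch

def shorthash(path):
    # Illustrative stand-in for the webui's file-hash helper: hash the
    # hypernetwork file so a stale .optim sidecar can be detected.
    with open(path, 'rb') as f:
        return hashlib.sha256(f.read()).hexdigest()[:10]

def save_optim_sidecar(optimizer, hn_path):
    # Store the optimizer state next to the hypernetwork, tagged with
    # the hash of the .pt file it belongs to.
    torch.save({'hash': shorthash(hn_path),
                'optimizer_state_dict': optimizer.state_dict()},
               hn_path + '.optim')

def try_resume_optim(optimizer, hn_path):
    optim_path = hn_path + '.optim'
    if not os.path.exists(optim_path):
        return False
    saved = torch.load(optim_path)
    # Only load the sidecar if it matches this exact .pt file.
    if saved.get('hash') != shorthash(hn_path):
        return False
    optimizer.load_state_dict(saved['optimizer_state_dict'])
    return True
```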
|
Temporarily closed to resolve conflicts. |
|
Finished testing again.
|
I think there isn't a scenario where the user would want to put the optimizer state into the checkpoint itself when saving to a separate file exists. I don't want to have useless options, so please remove that and the code to support it. |
|
Yeah, I removed that option (merging the optimizer into the checkpoint itself). Now it only saves it into a separate file. |
|
@aria1th Whenever you load a hypernetwork that also has a .optim file in the hypernetwork directory, the UI says that it not only loads the hypernetwork, but also the optimizer. Is this intended? I'm not trying to resume training with the optimizer, I'm just selecting the hypernetwork for normal inference. |



Closes #3894, since its log is messed up.
Optimizers, especially Adam and its variants, are recommended to have their state saved and loaded.
This patch offers a way to save / load optimizer state, and also supports user-selected optimizer types, such as "SGD", "Adam", etc.
If selecting the optimizer type is enabled, this line has to be changed for safety:
`if hypernetwork.optimizer_state_dict:`
to something like
`if hypernetwork.optimizer_name == hypernetwork_optimizer_type and hypernetwork.optimizer_state_dict:`
to prevent loading the wrong state dict for mismatching optimizer types.
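A sketch of that guarded load (names follow the lines quoted above; `hypernetwork_optimizer_type` is assumed to be the user-selected optimizer type from settings):

```python
def maybe_load_optimizer_state(optimizer, hypernetwork, hypernetwork_optimizer_type):
    # Only reuse a saved state dict when it was produced by the same
    # optimizer type the user currently has selected; AdamW moment
    # estimates make no sense inside, e.g., a plain SGD optimizer.
    if (hypernetwork.optimizer_name == hypernetwork_optimizer_type
            and hypernetwork.optimizer_state_dict):
        optimizer.load_state_dict(hypernetwork.optimizer_state_dict)
    else:
        # Mismatched or missing state: start the optimizer fresh.
        hypernetwork.optimizer_state_dict = None
```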
Users will see a new option in the Training section.
This option should only be enabled when they plan to continue training in the future.
Training can continue without saving the optimizer state, but some users reported that it sometimes blew up when continued from a checkpoint... must be bad luck of the optimizer...
For releasing a HN, it is recommended to turn off the option (with the Apply button) before saving / interrupting training.
A standard (1, 2, 1) network file size comparison is here; it is roughly a 3x size difference.
Current Task
Save and load optimizer state dict
People complained about the optimizer not resuming properly; this was because we didn't save the optimizer state dict.
Generalized way to save / load optimizers
This generalizes the optimizer resuming process. It does not necessarily mean more optimizer options will be offered immediately.
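One plausible shape for that generalization, sketched under the assumption that optimizers are selected by name; the `optimizer_dict` mapping is illustrative, not the merged code:

```python
import torch

# Map user-visible optimizer names to their torch constructors so the
# training loop can build and resume any of them uniformly.
optimizer_dict = {
    'AdamW': torch.optim.AdamW,
    'Adam': torch.optim.Adam,
    'SGD': torch.optim.SGD,
}

def make_optimizer(name, params, lr, saved_state_dict=None):
    optimizer = optimizer_dict[name](params, lr=lr)
    if saved_state_dict is not None:
        # Only valid when the state was saved by the same optimizer
        # type; callers should check the saved name before passing it.
        optimizer.load_state_dict(saved_state_dict)
    return optimizer
```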