
Save/loading AdamW optimizer (for hypernetworks)#3975

Merged
AUTOMATIC1111 merged 16 commits into AUTOMATIC1111:master from aria1th:force-push-patch-13
Nov 5, 2022
Conversation

Collaborator

@aria1th aria1th commented Oct 30, 2022

Closes #3894, which is superseded because its commit log was messed up.

Optimizers, especially Adam and its variants, are recommended to have their state saved and loaded.

This patch offers a way to save and load the optimizer state, and also supports user-selected optimizer types, such as "SGD", "Adam", etc.

If selecting the optimizer type is enabled, this line has to be changed for safety:
if hypernetwork.optimizer_state_dict:
to something like
if hypernetwork.optimizer_name == hypernetwork_optimizer_type and hypernetwork.optimizer_state_dict:

to prevent loading the wrong state dict for a mismatching optimizer type.
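The guard above can be sketched as a small standalone check (a minimal illustration, not the webui code; `FakeHN` and the function name are hypothetical stand-ins for the real hypernetwork object and fields):

```python
def should_load_optimizer_state(hypernetwork, selected_optimizer_name):
    """Reuse a saved state dict only if it matches the optimizer the user picked."""
    return (
        hypernetwork.optimizer_state_dict is not None
        and hypernetwork.optimizer_name == selected_optimizer_name
    )

class FakeHN:
    # Stand-in for the real hypernetwork object with the two fields discussed above.
    def __init__(self, name, state):
        self.optimizer_name = name
        self.optimizer_state_dict = state

hn = FakeHN("AdamW", {"step": 1})
load_adamw = should_load_optimizer_state(hn, "AdamW")  # True: names match, state present
load_sgd = should_load_optimizer_state(hn, "SGD")      # False: user switched optimizers
```

Without the name check, an AdamW state dict could be fed to an SGD instance, which is exactly the mismatch this condition prevents.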

Users will see a new option in the Training section:

[screenshot: the new option in the Training section]

This option should only be enabled when the user plans to continue training later.

Training can continue without saving the optimizer state, but some users reported that training sometimes blows up when continued from a checkpoint... must be bad luck with the optimizer...

For releasing an HN, it is recommended to turn the option off (and click the Apply button) before saving or interrupting training.

A standard (1, 2, 1) network file size comparison is shown below; it is roughly a 3x size difference.

[screenshot: file size comparison]

Current Task

  • Save and load optimizer state dict
    People complained about the optimizer not resuming properly; that was because we didn't save the optimizer state dict.

  • Generalized way to save / load optimizers
    This generalizes the optimizer resuming process. It does not necessarily mean more optimizer options will be offered immediately.

@Arilziem

Arilziem commented Nov 1, 2022

Now this is just a thought, but would it be possible to save the optimizer state in a separate file next to the hypernetwork? That would remove the need to prune the file afterwards.

Optimizer "sidecars" might also enable reuse of an optimizer state when restarting from scratch? Please disregard if that would not work, I'm not familiar with the technical details.
[Edit: This comment was invisible until 2022-11-11 as my fresh account was flagged at the time of writing, thankfully @aria1th considered it after I emailed them]

@AUTOMATIC1111
Owner

Is there any demonstration of the benefit this brings?

@aria1th
Collaborator Author

aria1th commented Nov 3, 2022

@AUTOMATIC1111 Yes. AdamW (and momentum-based optimizers in general) uses an adaptive learning rate, which is estimated from its momentum terms.

If we start from zero, AdamW will apply the given learning rate at nearly full strength while it rebuilds its statistics.

If we resume properly, AdamW follows its previous trajectory, which generally means a lower effective learning rate.

Loading the optimizer state does not make training deterministic, since randomness is still used to escape local minima.

Multiple users on Discord reported that HN training does not continue well from saved checkpoints; especially at the beginning, the HN sometimes tends to 'die' quickly for some reason. It is also known that lowering the learning rate when resuming generally helps.

This should be the optimizer's fault: it knows nothing about its previous trajectory.

Here's a general discussion about loading optimizers.

With this patch, I never observed the drastic style changes or transitions in preview images that happened before. But this might mean that someone would want to nuke the optimizer state to intentionally trigger a style change or transition?
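The warm-start effect can be illustrated with a toy scalar Adam update (a minimal sketch of the standard Adam formulas, not the webui code; the `warm` state values are made up for illustration): a fresh optimizer takes a near-full-learning-rate step, while a resumed state with a large accumulated second moment takes a much smaller one.

```python
import math

def adam_delta(lr, grad, state, beta1=0.9, beta2=0.999, eps=1e-8):
    # One Adam update on a single scalar parameter; returns the applied step size.
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad * grad
    m_hat = state["m"] / (1 - beta1 ** state["t"])   # bias-corrected 1st moment
    v_hat = state["v"] / (1 - beta2 ** state["t"])   # bias-corrected 2nd moment
    return lr * m_hat / (math.sqrt(v_hat) + eps)

grad = 0.01
cold = {"t": 0, "m": 0.0, "v": 0.0}        # fresh optimizer, no history
warm = {"t": 1000, "m": 0.01, "v": 0.25}   # resumed state with accumulated 2nd moment

cold_step = adam_delta(1e-3, grad, cold)   # close to the raw learning rate
warm_step = adam_delta(1e-3, grad, warm)   # far smaller: history damps the step
```

This is the "given pure learning rate" versus "given trajectory" distinction above: the accumulated second moment in the resumed state scales the step down, while a zeroed state effectively restarts at full strength.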

@aria1th
Collaborator Author

aria1th commented Nov 3, 2022

Someone suggested to me that we could have a separate file for the optimizer, which could be distributed separately.
I think it's a good idea, so I'll try working on it.

@aria1th
Collaborator Author

aria1th commented Nov 3, 2022


Finished implementing and testing.

Files will be saved separately as a *.pt and a *.pt.optim file.

To make sure the optimizer state is valid for the network, the hash of the network file itself is now used.

Closes #4048
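The hash pairing can be sketched like this (hypothetical helper names; JSON stands in for torch.save here, but the stale-sidecar check is the same idea: record the network file's hash when saving the optimizer state, and refuse the *.optim file if the network has changed since):

```python
import hashlib
import json
import os
import tempfile

def file_sha256(path):
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

def save_optim_sidecar(hn_path, optimizer_state):
    # Record the hash of the network file this optimizer state belongs to.
    payload = {"hash": file_sha256(hn_path), "state": optimizer_state}
    with open(hn_path + ".optim", "w") as f:
        json.dump(payload, f)

def load_optim_sidecar(hn_path):
    # Refuse the sidecar if the network file changed since the state was saved.
    try:
        with open(hn_path + ".optim") as f:
            payload = json.load(f)
    except FileNotFoundError:
        return None
    if payload["hash"] != file_sha256(hn_path):
        return None
    return payload["state"]

tmpdir = tempfile.mkdtemp()
hn_file = os.path.join(tmpdir, "mynet.pt")
with open(hn_file, "wb") as f:
    f.write(b"network weights v1")
save_optim_sidecar(hn_file, {"step": 42})
loaded_fresh = load_optim_sidecar(hn_file)   # hash matches: state is returned
with open(hn_file, "wb") as f:
    f.write(b"network weights v2")           # simulate the network changing
loaded_stale = load_optim_sidecar(hn_file)   # hash mismatch: state is rejected
```

The rejection path is what prevents a resumed run from picking up momentum statistics that belong to a different set of weights.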

@aria1th
Collaborator Author

aria1th commented Nov 4, 2022

Temporarily closed to resolve a merge conflict.

@aria1th aria1th closed this Nov 4, 2022
@aria1th
Collaborator Author

aria1th commented Nov 4, 2022

Finished testing again:
The HN file itself won't contain the optimizer state in any case.

  1. Tested that an HN can be saved and loaded with a separate optimizer file.
  2. Tested that HNs themselves are properly saved without the hash in the name.
  3. Tested that HNs will not use the optimizer file if its saved hash differs.
  4. Tested that HNs won't save an optimizer file if the option is disabled.

@aria1th aria1th reopened this Nov 4, 2022
@AUTOMATIC1111
Owner

I don't think there is a scenario where the user would want to put the optimizer state into the checkpoint itself when saving to a separate file exists. I don't want to have useless options, so please remove that one and the code to support it.

@aria1th
Collaborator Author

aria1th commented Nov 4, 2022

Yeah, I removed that option (merging the optimizer into the checkpoint itself). Now it only saves to a separate *.optim file.
I'll update the explanation in shared.py.

@AUTOMATIC1111 AUTOMATIC1111 merged commit e96c434 into AUTOMATIC1111:master Nov 5, 2022
@Leon-Schoenbrunn

@aria1th Whenever you load a hypernetwork that also has a .optim file in the hypernetwork directory, the UI says that it not only loads the hypernetwork, but also the optimizer. Is this intended? I'm not trying to resume training with the optimizer, I'm just selecting the hypernetwork for normal inference.

[screenshot: UI message showing both the hypernetwork and the optimizer being loaded]
