Serialization and online learning
Hi
I saw that "Add model serialization" is on the Trello to-do list.
If I can serialize the model, can I simply reload the old model and continue training it on the new interaction data? I guess there would be a learning-rate problem, at least with the Adam optimizer. What do you do in practice? Can you recommend something to read?
Thank you!
This should work. I am working on tests to make sure that it really does.
In principle, the optimizer's internal state will be serialized as well, so there should be no problem resuming training.
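In PyTorch, the usual pattern is to save both `model.state_dict()` and `optimizer.state_dict()` (for example with `torch.save`) and restore them with `load_state_dict`. The dependency-free sketch below illustrates why the optimizer state matters; it uses a hypothetical `TinyAdam` class on a toy quadratic loss rather than PyTorch itself. Resuming from a pickled copy of both the parameters and the optimizer reproduces the uninterrupted run exactly, while restarting with a fresh optimizer does not.

```python
import math
import pickle


class TinyAdam:
    """Minimal Adam over a list of scalar parameters (hypothetical stand-in, not PyTorch)."""

    def __init__(self, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.step_count = 0
        self.m = None  # first-moment (mean) estimates
        self.v = None  # second-moment (uncentered variance) estimates

    def step(self, params, grads):
        if self.m is None:
            self.m = [0.0] * len(params)
            self.v = [0.0] * len(params)
        self.step_count += 1
        bc1 = 1 - self.beta1 ** self.step_count  # bias corrections
        bc2 = 1 - self.beta2 ** self.step_count
        new_params = []
        for i, (p, g) in enumerate(zip(params, grads)):
            self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * g * g
            m_hat, v_hat = self.m[i] / bc1, self.v[i] / bc2
            new_params.append(p - self.lr * m_hat / (math.sqrt(v_hat) + self.eps))
        return new_params


def grad(params, target):
    # Gradient of the toy loss sum((p - t)^2).
    return [2 * (p - t) for p, t in zip(params, target)]


def train(params, opt, target, steps):
    for _ in range(steps):
        params = opt.step(params, grad(params, target))
    return params


target = [3.0, -1.0]

# Uninterrupted run: 20 steps in one go.
uninterrupted = train([0.0, 0.0], TinyAdam(), target, 20)

# Interrupted run: 10 steps, serialize parameters AND optimizer, reload, 10 more steps.
opt = TinyAdam()
params = train([0.0, 0.0], opt, target, 10)
restored = pickle.loads(pickle.dumps({"params": params, "optimizer": opt}))
resumed = train(restored["params"], restored["optimizer"], target, 10)
print(resumed == uninterrupted)  # True

# Reloading only the parameters and starting a fresh optimizer loses the moment estimates.
fresh = train(params, TinyAdam(), target, 10)
print(fresh == uninterrupted)  # False
```

The exact match relies on pickle round-tripping Python floats losslessly and on the updates being applied in the same order.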
Thank you for the reply! I think optimizers like SGD would handle this without trouble. For optimizers like Adam, however, the learning rate for each parameter is adapted according to the history of its gradients, so the updates for existing parameters could get quite small. I don't know whether this is the expected behavior, given that the new interaction data is more important than the historical data.
Certainly for Adagrad, the learning rate goes to zero as the number of training examples grows. I'm less sure this is true of Adam: I suspect it may converge to some small but non-zero value.
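A back-of-the-envelope comparison of the per-coordinate step multipliers, under the assumption (not established in the thread) that the squared gradients for a coordinate have a roughly stationary mean sigma^2. Adagrad divides by the running sum of squared gradients, while Adam divides by an exponential moving average of them:

```latex
% Adagrad: running SUM of squared gradients, so the multiplier decays like 1/sqrt(t)
\eta^{\text{Adagrad}}_t = \frac{\eta}{\sqrt{\sum_{s=1}^{t} g_s^2} + \epsilon}
\;\approx\; \frac{\eta}{\sigma\sqrt{t}} \;\xrightarrow{\;t\to\infty\;}\; 0,
\qquad \text{if } \mathbb{E}[g_s^2] = \sigma^2

% Adam: exponential MOVING AVERAGE of squared gradients, with bias correction
v_t = \beta_2 v_{t-1} + (1-\beta_2)\, g_t^2, \qquad
\hat{v}_t = \frac{v_t}{1-\beta_2^{\,t}}, \qquad
\eta^{\text{Adam}}_t = \frac{\eta}{\sqrt{\hat{v}_t} + \epsilon}
\;\approx\; \frac{\eta}{\sigma + \epsilon}
```

Under that assumption Adam's multiplier levels off instead of vanishing. The actual Adam update multiplies it by the bias-corrected first moment m_hat, so steps can still shrink when the gradients themselves shrink, for example near a minimum.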
I think this reflects the fact that few applications run truly online models, where the parameters are updated as the data comes in. It's much more common to fit once, publish, and retrain from scratch when new data becomes available.
You may also be interested in the literature on SGD with restarts. I haven't followed it closely, but it seems to have some intriguing results.
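As a rough illustration of "SGD with restarts", here is a minimal sketch of a cosine-annealed learning rate with warm restarts in the style of SGDR (Loshchilov and Hutter). The function name `sgdr_lr` and its default values are assumptions for illustration; PyTorch ships a comparable scheduler as `torch.optim.lr_scheduler.CosineAnnealingWarmRestarts`.

```python
import math


def sgdr_lr(epoch, eta_max=0.1, eta_min=0.0, t_0=10, t_mult=2):
    """Cosine-annealed learning rate with warm restarts (SGDR-style schedule).

    The first cycle lasts t_0 epochs; each later cycle is t_mult times longer.
    """
    t_i, t_cur = t_0, epoch
    # Skip over completed cycles to find the position within the current cycle.
    while t_cur >= t_i:
        t_cur -= t_i
        t_i *= t_mult
    # Decay from eta_max to eta_min along half a cosine period within the cycle.
    return eta_min + 0.5 * (eta_max - eta_min) * (1 + math.cos(math.pi * t_cur / t_i))


# Cycles cover epochs [0, 10), [10, 30), [30, 70), ...; each restart jumps back to eta_max.
print(sgdr_lr(0), sgdr_lr(10), sgdr_lr(30))  # 0.1 0.1 0.1
```

The periodic jump back to a large learning rate is what makes this family of schedules interesting for continued training: the optimizer is never permanently stuck with a tiny step size.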
@maciejkula Optimizers like FTRL seem to be useful for online learning in recommender systems, I think because they heavily regularize the weights. I haven't been able to find a PyTorch implementation yet. Do you have any experience with them?
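For reference, the per-coordinate FTRL-Proximal update published by McMahan et al. (2013) is short enough to sketch without a framework. The class name `FTRLProximal`, the hyperparameter defaults, and the fixed gradient stream in the usage lines are hypothetical; porting this to a PyTorch `Optimizer` subclass is not attempted here. The L1 term is what produces the heavy regularization: coordinates whose accumulated signal stays below `l1` are held at exactly zero.

```python
import math


class FTRLProximal:
    """Per-coordinate FTRL-Proximal (after McMahan et al., 2013); illustrative sketch."""

    def __init__(self, dim, alpha=0.1, beta=1.0, l1=1.0, l2=1.0):
        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
        self.z = [0.0] * dim  # accumulated gradients, adjusted by the proximal term
        self.n = [0.0] * dim  # accumulated squared gradients

    def weights(self):
        w = []
        for z_i, n_i in zip(self.z, self.n):
            if abs(z_i) <= self.l1:
                w.append(0.0)  # L1 term clips weak coordinates to exactly zero
            else:
                denom = (self.beta + math.sqrt(n_i)) / self.alpha + self.l2
                w.append(-(z_i - math.copysign(self.l1, z_i)) / denom)
        return w

    def update(self, grads):
        w = self.weights()  # weights used to compute these gradients
        for i, g in enumerate(grads):
            sigma = (math.sqrt(self.n[i] + g * g) - math.sqrt(self.n[i])) / self.alpha
            self.z[i] += g - sigma * w[i]
            self.n[i] += g * g
        return self.weights()


# Hypothetical fixed gradient stream: feature 0 carries a strong, consistent
# signal; feature 1 only a weak one. (In real use, gradients depend on the weights.)
opt = FTRLProximal(dim=2)
for _ in range(20):
    w = opt.update([-1.0, 0.01])
print(w[0] > 0, w[1] == 0.0)  # True True
```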
Not really. While I think Adagrad is a poor choice (its learning rate goes to zero), I suspect Adam, SGD, and SGD with momentum will all work quite well in this setting.
You could verify this by plotting the optimizer's learning rates as you fit the model on more and more data.
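A minimal simulation of that check, assuming a stationary stream of Gaussian gradients as a hypothetical stand-in for "more and more data". It records, for one scalar parameter, the factor each optimizer multiplies the gradient signal by: `lr / (sqrt(sum of g^2) + eps)` for Adagrad and `lr / (sqrt(v_hat) + eps)` for Adam. The function name `effective_lrs` is illustrative; the resulting lists are what you would plot.

```python
import math
import random


def effective_lrs(grads, lr=0.01, beta2=0.999, eps=1e-8):
    """Per-step effective learning rates of Adagrad and Adam for one scalar parameter."""
    sum_sq, v = 0.0, 0.0
    adagrad, adam = [], []
    for t, g in enumerate(grads, start=1):
        sum_sq += g * g                           # Adagrad: running sum
        v = beta2 * v + (1 - beta2) * g * g       # Adam: moving average
        v_hat = v / (1 - beta2 ** t)              # Adam's bias correction
        adagrad.append(lr / (math.sqrt(sum_sq) + eps))
        adam.append(lr / (math.sqrt(v_hat) + eps))
    return adagrad, adam


random.seed(0)
stream = [random.gauss(0.0, 1.0) for _ in range(20000)]
adagrad, adam = effective_lrs(stream)
for t in (100, 1000, 20000):
    print(t, f"adagrad={adagrad[t - 1]:.2e}", f"adam={adam[t - 1]:.2e}")
```

For a real model, the corresponding Adam quantities can be read from the optimizer: with `torch.optim.Adam`, `optimizer.state[p]` holds `exp_avg_sq` and `step` for each parameter `p`, which (assuming the default, non-AMSGrad variant) give `v_hat = exp_avg_sq / (1 - beta2 ** step)`.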