Variable.volatile forces outputs to not require gradients if any of the inputs are marked volatile. This works OK in the forward pass, but we're forced to change its meaning in the backward pass. Gradients are sometimes volatile and sometimes not, which is awkward when you add them back to parameters, as optimizers do.
We should replace volatile with a context manager in Python (Chainer already did this with no_backprop_mode()).
At the C++ level, we should replace it with a thread-local global switch.
This will simplify the logic in the backward pass: by default, backward() will run in "no-backprop mode" unless create_graph is True.
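
To make the proposal concrete, here is a minimal pure-Python sketch of a thread-local switch wrapped in a context manager. It is not the actual PyTorch or Chainer implementation: the names is_grad_enabled, no_backprop_mode (borrowed from Chainer's naming only), and run_backward are hypothetical, and the real switch would live at the C++ level as described above.

```python
import threading
from contextlib import contextmanager

# Hypothetical thread-local switch; each thread sees its own value.
_state = threading.local()


def is_grad_enabled():
    """Return True if graph construction is enabled in the current thread."""
    return getattr(_state, "grad_enabled", True)


@contextmanager
def no_backprop_mode():
    """Disable graph construction inside the `with` block (hypothetical name)."""
    prev = is_grad_enabled()
    _state.grad_enabled = False
    try:
        yield
    finally:
        # Restore the previous value so nested blocks and exceptions behave.
        _state.grad_enabled = prev


def run_backward(compute_grads, create_graph=False):
    """Run a gradient computation, building a graph only if create_graph is True.

    `compute_grads` stands in for whatever the engine does in the backward pass.
    """
    if create_graph:
        return compute_grads()
    with no_backprop_mode():
        return compute_grads()
```

With this shape, whether gradients require grad depends only on the mode active during the backward pass, not on a per-Variable flag, so an optimizer always sees the same kind of gradient unless create_graph was requested.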