🚀 Feature
nn.Module.register_buffer() should get a keyword argument persistent=True. If set to False, the buffer will not be included in the output of state_dict(), and will not be loaded in _load_from_state_dict().
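To make the desired behavior concrete, this is how the proposed API would be used (the persistent keyword does not exist yet; this sketch shows the proposal, not current PyTorch):

```python
import torch
import torch.nn as nn

class MyModule(nn.Module):
    def __init__(self):
        super().__init__()
        # Proposed: a buffer that follows the module across devices and
        # dtypes, but is left out of state_dict(). The persistent keyword
        # is the feature being requested here, it does not exist yet.
        self.register_buffer('dct_matrix', torch.randn(8, 8), persistent=False)

m = MyModule()
assert 'dct_matrix' not in m.state_dict()          # desired behavior
assert 'dct_matrix' in dict(m.named_buffers())     # still a regular buffer
```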
Motivation
I repeatedly come across cases where I want to precompute a tensor at construction time that is then used in every forward() call. Let's take a Discrete Cosine Transform module as an example: the transform can be computed by generating a DCT matrix and applying a matrix product. Assuming the input size is fixed at construction time, it would be wasteful to recompute the matrix on every forward call.
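For concreteness, a minimal version of such a module; the (unnormalized) DCT-II matrix below is standard, and only the question of how to store it is open:

```python
import math
import torch
import torch.nn as nn

class DCT(nn.Module):
    """Applies an (unnormalized) DCT-II along the last input dimension."""

    def __init__(self, n):
        super().__init__()
        # Precompute the n x n DCT-II matrix once at construction time:
        # matrix[k, i] = cos(pi / n * (i + 0.5) * k)
        k = torch.arange(n, dtype=torch.float32).unsqueeze(1)
        i = torch.arange(n, dtype=torch.float32).unsqueeze(0)
        matrix = torch.cos(math.pi / n * (i + 0.5) * k)
        # Stored as a buffer so .cuda() etc. move it along -- but this
        # also puts it into state_dict(), which is the problem at hand.
        self.register_buffer('matrix', matrix)

    def forward(self, x):
        # A single matrix product per call instead of recomputing the matrix.
        return x @ self.matrix.t()
```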
I currently have three options: Making it an nn.Parameter, registering it as a buffer, or directly storing it as an attribute.
The first two cause it to be included in state_dict. It would be wasteful to store the matrix in every model, and it would lock me into that implementation -- if I decide to implement the DCT differently, I will have to implement a state dict hook that discards the matrix when loading older models, as sketched below.
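Such a compatibility hook could look roughly like this (a sketch continuing the DCT example; it assumes a reimplementation that no longer registers the matrix buffer):

```python
class DCT(nn.Module):
    # ... new implementation that no longer registers the 'matrix' buffer ...

    def _load_from_state_dict(self, state_dict, prefix, *args, **kwargs):
        # Discard the matrix saved by the old implementation so that
        # strict loading does not complain about an unexpected key.
        state_dict.pop(prefix + 'matrix', None)
        super()._load_from_state_dict(state_dict, prefix, *args, **kwargs)
```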
The third one does not include it in state_dict, but also does not convert it when calling .cuda(), .double(), etc. This can be fixed by overriding _apply() (sketched below), but I don't want to do this in every Module, and even then the model would not work with data_parallel() (which explicitly copies only the parameters and buffers).
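For reference, the _apply() workaround looks like this; it keeps the attribute in sync on .cuda()/.double()/.to(), but as noted, data_parallel() will still miss it (make_dct_matrix is a hypothetical helper standing in for the construction code above):

```python
class DCT(nn.Module):
    def __init__(self, n):
        super().__init__()
        # Plain attribute: never shows up in state_dict().
        self.matrix = make_dct_matrix(n)  # hypothetical helper, see above

    def forward(self, x):
        return x @ self.matrix.t()

    def _apply(self, fn):
        # Convert the attribute along with parameters and buffers, so
        # .cuda(), .double(), .to() etc. keep it on the right device/dtype.
        self.matrix = fn(self.matrix)
        return super()._apply(fn)
```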
Pitch
With register_buffer(name, tensor, persistent=False), I would want a buffer to be registered that is not stored and restored, but otherwise treated like any other buffer. The docstring of register_buffer already speaks of "persistent buffers", so it seems sensible to also allow "non-persistent buffers". From what I understand, this would only require changes in state_dict() and _load_from_state_dict(), as well as a way to track which buffers are non-persistent (a rough sketch follows).
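Here is a rough proof of concept on top of today's nn.Module, to show how small the change would be (the class name and the _non_persistent set are mine, purely illustrative, not existing internals):

```python
import torch.nn as nn

class ModuleWithNonPersistentBuffers(nn.Module):
    def __init__(self):
        super().__init__()
        self._non_persistent = set()  # names of buffers to exclude

    def register_buffer(self, name, tensor, persistent=True):
        super().register_buffer(name, tensor)
        if not persistent:
            self._non_persistent.add(name)

    def state_dict(self, destination=None, prefix='', keep_vars=False):
        # As usual, but leave out the non-persistent buffers.
        result = super().state_dict(destination=destination, prefix=prefix,
                                    keep_vars=keep_vars)
        for name in self._non_persistent:
            result.pop(prefix + name, None)
        return result

    def _load_from_state_dict(self, state_dict, prefix, *args, **kwargs):
        # Inject the current values so strict loading does not report
        # the non-persistent buffers as missing keys.
        for name in self._non_persistent:
            state_dict.setdefault(prefix + name, self._buffers[name])
        super()._load_from_state_dict(state_dict, prefix, *args, **kwargs)
```

With something like this in place, the DCT module above could simply pass persistent=False for its matrix; the real fix would of course live in nn.Module itself.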
Alternatives
My use case is about what would be a constant in graph-based frameworks: something that can be computed once and reused, something that is independent of the input to forward(). It would be possible to have an nn.Constant wrapper class similar to nn.Parameter, registered when assigned to a Module attribute and included whenever a model is moved or replicated. But my impression is that there are several places in PyTorch that assume a model only has parameters and buffers, and these would need updating to know about constants. Furthermore, it is not important that the value is constant, so that concept is too narrow.
There once was a proposal for "calculated parameters" (#7313 (comment)) that would also fit my use case, but it would cause more overhead for me, the humble developer -- I would need to implement a Module that computes the DCT matrix. All I want is a way to store Tensors as Module attributes that are not included in the state dict, but are moved across devices just like parameters or buffers.