Implementation of activations as subclasses of the torch.nn.Module #15364

# 🚀 Feature request


It is probably best to ask this first: is there a specific reason why the activation functions in https://github.com/huggingface/transformers/blob/master/src/transformers/activations.py are not subclasses of `torch.nn.Module`?

If there is, then we can probably ignore everything below :)
If there isn't, then it might be worth implementing them that way (I would be happy to work on a PR for it).
A few advantages (that I'm aware of) of implementing activations as subclasses of `torch.nn.Module`:
1. It becomes easy to check which activations a model uses by just running `print(my_bert)`. Currently one has to look them up in the config file, which is not that bad, but printing is a bit more convenient. Just as with the torchvision models (e.g. `print(resnet50)`), one can immediately see which activations are being used.
2. Composing layers with, for example, `nn.Sequential` would be possible. `nn.Sequential` only accepts `nn.Module` instances, so activations implemented as plain Python functions cannot be passed to it directly.
3. Attaching PyTorch hooks to the activation modules would be possible (I think this is the most important advantage).
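
To make the three points above concrete, here is a minimal sketch of what a module-based activation could look like. The class name `GeluModule` and the toy model are hypothetical and only for illustration; this is not the current `transformers` code, just one possible shape for it.

```python
import torch
from torch import nn


class GeluModule(nn.Module):
    """Hypothetical wrapper that exposes the functional GELU as a module."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return nn.functional.gelu(x)


# Points 1 and 2: the activation can be composed with nn.Sequential,
# and it is listed by name when the model is printed.
model = nn.Sequential(nn.Linear(4, 8), GeluModule(), nn.Linear(8, 2))
print(model)

# Point 2, the other way round: a plain function is rejected by nn.Sequential.
try:
    nn.Sequential(nn.Linear(4, 8), nn.functional.gelu)
    accepted_function = True
except TypeError:
    accepted_function = False

# Point 3: a forward hook can capture the activation's output.
captured = []


def save_output(module, inputs, output):
    # Store a detached copy of whatever the activation module returned.
    captured.append(output.detach())


handle = model[1].register_forward_hook(save_output)
model(torch.randn(3, 4))
handle.remove()  # remove the hook once it is no longer needed
```

With a function-based activation, none of the three steps works as written: the function does not show up in `print(model)`, cannot be placed in `nn.Sequential`, and has no `register_forward_hook` method.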
