My template for deep learning projects using PyTorch, PyTorch Lightning and Hydra. It's designed with modularity in mind, allowing you to hot-swap different types of models, datasets, losses and optimization procedures. It also supports easy iteration over different hyperparameters while preserving reproducibility.
All requirements are listed in environment.yml. I highly recommend using Anaconda to install the dependencies, like so:

```bash
conda env create -f environment.yml -n <my-env-name>
```

Depending on your hardware, you might need to change the version of cudatoolkit in environment.yml. If you don't want to use an NVIDIA GPU, you can simply remove this entry.

If you'd rather not use Anaconda, you can still check each package's version in environment.yml and install them manually using pip, or put them into a requirements.txt file.
| Framework | Purpose |
| --- | --- |
| PyTorch | Basic building blocks and optimization for deep neural networks |
| PyTorch Lightning | Wrapper around PyTorch for better code structure / easy use of accelerators |
| Hydra | Composable configurations in YAML, also overridable through the command line |
Let's say you want to introduce a custom PyTorch model to the project that you call the "Beeg Yoshi MLP", which is simply a ridiculously wide MLP.
```python
# model/mlp.py
from torch.nn import Module, Linear, Sequential, ReLU


class BeegYoshiMLP(Module):
    """The Beeg Yoshi MLP. Its width will be input_dim * beeg_factor."""

    def __init__(self, input_dim: int, output_dim: int, beeg_factor: int):
        super().__init__()
        h_dim = input_dim * beeg_factor
        self.seq = Sequential(Linear(input_dim, h_dim),
                              ReLU(),
                              Linear(h_dim, output_dim))

    def forward(self, inputs):
        return self.seq(inputs)
```

We've placed this code in model/mlp.py. As you will see in the next step, the package structure mirrors the config structure. As long as you can reference your class by its module path, you are free to change this structure. However, I'd encourage you to keep package and config structure mirrored.
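For a quick sanity check of the model above, you can run it on a random batch; this is hypothetical usage, not part of the template:

```python
import torch
from model.mlp import BeegYoshiMLP

# Fashion-MNIST images are 28x28 = 784 pixels when flattened into a vector.
model = BeegYoshiMLP(input_dim=784, output_dim=10, beeg_factor=10)
x = torch.randn(32, 784)  # a random batch of 32 flattened images
logits = model(x)
print(logits.shape)  # torch.Size([32, 10])
```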
```yaml
# conf/model/beeg_mlp.yaml
# Module path to our class. The _target_ field will be used to reference
# the class at instantiation; all other entries will be passed to its constructor.
_target_: model.mlp.BeegYoshiMLP
# we will be training on Fashion-MNIST, hence these dimensions
input_dim: 784
output_dim: 10
# Tip: if you want to dynamically change dimensions based on the dataset, you could use
# Hydra's interpolation to reference the datamodule config, e.g.:
# input_dim: ${datamodule.input_size}
# check the Hydra documentation for details
# the default beeg_factor
beeg_factor: 10
```

Because we've placed our YAML file in conf/model/, it is now part of the model config group. This means we can select it from the defaults list in our root config file, conf/config.yaml:
```yaml
# conf/config.yaml
defaults:
  - model: beeg_mlp
  - datamodule: fashion_mnist
  # ...
```

```bash
python train.py
```

Yup, that's it. Our config will be automatically assembled and used in train.py to instantiate and assemble our components, then run the training procedure by calling Trainer.fit() on our datamodule (at the moment Fashion-MNIST).
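For intuition, this is roughly what happens under the hood; a minimal sketch using Hydra's instantiate, not the template's actual train.py:

```python
from hydra.utils import instantiate
from omegaconf import OmegaConf

# Hand-built equivalent of the config Hydra composes from conf/
# (assumes model/mlp.py is on the Python path).
cfg = OmegaConf.create({
    "_target_": "model.mlp.BeegYoshiMLP",
    "input_dim": 784,
    "output_dim": 10,
    "beeg_factor": 10,
})
model = instantiate(cfg)  # calls BeegYoshiMLP(input_dim=784, output_dim=10, beeg_factor=10)
```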
If you don't want to edit the YAML file, you can also override the model and its parameters via the CLI:

```bash
python train.py model=beeg_mlp model.beeg_factor=20
```

Hydra also provides a simple interface for grid search (there are also AutoML plugins for Hydra, check the docs!):
```bash
python train.py -m model.beeg_factor=1,5,10
```

Hydra will store any logs (including the configuration used) in a dedicated log directory. You can change the directory pattern at the bottom of conf/config.yaml. This is also where e.g. TensorBoard logs will end up.
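Since the exact configuration is saved with each run (Hydra writes it to .hydra/config.yaml inside the run directory), you can reload it later to inspect or reproduce an experiment; a small sketch with a hypothetical run directory:

```python
from omegaconf import OmegaConf

# The actual path pattern depends on the settings in conf/config.yaml.
cfg = OmegaConf.load("outputs/2024-01-01/12-00-00/.hydra/config.yaml")
print(OmegaConf.to_yaml(cfg))  # the exact config that produced this run
```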
This is basically all you need to know to make full use of this template. Each config group and its subgroups are found in conf/ and mirrored in the package structure. Just like the model, each part of training can be hot-swapped as soon as a config file and a corresponding Python class exist. This includes DataModules, callbacks, loggers, optimizers, metrics and even the general training/optimization procedure (the loop).
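One pattern worth knowing for such swaps: when a component needs an argument that is only available at runtime (e.g. an optimizer needs the model's parameters), Hydra's `_partial_` flag builds a functools.partial instead of a full instance. A sketch with a hypothetical optimizer config; the template's actual configs may differ:

```python
from hydra.utils import instantiate
from omegaconf import OmegaConf

# Hypothetical optimizer config; _partial_ defers the missing `params` argument.
opt_cfg = OmegaConf.create({
    "_target_": "torch.optim.Adam",
    "_partial_": True,
    "lr": 1e-3,
})
optimizer_factory = instantiate(opt_cfg)           # a functools.partial over Adam
optimizer = optimizer_factory(model.parameters())  # `model` from the earlier sketch
```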
When assembled, the Trainer is composed as follows:

```
Trainer
├── **cfg.trainer  # trainer args found in config.yaml, passed as kwargs
├── loop
│   ├── loss
│   ├── optimizer
│   ├── metrics
│   └── model
├── logger
└── callbacks
```

- train.py: the main script where training is started. Here, all training components are instantiated and assembled based on the Hydra config.
- Loop: a LightningModule encapsulating model, optimizer, loss function (PyTorch) and metrics (torchmetrics). It is responsible for defining the train, validation and test steps (including calling the metrics); see the sketch after this list.
- DataModule: stores any code for fetching, pre-processing and iterating over a dataset by returning DataLoaders.
- Logger: will be passed to the Trainer; needs to implement the Logger interface.
- Metrics manager: convenience class for managing a set of torchmetrics. Used in the default training loop to update metric state.
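To give a feel for what the Loop does, here is a rough LightningModule sketch; the class name and constructor arguments are illustrative, not the template's actual Loop:

```python
import pytorch_lightning as pl


class SketchLoop(pl.LightningModule):
    """Illustrative only: encapsulates model, loss, optimizer and metrics."""

    def __init__(self, model, loss_fn, optimizer_factory, metrics):
        super().__init__()
        self.model = model
        self.loss_fn = loss_fn
        self.optimizer_factory = optimizer_factory  # e.g. a partial over torch.optim.Adam
        self.metrics = metrics                      # e.g. a torchmetrics.MetricCollection

    def training_step(self, batch, batch_idx):
        inputs, targets = batch
        logits = self.model(inputs)
        loss = self.loss_fn(logits, targets)
        self.metrics.update(logits, targets)
        self.log("train/loss", loss)
        return loss

    def configure_optimizers(self):
        return self.optimizer_factory(self.model.parameters())
```

Validation and test steps would look analogous.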


