Provide documentation on how to train a Transformers4Rec model on multiple GPUs with `DataParallel` (DP) and `DistributedDataParallel` (DDP).

- [x] Short explanation of DP and DDP with links to the PyTorch documentation
- [x] Describe the code snippets, command-line examples, and environment variables needed to use the `Trainer` with DP and DDP
- [x] Table comparing runtime vs. number of GPUs for one of the integration tests, covering single GPU, DP, and DDP
- [x] Describe that the learning rate needs to be increased when using DP and DDP
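A minimal sketch of the two ideas the documentation should cover: wrapping a model in `DataParallel` and scaling the learning rate linearly with the number of GPUs. The `nn.Linear` model and the hyperparameter values are placeholders, not the actual Transformers4Rec model or recommended settings.

```python
import torch
import torch.nn as nn

# Placeholder model; in practice this would be a Transformers4Rec model.
model = nn.Linear(16, 8)

# DataParallel replicates the module on every visible GPU and splits each
# batch along dim 0; on a CPU-only machine it simply calls the wrapped module.
dp_model = nn.DataParallel(model)

# Linear scaling rule: with N GPUs the effective batch size grows by N,
# so the learning rate is commonly scaled by N as well.
base_lr = 5e-4
num_gpus = max(torch.cuda.device_count(), 1)
scaled_lr = base_lr * num_gpus
optimizer = torch.optim.Adam(dp_model.parameters(), lr=scaled_lr)

out = dp_model(torch.randn(4, 16))
print(out.shape)
```

For DDP, training is typically launched with one process per GPU, e.g. `torchrun --nproc_per_node=4 train.py`, which sets the `RANK`, `LOCAL_RANK`, and `WORLD_SIZE` environment variables that the `Trainer` can pick up.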