Provide documentation on how to train a Transformers4Rec model on multiple GPUs with `DataParallel` (DP) and `DistributedDataParallel` (DDP).

- [x] Short explanation of DP and DDP with links to the PyTorch documentation
- [x] Describe the code snippets, command-line examples, and environment variables needed to use the `Trainer` with DP and DDP
- [x] Table comparing runtime vs. number of GPUs for one of the integration tests, covering single GPU, DP, and DDP
- [x] Describe that the learning rate needs to be increased when using DP and DDP
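A minimal sketch of the two ideas the documentation should cover: wrapping a model in `DataParallel` and scaling the learning rate linearly with the number of GPUs. The `nn.Linear` model and the hyperparameter values are placeholders, not the actual Transformers4Rec model or recommended settings.

```python
import torch
import torch.nn as nn

# Placeholder model; in practice this would be a Transformers4Rec model.
model = nn.Linear(16, 8)

# DataParallel replicates the module on every visible GPU and splits each
# batch along dim 0; on a CPU-only machine it simply calls the wrapped module.
dp_model = nn.DataParallel(model)

# Linear scaling rule: with N GPUs the effective batch size grows by N,
# so the learning rate is commonly scaled by N as well.
base_lr = 5e-4
num_gpus = max(torch.cuda.device_count(), 1)
scaled_lr = base_lr * num_gpus
optimizer = torch.optim.Adam(dp_model.parameters(), lr=scaled_lr)

out = dp_model(torch.randn(4, 16))
print(out.shape)
```

For DDP, training is typically launched with one process per GPU, e.g. `torchrun --nproc_per_node=4 train.py`, which sets the `RANK`, `LOCAL_RANK`, and `WORLD_SIZE` environment variables that the `Trainer` can pick up.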