Add docstrings and the parameter row_groups_per_part to the MerlinDataLoader class #590
Documentation preview: https://nvidia-merlin.github.io/Transformers4Rec/review/pr-590
bbozkaya left a comment:
I tested on 2 GPUs. It works fine when loading partitioned training data. For validation, however, it seems to use only 1 GPU and 1 partition. Is this expected, or does it also need to be addressed?
Thank you for testing the solution! For the validation step, we rely on how HF transformers sets up DDP training, and it seems that it doesn't wrap the model in DDP mode for evaluation (training=False) (here). So it is expected that validation runs on a single GPU, but I don't know the motivation behind this. I posted a question on the HF forum to better understand the behavior of the Trainer in DDP + evaluation mode.
Fixes #550
@bbozkaya ran different tests (see image below) of repartitioning a parquet file (using pandas or cudf), and it seems that MerlinDataLoader always loads the dataset files as 1 partition, even though we split them into multiple row groups when saving the parquet file (as recommended here). To take these partitions into account, we should pass the parameter row_groups_per_part=True to merlin.io.Dataset.

Goals ⚽
- Add the parameter row_groups_per_part to MerlinDataLoader so as to load the dataset with the correct partitions.
- Add docstrings to MerlinDataLoader to explain the different parameters.
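To illustrate why the number of partitions matters when training on several GPUs, here is a minimal, stdlib-only sketch. It does not call merlin.io. The helpers plan_partitions and rows_per_rank are hypothetical, and the sketch assumes partitions are dealt to GPU ranks round-robin, which may not match how the real loader assigns them. With one partition per file, rank 1 receives no rows. Splitting by row group spreads the rows evenly.

```python
from typing import List, Tuple

# (file index, row-group index) identifying one parquet row group
RowGroup = Tuple[int, int]


def plan_partitions(files: List[List[int]], per_row_group: bool) -> List[List[RowGroup]]:
    """Hypothetical helper: group the row groups of each parquet file into partitions.

    files[i] holds the row counts of the row groups in file i.
    per_row_group=False -> one partition per file (the behaviour reported in this PR).
    per_row_group=True  -> one partition per row group.
    """
    partitions: List[List[RowGroup]] = []
    for f, row_groups in enumerate(files):
        ids = [(f, g) for g in range(len(row_groups))]
        if per_row_group:
            partitions.extend([rg] for rg in ids)  # each row group is its own partition
        else:
            partitions.append(ids)  # the whole file is a single partition
    return partitions


def rows_per_rank(files: List[List[int]], partitions: List[List[RowGroup]], world_size: int) -> List[int]:
    """Hypothetical helper: assign partitions round-robin and count the rows per rank."""
    counts = [0] * world_size
    for i, part in enumerate(partitions):
        counts[i % world_size] += sum(files[f][g] for f, g in part)
    return counts


# One parquet file written with 4 row groups of 250 rows each, read by 2 GPU ranks.
files = [[250, 250, 250, 250]]
print(rows_per_rank(files, plan_partitions(files, per_row_group=False), world_size=2))  # [1000, 0]
print(rows_per_rank(files, plan_partitions(files, per_row_group=True), world_size=2))   # [500, 500]
```

The same reasoning explains why writing the parquet file with several row groups only helps if the loader is told to split partitions along those row groups.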