
DistributedReadingService Requirements #427

@ejguan


🚀 The feature

The following are the requirements for DistributedReadingService:

  • Automatic sharding
    • Shard based on rank, world size, and num_workers
  • Determinism
    • Generate the same seed deterministically for all Shuffler instances per epoch.
    • Generate different seeds for process-local RNGs to perform different random transformations.
  • FullSyncDataPipe
    • For uneven datasets, we need to attach a FullSyncDataPipe to stop iteration for all DataLoader2 instances once the DataLoader2 in any distributed process is depleted.
    • We can take advantage of https://pytorch.org/tutorials/advanced/generic_join.html#what-is-join as the counter and carry out an all-reduce per iteration to sync between distributed processes.
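The sharding and determinism requirements above can be sketched together in plain Python. This is an illustrative sketch, not the DistributedReadingService implementation: the function names (`shuffled_shard`, `epoch_seed`) and the round-robin slicing scheme are assumptions chosen to show the invariants, namely that every global worker (rank × num_workers + worker_id) sees a disjoint slice, and that all ranks shuffle with the same per-epoch seed so their global orders agree.

```python
import random


def epoch_seed(base_seed: int, epoch: int) -> int:
    """Deterministically derive the shared per-epoch seed.

    Every rank computes the same value, which is what lets all
    Shuffler instances agree on the shuffled order (hypothetical
    derivation; the real service may combine seeds differently).
    """
    return base_seed + epoch


def shuffled_shard(num_items, seed, rank, world_size, worker_id, num_workers):
    """Shuffle with the SHARED seed, then take this worker's disjoint slice.

    Sharding is based on rank, world size, and num_workers: there are
    world_size * num_workers global workers, and worker g takes every
    g-th element of the (identically shuffled) index list.
    """
    order = list(range(num_items))
    random.Random(seed).shuffle(order)  # same seed -> same order on all ranks
    total = world_size * num_workers
    g = rank * num_workers + worker_id
    return order[g::total]
```

Because every rank shuffles the full index list identically before slicing, the union of all shards is exactly the dataset and no two workers overlap, which is the automatic-sharding requirement.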
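The FullSync idea can likewise be simulated without torch.distributed: each process contributes a "still has data" flag every iteration, and an all-reduce with MIN tells everyone to stop as soon as any process is depleted. This is a minimal pure-Python simulation of the protocol under that assumption; the real FullSyncDataPipe would use torch.distributed.all_reduce (e.g. with ReduceOp.MIN) as in the Join tutorial linked above.

```python
def simulate_fullsync(lengths):
    """Simulate ranks with uneven dataset lengths iterating in lock-step.

    Each iteration, rank r contributes flag 1 if it still has data,
    else 0; min(flags) plays the role of the all-reduce MIN. All ranks
    stop together after min(lengths) steps, so no rank hangs waiting
    for collective ops from a depleted peer.
    """
    cursors = [0] * len(lengths)
    steps = 0
    while True:
        flags = [1 if cursors[r] < lengths[r] else 0 for r in range(len(lengths))]
        if min(flags) == 0:  # stand-in for all_reduce(MIN): someone is depleted
            break
        cursors = [c + 1 for c in cursors]
        steps += 1
    return steps
```

For example, three ranks holding 5, 3, and 4 samples all stop after 3 iterations, matching the shortest shard.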

Motivation, pitch

Make DataLoader2 work in distributed training with all the syntactic sugar.

Alternatives

No response

Additional context

No response
