
DistributedReadingService supports multi-processing reading #911

@xiaosu-zhu

Description


🚀 The feature

TorchData is great work toward better data loading! I have tried it, and it gives me a nice workflow with a tidy code style. ❤️

When using DDP, I use DataLoader2 with reading_service=DistributedReadingService(). I found that this service uses only one worker per rank to output data. As a result, its reading throughput is lower than that of the legacy DataLoader, which uses num_workers workers per rank, for a total of num_workers * world_size workers.
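To make the throughput gap concrete, here is a minimal plain-Python sketch that just counts concurrent reader processes in each setup. It makes no torchdata calls; the function name `total_readers` is hypothetical, and it assumes one DDP process per rank, as in a typical launch.

```python
def total_readers(world_size: int, num_workers: int) -> dict:
    """Hypothetical helper: count concurrent reader processes in a DDP job.

    Assumes one DDP process per rank.
    """
    # DataLoader2 + DistributedReadingService, as described above:
    # each rank reads its shard in a single process.
    distributed_only = world_size * 1
    # Legacy DataLoader: each rank spawns num_workers worker processes;
    # num_workers == 0 means loading happens in the rank's main process.
    legacy = world_size * max(num_workers, 1)
    return {"distributed_only": distributed_only, "legacy": legacy}


if __name__ == "__main__":
    # Example: 8 ranks with 4 workers each.
    print(total_readers(world_size=8, num_workers=4))
```

With 8 ranks and 4 workers per rank, the legacy setup has 32 reader processes against 8 for the distributed-only setup.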

Would it therefore be possible to combine DistributedReadingService with multi-process reading? This could perhaps be done by integrating PrototypeMultiProcessingReadingService into DistributedReadingService (just a guess; I'm not an expert at this).
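One way to picture the requested combination is two-level sharding: split the data across ranks first, then split each rank's shard across that rank's worker processes. The sketch below illustrates only that idea in plain Python; `two_level_shard` and its parameters are hypothetical and this is not how torchdata implements sharding.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def two_level_shard(items: Sequence[T], world_size: int, rank: int,
                    num_workers: int, worker_id: int) -> List[T]:
    """Hypothetical sketch: shard across ranks, then across local workers.

    Every (rank, worker_id) pair gets a disjoint slice, so
    world_size * num_workers processes can read in parallel.
    """
    if not (0 <= rank < world_size and 0 <= worker_id < num_workers):
        raise ValueError("rank/worker_id out of range")
    # Level 1: this rank's shard (what a distributed-only reader sees).
    rank_shard = items[rank::world_size]
    # Level 2: split the rank's shard round-robin among its workers.
    return list(rank_shard[worker_id::num_workers])


if __name__ == "__main__":
    data = list(range(16))
    for rank in range(2):
        for worker_id in range(2):
            print(rank, worker_id, two_level_shard(data, 2, rank, 2, worker_id))
```

Slicing twice with strides world_size and num_workers is equivalent to one stride of world_size * num_workers starting at rank + world_size * worker_id, so the shards are disjoint and together cover every item exactly once.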

Motivation, pitch

I think this feature could be part of #427. The detailed motivation is described above.

Alternatives

No response

Additional context

No response
