🐛 Describe the bug
Current state of determinism
Using DataLoader2 + PrototypeMultiProcessingReadingService as an example:
- Before each iteration starts, a distributed shared seed will be generated (link)
- With multiprocessing, each subprocess resets all shuffle operations to the same random seed at the beginning of each iteration, based on the distributed shared seed from step 1. (link)
- torch, numpy and python.random each get a different process-local seed in every subprocess (link)
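The split between a shared shuffle seed and per-process local seeds can be sketched as follows. This is an illustrative helper, not the actual torchdata implementation, and the derivation formula is an assumption:

```python
# Sketch of the seeding scheme above (hypothetical helper, not torchdata code).
def worker_seeds(shared_seed: int, worker_id: int) -> tuple[int, int]:
    # Shuffle ops reuse the shared seed directly, so every worker
    # shuffles the source data identically before sharding.
    shuffle_seed = shared_seed
    # torch / numpy / python.random get a worker-specific seed
    # (assumed derivation, deterministic given shared_seed and worker_id).
    local_seed = (shared_seed * 1_000_003 + worker_id) % 2**32
    return shuffle_seed, local_seed

s0 = worker_seeds(42, worker_id=0)
s1 = worker_seeds(42, worker_id=1)
assert s0[0] == s1[0]  # same shuffle seed across workers
assert s0[1] != s1[1]  # different process-local seeds
```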
Additional feature
For step 2 in the previous section, we set the same shuffle seed across distributed/mp workers because we want to make sure the shuffled data can be sharded in a mutually exclusive and collectively exhaustive manner.
An additional feature is needed to make sure all random operations after sharding_filter have different seeds across workers, to preserve full data randomization.
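A toy demonstration (plain Python, not torchdata code) of why sharing the pre-sharding shuffle seed keeps the shards mutually exclusive and collectively exhaustive:

```python
import random

def shard_after_shuffle(data, shared_seed, num_workers):
    """Simulate each worker shuffling with the same seed, then sharding."""
    shards = []
    for worker_id in range(num_workers):
        rng = random.Random(shared_seed)   # same seed in every worker
        items = list(data)
        rng.shuffle(items)                 # identical shuffle in every worker
        shards.append(items[worker_id::num_workers])  # round-robin sharding
    return shards

shards = shard_after_shuffle(range(10), shared_seed=42, num_workers=2)
assert sorted(shards[0] + shards[1]) == list(range(10))  # collectively exhaustive
assert not set(shards[0]) & set(shards[1])               # mutually exclusive
```

If the workers used different shuffle seeds here, the round-robin slices would be taken from differently ordered lists, so elements could be duplicated or dropped.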
Let's say we have a pipeline as:
```
data_source.shuffle().sharding_filter().map(fn).batch(8).shuffle()
```
We will have the random state shared for the first shuffle but different states for the second shuffle. And those states should be generated in a deterministic manner so that we are able to reproduce them.
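The requested behavior for the second shuffle can be sketched as below. The seed-derivation formula is a placeholder assumption; the point is only that the post-sharding seed is distinct per worker yet reproducible from the shared seed:

```python
import random

# Hypothetical derivation of a per-worker seed for random ops that run
# after sharding_filter (not the actual torchdata implementation).
def post_sharding_seed(shared_seed: int, worker_id: int) -> int:
    return (shared_seed * 1_000_003 + worker_id + 1) % 2**64

batches = list(range(4))
orders = []
for worker_id in range(2):
    items = list(batches)
    # Second shuffle: seeded differently in each worker.
    random.Random(post_sharding_seed(42, worker_id)).shuffle(items)
    orders.append(items)

# Re-deriving the seed from (shared_seed, worker_id) reproduces the
# same order, so the run stays deterministic.
items = list(batches)
random.Random(post_sharding_seed(42, 0)).shuffle(items)
assert items == orders[0]
```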
Versions
main branch
cc: @msaroufim @VitalyFedyunin