[3/n] DataLoader2 initial support for randomness control #801

Closed
ejguan wants to merge 1 commit into meta-pytorch:main from ejguan:export-D38947827

ejguan (Contributor) commented Sep 29, 2022

Fixes #885

Add support for DataLoader2 to control randomness across the pipeline:

  • Implement SeedGenerator
    • spawn to generate sub-SeedGenerators for distributed workers
    • generate_seed to generate unique seeds
    • generate_shared_seed to generate distributed shared seeds
  • Change the API of ReadingService to take the seed generator from DataLoader2. The SeedGenerator of DataLoader2 then becomes the source of truth for randomness across the whole data pipeline.
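
For readers skimming the thread, the three SeedGenerator methods listed above can be pictured with the rough sketch below. It is not the implementation in this PR: the class name `ToySeedGenerator`, the `_derive` helper, and the use of `hashlib` plus `random.Random` are all assumptions made for illustration. The only idea it tries to capture is that a spawned generator keeps a stream shared with its siblings and gains a stream of its own.

```python
import hashlib
import random


def _derive(*parts: object) -> int:
    """Hash the parts into a 64-bit integer seed (illustrative helper only)."""
    data = ",".join(str(p) for p in parts).encode()
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "little")


class ToySeedGenerator:
    """Hypothetical stand-in for the SeedGenerator described in this PR."""

    def __init__(self, seed: int, _path: tuple = ()) -> None:
        self._seed = seed
        self._path = _path
        # Shared stream: depends only on the root seed, so all workers agree.
        self._shared_rng = random.Random(_derive("shared", seed))
        # Unique stream: also depends on the spawn path, so workers differ.
        self._worker_rng = random.Random(_derive("worker", seed, *_path))

    def spawn(self, worker_id: int) -> "ToySeedGenerator":
        """Return a sub-generator for one worker.

        Simplification: the child's shared stream restarts from the root
        seed, so spawn should be called before any shared seed is drawn.
        """
        return ToySeedGenerator(self._seed, self._path + (worker_id,))

    def generate_seed(self) -> int:
        """Seed that is unique to this generator's position in the spawn tree."""
        return self._worker_rng.getrandbits(64)

    def generate_shared_seed(self) -> int:
        """Seed that every generator spawned from the same root also produces."""
        return self._shared_rng.getrandbits(64)
```

In this sketch, `root.spawn(0)` and `root.spawn(1)` return the same value from `generate_shared_seed()` but different values from `generate_seed()`, and rebuilding the root from the same integer reproduces both.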

A separate PR will add online documentation on determinism.

Differential Revision: D38947827

Last step for #885
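
As a rough picture of the ReadingService change in the description, the sketch below has a loader hand seeds to a reading service that owns no randomness of its own. Every name here (`ToyLoader`, `ToyReadingService`, `run_epoch`, `epoch`) is hypothetical, and workers are simulated sequentially in one process. It assumes, in line with the linked issue's topic, that shuffling before sharding uses a seed shared by all workers while shuffling after sharding uses a per-worker seed.

```python
import random


class ToyReadingService:
    """Hypothetical service: receives its seeds from the loader, owns no RNG."""

    def __init__(self, num_workers: int) -> None:
        self.num_workers = num_workers

    def run_epoch(self, data, shared_seed, worker_seeds):
        outputs = []
        for worker_id in range(self.num_workers):
            items = list(data)
            # Before sharding: every worker shuffles with the SAME seed,
            # so the shards taken below are disjoint and cover the data.
            random.Random(shared_seed).shuffle(items)
            shard = items[worker_id :: self.num_workers]
            # After sharding: each worker shuffles locally with its OWN seed.
            random.Random(worker_seeds[worker_id]).shuffle(shard)
            outputs.append(shard)
        return outputs


class ToyLoader:
    """Hypothetical loader: the single source of seeds for the pipeline."""

    def __init__(self, data, reading_service, seed: int) -> None:
        self.data = data
        self.reading_service = reading_service
        self._rng = random.Random(seed)

    def epoch(self):
        # Draw one shared seed and one seed per worker for this epoch.
        shared_seed = self._rng.getrandbits(64)
        worker_seeds = [
            self._rng.getrandbits(64)
            for _ in range(self.reading_service.num_workers)
        ]
        return self.reading_service.run_epoch(self.data, shared_seed, worker_seeds)
```

Because the loader's seed is the only input to every random choice, two loaders built with the same seed produce the same per-worker output, and the shards together still contain each element exactly once.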

facebook-github-bot added the CLA Signed and fb-exported labels Sep 29, 2022
facebook-github-bot (Contributor) commented

This pull request was exported from Phabricator. Differential Revision: D38947827

ejguan marked this pull request as draft September 29, 2022 20:32
ejguan added a commit to ejguan/data that referenced this pull request Sep 29, 2022
Summary:
Pull Request resolved: meta-pytorch#801

Add the initial support for DataLoader2 to control randomness over the pipeline:
- Implement `SeedGenerator`
- Change API of `ReadingService` to take seed generator from DataLoader2

Differential Revision: D38947827

fbshipit-source-id: 17db1e13fe8685f6b2817f72c0e199edfaf3a3a1

ejguan added a commit to ejguan/data that referenced this pull request Sep 29, 2022
Summary:
Pull Request resolved: meta-pytorch#801

Add the initial support for DataLoader2 to control randomness over the pipeline:
- Implement `SeedGenerator`
- Change API of `ReadingService` to take seed generator from DataLoader2

Differential Revision: D38947827

fbshipit-source-id: 5ae5065ab7aceb35e9f966c3d6bc585eb07c8ba5

ejguan marked this pull request as ready for review September 30, 2022 15:07
ejguan changed the title from "DataLoader2 initial support for randomness control" to "[1/n] DataLoader2 initial support for randomness control" Sep 30, 2022
ejguan (Contributor, Author) commented Sep 30, 2022

I might need to re-create this PR via ghexport to support a stack of Diffs.

ejguan added a commit to ejguan/data that referenced this pull request Oct 4, 2022
…vice (meta-pytorch#801)

Summary:
Pull Request resolved: meta-pytorch#801

Add the initial support for DataLoader2 to control randomness over the pipeline:
- Implement `SeedGenerator`
- Change API of `ReadingService` to take seed generator from DataLoader2

Reviewed By: Miiira

Differential Revision: D38947827

fbshipit-source-id: 932cabdf1df5e0feafa44a3d2bc50c290360d323

ejguan added a commit to ejguan/data that referenced this pull request Oct 4, 2022
…vice (meta-pytorch#801)

Summary:
Pull Request resolved: meta-pytorch#801

Add the initial support for DataLoader2 to control randomness over the pipeline:
- Implement `SeedGenerator`
- Change API of `ReadingService` to take seed generator from DataLoader2

Differential Revision: D38947827

fbshipit-source-id: 38cfc46ce3fbda6872a988fa27c072ff80d79c3c

ejguan added a commit to ejguan/data that referenced this pull request Oct 4, 2022
…vice (meta-pytorch#801)

Summary:
Pull Request resolved: meta-pytorch#801

Add the initial support for DataLoader2 to control randomness over the pipeline:
- Implement `SeedGenerator`
- Change API of `ReadingService` to take seed generator from DataLoader2

Differential Revision: D38947827

fbshipit-source-id: fab10a21fecf76e9b5f5c2296fbf930c3af14d2d

ejguan added a commit to ejguan/data that referenced this pull request Oct 5, 2022
…vice (meta-pytorch#801)

Summary:
Pull Request resolved: meta-pytorch#801

Add the initial support for DataLoader2 to control randomness over the pipeline:
- Implement `SeedGenerator`
- Change API of `ReadingService` to take seed generator from DataLoader2

Differential Revision: D38947827

fbshipit-source-id: c3018a408b78dd8d2e2858350edbb762ece10d37

ejguan added a commit to ejguan/data that referenced this pull request Oct 6, 2022
…vice (meta-pytorch#801)

Summary:
Pull Request resolved: meta-pytorch#801

Add the initial support for DataLoader2 to control randomness over the pipeline:
- Implement `SeedGenerator`
- Change API of `ReadingService` to take seed generator from DataLoader2

Reviewed By: NivekT

Differential Revision: D38947827

fbshipit-source-id: 21761db17cab2f1c9ef89058b6a53f53abe0590f

ejguan changed the title from "[1/n] DataLoader2 initial support for randomness control" to "[3/n] DataLoader2 initial support for randomness control" Dec 29, 2022
ejguan added a commit to ejguan/data that referenced this pull request Jan 17, 2023
…vice (meta-pytorch#801)

Summary:
Fixes meta-pytorch#885

Pull Request resolved: meta-pytorch#801

Add the support for DataLoader2 to control randomness over the pipeline:
- Implement `SeedGenerator`
  - `spawn` to generate sub-SeedGenerators for distributed workers
  - `generate_seed` to generate unique seeds
  - `generate_shared_seed` to generate distributed shared seeds
- Change API of `ReadingService` to take seed generator from DataLoader2. Then, the SeedGenerator of `DataLoader2` becomes the source of truth of randomness within the whole data pipeline.

A separate PR will be added for online doc regarding determinism.

Reviewed By: NivekT

Differential Revision: D38947827

fbshipit-source-id: e1a434460b4a5d43461e982debe875808b4241db

facebook-github-bot (Contributor) commented

@ejguan has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

ejguan added a commit to ejguan/data that referenced this pull request Jan 17, 2023
Summary:
Fixes meta-pytorch#885

Add the support for DataLoader2 to control randomness over the pipeline:
- Implement SeedGenerator
  - `spawn` to generate sub-SeedGenerators for distributed workers
  - `generate_seed` to generate unique seeds
  - `generate_shared_seed` to generate distributed shared seeds
- Change API of ReadingService to take seed generator from DataLoader2. Then, the SeedGenerator of DataLoader2 becomes the source of truth of randomness within the whole data pipeline.

A separate PR will be added for online doc regarding determinism.

Last step for meta-pytorch#885

Pull Request resolved: meta-pytorch#801

Reviewed By: NivekT

Differential Revision: D38947827

Pulled By: ejguan

fbshipit-source-id: 006bf17cbb51b2d5a39d647ca86401b0483c7812

ejguan added a commit to ejguan/data that referenced this pull request Jan 17, 2023
Summary:
Fixes meta-pytorch#885

Add the support for DataLoader2 to control randomness over the pipeline:
- Implement SeedGenerator
  - `spawn` to generate sub-SeedGenerators for distributed workers
  - `generate_seed` to generate unique seeds
  - `generate_shared_seed` to generate distributed shared seeds
- Change API of ReadingService to take seed generator from DataLoader2. Then, the SeedGenerator of DataLoader2 becomes the source of truth of randomness within the whole data pipeline.

A separate PR will be added for online doc regarding determinism.

Last step for meta-pytorch#885

Pull Request resolved: meta-pytorch#801

Reviewed By: NivekT

Differential Revision: D38947827

Pulled By: ejguan

fbshipit-source-id: b6fa81de133a0613e8c96ce17b136d897ca80201

Summary:
Fixes meta-pytorch#885

Add the support for DataLoader2 to control randomness over the pipeline:
- Implement SeedGenerator
  - `spawn` to generate sub-SeedGenerators for distributed workers
  - `generate_seed` to generate unique seeds
  - `generate_shared_seed` to generate distributed shared seeds
- Change API of ReadingService to take seed generator from DataLoader2. Then, the SeedGenerator of DataLoader2 becomes the source of truth of randomness within the whole data pipeline.

A separate PR will be added for online doc regarding determinism.

Last step for meta-pytorch#885

Pull Request resolved: meta-pytorch#801

Reviewed By: NivekT

Differential Revision: D38947827

Pulled By: ejguan

fbshipit-source-id: 2f852b89cb1d638e1b9222df838786eb8855afa4

facebook-github-bot (Contributor) commented

@ejguan merged this pull request in 38e0d03.


Labels: CLA Signed, fb-exported, Merged

Development

Successfully merging this pull request may close these issues:

Determinism about Local shuffle/random_op after sharding_filter

3 participants