[DataLoader2] Fix apply_sharding to accept one sharding_filter per branch#90769

Closed
ejguan wants to merge 3 commits into pytorch:master from ejguan:fix_apply_sharding

Conversation

@ejguan
Contributor

@ejguan ejguan commented Dec 13, 2022

Changes:

  • Allow multiple `sharding_filter`s in the pipeline, as long as they are not on the same branch
  • Add test

Example:

```mermaid
graph TD;
    DP1-->sharding_filter_1;
    sharding_filter_1-->DP3;
    DP2-->sharding_filter_2;
    sharding_filter_2-->DP4;
    DP3-->DP4;
    DP4-->output;
```

To properly shard DP1 and DP2, we should allow multiple `sharding_filter`s, one per branch.
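The per-branch rule can be illustrated with a small sketch. This is a simplified, hypothetical check over the graph above (node names and the parent-map representation are made up for illustration); the real validation lives inside `apply_sharding` in `torch.utils.data`:

```python
# Simplified sketch of the rule this PR enforces: on every path from a
# source DataPipe to the output, at most one sharding_filter may appear.
# The graph below mirrors the mermaid diagram; the dict maps each node
# to its upstream parents. This is illustrative, not the real implementation.
graph = {
    "DP1": [],
    "sharding_filter_1": ["DP1"],
    "DP3": ["sharding_filter_1"],
    "DP2": [],
    "sharding_filter_2": ["DP2"],
    "DP4": ["DP3", "sharding_filter_2"],
    "output": ["DP4"],
}

def max_filters_on_any_path(node, graph):
    """Max number of sharding_filter nodes on any path ending at `node`."""
    is_filter = int(node.startswith("sharding_filter"))
    parents = graph[node]
    if not parents:
        return is_filter
    return is_filter + max(max_filters_on_any_path(p, graph) for p in parents)

# Valid: every source-to-output path crosses exactly one sharding_filter.
assert max_filters_on_any_path("output", graph) == 1
```

A graph that stacked two `sharding_filter`s on the same branch would yield a count greater than one and should be rejected.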

@pytorch-bot pytorch-bot bot added the release notes: dataloader release notes category label Dec 13, 2022
@pytorch-bot

pytorch-bot bot commented Dec 13, 2022

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/90769

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 6db5dd5:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot
Contributor

@ejguan has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@ejguan ejguan requested review from NivekT and wenleix December 13, 2022 17:26
Contributor

@NivekT NivekT left a comment


LGTM! Thanks for adding this!

@facebook-github-bot
Contributor

@ejguan has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Contributor

@NivekT NivekT left a comment


Thanks for updating the error message!

@facebook-github-bot
Contributor

@pytorchbot merge

(Initiating merge automatically since Phabricator Diff has merged)

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Dec 14, 2022
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here.

facebook-github-bot pushed a commit to meta-pytorch/data that referenced this pull request Dec 28, 2022
…processing (#919)

Summary:
This PR is built on top of #555 and extends `PrototypeMultiprocessingReadingService` to accept non-replicable DataPipes.

It also depends on pytorch/pytorch#90769.

### Main Changes
- Add a way to launch a process to fetch data from non-replicable DataPipes and send data to worker processes in a round-robin manner
  - Add `ShardingRoundRobinDispatcher` (functional name `sharding_round_robin_dispatch`) to indicate non-replicable DataPipe
  - Add `MultipleDataPipesToQueuesLoop` to connect non-sharding process to request/response queues
  - Add `find_lca_non_replicable_dp` as a graph function to determine the lowest common ancestor of all non-replicable DataPipes. This guarantees that all non-replicable DataPipes run in a single dispatching process
  - In each multiprocessing worker process,
    - If all DataPipes are replicable, apply multiprocessing sharding to the graph
    - Otherwise, the worker uses `find_replicable_branches` to apply multiprocessing sharding only to the replicable branches, since the non-replicable branches are already sharded by routing data round-robin to the worker processes.
- Properly get `ResetEpochResponse` from protocol via `get_response_reset_epoch`
- [x] Add tests for two graph functions
- [x] Add test to launch non-shardable DataPipe process
- Add documents
  - [x] Replicable DataPipe/Non-replicable DataPipe in multiprocessing
  - [x] How PrototypeMPRS handles the above two types of DataPipe
Please check the link for doc: https://ejguan.github.io/dataloader2.html#dynamic-sharding
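The round-robin dispatching described in the main changes can be sketched in a few lines. The function name, queue layout, and driver code below are illustrative assumptions, not the actual torchdata dispatching-process implementation:

```python
# Hedged sketch: a single dispatcher reads from a non-replicable source
# and distributes items to per-worker queues in round-robin order, so
# each worker sees a disjoint shard of the stream. Illustrative only.
from itertools import cycle
from queue import Queue

def round_robin_dispatch(source, worker_queues):
    """Send each item from `source` to the worker queues in turn."""
    for item, q in zip(source, cycle(worker_queues)):
        q.put(item)

def drain(q):
    """Collect everything currently in a queue (single-threaded helper)."""
    items = []
    while not q.empty():
        items.append(q.get())
    return items

queues = [Queue(), Queue(), Queue()]
round_robin_dispatch(range(7), queues)

# Items 0..6 are spread across the three queues in turn.
assert drain(queues[0]) == [0, 3, 6]
assert drain(queues[1]) == [1, 4]
assert drain(queues[2]) == [2, 5]
```

In the real reading service the queues cross process boundaries and carry request/response protocol messages, but the sharding invariant is the same: the dispatcher, not each worker, decides which worker receives each item.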
### Nit Changes
- Rename `Spawn` to `Create` as the process has not been started
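The lowest-common-ancestor step mentioned above (`find_lca_non_replicable_dp`) can also be sketched. The representation below (each node mapping to its single downstream consumer, rooted at the output) and all node names are assumptions for illustration; the real function works on torchdata's graph structure:

```python
# Hypothetical sketch: find the closest common downstream node of a set
# of DataPipes in a tree rooted at the output. Placing the dispatching
# boundary there keeps all non-replicable DataPipes in one process.
downstream = {
    "DP1": "DP3",
    "DP2": "DP3",
    "DP3": "DP4",
    "DP4": "output",
}

def chain_to_root(node):
    """All nodes from `node` down to the root, in order, inclusive."""
    chain = [node]
    while node in downstream:
        node = downstream[node]
        chain.append(node)
    return chain

def lowest_common_ancestor(nodes):
    """First node shared by every node's downstream chain."""
    chains = [chain_to_root(n) for n in nodes]
    common = set(chains[0]).intersection(*map(set, chains[1:]))
    # Walk one chain in order; the first shared node is the lowest.
    return next(n for n in chains[0] if n in common)

assert lowest_common_ancestor(["DP1", "DP2"]) == "DP3"
```

Everything upstream of the returned node would then run in the single dispatching process.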

Pull Request resolved: #919

Reviewed By: wenleix

Differential Revision: D42004034

Pulled By: ejguan

fbshipit-source-id: 5b0b1cb7c2781c4f45240d21f37d457b9729b9a4

Labels

ciflow/trunk Trigger trunk jobs on your pull request Merged release notes: dataloader release notes category topic: improvements topic category
