Add support to keep non-replicable DataPipe in the main process by ejguan · Pull Request #950 · meta-pytorch/data

ejguan · 2023-01-19T14:22:02Z

This is useful when fullsync is in the pipeline and we don't want this DataPipe to run in a worker process.

Changes

- Rename the dispatching-related functions to dispatching_xxx
- Make the fullsync DataPipe non-replicable
- Add _find_replicable_branches to find the last DataPipe prior to any non-replicable DataPipe
- Add graph tests
- In PrototypeMultiprocessingReadingService, make sure only replicable_datapipe is sent to the worker processes, and replace replicable_datapipe with worker_consumer_datapipe
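
To illustrate the kind of search that a helper like _find_replicable_branches performs, here is a minimal sketch on a toy graph. It is not the torchdata implementation: the Node class, its replicable flag, and both function bodies are assumptions made up for this example. The idea is to walk back from the final DataPipe and stop, on each path, at the first node whose whole upstream graph is replicable.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(eq=False)  # eq=False keeps identity comparison, like real graph nodes
class Node:
    """Hypothetical stand-in for a DataPipe; `inputs` are the nodes it reads from."""
    name: str
    replicable: bool = True
    inputs: List["Node"] = field(default_factory=list)


def is_replicable_graph(node: Node) -> bool:
    """True only if `node` and everything upstream of it is replicable."""
    return node.replicable and all(is_replicable_graph(src) for src in node.inputs)


def find_replicable_branches(end: Node) -> List[Node]:
    """Walk back from the final node and collect the head of each replicable branch."""
    found: List[Node] = []

    def visit(node: Node) -> None:
        if is_replicable_graph(node):
            # The whole sub-graph ending at `node` can be copied to a worker.
            if node not in found:
                found.append(node)
        else:
            # `node` (or something upstream of it) must stay put; look further back.
            for src in node.inputs:
                visit(src)

    visit(end)
    return found


# source -> map -> fullsync (non-replicable) -> batch
source = Node("source")
mapped = Node("map", inputs=[source])
synced = Node("fullsync", replicable=False, inputs=[mapped])
batched = Node("batch", inputs=[synced])

branches = find_replicable_branches(batched)
print([n.name for n in branches])  # ['map']
```

In this toy pipeline only the sub-graph ending at map would go to the workers; fullsync and everything after it stay behind.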

facebook-github-bot · 2023-01-19T21:45:21Z

@ejguan has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

torchdata/dataloader2/reading_service.py

wenleix · 2023-01-19T21:50:54Z

test/dataloader2/test_dataloader2.py

    return d * 2


+class NonReplicableDataPipe(IterDataPipe):


Looks like IterDataPipe is evolving towards DAG nodes...

ShardingRoundRobinDispatcherIterDataPipe is also non-replicable, right? But the two seem to use different graph-rewrite strategies: one is moved to the dispatching process, the other is kept in the main process.

wenleix · 2023-01-19T22:52:32Z

test/dataloader2/test_dataloader2.py

+    @mp_ctx_parametrize
+    def test_non_replicable_datapipe(self, ctx) -> None:
+        r"""
+        For the pipeline with non-replicable DataPipe, make sure


This non-replicable DataPipe also cannot be ShardingRRDispatchDP, right?

No, it can't. Like I said, ShardingRRDispatchDP labels the graph prior to it as non-shardable, while any other non-shardable DataPipe labels itself and the subsequent graph as non-shardable.
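
The two labeling rules discussed above can be sketched for a purely linear pipeline. This is only an illustration under assumptions: the partition function, the stage kinds ("plain", "dispatcher", "non_replicable"), and the choice to place the dispatcher itself together with the graph prior to it are all hypothetical and not taken from torchdata.

```python
from typing import Dict, List, Tuple


def partition(stages: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Split a linear pipeline, ordered from source to sink, into three groups.

    Each stage is (name, kind), with kind one of "plain", "dispatcher",
    or "non_replicable" (hypothetical labels for this sketch).
    """
    names = [name for name, _ in stages]
    kinds = [kind for _, kind in stages]

    # A dispatcher marks the graph prior to it (assumed here: plus itself).
    dispatch_end = max(
        (i + 1 for i, kind in enumerate(kinds) if kind == "dispatcher"), default=0
    )
    # Any other non-replicable stage marks itself and everything after it.
    main_start = next(
        (i for i, kind in enumerate(kinds)
         if kind == "non_replicable" and i >= dispatch_end),
        len(stages),
    )
    return {
        "dispatching": names[:dispatch_end],
        "worker": names[dispatch_end:main_start],
        "main": names[main_start:],
    }


layout = partition([
    ("reader", "plain"),
    ("rr_dispatch", "dispatcher"),
    ("decode", "plain"),
    ("fullsync", "non_replicable"),
    ("collate", "plain"),
])
print(layout)
```

Only the middle group (decode in this example) would be copied into every worker process.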

test/test_graph.py

torchdata/dataloader2/utils/dispatch.py

wenleix · 2023-01-19T23:12:01Z

torchdata/dataloader2/reading_service.py

+            self._main_prefetch_datapipe = end_datapipe
+
+        # Attach non-replicable DataPipes
+        if replicable_dp is not datapipe:


I guess this happens when we have a non-replicable DataPipe (and need to keep it in the main process).

And the reason to replace replicable_dp with end_datapipe is that end_datapipe has the "exchange sink" attached (i.e. the _IterateQueueDataPipes on line 291)?

We do this because replicable_dp is sent to the worker process, which returns an end_datapipe with the exchange sink attached. Take this graph:

non_rep_dp1 (datapipe) -> non_rep_dp2 -> rep_dp_1 (replicable_dp) -> rep_dp2 -> ...

After replacement, the graph becomes

non_rep_dp1 (datapipe) -> non_rep_dp2 -> end_datapipe

In this case, we want to return non_rep_dp1 (datapipe) rather than the end_datapipe.

In the case where the whole graph is replicable:

rep_dp_1 (replicable_dp/datapipe) -> rep_dp2 -> ...

After launching mp, graph becomes

end_datapipe

So, we just need to keep end_datapipe.
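
The two cases above can be sketched with a toy linked structure. This is only an illustration, not the code in reading_service.py: the Pipe class and the attach_worker_output helper are hypothetical names, and each Pipe is assumed to have a single upstream source.

```python
class Pipe:
    """Hypothetical stand-in for a DataPipe that reads from one `source`."""

    def __init__(self, name, source=None):
        self.name = name
        self.source = source


def chain(pipe):
    """List the names from the outermost pipe back to its last source."""
    names = []
    while pipe is not None:
        names.append(pipe.name)
        pipe = pipe.source
    return names


def attach_worker_output(datapipe, replicable_dp, end_datapipe):
    """Swap `replicable_dp` for `end_datapipe` and return the graph to iterate."""
    if replicable_dp is datapipe:
        # The whole graph went to the workers, so only end_datapipe is left.
        return end_datapipe
    parent = datapipe
    while parent is not None and parent.source is not replicable_dp:
        parent = parent.source
    if parent is None:
        raise ValueError("replicable_dp is not part of the graph")
    # Re-attach the non-replicable part on top of the workers' output.
    parent.source = end_datapipe
    return datapipe


# Case 1: non_rep_dp1 (datapipe) -> non_rep_dp2 -> rep_dp_1 (replicable_dp) -> rep_dp2
rep_dp2 = Pipe("rep_dp2")
rep_dp_1 = Pipe("rep_dp_1", rep_dp2)
non_rep_dp2 = Pipe("non_rep_dp2", rep_dp_1)
non_rep_dp1 = Pipe("non_rep_dp1", non_rep_dp2)
mixed = attach_worker_output(non_rep_dp1, rep_dp_1, Pipe("end_datapipe"))
print(chain(mixed))  # ['non_rep_dp1', 'non_rep_dp2', 'end_datapipe']

# Case 2: the whole graph is replicable
whole = attach_worker_output(rep_dp_1, rep_dp_1, Pipe("end_datapipe"))
print(chain(whole))  # ['end_datapipe']
```

The identity check (is) mirrors the `if replicable_dp is not datapipe` condition quoted in the diff.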

test/test_graph.py

facebook-github-bot · 2023-01-20T17:17:42Z

@ejguan has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot · 2023-01-23T17:47:56Z

@ejguan has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot · 2023-01-23T20:54:42Z

@ejguan has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot · 2023-01-24T01:15:06Z

@ejguan merged this pull request in 2ca1fa6.

…-pytorch#950)

Summary:
This is useful when `fullsync` is in the pipeline and we don't want to make this DataPipe running in the worker process

### Changes

- Change the function names that is dispatching-related to `dispatching_xxx`
- Make `fullsync` DataPipe non-replicable
- Add `_find_replicable_branches` to find the last DataPipe prior to any non-replicable DataPipe
- Add graph tests
- In `PrototypeMultiprocessingReadingService`, make sure only `replicable_datapipe` sent to worker process. And, replace the `replicable_datapipe` with the `worker_consumer_datapipe`.

Pull Request resolved: meta-pytorch#950

Reviewed By: wenleix, NivekT, Miiira

Differential Revision: D42617776

Pulled By: ejguan

fbshipit-source-id: 1138203507934b089025e290597b473ef9be32bb

facebook-github-bot added the CLA Signed label Jan 19, 2023

ejguan requested a review from wenleix January 19, 2023 14:37

Miiira approved these changes Jan 19, 2023

wenleix approved these changes Jan 19, 2023

NivekT approved these changes Jan 23, 2023

ejguan added 5 commits January 23, 2023 17:46

Add graph function to find replicable branches

db96020

Keep non-shardable DataPipe in the main process for ProtoMPRS

8a34004

Fix mypy

1ca4315

Add a test to validate production use case of fullsync

839348f

Fix nit comments

2e0a080

ejguan force-pushed the keep_non_replicable_in_main_process branch from 6446dd3 to 2e0a080 Compare January 23, 2023 17:47

ejguan mentioned this pull request Jan 23, 2023

DistributedReadingService supports multi-processing reading #911

Closed

Fix worker_prefetch

276f003

facebook-github-bot closed this in 2ca1fa6 Jan 24, 2023

facebook-github-bot added the Merged label Jan 24, 2023

ejguan mentioned this pull request Jan 25, 2023

Allow passing in init_process_group kwargs to fullsync datapipe #868

Closed

ejguan wants to merge 6 commits into meta-pytorch:main from ejguan:keep_non_replicable_in_main_process
