Refactor sharding data pipe into a separate file #94095
wenleix wants to merge 1 commit into pytorch:master
This pull request was exported from Phabricator. Differential Revision: D43014692
Summary:
X-link: meta-pytorch/data#987

Pull Request resolved: pytorch#94095

Differential Revision: D43014692

Move `ShardingFilterIterDataPipe` into a dedicated file.

Also, propose a dedicated parent class (`_ShardingIterDataPipe`) for sharding data pipes. A sharding data pipe is more of a "system/engine-level" datapipe: it gives strong hints to RS on how to execute and needs first-class treatment in RS, unlike the "user-level" datapipes, which are mostly composable `Callable[[Iterable], Iterable]`. Open to other discussions.

### Open question

Should [ShardingRoundRobinDispatcherIterDataPipe](https://github.com/pytorch/data/blob/01fc76200354501b057bb439b43a1f05f609dd0a/torchdata/datapipes/iter/util/sharding.py#L16-L17) also be considered a `_ShardingIterDataPipe`? (For example, this sharding is executed by replicating (the metadata), while `ShardingRoundRobinDispatcherIterDataPipe` hints that the data is too expensive to replicate and therefore requires round-robin data exchange/dispatch.)

Test Plan: sandcastle and CI

How to run unit tests in buck related to such changes? :)

Reviewed By: seemethere

fbshipit-source-id: fcd2a4e57b15fdf7411c959a38c377ac114f0ecb
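The open question above contrasts two sharding strategies. As a rough illustration only, the sketch below models them in plain Python without torch or torchdata. The function names `sharding_filter` and `round_robin_dispatch` are hypothetical stand-ins, not the library's API, and the modulo rule is a simplifying assumption about how elements are assigned to instances.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def sharding_filter(source: Iterable[T], num_instances: int, instance_id: int) -> Iterator[T]:
    """Replicate-and-filter: every instance iterates the whole source
    and keeps only the elements that fall in its own slot."""
    for i, item in enumerate(source):
        if i % num_instances == instance_id:
            yield item


def round_robin_dispatch(source: Iterable[T], num_instances: int) -> List[List[T]]:
    """Dispatch: one reader walks the source a single time and hands
    elements out to the instances in turn."""
    buckets: List[List[T]] = [[] for _ in range(num_instances)]
    for i, item in enumerate(source):
        buckets[i % num_instances].append(item)
    return buckets


data = list(range(10))
# Each of the 3 instances re-reads `data` and filters it.
filtered = [list(sharding_filter(data, 3, k)) for k in range(3)]
# A single pass over `data` fills all 3 buckets.
dispatched = round_robin_dispatch(data, 3)
```

Both strategies yield the same partition here. The difference is cost: the filter variant reads the source once per instance, which is cheap when only metadata is replicated, while the dispatch variant reads it once in total and has to move the elements to the instances.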
@pytorchbot merge (Initiating merge automatically since Phabricator Diff has merged)
Move `ShardingFilterIterDataPipe` into a dedicated file.

Also, propose a dedicated parent class (`_ShardingIterDataPipe`) for sharding data pipes. A sharding data pipe is more of a "system/engine-level" datapipe: it gives strong hints to RS on how to execute and needs first-class treatment in RS, unlike the "user-level" datapipes, which are mostly composable `Callable[[Iterable], Iterable]`. With a dedicated parent class, `graph_settings.py` no longer needs to decide based on whether `is_shardable` and `apply_sharding` are present on a DataPipe. Open to other discussions.

### Open question

Should `ShardingRoundRobinDispatcherIterDataPipe` also be considered a `_ShardingIterDataPipe`? (For example, this sharding is executed by replicating (the metadata), while `ShardingRoundRobinDispatcherIterDataPipe` hints that the data is too expensive to replicate and therefore requires round-robin data exchange/dispatch.)

Differential Revision: D43014692
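To make the proposal concrete, here is a minimal sketch of how a dedicated parent class could replace the attribute check. It is not the code from this PR. The names `_ShardingIterDataPipe`, `is_shardable`, and `apply_sharding` come from the description. The local `IterDataPipe` stand-in, the `ShardingFilter` body, and the two `is_sharding_pipe*` helpers are assumptions written so the example runs without torch.

```python
from typing import Any, Iterable, Iterator


class IterDataPipe:
    """Stand-in for the real IterDataPipe base class (assumption: lets
    the sketch run without importing torch)."""

    def __iter__(self) -> Iterator[Any]:
        raise NotImplementedError


class _ShardingIterDataPipe(IterDataPipe):
    """Marker parent class: anything deriving from it is a sharding pipe."""

    def apply_sharding(self, num_of_instances: int, instance_id: int) -> None:
        raise NotImplementedError


class ShardingFilter(_ShardingIterDataPipe):
    """Hypothetical, simplified sharding filter."""

    def __init__(self, source: Iterable[Any]) -> None:
        self.source = source
        self.num_of_instances = 1
        self.instance_id = 0

    def apply_sharding(self, num_of_instances: int, instance_id: int) -> None:
        self.num_of_instances = num_of_instances
        self.instance_id = instance_id

    def __iter__(self) -> Iterator[Any]:
        # Keep only the elements assigned to this instance.
        for i, item in enumerate(self.source):
            if i % self.num_of_instances == self.instance_id:
                yield item


def is_sharding_pipe_duck_typed(pipe: Any) -> bool:
    # Attribute-based check, in the spirit of what the description
    # says graph_settings.py relies on.
    return hasattr(pipe, "is_shardable") and hasattr(pipe, "apply_sharding")


def is_sharding_pipe(pipe: Any) -> bool:
    # Type-based check enabled by the dedicated parent class.
    return isinstance(pipe, _ShardingIterDataPipe)


pipe = ShardingFilter(range(6))
pipe.apply_sharding(num_of_instances=2, instance_id=1)
shard = list(pipe)
```

The type-based check makes the "system/engine-level" role explicit: a user-level datapipe that happens to define a method named `apply_sharding` would not be mistaken for a sharding pipe.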