Refactor ShardingFilterIterDataPipe into a seperate file (#94095)#987
Refactor ShardingFilterIterDataPipe into a seperate file (#94095)#987wenleix wants to merge 1 commit intometa-pytorch:mainfrom
ShardingFilterIterDataPipe into a seperate file (#94095)#987Conversation
|
This pull request was exported from Phabricator. Differential Revision: D43014692 |
…rch#987) Summary: Pull Request resolved: meta-pytorch#987 X-link: pytorch/pytorch#94095 Differential Revision: D43014692 Move `ShardingFilterIterDataPipe` into a dedicated file. Also, propose to have a dedicated parent class (`_ShardingIterDataPipe`) for sharding data pipe, as this seems more like a "system/engine-level" datapipe that gives strong hints to RS on how to execute, and needs first-class citizen treatment in RS (compared with other "user-level" datapipe that are mostly composable `Callable[[Iterable], Iterable]`. But open to other discussions. ### Open question Should [ShardingRoundRobinDispatcherIterDataPipe](https://github.com/pytorch/data/blob/01fc76200354501b057bb439b43a1f05f609dd0a/torchdata/datapipes/iter/util/sharding.py#L16-L17) also be considered as a `_ShardingIterDataPipe`? (e.g. this sharding is executed by replicating (the metadata), while `ShardingRoundRobinDispatcherIterDataPipe` hints too expensive to replicate so requires round robin data exchange/dispatch). D43014692 fbshipit-source-id: 223a9503fc5c3c78b2aebb30fe9665bb081f37ee
4c15f34 to
bfe7125
Compare
|
This pull request was exported from Phabricator. Differential Revision: D43014692 |
1 similar comment
|
This pull request was exported from Phabricator. Differential Revision: D43014692 |
…rch#987) Summary: Pull Request resolved: meta-pytorch#987 X-link: pytorch/pytorch#94095 Differential Revision: D43014692 Move `ShardingFilterIterDataPipe` into a dedicated file. Also, propose to have a dedicated parent class (`_ShardingIterDataPipe`) for sharding data pipe, as this seems more like a "system/engine-level" datapipe that gives strong hints to RS on how to execute, and needs first-class citizen treatment in RS (compared with other "user-level" datapipe that are mostly composable `Callable[[Iterable], Iterable]`. But open to other discussions. ### Open question Should [ShardingRoundRobinDispatcherIterDataPipe](https://github.com/pytorch/data/blob/01fc76200354501b057bb439b43a1f05f609dd0a/torchdata/datapipes/iter/util/sharding.py#L16-L17) also be considered as a `_ShardingIterDataPipe`? (e.g. this sharding is executed by replicating (the metadata), while `ShardingRoundRobinDispatcherIterDataPipe` hints too expensive to replicate so requires round robin data exchange/dispatch). D43014692 fbshipit-source-id: 74713ac091b1dcb215687a9f572dc96d79d89c7e
bfe7125 to
e778b3c
Compare
…rch#987) Summary: Pull Request resolved: meta-pytorch#987 X-link: pytorch/pytorch#94095 Differential Revision: D43014692 Move `ShardingFilterIterDataPipe` into a dedicated file. Also, propose to have a dedicated parent class (`_ShardingIterDataPipe`) for sharding data pipe, as this seems more like a "system/engine-level" datapipe that gives strong hints to RS on how to execute, and needs first-class citizen treatment in RS (compared with other "user-level" datapipe that are mostly composable `Callable[[Iterable], Iterable]`. But open to other discussions. ### Open question Should [ShardingRoundRobinDispatcherIterDataPipe](https://github.com/pytorch/data/blob/01fc76200354501b057bb439b43a1f05f609dd0a/torchdata/datapipes/iter/util/sharding.py#L16-L17) also be considered as a `_ShardingIterDataPipe`? (e.g. this sharding is executed by replicating (the metadata), while `ShardingRoundRobinDispatcherIterDataPipe` hints too expensive to replicate so requires round robin data exchange/dispatch). D43014692 fbshipit-source-id: 917937c78c6839a43b8595f649ada9fe829a01fd
e778b3c to
11315e0
Compare
|
This pull request was exported from Phabricator. Differential Revision: D43014692 |
…rch#987) Summary: Pull Request resolved: meta-pytorch#987 X-link: pytorch/pytorch#94095 Differential Revision: D43014692 Move `ShardingFilterIterDataPipe` into a dedicated file. Also, propose to have a dedicated parent class (`_ShardingIterDataPipe`) for sharding data pipe, as this seems more like a "system/engine-level" datapipe that gives strong hints to RS on how to execute, and needs first-class citizen treatment in RS (compared with other "user-level" datapipe that are mostly composable `Callable[[Iterable], Iterable]`. But open to other discussions. ### Open question Should [ShardingRoundRobinDispatcherIterDataPipe](https://github.com/pytorch/data/blob/01fc76200354501b057bb439b43a1f05f609dd0a/torchdata/datapipes/iter/util/sharding.py#L16-L17) also be considered as a `_ShardingIterDataPipe`? (e.g. this sharding is executed by replicating (the metadata), while `ShardingRoundRobinDispatcherIterDataPipe` hints too expensive to replicate so requires round robin data exchange/dispatch). D43014692 fbshipit-source-id: a1d3bcb02309ca7ee4d991a31945d4550f4174b9
11315e0 to
f6d31be
Compare
|
This pull request was exported from Phabricator. Differential Revision: D43014692 |
…rch#987) Summary: Pull Request resolved: meta-pytorch#987 X-link: pytorch/pytorch#94095 Differential Revision: D43014692 Move `ShardingFilterIterDataPipe` into a dedicated file. Also, propose to have a dedicated parent class (`_ShardingIterDataPipe`) for sharding data pipe, as this seems more like a "system/engine-level" datapipe that gives strong hints to RS on how to execute, and needs first-class citizen treatment in RS (compared with other "user-level" datapipe that are mostly composable `Callable[[Iterable], Iterable]`. But open to other discussions. ### Open question Should [ShardingRoundRobinDispatcherIterDataPipe](https://github.com/pytorch/data/blob/01fc76200354501b057bb439b43a1f05f609dd0a/torchdata/datapipes/iter/util/sharding.py#L16-L17) also be considered as a `_ShardingIterDataPipe`? (e.g. this sharding is executed by replicating (the metadata), while `ShardingRoundRobinDispatcherIterDataPipe` hints too expensive to replicate so requires round robin data exchange/dispatch). D43014692 Reviewed By: ejguan fbshipit-source-id: d7b450b89989f0a05c6e70fb8d7290893918a32a
f6d31be to
31c3a2e
Compare
|
This pull request was exported from Phabricator. Differential Revision: D43014692 |
…rch#987) Summary: Pull Request resolved: meta-pytorch#987 X-link: pytorch/pytorch#94095 Differential Revision: D43014692 Move `ShardingFilterIterDataPipe` into a dedicated file. Also, propose to have a dedicated parent class (`_ShardingIterDataPipe`) for sharding data pipe, as this seems more like a "system/engine-level" datapipe that gives strong hints to RS on how to execute, and needs first-class citizen treatment in RS (compared with other "user-level" datapipe that are mostly composable `Callable[[Iterable], Iterable]`. But open to other discussions. ### Open question Should [ShardingRoundRobinDispatcherIterDataPipe](https://github.com/pytorch/data/blob/01fc76200354501b057bb439b43a1f05f609dd0a/torchdata/datapipes/iter/util/sharding.py#L16-L17) also be considered as a `_ShardingIterDataPipe`? (e.g. this sharding is executed by replicating (the metadata), while `ShardingRoundRobinDispatcherIterDataPipe` hints too expensive to replicate so requires round robin data exchange/dispatch). D43014692 Reviewed By: seemethere fbshipit-source-id: 79a2ee53d279a4dc8da84eb2af3e5c801ff3d1ea
31c3a2e to
1af1c08
Compare
|
This pull request was exported from Phabricator. Differential Revision: D43014692 |
Summary: X-link: meta-pytorch/data#987 Pull Request resolved: pytorch#94095 Differential Revision: D43014692 Move `ShardingFilterIterDataPipe` into a dedicated file. Also, propose to have a dedicated parent class (`_ShardingIterDataPipe`) for sharding data pipe, as this seems more like a "system/engine-level" datapipe that gives strong hints to RS on how to execute, and needs first-class citizen treatment in RS (compared with other "user-level" datapipe that are mostly composable `Callable[[Iterable], Iterable]`. But open to other discussions. ### Open question Should [ShardingRoundRobinDispatcherIterDataPipe](https://github.com/pytorch/data/blob/01fc76200354501b057bb439b43a1f05f609dd0a/torchdata/datapipes/iter/util/sharding.py#L16-L17) also be considered as a `_ShardingIterDataPipe`? (e.g. this sharding is executed by replicating (the metadata), while `ShardingRoundRobinDispatcherIterDataPipe` hints too expensive to replicate so requires round robin data exchange/dispatch). D43014692 Test Plan: sandcastle and CI How to run unit tests in buck related to such changes? :) Reviewed By: seemethere fbshipit-source-id: fcd2a4e57b15fdf7411c959a38c377ac114f0ecb
|
This pull request has been merged in 914d3dd. |
|
This is BC breaking. The original module ( For us it is not a large problem to change, but other libraries that rely on that might break in the upcoming release. |
|
Just to give some unsolicited advice: I would push to make all modules below Using such an architecture would have prevented this, because the datapipe would still be exposed under |
|
@pmeier |
|
I was more concern on the |
Huh, I guess we never looked for that given that the module it is in is public. But fair enough, I'm gonna change that on our side. |
Sounds good. Let me import them back . Thanks for reporting this! |
…eprecated (pytorch#94527) Summary: Pull Request resolved: pytorch#94527 pytorch#94095 moves this into `tud.datapipes.iter.sharding`. However, since previously this is a public API, this is a BC break change. As discussed in meta-pytorch/data#987 (comment), we will have backward compatbile support but with deprecated warning. Test Plan: ``` buck test mode/opt //caffe2/test:datapipe ``` Reviewed By: ejguan Differential Revision: D43161015 fbshipit-source-id: 82049979981b83aefd247843fbc2c004a7e7a90a
…eprecated (#94527) Summary: #94095 moves this into `tud.datapipes.iter.sharding`. However, since previously this is a public API, this is a BC break change. As discussed in meta-pytorch/data#987 (comment), we will have backward compatbile support but with deprecated warning. Differential Revision: D43161015 Pull Request resolved: #94527 Approved by: https://github.com/ejguan, https://github.com/NivekT
Summary:
Move
ShardingFilterIterDataPipeinto a dedicated file.Also, propose to have a dedicated parent class (
_ShardingIterDataPipe) for sharding data pipe, as this seems more like a "system/engine-level" datapipe that gives strong hints to RS on how to execute, and needs first-class citizen treatment in RS (compared with other "user-level" datapipe that are mostly composableCallable[[Iterable], Iterable]. But open to other discussions.Open question: Should
ShardingRoundRobinDispatcherIterDataPipe also be considered as a
_ShardingIterDataPipe? (e.g. this sharding is executed by replicating (the metadata), whileShardingRoundRobinDispatcherIterDataPipehints too expensive to replicate so requires round robin data exchange/dispatch).Differential Revision:
D43014692
X-link: pytorch/pytorch#94095
D43014692