Refactor sharding data pipe into a separate file #94095

Closed
wenleix wants to merge 1 commit into pytorch:master from wenleix:export-D43014692

@wenleix (Contributor) commented Feb 4, 2023

Move ShardingFilterIterDataPipe into a dedicated file.

Also, I propose a dedicated parent class (_ShardingIterDataPipe) for sharding datapipes. A sharding datapipe is more of a "system/engine-level" datapipe: it gives RS strong hints on how to execute, and so needs first-class treatment in RS. Other, "user-level" datapipes are mostly composable Callable[[Iterable], Iterable]. With a parent class, graph_settings.py no longer needs to identify sharding datapipes by checking whether is_shardable and apply_sharding are present on a DataPipe. Open to other suggestions.
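A minimal sketch of the idea, not the code in this PR: the engine-side helper selects sharding datapipes with an `isinstance` check against the parent class instead of probing for methods. `IterDataPipe` here is a stand-in so the snippet runs without torch, `apply_sharding_to_pipes` is a hypothetical name, the flat list replaces the real graph traversal, and the `apply_sharding` signature is simplified.

```python
from typing import Iterable, Iterator, List


class IterDataPipe:
    """Stand-in for the real IterDataPipe base class (assumption, keeps the sketch torch-free)."""

    def __iter__(self) -> Iterator:
        raise NotImplementedError


class _ShardingIterDataPipe(IterDataPipe):
    """Proposed parent class: marks a datapipe as a sharding hint for the engine."""

    def apply_sharding(self, num_of_instances: int, instance_id: int) -> None:
        raise NotImplementedError


class ShardingFilterIterDataPipe(_ShardingIterDataPipe):
    """Simplified sharding filter: keeps every n-th item of the source."""

    def __init__(self, source: Iterable) -> None:
        self.source = source
        self.num_of_instances = 1  # unsharded by default
        self.instance_id = 0

    def apply_sharding(self, num_of_instances: int, instance_id: int) -> None:
        self.num_of_instances = num_of_instances
        self.instance_id = instance_id

    def __iter__(self) -> Iterator:
        for i, item in enumerate(self.source):
            if i % self.num_of_instances == self.instance_id:
                yield item


def apply_sharding_to_pipes(
    pipes: List[IterDataPipe], num_of_instances: int, instance_id: int
) -> None:
    """Hypothetical engine helper: type check instead of hasattr-style duck typing."""
    for pipe in pipes:
        if isinstance(pipe, _ShardingIterDataPipe):
            pipe.apply_sharding(num_of_instances, instance_id)


pipe = ShardingFilterIterDataPipe(range(10))
apply_sharding_to_pipes([pipe], num_of_instances=3, instance_id=1)
print(list(pipe))  # instance 1 of 3 keeps items at indices 1, 4, 7
```

The type check makes the "engine-level" role explicit: a user-level datapipe that happens to define a method named `apply_sharding` would no longer be treated as a sharding hint.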

Open question: Should ShardingRoundRobinDispatcherIterDataPipe also be considered a _ShardingIterDataPipe? For context, the sharding here is executed by replicating (the metadata), while ShardingRoundRobinDispatcherIterDataPipe hints that replication is too expensive, so it requires round-robin data exchange/dispatch.
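To make the distinction in the open question concrete, here is a rough, torch-free sketch of the two execution strategies. Both function names are hypothetical and neither reflects the actual torchdata implementation; the point is only where the source gets iterated.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def replicated_shard(
    source_factory: Callable[[], Iterable], num_workers: int, worker_id: int
) -> Iterator:
    """Replication: every worker builds its own copy of the source and keeps every n-th item."""
    return islice(iter(source_factory()), worker_id, None, num_workers)


def round_robin_dispatch(source: Iterable, num_workers: int) -> List[list]:
    """Dispatch: the source is iterated once, and items are handed to workers in turn."""
    queues: List[list] = [[] for _ in range(num_workers)]
    for i, item in enumerate(source):
        queues[i % num_workers].append(item)
    return queues


# Replication: the source is constructed and iterated once per worker.
per_worker = [list(replicated_shard(lambda: range(7), 3, w)) for w in range(3)]

# Dispatch: the source is constructed and iterated exactly once.
dispatched = round_robin_dispatch(range(7), 3)

print(per_worker == dispatched)
```

Both strategies produce the same partition of the data. They differ in cost: replication repeats the source work on every worker, while dispatch pays for moving each item from the dispatching process to a worker.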

Differential Revision: D43014692

@pytorch-bot bot commented Feb 4, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/94095

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 11e365b:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added the release notes: dataloader release notes category label Feb 4, 2023
@linux-foundation-easycla bot commented Feb 4, 2023

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: wenleix / name: Wenlei Xie (f4ed8a498e9e4856ac8661e6cda59a357403a5bf)

@facebook-github-bot (Contributor)

This pull request was exported from Phabricator. Differential Revision: D43014692

@wenleix changed the title from "[RFC] Refactor ShardingFilterIterDataPipe into a separate file" to "[WIP, RFC] Refactor ShardingFilterIterDataPipe into a separate file" Feb 4, 2023

Summary:
X-link: meta-pytorch/data#987

Pull Request resolved: pytorch#94095

Differential Revision:
D43014692

Move `ShardingFilterIterDataPipe` into a dedicated file.

Also, propose a dedicated parent class (`_ShardingIterDataPipe`) for sharding datapipes, as a sharding datapipe is more of a "system/engine-level" datapipe that gives RS strong hints on how to execute, and needs first-class treatment in RS (compared with other "user-level" datapipes, which are mostly composable `Callable[[Iterable], Iterable]`). Open to other suggestions.

### Open question
Should [ShardingRoundRobinDispatcherIterDataPipe](https://github.com/pytorch/data/blob/01fc76200354501b057bb439b43a1f05f609dd0a/torchdata/datapipes/iter/util/sharding.py#L16-L17) also be considered a `_ShardingIterDataPipe`? For context, the sharding here is executed by replicating (the metadata), while `ShardingRoundRobinDispatcherIterDataPipe` hints that replication is too expensive, so it requires round-robin data exchange/dispatch.

D43014692

Test Plan:
sandcastle and CI

How to run unit tests in buck related to such changes? :)

Reviewed By: seemethere

fbshipit-source-id: fcd2a4e57b15fdf7411c959a38c377ac114f0ecb

facebook-github-bot pushed a commit to meta-pytorch/data that referenced this pull request Feb 7, 2023
Summary:
Pull Request resolved: #987

X-link: pytorch/pytorch#94095

Differential Revision:
D43014692

Move `ShardingFilterIterDataPipe` into a dedicated file.

Also, propose a dedicated parent class (`_ShardingIterDataPipe`) for sharding datapipes, as a sharding datapipe is more of a "system/engine-level" datapipe that gives RS strong hints on how to execute, and needs first-class treatment in RS (compared with other "user-level" datapipes, which are mostly composable `Callable[[Iterable], Iterable]`). Open to other suggestions.

### Open question
Should [ShardingRoundRobinDispatcherIterDataPipe](https://github.com/pytorch/data/blob/01fc76200354501b057bb439b43a1f05f609dd0a/torchdata/datapipes/iter/util/sharding.py#L16-L17) also be considered a `_ShardingIterDataPipe`? For context, the sharding here is executed by replicating (the metadata), while `ShardingRoundRobinDispatcherIterDataPipe` hints that replication is too expensive, so it requires round-robin data exchange/dispatch.

D43014692

Reviewed By: seemethere

fbshipit-source-id: b5f2f6d8b3574d0d98b22ca55134c143786c7c2c
@facebook-github-bot (Contributor)

@pytorchbot merge

(Initiating merge automatically since Phabricator Diff has merged)

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Feb 7, 2023
@pytorchmergebot (Collaborator)

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here


Labels

  • ciflow/trunk: Trigger trunk jobs on your pull request
  • fb-exported
  • Merged
  • release notes: dataloader (release notes category)

5 participants