[DCP][HF] Add option to parallelize reads in HF Storage Reader#160205
ankitageorge wants to merge 5 commits into gh/ankitageorge/21/base from
Differential Revision: [D79478188](https://our.internmc.facebook.com/intern/diff/D79478188/) [ghstack-poisoned]
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/160205
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures
As of commit ecd2d16 with merge base e1a64b7.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Differential Revision: [D79478188](https://our.internmc.facebook.com/intern/diff/D79478188/) ghstack-source-id: 300233804 Pull Request resolved: #160205
This pull request was exported from Phabricator. Differential Revision: D79478188
Differential Revision: [D79478188](https://our.internmc.facebook.com/intern/diff/D79478188/) cc H-Huang awgu wanchaol fegin fduwjj wz337 wconstab d4l3k pragupta [ghstack-poisoned]
Parallelize reading of data behind thread_count argument to HFStorageReader Pull Request resolved: #160205 ghstack-source-id: 302209653 @exported-using-ghexport Differential Revision: [D79478188](https://our.internmc.facebook.com/intern/diff/D79478188/)
…ader" Parallelize reading of data behind thread_count argument to HFStorageReader Test plan: ensure existing tests pass and run a job successfully with these changes Differential Revision: [D79478188](https://our.internmc.facebook.com/intern/diff/D79478188/) cc H-Huang awgu wanchaol fegin fduwjj wz337 wconstab d4l3k pragupta [ghstack-poisoned]
Parallelize reading of data behind thread_count argument to HFStorageReader Pull Request resolved: #160205 ghstack-source-id: 302487332 @exported-using-ghexport Differential Revision: [D79478188](https://our.internmc.facebook.com/intern/diff/D79478188/)
meetv18 left a comment:
LGTM :)
Any way to modularize this and also implement it for FileSystemReader?
    except queue.Empty:
        pass

def read_data(self, plan: LoadPlan, planner: LoadPlanner) -> Future[None]:
Is there a plan to introduce multithreading for the save path as well?
It's already there.
target_tensor.copy_(tensor)
planner.commit_tensor(req, target_tensor)
if self.thread_count <= 1 or len(per_file) <= 1:
We need a test for this method.
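For reference, a minimal stdlib-only sketch of the pattern under review, written so the `thread_count` branch is easy to test. This is not the code from the PR: `read_all` and `read_file` are hypothetical names, and plain dicts stand in for tensors and the planner. It assumes requests are grouped per file, that the sequential path is taken when `thread_count <= 1` or only one file is involved, and that worker threads drain a shared queue until `queue.Empty`.

```python
import queue
import threading
from collections import defaultdict


def read_all(requests, read_file, thread_count=1):
    """Read every (file_name, key) request and return {key: value}.

    `read_file(file_name, keys)` is a hypothetical callable that returns
    a dict with one entry per requested key of a single file.
    """
    # Group requests by file so each file is handled once.
    per_file = defaultdict(list)
    for file_name, key in requests:
        per_file[file_name].append(key)

    results = {}

    # Sequential fallback, mirroring the condition in the diff above.
    if thread_count <= 1 or len(per_file) <= 1:
        for file_name, keys in per_file.items():
            results.update(read_file(file_name, keys))
        return results

    # Parallel path: workers pull files from a shared queue.
    work = queue.Queue()
    for item in per_file.items():
        work.put(item)

    lock = threading.Lock()
    errors = []

    def worker():
        try:
            while True:
                file_name, keys = work.get_nowait()
                data = read_file(file_name, keys)
                with lock:
                    results.update(data)
        except queue.Empty:
            pass  # queue drained, worker exits
        except Exception as exc:  # surface read failures to the caller
            with lock:
                errors.append(exc)

    n_workers = min(thread_count, len(per_file))
    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    if errors:
        raise errors[0]
    return results
```

A test along these lines would call the function once with `thread_count=1` and once with a larger value on the same multi-file input, then assert that both return identical results and that a failing read is propagated rather than swallowed.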
Parallelize reading of data behind thread_count argument to HFStorageReader Pull Request resolved: #160205 ghstack-source-id: 304719983 @exported-using-ghexport Differential Revision: [D79478188](https://our.internmc.facebook.com/intern/diff/D79478188/)
Parallelize reading of data behind thread_count argument to HFStorageReader Pull Request resolved: #160205 ghstack-source-id: 304745088 @exported-using-ghexport Differential Revision: [D79478188](https://our.internmc.facebook.com/intern/diff/D79478188/)
@pytorchmergebot merge
Merge started
Your change will be merged once all checks pass (ETA 0-4 Hours).
…ch#160205) Parallelize reading of data behind thread_count argument to HFStorageReader Test plan: ensure existing tests pass and run a job successfully with these changes Differential Revision: [D79478188](https://our.internmc.facebook.com/intern/diff/D79478188/) Pull Request resolved: pytorch#160205 Approved by: https://github.com/meetv18
Parallelize reading of data in HFStorageReader, gated behind a new thread_count argument.
Test plan: ensure existing tests pass and run a job successfully with these changes
Stack from ghstack (oldest at bottom):
Differential Revision: D79478188
cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @pragupta