Implement InProcessReadingService#1139
Conversation
|
@ejguan has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
| worker_init_fn: Optional[Callable[[DataPipe, WorkerInfo], DataPipe]] = None, | ||
| worker_reset_fn: Optional[Callable[[DataPipe, WorkerInfo, SeedGenerator], DataPipe]] = None, | ||
| ): | ||
| if num_workers == 0: |
There was a problem hiding this comment.
I'm curious if num_workers == 1, what is the benefit of using this instead of SingleProcessRS? Will it actually be faster (probably slower)?
There was a problem hiding this comment.
Using num_workers=1 would create a single worker process that is dedicated to loading data. And, the main process would work on the model training. The use case might be preventing the data part drain the CPU resource.
| worker_init_fn: Optional[Callable[[DataPipe, WorkerInfo], DataPipe]] = None, | ||
| worker_reset_fn: Optional[Callable[[DataPipe, WorkerInfo, SeedGenerator], DataPipe]] = None, | ||
| ) -> None: | ||
| assert num_workers > 0, "Please use `InProcessReadingService` for num_workers=0" |
There was a problem hiding this comment.
I am making a BC-breaking change here. MPRS won't accept num_workers=0 anymore.
In previous commits, I have been trying to add a __new__ to return InProcessReadingService when num_workers=0. However, it doesn't work well with pickling due to clone and mp.
|
@ejguan has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
test/dataloader2/test_random.py
Outdated
| """ | ||
| py_random_num = random.randint(0, 2 ** 32) | ||
| np_random_num = np.random.randint(0, 2 ** 32) | ||
| np_random_num = np.random.randint(0, 2 ** 32 - 1) |
There was a problem hiding this comment.
Somehow this problem wasn't revealed until this PR
|
@ejguan has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
NivekT
left a comment
There was a problem hiding this comment.
Thanks! LGTM! Let's also add it to this documentation table here:
https://pytorch.org/data/beta/dataloader2.html#readingservice
|
@ejguan has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
1 similar comment
|
@ejguan has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
| if hasattr(dp, "resume") and callable(dp.resume): | ||
| dp.resume() |
There was a problem hiding this comment.
I need to skip QueueWrapper here as well.
|
@ejguan has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
|
@ejguan has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
Fixes #1107
Fixes #720
Fixes #616
Changes
InProcessReadingService(Willing to take any suggestion on naming)MakeInProcessReadingServiceas the defaultreading_servicetoDataLoader2.Then,reading_servicealways has a value, and remove the logic ofreading_serviceis None.MultiProcessingReadingServicenum_workers=0, raise a warning