Skip to content

Implement InProcessReadingService#1139

Closed
ejguan wants to merge 11 commits intometa-pytorch:mainfrom
ejguan:single_rs
Closed

Implement InProcessReadingService#1139
ejguan wants to merge 11 commits intometa-pytorch:mainfrom
ejguan:single_rs

Conversation

@ejguan
Copy link
Contributor

@ejguan ejguan commented Apr 21, 2023

Fixes #1107
Fixes #720
Fixes #616

Changes

  • Implement InProcessReadingService (Willing to take any suggestion on naming)
    • Control shuffle and sharding (noop)
    • Add support to pause/resume/limit
  • Make InProcessReadingService as the default reading_service to DataLoader2.
    • Then, reading_service always has a value, and remove the logic of reading_service is None.
  • Modify MultiProcessingReadingService
    • When num_workers=0, raise a warning

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Apr 21, 2023
@facebook-github-bot
Copy link
Contributor

@ejguan has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@ejguan ejguan requested a review from NivekT April 21, 2023 17:08
worker_init_fn: Optional[Callable[[DataPipe, WorkerInfo], DataPipe]] = None,
worker_reset_fn: Optional[Callable[[DataPipe, WorkerInfo, SeedGenerator], DataPipe]] = None,
):
if num_workers == 0:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious if num_workers == 1, what is the benefit of using this instead of SingleProcessRS? Will it actually be faster (probably slower)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using num_workers=1 would create a single worker process that is dedicated to loading data. And, the main process would work on the model training. The use case might be preventing the data part drain the CPU resource.

worker_init_fn: Optional[Callable[[DataPipe, WorkerInfo], DataPipe]] = None,
worker_reset_fn: Optional[Callable[[DataPipe, WorkerInfo, SeedGenerator], DataPipe]] = None,
) -> None:
assert num_workers > 0, "Please use `InProcessReadingService` for num_workers=0"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am making a BC-breaking change here. MPRS won't accept num_workers=0 anymore.

In previous commits, I have been trying to add a __new__ to return InProcessReadingService when num_workers=0. However, it doesn't work well with pickling due to clone and mp.

@ejguan ejguan changed the title Implement SingleProcessingReadingService Implement InProcessReadingService Apr 24, 2023
@facebook-github-bot
Copy link
Contributor

@ejguan has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@ejguan ejguan requested a review from NivekT April 24, 2023 16:28
"""
py_random_num = random.randint(0, 2 ** 32)
np_random_num = np.random.randint(0, 2 ** 32)
np_random_num = np.random.randint(0, 2 ** 32 - 1)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Somehow this problem wasn't revealed until this PR

@facebook-github-bot
Copy link
Contributor

@ejguan has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Copy link
Contributor

@NivekT NivekT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! LGTM! Let's also add it to this documentation table here:

https://pytorch.org/data/beta/dataloader2.html#readingservice

@facebook-github-bot
Copy link
Contributor

@ejguan has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

1 similar comment
@facebook-github-bot
Copy link
Contributor

@ejguan has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Comment on lines +456 to +486
if hasattr(dp, "resume") and callable(dp.resume):
dp.resume()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to skip QueueWrapper here as well.

@facebook-github-bot
Copy link
Contributor

@ejguan has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@ejguan has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@ejguan merged this pull request in 8f9d123.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. Merged

Projects

None yet

3 participants