Skip to content

[2/n] Add setting function to set seeds to the graph#894

Closed
ejguan wants to merge 4 commits intometa-pytorch:mainfrom
ejguan:shuffle_after_sharding
Closed

[2/n] Add setting function to set seeds to the graph#894
ejguan wants to merge 4 commits intometa-pytorch:mainfrom
ejguan:shuffle_after_sharding

Conversation

@ejguan
Copy link
Contributor

@ejguan ejguan commented Nov 14, 2022

Changes

  • Update list_dps with a new argument (exclusive) to make sure all DataPipes will be excluded even though they are the predecessors of the non-excluded DataPipe.
  • Update list_dps to use BFS to return the list of DataPipes.
  • Add _set_worker_seed_for_dp_graph to set seeds for DataPipes before sharding_filter with the same seed generator and set different seeds for DataPipes after sharding_filter with the worker-local seed generator.

Step 2 for #885

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Nov 14, 2022
@facebook-github-bot
Copy link
Contributor

@ejguan has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

return datapipe


def _set_graph_seeds(datapipe: DataPipe, seed_generator: torch.Generator, worker_id: int = 0) -> DataPipe:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am using a private name because I don't like the name of this function. LMK if you have any better idea.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. _set_worker_seed
  2. _set_worker_seed_for_dp_graph

@ejguan ejguan requested a review from NivekT November 15, 2022 15:17
@ejguan ejguan added the topic: new feature topic category label Nov 15, 2022
Copy link
Contributor

@NivekT NivekT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! A few comments:

return datapipe


def _set_graph_seeds(datapipe: DataPipe, seed_generator: torch.Generator, worker_id: int = 0) -> DataPipe:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. _set_worker_seed
  2. _set_worker_seed_for_dp_graph

@ejguan ejguan force-pushed the shuffle_after_sharding branch from db08b88 to fd7808a Compare November 16, 2022 20:05
@facebook-github-bot
Copy link
Contributor

@ejguan has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@ejguan has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@ejguan has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Copy link
Contributor

@NivekT NivekT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The BFS looks good to me. Can we add a test that confirms the result's ordering? (It might already exist.)

@facebook-github-bot
Copy link
Contributor

@ejguan has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. topic: new feature topic category

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants