Skip to content

[1/n] Add graph function to list DataPipes from graph#888

Closed
ejguan wants to merge 2 commits intometa-pytorch:mainfrom
ejguan:list_dps
Closed

[1/n] Add graph function to list DataPipes from graph#888
ejguan wants to merge 2 commits intometa-pytorch:mainfrom
ejguan:list_dps

Conversation

@ejguan
Copy link
Contributor

@ejguan ejguan commented Nov 7, 2022

Add a list_dps function to list DataPipes from the graph.

  • It's similar to get_all_graph_pipes from pytorch core
  • An extra argument of exclude_dps to exclude the DataPipe and its prior graph from the result.

Reason to add this function:

  • It's required to set random states differently for DataPipe before/after sharding_filter
graph = traverse_dps(datapipe)
sf_dps = find_dps(graph, ShardingFilter)

# DataPipes prior to `sharding_filter`
p_dps = []
for sf_dp in sf_dps:
    p_dps.extend(list_dps(traverse_dps(sf_dp)))

# DataPipes after `sharding_filter`
a_dps = list_dps(graph, exclude_dps=sf_dps)

Step 1 for #885

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Nov 7, 2022
@facebook-github-bot
Copy link
Contributor

@ejguan has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@ejguan ejguan force-pushed the list_dps branch 2 times, most recently from f65a440 to 5993878 Compare November 8, 2022 18:28
@facebook-github-bot
Copy link
Contributor

@ejguan has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Copy link
Contributor

@NivekT NivekT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

for dp_id, (dp, src_graph) in g.items():
if dp_id not in cache:
cache.add(dp_id)
dps.append(dp)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Side note, this doesn't guarantee any order and returns a list. This should suffice for the current use case.

I wonder if people will want a Dict (basically just traverse_dps without the IDs). To be clear, this seems fine to me for now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we treat the graph as a tree, there can be in different orders like how do we traverse a tree. Pre-order; post-order, etc. It highly depends on users' preference especially our graph can be more complicated than a tree. I would say let's wait until we have received any use case.

@facebook-github-bot
Copy link
Contributor

@ejguan has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@ejguan ejguan added the topic: new feature topic category label Nov 15, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. topic: new feature topic category

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants