[core][compiled graphs] Fix execution schedules with collective operations#53007
Merged
stephanie-wang merged 17 commits intoray-project:masterfrom May 20, 2025
Merged
[core][compiled graphs] Fix execution schedules with collective operations#53007stephanie-wang merged 17 commits intoray-project:masterfrom
stephanie-wang merged 17 commits intoray-project:masterfrom
Conversation
added 14 commits
May 12, 2025 23:25
Signed-off-by: Weixin Deng <weixin@cs.washington.edu>
Signed-off-by: Weixin Deng <weixin@cs.washington.edu>
Signed-off-by: Weixin Deng <weixin@cs.washington.edu>
Signed-off-by: Weixin Deng <weixin@cs.washington.edu>
Signed-off-by: Weixin Deng <weixin@cs.washington.edu>
Signed-off-by: Weixin Deng <weixin@cs.washington.edu>
Signed-off-by: Weixin Deng <weixin@cs.washington.edu>
Signed-off-by: Weixin Deng <weixin@cs.washington.edu>
Signed-off-by: Weixin Deng <weixin@cs.washington.edu>
Signed-off-by: Weixin Deng <weixin@cs.washington.edu>
Signed-off-by: Weixin Deng <weixin@cs.washington.edu>
Signed-off-by: Weixin Deng <weixin@cs.washington.edu>
Author
Signed-off-by: Weixin Deng <weixin@cs.washington.edu>
added 2 commits
May 15, 2025 15:53
Signed-off-by: Weixin Deng <weixin@cs.washington.edu>
Signed-off-by: Weixin Deng <weixin@cs.washington.edu>
stephanie-wang
approved these changes
May 16, 2025
| ) | ||
| expected_schedule = actor_to_execution_schedule[0] | ||
| for schedule in actor_to_execution_schedule[1:]: | ||
| assert schedule == expected_schedule |
8 tasks
stephanie-wang
pushed a commit
that referenced
this pull request
May 29, 2025
…3111) This PR unifies the scheduling implementation for the NCCL P2P and collective operation nodes. The logic remains the same: (1) P2P case: When a NCCL send node is selected, its downstream NCCL recv nodes are also selected; (2) Collective case: When a NCCL collective node is selected, its corresponding NCCL collective nodes are also selected. Previously, the NCCL P2P case was implemented by selecting the recv nodes if a send node is detected, and the NCCL collective case was implemented by maintaining a set of pending collective nodes. We unify the implementation for both cases. Concretely, they both maintain a set of (pending) synchronous nodes named `sync_idxs` and `pending_sync_idxs`. The synchronous nodes denote the P2P send/recv nodes or the collective nodes. The NCCL P2P/collective operation is ready when `sync_idxs == pending_sync_idxs`. Test cases are updated to reflect the use of synchronous nodes for both NCCL P2P and collective nodes. This PR is a follow-up of #53007. They are parts of #48649 planning to be merged incrementally. --------- Signed-off-by: Weixin Deng <weixin@cs.washington.edu>
8 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Why are these changes needed?
Given an input DAG of SPMD training strategies such as DDP, after DAG compile, the first actor will generate different execution schedules than others. This is due to the current scheduling policy, when there are multiple ready operation nodes such as
actor1.compute(non-NCCL) andactor4.collective(NCCL, for actor1-4, there's only one collective operation node that's eventually ready), the policy does not know actor1 has both the non-NCCLactor1.computeand the NCCLactor4.collective. This leads to actor1 scheduling theactor1.computefirst, and actor1-4 scheduling thecollectivenext.We update the policy to push all the collective operations nodes into candidates when the last of them is ready. In the previous example, actor1 will have both
actor1.computeandactor1.collectiveas candidates. In a DAG of SPMD strategies, all the actors pop either thecomputeor thecollectivetogether.We also update the policy to simply prioritize the NCCL operation node over the non-NCCL. This will lead to NCCL operations to be scheduled as soon as possible. It is safe to do so under the current settings of CUDA streams in the system, because each NCCL read/write/collective stream only allows one outstanding NCCL kernel at a time.
We add a test
test_collective_dag.py::test_exec_schedules_ddpto verify the generated schedules are identical across workers for the DDP stragegy. Other tests are updated to reflect the changes of prioritizing the NCCL operation node over the non-NCCL.Related issue number
This PR is part of #48649 planning to be merged incrementally.
Checks
git commit -s) in this PR.scripts/format.shto lint the changes in this PR.method in Tune, I've added it in
doc/source/tune/api/under thecorresponding
.rstfile.