
[Collective][PR 3/6] Other collectives #12864

Merged
richardliaw merged 7 commits into ray-project:master from zhisbug:ray-collective-pr3 on Dec 21, 2020


zhisbug (Contributor) commented Dec 15, 2020

Why are these changes needed?

This is the third PR for the project Collective-in-Ray.

To keep each PR manageable and reviewer-friendly, we split the project into 6 incremental PRs:

  1. (merged) Basic infrastructure; an in-actor collective interface ray.util.collective.init_collective_group(*args, **kwargs); support for two collectives, allreduce and barrier; some testing infrastructure, etc.
  2. Driver-program interfaces: (1) the second interface, actor.options(collective_options, ...).remote(), and (2) the third interface, declare_collective_group(actors, collective_options, ...).
  3. (this one) Support for other collectives: allgather, broadcast, reduce, reducescatter; refactor the tests into distributed tests and single-node cluster tests.
  4. Communicator caching, and support for num_gpus > 2 per actor/task.
  5. CUDA stream management.
  6. Docs, examples, etc.
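The four collectives added here (allgather, broadcast, reduce, reducescatter), plus allreduce from PR 1, differ mainly in which rank ends up holding what. As a rough reference while reviewing, below is a minimal pure-Python sketch of their semantics under the SUM op, simulating each rank's tensor as a flat Python list. This is NOT the ray.util.collective API: the function names and signatures are hypothetical, and the reducescatter layout (each rank's input split into world_size equal chunks, rank r keeping chunk r) follows the NCCL-style convention, which is an assumption about how the chunks are assigned.

```python
from typing import List

# Hypothetical model: buffers[r] is the flat tensor held by rank r.
Buffers = List[List[float]]


def _elementwise_sum(buffers: Buffers) -> List[float]:
    """Sum the per-rank buffers element by element (the SUM reduce op)."""
    return [sum(vals) for vals in zip(*buffers)]


def allreduce(buffers: Buffers) -> Buffers:
    """Every rank ends up with the element-wise sum of all inputs."""
    total = _elementwise_sum(buffers)
    return [list(total) for _ in buffers]


def allgather(buffers: Buffers) -> List[Buffers]:
    """Every rank ends up with a copy of every rank's input, in rank order."""
    return [[list(b) for b in buffers] for _ in buffers]


def broadcast(buffers: Buffers, src_rank: int) -> Buffers:
    """Every rank's buffer is overwritten with the buffer held by src_rank."""
    return [list(buffers[src_rank]) for _ in buffers]


def reduce(buffers: Buffers, dst_rank: int) -> Buffers:
    """Only dst_rank receives the element-wise sum; other ranks keep their input."""
    total = _elementwise_sum(buffers)
    return [list(total) if r == dst_rank else list(b) for r, b in enumerate(buffers)]


def reducescatter(buffers: Buffers) -> Buffers:
    """Sum element-wise, then give rank r the r-th equal-sized chunk of the result."""
    world_size = len(buffers)
    total = _elementwise_sum(buffers)
    if len(total) % world_size != 0:
        raise ValueError("buffer length must be divisible by world size")
    chunk = len(total) // world_size
    return [total[r * chunk:(r + 1) * chunk] for r in range(world_size)]


if __name__ == "__main__":
    inputs = [[1.0, 2.0], [3.0, 4.0]]  # two simulated ranks
    print(allreduce(inputs))            # [[4.0, 6.0], [4.0, 6.0]]
    print(allgather(inputs))            # [[[1.0, 2.0], [3.0, 4.0]], [[1.0, 2.0], [3.0, 4.0]]]
    print(broadcast(inputs, 1))         # [[3.0, 4.0], [3.0, 4.0]]
    print(reduce(inputs, 0))            # [[4.0, 6.0], [3.0, 4.0]]
    print(reducescatter(inputs))        # [[4.0], [6.0]]
```

Note that allreduce is equivalent to a reducescatter followed by an allgather of the chunks, which is one reason these primitives are usually added together.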

This is the third one (3/6).

Related issue number

RFC #12174

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

zhisbug (Contributor, Author) commented Dec 15, 2020

@richardliaw

richardliaw (Contributor) left a comment


Roughly looks good (skipped most of the test code!)

Ping me when the comments are addressed.

zhisbug (Contributor, Author) commented Dec 21, 2020

@richardliaw All comments addressed. Tests passed on my local cluster. Fine to merge!

richardliaw (Contributor) left a comment


Thanks! Will merge once lint passes on master.

