Skip to content

Add finishAndThrow function to ProcessGroup::Work, and use with Gloo#40405

Closed
osalpekar wants to merge 3 commits intogh/osalpekar/46/basefrom
gh/osalpekar/46/head
Closed

Add finishAndThrow function to ProcessGroup::Work, and use with Gloo#40405
osalpekar wants to merge 3 commits intogh/osalpekar/46/basefrom
gh/osalpekar/46/head

Conversation

@osalpekar
Copy link
Copy Markdown
Contributor

@osalpekar osalpekar commented Jun 22, 2020

Stack from ghstack:

This adds a finishAndThrow function that completes the work object,
sets an exception if one is provided by the user, and throws an exception (if
it is already set or passed by the caller). This is now done by grabbing the
lock just once and simplifies the wait functions in ProcessGroupGloo.

Differential Revision: D22174890

This adds a finishAndThrow function that completes the work object,
sets an exception if one is provided by the user, and throws an exception (if
it is already set or passed by the caller). This is now done by grabbing the
lock just once and simplifies the wait functions in ProcessGroupGloo.

Differential Revision: [D22174890](https://our.internmc.facebook.com/intern/diff/D22174890/)

[ghstack-poisoned]
… with Gloo"

This adds a finishAndThrow function that completes the work object,
sets an exception if one is provided by the user, and throws an exception (if
it is already set or passed by the caller). This is now done by grabbing the
lock just once and simplifies the wait functions in ProcessGroupGloo.

Differential Revision: [D22174890](https://our.internmc.facebook.com/intern/diff/D22174890/)

[ghstack-poisoned]
osalpekar added a commit that referenced this pull request Jun 22, 2020
Pull Request resolved: #40405

This adds a finishAndThrow function that completes the work object,
sets an exception if one is provided by the user, and throws an exception (if
it is already set or passed by the caller). This is now done by grabbing the
lock just once and simplifies the wait functions in ProcessGroupGloo.
ghstack-source-id: 106381735

Differential Revision: [D22174890](https://our.internmc.facebook.com/intern/diff/D22174890/)
@dr-ci
Copy link
Copy Markdown

dr-ci Bot commented Jun 22, 2020

💊 CI failures summary and remediations

As of commit 4506aad (more details on the Dr. CI page):


💚 💚 Looks good so far! There are no failures yet. 💚 💚


This comment was automatically generated by Dr. CI (expand for details).Follow this link to opt-out of these comments for your Pull Requests.

Please report bugs/suggestions on the GitHub issue tracker or post in the (internal) Dr. CI Users group.

See how this bot performed.

This comment has been revised 4 times.

}

// Completes the Work object and throws the exception.
finishAndThrow(exception);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

curious, why this cannot be moved to the catch block, saving a std::exception_ptr var?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It probably can be moved inside - just that we still need to mark the work object as completed even if there's no exception. Since this logic sequence (mark complete, set exception, then throw if needed) is pretty common across the ProcessGroup backends, it would be nice to have a helper function that does all this and only grabs the lock once.

void ProcessGroup::Work::finishAndThrow(std::exception_ptr exception) {
std::unique_lock<std::mutex> lock(mutex_);
completed_ = true;
exception_ = exception;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could nullptr exception overwrite exception_, hence we miss the previous exception?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this could happen, but this is possible in the gloo case - if no exception is caught by waitSend/waitRecv, then exception_ is set to nullptr and nothing is thrown, even if an exception was previously set.

We can remove the nullptr from being the default value for the exception_ptr argument, so the user has to explicitly pass in a nullptr if they want to overwrite the last exception.

… with Gloo"

This adds a finishAndThrow function that completes the work object,
sets an exception if one is provided by the user, and throws an exception (if
it is already set or passed by the caller). This is now done by grabbing the
lock just once and simplifies the wait functions in ProcessGroupGloo.

Differential Revision: [D22174890](https://our.internmc.facebook.com/intern/diff/D22174890/)

[ghstack-poisoned]
osalpekar added a commit that referenced this pull request Jun 24, 2020
Pull Request resolved: #40405

This adds a finishAndThrow function that completes the work object,
sets an exception if one is provided by the user, and throws an exception (if
it is already set or passed by the caller). This is now done by grabbing the
lock just once and simplifies the wait functions in ProcessGroupGloo.
ghstack-source-id: 106516114

Differential Revision: [D22174890](https://our.internmc.facebook.com/intern/diff/D22174890/)
@facebook-github-bot
Copy link
Copy Markdown
Contributor

This pull request has been merged in 0c923ee.

@facebook-github-bot facebook-github-bot deleted the gh/osalpekar/46/head branch June 28, 2020 14:17
laurentdupin pushed a commit to laurentdupin/pytorch that referenced this pull request Apr 24, 2026
…ytorch#40405)

Summary:
Pull Request resolved: pytorch#40405

This adds a finishAndThrow function that completes the work object,
sets an exception if one is provided by the user, and throws an exception (if
it is already set or passed by the caller). This is now done by grabbing the
lock just once and simplifies the wait functions in ProcessGroupGloo.
ghstack-source-id: 106516114

Test Plan: CI

Differential Revision: D22174890

fbshipit-source-id: ea74702216c4328187c8d193bf39e1fea43847f6
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants