Add finishAndThrow function to ProcessGroup::Work, and use with Gloo#40405
Add finishAndThrow function to ProcessGroup::Work, and use with Gloo#40405osalpekar wants to merge 3 commits intogh/osalpekar/46/basefrom
Conversation
This adds a finishAndThrow function that completes the work object, sets an exception if one is provided by the user, and throws an exception (if it is already set or passed by the caller). This is now done by grabbing the lock just once and simplifies the wait functions in ProcessGroupGloo. Differential Revision: [D22174890](https://our.internmc.facebook.com/intern/diff/D22174890/) [ghstack-poisoned]
… with Gloo" This adds a finishAndThrow function that completes the work object, sets an exception if one is provided by the user, and throws an exception (if it is already set or passed by the caller). This is now done by grabbing the lock just once and simplifies the wait functions in ProcessGroupGloo. Differential Revision: [D22174890](https://our.internmc.facebook.com/intern/diff/D22174890/) [ghstack-poisoned]
Pull Request resolved: #40405 This adds a finishAndThrow function that completes the work object, sets an exception if one is provided by the user, and throws an exception (if it is already set or passed by the caller). This is now done by grabbing the lock just once and simplifies the wait functions in ProcessGroupGloo. ghstack-source-id: 106381735 Differential Revision: [D22174890](https://our.internmc.facebook.com/intern/diff/D22174890/)
💊 CI failures summary and remediationsAs of commit 4506aad (more details on the Dr. CI page): 💚 💚 Looks good so far! There are no failures yet. 💚 💚 This comment was automatically generated by Dr. CI (expand for details).Follow this link to opt-out of these comments for your Pull Requests.Please report bugs/suggestions on the GitHub issue tracker or post in the (internal) Dr. CI Users group. This comment has been revised 4 times. |
| } | ||
|
|
||
| // Completes the Work object and throws the exception. | ||
| finishAndThrow(exception); |
There was a problem hiding this comment.
curious, why this cannot be moved to the catch block, saving a std::exception_ptr var?
There was a problem hiding this comment.
It probably can be moved inside - just that we still need to mark the work object as completed even if there's no exception. Since this logic sequence (mark complete, set exception, then throw if needed) is pretty common across the ProcessGroup backends, it would be nice to have a helper function that does all this and only grabs the lock once.
| void ProcessGroup::Work::finishAndThrow(std::exception_ptr exception) { | ||
| std::unique_lock<std::mutex> lock(mutex_); | ||
| completed_ = true; | ||
| exception_ = exception; |
There was a problem hiding this comment.
Could nullptr exception overwrite exception_, hence we miss the previous exception?
There was a problem hiding this comment.
Yes, this could happen, but this is possible in the gloo case - if no exception is caught by waitSend/waitRecv, then exception_ is set to nullptr and nothing is thrown, even if an exception was previously set.
We can remove the nullptr from being the default value for the exception_ptr argument, so the user has to explicitly pass in a nullptr if they want to overwrite the last exception.
… with Gloo" This adds a finishAndThrow function that completes the work object, sets an exception if one is provided by the user, and throws an exception (if it is already set or passed by the caller). This is now done by grabbing the lock just once and simplifies the wait functions in ProcessGroupGloo. Differential Revision: [D22174890](https://our.internmc.facebook.com/intern/diff/D22174890/) [ghstack-poisoned]
Pull Request resolved: #40405 This adds a finishAndThrow function that completes the work object, sets an exception if one is provided by the user, and throws an exception (if it is already set or passed by the caller). This is now done by grabbing the lock just once and simplifies the wait functions in ProcessGroupGloo. ghstack-source-id: 106516114 Differential Revision: [D22174890](https://our.internmc.facebook.com/intern/diff/D22174890/)
|
This pull request has been merged in 0c923ee. |
…ytorch#40405) Summary: Pull Request resolved: pytorch#40405 This adds a finishAndThrow function that completes the work object, sets an exception if one is provided by the user, and throws an exception (if it is already set or passed by the caller). This is now done by grabbing the lock just once and simplifies the wait functions in ProcessGroupGloo. ghstack-source-id: 106516114 Test Plan: CI Differential Revision: D22174890 fbshipit-source-id: ea74702216c4328187c8d193bf39e1fea43847f6
Stack from ghstack:
This adds a finishAndThrow function that completes the work object,
sets an exception if one is provided by the user, and throws an exception (if
it is already set or passed by the caller). This is now done by grabbing the
lock just once and simplifies the wait functions in ProcessGroupGloo.
Differential Revision: D22174890