Use Future's then() API to fix RPC profiling#38352
Use Future's then() API to fix RPC profiling#38352rohan-varma wants to merge 6 commits intogh/rohan-varma/127/basefrom
Conversation
Fixes the RPC profiling by using the `then()` API added in #37311. Instead of adding a regular callback, we return a new future that completes when the profiling callback is finished. This is transparent to the user as the future still completes with the value of the original future (i.e. the RPC's return value) To make this work for RRef, we add a `_set_profiling_future` to set the profiling future, and `_get_profiling_future` to retrieve this future and wait on it in the tests. Re-enabled profiling tests and stress tested them 1000 times to verify the fix Differential Revision: [D21506940](https://our.internmc.facebook.com/intern/diff/D21506940/) [ghstack-poisoned]
💊 CI failures summary and remediationsAs of commit c4dbe49 (more details on the Dr. CI page):
🕵️ 1 new failure recognized by patternsThe following CI failures do not appear to be due to upstream breakages:
|
Fixes the RPC profiling by using the `then()` API added in #37311. Instead of adding a regular callback, we return a new future that completes when the profiling callback is finished. This is transparent to the user as the future still completes with the value of the original future (i.e. the RPC's return value) To make this work for RRef, we add a `_set_profiling_future` to set the profiling future, and `_get_profiling_future` to retrieve this future and wait on it in the tests. Re-enabled profiling tests and stress tested them 1000 times to verify the fix Differential Revision: [D21506940](https://our.internmc.facebook.com/intern/diff/D21506940/) [ghstack-poisoned]
Pull Request resolved: #38352 Fixes the RPC profiling by using the `then()` API added in #37311. Instead of adding a regular callback, we return a new future that completes when the profiling callback is finished. This is transparent to the user as the future still completes with the value of the original future (i.e. the RPC's return value) To make this work for RRef, we add a `_set_profiling_future` to set the profiling future, and `_get_profiling_future` to retrieve this future and wait on it in the tests. Re-enabled profiling tests and stress tested them 1000 times to verify the fix ghstack-source-id: 103957346 Differential Revision: [D21506940](https://our.internmc.facebook.com/intern/diff/D21506940/)
Fixes the RPC profiling by using the `then()` API added in #37311. Instead of adding a regular callback, we return a new future that completes when the profiling callback is finished. This is transparent to the user as the future still completes with the value of the original future (i.e. the RPC's return value) To make this work for RRef, we add a `_set_profiling_future` to set the profiling future, and `_get_profiling_future` to retrieve this future and wait on it in the tests. Re-enabled profiling tests and stress tested them 1000 times to verify the fix Differential Revision: [D21506940](https://our.internmc.facebook.com/intern/diff/D21506940/) [ghstack-poisoned]
Fixes the RPC profiling by using the `then()` API added in #37311. Instead of adding a regular callback, we return a new future that completes when the profiling callback is finished. This is transparent to the user as the future still completes with the value of the original future (i.e. the RPC's return value) To make this work for RRef, we add a `_set_profiling_future` to set the profiling future, and `_get_profiling_future` to retrieve this future and wait on it in the tests. Re-enabled profiling tests and stress tested them 1000 times to verify the fix Differential Revision: [D21506940](https://our.internmc.facebook.com/intern/diff/D21506940/) [ghstack-poisoned]
Pull Request resolved: #38352 Fixes the RPC profiling by using the `then()` API added in #37311. Instead of adding a regular callback, we return a new future that completes when the profiling callback is finished. This is transparent to the user as the future still completes with the value of the original future (i.e. the RPC's return value) To make this work for RRef, we add a `_set_profiling_future` to set the profiling future, and `_get_profiling_future` to retrieve this future and wait on it in the tests. Re-enabled profiling tests and stress tested them 1000 times to verify the fix ghstack-source-id: 103983281 Differential Revision: [D21506940](https://our.internmc.facebook.com/intern/diff/D21506940/)
Fixes the RPC profiling by using the `then()` API added in #37311. Instead of adding a regular callback, we return a new future that completes when the profiling callback is finished. This is transparent to the user as the future still completes with the value of the original future (i.e. the RPC's return value) To make this work for RRef, we add a `_set_profiling_future` to set the profiling future, and `_get_profiling_future` to retrieve this future and wait on it in the tests. Re-enabled profiling tests and stress tested them 1000 times to verify the fix Differential Revision: [D21506940](https://our.internmc.facebook.com/intern/diff/D21506940/) [ghstack-poisoned]
Pull Request resolved: #38352 Fixes the RPC profiling by using the `then()` API added in #37311. Instead of adding a regular callback, we return a new future that completes when the profiling callback is finished. This is transparent to the user as the future still completes with the value of the original future (i.e. the RPC's return value) To make this work for RRef, we add a `_set_profiling_future` to set the profiling future, and `_get_profiling_future` to retrieve this future and wait on it in the tests. Re-enabled profiling tests and stress tested them 1000 times to verify the fix ghstack-source-id: 104042479 Differential Revision: [D21506940](https://our.internmc.facebook.com/intern/diff/D21506940/)
mrshenli
left a comment
There was a problem hiding this comment.
Thanks for fixing! I added some minor comments.
| auto& rec = getRecordFunctionFromTensor(handle); | ||
| rec._end(); | ||
| }); | ||
| return fut->constValue(); |
There was a problem hiding this comment.
Let's add a comment to explain why we need to forward the return value here. IIUC, it is because _invoke_rpc actually returns this future?
There was a problem hiding this comment.
Yeah, we get the future which has the actual result from _invoke_rpc. Then, if profiling we run this, and we want to ensure that the correct value of the future from _invoke_rpc is propagated. Will add this to the comment
| _call_end_callbacks_on_fut(tensor, fut); | ||
| auto profiledFut = _call_end_callbacks_on_fut(tensor, fut); | ||
| // return future that completes when profiling callbacks have run. | ||
| jit::push(stack, profiledFut); |
There was a problem hiding this comment.
IIUC, push takes rvalue reference. If so, shall we use std::move() on the profilerFut?
| .def( | ||
| "_get_profiling_future", | ||
| [](const PyRRef& self) { | ||
| return std::make_shared<jit::PythonFutureWrapper>( | ||
| self.getProfilingFuture()); | ||
| }, | ||
| py::call_guard<py::gil_scoped_release>(), | ||
| R"( | ||
| Returns future that completes when the profiling event corresponding | ||
| to the creation of this RRef on the remote node has been recorded. | ||
| )") | ||
| .def( | ||
| "_set_profiling_future", | ||
| [](PyRRef& self, | ||
| const std::shared_ptr<jit::PythonFutureWrapper>& | ||
| wrappedFuture) { | ||
| self.setProfilingFuture(wrappedFuture->fut); | ||
| }, | ||
| py::call_guard<py::gil_scoped_release>(), |
There was a problem hiding this comment.
These two seems are very light functions, should I hold GIL instead to avoid ctx switch?
mrshenli
left a comment
There was a problem hiding this comment.
Lint failure is real.
{
path: 'torch/testing/_internal/distributed/rpc/jit/rpc_test.py',
start_line: 2,
end_line: 2,
start_column: 1,
end_column: 1,
annotation_level: 'failure',
message: "[F401] 'unittest' imported but unused"
}
Fixes the RPC profiling by using the `then()` API added in #37311. Instead of adding a regular callback, we return a new future that completes when the profiling callback is finished. This is transparent to the user as the future still completes with the value of the original future (i.e. the RPC's return value) To make this work for RRef, we add a `_set_profiling_future` to set the profiling future, and `_get_profiling_future` to retrieve this future and wait on it in the tests. Re-enabled profiling tests and stress tested them 1000 times to verify the fix Differential Revision: [D21506940](https://our.internmc.facebook.com/intern/diff/D21506940/) [ghstack-poisoned]
Pull Request resolved: #38352 Fixes the RPC profiling by using the `then()` API added in #37311. Instead of adding a regular callback, we return a new future that completes when the profiling callback is finished. This is transparent to the user as the future still completes with the value of the original future (i.e. the RPC's return value) To make this work for RRef, we add a `_set_profiling_future` to set the profiling future, and `_get_profiling_future` to retrieve this future and wait on it in the tests. Re-enabled profiling tests and stress tested them 1000 times to verify the fix ghstack-source-id: 104086114 Differential Revision: [D21506940](https://our.internmc.facebook.com/intern/diff/D21506940/)
|
This pull request has been merged in 4d4895a. |
Summary: Pull Request resolved: pytorch#38352 Fixes the RPC profiling by using the `then()` API added in pytorch#37311. Instead of adding a regular callback, we return a new future that completes when the profiling callback is finished. This is transparent to the user as the future still completes with the value of the original future (i.e. the RPC's return value) To make this work for RRef, we add a `_set_profiling_future` to set the profiling future, and `_get_profiling_future` to retrieve this future and wait on it in the tests. Re-enabled profiling tests and stress tested them 1000 times to verify the fix ghstack-source-id: 104086114 Test Plan: Re-enabled profiling tests Differential Revision: D21506940 fbshipit-source-id: 35cde22f0551c825c9bc98ddc24cca412878a63a
Stack from ghstack:
Fixes the RPC profiling by using the
then()API added in #37311. Instead of adding a regular callback, we return a new future that completes when the profiling callback is finished. This is transparent to the user as the future still completes with the value of the original future (i.e. the RPC's return value)To make this work for RRef, we add a
_set_profiling_futureto set the profiling future, and_get_profiling_futureto retrieve this future and wait on it in the tests.Re-enabled profiling tests and stress tested them 1000 times to verify the fix
Differential Revision: D21506940