[distributed][rpc] separate out rpc to sync and async apis#26570
[distributed][rpc] separate out rpc to sync and async apis#26570rohan-varma wants to merge 16 commits intogh/rohan-varma/9/basefrom
Conversation
This diff splits out the async and sync rpc implementations to their own functions. This way, we can have stronger typehinting and make the type of RPC being done more explicit to our users, as opposed to them having to pass in an async flag to the functiohn. Differential Revision: [D17509975](https://our.internmc.facebook.com/intern/diff/D17509975/) [ghstack-poisoned]
…sync apis" Per #24247, this splits out the async and sync rpc implementations to their own functions. This way, we can have stronger type hinting and make the type (async vs sync) of RPC being done more explicit to our users, as opposed to them having to pass in an async flag to the function. Differential Revision: [D17509975](https://our.internmc.facebook.com/intern/diff/D17509975/) [ghstack-poisoned]
…sync apis" Per #24247, this splits out the async and sync rpc implementations to their own functions. This way, we can have stronger type hinting and make the type (async vs sync) of RPC being done more explicit to our users, as opposed to them having to pass in an async flag to the function. Differential Revision: [D17509975](https://our.internmc.facebook.com/intern/diff/D17509975/) [ghstack-poisoned]
Per #24247, this splits out the async and sync rpc implementations to their own functions. This way, we can have stronger type hinting and make the type (async vs sync) of RPC being done more explicit to our users, as opposed to them having to pass in an async flag to the function. Differential Revision: [D17509975](https://our.internmc.facebook.com/intern/diff/D17509975/) [ghstack-poisoned]
Pull Request resolved: #26570 This diff splits out the async and sync rpc implementations to their own functions. This way, we can have stronger typehinting and make the type of RPC being done more explicit to our users, as opposed to them having to pass in an async flag to the functiohn. ghstack-source-id: 90533401 Differential Revision: [D17509975](https://our.internmc.facebook.com/intern/diff/D17509975/)
Per #24247, this splits out the async and sync rpc implementations to their own functions. Previously, the user would call `dist.rpc` and pass in an `async` flag if they wanted to run it asynchronously. This change introduces two new methods: `rpc_sync` and `rpc_async` that return the result or a future containing the result, respectively. The common code is moved to `_invoke_rpc`. This way, we can have stronger type hinting and make the type (async vs sync) of RPC being done more explicit to our users, as opposed to them having to pass in an async flag to the function. Differential Revision: [D17509975](https://our.internmc.facebook.com/intern/diff/D17509975/) [ghstack-poisoned]
Pull Request resolved: #26570 This diff splits out the async and sync rpc implementations to their own functions. This way, we can have stronger typehinting and make the type of RPC being done more explicit to our users, as opposed to them having to pass in an async flag to the functiohn. ghstack-source-id: 90575024 Differential Revision: [D17509975](https://our.internmc.facebook.com/intern/diff/D17509975/)
|
|
||
|
|
||
| @_require_initialized | ||
| def rpc(to, func, args=None, kwargs=None, async_call=False): |
There was a problem hiding this comment.
Hey @rohan-varma, thanks for adding this. IIUC, the original goal was actually removing rpc and splitting it into rpc_sync and rpc_async. The reason for this is that, if rpc does not have a fixed return type, we won't be able to TorchScript it.
There was a problem hiding this comment.
@mrshenli Are we concerned about backwards compatibility here? I'm not completely familiar with the backwards compatibility protocols we follow, but won't completely removing rpc immediately break the code of whoever is using it when they update to the new version?
There was a problem hiding this comment.
Currently test/test_rpc.py and test/test_dist_autograd.py are the only dependency we have I believe, it should be sufficient to modify those accordingly. But I think your solution is more appropriate by marking rpc as deprecated. Let's land this PR and then remove dist.rpc later. :)
|
|
||
|
|
||
| @_require_initialized | ||
| def rpc(to, func, args=None, kwargs=None, async_call=False): |
There was a problem hiding this comment.
Currently test/test_rpc.py and test/test_dist_autograd.py are the only dependency we have I believe, it should be sufficient to modify those accordingly. But I think your solution is more appropriate by marking rpc as deprecated. Let's land this PR and then remove dist.rpc later. :)
| _agent, _to_worker_id(to), serialize(PythonUDF(func, args, kwargs))) | ||
|
|
||
| warnings.warn( | ||
| """dist.rpc is deprecated. Use dist.rpc_async for asynchronous |
There was a problem hiding this comment.
nit: shall we modify test_rpc.py and test_dist_autograd in this PR as well? Otherwise, this warning message will show up in tests?
There was a problem hiding this comment.
Yep, sounds good
Per #24247, this splits out the async and sync rpc implementations to their own functions. Previously, the user would call `dist.rpc` and pass in an `async` flag if they wanted to run it asynchronously. This change introduces two new methods: `rpc_sync` and `rpc_async` that return the result or a future containing the result, respectively. The common code is moved to `_invoke_rpc`. This way, we can have stronger type hinting and make the type (async vs sync) of RPC being done more explicit to our users, as opposed to them having to pass in an async flag to the function. Differential Revision: [D17509975](https://our.internmc.facebook.com/intern/diff/D17509975/) [ghstack-poisoned]
Per #24247, this splits out the async and sync rpc implementations to their own functions. Previously, the user would call `dist.rpc` and pass in an `async` flag if they wanted to run it asynchronously. This change introduces two new methods: `rpc_sync` and `rpc_async` that return the result or a future containing the result, respectively. The common code is moved to `_invoke_rpc`. This way, we can have stronger type hinting and make the type (async vs sync) of RPC being done more explicit to our users, as opposed to them having to pass in an async flag to the function. Differential Revision: [D17509975](https://our.internmc.facebook.com/intern/diff/D17509975/) [ghstack-poisoned]
Per #24247, this splits out the async and sync rpc implementations to their own functions. Previously, the user would call `dist.rpc` and pass in an `async` flag if they wanted to run it asynchronously. This change introduces two new methods: `rpc_sync` and `rpc_async` that return the result or a future containing the result, respectively. The common code is moved to `_invoke_rpc`. This way, we can have stronger type hinting and make the type (async vs sync) of RPC being done more explicit to our users, as opposed to them having to pass in an async flag to the function. Differential Revision: [D17509975](https://our.internmc.facebook.com/intern/diff/D17509975/) [ghstack-poisoned]
Pull Request resolved: #26570 This diff splits out the async and sync rpc implementations to their own functions. This way, we can have stronger typehinting and make the type of RPC being done more explicit to our users, as opposed to them having to pass in an async flag to the functiohn. ghstack-source-id: 90651940 Differential Revision: [D17509975](https://our.internmc.facebook.com/intern/diff/D17509975/)
|
@pytorchbot retest this please |
Per #24247, this splits out the async and sync rpc implementations to their own functions. Previously, the user would call `dist.rpc` and pass in an `async` flag if they wanted to run it asynchronously. This change introduces two new methods: `rpc_sync` and `rpc_async` that return the result or a future containing the result, respectively. The common code is moved to `_invoke_rpc`. This way, we can have stronger type hinting and make the type (async vs sync) of RPC being done more explicit to our users, as opposed to them having to pass in an async flag to the function. Differential Revision: [D17509975](https://our.internmc.facebook.com/intern/diff/D17509975/) [ghstack-poisoned]
Pull Request resolved: #26570 This diff splits out the async and sync rpc implementations to their own functions. This way, we can have stronger typehinting and make the type of RPC being done more explicit to our users, as opposed to them having to pass in an async flag to the functiohn. ghstack-source-id: 90684557 Differential Revision: [D17509975](https://our.internmc.facebook.com/intern/diff/D17509975/)
Per #24247, this splits out the async and sync rpc implementations to their own functions. Previously, the user would call `dist.rpc` and pass in an `async` flag if they wanted to run it asynchronously. This change introduces two new methods: `rpc_sync` and `rpc_async` that return the result or a future containing the result, respectively. The common code is moved to `_invoke_rpc`. This way, we can have stronger type hinting and make the type (async vs sync) of RPC being done more explicit to our users, as opposed to them having to pass in an async flag to the function. Differential Revision: [D17509975](https://our.internmc.facebook.com/intern/diff/D17509975/) [ghstack-poisoned]
Pull Request resolved: #26570 This diff splits out the async and sync rpc implementations to their own functions. This way, we can have stronger typehinting and make the type of RPC being done more explicit to our users, as opposed to them having to pass in an async flag to the functiohn. ghstack-source-id: 90689755 Differential Revision: [D17509975](https://our.internmc.facebook.com/intern/diff/D17509975/)
|
@pytorchbot retest this please |
Pull Request resolved: #26570 This diff splits out the async and sync rpc implementations to their own functions. This way, we can have stronger typehinting and make the type of RPC being done more explicit to our users, as opposed to them having to pass in an async flag to the functiohn. ghstack-source-id: 90761640 Differential Revision: [D17509975](https://our.internmc.facebook.com/intern/diff/D17509975/)
|
@pytorchbot retest this please |
Per #24247, this splits out the async and sync rpc implementations to their own functions. Previously, the user would call `dist.rpc` and pass in an `async` flag if they wanted to run it asynchronously. This change introduces two new methods: `rpc_sync` and `rpc_async` that return the result or a future containing the result, respectively. The common code is moved to `_invoke_rpc`. This way, we can have stronger type hinting and make the type (async vs sync) of RPC being done more explicit to our users, as opposed to them having to pass in an async flag to the function. Differential Revision: [D17509975](https://our.internmc.facebook.com/intern/diff/D17509975/) [ghstack-poisoned]
Pull Request resolved: #26570 This diff splits out the async and sync rpc implementations to their own functions. This way, we can have stronger typehinting and make the type of RPC being done more explicit to our users, as opposed to them having to pass in an async flag to the functiohn. ghstack-source-id: 90886603 Differential Revision: [D17509975](https://our.internmc.facebook.com/intern/diff/D17509975/)
|
@pytorchbot retest this please |
Per #24247, this splits out the async and sync rpc implementations to their own functions. Previously, the user would call `dist.rpc` and pass in an `async` flag if they wanted to run it asynchronously. This change introduces two new methods: `rpc_sync` and `rpc_async` that return the result or a future containing the result, respectively. The common code is moved to `_invoke_rpc`. This way, we can have stronger type hinting and make the type (async vs sync) of RPC being done more explicit to our users, as opposed to them having to pass in an async flag to the function. Differential Revision: [D17509975](https://our.internmc.facebook.com/intern/diff/D17509975/) [ghstack-poisoned]
Pull Request resolved: #26570 This diff splits out the async and sync rpc implementations to their own functions. This way, we can have stronger typehinting and make the type of RPC being done more explicit to our users, as opposed to them having to pass in an async flag to the functiohn. ghstack-source-id: 90976894 Differential Revision: [D17509975](https://our.internmc.facebook.com/intern/diff/D17509975/)
Per #24247, this splits out the async and sync rpc implementations to their own functions. Previously, the user would call `dist.rpc` and pass in an `async` flag if they wanted to run it asynchronously. This change introduces two new methods: `rpc_sync` and `rpc_async` that return the result or a future containing the result, respectively. The common code is moved to `_invoke_rpc`. This way, we can have stronger type hinting and make the type (async vs sync) of RPC being done more explicit to our users, as opposed to them having to pass in an async flag to the function. Differential Revision: [D17509975](https://our.internmc.facebook.com/intern/diff/D17509975/) [ghstack-poisoned]
Pull Request resolved: #26570 This diff splits out the async and sync rpc implementations to their own functions. This way, we can have stronger typehinting and make the type of RPC being done more explicit to our users, as opposed to them having to pass in an async flag to the functiohn. ghstack-source-id: 90997440 Differential Revision: [D17509975](https://our.internmc.facebook.com/intern/diff/D17509975/)
Per #24247, this splits out the async and sync rpc implementations to their own functions. Previously, the user would call `dist.rpc` and pass in an `async` flag if they wanted to run it asynchronously. This change introduces two new methods: `rpc_sync` and `rpc_async` that return the result or a future containing the result, respectively. The common code is moved to `_invoke_rpc`. This way, we can have stronger type hinting and make the type (async vs sync) of RPC being done more explicit to our users, as opposed to them having to pass in an async flag to the function. Differential Revision: [D17509975](https://our.internmc.facebook.com/intern/diff/D17509975/) [ghstack-poisoned]
Pull Request resolved: #26570 This diff splits out the async and sync rpc implementations to their own functions. This way, we can have stronger typehinting and make the type of RPC being done more explicit to our users, as opposed to them having to pass in an async flag to the functiohn. ghstack-source-id: 91023006 Differential Revision: [D17509975](https://our.internmc.facebook.com/intern/diff/D17509975/)
|
This pull request has been merged in fb8fd6d. |
| import torch.distributed as dist | ||
| from common_distributed import MultiProcessTestCase | ||
| from common_utils import load_tests, run_tests | ||
| from torch.distributed.rpc import RpcBackend |
There was a problem hiding this comment.
This section (or at least this line) should be put after the if check, otherwise the import will fail. This is currently blocking the Windows builds.
Error:
04:57:13 Traceback (most recent call last):
04:57:13 File "test_rpc.py", line 12, in <module>
04:57:13 from torch.distributed.rpc import RpcBackend
04:57:13 File "C:\Jenkins\workspace\pytorch-builds\pytorch-win-ws2016-cuda9-cudnn7-py3-test2\build\win_tmp\build\torch\distributed\rpc.py", line 3, in <module>
04:57:13 from . import invoke_rpc_builtin, invoke_rpc_python_udf, invoke_remote_builtin
04:57:13 ImportError: cannot import name 'invoke_rpc_builtin'There was a problem hiding this comment.
@peterjc123 thanks for pointing this out, I am putting up a diff now.
There was a problem hiding this comment.
…27126) Summary: These imports need to go after the check to not break windows since rpc is not supported on windows. I ran the linter and that moved it to before the check in #26570. This wasn't caught by the windows tests in the PR since it was failing due to a JIT-related reason: (https://ci.pytorch.org/jenkins/job/pytorch-builds/job/pytorch-win-ws2016-cuda9-cudnn7-py3-test2/49433/console). In the future, we should probably have a warning/a comment/some message about discouraging running linters? Thanks to peterjc123 for flagging this. Pull Request resolved: #27126 Differential Revision: D17682761 Pulled By: rohan-varma fbshipit-source-id: 300c74290d734eb8e5880104d2b76dd64217b696
|
Windows build issue fixed in #27126. |
Summary: Pull Request resolved: pytorch#26570 This diff splits out the async and sync rpc implementations to their own functions. This way, we can have stronger typehinting and make the type of RPC being done more explicit to our users, as opposed to them having to pass in an async flag to the functiohn. ghstack-source-id: 91023006 Test Plan: Tests in test/test_rpc.py should all pass. Differential Revision: D17509975 fbshipit-source-id: cc331ec1fdd50c31fa1fae3ba73ddb95d17dad64
…ytorch#27126) Summary: These imports need to go after the check to not break windows since rpc is not supported on windows. I ran the linter and that moved it to before the check in pytorch#26570. This wasn't caught by the windows tests in the PR since it was failing due to a JIT-related reason: (https://ci.pytorch.org/jenkins/job/pytorch-builds/job/pytorch-win-ws2016-cuda9-cudnn7-py3-test2/49433/console). In the future, we should probably have a warning/a comment/some message about discouraging running linters? Thanks to peterjc123 for flagging this. Pull Request resolved: pytorch#27126 Differential Revision: D17682761 Pulled By: rohan-varma fbshipit-source-id: 300c74290d734eb8e5880104d2b76dd64217b696
Summary: Pull Request resolved: pytorch#26570 This diff splits out the async and sync rpc implementations to their own functions. This way, we can have stronger typehinting and make the type of RPC being done more explicit to our users, as opposed to them having to pass in an async flag to the functiohn. ghstack-source-id: 91023006 Test Plan: Tests in test/test_rpc.py should all pass. Differential Revision: D17509975 fbshipit-source-id: cc331ec1fdd50c31fa1fae3ba73ddb95d17dad64
…ytorch#27126) Summary: These imports need to go after the check to not break windows since rpc is not supported on windows. I ran the linter and that moved it to before the check in pytorch#26570. This wasn't caught by the windows tests in the PR since it was failing due to a JIT-related reason: (https://ci.pytorch.org/jenkins/job/pytorch-builds/job/pytorch-win-ws2016-cuda9-cudnn7-py3-test2/49433/console). In the future, we should probably have a warning/a comment/some message about discouraging running linters? Thanks to peterjc123 for flagging this. Pull Request resolved: pytorch#27126 Differential Revision: D17682761 Pulled By: rohan-varma fbshipit-source-id: 300c74290d734eb8e5880104d2b76dd64217b696
Summary: Pull Request resolved: pytorch#26570 This diff splits out the async and sync rpc implementations to their own functions. This way, we can have stronger typehinting and make the type of RPC being done more explicit to our users, as opposed to them having to pass in an async flag to the functiohn. ghstack-source-id: 91023006 Test Plan: Tests in test/test_rpc.py should all pass. Differential Revision: D17509975 fbshipit-source-id: cc331ec1fdd50c31fa1fae3ba73ddb95d17dad64
…ytorch#27126) Summary: These imports need to go after the check to not break windows since rpc is not supported on windows. I ran the linter and that moved it to before the check in pytorch#26570. This wasn't caught by the windows tests in the PR since it was failing due to a JIT-related reason: (https://ci.pytorch.org/jenkins/job/pytorch-builds/job/pytorch-win-ws2016-cuda9-cudnn7-py3-test2/49433/console). In the future, we should probably have a warning/a comment/some message about discouraging running linters? Thanks to peterjc123 for flagging this. Pull Request resolved: pytorch#27126 Differential Revision: D17682761 Pulled By: rohan-varma fbshipit-source-id: 300c74290d734eb8e5880104d2b76dd64217b696
Stack from ghstack:
Per #24247, this splits out the async and sync rpc implementations to their
own functions. Previously, the user would call
dist.rpcand pass in anasyncflag if they wanted to run it asynchronously. This change introduces two new methods:rpc_syncandrpc_asyncthat return the result or a future containing the result, respectively. The common code is moved to_invoke_rpc.This way, we can have stronger type hinting and make the type (async vs sync) of
RPC being done more explicit to our users, as opposed to them having to pass
in an async flag to the function.
Differential Revision: D17509975