Skip to content

Add test case for init_rpc_backend#26997

Closed
xush6528 wants to merge 14 commits intogh/xush6528/2/basefrom
gh/xush6528/2/head
Closed

Add test case for init_rpc_backend#26997
xush6528 wants to merge 14 commits intogh/xush6528/2/basefrom
gh/xush6528/2/head

Conversation

@xush6528
Copy link
Copy Markdown
Contributor

@xush6528 xush6528 commented Sep 27, 2019

Stack from ghstack:

Differential Revision: D17637468

Reverting accidental change in #26919

Differential Revision: [D17637468](https://our.internmc.facebook.com/intern/diff/D17637468/)

[ghstack-poisoned]
@pytorchbot pytorchbot added the oncall: distributed Add this issue/PR to distributed oncall triage queue label Sep 27, 2019
@xush6528 xush6528 requested a review from satgera September 27, 2019 21:56
Comment thread torch/distributed/rpc.py
elif is_rpc_backend_registered(backend):
_agent = init_rpc_backend(
backend,
self_rank=self_rank,
Copy link
Copy Markdown
Contributor

@satgera satgera Sep 27, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a test case in test_rpc.py where you register a backend and call init ?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@satgera working on it. It requires me to patch init_rref_context

Copy link
Copy Markdown
Contributor

@mrshenli mrshenli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would I be correct if I assume this is not needed for now as init_rpc_backend takes *args and **args, and the current registered backend we have does not really use self_rank? So this is only to make sure we do pass all args from _init_rpc to init_rpc_backend?

Reverting accidental change in #26919

Differential Revision: [D17637468](https://our.internmc.facebook.com/intern/diff/D17637468/)

[ghstack-poisoned]
xush6528 added a commit that referenced this pull request Sep 27, 2019
Pull Request resolved: #26997

Reverting accidental change in #26919
ghstack-source-id: 90955970

Differential Revision: [D17637468](https://our.internmc.facebook.com/intern/diff/D17637468/)
@xush6528
Copy link
Copy Markdown
Contributor Author

@mrshenli open-source tests are still good.

I broke fb internal init_rpc_backend_handler.
I will make the fb-internal init_rpc_backend_handler does not take self_rank in a following PR.

Reverting accidental change in #26919

Differential Revision: [D17637468](https://our.internmc.facebook.com/intern/diff/D17637468/)

[ghstack-poisoned]
Reverting accidental change in #26919

Differential Revision: [D17637468](https://our.internmc.facebook.com/intern/diff/D17637468/)

[ghstack-poisoned]
Reverting accidental change in #26919

Differential Revision: [D17637468](https://our.internmc.facebook.com/intern/diff/D17637468/)

[ghstack-poisoned]
xush6528 added a commit that referenced this pull request Sep 29, 2019
Pull Request resolved: #26997

Reverting accidental change in #26919
ghstack-source-id: 90998862

Differential Revision: [D17637468](https://our.internmc.facebook.com/intern/diff/D17637468/)
Reverting accidental change in #26919

Differential Revision: [D17637468](https://our.internmc.facebook.com/intern/diff/D17637468/)

[ghstack-poisoned]
Reverting accidental change in #26919

Differential Revision: [D17637468](https://our.internmc.facebook.com/intern/diff/D17637468/)

[ghstack-poisoned]
@pytorchbot pytorchbot added the module: tests Issues related to tests (not the torch.testing module) label Sep 29, 2019
@xush6528
Copy link
Copy Markdown
Contributor Author

disabled rpc test in py2.

Reverting accidental change in #26919

Differential Revision: [D17637468](https://our.internmc.facebook.com/intern/diff/D17637468/)

[ghstack-poisoned]
xush6528 added a commit that referenced this pull request Sep 29, 2019
Pull Request resolved: #26997

Reverting accidental change in #26919
ghstack-source-id: 91003067

Differential Revision: [D17637468](https://our.internmc.facebook.com/intern/diff/D17637468/)
Reverting accidental change in #26919

Differential Revision: [D17637468](https://our.internmc.facebook.com/intern/diff/D17637468/)

[ghstack-poisoned]
xush6528 added a commit that referenced this pull request Sep 30, 2019
Pull Request resolved: #26997

Reverting accidental change in #26919
ghstack-source-id: 91033024

Differential Revision: [D17637468](https://our.internmc.facebook.com/intern/diff/D17637468/)
@xush6528
Copy link
Copy Markdown
Contributor Author

Checked. failed tests are test_sparse_allreduce_checks, test_cpp_cuda. not related to this PR.

Reverting accidental change in #26919

Differential Revision: [D17637468](https://our.internmc.facebook.com/intern/diff/D17637468/)

[ghstack-poisoned]
xush6528 added a commit that referenced this pull request Oct 1, 2019
Pull Request resolved: #26997

Reverting accidental change in #26919
ghstack-source-id: 91071519

Differential Revision: [D17637468](https://our.internmc.facebook.com/intern/diff/D17637468/)
Reverting accidental change in #26919

Differential Revision: [D17637468](https://our.internmc.facebook.com/intern/diff/D17637468/)

[ghstack-poisoned]
Reverting accidental change in #26919

Differential Revision: [D17637468](https://our.internmc.facebook.com/intern/diff/D17637468/)

[ghstack-poisoned]
xush6528 added a commit that referenced this pull request Oct 1, 2019
Pull Request resolved: #26997

Reverting accidental change in #26919
ghstack-source-id: 91086987

Differential Revision: [D17637468](https://our.internmc.facebook.com/intern/diff/D17637468/)
Reverting accidental change in #26919

Differential Revision: [D17637468](https://our.internmc.facebook.com/intern/diff/D17637468/)

[ghstack-poisoned]
Comment thread test/rpc_test.py

from torch.distributed.rpc import RpcBackend
from common_utils import load_tests
from torch.distributed.rpc_api import RpcBackend
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't seen this module before, is it new? If so, why was it added?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nevermind, I see it is added here. Why the rename?

Copy link
Copy Markdown
Contributor Author

@xush6528 xush6528 Oct 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"rpc" module was shadowed by "rpc" method imported to "torch.distributed".
This caused the patch replacing the wrong target.
Fixed it by renaming the module to "rpc_api".

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, thanks. I believe @rohan-varma is working on a PR to rename those to rpc_sync and rpc_async, right? Then this will no longer be necessary.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pietern @xush6528 , that change is landed in #26570. But it doesn't remove the rpc function, it instead adds a warning to callers who use it and encourage them to call rpc_sync or rpc_async. We eventually want to entirely remove the rpc function as you mentioned, but wasn't sure if we could do this yet due to backwards compatibility issues, so after a discussion with @mrshenli (comments on the PR) we decided to go with this approach.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rohan-varma When we do, can you make sure to revert the rename? If it's an internal only module, it shouldn't be named _api, but rather use an underscore prefix, like all other private API like things in PyTorch.

Copy link
Copy Markdown
Contributor

@rohan-varma rohan-varma Oct 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pietern, the rpc_api.py module is part of our external-facing API though, right? It contains things like get_worker_id, remote, rpc_sync, etc. Should we define __all__ to make it clear what our external facing APIs are?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel ideally we want to put all external api names in __init__.py and has a module implementing these api names.

It's a mess at the moment.
In __init__.py defines def init_model_parallel
In rpc_api.py, other API is defined (as Rohan listed).

Copy link
Copy Markdown
Contributor

@rohan-varma rohan-varma Oct 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this is being further tracked in #27229

Reverting accidental change in #26919

Differential Revision: [D17637468](https://our.internmc.facebook.com/intern/diff/D17637468/)

[ghstack-poisoned]
xush6528 added a commit that referenced this pull request Oct 1, 2019
Pull Request resolved: #26997

Reverting accidental change in #26919
ghstack-source-id: 91126906

Differential Revision: [D17637468](https://our.internmc.facebook.com/intern/diff/D17637468/)
@facebook-github-bot
Copy link
Copy Markdown
Contributor

This pull request has been merged in 00e5882.

@xush6528 xush6528 changed the title Fix init_rpc_backend callsite Add test case for init_rpc_backend Oct 2, 2019
@facebook-github-bot facebook-github-bot deleted the gh/xush6528/2/head branch October 28, 2019 22:22
pdlive215 pushed a commit to pdlive215/pytorch that referenced this pull request Nov 27, 2019
Summary:
Pull Request resolved: pytorch#26997

Reverting accidental change in pytorch#26919
ghstack-source-id: 91126906

Reviewed By: zhaojuanmao

Differential Revision: D17637468

fbshipit-source-id: 9ffcf4b15b37effe6b5d5f82338ff89298c82a52
thiagocrepaldi pushed a commit to thiagocrepaldi/pytorch that referenced this pull request Feb 4, 2020
Summary:
Pull Request resolved: pytorch#26997

Reverting accidental change in pytorch#26919
ghstack-source-id: 91126906

Reviewed By: zhaojuanmao

Differential Revision: D17637468

fbshipit-source-id: 9ffcf4b15b37effe6b5d5f82338ff89298c82a52
laurentdupin pushed a commit to laurentdupin/pytorch that referenced this pull request Apr 24, 2026
Summary:
Pull Request resolved: pytorch#26997

Reverting accidental change in pytorch#26919
ghstack-source-id: 91126906

Reviewed By: zhaojuanmao

Differential Revision: D17637468

fbshipit-source-id: 9ffcf4b15b37effe6b5d5f82338ff89298c82a52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Merged module: tests Issues related to tests (not the torch.testing module) oncall: distributed Add this issue/PR to distributed oncall triage queue

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants