[TensorPipe/RPC] Add TensorPipe dependency#36695
[TensorPipe/RPC] Add TensorPipe dependency#36695beauby wants to merge 6 commits intopytorch:masterfrom
Conversation
💊 Build failures summary and remediationsAs of commit d77bb29 (more details on the Dr. CI page): 💚 💚 Looks good so far! There are no failures yet. 💚 💚 This comment was automatically generated by Dr. CI (expand for details).Follow this link to opt-out of these comments for your Pull Requests.Please report bugs/suggestions on the GitHub issue tracker. This comment has been revised 334 times. |
60bf52c to
2aea1a7
Compare
0e78760 to
5e19985
Compare
69561dc to
6a2ef10
Compare
6a2ef10 to
dd3e498
Compare
|
cc @cdluminate |
ezyang
left a comment
There was a problem hiding this comment.
I'm going to put my foot down and say no, you're not allowed to download cmake as part of the build process for TensorPipe. Here are your options:
- Negotiate with the infra team (e.g., @malfet, @orionr, @seemethere, @kostmo and co) to bump the minimum version of cmake for PyTorch globally to 3.7
- Require cmake 3.7 when building with TensorPipe (and just not allow building TensorPipe otherwise)
- Make TensorPipe work on 3.5 (e.g., if PyTorch has a good reason for supporting 3.5)
|
Pulling a specific version of cmake during the building process is weird. I second the 3rd option provided above, as a user. |
|
@beauby @lw @jiayisuse do we know what's missing in cmake 3.5 that prevents TensorPipe to work with it? |
f53fc0b to
b6a0857
Compare
|
PR (as well as upstream project) have been updated to build tensorpipe with cmake 3.5. |
mrshenli
left a comment
There was a problem hiding this comment.
Are mobile build errors relevant? I saw they are built with USE_DISTRIBUTED=0.
There was a problem hiding this comment.
Question: (not requesting for change) this will build TensorPipe's own libuv? IIRC, Gloo would find libuv using PkgConfig at least on MacOS. Will these two libuv libs cause any problem?
There was a problem hiding this comment.
Question: (not requesting for change) this will build TensorPipe's own libuv?
Yes, this is the case.
IIRC, Gloo would find libuv using PkgConfig at least on MacOS.
This is supported by TensorPipe as well, but requires pkg-config to be available.
Will these two libuv libs cause any problem?
Probably not, but we might as well ensure we use the same one. Two strategies here: either have libuv pre-installed as a requirement for building pytorch, or have pytorch's build explicitly depend on libuv (when USE_DISTRIBUTED=ON) as a submodule.
There was a problem hiding this comment.
At the moment, however, if libuv is installed on the system and can be found through pkg-config, both Gloo and TensorPipe will use that version, right? Could we say that this is enough for this initial version?
There was a problem hiding this comment.
@lw At the moment we force building a custom libuv within tensorpipe when building pytorch, but this can be changed.
facebook-github-bot
left a comment
There was a problem hiding this comment.
@beauby has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
|
A question from Debian packager: will tensorpipe replace gloo? Or they will be used simultaneously for a while? |
|
@cdluminate In the RPC module we're aiming to converge on TensorPipe once it reaches feature parity with Gloo. This may not happen in time for v1.6, so that version will depend on both Gloo and TensorPipe. However, even once that happens, Gloo would keep being used for the collectives in torch.distributed, so the whole of PyTorch will end up depending on both Gloo and TensorPipe. |
Got it. Thanks for the info! |
No description provided.