Skip to content

[TensorPipe/RPC] Add TensorPipe dependency#36695

Closed
beauby wants to merge 6 commits intopytorch:masterfrom
beauby:tensorpipe-dependency
Closed

[TensorPipe/RPC] Add TensorPipe dependency#36695
beauby wants to merge 6 commits intopytorch:masterfrom
beauby:tensorpipe-dependency

Conversation

@beauby
Copy link
Copy Markdown
Contributor

@beauby beauby commented Apr 16, 2020

No description provided.

@dr-ci
Copy link
Copy Markdown

dr-ci Bot commented Apr 16, 2020

💊 Build failures summary and remediations

As of commit d77bb29 (more details on the Dr. CI page):


💚 💚 Looks good so far! There are no failures yet. 💚 💚


This comment was automatically generated by Dr. CI (expand for details).Follow this link to opt-out of these comments for your Pull Requests.

Please report bugs/suggestions on the GitHub issue tracker.

See how this bot performed.

This comment has been revised 334 times.

@beauby beauby force-pushed the tensorpipe-dependency branch 5 times, most recently from 60bf52c to 2aea1a7 Compare April 17, 2020 17:55
@beauby beauby force-pushed the tensorpipe-dependency branch 23 times, most recently from 0e78760 to 5e19985 Compare April 21, 2020 03:37
@beauby beauby force-pushed the tensorpipe-dependency branch 3 times, most recently from 69561dc to 6a2ef10 Compare April 24, 2020 11:01
@beauby beauby changed the title [WIP][TensorPipe/RPC] Add TensorPipe dependency [TensorPipe/RPC] Add TensorPipe dependency Apr 24, 2020
Comment thread cmake/Dependencies.cmake Outdated
@beauby beauby force-pushed the tensorpipe-dependency branch from 6a2ef10 to dd3e498 Compare April 26, 2020 19:02
@ezyang
Copy link
Copy Markdown
Contributor

ezyang commented Apr 27, 2020

cc @cdluminate

Copy link
Copy Markdown
Contributor

@ezyang ezyang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to put my foot down and say no, you're not allowed to download cmake as part of the build process for TensorPipe. Here are your options:

  1. Negotiate with the infra team (e.g., @malfet, @orionr, @seemethere, @kostmo and co) to bump the minimum version of cmake for PyTorch globally to 3.7
  2. Require cmake 3.7 when building with TensorPipe (and just not allow building TensorPipe otherwise)
  3. Make TensorPipe work on 3.5 (e.g., if PyTorch has a good reason for supporting 3.5)

@cdluminate
Copy link
Copy Markdown
Contributor

Pulling a specific version of cmake during the building process is weird. I second the 3rd option provided above, as a user.

@mrshenli
Copy link
Copy Markdown
Contributor

@beauby @lw @jiayisuse do we know what's missing in cmake 3.5 that prevents TensorPipe to work with it?

@beauby beauby force-pushed the tensorpipe-dependency branch 4 times, most recently from f53fc0b to b6a0857 Compare April 27, 2020 21:22
@beauby
Copy link
Copy Markdown
Contributor Author

beauby commented Apr 28, 2020

PR (as well as upstream project) have been updated to build tensorpipe with cmake 3.5.

Copy link
Copy Markdown
Contributor

@mrshenli mrshenli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are mobile build errors relevant? I saw they are built with USE_DISTRIBUTED=0.

Comment thread cmake/Dependencies.cmake Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: (not requesting for change) this will build TensorPipe's own libuv? IIRC, Gloo would find libuv using PkgConfig at least on MacOS. Will these two libuv libs cause any problem?

Copy link
Copy Markdown
Contributor Author

@beauby beauby Apr 29, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: (not requesting for change) this will build TensorPipe's own libuv?

Yes, this is the case.

IIRC, Gloo would find libuv using PkgConfig at least on MacOS.

This is supported by TensorPipe as well, but requires pkg-config to be available.

Will these two libuv libs cause any problem?

Probably not, but we might as well ensure we use the same one. Two strategies here: either have libuv pre-installed as a requirement for building pytorch, or have pytorch's build explicitly depend on libuv (when USE_DISTRIBUTED=ON) as a submodule.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the moment, however, if libuv is installed on the system and can be found through pkg-config, both Gloo and TensorPipe will use that version, right? Could we say that this is enough for this initial version?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lw At the moment we force building a custom libuv within tensorpipe when building pytorch, but this can be changed.

Copy link
Copy Markdown
Contributor

@mrshenli mrshenli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM! Thanks! The current version is very laconic. I don't have further concerns, but I am not confident if there is any other issues we need to address. @malfet or @ezyang could you please check/stamp this PR?

Comment thread cmake/Dependencies.cmake
Copy link
Copy Markdown
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@beauby has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Copy Markdown
Contributor

@beauby merged this pull request in 8a30553.

@cdluminate
Copy link
Copy Markdown
Contributor

A question from Debian packager: will tensorpipe replace gloo? Or they will be used simultaneously for a while?

@lw
Copy link
Copy Markdown
Contributor

lw commented May 1, 2020

@cdluminate In the RPC module we're aiming to converge on TensorPipe once it reaches feature parity with Gloo. This may not happen in time for v1.6, so that version will depend on both Gloo and TensorPipe. However, even once that happens, Gloo would keep being used for the collectives in torch.distributed, so the whole of PyTorch will end up depending on both Gloo and TensorPipe.

@cdluminate
Copy link
Copy Markdown
Contributor

@cdluminate In the RPC module we're aiming to converge on TensorPipe once it reaches feature parity with Gloo. This may not happen in time for v1.6, so that version will depend on both Gloo and TensorPipe. However, even once that happens, Gloo would keep being used for the collectives in torch.distributed, so the whole of PyTorch will end up depending on both Gloo and TensorPipe.

Got it. Thanks for the info!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants