Skip to content

Enable TensorPipe's SHM transport#50760

Closed
lw wants to merge 5 commits intogh/lw/110/basefrom
gh/lw/110/head
Closed

Enable TensorPipe's SHM transport#50760
lw wants to merge 5 commits intogh/lw/110/basefrom
gh/lw/110/head

Conversation

@lw
Copy link
Copy Markdown
Contributor

@lw lw commented Jan 19, 2021

Stack from ghstack:

The SHM transport uses shared-memory-backed ringbuffers to transfer small payloads between processes on the same machine.

It was disabled in v1.6 due to a CMake mishap but we've since realized that it also doesn't work that well in docker and other setups. Enabling it here to see whether CircleCI fails.

Differential Revision: D23814828

The SHM transport uses shared-memory-backed ringbuffers to transfer small payloads between processes on the same machine.

It was disabled in v1.6 due to a CMake mishap but we've since realized that it also doesn't work that well in docker and other setups. Enabling it here to see whether CircleCI fails.

Differential Revision: [D23814828](https://our.internmc.facebook.com/intern/diff/D23814828/)

[ghstack-poisoned]
lw added a commit that referenced this pull request Jan 19, 2021
The SHM transport uses shared-memory-backed ringbuffers to transfer small payloads between processes on the same machine.

It was disabled in v1.6 due to a CMake mishap but we've since realized that it also doesn't work that well in docker and other setups. Enabling it here to see whether CircleCI fails.

Differential Revision: [D23814828](https://our.internmc.facebook.com/intern/diff/D23814828/)

ghstack-source-id: 119985580
Pull Request resolved: #50760
The SHM transport uses shared-memory-backed ringbuffers to transfer small payloads between processes on the same machine.

It was disabled in v1.6 due to a CMake mishap but we've since realized that it also doesn't work that well in docker and other setups. Enabling it here to see whether CircleCI fails.

Differential Revision: [D23814828](https://our.internmc.facebook.com/intern/diff/D23814828/)

[ghstack-poisoned]
lw added a commit that referenced this pull request Jan 20, 2021
Pull Request resolved: #50760

The SHM transport uses shared-memory-backed ringbuffers to transfer small payloads between processes on the same machine.

It was disabled in v1.6 due to a CMake mishap but we've since realized that it also doesn't work that well in docker and other setups. Enabling it here to see whether CircleCI fails.
ghstack-source-id: 120032384

Differential Revision: [D23814828](https://our.internmc.facebook.com/intern/diff/D23814828/)
The SHM transport uses shared-memory-backed ringbuffers to transfer small payloads between processes on the same machine.

It was disabled in v1.6 due to a CMake mishap but we've since realized that it also doesn't work that well in docker and other setups. Enabling it here to see whether CircleCI fails.

Differential Revision: [D23814828](https://our.internmc.facebook.com/intern/diff/D23814828/)

[ghstack-poisoned]
lw added a commit that referenced this pull request Jan 25, 2021
Pull Request resolved: #50760

The SHM transport uses shared-memory-backed ringbuffers to transfer small payloads between processes on the same machine.

It was disabled in v1.6 due to a CMake mishap but we've since realized that it also doesn't work that well in docker and other setups. Enabling it here to see whether CircleCI fails.
ghstack-source-id: 120286786

Differential Revision: [D23814828](https://our.internmc.facebook.com/intern/diff/D23814828/)
Copy link
Copy Markdown
Contributor

@mrshenli mrshenli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to add tests for this? E.g., by disabling higher priority channels?

@lw
Copy link
Copy Markdown
Contributor Author

lw commented Jan 25, 2021

That's a good question, and a point I don't yet have a good idea about. It would be nice if we could run the entire PyTorch RPC suite with every transport and/or channel from TensorPipe (it would give us lots of coverage) but I don't think it's feasible to do so in the CI, since each new "configuration" of the TP agent would add ~25-30 minutes to the CI (for each pull request, by every developer).

In an ideal world we wouldn't need to do that, however, if we had a comprehensive and reliable per-backend test suite in TensorPipe's CI. Unfortunately we're not there yet. (And I don't think we'll ever be able to guarantee that it covers all the patterns and scenarios that could happen in PyTorch's RPC suite).

Hence... I don't know. Do you have any suggestions?

@lw
Copy link
Copy Markdown
Contributor Author

lw commented Jan 25, 2021

BTW, I had re-pushed this PR today in order to finally debug the failure I had consistently seen last time, but now it seems to be gone. I didn't do anything to help fix it, so I'm a bit worried that it's just a lucky run and it will come out again... I'll maybe try to rerun some tests to see if that changes anything.

The SHM transport uses shared-memory-backed ringbuffers to transfer small payloads between processes on the same machine.

It was disabled in v1.6 due to a CMake mishap but we've since realized that it also doesn't work that well in docker and other setups. Enabling it here to see whether CircleCI fails.

Differential Revision: [D23814828](https://our.internmc.facebook.com/intern/diff/D23814828/)

[ghstack-poisoned]
lw added a commit that referenced this pull request Jan 26, 2021
Pull Request resolved: #50760

The SHM transport uses shared-memory-backed ringbuffers to transfer small payloads between processes on the same machine.

It was disabled in v1.6 due to a CMake mishap but we've since realized that it also doesn't work that well in docker and other setups. Enabling it here to see whether CircleCI fails.
ghstack-source-id: 120386778

Differential Revision: [D23814828](https://our.internmc.facebook.com/intern/diff/D23814828/)
@codecov
Copy link
Copy Markdown

codecov Bot commented Jan 26, 2021

Codecov Report

Merging #50760 (9a68d87) into gh/lw/110/base (1935880) will decrease coverage by 0.00%.
The diff coverage is n/a.

@@                Coverage Diff                 @@
##           gh/lw/110/base   #50760      +/-   ##
==================================================
- Coverage           80.88%   80.88%   -0.01%     
==================================================
  Files                1931     1931              
  Lines              210560   210568       +8     
==================================================
- Hits               170312   170309       -3     
- Misses              40248    40259      +11     

The SHM transport uses shared-memory-backed ringbuffers to transfer small payloads between processes on the same machine.

It was disabled in v1.6 due to a CMake mishap but we've since realized that it also doesn't work that well in docker and other setups. Enabling it here to see whether CircleCI fails.

Differential Revision: [D23814828](https://our.internmc.facebook.com/intern/diff/D23814828/)

[ghstack-poisoned]
lw added a commit that referenced this pull request Jan 27, 2021
Pull Request resolved: #50760

The SHM transport uses shared-memory-backed ringbuffers to transfer small payloads between processes on the same machine.

It was disabled in v1.6 due to a CMake mishap but we've since realized that it also doesn't work that well in docker and other setups. Enabling it here to see whether CircleCI fails.
ghstack-source-id: 120470890

Differential Revision: [D23814828](https://our.internmc.facebook.com/intern/diff/D23814828/)
@mrshenli
Copy link
Copy Markdown
Contributor

Hence... I don't know. Do you have any suggestions?

One temporary solution might be adding tests to SLOW_TESTS for now, it won't block most PRs and only runs on master.

Copy link
Copy Markdown
Contributor

@mrshenli mrshenli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you wanna launch a ci-all for this PR?

@facebook-github-bot
Copy link
Copy Markdown
Contributor

This pull request has been merged in b77f72b.

@facebook-github-bot facebook-github-bot deleted the gh/lw/110/head branch January 31, 2021 15:18
laurentdupin pushed a commit to laurentdupin/pytorch that referenced this pull request Apr 24, 2026
Summary:
Pull Request resolved: pytorch#50760

The SHM transport uses shared-memory-backed ringbuffers to transfer small payloads between processes on the same machine.

It was disabled in v1.6 due to a CMake mishap but we've since realized that it also doesn't work that well in docker and other setups. Enabling it here to see whether CircleCI fails.
ghstack-source-id: 120470890

Test Plan: Exported three times to CircleCI with tests consistently passing

Reviewed By: mrshenli

Differential Revision: D23814828

fbshipit-source-id: f355cb6515776debad536924de4f4d3fbb05a874
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants