Enable TensorPipe's SHM transport#50760
Conversation
The SHM transport uses shared-memory-backed ringbuffers to transfer small payloads between processes on the same machine. It was disabled in v1.6 due to a CMake mishap but we've since realized that it also doesn't work that well in docker and other setups. Enabling it here to see whether CircleCI fails. Differential Revision: [D23814828](https://our.internmc.facebook.com/intern/diff/D23814828/) [ghstack-poisoned]
The SHM transport uses shared-memory-backed ringbuffers to transfer small payloads between processes on the same machine. It was disabled in v1.6 due to a CMake mishap but we've since realized that it also doesn't work that well in docker and other setups. Enabling it here to see whether CircleCI fails. Differential Revision: [D23814828](https://our.internmc.facebook.com/intern/diff/D23814828/) ghstack-source-id: 119985580 Pull Request resolved: #50760
The SHM transport uses shared-memory-backed ringbuffers to transfer small payloads between processes on the same machine. It was disabled in v1.6 due to a CMake mishap but we've since realized that it also doesn't work that well in docker and other setups. Enabling it here to see whether CircleCI fails. Differential Revision: [D23814828](https://our.internmc.facebook.com/intern/diff/D23814828/) [ghstack-poisoned]
Pull Request resolved: #50760 The SHM transport uses shared-memory-backed ringbuffers to transfer small payloads between processes on the same machine. It was disabled in v1.6 due to a CMake mishap but we've since realized that it also doesn't work that well in docker and other setups. Enabling it here to see whether CircleCI fails. ghstack-source-id: 120032384 Differential Revision: [D23814828](https://our.internmc.facebook.com/intern/diff/D23814828/)
The SHM transport uses shared-memory-backed ringbuffers to transfer small payloads between processes on the same machine. It was disabled in v1.6 due to a CMake mishap but we've since realized that it also doesn't work that well in docker and other setups. Enabling it here to see whether CircleCI fails. Differential Revision: [D23814828](https://our.internmc.facebook.com/intern/diff/D23814828/) [ghstack-poisoned]
Pull Request resolved: #50760 The SHM transport uses shared-memory-backed ringbuffers to transfer small payloads between processes on the same machine. It was disabled in v1.6 due to a CMake mishap but we've since realized that it also doesn't work that well in docker and other setups. Enabling it here to see whether CircleCI fails. ghstack-source-id: 120286786 Differential Revision: [D23814828](https://our.internmc.facebook.com/intern/diff/D23814828/)
mrshenli
left a comment
There was a problem hiding this comment.
Do we need to add tests for this? E.g., by disabling higher priority channels?
|
That's a good question, and a point I don't yet have a good idea about. It would be nice if we could run the entire PyTorch RPC suite with every transport and/or channel from TensorPipe (it would give us lots of coverage) but I don't think it's feasible to do so in the CI, since each new "configuration" of the TP agent would add ~25-30 minutes to the CI (for each pull request, by every developer). In an ideal world we wouldn't need to do that, however, if we had a comprehensive and reliable per-backend test suite in TensorPipe's CI. Unfortunately we're not there yet. (And I don't think we'll ever be able to guarantee that it covers all the patterns and scenarios that could happen in PyTorch's RPC suite). Hence... I don't know. Do you have any suggestions? |
|
BTW, I had re-pushed this PR today in order to finally debug the failure I had consistently seen last time, but now it seems to be gone. I didn't do anything to help fix it, so I'm a bit worried that it's just a lucky run and it will come out again... I'll maybe try to rerun some tests to see if that changes anything. |
The SHM transport uses shared-memory-backed ringbuffers to transfer small payloads between processes on the same machine. It was disabled in v1.6 due to a CMake mishap but we've since realized that it also doesn't work that well in docker and other setups. Enabling it here to see whether CircleCI fails. Differential Revision: [D23814828](https://our.internmc.facebook.com/intern/diff/D23814828/) [ghstack-poisoned]
Pull Request resolved: #50760 The SHM transport uses shared-memory-backed ringbuffers to transfer small payloads between processes on the same machine. It was disabled in v1.6 due to a CMake mishap but we've since realized that it also doesn't work that well in docker and other setups. Enabling it here to see whether CircleCI fails. ghstack-source-id: 120386778 Differential Revision: [D23814828](https://our.internmc.facebook.com/intern/diff/D23814828/)
Codecov Report
@@ Coverage Diff @@
## gh/lw/110/base #50760 +/- ##
==================================================
- Coverage 80.88% 80.88% -0.01%
==================================================
Files 1931 1931
Lines 210560 210568 +8
==================================================
- Hits 170312 170309 -3
- Misses 40248 40259 +11 |
The SHM transport uses shared-memory-backed ringbuffers to transfer small payloads between processes on the same machine. It was disabled in v1.6 due to a CMake mishap but we've since realized that it also doesn't work that well in docker and other setups. Enabling it here to see whether CircleCI fails. Differential Revision: [D23814828](https://our.internmc.facebook.com/intern/diff/D23814828/) [ghstack-poisoned]
Pull Request resolved: #50760 The SHM transport uses shared-memory-backed ringbuffers to transfer small payloads between processes on the same machine. It was disabled in v1.6 due to a CMake mishap but we've since realized that it also doesn't work that well in docker and other setups. Enabling it here to see whether CircleCI fails. ghstack-source-id: 120470890 Differential Revision: [D23814828](https://our.internmc.facebook.com/intern/diff/D23814828/)
One temporary solution might be adding tests to SLOW_TESTS for now, it won't block most PRs and only runs on master. |
mrshenli
left a comment
There was a problem hiding this comment.
Do you wanna launch a ci-all for this PR?
|
This pull request has been merged in b77f72b. |
Summary: Pull Request resolved: pytorch#50760 The SHM transport uses shared-memory-backed ringbuffers to transfer small payloads between processes on the same machine. It was disabled in v1.6 due to a CMake mishap but we've since realized that it also doesn't work that well in docker and other setups. Enabling it here to see whether CircleCI fails. ghstack-source-id: 120470890 Test Plan: Exported three times to CircleCI with tests consistently passing Reviewed By: mrshenli Differential Revision: D23814828 fbshipit-source-id: f355cb6515776debad536924de4f4d3fbb05a874
Stack from ghstack:
The SHM transport uses shared-memory-backed ringbuffers to transfer small payloads between processes on the same machine.
It was disabled in v1.6 due to a CMake mishap but we've since realized that it also doesn't work that well in docker and other setups. Enabling it here to see whether CircleCI fails.
Differential Revision: D23814828