Add InferenceMode TLS to ThreadLocalState. #55822

Closed
ailzhang wants to merge 1 commit into gh/ailzhang/63/base from gh/ailzhang/63/head

@ailzhang ailzhang commented Apr 12, 2021

Stack from ghstack:

Differential Revision: D27721285

@albanD and @ezyang suggested this change in the first PR. It was no longer applicable at the time, since that PR used the original raw_local_dispatch_key_set TLS. I then forgot about it when I added a new TLS to reduce instruction counts. Thanks @bhosmer for catching this!
@linbinyu has helped confirm that this unblocks the issue described in https://fb.quip.com/DlSrAmTW4Wdf. Thanks!
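
For readers unfamiliar with why a new TLS needs to be captured, here is a rough pure-Python model of the pattern. It is not PyTorch's C++ implementation: `ThreadLocalSnapshot`, `set_inference_mode`, `inference_mode_enabled`, and `run_in_worker` are hypothetical names made up for this sketch. The idea is that a per-thread flag is invisible to a worker thread unless a snapshot taken on the calling thread is restored there.

```python
import threading

# Hypothetical stand-in for a per-thread flag such as the InferenceMode TLS.
_tls = threading.local()


def inference_mode_enabled():
    """Return the flag for the current thread (defaults to False)."""
    return getattr(_tls, "inference_mode", False)


def set_inference_mode(enabled):
    """Set the flag for the current thread only."""
    _tls.inference_mode = enabled


class ThreadLocalSnapshot:
    """Hypothetical analogue of a thread-local-state snapshot.

    It records the calling thread's flag at construction time so that
    another thread can restore it later.
    """

    def __init__(self):
        self.inference_mode = inference_mode_enabled()

    def restore(self):
        set_inference_mode(self.inference_mode)


def run_in_worker(fn, snapshot=None):
    """Run fn on a fresh thread, optionally restoring a snapshot first."""
    result = []

    def target():
        if snapshot is not None:
            snapshot.restore()
        result.append(fn())

    worker = threading.Thread(target=target)
    worker.start()
    worker.join()
    return result[0]


def demo():
    """Return (flag seen without snapshot, flag seen with snapshot)."""
    set_inference_mode(True)
    try:
        without = run_in_worker(inference_mode_enabled)
        with_snapshot = run_in_worker(inference_mode_enabled, ThreadLocalSnapshot())
    finally:
        set_inference_mode(False)
    return without, with_snapshot
```

In this model, a flag that the snapshot does not capture is silently reset to its default on the worker thread, which is the kind of gap this PR closes for the InferenceMode TLS.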

ailzhang pushed a commit that referenced this pull request Apr 12, 2021
ghstack-source-id: 3e5b3b6
Pull Request resolved: #55822

facebook-github-bot commented Apr 12, 2021

💊 CI failures summary and remediations

As of commit 727a8ca (more details on the Dr. CI page):


None of the CI failures appear to be your fault 💚



❄️ 1 failure tentatively classified as flaky

but reruns have not yet been triggered to confirm:

See CircleCI build pytorch_linux_bionic_py3_8_gcc9_coverage_test1 (1/1)

Step: "Run tests" ❄️

Apr 12 23:42:23 RuntimeError: Process 0 terminated or timed out after 100.07652425765991 seconds
Apr 12 23:42:23 ======================================================================
Apr 12 23:42:23 ERROR [100.141s]: test_py_tensors_multi_async_call (__main__.TensorPipeRpcTestWithSpawn)
Apr 12 23:42:23 ----------------------------------------------------------------------
Apr 12 23:42:23 Traceback (most recent call last):
Apr 12 23:42:23   File "/opt/conda/lib/python3.8/site-packages/torch/testing/_internal/common_distributed.py", line 322, in wrapper
Apr 12 23:42:23     self._join_processes(fn)
Apr 12 23:42:23   File "/opt/conda/lib/python3.8/site-packages/torch/testing/_internal/common_distributed.py", line 515, in _join_processes
Apr 12 23:42:23     self._check_return_codes(elapsed_time)
Apr 12 23:42:23   File "/opt/conda/lib/python3.8/site-packages/torch/testing/_internal/common_distributed.py", line 563, in _check_return_codes
Apr 12 23:42:23     raise RuntimeError('Process {} terminated or timed out after {} seconds'.format(i, elapsed_time))
Apr 12 23:42:23 RuntimeError: Process 0 terminated or timed out after 100.07652425765991 seconds
Apr 12 23:42:23 
Apr 12 23:42:23 ----------------------------------------------------------------------
Apr 12 23:42:23 Ran 356 tests in 1293.837s
Apr 12 23:42:23 
Apr 12 23:42:23 FAILED (errors=1, skipped=6)
Apr 12 23:42:23 
Apr 12 23:42:23 Generating XML reports...
Apr 12 23:42:23 Generated XML report: test-reports/python-unittest/distributed.rpc.test_tensorpipe_agent/TEST-TensorPipeDdpComparisonTestWithSpawn-20210412232049.xml
Apr 12 23:42:23 Generated XML report: test-reports/python-unittest/distributed.rpc.test_tensorpipe_agent/TEST-TensorPipeDdpUnderDistAutogradTestWithSpawn-20210412232049.xml
Apr 12 23:42:23 Generated XML report: test-reports/python-unittest/distributed.rpc.test_tensorpipe_agent/TEST-TensorPipeDistAutogradTestWithSpawn-20210412232049.xml

This comment was automatically generated by Dr. CI.

@bhosmer bhosmer left a comment

🤘 😁

@ailzhang merged this pull request in da01f43.

@facebook-github-bot facebook-github-bot deleted the gh/ailzhang/63/head branch April 16, 2021 14:16
krshrimali pushed a commit to krshrimali/pytorch that referenced this pull request May 19, 2021
Summary: Pull Request resolved: pytorch#55822

Test Plan: Imported from OSS

Reviewed By: bhosmer

Differential Revision: D27721285

Pulled By: ailzhang

fbshipit-source-id: c978927f8cb3a91de45635b8279e166a3d5652ab