Fix prefix store timeout bug #53928

Closed
H-Huang wants to merge 2 commits into gh/H-Huang/12/base from gh/H-Huang/12/head

@H-Huang H-Huang commented Mar 12, 2021

Stack from ghstack:

HashStoreTest was taking a very long time to run. The cause is that a default timeout is set when creating Store(), and setTimeout for prefixStore does not actually change the timeout of the underlying store.

Removing the default timeout and updating setTimeout saves ~10 minutes for all of the gcc_test CI runs.
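
To illustrate the kind of bug described above, here is a minimal toy sketch in Python. It is not the actual c10d code: the class names `Store`, `BuggyPrefixStore`, and `FixedPrefixStore`, the `set_timeout` method, and the 300-second default are all assumptions made for illustration. The sketch only shows how a wrapper that keeps its own timeout field can leave the wrapped store's timeout unchanged, and how forwarding the call avoids that.

```python
from datetime import timedelta


class Store:
    """Toy base store holding a timeout for blocking operations (hypothetical)."""

    def __init__(self, timeout=timedelta(seconds=300)):  # assumed default
        self.timeout = timeout

    def set_timeout(self, timeout):
        self.timeout = timeout


class BuggyPrefixStore(Store):
    """Wrapper that inherits set_timeout, so only its own field is updated."""

    def __init__(self, prefix, store):
        super().__init__()
        self.prefix = prefix
        self.store = store  # the underlying store never sees timeout changes


class FixedPrefixStore(BuggyPrefixStore):
    """Wrapper that forwards the new timeout to the underlying store."""

    def set_timeout(self, timeout):
        super().set_timeout(timeout)
        self.store.set_timeout(timeout)


short = timedelta(seconds=1)

base = Store()
BuggyPrefixStore("test", base).set_timeout(short)
print(base.timeout == short)  # False: the underlying store keeps its default

base = Store()
FixedPrefixStore("test", base).set_timeout(short)
print(base.timeout == short)  # True: the call was forwarded
```

In the buggy variant, any blocking operation on the underlying store would still wait for the long default, which is consistent with a test that expects a quick timeout running slowly.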

Differential Revision: D27025275

[ghstack-poisoned]
facebook-github-bot added the "oncall: distributed" and "cla signed" labels Mar 12, 2021
H-Huang added a commit that referenced this pull request Mar 12, 2021
ghstack-source-id: 77d2dc9
Pull Request resolved: #53928

facebook-github-bot commented Mar 12, 2021

💊 CI failures summary and remediations

As of commit 9c5ccc7 (more details on the Dr. CI page):


  • 1/3 failures possibly* introduced in this PR
    • 1/1 non-scanned failure(s)
  • 2/3 broken upstream at merge base 90dfdef on Mar 12 from 2:49am to 4:28pm

🚧 2 fixed upstream failures:

These were probably caused by upstream breakages that were already fixed.

Please rebase on the viable/strict branch.

If your commit is older than viable/strict, run these commands:

git fetch https://github.com/pytorch/pytorch viable/strict
git rebase FETCH_HEAD

This comment was automatically generated by Dr. CI. Follow this link to opt-out of these comments for your Pull Requests.

Please report bugs/suggestions to the (internal) Dr. CI Users group.

H-Huang added a commit that referenced this pull request Mar 13, 2021
ghstack-source-id: 6b4ac67
Pull Request resolved: #53928

@mrshenli mrshenli left a comment

LGTM! Thanks for fixing!

@mrshenli

The test failures are unrelated to this PR:

Mar 13 02:00:06 .0 = <zip object at 0x7f901f94be88>
Mar 13 02:00:06 
Mar 13 02:00:06 >   [np.testing.assert_allclose(out, ort_out, rtol=rtol, atol=atol) for out, ort_out in zip(outputs, ort_outs)]
Mar 13 02:00:06 E   AssertionError: 
Mar 13 02:00:06 E   Not equal to tolerance rtol=0.001, atol=1e-07
Mar 13 02:00:06 E   
Mar 13 02:00:06 E   Mismatched elements: 6 / 12 (50%)
Mar 13 02:00:06 E   Max absolute difference: 0.89644474
Mar 13 02:00:06 E   Max relative difference: 0.
Mar 13 02:00:06 E    x: array([[0.896445, 0.455628, 0.632306],
Mar 13 02:00:06 E          [0.168859, 0.293888, 0.518522],
Mar 13 02:00:06 E          [0.896445, 0.455628, 0.632306],
Mar 13 02:00:06 E          [0.307423, 0.634079, 0.490093]], dtype=float32)
Mar 13 02:00:06 E    y: array([[0.      , 0.      , 0.      ],
Mar 13 02:00:06 E          [0.168859, 0.293888, 0.518522],
Mar 13 02:00:06 E          [0.      , 0.      , 0.      ],
Mar 13 02:00:06 E          [0.307423, 0.634079, 0.490093]], dtype=float32)
Mar 13 02:00:06 
Mar 13 02:00:06 test/onnx/test_pytorch_onnx_onnxruntime.py:92: AssertionError
Mar 13 02:00:06 =============================== warnings summary ===============================
Mar 13 02:00:06 ../../../../opt/conda/lib/python3.6/site-packages/torch/utils/cpp_extension.py:3
Mar 13 02:00:06   /opt/conda/lib/python3.6/site-packages/torch/utils/cpp_extension.py:3: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
Mar 13 02:00:06     import imp

@facebook-github-bot

@H-Huang merged this pull request in 7f88840.

@facebook-github-bot facebook-github-bot deleted the gh/H-Huang/12/head branch March 19, 2021 14:16
laurentdupin pushed a commit to laurentdupin/pytorch that referenced this pull request Apr 24, 2026
Summary:
Pull Request resolved: pytorch#53928

HashStoreTest was taking forever to run. Turns out it was because a default timeout is set when creating Store() and setTimeout for prefixStore is not actually able to change the timeout of the underlying store.

After removing the default timeout and updating setTimeout, this will save ~10 minutes for all of the gcc_test CI runs.

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D27025275

Pulled By: H-Huang

fbshipit-source-id: 650c8c1eb8b166da1d412ed88e765747a2ca2069