CUDACachingHostAllocatorImpl skip event query during capture #164001
jeffdaily wants to merge 2 commits into pytorch:main
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/164001
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit de3ff2b with merge base 2f85de0.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
@pytorchbot revert -m "failed internal error with multiple errors found: Not equal to tolerance rtol=0.1, atol=0.1
❌ 🤖 pytorchbot command failed:
@pytorchbot revert -m "failed internal error with multiple errors found: Not equal to tolerance rtol=0.1, atol=0.1.." -c ghfirst |
@pytorchbot successfully started a revert job. Check the current status here.
…164001)" This reverts commit 4cf2900. Reverted #164001 on behalf of https://github.com/yangw-dev due to failed internal error with multiple errors found: Not equal to tolerance rtol=0.1, atol=0.1.. (comment)
@jeffdaily your PR has been successfully reverted.
@yangw-dev Can I get any more information than that? How am I supposed to fix this?
It seems an internal test, run_inference_model_predictions, failed with: Not equal to tolerance rtol=0.1, atol=0.1. Please reach out to PyTorch folks who have internal access for more details.
@atalman @yangw-dev Was the error transient, or was it definitively root-caused to this PR's changes?
ping @atalman @yangw-dev
I suspect this PR will be replaced by #167507.
Looks like this PR hasn't been updated in a while, so we're going to go ahead and mark this as
The CUDACachingAllocator already does this, so there is precedent.
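In CUDACachingAllocator, the comparable step (process_events) is skipped while a capture may be underway, because it issues cudaEventQuery calls that are illegal during CUDA graph capture; reclaiming those blocks is deferred until a later call. The following is a minimal C++ model of that pattern, not the PR's actual change. It has no CUDA dependency: FakeEvent, PendingBlock, HostAllocatorModel and the capture_underway hook are all hypothetical stand-ins, and real code would query the CUDA runtime (for example via cudaStreamIsCapturing) rather than call a user-supplied function.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical stand-in for a CUDA event. In real allocator code this would
// wrap a cudaEvent_t and be polled with cudaEventQuery().
struct FakeEvent {
  explicit FakeEvent(bool done) : completed(done) {}
  bool completed;
};

// Hypothetical record of a freed host block that a stream may still be using.
struct PendingBlock {
  int id;
  FakeEvent event;
};

// Hypothetical model of a caching host allocator's event processing.
class HostAllocatorModel {
 public:
  // `capture_underway` is an assumed hook standing in for a CUDA runtime
  // capture-status check.
  explicit HostAllocatorModel(std::function<bool()> capture_underway)
      : capture_underway_(std::move(capture_underway)) {}

  // A freed block becomes reusable only once its recorded event completes.
  void free_block(int id, FakeEvent ev) { pending_.push_back({id, ev}); }

  // Recycles blocks whose events have completed, in FIFO order, stopping at
  // the first block whose event is still pending. If a capture may be
  // underway, returns early without querying any event.
  std::size_t process_events() {
    if (capture_underway_()) {
      return 0;  // pending blocks stay pending until a later call
    }
    std::size_t reclaimed = 0;
    while (!pending_.empty() && query(pending_.front().event)) {
      free_list_.push_back(pending_.front().id);
      pending_.pop_front();
      ++reclaimed;
    }
    return reclaimed;
  }

  std::size_t pending() const { return pending_.size(); }
  std::size_t free_blocks() const { return free_list_.size(); }
  std::size_t event_queries() const { return event_queries_; }

 private:
  // Counts every simulated event query so the skip is observable.
  bool query(const FakeEvent& ev) {
    ++event_queries_;
    return ev.completed;
  }

  std::function<bool()> capture_underway_;
  std::deque<PendingBlock> pending_;
  std::deque<int> free_list_;
  std::size_t event_queries_ = 0;
};
```

For example, while the hook returns true, process_events() returns 0 and event_queries() stays at 0, so nothing is reclaimed mid-capture; once the hook returns false, blocks with completed events move to the free list. The cost of this design is that cross-stream frees made during capture are held a little longer, in exchange for never issuing an illegal query.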