@pytorchbot merge
Merge started: Your change will be merged once all checks pass (ETA 0-4 hours).
Merge failed. Reason: 1 job has failed: trunk / linux-jammy-rocm-py3.10 / test (default, 1, 2, linux.rocm.gpu.2)
@pytorchbot merge -f "skip rocm queued CI"
Merge started: Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes).
Motivation
#155451 decoupled torch._C._storage_Use_Count from CUDA and introduced a corresponding unit test (pytorch/test/test_torch.py, lines 257 to 262 in 815545f).
However, this test fails when PyTorch is built with debug assertions enabled, so @clee2000 disabled it in #156731. The root cause is that _cdata is obtained from an intrusive_ptr, not a weak_intrusive_ptr. As a result, calling c10::weak_intrusive_ptr::use_count on it triggers the internal assertion in pytorch/c10/util/intrusive_ptr.h (lines 912 to 917 in 815545f).
Passing a pointer that came from an intrusive_ptr violates the invariant inside weak_intrusive_ptr::use_count, which assumes the pointer was originally constructed from a valid weak_intrusive_ptr. In this case, storage_impl is obtained from an intrusive_ptr (pytorch/torch/csrc/Module.cpp, lines 2105 to 2109 in 815545f).
Solution
Use c10::intrusive_ptr::use_count instead.
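To illustrate why the weak-pointer query trips a debug-only check while the strong-pointer query does not, here is a minimal sketch in plain Python. It is not PyTorch code: the names Target, make_weak, strong_use_count and weak_use_count are hypothetical, and the counting convention (the weak count equals the number of weak references plus one while any strong reference is alive) is an assumption about how c10 tracks references, not something quoted from this PR.

```python
class Target:
    """Toy stand-in for an intrusively ref-counted object (hypothetical)."""

    def __init__(self):
        # Assumed convention: weakcount = number of weak refs,
        # plus 1 while refcount > 0.
        self.refcount = 1   # one strong owner
        self.weakcount = 1  # the implicit "+1" held for the strong owners


def make_weak(target):
    """Model taking a weak reference from a live strong one."""
    target.weakcount += 1
    return target


def strong_use_count(target):
    """Model of the strong-pointer query: read the strong count."""
    return target.refcount


def weak_use_count(target, debug=True):
    """Model of the weak-pointer query on a raw pointer.

    Treating a raw pointer as weak is only valid if a weak reference
    really exists; the debug flag models a build that checks this.
    """
    if debug:
        valid = target.weakcount > 1 or (
            target.refcount == 0 and target.weakcount > 0
        )
        if not valid:
            raise AssertionError(
                "raw pointer was not released from a weak reference"
            )
    return target.refcount


storage = Target()  # owned by a strong pointer only, like storage_impl here

print(strong_use_count(storage))             # strong query: no precondition
print(weak_use_count(storage, debug=False))  # release-like build: check skipped

try:
    weak_use_count(storage, debug=True)      # debug-like build: check fires
except AssertionError as exc:
    print("assertion:", exc)
```

In this model the release-like path returns the same number either way, which would be consistent with the test only failing in builds with debug assertions enabled. Querying through the strong-pointer path has no such precondition, which is the change described in the solution.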