Conversation
|
Tagging subscribers to this area: @agocke |
There was a problem hiding this comment.
Pull request overview
This PR addresses a GC-handle-related crash (#117138) by ensuring GC handle destruction sites run in cooperative GC mode, and by tightening the contract on common handle-destruction helpers.
Changes:
- Add cooperative-mode transitions (
GCX_COOP) before various GC handle destruction calls across VM, interop, debugger, and binder code paths. - Update
DestroyHandleCommon’s contract to require cooperative mode (MODE_COOPERATIVE) rather than allowing any mode. - Adjust a few cleanup loops/destructors to perform handle destruction under a cooperative-mode scope.
Reviewed changes
Copilot reviewed 18 out of 18 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| src/coreclr/vm/threads.cpp | Switch to coop before destroying thread-associated handles. |
| src/coreclr/vm/syncblk.cpp | Switch to coop before destroying SyncBlock lock handle. |
| src/coreclr/vm/jitinterface.h | Switch to coop before freeing per-JIT handles in CEEInfo dtor. |
| src/coreclr/vm/interoplibinterface_comwrappers.cpp | Switch to coop before destroying refcounted handles exposed to interop. |
| src/coreclr/vm/gchandleutilities.h | Tighten destruction helper contract to require cooperative mode. |
| src/coreclr/vm/exinfo.h | Switch to coop before destroying exception throwable handle. |
| src/coreclr/vm/exinfo.cpp | Switch to coop before destroying exception throwable handle in ReleaseResources. |
| src/coreclr/vm/encee.cpp | Switch to coop during EnC sync block cleanup when destroying dependent handles. |
| src/coreclr/vm/eedbginterfaceimpl.cpp | Switch to coop before destroying debug-created handles. |
| src/coreclr/vm/dynamicmethod.cpp | Switch to coop before destroying long weak handle for resolver. |
| src/coreclr/vm/dllimportcallback.cpp | Switch to coop before destroying long weak handle in thunk termination. |
| src/coreclr/vm/comconnectionpoints.h | Switch to coop before destroying connection cookie handle. |
| src/coreclr/vm/comcallablewrapper.cpp | Switch to coop before destroying refcounted CCW handle. |
| src/coreclr/vm/clrex.cpp | Switch to coop before destroying throwable handle in exception destructor. |
| src/coreclr/vm/appdomain.cpp | Switch to coop before destroying pinned heap handle bucket strong handle. |
| src/coreclr/debug/ee/debugger.h | Switch to coop before destroying long weak handles on SHash entry removal. |
| src/coreclr/debug/ee/debugger.cpp | Switch to coop before destroying debugger-disposed handles and force-catch table handles. |
| src/coreclr/binder/customassemblybinder.cpp | Switch to coop before destroying ALC weak/strong handles on release. |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 20 out of 20 changed files in this pull request and generated 1 comment.
Comments suppressed due to low confidence (1)
src/coreclr/vm/jitinterface.h:399
CEEInfo::~CEEInfo()is still annotated withLIMITED_METHOD_CONTRACT, but it now explicitly switches GC mode (GCX_COOP) and destroys GC handles.LIMITED_METHOD_CONTRACTis intended for trivial leaf methods, and this destructor is doing non-trivial runtime/GC work; the contract should be updated to aCONTRACTL(or equivalent) that accurately declares the GC mode behavior and lock-taking semantics of the handle destruction path.
virtual ~CEEInfo()
{
LIMITED_METHOD_CONTRACT;
#if !defined(DACCESS_COMPILE)
// Free all handles used by JIT
if (m_pJitHandles != nullptr)
{
GCX_COOP();
OBJECTHANDLE* elements = m_pJitHandles->GetElements();
unsigned count = m_pJitHandles->GetCount();
for (unsigned i = 0; i < count; i++)
{
| // This can be called from threads without a managed Thread (e.g. via | ||
| // COM release on a native thread). Only switch to cooperative mode | ||
| // when a managed thread exists; otherwise the MODE_COOPERATIVE | ||
| // contract check in DestroyHandleCommon is a no-op. | ||
| if (GetThreadNULLOk() != nullptr) | ||
| { | ||
| GCX_COOP(); | ||
| DestroyRefcountedHandle(objectHandle); | ||
| } | ||
| else | ||
| { | ||
| DestroyRefcountedHandle(objectHandle); | ||
| } |
There was a problem hiding this comment.
DestroyHandle is called in both branches here; the only difference is whether a GCX_COOP scope is entered. This can be simplified to avoid duplicated code (e.g., conditionally enter the coop scope when a managed thread exists, then call DestroyRefcountedHandle once).
| ::OBJECTHANDLE objectHandle = static_cast<::OBJECTHANDLE>(handle); | ||
|
|
||
| DestroyRefcountedHandle(objectHandle); | ||
| // This can be called from threads without a managed Thread (e.g. via |
There was a problem hiding this comment.
This will make the contract infrastructure happy, but it is not fixing the race condition on this path.
There was a problem hiding this comment.
We may be able to create a slow DestroyHandle variant that is safe to call in pre-emptive mode, including threads that the runtime does not know about.
src/coreclr/vm/gchandleutilities.h
Outdated
| } | ||
| CONTRACTL_END; | ||
|
|
||
| // Null the handle value first so that GC will not follow a stale |
There was a problem hiding this comment.
This won't solve the problem. DestroyHandleOfType nulls the handle already (first line in TableFreeSingleHandleToCache ).
We need to make sure that the GC handles are not getting scanned by the GC when the handle is released in preemptive mode. We can do that by taking the "handle manager lock" - take the slow path in GC handle allocator by skipping all lock-free caches.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 25 out of 25 changed files in this pull request and generated 3 comments.
Comments suppressed due to low confidence (1)
src/coreclr/debug/ee/debugger.cpp:11003
- GCX_COOP_EEINTERFACE_IFTHREAD() is a no-op when running on the native debugger helper thread (no managed Thread), but the code still calls DestroyHandle helpers that ultimately use the normal (unlocked) handle free path. If DisposeHandle can run concurrently with GC, this still risks the original race/crash. Consider using the new locked/unsafe handle destruction path when g_pEEInterface->GetThread() == NULL (mapping CorDebugHandleType to the corresponding HNDTYPE_), similar to other call sites that branch on GetThreadNULLOk().
// Switch to cooperative mode if a managed thread exists.
// On the native debugger helper thread (no managed Thread),
// the IFTHREAD variant is a no-op and the MODE_COOPERATIVE
// contract in DestroyHandleCommon is also a no-op.
GCX_COOP_EEINTERFACE_IFTHREAD();
switch (handleType)
{
case HANDLE_STRONG:
DestroyStrongHandle(objectHandle);
break;
case HANDLE_WEAK_TRACK_RESURRECTION:
DestroyLongWeakHandle(objectHandle);
break;
case HANDLE_PINNED:
DestroyPinningHandle(objectHandle);
break;
| CrstHolder ch(&pTable->Lock); | ||
|
|
||
| // return the handle to the table's cache (under lock) | ||
| TableFreeSingleHandleToCache(pTable, uType, handle); |
There was a problem hiding this comment.
HndDestroyHandleLocked takes pTable->Lock and then calls TableFreeSingleHandleToCache, which can call TableCacheMissOnFree and attempt to take pTable->Lock again. That re-entrant lock acquisition can deadlock. Consider implementing a lock-aware free path (e.g., a TableFreeSingleHandleToCacheLocked variant that, on cache miss, calls TableQuickRebalanceCache directly under the already-held lock) instead of calling the existing helper that may lock internally.
| TableFreeSingleHandleToCache(pTable, uType, handle); | |
| TableFreeSingleHandleToCacheLocked(pTable, uType, handle); |
| CAN_TAKE_LOCK; | ||
| } | ||
| CONTRACTL_END; | ||
|
|
There was a problem hiding this comment.
DestroyHandleUnsafe bypasses DestroyHandleCommon and currently does not call DiagHandleDestroyed. That means GC profiling callbacks (HandleDestroyed) won’t fire for handle destroys that go through the unsafe path. If the unsafe path is intended to be functionally equivalent (just safer wrt GC scanning), it should likely invoke DiagHandleDestroyed as well (possibly guarded if profiler callbacks require an attached runtime thread).
| DiagHandleDestroyed(handle); |
| STRESS_LOG2(LF_GC, LL_INFO1000, "DestroyHandleLocked: *%p->%p\n", handle, *(_UNCHECKED_OBJECTREF *)handle); | ||
|
|
There was a problem hiding this comment.
HndDestroyHandleLocked dereferences the handle value in the STRESS_LOG call before acquiring pTable->Lock. This defeats the purpose of the locked variant (safe w/ concurrent GC scanning) and can still race/crash when called from preemptive mode / unknown threads. Consider moving any handle-value dereference (or removing it from logging) until after the table lock is held.
|
|
||
| DestroyShortWeakHandle(m_ExposedObject); | ||
| DestroyStrongHandle(m_StrongHndToExposedObject); | ||
| // The thread is already detached from TLS (GetThread() returns NULL), |
There was a problem hiding this comment.
We can move this to Thread::CooperativeCleanup.
| @@ -2297,8 +2297,11 @@ Thread::~Thread() | |||
| // Destroy any handles that we're using to hold onto exception objects | |||
| SafeSetThrowables(NULL); | |||
There was a problem hiding this comment.
SetLastThrownObject can call DestroyHandle too
| // and from threads that the runtime does not know about. | ||
| // It takes the handle table lock to prevent races with GC scanning, | ||
| // bypassing the lock-free cache path. | ||
| inline void DestroyHandleUnsafe(OBJECTHANDLE handle, HandleType type) |
There was a problem hiding this comment.
This can use more descriptive name, like DestroyHandleInPreeemptiveMode
Fixes #117138 crash.
Addresses the idea in this comment: #117138 (comment)
I've placed the GCX_COOP everywhere where the contract is either PREEMPTIVE or ANY (but also checked the callers of the callers)
It was not clear to me what is the right contract for destructors, though.