Skip to content

Delete GC handles in COOP mode#125350

Draft
EgorBo wants to merge 6 commits intodotnet:mainfrom
EgorBo:delete-handles-coop
Draft

Delete GC handles in COOP mode#125350
EgorBo wants to merge 6 commits intodotnet:mainfrom
EgorBo:delete-handles-coop

Conversation

@EgorBo
Copy link
Member

@EgorBo EgorBo commented Mar 9, 2026

Fixes #117138 crash.

Addresses the idea in this comment: #117138 (comment)

I've placed the GCX_COOP everywhere where the contract is either PREEMPTIVE or ANY (but also checked the callers of the callers)
It was not clear to me what is the right contract for destructors, though.

Copilot AI review requested due to automatic review settings March 9, 2026 21:46
@dotnet-policy-service
Copy link
Contributor

Tagging subscribers to this area: @agocke
See info in area-owners.md if you want to be subscribed.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses a GC-handle-related crash (#117138) by ensuring GC handle destruction sites run in cooperative GC mode, and by tightening the contract on common handle-destruction helpers.

Changes:

  • Add cooperative-mode transitions (GCX_COOP) before various GC handle destruction calls across VM, interop, debugger, and binder code paths.
  • Update DestroyHandleCommon’s contract to require cooperative mode (MODE_COOPERATIVE) rather than allowing any mode.
  • Adjust a few cleanup loops/destructors to perform handle destruction under a cooperative-mode scope.

Reviewed changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
src/coreclr/vm/threads.cpp Switch to coop before destroying thread-associated handles.
src/coreclr/vm/syncblk.cpp Switch to coop before destroying SyncBlock lock handle.
src/coreclr/vm/jitinterface.h Switch to coop before freeing per-JIT handles in CEEInfo dtor.
src/coreclr/vm/interoplibinterface_comwrappers.cpp Switch to coop before destroying refcounted handles exposed to interop.
src/coreclr/vm/gchandleutilities.h Tighten destruction helper contract to require cooperative mode.
src/coreclr/vm/exinfo.h Switch to coop before destroying exception throwable handle.
src/coreclr/vm/exinfo.cpp Switch to coop before destroying exception throwable handle in ReleaseResources.
src/coreclr/vm/encee.cpp Switch to coop during EnC sync block cleanup when destroying dependent handles.
src/coreclr/vm/eedbginterfaceimpl.cpp Switch to coop before destroying debug-created handles.
src/coreclr/vm/dynamicmethod.cpp Switch to coop before destroying long weak handle for resolver.
src/coreclr/vm/dllimportcallback.cpp Switch to coop before destroying long weak handle in thunk termination.
src/coreclr/vm/comconnectionpoints.h Switch to coop before destroying connection cookie handle.
src/coreclr/vm/comcallablewrapper.cpp Switch to coop before destroying refcounted CCW handle.
src/coreclr/vm/clrex.cpp Switch to coop before destroying throwable handle in exception destructor.
src/coreclr/vm/appdomain.cpp Switch to coop before destroying pinned heap handle bucket strong handle.
src/coreclr/debug/ee/debugger.h Switch to coop before destroying long weak handles on SHash entry removal.
src/coreclr/debug/ee/debugger.cpp Switch to coop before destroying debugger-disposed handles and force-catch table handles.
src/coreclr/binder/customassemblybinder.cpp Switch to coop before destroying ALC weak/strong handles on release.

Copilot AI review requested due to automatic review settings March 10, 2026 02:11
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 20 out of 20 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (1)

src/coreclr/vm/jitinterface.h:399

  • CEEInfo::~CEEInfo() is still annotated with LIMITED_METHOD_CONTRACT, but it now explicitly switches GC mode (GCX_COOP) and destroys GC handles. LIMITED_METHOD_CONTRACT is intended for trivial leaf methods, and this destructor is doing non-trivial runtime/GC work; the contract should be updated to a CONTRACTL (or equivalent) that accurately declares the GC mode behavior and lock-taking semantics of the handle destruction path.
    virtual ~CEEInfo()
    {
        LIMITED_METHOD_CONTRACT;

#if !defined(DACCESS_COMPILE)
        // Free all handles used by JIT
        if (m_pJitHandles != nullptr)
        {
            GCX_COOP();
            OBJECTHANDLE* elements = m_pJitHandles->GetElements();
            unsigned count = m_pJitHandles->GetCount();
            for (unsigned i = 0; i < count; i++)
            {

Comment on lines +270 to +282
// This can be called from threads without a managed Thread (e.g. via
// COM release on a native thread). Only switch to cooperative mode
// when a managed thread exists; otherwise the MODE_COOPERATIVE
// contract check in DestroyHandleCommon is a no-op.
if (GetThreadNULLOk() != nullptr)
{
GCX_COOP();
DestroyRefcountedHandle(objectHandle);
}
else
{
DestroyRefcountedHandle(objectHandle);
}
Copy link

Copilot AI Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DestroyHandle is called in both branches here; the only difference is whether a GCX_COOP scope is entered. This can be simplified to avoid duplicated code (e.g., conditionally enter the coop scope when a managed thread exists, then call DestroyRefcountedHandle once).

Copilot uses AI. Check for mistakes.
::OBJECTHANDLE objectHandle = static_cast<::OBJECTHANDLE>(handle);

DestroyRefcountedHandle(objectHandle);
// This can be called from threads without a managed Thread (e.g. via
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will make the contract infrastructure happy, but it is not fixing the race condition on this path.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may be able to create a slow DestroyHandle variant that is safe to call in pre-emptive mode, including threads that the runtime does not know about.

}
CONTRACTL_END;

// Null the handle value first so that GC will not follow a stale
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This won't solve the problem. DestroyHandleOfType nulls the handle already (first line in TableFreeSingleHandleToCache ).

We need to make sure that the GC handles are not getting scanned by the GC when the handle is released in preemptive mode. We can do that by taking the "handle manager lock" - take the slow path in GC handle allocator by skipping all lock-free caches.

Copilot AI review requested due to automatic review settings March 11, 2026 03:10
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 25 out of 25 changed files in this pull request and generated 3 comments.

Comments suppressed due to low confidence (1)

src/coreclr/debug/ee/debugger.cpp:11003

  • GCX_COOP_EEINTERFACE_IFTHREAD() is a no-op when running on the native debugger helper thread (no managed Thread), but the code still calls DestroyHandle helpers that ultimately use the normal (unlocked) handle free path. If DisposeHandle can run concurrently with GC, this still risks the original race/crash. Consider using the new locked/unsafe handle destruction path when g_pEEInterface->GetThread() == NULL (mapping CorDebugHandleType to the corresponding HNDTYPE_), similar to other call sites that branch on GetThreadNULLOk().
            // Switch to cooperative mode if a managed thread exists.
            // On the native debugger helper thread (no managed Thread),
            // the IFTHREAD variant is a no-op and the MODE_COOPERATIVE
            // contract in DestroyHandleCommon is also a no-op.
            GCX_COOP_EEINTERFACE_IFTHREAD();

            switch (handleType)
            {
            case HANDLE_STRONG:
                DestroyStrongHandle(objectHandle);
                break;
            case HANDLE_WEAK_TRACK_RESURRECTION:
                DestroyLongWeakHandle(objectHandle);
                break;
            case HANDLE_PINNED:
                DestroyPinningHandle(objectHandle);
                break;

CrstHolder ch(&pTable->Lock);

// return the handle to the table's cache (under lock)
TableFreeSingleHandleToCache(pTable, uType, handle);
Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HndDestroyHandleLocked takes pTable->Lock and then calls TableFreeSingleHandleToCache, which can call TableCacheMissOnFree and attempt to take pTable->Lock again. That re-entrant lock acquisition can deadlock. Consider implementing a lock-aware free path (e.g., a TableFreeSingleHandleToCacheLocked variant that, on cache miss, calls TableQuickRebalanceCache directly under the already-held lock) instead of calling the existing helper that may lock internally.

Suggested change
TableFreeSingleHandleToCache(pTable, uType, handle);
TableFreeSingleHandleToCacheLocked(pTable, uType, handle);

Copilot uses AI. Check for mistakes.
CAN_TAKE_LOCK;
}
CONTRACTL_END;

Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DestroyHandleUnsafe bypasses DestroyHandleCommon and currently does not call DiagHandleDestroyed. That means GC profiling callbacks (HandleDestroyed) won’t fire for handle destroys that go through the unsafe path. If the unsafe path is intended to be functionally equivalent (just safer wrt GC scanning), it should likely invoke DiagHandleDestroyed as well (possibly guarded if profiler callbacks require an attached runtime thread).

Suggested change
DiagHandleDestroyed(handle);

Copilot uses AI. Check for mistakes.
Comment on lines +418 to +419
STRESS_LOG2(LF_GC, LL_INFO1000, "DestroyHandleLocked: *%p->%p\n", handle, *(_UNCHECKED_OBJECTREF *)handle);

Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HndDestroyHandleLocked dereferences the handle value in the STRESS_LOG call before acquiring pTable->Lock. This defeats the purpose of the locked variant (safe w/ concurrent GC scanning) and can still race/crash when called from preemptive mode / unknown threads. Consider moving any handle-value dereference (or removing it from logging) until after the table lock is held.

Copilot uses AI. Check for mistakes.

DestroyShortWeakHandle(m_ExposedObject);
DestroyStrongHandle(m_StrongHndToExposedObject);
// The thread is already detached from TLS (GetThread() returns NULL),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can move this to Thread::CooperativeCleanup.

@@ -2297,8 +2297,11 @@ Thread::~Thread()
// Destroy any handles that we're using to hold onto exception objects
SafeSetThrowables(NULL);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SetLastThrownObject can call DestroyHandle too

// and from threads that the runtime does not know about.
// It takes the handle table lock to prevent races with GC scanning,
// bypassing the lock-free cache path.
inline void DestroyHandleUnsafe(OBJECTHANDLE handle, HandleType type)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can use more descriptive name, like DestroyHandleInPreeemptiveMode

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

AF: *(_UNCHECKED_OBJECTREF *)handle == NULL (HndCreateHandle called by getJitHandleForObject)

3 participants