Skip to content

Conversation

@alexey-zakharov
Copy link
Contributor

Motivation:

Fix crash in ThreadLocalBlock::FreeTLM code on thread exit which happens during Assembly unloading when Assembly contains a class with a ThreadStatic variable.

This is a new iteration of #99998 which caused issues on CI.

Details:

There seem to be a race condition between LoaderAllocator cleanup during garbage collection and thread locals cleanup on thread exit. Racing stacks are the following:

ALC/LoaderAllocator cleanup

 	coreclr.dll!EEToProfInterfaceImpl::ModuleUnloadStarted(unsigned __int64 moduleId) Line 3735	C++	Symbols loaded.
 	coreclr.dll!ProfControlBlock::DoProfilerCallbackHelper<int (__cdecl*)(ProfilerInfo *),long (__cdecl*)(EEToProfInterfaceImpl *,unsigned __int64),unsigned __int64>(ProfilerInfo * pProfilerInfo, int(*)(ProfilerInfo *) condition, HRESULT(*)(EEToProfInterfaceImpl *, unsigned __int64) callback, HRESULT * pHR, unsigned __int64 <args_0>) Line 284	C++	Symbols loaded.
 	[Inline Frame] coreclr.dll!ProfControlBlock::DoOneProfilerIteration(ProfilerInfo *) Line 199	C++	Symbols loaded.
 	[Inline Frame] coreclr.dll!ProfControlBlock::IterateProfilers(ProfilerCallbackType) Line 207	C++	Symbols loaded.
 	[Inline Frame] coreclr.dll!ProfControlBlock::DoProfilerCallback(ProfilerCallbackType) Line 295	C++	Symbols loaded.
 	[Inline Frame] coreclr.dll!ProfControlBlock::ModuleUnloadStarted(unsigned __int64) Line 691	C++	Symbols loaded.
 	coreclr.dll!Module::Destruct() Line 662	C++	Symbols loaded.
 	[Inline Frame] coreclr.dll!ClassLoader::FreeModules() Line 1884	C++	Symbols loaded.
 	coreclr.dll!ClassLoader::~ClassLoader() Line 1946	C++	Symbols loaded.
 	coreclr.dll!Assembly::Terminate(int) Line 311	C++	Symbols loaded.
 	coreclr.dll!Assembly::~Assembly() Line 244	C++	Symbols loaded.
 	coreclr.dll!Assembly::`scalar deleting destructor'(unsigned int __flags)	C++	Symbols loaded.
 	coreclr.dll!DomainAssembly::~DomainAssembly() Line 91	C++	Symbols loaded.
 	coreclr.dll!DomainAssembly::`scalar deleting destructor'(unsigned int __flags)	C++	Symbols loaded.
>	coreclr.dll!LoaderAllocator::GCLoaderAllocators(LoaderAllocator * pOriginalLoaderAllocator) Line 570	C++	Symbols loaded.
 	coreclr.dll!LoaderAllocator::Destroy(QCall::LoaderAllocatorHandle pLoaderAllocator) Line 702	C++	Symbols loaded.
 	coreclr.dll!LoaderAllocator_Destroy(QCall::LoaderAllocatorHandle pLoaderAllocator) Line 718	C++	Symbols loaded.
 	System.Private.CoreLib.dll!00007ffa9c4308f9()	Unknown	No symbols loaded.
 	coreclr.dll!FastCallFinalizeWorker() Line 26	Unknown	Symbols loaded.
 	coreclr.dll!MethodTable::CallFinalizer(Object * obj) Line 4908	C++	Symbols loaded.
 	[Inline Frame] coreclr.dll!CallFinalizer(Object *) Line 75	C++	Symbols loaded.
 	coreclr.dll!FinalizerThread::FinalizeAllObjects() Line 110	C++	Symbols loaded.
 	coreclr.dll!FinalizerThread::FinalizerThreadWorker(void * args) Line 354	C++	Symbols loaded.
 	[Inline Frame] coreclr.dll!ManagedThreadBase_DispatchInner(ManagedThreadCallState *) Line 7222	C++	Symbols loaded.
 	coreclr.dll!ManagedThreadBase_DispatchMiddle(ManagedThreadCallState * pCallState) Line 7266	C++	Symbols loaded.
 	coreclr.dll!ManagedThreadBase_DispatchOuter(ManagedThreadCallState * pCallState) Line 7425	C++	Symbols loaded.
 	[Inline Frame] coreclr.dll!ManagedThreadBase_NoADTransition(void(*)(void *)) Line 7494	C++	Symbols loaded.
 	[Inline Frame] coreclr.dll!ManagedThreadBase::FinalizerBase(void(*)(void *)) Line 7513	C++	Symbols loaded.
 	coreclr.dll!FinalizerThread::FinalizerThreadStart(void * args) Line 403	C++	Symbols loaded.
 	kernel32.dll!BaseThreadInitThunk()	Unknown	Symbols loaded.
 	ntdll.dll!RtlUserThreadStart()	Unknown	Symbols loaded.

Thread exit with threadlocal cleanup

>	coreclr.dll!LoaderAllocator::SetHandleValue(unsigned __int64 handle, Object *) Line 992	C++	Symbols loaded.
 	coreclr.dll!LoaderAllocator::FreeHandle(unsigned __int64 handle) Line 884	C++	Symbols loaded.
 	coreclr.dll!ThreadLocalBlock::FreeTLM(unsigned __int64 i, int isThreadShuttingdown) Line 59	C++	Symbols loaded.
 	coreclr.dll!ThreadLocalBlock::FreeTable() Line 93	C++	Symbols loaded.
 	[Inline Frame] coreclr.dll!Thread::DeleteThreadStaticData() Line 7704	C++	Symbols loaded.
 	coreclr.dll!Thread::OnThreadTerminate(int holdingLock) Line 2956	C++	Symbols loaded.
 	coreclr.dll!DestroyThread(Thread * th) Line 924	C++	Symbols loaded.
 	coreclr.dll!ThreadNative::KickOffThread(void * pass) Line 239	C++	Symbols loaded.
 	kernel32.dll!BaseThreadInitThunk()	Unknown	Symbols loaded.
 	ntdll.dll!RtlUserThreadStart()	Unknown	Symbols loaded.

with an exception which is thrown with the following details:

Exception thrown: read access violation.
**loaderAllocator** was nullptr.
devenv_iJFpiWBvYe

loaderAllocator value from the LOADERALLOCATORREF loaderAllocator = (LOADERALLOCATORREF)ObjectFromHandle(m_hLoaderAllocatorObjectHandle); call is NULL which makes sense, because finalizer thread just disposed data for some of the same LoaderAllocator assemblies.

Note: enabling COR_PRF_MONITOR_MODULE_LOADS CLR profiler flag increases chances for the race condition as it slows down a bit complete loader destruction.

Changes:

I've looked at the following options as a solution to the problem:

  • Null check for loaderAllocator
  • Wait for GC/Finalizers done before killing thread
  • Skip cleaning up thread locals for unloaded loaders

Based on the discussion the acceptable solution would be to:

  • use coop mode to ensure the loader is not collected while we delete thread local handles (we already do it when setting a handle to NULL, so we elevate the scope with a minimal performance consequences)
  • if loader was already collected we do not need to reset handles as they are going to be deallocated anyway further down the loader destruction

@ghost ghost added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label May 29, 2024
@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label May 29, 2024
@alexey-zakharov alexey-zakharov marked this pull request as ready for review May 29, 2024 08:34
@alexey-zakharov
Copy link
Contributor Author

@jkotas @davidwrighton checks are green for this PR

@jkotas jkotas added area-VM-coreclr and removed needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners labels May 29, 2024
@jkotas jkotas requested a review from davidwrighton May 29, 2024 14:27
@dotnet-policy-service
Copy link
Contributor

Tagging subscribers to this area: @mangod9
See info in area-owners.md if you want to be subscribed.

@jkotas
Copy link
Member

jkotas commented May 29, 2024

/azp run runtime-coreclr outerloop

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Copy link
Member

@jkotas jkotas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thank you!

@jkotas jkotas merged commit 8669b74 into dotnet:main May 29, 2024
@jkotas
Copy link
Member

jkotas commented May 29, 2024

/backport to release/8.0-staging

@github-actions
Copy link
Contributor

Started backporting to release/8.0-staging: https://github.com/dotnet/runtime/actions/runs/9293876461

@github-actions
Copy link
Contributor

@jkotas backporting to release/8.0-staging failed, the patch most likely resulted in conflicts:

$ git am --3way --ignore-whitespace --keep-non-patch changes.patch

Applying: Ensure LoaderAllocator can't be collected while we clean handles on collectible LoaderAllocators
Using index info to reconstruct a base tree...
M	src/coreclr/vm/threadstatics.cpp
Falling back to patching base and 3-way merge...
Auto-merging src/coreclr/vm/threadstatics.cpp
CONFLICT (content): Merge conflict in src/coreclr/vm/threadstatics.cpp
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
hint: When you have resolved this problem, run "git am --continue".
hint: If you prefer to skip this patch, run "git am --skip" instead.
hint: To restore the original branch and stop patching, run "git am --abort".
hint: Disable this message with "git config advice.mergeConflict false"
Patch failed at 0001 Ensure LoaderAllocator can't be collected while we clean handles on collectible LoaderAllocators
Error: The process '/usr/bin/git' failed with exit code 128

Please backport manually!

@github-actions
Copy link
Contributor

@jkotas an error occurred while backporting to release/8.0-staging, please check the run log for details!

Error: git am failed, most likely due to a merge conflict.

@jkotas
Copy link
Member

jkotas commented May 29, 2024

@alexey-zakharov If you would like to see this backported to .NET 8, could you please submit a PR against release/8.0-staging branch with the merge conflicts resolved?

@alexey-zakharov
Copy link
Contributor Author

@jkotas I've created a backport PR - #102872
thanks for taking the fix in!

@alexey-zakharov alexey-zakharov deleted the upstream-fix-threadstatic-cleanup branch May 30, 2024 10:37
Ruihan-Yin pushed a commit to Ruihan-Yin/runtime that referenced this pull request May 30, 2024
@github-actions github-actions bot locked and limited conversation to collaborators Jun 30, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

area-VM-coreclr community-contribution Indicates that the PR has been added by a community member

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants