[WIP] Collectible Assemblies and AssemblyLoadContext#8677
[WIP] Collectible Assemblies and AssemblyLoadContext#8677xoofx wants to merge 47 commits intodotnet:masterfrom
Conversation
|
Adding some other details I forgot:
|
|
cc @rahku @gkhanna79 |
The GC heap has to be always walkable. It means that all objects, even the dead ones that points to unloaded loader allocator have to have a valid method table. Gen2 turns all dead objects into free space. That's why it is safe to free the loader allocator memory only after the Gen2 GC. In theory, it may possible to invent a special pass over the GC heap to turn all objects that point to the unloaded loader allocator into free space, but it would not be very different from Gen2 GC and the synchronization between this special pass and the GC would be tricky. |
The object graph that became unreachable has still references between objects, so there should not be any fundamental problem with getting this event fired. The current design for the reflection.emit collectible assemblies depends on the GC to figure out that it is no longer reachable. It means that all references to it have to be done via WeakHandles, etc. It would be likely pretty inconvenient in practice. It may be useful to have explicit |
So
Explicit modeThe native Here I'm just a little bit worried about calling back the In order to secure the access for the explicit I guess that the Implicit node
What do you think? |
|
I do not understand why the implicit mode cannot fire the unloading event. |
|
What would be passed as the sender for the |
|
Yes, poorly written or malicious code can resurrect it - but it can resurrect it in number of other ways too, so it is not introducing a new unique problem. |
Actually, re-reading the code with my original problem with Unloading, it was only due to the current workaround with subscribing the |
|
I assume the default assembly load context would not go through the explicit unload path during shutdown. If that's the case, only the default assembly load context would need to subscribe to ProcessExit so that the Unloading event on that instance is raised upon shutdown. |
|
Also, what about non-default assembly load contexts on shutdown? Finalizers aren't guaranteed to run on shutdown anymore, and the Unloading event was meant to provide a notification of shutdown, so the event would still need to be raised, if those instances don't go through the explicit unload path somehow during shutdown. |
|
@kouvel good catch about the shutdown process and finalizers. |
|
I guess one possibility is to keep track of assembly load contexts that have not yet been unloaded in a static weak reference collection. Unload and the finalizer would remove the instance from the collection and raise the Unloading event. A static method can be registered for the ProcessExit event, to raise the Unloading event on the remaining instances in the collection. |
d4a0b51 to
9490418
Compare
|
So I have pushed a new version. But for this, I'm not really sure how to solve this, as it goes deeply into dependencies between components accessing concurrently cache of the AppDomain, so currently, the code is not threadsafe for this... The solution might be obvious for someone familiar with this codebase, or the problem is more fundamental... |
|
Ok, so the problem above is indirectly caused by deleting |
|
I'm stuck with understanding what the ICLRPrivBinder is supposed to be... When called on At this line and after the I'm a bit lost between these different Could someone explain why we have two different ICLRPrivBinder for the DomainAssembly and the PEAssembly? |
|
More precisely, the |
|
After finding some documentation about The calling sequence from AssemblyLoadContext::LoadFromPath used in a Load override is like this: So the (1) AddAssemblyToCache is using a spec bound to the BINDER_SPACE::Assembly allocated for the loaded assembly. My question is: Why (1) would not get the parent binder of BINDER_SPACE::Assembly, which in our case is binderALC to push it to the cache? Is it relevant to store this one? (as it looks like the spec is per assembly...) In fact my original code from cleaning the AssemblyBindingCache was doing it on the DomainAssembly value and not the spec key, so It was working. But when I switched back to use an So I guess that when trying to remove the cache (1), I will remove also (2) by getting the parent GetBinder() on it. Not sure if it is all clear and the right way to do it... Any hints If I'm doing something obviously wrong would be appreciated! 😅 |
|
nevermind, the problem is somewhere else, the |
9490418 to
d2dd977
Compare
|
So I just pushed a stable and working version. 🎉 Finally got rid of the caching problems by removing from the cache based on the DomainAssembly and not on the AssemblySpec (thought I had a bug but discovered that there was many entries with slightly different AssemblySpec in the AssemblySpecBindingCache while they could point to the same DomainAssembly) The There is a slight increase of memory when running over 1M of reload, but I haven't been able to find exactly what is leaking (tried taking Memory heap native snapshot from VS2015 but not sure how much it is reliable...) There are a few bits not completely polished (missing a few |
|
Going to look at the failing tests and how to add new tests related to collectible ALC. |
src/vm/mscorlib.h
Outdated
| @@ -1,4 +1,4 @@ | |||
| // Licensed to the .NET Foundation under one or more agreements. | |||
| // Licensed to the .NET Foundation under one or more agreements.ASSEMBLY | |||
| Unloading = null; | ||
| // Initialize the VM side of AssemblyLoadContext if not already done. | ||
| isDefault = fRepresentsTPALoadContext; | ||
| var thisWeakHandle = GCHandle.Alloc(this, GCHandleType.Weak); |
There was a problem hiding this comment.
The unloadability comes with extra overhead. We do not want to make everybody (even folks who do not need unloading) to pay for it. There should be a new constructor that takes flag on whether the ALC is unloadable.
There was a problem hiding this comment.
I'm wondering where is the overhead exactly coming from? (the GC that has more work to do on a collectible assembly?).
Currently the unload is automatically done on the ALC finalizer if not called explicitly. So that would mean that if an ALC is not unloadable, we would throw an exception on the Unload method? Behind the scene, the only difference would be that the native Assemblies + AssemblyLoadContext + AssemblyLoaderAllocator would not me marked collectible I guess.
There was a problem hiding this comment.
GC has more work to do, JIT generates less efficient code in some cases, runtime data structures are bigger, ... .
| // Remove this instance from the assembly to cleanup on ProcessExit | ||
| lock (assemblyLoadContextAliveList) | ||
| { | ||
| if (isProcessExiting) |
There was a problem hiding this comment.
There is no finalization on shutdown in .NET Core - this path should be unreachable.
There was a problem hiding this comment.
Ok, I was not sure. So a finalizer cannot be called while callbacks on AppDomain.ProcessExit method is running? (GC.Collect for example has no effect there?)
There was a problem hiding this comment.
The finalizer can be still called if the object became unreachable before and was sitting in the finalization queue when shutdown started. (I should have said that this path should not be necessary; not unreachable.)
There was a problem hiding this comment.
Good, so a collectible boolean is definitely necessary.
|
Here is a small test that I have written that fails with the asserts in My test shows how to get this feature tested under stress, e.g. replace loading of hello.exe by loading all CoreCLR tests. https://github.com/Microsoft/dotnet-reliability has harness that we are using for stress testing. It may be a good idea to build a stress mode for unloadable ALC into it. cc @schaabs Also, we will need to make sure that it works well with other features - interop, debugger, ... . The overall direction looks reasonable to me so far. @xoofx Thank you for doing this! |
|
@karelz All right, thanks :) |
|
So, here's the official word to set expectations - likely not what most people hoped for (sorry for that
|
|
@karelz You're right, that's not what we want. But remember that this is an open-source project now. So, what can the community do to help improve on that? |
|
@masonwheeler The community (in the person of @xoofx) already went as far as it could, but if I remember correctly, the specialized test environment was (is?) not accessible to the community, so it was up to MS to finish the job. In other words, this issue is dead. |
|
@kwaclaw Sorry, but no. I'm not willing to accept that. This is a feature that has not only been needed, but obviously needed, from the days of .NET Framework version 1. We've waited too long on Microsoft not doing anything about it. Now that it's finally available to the community to work on it, Microsoft continuing to not do anything about it is no longer a reasonable excuse. What can the community do to improve this? |
|
What about creating a community-driven stress test environment?
…On Feb 7, 2018 7:46 PM, "masonwheeler" ***@***.***> wrote:
@kwaclaw <https://github.com/kwaclaw> Sorry, but no. I'm not willing to
accept that.
This is a feature that has not only been needed, but *obviously* needed,
from the days of .NET Framework version 1. We've waited too long on
Microsoft not doing anything about it. Now that it's finally available to
the community to work on it, Microsoft continuing to not do anything about
it is no longer a reasonable excuse.
What can the community do to improve this?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#8677 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AGGWB21FXYb7JolrWWha0xLPAitc4hqqks5tSkPLgaJpZM4LQMS7>
.
|
|
@DemiMarie That sounds like a good idea. Who here has experience with stress testing? |
|
Very disappointing that this feature is not considered important. We really wanted to use .net core for a larger project but will now have to use another platform. |
|
@DemiMarie I think a community-driven stress test won't work because if I remember the thread correctly, Microsoft would only accept the pull request if it passed their own stress test environment. However, Microsoft is not able to share this environment with the community and has no intention of performing the stress testing themselves. Therefore I conclude that this pull request/feature is dead for the foreseeable future. |
|
Yuck. Perhaps someone needs to maintain a fork.
…On Feb 24, 2018 2:07 PM, "Karl Waclawek" ***@***.***> wrote:
@DemiMarie <https://github.com/demimarie> I think a community-driven
stress test won't work because if I remember the thread correctly,
Microsoft would only accept the pull request if it passed their own stress
test environment.
However, Microsoft is not able to share this environment with the
community and has no intention of performing the stress testing themselves.
Therefore I conclude that this pull request/feature is dead for the
foreseeable future.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#8677 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AGGWB0EFvx3GGPwgrU3xGBCKnTgf4I9eks5tYF4EgaJpZM4LQMS7>
.
|
|
Months of work? Really? I find that hard to believe. This is really bad news. I am surprised that such a needed feature has been shelved. |
|
There have a few problems that ms team could not be solved. 1.Not enough skilled C++ programmers. |
|
|
Whats going on here? Can someone from MS fix this? |
|
@seriouz Yes, but they're choosing not to. Even though this is something that .NET has needed from the very beginning, which developers have been requesting from the very beginning, even though someone else has gone and done for them the work that they chose not to do, they won't integrate it until it's been tested in a special testing system that they're choosing not to use and they won't permit anyone else to do for them. They could. But they aren't. |
|
It's ridiculous... |
|
The changes in this PR are a good prototype. It is very light on testing. It has not been hard to find problems in it by just looking at the code that should have been found by testing it well instead. Majority work on this feature is on testing and getting it to the right quality level. Our estimate is that this needs months, likely close to a year, of experienced runtime developer to make it happen. We were not able to dedicate one for this due to competing priorities. I do acknowledge that we could have handle this better. I am sorry. The sources for the stress testing system are at https://github.com/Microsoft/dotnet-reliability. It is basically xunit with stress testing extensions. It is a special system, but we have not been actively preventing anybody from using it. Inventing a new system for testing this feature would be ok too similar to what we have for GC stress or JIT stress. We are starting to plan for where we are going to invest once .NET Core 2.1 ships. Having better story for code unloading is high on our list. I am hopeful that it will become a part of plan for the next release. |
|
@jkotas Thanks, that's really helpful! 👍 OK, so now we have the means to build a stress test environment. Anyone up for a bit of work on this? |
|
Expect this function |
This is just a rebased version of dotnet#8677
This is just a rebased version of dotnet#8677
This is just a rebased version of dotnet#8677
|
Follow up PR #18476 |
|
Superseded by #18476 |
This is just a rebased version of dotnet#8677
This is a followup of the issue #1684 (work done by @Rohansi) and the original issue #552
This is PR is not mergeable as it is. It is a single commit for now to simplify the tracking of things that are changed. I will recommit everything once I have cleanup some bits here and there (the changes are very rough now... I haven't followed also using the define
FEATURE_COLLECTIBLE_ALC...etc.)The difference with #1684 is that a single
AssemblyLoaderAllocatoris now associated to theAssemblyLoadContext. Without changing much this class, I have added the ability to add aDomainAssembly(instead of the previousSetDomainAssemblymethod) to chain it to another domain assembly that is in the same ALC (via the memberDomainAssembly::m_NextDomainAssemblyInSameALC). It allows to use the original code for single collectible assembly.Also, the
AssemblyLoadContextcreates a weak handle on itself instead of a strong handle (otherwise it cannot self finalize). The Destroy of this context doesn't actually delete the native ALC but instead release a strong handle to theLoaderAllocatormanaged object (associated to theAssemblyLoaderAllocator). Then the finalizer of theLoaderAllocatorvia theLoaderAllocatorScoutwill be called, and there we will cleanup everything, including releasing the nativeAssemblyLoadContext.A simple test with an assembly loaded in loop seems to indicate that the memory is stable and nothing so far is crashing, but I expect many problems in many places, so this is really an early POC.
I originally didn't use the PR #1684 and tried to perform the changes on my side, but I had to cherry pick some parts of the previous PR while encountering the same issues. One thing I'm really not happy currently is the code in the
LoaderAllocator::GCLoaderAllocators_RemoveAssemblies(I had to fight with issues with part of the codeGC_NOTRIGGERand others incompatibleGC_TRIGGERS...etc.). One thing I don't like is to have multiple methods to remove all the internal caches associated to aDomainAssemblyin theAppDomain. This code should be part ofAppDomaininstead. There is also some issues with some maps that are using someAssemblySpecthat were problematic because they were doing some GC_TRIGGERS (and the originalGCLoaderAllocators_RemoveAssemblieswasGC_NOTRIGGER)... so yeah, this part has to be heavily rewrite/cleanup somehow.Comments are welcome (sorry if it is a bit of a mess!)
cc: @jkotas @gkhanna79 @rahku