[WIP] Implement an intrinsic for delegate lambdas#125901
[WIP] Implement an intrinsic for delegate lambdas#125901MichalPetryka wants to merge 20 commits intodotnet:mainfrom
Conversation
The idea behind the original proposal was that the codegen is going take care of the caching behind the scenes to minimize the binary size (and startup) overheads. If the IL is required to have a field, it dilutes the benefit of the special intrinsic. It may be better to give up a bit more and just go with the alternative in the proposal. This needs numbers to decide.
What is Roslyn expected to generate for lambdas in generic types with this design? |
src/coreclr/tools/aot/ILCompiler.RyuJit/JitInterface/CorInfoImpl.RyuJit.cs
Outdated
Show resolved
Hide resolved
| } | ||
| } | ||
|
|
||
| int allocationSiteId = _delegateId.GetOrAdd(method, static (_, compiler) => Interlocked.Increment(ref compiler._nextMethodId), this); |
There was a problem hiding this comment.
This doesn't look deterministic. The way I understood #85014 was that this would be unique per TScope/TDelegate/ftn address. This seems to add another dimension.
There was a problem hiding this comment.
This was a temporary hack to generate unique allocation sites for NAOT, in the final impl I'd planning to reimplement it here more akin to the string literal implementation with a separate map for the instances.
Also, AFAIR this is only used for deduplicating the frozen instances, so I assumed this wouldn't impact determinism/uniqueness at all?
There was a problem hiding this comment.
We have multiple CorInfoImpl instances in memory during multithreaded compilation. Which ones will process which methods is random. The allocationSiteId is used to form a symbolic name for the frozen object. I assume at minimum we'll have random names in debug info. There may also be other problems. Yes, this will need a separate map like string literals.
There was a problem hiding this comment.
I've improved the code a bit now but it's still using the shared code which has the allocation site there. I put the method hashcode there for now, is that unique enough or do I need to separate it out totally to not use an id?
...veaot/System.Private.CoreLib/src/System/Runtime/CompilerServices/RuntimeHelpers.NativeAot.cs
Show resolved
Hide resolved
Why not use a default instance (no .ctor call, just allocated) for shared generics? It would be the most efficient option for generic types.
A field would be required for only shared generics and unloadable assemblies, right?
If delegates could be made frozen, then NAOT wouldn't need this? |
The field caching idea is not a fundamental requirement for this implementation, I'm just not aware of any way to avoid overhead on every access for cases where we can't expand otherwise.
Do you have any specific way of benchmarking in mind? I'm not sure what would be the best way to compare, file size checks aren't too easy without Roslyn support since we need a bigger assembly for the difference to be meaningful and comparing access perf for unexpanded is also non trivial cause of needing correct dictionary keys.
The idea would be to generate a single non generic class for all lambda methods and non-generic fields and put generic methods in there (fields for them would need separate classes). |
That'd be the way I'd implement this, it'd just add a bit of code to the implementation (since we'd ideally cache the instances for all delegates and such) and I wanted to wait for that until we're sure it will be neeeded.
AFAIR yes, other than when the GC fails to allocate frozen instances (unless we'd complicate even further like string literals do and allocate on POH/use pinned handles then and still hardcode the instance in assembly.)
This already allocates delegates as frozen, the question would rather be if Roslyn would use the intrinsic in cctor bodies, if yes we don't want to block interpreting them cause of the intrinsic. |
|
@jkotas @MichalStrehovsky After converting my tests from reflection to IL (for NAOT to be able to track them properly), I've noticed that |
I assume that you will get an exception if you try to call the function pointer returned by |
ECMA-335 spec covers this in "II.15.2 Static, instance, and virtual methods": Abstract virtual methods (which shall only be defined in abstract classes or interfaces) shall be called RuntimeMethodHandle.GetFunctionPointer docs say: For instance method handles, the value is not easily usable from user code and is meant exclusively for usage within the runtime. So this checks out. |
I did not test calling it, only using it to create a delegate which did work fine. |
Measure cost of an (unexecuted) lambda that just returns a unique integer: IL binary size, memory footprint in JIT, NativeAOT binary size. Before/after. The easiest way to do that is by creating a test with like million lambdas. |
Implements a basic intrinsic for creating delegate singletons, to be used by Roslyn for lambdas and method group conversions.
Creates delegates closed over null instances to save on memory, this makes it reject instance methods on generic types since those need an instance.
Uses a field for caching non frozen delegates since otherwise we'd have a noticeable perf regression on every access for cases that can't be expanded in the JIT (shared generics, unloadable assemblies). This also significantly simplifies the implementation.
TODO:
cc @jkotas @MichalStrehovsky @EgorBo
Depends on #99200 (without it this is a GC hole)
Blocked by #126284
Closes #85014