Emit shorter opcodes in ILGenerator.Emit(OpCode, int) #35427
jkotas merged 1 commit into dotnet:master from
Conversation
The Ldloc, Stloc, Ldloca, Ldc_I4, Ldarg, Ldarga, and Starg opcodes all have shorter variants that take up less space in the IL instruction stream. ILGenerator.Emit(OpCode, LocalBuilder) already special cases Ldloc, Stloc, and Ldloca to automatically translate those into their shorter forms where applicable, but similar logic doesn't exist in Emit(OpCode, int) for Ldc_I4, Ldarg, Ldarga, and Starg. Instead, various other libraries higher in the stack that use reflection emit either end up doing all the special-casing with their own helper routines to do the shrinking, or they just forego it and end up with larger IL than is necessary. This PR just moves the logic down into Emit(OpCode, int) such that all uses can benefit, and removes the special-casing duplication from the other libraries.
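For reference, the selection thresholds for the short forms (per the ECMA-335 instruction set) can be modeled as below. This is an illustrative Python sketch of the dispatch logic and encoded byte sizes, not the actual CoreLib implementation; the function names are made up for the example.

```python
def shorten_ldc_i4(value: int) -> tuple[str, int]:
    """Pick the shortest encoding of 'ldc.i4 <value>' and its size in bytes."""
    if -1 <= value <= 8:
        # ldc.i4.m1 and ldc.i4.0 .. ldc.i4.8 are single-byte opcodes with no operand
        name = "ldc.i4.m1" if value == -1 else f"ldc.i4.{value}"
        return name, 1
    if -128 <= value <= 127:
        return "ldc.i4.s", 2          # 1-byte opcode + int8 operand
    return "ldc.i4", 5                # 1-byte opcode + int32 operand

def shorten_ldarg(index: int) -> tuple[str, int]:
    """Pick the shortest encoding of 'ldarg <index>' (Starg/Ldarga are analogous,
    minus the .0-.3 forms)."""
    if 0 <= index <= 3:
        return f"ldarg.{index}", 1    # ldarg.0 .. ldarg.3, single-byte opcodes
    if index <= 255:
        return "ldarg.s", 2           # 1-byte opcode + uint8 operand
    return "ldarg", 4                 # 2-byte (0xFE-prefixed) opcode + uint16 operand
```

For example, an operand of 5 selects the single-byte `ldc.i4.5`, while an operand of 300 falls back to the full five-byte `ldc.i4` encoding.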
src/coreclr/src/System.Private.CoreLib/src/System/Reflection/Emit/ILGenerator.cs
danmoseley
left a comment
Nice optimization; how did this not happen already over the years?
Could one imagine an analogous abstraction that saves me having to pick from
Maybe for backward branches. Not for forward, at least not without a lot more complexity / post-processing of the IL stream.
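The reason backward branches are the easier case: by the time the branch is emitted, the target's offset is already known, so the emitter can immediately check whether the relative displacement fits in the signed byte carried by the short (`_S`) branch forms. A hedged Python sketch of that check (the 2-byte short-form size and int8 range follow ECMA-335; the function and offsets are hypothetical):

```python
def pick_branch_form(emit_offset: int, target_offset: int) -> str:
    """For a backward branch the target offset is already known, so the
    relative displacement (measured from the end of the would-be 2-byte
    br.s instruction) can be tested against the int8 range."""
    rel = target_offset - (emit_offset + 2)  # br.s = 1-byte opcode + int8 operand
    if -128 <= rel <= 127:
        return "br.s"
    return "br"
```

A forward branch, by contrast, has an unknown target at emit time, so choosing the short form would require a fixup pass that may shift every later offset.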
@stephentoub if you’re looking at ref emit, can you also nuke the allocations 😁?
Which allocations here are impactful, and how?
This was an optimization @pakrym did to reduce allocations while building the callsite for object creation during DI.
Using an array pool in ILGenerator might be a good start.
I understand techniques could be used to avoid allocation. I'm asking what it helps, other than lowering an allocation number. Is ref emit being used in ASP.NET somewhere on a hot path?
Yes, DependencyInjection uses it to emit the service resolvers: https://github.com/dotnet/runtime/blob/e3ffd343ad5bd3a999cb9515f59e6e7a777b2c34/src/libraries/Microsoft.Extensions.DependencyInjection/src/ServiceLookup/ILEmit/ILEmitResolverBuilder.cs BTW, it has the same optimizations that this PR addresses.
On a hot path? That's not a one-time / infrequent action? And did you have evidence that reducing managed allocation here moved a throughput / startup measurement, rather than just lowering a bytes-allocated counter?
That code is for ldloc/stloc. Specialization of those already existed, even on .NET Framework:
Most of the service resolver building happens in parallel with app startup, so it's not a recurring hot path, but improvements to it might improve startup time.
Adding that optimization did improve resolver-building performance at the time; I don't think I did app startup time measurements.
Huh. I wrote some useless code then.
Pushing it all the way to 0. I'm going to turn the GC into a heap scanner 😄
Pooling has costs. Extra code to avoid allocation needs to be loaded and JIT'd and has costs (beyond complexity and maintenance costs). Allocation counts as a sole metric can easily tune for the wrong thing, and an occasional allocation can yield better all-up performance than the extra code to avoid it.
Thanks. I'm trying to gauge the benefit of actually investing more here. Those wins you saw: were they from reducing allocation alone, or were other aspects of the change the most impactful? Link to a PR? Thanks.
I assume that you measured it with microbenchmarks. Microbenchmarks do not account for static footprint and startup time, which matter quite a bit for real apps too.
This suggests that you may have just shifted the cost from one place to a different place, while increasing complexity.
Well, we may be getting to the point where the focus on reducing allocations is starting to hurt more than help. I find myself telling people to allocate more quite often recently, to avoid writing hard-to-maintain, overengineered code (e.g. https://github.com/dotnet/runtime/pull/35383/files#r414296004).
It was done as part of working on the larger PR:
What other place? The optimization is precalculating the approximate size of the IL body to avoid extensive resizing/allocations. It happens inline with the operation being benchmarked.
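To illustrate the kind of resizing such a precalculation avoids: a buffer that grows by doubling re-allocates (and copies) once per doubling, whereas a correctly presized buffer allocates once. A rough Python model of the allocation count, not the actual ILGenerator or resolver-builder code:

```python
def grow_by_doubling(total_bytes: int, initial: int = 64) -> int:
    """Count the buffers allocated when a byte stream grows by doubling
    from an assumed 64-byte initial capacity up to 'total_bytes'."""
    allocations, capacity = 1, initial
    while capacity < total_bytes:
        capacity *= 2          # each doubling is a fresh allocation plus a copy
        allocations += 1
    return allocations
```

With a good up-front size estimate the count stays at 1; a 1000-byte body grown from 64 bytes instead touches five buffers along the way.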
I was obviously kidding, as I don’t want to make things worse, but it’s gonna be hard to convince me to not allocate if it’s easy to do so and there’s no negative performance impact. If the code is much harder to maintain and there’s no benefit, or if the pooling overhead defeats the purpose, then sure. Otherwise, don’t allocate.
If this happens almost for free, it is fine. It was not obvious to me. From a brief look, it looked like there was a special pass for pre-calculating.