Open
Labels
area-CodeGen-coreclr: CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Description
From #80129, several of the Vector2/3/4 and Vector<T> APIs involving reading from or writing to a span were changed from intrinsics to managed methods.
For the most part this is correct and good. However, there are some notable performance differences due to the IR that gets created.
Notably, consider the following minimal and self-contained example:
private static (int, int) Load(int[] array, int index)
{
    if ((index < 0) || ((array.Length - index) < 2))
    {
        throw new ArgumentOutOfRangeException();
    }
    return (array[index + 0], array[index + 1]);
}

This creates two notable trees (similarly if a throw helper is used):
STMT00000 ( 0x000[E-] ... ??? )
[000003] -----------  *  JTRUE     void
[000002] -----------  \--*  LT        int
[000000] -----------     +--*  LCL_VAR   int    V01 arg1
[000001] -----------     \--*  CNS_INT   int    0
and
STMT00004 ( 0x004[E-] ... ??? )
[000018] ---X-------  *  JTRUE     void
[000017] ---X-------  \--*  GE        int
[000015] ---X-------     +--*  SUB       int
[000013] ---X-------     |  +--*  ARR_LENGTH int
[000012] -----------     |  |  \--*  LCL_VAR   ref    V00 arg0
[000014] -----------     |  \--*  LCL_VAR   int    V01 arg1
[000016] -----------     \--*  CNS_INT   int    2
This is significantly different from the intrinsic handling, which directly created BOUNDS_CHECK nodes:
[000067] ---X-------  +--*  COMMA     ref
[000061] ---X-------  |  +--*  BOUNDS_CHECK_ArgRng void
[000055] -----------  |  |  +--*  LCL_VAR   int    V11 loc5
[000060] ---X-------  |  |  \--*  ARR_LENGTH int
[000054] -----------  |  |     \--*  LCL_VAR   ref    V08 loc2
[000066] ---X-------  |  \--*  COMMA     ref
[000065] ---X-------  |     +--*  BOUNDS_CHECK_ArgRng void
[000063] -----------  |     |  +--*  ADD       int
[000062] -----------  |     |  |  +--*  LCL_VAR   int    V11 loc5
[000056] -----------  |     |  |  \--*  CNS_INT   int    3
[000064] ---X-------  |     |  \--*  ARR_LENGTH int
[000058] -----------  |     |     \--*  LCL_VAR   ref    V08 loc2
[000057] -----------  |     \--*  LCL_VAR   ref    V08 loc2
[000059] -----------  \--*  LCL_VAR   int    V11 loc5
Because these aren't BOUNDS_CHECK nodes, JIT throughput suffers, the optimizations that kick in are less effective as well, and the overall result is worse codegen.
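
For anyone wanting to reproduce the trees, the following is a minimal sketch of a console harness around the `Load` method from this issue. The class name `BoundsCheckRepro`, the `LoadWithHelper` variant, and the `ThrowArgumentOutOfRange` helper are hypothetical names added for illustration; they stand in for the "throw helper" shape mentioned above and are not the code from #80129. `NoInlining` is only there so each method is jitted on its own.

```csharp
using System;
using System.Runtime.CompilerServices;

public static class BoundsCheckRepro
{
    // Same shape as Load in the issue: an explicit range check followed by
    // two array accesses that carry their own implicit bounds checks.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static (int, int) Load(int[] array, int index)
    {
        if ((index < 0) || ((array.Length - index) < 2))
        {
            throw new ArgumentOutOfRangeException();
        }
        return (array[index + 0], array[index + 1]);
    }

    // Hypothetical throw-helper variant: the check is identical, only the
    // throw is moved into a separate method.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static (int, int) LoadWithHelper(int[] array, int index)
    {
        if ((index < 0) || ((array.Length - index) < 2))
        {
            ThrowArgumentOutOfRange();
        }
        return (array[index + 0], array[index + 1]);
    }

    // Hypothetical helper; not an API from the runtime.
    private static void ThrowArgumentOutOfRange()
        => throw new ArgumentOutOfRangeException();

    // Returns true when the call rejects the index with ArgumentOutOfRangeException.
    public static bool Rejects(Func<int[], int, (int, int)> load, int[] array, int index)
    {
        try
        {
            load(array, index);
            return false;
        }
        catch (ArgumentOutOfRangeException)
        {
            return true;
        }
    }

    public static void Main()
    {
        int[] data = { 10, 20, 30 };

        Console.WriteLine(Load(data, 0));           // first valid pair
        Console.WriteLine(LoadWithHelper(data, 1)); // last valid pair

        Console.WriteLine(Rejects(Load, data, -1)); // negative index
        Console.WriteLine(Rejects(Load, data, 2));  // only one element left
    }
}
```

Assuming a .NET 7 or later runtime, setting the `DOTNET_JitDisasm` environment variable to `Load` before running should print the generated code for that method; the IR dumps shown above come from `DOTNET_JitDump`, which, as far as I know, needs a checked build of the JIT.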