Manual bounds checks are less efficient #80256

@tannergooding

In #80129, several of the Vector2/3/4 and Vector<T> APIs that read from or write to a span were changed from intrinsics to managed methods.

For the most part this is correct and a good change. However, there are some notable performance differences due to the IR that gets created.

Consider the following minimal, self-contained example:

private static (int, int) Load(int[] array, int index)
{
    if ((index < 0) || ((array.Length - index) < 2))
    {
        throw new ArgumentOutOfRangeException();
    }

    return (array[index + 0], array[index + 1]);
}
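To make explicit which indices the manual check accepts, here is a small sketch that wraps the same method. The class name LoadDemo and the helper Throws are hypothetical and exist only for illustration; Load is made public here so it can be called from outside the class.

```csharp
using System;

public static class LoadDemo
{
    // Same manual check as in the example above.
    public static (int, int) Load(int[] array, int index)
    {
        // index must be non-negative and leave room for two elements.
        // With index >= 0 the subtraction cannot overflow.
        if ((index < 0) || ((array.Length - index) < 2))
        {
            throw new ArgumentOutOfRangeException();
        }

        return (array[index + 0], array[index + 1]);
    }

    // Hypothetical helper: returns true when Load rejects the index.
    public static bool Throws(int[] array, int index)
    {
        try
        {
            Load(array, index);
            return false;
        }
        catch (ArgumentOutOfRangeException)
        {
            return true;
        }
    }
}
```

For a four-element array, index 2 is the last accepted value, while index 3 and index -1 are both rejected with ArgumentOutOfRangeException.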

The manual check in Load creates two notable trees (the same happens if a throw helper is used):

STMT00000 ( 0x000[E-] ... ??? )
               [000003] -----------                         *  JTRUE     void  
               [000002] -----------                         \--*  LT        int   
               [000000] -----------                            +--*  LCL_VAR   int    V01 arg1         
               [000001] -----------                            \--*  CNS_INT   int    0

and

STMT00004 ( 0x004[E-] ... ??? )
               [000018] ---X-------                         *  JTRUE     void  
               [000017] ---X-------                         \--*  GE        int   
               [000015] ---X-------                            +--*  SUB       int   
               [000013] ---X-------                            |  +--*  ARR_LENGTH int   
               [000012] -----------                            |  |  \--*  LCL_VAR   ref    V00 arg0         
               [000014] -----------                            |  \--*  LCL_VAR   int    V01 arg1         
               [000016] -----------                            \--*  CNS_INT   int    2

This is significantly different from the intrinsic handling, which directly created BOUNDS_CHECK nodes:

               [000067] ---X-------                            +--*  COMMA     ref   
               [000061] ---X-------                            |  +--*  BOUNDS_CHECK_ArgRng void  
               [000055] -----------                            |  |  +--*  LCL_VAR   int    V11 loc5         
               [000060] ---X-------                            |  |  \--*  ARR_LENGTH int   
               [000054] -----------                            |  |     \--*  LCL_VAR   ref    V08 loc2         
               [000066] ---X-------                            |  \--*  COMMA     ref   
               [000065] ---X-------                            |     +--*  BOUNDS_CHECK_ArgRng void  
               [000063] -----------                            |     |  +--*  ADD       int   
               [000062] -----------                            |     |  |  +--*  LCL_VAR   int    V11 loc5         
               [000056] -----------                            |     |  |  \--*  CNS_INT   int    3
               [000064] ---X-------                            |     |  \--*  ARR_LENGTH int   
               [000058] -----------                            |     |     \--*  LCL_VAR   ref    V08 loc2         
               [000057] -----------                            |     \--*  LCL_VAR   ref    V08 loc2         
               [000059] -----------                            \--*  LCL_VAR   int    V11 loc5 

Because these aren't BOUNDS_CHECK nodes, not only is JIT throughput worse, but the optimizations that kick in are less effective as well, which results in overall worse codegen.
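For comparison purposes, a variant without any manual check can be useful when inspecting JIT dumps. The sketch below is an assumption for illustration, not something proposed in this issue: it relies only on ordinary array indexing, for which the JIT inserts its own range checks. The names ImplicitCheckDemo and LoadImplicit are hypothetical, and the behavior is not identical, since an invalid index surfaces as IndexOutOfRangeException rather than ArgumentOutOfRangeException.

```csharp
using System;

public static class ImplicitCheckDemo
{
    // Hypothetical comparison variant: no manual check, only the
    // range checks that come with ordinary array element access.
    public static (int, int) LoadImplicit(int[] array, int index)
    {
        // Read the higher element first so an index that is too large
        // fails before anything else is read.
        int second = array[index + 1];

        // A negative index fails here at the latest.
        int first = array[index];

        return (first, second);
    }

    // Hypothetical helper: returns true when LoadImplicit rejects the index.
    public static bool Throws(int[] array, int index)
    {
        try
        {
            LoadImplicit(array, index);
            return false;
        }
        catch (IndexOutOfRangeException)
        {
            return true;
        }
    }
}
```

Dumping the IR for both shapes side by side is one way to see where the manual comparison trees and the JIT-generated checks diverge. Whether this variant actually produces better code is not claimed here and would need to be measured.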

Labels: area-CodeGen-coreclr (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI)
