HW intrinsics API declaration is incorrect for Sse41.Insert() that operates on vector of 32-bit floats #10383

@voinokin

Description

The [V]INSERTPS operation differs from the similarly named operations that map to [V]PINSRW (SSE2+) and [V]PINSRB/D/Q (SSE4.1+).

Here's how it is declared in the API:

        /// <summary>
        /// __m128 _mm_insert_ps (__m128 a, __m128 b, const int imm8)
        ///   INSERTPS xmm, xmm/m32, imm8
        /// </summary>
        public static Vector128<float> Insert(Vector128<float> value, float data, byte index) => Insert(value, data, index);

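In fact, the operation does one of two things. With a memory operand, it loads a 32-bit value from [m32] and merges it into the source XMM register (1st operand) at the specified position. With a register operand, it selects one 32-bit element of the XMM register given as the 2nd operand and merges that element into the source XMM register (1st operand).
Additionally, it can zero some or all elements of the result.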
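
To make the two forms concrete, here is a minimal scalar sketch in C of the behaviour described above. It is a model for illustration, not the JIT's or the hardware's implementation; the type `vec4f` and the functions `insertps_reg` and `insertps_mem` are hypothetical names introduced here.

```c
#include <stdint.h>

/* Hypothetical stand-in for a 128-bit XMM register holding four floats. */
typedef struct { float lane[4]; } vec4f;

/* Register form: imm8[7:6] picks the lane of b to read,
   imm8[5:4] picks the lane of a to overwrite,
   imm8[3:0] is a per-lane zero mask applied last. */
static vec4f insertps_reg(vec4f a, vec4f b, uint8_t imm8)
{
    unsigned count_s = (imm8 >> 6) & 3u;
    unsigned count_d = (imm8 >> 4) & 3u;
    unsigned zmask   = imm8 & 0xFu;

    vec4f r = a;
    r.lane[count_d] = b.lane[count_s];
    for (unsigned i = 0; i < 4; i++) {
        if (zmask & (1u << i)) {
            r.lane[i] = 0.0f;   /* all-zero bit pattern, i.e. +0.0f */
        }
    }
    return r;
}

/* Memory form: only one float is loaded, so the source lane is always 0
   and imm8[7:6] is ignored. */
static vec4f insertps_mem(vec4f a, const float *m32, uint8_t imm8)
{
    vec4f b = { { *m32, 0.0f, 0.0f, 0.0f } };
    return insertps_reg(a, b, (uint8_t)(imm8 & 0x3Fu));
}
```

With a = {1, 2, 3, 4}, b = {10, 20, 30, 40} and imm8 = 0x9C, the register form yields {1, 30, 0, 0}: lane 2 of b goes into lane 1 of a, then lanes 2 and 3 are zeroed. A signature taking a single `float` and an index can only express the memory-style case without zeroing.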

Here's how the CPU implements it:

INSERTPS (128-bit Legacy SSE version)

IF (SRC = REG) THEN COUNT_S←imm8[7:6]
    ELSE COUNT_S←0
COUNT_D ←imm8[5:4]
ZMASK ←imm8[3:0]
CASE (COUNT_S) OF
    0: TMP←SRC[31:0]
    1: TMP←SRC[63:32]
    2: TMP←SRC[95:64]
    3: TMP←SRC[127:96]
ESAC;
CASE (COUNT_D) OF
    0: TMP2[31:0]←TMP
        TMP2[127:32] ←DEST[127:32]
    1: TMP2[63:32]←TMP
        TMP2[31:0] ←DEST[31:0]
        TMP2[127:64] ←DEST[127:64]
    2: TMP2[95:64]←TMP
        TMP2[63:0] ←DEST[63:0]
        TMP2[127:96] ←DEST[127:96]
    3: TMP2[127:96]←TMP
        TMP2[95:0] ←DEST[95:0]
ESAC;
IF (ZMASK[0] = 1) THEN DEST[31:0]←00000000H
    ELSE DEST[31:0]←TMP2[31:0]
IF (ZMASK[1] = 1) THEN DEST[63:32]←00000000H
    ELSE DEST[63:32]←TMP2[63:32]
IF (ZMASK[2] = 1) THEN DEST[95:64]←00000000H
    ELSE DEST[95:64]←TMP2[95:64]
IF (ZMASK[3] = 1) THEN DEST[127:96]←00000000H
    ELSE DEST[127:96]←TMP2[127:96]
DEST[MAXVL-1:128] (Unmodified)
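
Since the pseudocode splits imm8 into three fields, a small packing helper may help when reading or writing immediates by hand. This is only a sketch; `insertps_imm8` is a hypothetical name, not part of any existing API.

```c
#include <stdint.h>

/* Hypothetical helper: pack the INSERTPS control fields into imm8.
   count_s -> imm8[7:6]  source lane (only honoured for a register source)
   count_d -> imm8[5:4]  destination lane
   zmask   -> imm8[3:0]  one bit per result lane to force to zero */
static uint8_t insertps_imm8(unsigned count_s, unsigned count_d, unsigned zmask)
{
    return (uint8_t)(((count_s & 3u) << 6) |
                     ((count_d & 3u) << 4) |
                     (zmask & 0xFu));
}
```

For example, `insertps_imm8(2, 1, 0xC)` gives 0x9C. If a caller wanted a plain lane index i with no zeroing, the corresponding immediate under this layout would be `i << 4`, not `i` itself.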

VINSERTPS (VEX.128 and EVEX encoded version)

IF (SRC = REG) THEN COUNT_S←imm8[7:6]
    ELSE COUNT_S←0
COUNT_D ← imm8[5:4]
ZMASK ← imm8[3:0]
CASE (COUNT_S) OF
    0: TMP←SRC2[31:0]
    1: TMP←SRC2[63:32]
    2: TMP←SRC2[95:64]
    3: TMP←SRC2[127:96]
ESAC;
CASE (COUNT_D) OF
    0: TMP2[31:0]←TMP
        TMP2[127:32] ← SRC1[127:32]
    1: TMP2[63:32]←TMP
        TMP2[31:0] ← SRC1[31:0]
        TMP2[127:64] ← SRC1[127:64]
    2: TMP2[95:64]←TMP
        TMP2[63:0] ← SRC1[63:0]
        TMP2[127:96] ← SRC1[127:96]
    3: TMP2[127:96]←TMP
        TMP2[95:0] ← SRC1[95:0]
ESAC;
IF (ZMASK[0] = 1) THEN DEST[31:0]←00000000H
    ELSE DEST[31:0]←TMP2[31:0]
IF (ZMASK[1] = 1) THEN DEST[63:32]←00000000H
    ELSE DEST[63:32]←TMP2[63:32]
IF (ZMASK[2] = 1) THEN DEST[95:64]←00000000H
    ELSE DEST[95:64]←TMP2[95:64]
IF (ZMASK[3] = 1) THEN DEST[127:96]←00000000H
    ELSE DEST[127:96]←TMP2[127:96]
DEST[MAXVL-1:128] ← 0
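
The two listings differ only in their operands (DEST doubles as the first source in the legacy form) and in the last line, which governs the bits above 128. The following C sketch models that difference under the assumption MAXVL = 256; `reg256`, `insertps_low`, `legacy_insertps` and `vex_vinsertps` are hypothetical names used for illustration only.

```c
#include <stdint.h>

/* Hypothetical stand-in for a 256-bit register: lanes 0..3 are the XMM part. */
typedef struct { float lane[8]; } reg256;

/* Low 128-bit computation shared by both encodings. Inputs are copied
   before any write, so out may alias src1 and/or src2. */
static void insertps_low(float out[4], const float src1[4],
                         const float src2[4], uint8_t imm8)
{
    unsigned count_s = (imm8 >> 6) & 3u;
    unsigned count_d = (imm8 >> 4) & 3u;
    unsigned zmask   = imm8 & 0xFu;
    float tmp = src2[count_s];
    float tmp2[4];

    for (unsigned i = 0; i < 4; i++) tmp2[i] = src1[i];
    tmp2[count_d] = tmp;
    for (unsigned i = 0; i < 4; i++)
        out[i] = (zmask & (1u << i)) ? 0.0f : tmp2[i];
}

/* Legacy SSE: DEST is also the first source; lanes 4..7 are left unmodified. */
static void legacy_insertps(reg256 *dest, const reg256 *src, uint8_t imm8)
{
    insertps_low(dest->lane, dest->lane, src->lane, imm8);
}

/* VEX.128: separate SRC1 and SRC2; lanes 4..7 of DEST are zeroed. */
static void vex_vinsertps(reg256 *dest, const reg256 *src1,
                          const reg256 *src2, uint8_t imm8)
{
    insertps_low(dest->lane, src1->lane, src2->lane, imm8);
    for (unsigned i = 4; i < 8; i++) dest->lane[i] = 0.0f;
}
```

With imm8 = 0x10, both variants copy lane 0 of the second source into lane 1 of the result; the legacy variant keeps the old upper lanes of DEST while the VEX variant clears them.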

Labels: area-CodeGen-coreclr (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI), bug