Closed
dotnet/coreclr
#17637
Labels: area-CodeGen-coreclr (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI), bug
Description
The [V]INSERTPS operation differs from the similarly named operations that map to [V]PINSRW (SSE2+) and [V]PINSRB/D/Q (SSE4.1+).
Here's how it is declared in the API:
/// <summary>
/// __m128 _mm_insert_ps (__m128 a, __m128 b, const int imm8)
/// INSERTPS xmm, xmm/m32, imm8
/// </summary>
public static Vector128<float> Insert(Vector128<float> value, float data, byte index) => Insert(value, data, index);
In fact, the operation does one of two things: it either loads a value from [m32] and merges it into the source XMM register at the specified position, or it merges a selected 32-bit element of an XMM register (2nd operand) into the source XMM register (1st operand).
Additionally, it can zero some or all elements of the result.
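A consequence is that the `byte index` parameter is not a plain element index. Per the instruction definition, the immediate packs a source selector in bits 7:6, a destination selector in bits 5:4, and a zero mask in bits 3:0. The sketch below is only an illustration of that layout in portable C; the type `insertps_imm8` and the helpers `insertps_encode` and `insertps_decode` are hypothetical names, not part of any existing API.

```c
#include <assert.h>
#include <stdint.h>

/* Decoded view of the INSERTPS immediate byte (hypothetical type). */
typedef struct {
    unsigned count_s; /* imm8[7:6]: source element, used by the register form only */
    unsigned count_d; /* imm8[5:4]: destination element */
    unsigned zmask;   /* imm8[3:0]: one bit per result element to force to zero */
} insertps_imm8;

/* Hypothetical helper: pack the three fields into an immediate byte. */
static uint8_t insertps_encode(unsigned count_s, unsigned count_d, unsigned zmask)
{
    assert(count_s < 4 && count_d < 4 && zmask < 16);
    return (uint8_t)((count_s << 6) | (count_d << 4) | zmask);
}

/* Hypothetical helper: split an immediate byte back into its fields. */
static insertps_imm8 insertps_decode(uint8_t imm8)
{
    insertps_imm8 f;
    f.count_s = (imm8 >> 6) & 3u;
    f.count_d = (imm8 >> 4) & 3u;
    f.zmask   = imm8 & 15u;
    return f;
}
```

For example, inserting into element 2 without zeroing anything needs `insertps_encode(0, 2, 0)`, which is 0x20. Passing a raw 2, as the parameter name `index` suggests, decodes to destination element 0 with a zero mask of 0b0010, so the value lands in element 0 and element 1 is cleared.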
Here's how it is implemented in the CPU:
INSERTPS (128-bit Legacy SSE version)
IF (SRC = REG) THEN COUNT_S ← imm8[7:6]
    ELSE COUNT_S ← 0
COUNT_D ← imm8[5:4]
ZMASK ← imm8[3:0]
CASE (COUNT_S) OF
    0: TMP ← SRC[31:0]
    1: TMP ← SRC[63:32]
    2: TMP ← SRC[95:64]
    3: TMP ← SRC[127:96]
ESAC;
CASE (COUNT_D) OF
    0: TMP2[31:0] ← TMP
       TMP2[127:32] ← DEST[127:32]
    1: TMP2[63:32] ← TMP
       TMP2[31:0] ← DEST[31:0]
       TMP2[127:64] ← DEST[127:64]
    2: TMP2[95:64] ← TMP
       TMP2[63:0] ← DEST[63:0]
       TMP2[127:96] ← DEST[127:96]
    3: TMP2[127:96] ← TMP
       TMP2[95:0] ← DEST[95:0]
ESAC;
IF (ZMASK[0] = 1) THEN DEST[31:0] ← 00000000H
    ELSE DEST[31:0] ← TMP2[31:0]
IF (ZMASK[1] = 1) THEN DEST[63:32] ← 00000000H
    ELSE DEST[63:32] ← TMP2[63:32]
IF (ZMASK[2] = 1) THEN DEST[95:64] ← 00000000H
    ELSE DEST[95:64] ← TMP2[95:64]
IF (ZMASK[3] = 1) THEN DEST[127:96] ← 00000000H
    ELSE DEST[127:96] ← TMP2[127:96]
DEST[MAXVL-1:128] (Unmodified)
VINSERTPS (VEX.128 and EVEX encoded version)
IF (SRC = REG) THEN COUNT_S ← imm8[7:6]
    ELSE COUNT_S ← 0
COUNT_D ← imm8[5:4]
ZMASK ← imm8[3:0]
CASE (COUNT_S) OF
    0: TMP ← SRC2[31:0]
    1: TMP ← SRC2[63:32]
    2: TMP ← SRC2[95:64]
    3: TMP ← SRC2[127:96]
ESAC;
CASE (COUNT_D) OF
    0: TMP2[31:0] ← TMP
       TMP2[127:32] ← SRC1[127:32]
    1: TMP2[63:32] ← TMP
       TMP2[31:0] ← SRC1[31:0]
       TMP2[127:64] ← SRC1[127:64]
    2: TMP2[95:64] ← TMP
       TMP2[63:0] ← SRC1[63:0]
       TMP2[127:96] ← SRC1[127:96]
    3: TMP2[127:96] ← TMP
       TMP2[95:0] ← SRC1[95:0]
ESAC;
IF (ZMASK[0] = 1) THEN DEST[31:0] ← 00000000H
    ELSE DEST[31:0] ← TMP2[31:0]
IF (ZMASK[1] = 1) THEN DEST[63:32] ← 00000000H
    ELSE DEST[63:32] ← TMP2[63:32]
IF (ZMASK[2] = 1) THEN DEST[95:64] ← 00000000H
    ELSE DEST[95:64] ← TMP2[95:64]
IF (ZMASK[3] = 1) THEN DEST[127:96] ← 00000000H
    ELSE DEST[127:96] ← TMP2[127:96]
DEST[MAXVL-1:128] ← 0
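The pseudocode above can be condensed into a small scalar model, which may be handy for checking what a given immediate should produce. This is a rough sketch in portable C rather than the JIT's or the hardware's implementation: `vec4f`, `insertps_reg`, and `insertps_mem` are hypothetical names, only the low 128 bits are modeled (so the legacy SSE and VEX forms look identical here), and elements are copied as `float` values instead of raw bit patterns.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the low 128 bits of an XMM register. */
typedef struct { float e[4]; } vec4f;

/* Register form: imm8[7:6] selects which element of src is inserted. */
static vec4f insertps_reg(vec4f dest, vec4f src, uint8_t imm8)
{
    unsigned count_s = (imm8 >> 6) & 3u; /* source element */
    unsigned count_d = (imm8 >> 4) & 3u; /* destination element */
    unsigned zmask   = imm8 & 15u;       /* elements to zero afterwards */

    vec4f tmp2 = dest;                   /* untouched lanes come from dest */
    tmp2.e[count_d] = src.e[count_s];    /* the merge step */

    for (unsigned i = 0; i < 4; i++) {
        if (zmask & (1u << i)) {
            tmp2.e[i] = 0.0f;            /* ZMASK is applied after the merge */
        }
    }
    return tmp2;
}

/* Memory form: a single float is loaded and COUNT_S is forced to 0. */
static vec4f insertps_mem(vec4f dest, const float *m32, uint8_t imm8)
{
    assert(m32 != NULL);
    vec4f src = { { *m32, 0.0f, 0.0f, 0.0f } };
    /* Clearing imm8[7:6] mirrors "ELSE COUNT_S <- 0" in the pseudocode. */
    return insertps_reg(dest, src, (uint8_t)(imm8 & 0x3Fu));
}
```

With `a = {1, 2, 3, 4}` and `b = {10, 20, 30, 40}`, `insertps_reg(a, b, 0xE0)` copies element 3 of `b` into element 2 and gives `{1, 2, 40, 4}`, while `insertps_reg(a, b, 2)` gives `{10, 0, 3, 4}`. The second case is what a caller gets if the immediate is treated as a simple index.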