SSE2 — OpCode List
(under construction -- this list might be incomplete?!)
Additionally, with AMD64's 64/128 bit register extensions some of the functionality changes...
Arithmetic:
addpd - Adds 2 64bit doubles.
addsd - Adds bottom 64bit doubles.
subpd - Subtracts 2 64bit doubles.
subsd - Subtracts bottom 64bit doubles.
mulpd - Multiplies 2 64bit doubles.
mulsd - Multiplies bottom 64bit doubles.
divpd - Divides 2 64bit doubles.
divsd - Divides bottom 64bit doubles.
maxpd - Gets largest of 2 64bit doubles for 2 sets.
maxsd - Gets largets of 2 64bit doubles to bottom set.
minpd - Gets smallest of 2 64bit doubles for 2 sets.
minsd - Gets smallest of 2 64bit values for bottom set.
paddb - Adds 16 8bit integers.
paddw - Adds 8 16bit integers.
paddd - Adds 4 32bit integers.
paddq - Adds 2 64bit integers.
paddsb - Adds 16 8bit integers with saturation.
paddsw - Adds 8 16bit integers using saturation.
paddusb - Adds 16 8bit unsigned integers using saturation.
paddusw - Adds 8 16bit unsigned integers using saturation.
psubb - Subtracts 16 8bit integers.
psubw - Subtracts 8 16bit integers.
psubd - Subtracts 4 32bit integers.
psubq - Subtracts 2 64bit integers.
psubsb - Subtracts 16 8bit integers using saturation.
psubsw - Subtracts 8 16bit integers using saturation.
psubusb - Subtracts 16 8bit unsigned integers using saturation.
psubusw - Subtracts 8 16bit unsigned integers using saturation.
pmaddwd - Multiplies 16bit integers into 32bit results and adds results.
pmulhw - Multiplies 16bit integers and returns the high 16bits of the result.
pmullw - Multiplies 16bit integers and returns the low 16bits of the result.
pmuludq - Multiplies 2 32bit pairs and stores 2 64bit results.
rcpps - Approximates the reciprocal of 4 32bit singles.
rcpss - Approximates the reciprocal of bottom 32bit single.
sqrtpd - Returns square root of 2 64bit doubles.
sqrtsd - Returns square root of bottom 64bit double.
Logic:
andnpd - Logically NOT ANDs 2 64bit doubles.
andnps - Logically NOT ANDs 4 32bit singles.
andpd - Logically ANDs 2 64bit doubles.
pand - Logically ANDs 2 128bit registers.
pandn - Logically Inverts the first 128bit operand and ANDs with the second.
por - Logically ORs 2 128bit registers.
pslldq - Logically left shifts 1 128bit value.
psllq - Logically left shifts 2 64bit values.
pslld - Logically left shifts 4 32bit values.
psllw - Logically left shifts 8 16bit values.
psrad - Arithmetically right shifts 4 32bit values.
psraw - Arithmetically right shifts 8 16bit values.
psrldq - Logically right shifts 1 128bit values.
psrlq - Logically right shifts 2 64bit values.
psrld - Logically right shifts 4 32bit values.
psrlw - Logically right shifts 8 16bit values.
pxor - Logically XORs 2 128bit registers.
orpd - Logically ORs 2 64bit doubles.
xorpd - Logically XORs 2 64bit doubles.
Compare:
cmppd - Compares 2 pairs of 64bit doubles.
cmpsd - Compares bottom 64bit doubles.
comisd - Compares bottom 64bit doubles and stores result in
EFLAGS.
ucomisd - Compares bottom 64bit doubles and stores result in
EFLAGS. (
QNaNs don't throw exceptions with
ucomisd, unlike
comisd.
pcmpxxb - Compares 16 8bit integers.
pcmpxxw - Compares 8 16bit integers.
pcmpxxd - Compares 4 32bit integers.
Compare Codes (the
xx parts above):
eq - Equal to.
lt - Less than.
le - Less than or equal to.
ne - Not equal.
nlt - Not less than.
nle - Not less than or equal to.
ord - Ordered.
unord - Unordered.
Conversion:
cvtdq2pd - Converts 2 32bit integers into 2 64bit doubles.
cvtdq2ps - Converts 4 32bit integers into 4 32bit singles.
cvtpd2pi - Converts 2 64bit doubles into 2 32bit integers in an
MMX register.
cvtpd2dq - Converts 2 64bit doubles into 2 32bit integers in the bottom of an
XMM
register.
cvtpd2ps - Converts 2 64bit doubles into 2 32bit singles in the bottom of an
XMM
register.
cvtpi2pd - Converts 2 32bit integers into 2 32bit singles in the bottom of an
XMM
register.
cvtps2dq - Converts 4 32bit singles into 4 32bit integers.
cvtps2pd - Converts 2 32bit singles into 2 64bit doubles.
cvtsd2si - Converts 1 64bit double to a 32bit integer in a
GPR.
cvtsd2ss - Converts bottom 64bit double to a bottom 32bit single. Tops are unchanged.
cvtsi2sd - Converts a 32bit integer to the bottom 64bit double.
cvtsi2ss - Converts a 32bit integer to the bottom 32bit single.
cvtss2sd - Converts bottom 32bit single to bottom 64bit double.
cvtss2si - Converts bottom 32bit single to a 32bit integer in a
GPR.
cvttpd2pi - Converts 2 64bit doubles to 2 32bit integers using truncation into an
MMX
register.
cvttpd2dq - Converts 2 64bit doubles to 2 32bit integers using truncation.
cvttps2dq - Converts 4 32bit singles to 4 32bit integers using truncation.
cvttps2pi - Converts 2 32bit singles to 2 32bit integers using truncation into an
MMX
register.
cvttsd2si - Converts a 64bit double to a 32bit integer using truncation into a
GPR.
cvttss2si - Converts a 32bit single to a 32bit integer using truncation into a
GPR.
Load/Store:
(is "minimize cache pollution" the same as "without using cache"??)
movq - Moves a 64bit value, clearing the top 64bits of an
XMM register.
movsd - Moves a 64bit double, leaving tops unchanged if move is between two
XMMregisters.
movapd - Moves 2 aligned 64bit doubles.
movupd - Moves 2 unaligned 64bit doubles.
movhpd - Moves top 64bit value to or from an
XMM register.
movlpd - Moves bottom 64bit value to or from an
XMM register.
movdq2q - Moves bottom 64bit value into an
MMX register.
movq2dq - Moves an
MMX register value to the bottom of an
XMM register. Top
is cleared to zero.
movntpd - Moves a 128bit value to memory without using the cache. NT is "Non
Temporal."
movntdq - Moves a 128bit value to memory without using the cache.
movnti - Moves a 32bit value without using the cache.
maskmovdqu - Moves 16 bytes based on sign bits of another
XMM register.
pmovmskb - Generates a 16bit Mask from the sign bits of each byte in an
XMM register.
Shuffling:
pshufd - Shuffles 32bit values in a complex way.
pshufhw - Shuffles high 16bit values in a complex way.
pshuflw - Shuffles low 16bit values in a complex way.
unpckhpd - Unpacks and interleaves top 64bit doubles from 2 128bit sources into 1.
unpcklpd - Unpacks and interleaves bottom 64bit doubles from 2 128 bit sources into 1.
punpckhbw - Unpacks and interleaves top 8 8bit integers from 2 128bit sources into 1.
punpckhwd - Unpacks and interleaves top 4 16bit integers from 2 128bit sources into 1.
punpckhdq - Unpacks and interleaves top 2 32bit integers from 2 128bit sources into 1.
punpckhqdq - Unpacks and interleaces top 64bit integers from 2 128bit sources into 1.
punpcklbw - Unpacks and interleaves bottom 8 8bit integers from 2 128bit sources into 1.
punpcklwd - Unpacks and interleaves bottom 4 16bit integers from 2 128bit sources into 1.
punpckldq - Unpacks and interleaves bottom 2 32bit integers from 2 128bit sources into 1.
punpcklqdq - Unpacks and interleaces bottom 64bit integers from 2 128bit sources into 1.
packssdw - Packs 32bit integers to 16bit integers using saturation.
packsswb - Packs 16bit integers to 8bit integers using saturation.
packuswb - Packs 16bit integers to 8bit unsigned integers unsing saturation.
Cache Control:
clflush - Flushes a Cache Line from all levels of cache.
lfence - Guarantees that all memory loads issued before the
lfence
instruction are completed before anyloads after the
lfence instruction.
mfence - Guarantees that all memory reads and writes issued before
the
mfence instruction are completed before any reads or writes
after the
mfence instruction.
pause - Pauses execution for a set amount of time.