[mono] Implement Sse2.AddSaturate using LLVM #32116

Merged
EgorBo merged 4 commits into dotnet:master from EgorBo:mono-sse2-addsaturate
Feb 11, 2020

@EgorBo EgorBo commented Feb 11, 2020

Mono currently supports only a limited subset of the Sse1-Sse42 intrinsics, namely those used internally by CoreLib. Whenever a new API is used to optimize something there, we have to implement it in Mono as well. That just happened in #32036, where Sse2.AddSaturate was used for the first time.

This PR implements it for Mono (all Sse2 overloads) using named LLVM intrinsics. It turns out that these intrinsics differ between LLVM 6 (which we are currently based on) and LLVM 9 (which we plan to migrate to soon).
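As a rough illustration of the LLVM 6 vs. LLVM 9 difference, here is a minimal sketch of picking an intrinsic name by LLVM API version. The `800` threshold mirrors the `#if LLVM_API_VERSION >= 800` guard visible in the review snippet. Everything else is an assumption made for illustration: the `Sketch*` enum and function are hypothetical and are not Mono's actual table, and the generic `llvm.uadd.sat.*` names reflect my understanding that newer LLVM versions replaced the x86-specific saturating-add intrinsics with generic ones.

```c
#include <stddef.h>

/* Hypothetical IDs for this sketch; not Mono's real intrinsic enum. */
typedef enum {
    SKETCH_PADDUS_B, /* unsigned saturating add, 16 x i8 */
    SKETCH_PADDUS_W, /* unsigned saturating add, 8 x i16 */
    SKETCH_COUNT
} SketchIntrinsicId;

/* Returns the LLVM intrinsic name to look up for a given API version.
 * Assumption: versions >= 800 use the generic saturating-add intrinsics,
 * older versions use the x86-specific named intrinsics. */
static const char *sketch_intrinsic_name(SketchIntrinsicId id, int llvm_api_version)
{
    static const char *const legacy[SKETCH_COUNT] = {
        "llvm.x86.sse2.paddus.b",
        "llvm.x86.sse2.paddus.w",
    };
    static const char *const generic[SKETCH_COUNT] = {
        "llvm.uadd.sat.v16i8",
        "llvm.uadd.sat.v8i16",
    };

    if ((unsigned)id >= (unsigned)SKETCH_COUNT)
        return NULL; /* unknown ID */

    return llvm_api_version >= 800 ? generic[id] : legacy[id];
}
```

In the real code base the choice is made at compile time with the preprocessor. A runtime parameter is used here only so both branches can be exercised in one build.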

For the Vector128<byte> overload it emits

%result = call <16 x i8> @llvm.x86.sse2.paddus.b(<16 x i8> %left, <16 x i8> %right)

which is then emitted as vpaddusb
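For readers unfamiliar with the operation, the following is a scalar reference sketch of what an unsigned saturating byte add computes per lane: sums that would overflow are clamped to 255 instead of wrapping. The function names are hypothetical and exist only for this illustration. This is not the code the PR adds.

```c
#include <stddef.h>
#include <stdint.h>

/* Unsigned saturating add for a single 8-bit lane:
 * the result is clamped to UINT8_MAX rather than wrapping around. */
static uint8_t add_saturate_u8(uint8_t a, uint8_t b)
{
    unsigned sum = (unsigned)a + (unsigned)b;
    return (uint8_t)(sum > UINT8_MAX ? UINT8_MAX : sum);
}

/* Scalar reference for a 16-lane byte vector (the Vector128<byte> shape). */
static void add_saturate_u8x16(const uint8_t left[16],
                               const uint8_t right[16],
                               uint8_t result[16])
{
    for (size_t i = 0; i < 16; i++)
        result[i] = add_saturate_u8(left[i], right[i]);
}
```

For example, a lane holding 200 added to a lane holding 60 yields 255, whereas an ordinary wrapping add would yield 4.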

{INTRINS_SSE_ROUNDPD, "llvm.x86.sse41.round.pd"},
{INTRINS_SSE_PTESTZ, "llvm.x86.sse41.ptestz"},
{INTRINS_SSE_INSERTPS, "llvm.x86.sse41.insertps"},
#if LLVM_API_VERSION >= 800

@EgorBo EgorBo requested a review from vargaz February 11, 2020 13:51
@EgorBo EgorBo assigned imhameed and unassigned imhameed Feb 11, 2020
@EgorBo EgorBo requested a review from imhameed February 11, 2020 13:51
@tannergooding

@EgorBo, is there a doc on how to add these as work happens so we can get both done at the same time in the future?

Is it worth doing it for ARM64 as well and would the process be similar?

EgorBo commented Feb 11, 2020

> @EgorBo, is there a doc on how to add these as work happens so we can get both done at the same time in the future?

@tannergooding Not yet. The code needs some refactoring first, as there are too many repeated patterns. The problem is that hardware intrinsics are sometimes implemented as LLVM named intrinsics, but in most cases we have to use native LLVM vector operators to implement them. I'll try to put together a small doc. But first we need to add a "Mono-LLVM CI lane" to catch such things (intrinsics are implemented only when the LLVM back end is available) 🙂

> Is it worth doing it for ARM64 as well and would the process be similar?

We don't support them yet (there is a PR, but it's WIP and covers only scalar intrinsics so far).

@EgorBo EgorBo merged commit 4a5442b into dotnet:master Feb 11, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 10, 2020