Vectorize Acos for Vector64/128/256/512 and TensorPrimitives#126423
Vectorize Acos for Vector64/128/256/512 and TensorPrimitives#126423stephentoub wants to merge 6 commits intomainfrom
Conversation
Add vectorized Acos implementations for float and double across all SIMD vector types. - AcosDouble: AMD acos.c [5,4] rational polynomial with s1/c high-precision reconstruction - AcosSingle: widens to double, uses AMD acosf.c [3,1] rational polynomial - Hook up TensorPrimitives.Acos to use vectorized implementations Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Adds vectorized Acos implementations for float/double across the System.Runtime.Intrinsics SIMD surface area and enables TensorPrimitives.Acos to use those SIMD implementations when available.
Changes:
- Introduces
Vector64/128/256/512.Acospublic APIs (ref + CoreLib implementations). - Implements vectorized
acoskernels inVectorMathfor bothdoubleandfloat(via widening todouble). - Enables
TensorPrimitives.Acosvectorization forfloat/doubleand adjusts tensor test tolerances forAcos/AcosPi.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| src/libraries/System.Runtime.Intrinsics/ref/System.Runtime.Intrinsics.cs | Adds new public ref declarations for Vector*.Acos overloads. |
| src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/VectorMath.cs | Adds the core vectorized AcosDouble / AcosSingle implementations. |
| src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/Vector64.cs | Exposes Vector64.Acos overloads and scalar fallback helper. |
| src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/Vector128.cs | Exposes Vector128.Acos overloads and fallback to Vector64.Acos. |
| src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/Vector256.cs | Exposes Vector256.Acos overloads and fallback to Vector128.Acos. |
| src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/Vector512.cs | Exposes Vector512.Acos overloads and fallback to Vector256.Acos. |
| src/libraries/System.Numerics.Tensors/tests/TensorPrimitives.Generic.cs | Adjusts tolerances for Acos / AcosPi validation. |
| src/libraries/System.Numerics.Tensors/src/System/Numerics/Tensors/netcore/TensorPrimitives.Acos.cs | Enables vectorized TensorPrimitives.Acos for float/double using Vector*.Acos. |
| TVectorDouble result = TVectorDouble.ConditionalSelect(transformMask, vTransform, vNormal); | ||
|
|
||
| // Handle special cases: |x| > 1 returns NaN, x = ±1 returns 0 or π | ||
| TVectorDouble absXGreaterThanOne = TVectorDouble.GreaterThan(TVectorDouble.Abs(x), TVectorDouble.One); |
There was a problem hiding this comment.
In AcosDouble, absXGreaterThanOne recomputes Abs(x) even though ax (the absolute value) was already computed earlier. Reuse ax here to avoid redundant vector work and keep the special-case handling consistent with the rest of the method.
| TVectorDouble absXGreaterThanOne = TVectorDouble.GreaterThan(TVectorDouble.Abs(x), TVectorDouble.One); | |
| TVectorDouble absXGreaterThanOne = TVectorDouble.GreaterThan(ax, TVectorDouble.One); |
src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/Vector256.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/Vector512.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/Vector128.cs
Outdated
Show resolved
Hide resolved
|
@copilot please evaluate and address the feedback |
This comment has been minimized.
This comment has been minimized.
Agent-Logs-Url: https://github.com/dotnet/runtime/sessions/e622ed4f-7db6-4d3e-a09c-d121e640f0fe Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
Agent-Logs-Url: https://github.com/dotnet/runtime/sessions/e622ed4f-7db6-4d3e-a09c-d121e640f0fe Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
Addressed all four pieces of feedback in commits e7ddf77 and 548d000:
|
- Fix extra leading space in AcosDouble, AcosSingle, and AcosSingleCoreDouble methods to match the indentation of AsinDouble/AsinSingle - Replace magic float literal 3.1415927f with float.Pi for clarity Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This comment has been minimized.
This comment has been minimized.
🤖 Copilot Code Review — PR #126423Note This review was AI-generated by Copilot using Claude Opus 4.6 (primary) and GPT-5.4 (secondary). Holistic AssessmentMotivation: Justified. Approach: Correct and well-structured. The implementation mirrors the existing Summary: Detailed Findings✅ Mathematical Correctness — Algorithms are soundBoth
Key verifications:
✅ Special Case Handling — All edge cases covered
Intermediate NaN/division-by-zero at boundary values (e.g., ✅ Structural Consistency — Matches Asin pattern exactly
✅ Test Coverage — Adequate
|
| // where R(x^2) is a [5,4] rational minimax approximation (same as asin.c). | ||
|
|
||
| // Rational polynomial coefficients (same as asin.c / AMD acos.c) | ||
| const double C1 = 0.227485835556935010735943483075; |
There was a problem hiding this comment.
@copilot where do these polynomials come from?
The aocl-libm acos implementation uses 0x1.55555555552aap-3 as the value for C1, which is bitwise 0x3fc55555555552aa, representing 0.166666666666647700356662653575767762959003448486328125 as the infinitely precise value.
The algorithm needs to be ported from the aocl-libm-ose source.
There was a problem hiding this comment.
Ported the correct algorithm from src/optimized/acos.c and src/optimized/acosf.c in commit 58d2ed8. The previous implementation incorrectly used the rational polynomial from src/ref/acos.c. The new implementation uses:
- AcosDouble: 12-term Sollya polynomial (C1-C12) matching the optimized acos.c, with split-constant reconstruction
(A - poly) + A/(B + poly) + Bfor precision - AcosSingleCoreDouble: 5-term Sollya polynomial (C1-C5) matching the optimized acosf.c, same reconstruction approach
All 152 Acos vector tests pass.
…coefficients Agent-Logs-Url: https://github.com/dotnet/runtime/sessions/22f9ee13-f6b9-4c35-ad05-63ff26534e31 Co-authored-by: tannergooding <10487869+tannergooding@users.noreply.github.com>
Agent-Logs-Url: https://github.com/dotnet/runtime/sessions/22f9ee13-f6b9-4c35-ad05-63ff26534e31 Co-authored-by: tannergooding <10487869+tannergooding@users.noreply.github.com>
Description
Add vectorized Acos implementations for float and double across all SIMD vector types, with unit tests and XML doc consistency.
Changes
Vector64/128/256/512.Acospublic APIs (ref + CoreLib implementations).acoskernels inVectorMathfor bothdoubleandfloat(via widening todouble), ported from AMD'saocl-libm-oseoptimized source (src/optimized/acos.candsrc/optimized/acosf.c) using Sollya polynomial coefficients — 12-term polynomial for double precision, 5-term polynomial for float precision.TensorPrimitives.Acosvectorization forfloat/doubleand adjusts tensor test tolerances forAcos/AcosPi.<inheritdoc cref=...>for Vector128/256/512 Acos XML docs, matching the established pattern used by Asin/Cos.AcosDoubleandAcosSingletest data toGenericMathTestMemberDatawith 19 test cases each covering edge cases (NaN, ±∞, ±0, ±1, and values across the valid input range).AcosDoubleTestandAcosSingleTestmethods to Vector64Tests, Vector128Tests, Vector256Tests, and Vector512Tests (152 tests total, all passing).AcosDoubleuses a single generic type parameter (TVectorDouble) since the optimized algorithm does not require theTVectorUInt64bit-manipulation used inAsinDouble.