Vectorize Acos for Vector64/128/256/512 and TensorPrimitives by stephentoub · Pull Request #126423 · dotnet/runtime

stephentoub · 2026-04-01T18:13:08Z

Description

Add vectorized Acos implementations for float and double across all SIMD vector types, with unit tests and XML doc consistency.

Changes

Introduces Vector64/128/256/512.Acos public APIs (ref + CoreLib implementations).
Implements vectorized acos kernels in VectorMath for both double and float (via widening to double), ported from AMD's aocl-libm-ose optimized source (src/optimized/acos.c and src/optimized/acosf.c) using Sollya polynomial coefficients — 12-term polynomial for double precision, 5-term polynomial for float precision.
Enables TensorPrimitives.Acos vectorization for float/double and adjusts tensor test tolerances for Acos/AcosPi.
Uses <inheritdoc cref=...> for Vector128/256/512 Acos XML docs, matching the established pattern used by Asin/Cos.
Adds AcosDouble and AcosSingle test data to GenericMathTestMemberData with 19 test cases each covering edge cases (NaN, ±∞, ±0, ±1, and values across the valid input range).
Adds AcosDoubleTest and AcosSingleTest methods to Vector64Tests, Vector128Tests, Vector256Tests, and Vector512Tests (152 tests total, all passing).
All polynomial coefficients are bitwise-verified against the aocl-libm-ose hex float constants.
AcosDouble uses a single generic type parameter (TVectorDouble) since the optimized algorithm does not require the TVectorUInt64 bit-manipulation used in AsinDouble.

Add vectorized Acos implementations for float and double across all SIMD vector types. - AcosDouble: AMD acos.c [5,4] rational polynomial with s1/c high-precision reconstruction - AcosSingle: widens to double, uses AMD acosf.c [3,1] rational polynomial - Hook up TensorPrimitives.Acos to use vectorized implementations Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot

Pull request overview

Adds vectorized Acos implementations for float/double across the System.Runtime.Intrinsics SIMD surface area and enables TensorPrimitives.Acos to use those SIMD implementations when available.

Changes:

Introduces Vector64/128/256/512.Acos public APIs (ref + CoreLib implementations).
Implements vectorized acos kernels in VectorMath for both double and float (via widening to double).
Enables TensorPrimitives.Acos vectorization for float/double and adjusts tensor test tolerances for Acos/AcosPi.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.

Show a summary per file

File	Description
src/libraries/System.Runtime.Intrinsics/ref/System.Runtime.Intrinsics.cs	Adds new public ref declarations for `Vector*.Acos` overloads.
src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/VectorMath.cs	Adds the core vectorized `AcosDouble` / `AcosSingle` implementations.
src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/Vector64.cs	Exposes `Vector64.Acos` overloads and scalar fallback helper.
src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/Vector128.cs	Exposes `Vector128.Acos` overloads and fallback to `Vector64.Acos`.
src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/Vector256.cs	Exposes `Vector256.Acos` overloads and fallback to `Vector128.Acos`.
src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/Vector512.cs	Exposes `Vector512.Acos` overloads and fallback to `Vector256.Acos`.
src/libraries/System.Numerics.Tensors/tests/TensorPrimitives.Generic.cs	Adjusts tolerances for `Acos` / `AcosPi` validation.
src/libraries/System.Numerics.Tensors/src/System/Numerics/Tensors/netcore/TensorPrimitives.Acos.cs	Enables vectorized `TensorPrimitives.Acos` for `float`/`double` using `Vector*.Acos`.

Copilot · 2026-04-01T18:20:12Z

src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/VectorMath.cs

+             TVectorDouble result = TVectorDouble.ConditionalSelect(transformMask, vTransform, vNormal);
+
+             // Handle special cases: |x| > 1 returns NaN, x = ±1 returns 0 or π
+             TVectorDouble absXGreaterThanOne = TVectorDouble.GreaterThan(TVectorDouble.Abs(x), TVectorDouble.One);


In AcosDouble, absXGreaterThanOne recomputes Abs(x) even though ax (the absolute value) was already computed earlier. Reuse ax here to avoid redundant vector work and keep the special-case handling consistent with the rest of the method.

Suggested change

TVectorDouble absXGreaterThanOne = TVectorDouble.GreaterThan(TVectorDouble.Abs(x), TVectorDouble.One);

TVectorDouble absXGreaterThanOne = TVectorDouble.GreaterThan(ax, TVectorDouble.One);

src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/Vector256.cs

src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/Vector512.cs

src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/Vector128.cs

stephentoub · 2026-04-01T18:20:45Z

@copilot please evaluate and address the feedback

Agent-Logs-Url: https://github.com/dotnet/runtime/sessions/e622ed4f-7db6-4d3e-a09c-d121e640f0fe Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>

Copilot · 2026-04-01T19:01:27Z

@copilot please evaluate and address the feedback

Addressed all four pieces of feedback in commits e7ddf77 and 548d000:

Reuse ax: Changed TVectorDouble.Abs(x) to ax in absXGreaterThanOne computation
<inheritdoc> for XML docs: Switched Vector128/256/512 Acos docs to use <inheritdoc cref=...> matching the Asin/Cos pattern (Vector64 → Vector128 → Vector256 → Vector512)
Acos tests: Added AcosDouble/AcosSingle test data to GenericMathTestMemberData and Acos test methods to all four vector test files (152 tests, all passing)

- Fix extra leading space in AcosDouble, AcosSingle, and AcosSingleCoreDouble methods to match the indentation of AsinDouble/AsinSingle - Replace magic float literal 3.1415927f with float.Pi for clarity Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot

Pull request overview

Copilot reviewed 13 out of 13 changed files in this pull request and generated no new comments.

github-actions · 2026-04-02T18:19:43Z

🤖 Copilot Code Review — PR #126423

Note

This review was AI-generated by Copilot using Claude Opus 4.6 (primary) and GPT-5.4 (secondary).

Holistic Assessment

Motivation: Justified. TensorPrimitives.Acos was explicitly marked TODO: Vectorize, and the corresponding Asin was already vectorized in the same codebase. Vectorized trig functions provide significant batch-processing speedups. This is a natural and expected extension.

Approach: Correct and well-structured. The implementation mirrors the existing Asin vectorization pattern exactly — same VectorMath infrastructure, same Vector64→128→256→512 decomposition, same TensorPrimitives operator hookup. The math is based on AMD's AOCL-LIBM-OSE (BSD-3 licensed, referenced in THIRD-PARTY-NOTICES.TXT). The double-precision implementation shares the same [5,4] rational minimax coefficients (C1–C6, D1–D5) as AsinDouble, and the single-precision version correctly widens to double before computing.

Summary: ⚠️ Needs Human Review. The code is mathematically correct and structurally consistent with the existing Asin vectorization. Both independent model reviews (Claude Opus 4.6 and GPT-5.4) found no correctness issues. The one item requiring human judgment is the new public API surface — 8 new public methods appear in the ref assembly without a visible api-approved issue link. Given the author and that Asin counterparts already exist, this is very likely approved, but a human reviewer should confirm.

Detailed Findings

✅ Mathematical Correctness — Algorithms are sound

Both AcosDouble and AcosSingleCoreDouble correctly implement the standard acos approximation:

|x| ≤ 0.5: acos(x) = π/2 - (x + x³·R(x²)) via rational polynomial
|x| > 0.5: sign-dependent reconstruction using acos(x) = π - 2·asin(√((1-|x|)/2)) (negative) or 2·asin(√((1-x)/2)) (positive)

Key verifications:

Polynomial coefficients C1–C6 and D1–D5 in AcosDouble are identical to AsinDouble (as expected — both share the same underlying rational minimax)
The s1/c high-precision bit-masking (0xFFFFFFFF00000000) matches AsinDouble exactly
AcosSingleCoreDouble correctly omits the s1/c correction since double precision is sufficient for float output (matching AsinSingleCoreDouble's approach)
The sign-dependent transform path correctly handles acos being a non-odd function (unlike asin where sign can simply be factored out)
PI constant 3.1415926535897933e+00 (0x400921fb54442d18) is the correct IEEE 754 double representation

✅ Special Case Handling — All edge cases covered

Input	`AcosDouble`	`AcosSingle`	Expected	Correct
`	x	> 1`	→ NaN	→ NaN
`x = 1`	→ 0	→ 0	0	✅
`x = -1`	→ PI	→ `float.Pi`	π	✅
`NaN`	→ NaN	→ NaN	NaN	✅
`±Inf`	→ NaN	→ NaN	NaN	✅
`±0`	→ π/2	→ π/2	π/2	✅

Intermediate NaN/division-by-zero at boundary values (e.g., sqrt(negative) when |x| > 1, or (r - s1*s1)/(s + s1) when x = ±1) are all masked by the final ConditionalSelect special-case handlers. This is the same branchless SIMD approach used by AsinDouble.

✅ Structural Consistency — Matches Asin pattern exactly

VectorMath methods: AcosDouble/AcosSingle/AcosSingleCoreDouble mirror AsinDouble/AsinSingle/AsinSingleCoreDouble structurally
Vector64/128/256/512: All Acos methods use the same IsHardwareAccelerated dispatch, wider-vector upgrade path, and lower/upper decomposition as their Asin counterparts
TensorPrimitives: AcosOperator uses the same NET11_0_OR_GREATER gating and type dispatch as AsinOperator
Ref assembly: Acos entries are correctly placed alphabetically before Asin in all four Vector classes
XML docs: Vector64 defines full doc comments; Vector128→256→512 chain inheritdoc correctly
Scalar fallback: Vector64.Acos<T> matches the existing Vector64.Asin<T> loop pattern

✅ Test Coverage — Adequate

Theory-based tests added for all 4 vector sizes (Vector64, Vector128, Vector256, Vector512) × 2 types (float, double)
Test data in GenericMathTestMemberData.AcosDouble/AcosSingle covers: NaN, ±Inf, ±1, ±0, and 14 intermediate values at mathematically significant points (π/4, 1/√2, ln(2), 2/π, log₁₀(e), 1/π)
Variance tolerances use CrossPlatformMachineEpsilon scaled appropriately
TensorPrimitives.Generic.cs tolerance updated from null to 2e-9/1e-6f to account for MultiplyAddEstimate differences — reasonable for vectorized approximations

⚠️ API Surface — Needs human confirmation of approval

This PR adds 8 new public methods to the ref assembly (Vector64.Acos(float), Vector64.Acos(double), etc. for all 4 vector types). No linked issue with api-approved label was found in the commit messages. Given that:

The author (stephentoub) is a .NET runtime maintainer and API reviewer
The identical Asin counterpart APIs already exist
This follows an established vectorization pattern

...this is very likely already approved, but a human reviewer should confirm the API approval status before merge.

Generated by Code Review for issue #126423 · ◷

tannergooding · 2026-04-06T16:22:47Z

src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/VectorMath.cs

+            // where R(x^2) is a [5,4] rational minimax approximation (same as asin.c).
+
+            // Rational polynomial coefficients (same as asin.c / AMD acos.c)
+            const double C1 = 0.227485835556935010735943483075;


@copilot where do these polynomials come from?

The aocl-libm acos implementation uses 0x1.55555555552aap-3 as the value for C1, which is bitwise 0x3fc55555555552aa, representing 0.166666666666647700356662653575767762959003448486328125 as the infinitely precise value.

The algorithm needs to be ported from the aocl-libm-ose source.

Ported the correct algorithm from src/optimized/acos.c and src/optimized/acosf.c in commit 58d2ed8. The previous implementation incorrectly used the rational polynomial from src/ref/acos.c. The new implementation uses:

AcosDouble: 12-term Sollya polynomial (C1-C12) matching the optimized acos.c, with split-constant reconstruction (A - poly) + A / (B + poly) + B for precision

AcosSingleCoreDouble: 5-term Sollya polynomial (C1-C5) matching the optimized acosf.c, same reconstruction approach

All 152 Acos vector tests pass.

…coefficients Agent-Logs-Url: https://github.com/dotnet/runtime/sessions/22f9ee13-f6b9-4c35-ad05-63ff26534e31 Co-authored-by: tannergooding <10487869+tannergooding@users.noreply.github.com>

Agent-Logs-Url: https://github.com/dotnet/runtime/sessions/22f9ee13-f6b9-4c35-ad05-63ff26534e31 Co-authored-by: tannergooding <10487869+tannergooding@users.noreply.github.com>

stephentoub added the area-System.Runtime.Intrinsics label Apr 1, 2026

Copilot AI review requested due to automatic review settings April 1, 2026 18:13

dotnet-policy-service bot assigned stephentoub Apr 1, 2026

Copilot started reviewing on behalf of stephentoub April 1, 2026 18:14 View session

Copilot AI reviewed Apr 1, 2026

View reviewed changes

Copilot started work on behalf of stephentoub April 1, 2026 18:20 View session

This comment has been minimized.

Sign in to view

Copilot AI and others added 2 commits April 1, 2026 18:57

Address review feedback: reuse ax, use inheritdoc, add Acos tests

e7ddf77

Agent-Logs-Url: https://github.com/dotnet/runtime/sessions/e622ed4f-7db6-4d3e-a09c-d121e640f0fe Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>

Remove unrelated files accidentally included from main branch checkout

548d000

Agent-Logs-Url: https://github.com/dotnet/runtime/sessions/e622ed4f-7db6-4d3e-a09c-d121e640f0fe Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>

Copilot AI review requested due to automatic review settings April 1, 2026 19:00

stephentoub review requested due to automatic review settings April 1, 2026 19:00

Copilot finished work on behalf of stephentoub April 1, 2026 19:02

Copilot AI review requested due to automatic review settings April 2, 2026 17:33

Copilot started reviewing on behalf of stephentoub April 2, 2026 17:34 View session

Copilot AI reviewed Apr 2, 2026

View reviewed changes

This comment has been minimized.

Sign in to view

build-analysis bot mentioned this pull request Apr 2, 2026

Android arm32 device not found (armeabi-v7a architecture unavailable) #125440

Open

tannergooding reviewed Apr 6, 2026

View reviewed changes

Copilot started work on behalf of tannergooding April 6, 2026 16:23 View session

Copilot AI and others added 2 commits April 6, 2026 17:00

Port Acos from optimized aocl-libm-ose source with Sollya polynomial …

58d2ed8

…coefficients Agent-Logs-Url: https://github.com/dotnet/runtime/sessions/22f9ee13-f6b9-4c35-ad05-63ff26534e31 Co-authored-by: tannergooding <10487869+tannergooding@users.noreply.github.com>

Fix acosf C1 coefficient and remove unused TVectorUInt64 type parameter

5c3e19f

Agent-Logs-Url: https://github.com/dotnet/runtime/sessions/22f9ee13-f6b9-4c35-ad05-63ff26534e31 Co-authored-by: tannergooding <10487869+tannergooding@users.noreply.github.com>

Copilot AI review requested due to automatic review settings April 6, 2026 17:05

stephentoub review requested due to automatic review settings April 6, 2026 17:05

Copilot finished work on behalf of tannergooding April 6, 2026 17:07

Copilot AI requested a review from tannergooding April 6, 2026 17:07

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Vectorize Acos for Vector64/128/256/512 and TensorPrimitives#126423

Vectorize Acos for Vector64/128/256/512 and TensorPrimitives#126423
stephentoub wants to merge 6 commits intomainfrom
copilot/vectorize-acos

stephentoub commented Apr 1, 2026 •

edited by Copilot AI

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Apr 1, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

stephentoub commented Apr 1, 2026

Uh oh!

This comment has been minimized.

Copilot AI commented Apr 1, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

This comment has been minimized.

github-actions bot commented Apr 2, 2026

Uh oh!

tannergooding Apr 6, 2026

Uh oh!

Copilot AI Apr 6, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

	TVectorDouble absXGreaterThanOne = TVectorDouble.GreaterThan(TVectorDouble.Abs(x), TVectorDouble.One);
	TVectorDouble absXGreaterThanOne = TVectorDouble.GreaterThan(ax, TVectorDouble.One);

Conversation

stephentoub commented Apr 1, 2026 • edited by Copilot AI Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Changes

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

stephentoub commented Apr 1, 2026

Uh oh!

This comment has been minimized.

Copilot AI commented Apr 1, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

This comment has been minimized.

github-actions bot commented Apr 2, 2026

🤖 Copilot Code Review — PR #126423

Holistic Assessment

Detailed Findings

✅ Mathematical Correctness — Algorithms are sound

✅ Special Case Handling — All edge cases covered

✅ Structural Consistency — Matches Asin pattern exactly

✅ Test Coverage — Adequate

⚠️ API Surface — Needs human confirmation of approval

Uh oh!

tannergooding Apr 6, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 6, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

stephentoub commented Apr 1, 2026 •

edited by Copilot AI

Loading