Skip to content

Vectorize Acosh for Vector64/128/256/512 and TensorPrimitives#126424

Open
stephentoub wants to merge 2 commits intomainfrom
copilot/vectorize-acosh
Open

Vectorize Acosh for Vector64/128/256/512 and TensorPrimitives#126424
stephentoub wants to merge 2 commits intomainfrom
copilot/vectorize-acosh

Conversation

@stephentoub
Copy link
Copy Markdown
Member

@stephentoub stephentoub commented Apr 1, 2026

Description

Add vectorized Acosh implementations for float and double across all SIMD vector types.

Changes:

  • Exposes new Vector64/128/256/512.Acosh APIs (float/double) and updates the intrinsics ref surface in correct alphabetical order.
  • Introduces VectorMath.AcoshDouble / VectorMath.AcoshSingle implementations used by the new vector APIs.
  • AcoshDouble uses (x-1)*(x+1) instead of x²-1 to avoid catastrophic cancellation near x=1, with a large-value optimization using log(2) + log(x) for x > 2^28.
  • AcoshSingle widens to double precision, calls AcoshDouble, and narrows the result back for improved accuracy.
  • Enables TensorPrimitives.Acosh vectorization for float/double with test tolerances of 1e-14 (double) and 1e-6 (float).

Add vectorized Acosh implementations for float and double across all SIMD vector types.
- AcoshDouble: uses log identity (log(x + sqrt(x^2 - 1))) for vectorization
- AcoshSingle: widens to double and calls AcoshDouble
- Hook up TensorPrimitives.Acosh to use vectorized implementations

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings April 1, 2026 18:13
@dotnet-policy-service
Copy link
Copy Markdown
Contributor

Tagging subscribers to this area: @dotnet/area-system-runtime-intrinsics
See info in area-owners.md if you want to be subscribed.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds SIMD-enabled Acosh support across Vector64/128/256/512 and enables vectorization for TensorPrimitives.Acosh (float/double), wiring everything through new implementations in VectorMath.

Changes:

  • Exposes new Vector64/128/256/512.Acosh APIs (float/double) and updates the intrinsics ref surface.
  • Introduces VectorMath.AcoshDouble / VectorMath.AcoshSingle implementations used by the new vector APIs.
  • Enables TensorPrimitives.Acosh vectorization for float/double and adjusts test tolerances.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
src/libraries/System.Runtime.Intrinsics/ref/System.Runtime.Intrinsics.cs Adds new Acosh overloads to the public ref surface for all vector widths.
src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/VectorMath.cs Implements vectorized Acosh core routines used by vector APIs.
src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/Vector64.cs Adds Vector64.Acosh (float/double) and scalar fallback loop.
src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/Vector128.cs Adds Vector128.Acosh (float/double) with fallback composition.
src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/Vector256.cs Adds Vector256.Acosh (float/double) with fallback composition.
src/libraries/System.Private.CoreLib/src/System/Runtime/Intrinsics/Vector512.cs Adds Vector512.Acosh (float/double) with fallback composition.
src/libraries/System.Numerics.Tensors/tests/TensorPrimitives.Generic.cs Updates Acosh tolerance used by generic tensor tests.
src/libraries/System.Numerics.Tensors/src/System/Numerics/Tensors/netcore/TensorPrimitives.Acosh.cs Makes Acosh vectorizable for float/double and routes vector calls to Vector*.Acosh.
Comments suppressed due to low confidence (3)

src/libraries/System.Runtime.Intrinsics/ref/System.Runtime.Intrinsics.cs:490

  • Same ordering issue as above: Acosh should appear before Asin in this alphabetically ordered list of Vector256 APIs. Please reorder to maintain consistent ref file ordering.
    src/libraries/System.Runtime.Intrinsics/ref/System.Runtime.Intrinsics.cs:944
  • Same ordering issue as above: Acosh should be listed before Asin in the Vector512 ref API list (the surrounding entries are alphabetically ordered).
    src/libraries/System.Runtime.Intrinsics/ref/System.Runtime.Intrinsics.cs:1397
  • Same ordering issue as above: Acosh should be listed before Asin in the Vector64 ref API list to preserve alphabetical ordering.

@stephentoub
Copy link
Copy Markdown
Member Author

@copilot please evaluate and address the feedback

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 1, 2026

Note

This review was generated by Copilot.

🤖 Copilot Code Review — PR #126424

Holistic Assessment

Motivation: Vectorizing Acosh for SIMD vector types and TensorPrimitives is well-motivated — it follows the established pattern from Asin (already vectorized) and fills a TODO: Vectorize gap in the codebase. The mathematical approach is sound and the implementation is numerically careful.

Approach: The implementation correctly uses the identity acosh(x) = log(x + sqrt(x² − 1)) with special-case handling for values near 1 (to avoid catastrophic cancellation) and large values (to avoid overflow). The float path widens to double for accuracy, consistent with other functions. The overall pattern closely follows the existing Asin vectorization.

Summary: ⚠️ Needs Human Review. The code is mathematically correct and follows established patterns. However, the new public API surface in the ref file is not in alphabetical order, and I could not locate a linked api-approved issue (which may exist but is not referenced in the PR). A human reviewer should confirm these points.


Detailed Findings

⚠️ Ref File Ordering — Acosh placed after Asin instead of before

In src/libraries/System.Runtime.Intrinsics/ref/System.Runtime.Intrinsics.cs, the new Acosh entries are placed after Asin in all four Vector types (Vector64, Vector128, Vector256, Vector512). Alphabetically, Acosh (Ac...) should come before Asin (As...), since 'c' < 's'.

Current order:

AnyWhereAllBitsSet
Asin(double)
Asin(float)
Acosh(double)  ← new
Acosh(float)   ← new
AsByte
```

Expected alphabetical order:
```
AnyWhereAllBitsSet
Acosh(double)
Acosh(float)
Asin(double)
Asin(float)
AsByte

The same ordering issue exists in the implementation files (Vector64.cs, Vector128.cs, Vector256.cs, Vector512.cs), where Acosh methods are placed after Asin methods but should precede them alphabetically — consistent with the existing pattern where Asin comes before Cos which comes before Sin.

⚠️ API Approval — No linked api-approved issue found

This PR adds 8 new public API methods (Acosh for float and double across Vector64/128/256/512) with corresponding ref assembly entries. Per dotnet/runtime process, new public APIs typically require a linked issue with the api-approved label. The commit message and PR branch contain no issue references (Fixes #N, Closes #N). This may be covered by a broader vectorization tracking issue or batch approval — a human reviewer should confirm.

✅ Mathematical Correctness — All identities verified

The AcoshDouble implementation is mathematically sound:

  • Normal case log(x + sqrt(x² − 1)): Standard identity, verified correct.
  • Near-one case sqrt(2(x − 1)) for x ≤ 1 + 2⁻²⁵: Valid first-order Taylor approximation of acosh(1 + ε) ≈ sqrt(2ε). The threshold 2⁻²⁵ is well-chosen to prevent catastrophic cancellation in x² − 1 while the approximation error remains < 2.5e-9 relative.
  • Large case ln(2) + ln(x) for x > 2²⁸: Correct asymptotic identity. The threshold 2²⁸ is more conservative than AMD AOCL's 1/sqrt(ε) (~6.7e7), which is fine.
  • Domain check x < 1 → NaN: Correct. x = 1 correctly falls through to nearOneMask producing sqrt(0) = 0 = acosh(1).
  • ConditionalSelect ordering: nanMask applied last (highest priority) correctly overrides all other results for out-of-domain inputs. No mask overlap between nearOneMask and largeMask ranges.
  • SIMD-correct: All paths compute eagerly (as required by branchless SIMD), and invalid intermediate results (e.g., sqrt(negative) → NaN for x < 1) are properly masked out by ConditionalSelect.

✅ Pattern Consistency — Matches existing Asin vectorization

The Acosh methods in Vector64/128/256/512 follow the exact same structure as the existing Asin methods:

  • Hardware-accelerated path calls into VectorMath.AcoshDouble/AcoshSingle
  • Fallback path decomposes into lower/upper halves
  • Float path checks for next-wider hardware acceleration (Vector128.IsHardwareAccelerated for Vector64 float, etc.)
  • Vector64<T>.Acosh<T> scalar fallback loop uses same pattern as Asin<T>

✅ Test Coverage — Appropriate tolerances

The test tolerance change from null (exact match) to DetermineTolerance(doubleTolerance: 1e-14, floatTolerance: 1e-6f) is appropriate and consistent with similar vectorized functions (Cosh uses 1e-14 double, Exp2 uses 1e-14/1e-5f).

💡 AcoshSingle Attribution Comment — Slightly misleading

The AcoshSingle comment states "This code is based on acoshf from amd/aocl-libm-ose" and describes the AMD three-range approach (including log1p), but the actual implementation simply widens to double and calls AcoshDouble. The code is inspired by the AMD approach (which also widens to double for float accuracy) but doesn't implement the described algorithm. The comment also lists a copyright range of "2008-2022" while THIRD-PARTY-NOTICES.TXT says "2008-2020". Consider clarifying that the implementation uses widening-to-double rather than the described multi-range approach directly.

💡 AcoshOperator Visibility — privateinternal

AcoshOperator<T> was changed from private to internal. The pattern is mixed across operators — AsinhOperator, AtanOperator, AtanhOperator are internal, while AsinOperator, AcosOperator are private. Not a blocking concern, but worth noting the inconsistency with the closely related AsinOperator (which is vectorized and private).

Generated by Code Review for issue #126424 ·

…racy, fix comment

Agent-Logs-Url: https://github.com/dotnet/runtime/sessions/43d57cf9-0cfe-4d16-9977-8c3294328c02

Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
Copy link
Copy Markdown
Contributor

Copilot AI commented Apr 1, 2026

@copilot please evaluate and address the feedback

Addressed all three feedback items in 3544d66:

  1. Alphabetical ordering: Moved Acosh before Asin at all 4 locations in the ref file.
  2. Near-one accuracy: Replaced x²-1 with (x-1)*(x+1) to avoid catastrophic cancellation near x=1, and removed the inaccurate sqrt(2*(x-1)) fast-path entirely — the reformulated expression handles values near 1 accurately since LogDouble internally uses a log1p polynomial for reduced arguments near 1.
  3. Misleading comment: Updated the AcoshSingle comment to accurately describe the widen-to-double implementation.

All 1575 TensorPrimitives tests pass (including Acosh with 1e-14 double tolerance).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants