Add support for array API to RidgeCV, RidgeClassifier and RidgeClassifierCV#27961
ogrisel merged 63 commits into scikit-learn:main
Conversation
I think the test failures for Ridge and RidgeCV arise from r2_score and will be handled in #27904

While I am thinking about it, please don't forget to update:
6be83f7 to 54dbac2
sklearn/linear_model/_ridge.py
Outdated
```python
if sparse.issparse(X):
    dtype = np.float64
else:
    dtype = [xp.float64, xp.float32]
```
Contrary to what I said in this morning's meeting, I think we might want to implement the following logic:
- if the input namespace/device supports `xp.float64` upcasting, then do the upcast (as we currently do with NumPy)
- if not (e.g. the PyTorch + MPS device combination), accept that we have degraded numerical precision, adjust the tolerance in the tests accordingly, and document this limited numerical precision guarantee in our Array API doc.
I think this is the strategy we are leaning towards in the review of #27113. During the review of the r2_score PR, I believe that @adrinjalali preferred that approach.
In a future PR, we might decide to drop the float32 -> float64 upcast in general for this estimator (it silently triggers a potentially very large and unexpected memory allocation, which is a usability problem in itself, even with NumPy), but I would rather make that decision independently of Array API support.
How would you recommend I check whether the upcasting is possible? Should I temporarily copy the `_max_precision_float_dtype` and `supported_float_dtypes` changes from #27113 until it is merged? Or is there already a utility in scikit-learn for checking that which I missed?
Feel free to copy it with a TODO comment to remove the redundant code once #27113 is merged, so as to decouple the two reviews.
When we do the upcast, with what precision should we store the coefficients and intercept? I guess for prediction we do not need the extra precision, so we should use X's original dtype?
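For illustration, the pattern being discussed might look like this sketch (using a plain least-squares solve as a stand-in for the actual ridge path; `fit_then_downcast` is a hypothetical name, not code from the PR):

```python
import numpy as np

def fit_then_downcast(X, y):
    # Solve in float64 for numerical stability (the upcast).
    coef64, *_ = np.linalg.lstsq(
        X.astype(np.float64), y.astype(np.float64), rcond=None
    )
    # Prediction does not need the extra precision, so store the
    # coefficients back in X's original dtype.
    return coef64.astype(X.dtype)

rng = np.random.RandomState(0)
X = rng.rand(20, 3).astype(np.float32)
y = rng.rand(20).astype(np.float32)
coef = fit_then_downcast(X, y)
print(coef.dtype)  # float32, matching X
```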
adrinjalali
left a comment
This is neat! From my point of view LGTM. But I haven't checked the tests or mathematical correctness.
I removed the stalled label since @OmarManzoor is pushing new commits to finalize this important PR.

Hum,

I think I would be in favor of the 3rd option.

@jeremiedbb if you have opinions on tol settings ;)

I also think that option 3 is more appropriate. In silhouette_score it's used as an …

I also think that we should remove this dependency from silhouette_score and directly calculate atol using the original factor within the function, while keeping _atol_for_type as it is in the latest commit, where we increase the factor by 10.

I'm sorry for the late reply, @lucyleeow! It's not for lack of interest, but unfortunately I really don't have the time at the moment. I should have said so earlier to avoid stalling it. I'm glad to see you picked it up, @OmarManzoor, thanks!!
sklearn/linear_model/_ridge.py
Outdated
```python
decision = self.decision_function(X)
xp, is_array_api, device_ = get_namespace_and_device(decision)
max_float_dtype = _max_precision_float_dtype(xp, device=device_)
scores = 2.0 * xp.astype(decision > 0, max_float_dtype) - 1.0
```
The `xp.astype(decision > 0, max_float_dtype)` expression was previously hardcoded to `xp.astype(decision > 0, xp.float32)`, but I decided to use `max_float_dtype` instead. Let me know if we should revert to `xp.float32`.
ogrisel
left a comment
Thanks, @OmarManzoor, for pushing this to the finish line. A final pass of nitpicks but otherwise, LGTM.
…fierCV (scikit-learn#27961)
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Co-authored-by: Omar Salman <omar.salman@arbisoft.com>
Reference Issues/PRs
Towards #26024.
This PR extends the one for Ridge (still WIP, #27800) to use the array API in RidgeCV and RidgeClassifierCV (when cv="gcv").

What does this implement/fix? Explain your changes.
This could make those estimators faster, as an important part of their computational cost comes from computing either an eigendecomposition of X X^T or an SVD of X.
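For context, the GCV code path being ported is the default strategy of RidgeCV (a plain NumPy usage sketch with hypothetical data; array API inputs would additionally require enabling `array_api_dispatch` via `sklearn.set_config`):

```python
import numpy as np
from sklearn.linear_model import RidgeCV

rng = np.random.RandomState(0)
X = rng.rand(50, 5)
y = X @ rng.rand(5) + 0.1 * rng.rand(50)

# The default cv=None uses the efficient leave-one-out GCV strategy,
# which relies on an eigendecomposition of X X^T or an SVD of X.
model = RidgeCV(alphas=[0.1, 1.0, 10.0]).fit(X, y)
print(model.alpha_)
```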
Any other comments?
The `_RidgeGCV` has numerical precision issues when computations are done in float32, which is why, at the moment, the main branch always uses float64. I'm not sure what should be done for array API inputs on devices that do not support float64.
not handled yet: