Add support for array API to RidgeCV, RidgeClassifier and RidgeClassifierCV#27961
ogrisel merged 63 commits into scikit-learn:main
Conversation
I think the test failures for Ridge and RidgeCV arise from r2_score and will be handled in #27904

While I am thinking about it, please don't forget to update:
6be83f7 to 54dbac2
sklearn/linear_model/_ridge.py
Outdated
```python
if sparse.issparse(X):
    dtype = np.float64
else:
    dtype = [xp.float64, xp.float32]
```
Contrary to what I said in this morning's meeting, I think we might want to implement the following logic:
- if the input namespace/device supports `xp.float64` upcasting, then do the upcast (as we currently do with NumPy)
- if not (e.g. the PyTorch + MPS device combination), accept that we have degraded numerical precision, adjust the tolerance in the tests accordingly, and document this limited numerical precision guarantee in our Array API doc.
I think this is the strategy we are leaning towards in the review of #27113. During the review of the r2_score PR, I believe that @adrinjalali preferred that approach.
In a future PR, we might decide to drop the float32 -> float64 upcast in general for this estimator (it silently triggers a potentially very large and unexpected memory allocation, which is a usability problem in itself, even with NumPy), but I would rather make that decision independently of Array API support.
How would you recommend I check whether the upcasting is possible? Should I temporarily copy the `_max_precision_float_dtype` and `supported_float_dtypes` changes from #27113 until it is merged? Or is there already a utility in scikit-learn for checking that which I missed?
Feel free to copy it with a TODO comment to remove the redundant code once #27113 is merged, so as to decouple the two reviews.
When we do the upcast, with what precision should we store the coefficients and intercept? I guess for prediction we do not need the extra precision, so we should use X's original dtype?
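For illustration, the pattern being discussed might look like this sketch (using a plain least-squares solve as a stand-in for the actual ridge path; `fit_then_downcast` is a hypothetical name, not code from the PR):

```python
import numpy as np

def fit_then_downcast(X, y):
    # Solve in float64 for numerical stability (the upcast).
    coef64, *_ = np.linalg.lstsq(
        X.astype(np.float64), y.astype(np.float64), rcond=None
    )
    # Prediction does not need the extra precision, so store the
    # coefficients back in X's original dtype.
    return coef64.astype(X.dtype)

rng = np.random.RandomState(0)
X = rng.rand(20, 3).astype(np.float32)
y = rng.rand(20).astype(np.float32)
coef = fit_then_downcast(X, y)
print(coef.dtype)  # float32, matching X
```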
adrinjalali
left a comment
This is neat! From my point of view LGTM. But I haven't checked the tests or mathematical correctness.
I removed the stalled label since @OmarManzoor is pushing new commits to finalize this important PR.

Hum,

I think I would be in favor of the 3rd option.

@jeremiedbb if you have opinions on tol settings ;)

I also think that option 3 is more appropriate. In silhouette_score it's used as an …

I also think that we should remove this dependency from silhouette_score and directly calculate atol using the original factor within the function, while keeping _atol_for_type as it is in the latest commit, where we increase the factor by 10.

I'm sorry for the late reply, @lucyleeow! It's not for lack of interest, but unfortunately I really don't have the time at the moment. I should have said so earlier to avoid stalling it. I'm glad to see you picked it up, @OmarManzoor, thanks!!
sklearn/linear_model/_ridge.py
Outdated
```python
decision = self.decision_function(X)
xp, is_array_api, device_ = get_namespace_and_device(decision)
max_float_dtype = _max_precision_float_dtype(xp, device=device_)
scores = 2.0 * xp.astype(decision > 0, max_float_dtype) - 1.0
```
The `xp.astype(decision > 0, max_float_dtype)` expression was previously hardcoded to `xp.astype(decision > 0, xp.float32)`, but I decided to use `max_float_dtype` instead. Let me know if we should revert to `xp.float32`.
ogrisel
left a comment
Thanks, @OmarManzoor, for pushing this to the finish line. A final pass of nitpicks but otherwise, LGTM.
…fierCV (scikit-learn#27961)
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Co-authored-by: Omar Salman <omar.salman@arbisoft.com>
Reference Issues/PRs
Towards #26024.
This PR extends the one for Ridge (still WIP, #27800) to use the array API in RidgeCV and RidgeClassifierCV (when cv="gcv").

What does this implement/fix? Explain your changes.
This could make those estimators faster, as an important part of their computational cost comes from computing either an eigendecomposition of X X^T or an SVD of X.
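For context, the GCV code path being ported is the default strategy of RidgeCV (a plain NumPy usage sketch with hypothetical data; array API inputs would additionally require enabling `array_api_dispatch` via `sklearn.set_config`):

```python
import numpy as np
from sklearn.linear_model import RidgeCV

rng = np.random.RandomState(0)
X = rng.rand(50, 5)
y = X @ rng.rand(5) + 0.1 * rng.rand(50)

# The default cv=None uses the efficient leave-one-out GCV strategy,
# which relies on an eigendecomposition of X X^T or an SVD of X.
model = RidgeCV(alphas=[0.1, 1.0, 10.0]).fit(X, y)
print(model.alpha_)
```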
Any other comments?
The `_RidgeGCV` has numerical precision issues when computations are done in float32, which is why, at the moment, the main branch always uses float64. I'm not sure what should be done for array API inputs on devices that do not support float64.
not handled yet: