TST Add common test for mixed array API inputs for metrics (#32755)

ogrisel merged 75 commits into scikit-learn:main
Conversation
sklearn/metrics/tests/test_common.py
Outdated
```python
if metric_name in CLASSIFICATION_METRICS:
    # These should all accept binary label input as there are no
    # `CLASSIFICATION_METRICS` that are in `METRIC_UNDEFINED_BINARY` and are NOT
    # `partial`s (which we do not test for in array API)
    y1 = xp_input.asarray([0, 0, 1, 1], device=array_input.device)
    y2 = xp_ref.asarray([0, 1, 0, 1], device=reference.device)
elif metric_name in {**CONTINUOUS_CLASSIFICATION_METRICS, **CURVE_METRICS}:
    if metric_name not in METRIC_UNDEFINED_BINARY:
        # Continuous binary input
        y1 = xp_input.asarray([0.5, 0.2, 0.7, 0.6], device=array_input.device)
        y2 = xp_ref.asarray([1, 0, 1, 0], device=reference.device)
    else:
        # Continuous but shape (n_samples, n_labels)
        y1 = xp_input.asarray([[0.5, 0.2, 0.7, 0.6]], device=array_input.device)
        y2 = xp_ref.asarray([[1, 0, 1, 0]], device=reference.device)
elif metric_name in REGRESSION_METRICS:
    y1 = xp_input.asarray([2.0, 0.1, 1.0, 4.0], device=array_input.device)
    y2 = xp_ref.asarray([0.5, 0.5, 2, 2], device=reference.device)
elif metric_name in PAIRWISE_METRICS:
```
I'm not familiar with the metric types; I think I got this right, but I'm open to improvements!
Gentle ping on this. We have already started adding support for mixed arrays in some metrics (#32422), and we test mixed array inputs there (by checking for string numpy arrays).

Note also that it would be useful to get #32793 in before we add more continuous metrics, but that hinges on this, so it would be nice to push this forward!
Sounds like a good plan. For estimators we do something similar: the basic test just checks that the namespace, device and shapes match, and the values are compared in the estimator-specific tests.
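For illustration, the namespace/device/shape-style basic check could be sketched like this (a hedged NumPy-only sketch; `check_basic_compliance` is a hypothetical helper name, not scikit-learn API):

```python
import numpy as np

def check_basic_compliance(result, reference):
    """Hypothetical basic check: only the type ("namespace") and shape are
    compared here; exact values are left to the metric-specific tests."""
    assert type(result) is type(reference)          # same array/scalar type
    assert np.shape(result) == np.shape(reference)  # same shape
    return True

# A metric result computed twice should trivially pass the basic check.
y_true = np.asarray([0, 1, 0, 1])
y_pred = np.asarray([0, 0, 1, 1])
accuracy = np.mean(y_true == y_pred)
assert check_basic_compliance(accuracy, np.mean(y_true == y_pred))
```

In the real tests, `result` would come from a non-NumPy namespace/device combination and `reference` from the plain NumPy call.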
Note: I have realised that there are certain metrics that will allow string numpy inputs, and I want to add testing for those as well.

Related: https://github.com/scikit-learn/scikit-learn/pull/32422/files#r2575666004
We did change the channels used to download CUDA-related packages in #33212; not 100% sure this is related.

EDIT: sorry for the noise, it's not related, because we have a `move_to` failure on the MPS CI (unrelated to the lockfile of the CUDA CI).
I agree with your analysis, but I am not sure which project this issue should be reported to, NumPy or PyTorch, as the traceback is not explicit about which library raises the error.
Actually, for the torch MPS to NumPy conversion we could leverage DLPack, but only if we explicitly pass the CPU device, which is understood by both NumPy and PyTorch:

```python
>>> import torch, numpy as np
>>> x_torch = torch.arange(5, device='mps')
>>> np.from_dlpack(x_torch, device=None)
Traceback (most recent call last):
  Cell In[16], line 1
    np.from_dlpack(x_torch, device=None)
RuntimeError: Unsupported device in DLTensor.
>>> np.from_dlpack(x_torch, device="cpu")
array([0, 1, 2, 3, 4])
```
I am going to say NumPy, because I cannot find "Unsupported device in DLTensor" in the PyTorch codebase, but I can in the NumPy codebase. Also, I get the same error when converting to NumPy from both CuPy and PyTorch MPS.
I wonder if we should specify the NumPy device to be "cpu" instead of None?

Relevant PRs: #29119, #30454

Edit: I guess this is fine if we are careful to specify the device as the output of the device helper.
The helper currently returns "cpu" for NumPy arrays:

```python
if is_numpy_array(x):
    return "cpu"
```

For this PR I think it's fine to use it, and we should not let this question block it. There are only two open points for me; otherwise this PR looks good:
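As a minimal check of the API discussed above, a NumPy-to-NumPy DLPack round trip works without torch/MPS (the `device` keyword of `np.from_dlpack` itself requires NumPy >= 2.1):

```python
import numpy as np

x = np.arange(5)
# NumPy-to-NumPy round trip through the DLPack protocol.
y = np.from_dlpack(x)
assert np.array_equal(y, x)

# NumPy >= 2.1 also accepts an explicit device, which is what makes the
# torch-MPS-to-NumPy conversion shown earlier work.
if np.lib.NumpyVersion(np.__version__) >= "2.1.0":
    assert np.array_equal(np.from_dlpack(x, device="cpu"), x)
```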
StefanieSenger
left a comment
With the newest simplifications to the test-case building, this PR looks fine to me.
Thanks for the work, @lucyleeow! Approving.
ogrisel
left a comment
LGTM, thank you very much @lucyleeow!
Thanks all!
Reference Issues/PRs
Follow up to #31829
Related to #31274 (and loosely #28668)
What does this implement/fix? Explain your changes.
Adds a common test to check that a metric supports mixed array API inputs. The idea is that we can just add metrics to this common test as we add support.
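The idea can be sketched without an accelerator library: compute the metric on "mixed" inputs (here a plain list standing in for a second namespace) and check that it matches the single-namespace reference. This is a hedged sketch; `naive_accuracy` is a hypothetical stand-in metric, not scikit-learn code:

```python
import numpy as np

def naive_accuracy(y_true, y_pred):
    # A stand-in metric that, like many metrics, converts its inputs
    # before computing, so mixed input types are accepted.
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))

reference = naive_accuracy(np.asarray([0, 1, 0, 1]), np.asarray([0, 0, 1, 1]))
mixed = naive_accuracy(np.asarray([0, 1, 0, 1]), [0, 0, 1, 1])  # mixed input types
assert mixed == reference
```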
Had many questions during implementation, but thought it would still be nice to have a draft PR to look at.
Only `y_true` and `y_pred` are tested: sample weight is not used, and other kwargs (e.g., for continuous metrics, cycling through `"average": ("micro", "macro", "weighted")`) are not checked.

I'll write other items as comments on the code.
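To illustrate what cycling through the `average` values would exercise, here is a hedged NumPy-only sketch of micro vs. macro precision (a simplified stand-in, not the scikit-learn implementation):

```python
import numpy as np

def precision(y_true, y_pred, average):
    """Simplified multiclass precision; 0/0 per-class cases count as 0."""
    if average == "micro":
        # Micro-averaging pools all predictions, so for single-label
        # multiclass it reduces to accuracy.
        return float(np.mean(y_true == y_pred))
    per_class = []
    for c in np.unique(y_true):
        predicted_c = y_pred == c
        tp = np.sum(predicted_c & (y_true == c))
        per_class.append(tp / predicted_c.sum() if predicted_c.sum() else 0.0)
    return float(np.mean(per_class))  # "macro": unweighted mean over classes

y_true = np.asarray([0, 1, 2, 0, 1, 2])
y_pred = np.asarray([0, 2, 1, 0, 0, 1])
print(precision(y_true, y_pred, "micro"))  # 2 of 6 correct -> 1/3
print(precision(y_true, y_pred, "macro"))  # (2/3 + 0 + 0) / 3 -> 2/9
```

The two averaging modes give different results on the same predictions, which is why the common test would ideally cycle through them rather than check only the default.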
The other implementation I thought about was to include it in `check_array_api_metric` by adding an optional `check_mixed_input` kwarg. This avoids the 'messy' conditionals at the start of the current `test_mixed_namespace_input_compliance`, where we work out what type of input should go in. But this approach had other problems:

- we would need to combine `yield_namespace_device_dtype_combinations` and the new mixed array combinations in `test_array_api_compliance`, and somehow skip the mixed input for metrics that don't yet support it
- `partial(check_array_api_*, check_mixed_input=True)`
- `check_array_api_metric` would have to handle accepting 2 namespace/devices

Any other comments?
cc @ogrisel @betatim @lesteve