FIX param_distribution param of HalvingRandomSearchCV accepts list of dicts #26893
Conversation
param_distribution param of HalvingRandomSearchCV accepts lists of dicts param_distribution param of HalvingRandomSearchCV accepts list of dicts
|
I think the main reasons of the failures come from a too small dataset and not enough candidates. Here's a slightly modified version: def test_halving_random_search_cv_results():
X, y = make_classification(n_samples=150, n_features=4, random_state=42)
params = [
{"kernel": ["rbf"], "C": expon(scale=10), "gamma": expon(scale=0.1)},
{"kernel": ["poly"], "degree": [2, 3]},
]
param_keys = ("param_C", "param_degree", "param_gamma", "param_kernel")
score_keys = (
"mean_test_score",
"mean_train_score",
"rank_test_score",
"split0_test_score",
"split1_test_score",
"split2_test_score",
"split0_train_score",
"split1_train_score",
"split2_train_score",
"std_test_score",
"std_train_score",
"mean_fit_time",
"std_fit_time",
"mean_score_time",
"std_score_time",
)
extra_keys = ("n_resources", "iter")
search = HalvingRandomSearchCV(
SVC(),
cv=3,
param_distributions=params,
return_train_score=True,
random_state=0,
)
search.fit(X, y)
n_candidates = sum(search.n_candidates_)
cv_results = search.cv_results_
# Check results structure
check_cv_results_keys(
cv_results, param_keys, score_keys, n_candidates, extra_keys
)
check_cv_results_array_types(search, param_keys, score_keys)
assert all(
(
cv_results["param_C"].mask[i]
and cv_results["param_gamma"].mask[i]
and not cv_results["param_degree"].mask[i]
)
for i in range(n_candidates)
if cv_results["param_kernel"][i] == "poly"
)
assert all(
(
not cv_results["param_C"].mask[i]
and not cv_results["param_gamma"].mask[i]
and cv_results["param_degree"].mask[i]
)
for i in range(n_candidates)
if cv_results["param_kernel"][i] == "rbf"
)
|
jeremiedbb
left a comment
There was a problem hiding this comment.
Please also add a changelog entry for 1.3.1 in v1.3.rst
|
Thanks for reviewing, @jeremiedbb, for your help and the explanations. I have made the changes according to your suggestions and kind of understood your reasoning. I still haven't understood the asserts for the masked arrays though (all the candidates appeared as |
|
After reviewing this together with @adrinjalali, I have also modified the two not fully functioning tests I had idicated in the issue (#26885). Please have a look. :) |
|
And you would need to solve the conflict in the changelog. |
betatim
left a comment
There was a problem hiding this comment.
Thanks for the fix! LGTM (looks good to me)
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
|
I've finished the last few things. Thanks everyone for your support. :) |
…t of dicts (scikit-learn#26893) Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
…t of dicts (scikit-learn#26893) Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
…t of dicts (#26893) Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
…t of dicts (scikit-learn#26893) Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
What does this implement/fix? Explain your changes.
Closes #26885
Fixed that the
param_distributionparam ofHalvingRandomSearchCVaccepts lists of dicts and updated documentation.I also tried to implement a test, using
test_random_search_cv_resultsas a template, as you suggested @glemaitre , but I encountered several problems, that I could not resolve.The template implementation calls two functions (
check_cv_results_array_typesandcheck_cv_results_keys), that check and compare the occurrence of params. But those might not always be present (like 'param_degree' is only a key incv_resultsfor the poly kernel, not for rbf). (HalvingSerach' cv_results will also have two additional keys, compared to the other searches, these tests are used for: "iter", "n_resources")I cannot see a way to use the assert tests in the end of the template test, because HalvingGridSearchCV will mask part of the candidates, as part of the process. So, checking for this is not going to work, I assume.
I have determined the value for
n_proportion = 6by looking atcv_results[key].shape,which is for sure the wrong way around.I have commented the test out and hope for your advice. At the moment the test fails because of KeyError (
param_degree) in one of the util functions.I could write a much simpler test, that captures the insertion of
param_distributionas a list of dicts.