Replace boston in ensemble test_forest #16927
Conversation
I think the failures are due to the use of the california dataset; this is a message from one of the failures: But other tests use the california dataset as well, so I don't understand the cause of the failure...
Hi @lucyleeow, the failing test seems to be related to pytest-dev/pytest#6925, as only the CI jobs with pytest 5.4.1 are failing.
It might be because I didn't add
While reviewing your PR, this dataset looks to have pretty poor results on the test sets compared to the training set, which means the default parameters are overfitting:

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_validate

X, y = load_diabetes(return_X_y=True)
results = cross_validate(RandomForestRegressor(random_state=0), X, y,
                         return_train_score=True)
print("train score", results['train_score'].mean())
print("test score", results['test_score'].mean())
# train score 0.9210138461843774
# test score 0.4230661480472566
```
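To illustrate the overfitting point above (my own sketch, not from the thread): constraining the trees, for example with a small `max_depth`, narrows the train/test gap on the same data. The `max_depth=3` value here is an illustrative choice, not a tuned parameter:

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_validate

X, y = load_diabetes(return_X_y=True)

# Default parameters: trees grow until (nearly) pure leaves and memorize the train set.
default_res = cross_validate(RandomForestRegressor(random_state=0),
                             X, y, return_train_score=True)
# Shallow trees: regularized, so train and test scores sit closer together.
shallow_res = cross_validate(RandomForestRegressor(max_depth=3, random_state=0),
                             X, y, return_train_score=True)

default_gap = default_res['train_score'].mean() - default_res['test_score'].mean()
shallow_gap = shallow_res['train_score'].mean() - shallow_res['test_score'].mean()
print("default train/test gap:", default_gap)
print("shallow train/test gap:", shallow_gap)
```

Whether the shallow model generalizes *better* depends on the dataset; the point is only that the train/test gap, i.e. the overfitting symptom quoted above, shrinks.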
Oh wow, that's a big difference. What kinds of tests should I be careful with? I might not be good at assessing this, e.g.,

Edit: should I tune parameters and use the tuned parameters for all the tests?
I would try to use
Hum maybe diabetes is too hard of a dataset to expect good generalization accuracy from the
Or maybe this is good enough for such tests. It's still significantly better than random.
Or feel free to use
I tried using
Tried
Is this a problem with the way the dataset is generated? Since
Gives:

Regardless, happy to keep diabetes as well.
I think that one expects the OOB score to be really close to the score that you will obtain on the test set. So for this test, I would take the diff between the OOB score and the test score and check that it is smaller than

I am really surprised by the diabetes results indeed.
```diff
@@ -389,7 +389,7 @@ def check_oob_score(name, X, y, n_estimators=20):
     assert abs(test_score - est.oob_score_) < 0.1
```
Indeed we are already doing this for the classification. I think it makes sense to do that for the regression.
We only require a comment to mention that in the first case, this is a diff between accuracies and in the second one a diff between R2.
I read the code wrong and thought we were fitting and testing on the same data, which is why I thought oob_score would always be worse. It makes sense. Will fix, thanks @glemaitre
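The OOB-vs-test check discussed above could be sketched standalone like this (my sketch, not the actual `check_oob_score` helper; the dataset, sizes, and thresholds are illustrative assumptions, using a well-behaved `make_regression` dataset instead of diabetes):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# A generated regression problem tends to be easier than diabetes,
# so the OOB R2 and the held-out R2 should land close together.
X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

est = RandomForestRegressor(n_estimators=50, oob_score=True, random_state=0)
est.fit(X_train, y_train)

test_score = est.score(X_test, y_test)   # R2 on the held-out split
diff = abs(test_score - est.oob_score_)  # OOB R2 vs test R2, as in the patch
print("test R2:", test_score)
print("OOB R2:", est.oob_score_)
print("diff:", diff)
```

Both quantities estimate generalization on the same distribution, which is why a small tolerance on their difference is a reasonable test; for classification the same diff would be between accuracies rather than R2 scores.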
Thanks @glemaitre. I amended all tests to use the generated regression dataset.
Thanks @lucyleeow |
Reference Issues/PRs
Towards #16155
What does this implement/fix? Explain your changes.
Replaces boston dataset with ~~subset of california housing~~ diabetes dataset in `sklearn/ensemble/tests/test_forest.py`

Any other comments?

~~Did not use diabetes dataset due to poor R2 score and oob score in `test_oob_score_regressors` (as picked by @adrinjalali in prev PR).~~

Poor R2 score in `test_oob_score_regressors` with diabetes dataset. Happy to change to California/another dataset if this is a problem.